File format reverse engineering, an introduction.


So you have a file that you know contains something good, if only you could read it. Your search for documentation proved fruitless, or the documentation requires a significant outlay or a signed NDA. It looks like you are going to need to reverse engineer the file format before you can use it. Most applications use custom file formats for various reasons; these files are usually containers holding other files or application data. For this walkthrough, a hex editor (XVI32) and a C compiler (VS2008) were used to discover the layout of the target file.

First, a target file is needed. Looking through the Album Artwork folder for iTunes, there are a stack of .itc2 files in various subdirectories and of various sizes. Multiple files improve the reversing accuracy by allowing ideas to be tested across a range of (hopefully) different file configurations. So let’s pick a file… (parts of the filename and file have been modified to remove potentially identifying information)  iTunes\Album Artwork\Cache\4E4144414C414252\01\02\03\4E4144414C414252-FEEDDEADBEEF0321.itc2 looks like a good starting point. Before loading the file into the hex editor, a few things were noticed about the file name:

  1. The directory 4E4144414C414252 is the only one available and all the itc2 files have the folder name in their file name. This is probably a System/User/Device wide constant.
  2. Assuming FEEDDEADBEEF0321 is a per-file constant, the three preceding directories (01, 02, 03) are the last three nibbles of the file name (3, 2, 1) in reverse order. I’m guessing that this is done for indexing reasons, allowing faster searches for the file.
  3. Both are 64-bit numbers, which is consistent across the other files present, and they are probably random numbers or hashes of some description.
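
The directory pattern noticed above can be sketched as a small C helper. This is only an illustration built from the single path shown here: the name itc2_cache_path is made up, and because only the digits 1, 2 and 3 appear in this sample, the sketch refuses IDs whose last three characters are not 0-9 rather than guess how iTunes spells the folders for A-F.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Builds "<library>\0c\0b\0a\<library>-<track>.itc2", where a, b, c are the
 * third-last, second-last and last characters of the track ID.
 * Assumption: the folder names are "0" followed by the digit, as in the one
 * sample path; IDs ending in A-F are rejected because their folder spelling
 * is unknown. Returns 0 on success, -1 on bad input or a too-small buffer. */
int itc2_cache_path(const char *library_id, const char *track_id,
                    char *out, size_t out_size)
{
    size_t n = strlen(track_id);
    if (n < 3)
        return -1;

    char last = track_id[n - 1];
    char second_last = track_id[n - 2];
    char third_last = track_id[n - 3];
    if (!isdigit((unsigned char)last) ||
        !isdigit((unsigned char)second_last) ||
        !isdigit((unsigned char)third_last))
        return -1;

    int written = snprintf(out, out_size, "%s\\0%c\\0%c\\0%c\\%s-%s.itc2",
                           library_id, last, second_last, third_last,
                           library_id, track_id);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

Called with the library and track IDs from the example, this produces the path relative to the Cache folder.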

 Is it possible to find out where these numbers come from or what they represent? The other files which could contain the information are ‘iTunes Library.itl’ or ‘iTunes Music Library.xml’. Checking the xml file in a text editor we discover the line

<key>Library Persistent ID</key><string>4E4144414C414252</string>

so it looks like 4E4144414C414252 is the ID of this particular iTunes Library. Searching for FEEDDEADBEEF0321 yields the block    

<dict>
	<key>Track ID</key><integer>17534</integer>   
	<key>Name</key><string>The Underground Sounds Show: So Scary Edition 10.23.2009 Part 2</string>   
	<key>Artist</key><string>Ms.Angel</string>   
	<key>Album</key><string>The So Very Show» Podcast</string>
 	...   
	<key>Sort Name</key><string>Underground Sounds Show: So Scary Edition 10.23.2009 Part 2</string>   
	<key>Persistent ID</key><string>FEEDDEADBEEF0321</string>   
	<key>Track Type</key><string>File</string>   
	<key>Podcast</key><true/>
	...   
	<key>Location</key><string>file://localhost/../Podcasts/The%20So%20Very%20Show%C2%BB%20Podcast/01%20The%20Underground%20Sounds%20Show_%20So%20S.mp3</string>
</dict>

 so it looks like FEEDDEADBEEF0321 identifies a track from an album and that we are looking at the artwork of “The So Very Show” podcast by KTUH FM Honolulu. Let’s open up the file in a hex editor and see what we get…

Looking inside the file

The first 512 bytes of an itc2 file


There are a few things that pop out when initially looking at the file   

  1. Lots of 0x00, which means less work to do; the same would apply if it were full of 0xFF (usually a ROM dump) or other constants.
  2. The file seems to be composed of several sections, identified by ‘itch’, ‘item’, ‘data’. These strings have no length at the start (so they are not Pascal strings), and while most of them appear to be NULL terminated (C strings), ‘data’ and ‘locl’ are not. It is more likely that they are 32-bit constants rather than strings.
  3. ‘PNG’ pops out, which leads to a guess that this file contains a PNG image.

 As we have to start somewhere, the beginning of the file is as good a location as any. The first 4 bytes could be anything: a number, a set of flags or an identifier for something. As we already have an identifier of ‘itch’ (at a guess I’d say it means ‘iTunes Container Header’), it’s a good guess that they are flags or a number. Sections usually have a length or an offset to the next section, so let’s assume it’s a number. On Intel architectures, numbers are stored with the low byte at the lowest address (Little Endian), which would make this number 0x1C010000 or 469827584, and as the file is only 130522 bytes long, that’s not a length. The opposite of Little Endian is Big Endian, where the highest byte is stored at the lowest address, which yields a size of 0x0000011C or 284, definitely inside the file. If you can read a number from left to right in a hex editor and it makes sense, it’s in Big Endian format. Going to offset 0x011C gives us another 4 bytes before ‘item’, and guessing that 0x00003C59 is another length, we add it to 0x011C and arrive at a similar occurrence at offset 0x3D75. All up there are 3 ‘item’ sections in this file.
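
To make the two readings of those first four bytes concrete, here is a minimal sketch in C. The helper names read_be32 and read_le32 are mine, not from any library, and they assemble the value byte by byte so the result does not depend on the machine the code is built on.

```c
#include <stdint.h>

/* Reads a 32-bit value stored most significant byte first (Big Endian). */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reads the same four bytes least significant byte first (Little Endian). */
uint32_t read_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

For the bytes 00 00 01 1C at the start of the file, read_be32 gives 284 and read_le32 gives 469827584, matching the two candidates discussed above.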

The rest of the ‘itch’ header consists of three lots of 0x00000002 which could be flags, or file versions, then 0x00000000 and then ‘artw’, probably meaning album artwork is stored in this file. As it is not known what these values mean we will compare them to all the other files later on. Now for some C code to define the header…   

//Need to swap the numbers around as we are building on a little endian machine
//The argument is parenthesised so that expressions such as SWAP_ENDIAN(a | b) expand correctly
#define SWAP_ENDIAN(x) ((((x) & 0x000000FFu) << 24) | (((x) & 0x0000FF00u) << 8) | (((x) & 0x00FF0000u) >> 8) | (((x) & 0xFF000000u) >> 24))

//Defines for the header values
#define ITC2_TYPE_HEADER		(0x69746368)	//'itch' Header section
#define ITC2_HEADER_ARTWORK		(0x61727477)	//'artw' Album artwork

//Structure of the ITC2 Header
typedef struct _itc2_header {
	unsigned int SectionLength;	//Size of the header
	unsigned int SectionType;	//ITC2_TYPE_HEADER
	unsigned int Unknown[4];	//2, 2, 2, 0 respectively
	unsigned int Contents;		//ITC2_HEADER_ARTWORK
} ITC2Header;
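
Since every section seen so far starts with its own big endian length, hopping from section to section can be sketched as a short loop. This is an illustration under that assumption only; the function name itc2_list_sections and its error handling are hypothetical and are not taken from the downloadable source.

```c
#include <stddef.h>
#include <stdint.h>

/* Big endian 32-bit read, independent of the host byte order. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walks the length-prefixed sections in buf and records where each starts.
 * At most max_offsets entries are stored, but every section is counted.
 * Returns the number of sections, or -1 if a length runs past the end of
 * the buffer or is too small to hold the length and type fields. */
int itc2_list_sections(const unsigned char *buf, size_t size,
                       size_t *offsets, int max_offsets)
{
    size_t pos = 0;
    int count = 0;

    while (pos < size) {
        if (size - pos < 8)
            return -1;              /* no room for length + type */

        uint32_t length = be32(buf + pos);
        if (length < 8 || length > size - pos)
            return -1;              /* would never advance, or overruns */

        if (count < max_offsets)
            offsets[count] = pos;
        count++;
        pos += length;
    }
    return count;
}
```

On the sample file this kind of walk would be expected to report the ‘itch’ header at offset 0 followed by the three ‘item’ sections.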

Onto the item section

As the first parts of the item section are known, the next part to deal with is 0x000000D0 at file offset 0x0124. As lengths have been working well for us so far, let’s hop down 208 bytes (to 0x01F4), where we land somewhere after the ‘data’ identifier. Let’s try from another position; the start of the ‘item’ section at offset 0x011C seems like a good idea. This time we land at offset 0x01EC, right after the ‘data’ identifier. Comparing with a PNG image in the hex editor shows that this is the start of a PNG file, as identified by 89 50 4E 47 0D 0A 1A 0A. We now have all the information that we need to extract the album artwork from the files; however, let’s see what else the item section contains.
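
A rough sketch of that extraction step in C follows. It assumes, as worked out above, that the image starts HeaderLength bytes after the start of the item and runs to the end of the section; the function names itc2_item_image and is_png are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Big endian 32-bit read, independent of the host byte order. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* item points at the start of an 'item' section with avail bytes readable.
 * Returns a pointer to the first image byte and stores the image length in
 * *image_size, or returns NULL if the lengths are inconsistent. */
const unsigned char *itc2_item_image(const unsigned char *item, size_t avail,
                                     size_t *image_size)
{
    if (avail < 12)
        return NULL;

    uint32_t section_length = be32(item);      /* offset 0 */
    uint32_t header_length  = be32(item + 8);  /* offset 8, 0xD0 in the sample */
    if (section_length > avail || header_length < 12 ||
        header_length > section_length)
        return NULL;

    *image_size = section_length - header_length;
    return item + header_length;
}

/* Checks for the 8 byte PNG signature quoted above. */
int is_png(const unsigned char *data, size_t size)
{
    static const unsigned char sig[8] =
        {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    return size >= sizeof sig && memcmp(data, sig, sizeof sig) == 0;
}
```

For the first item in the sample (section length 0x3C59, header length 0xD0) this arithmetic gives 15241 image bytes.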

The next four 32-bit numbers (0x00000001, 0x00000002, 0x00000001, 0x00000000) could be flags or other identifiers, none of which we can easily guess at the moment, so we will have to compare them against other files. Next up are a series of seemingly random bytes, 16 in all, just like the file name… hang on a minute, it is the filename. As previously determined, the first 8 bytes are the Library Id, followed by the Track Id. Then there is the identifier ‘locl’, which at a guess means it’s a local file, and there is a possibility of remote (‘remt’?) files as well. ‘PNGf’ follows and is probably an image format identifier, as we know we are dealing with a PNG image.

The next set of bytes at file offset 0x0150 seem to be a set of 32-bit numbers, 0, 118 and 128 in decimal, which could be related to the image, then 0, 0, 0, 128, 128 and a bunch of zeros after that, until we reach ‘data’ at offset 0x01E8. Let’s just call these unknowns for the time being and define our item header structure.

#define ITC2_TYPE_ITEM		(0x6974656D)	//'item' Item section
typedef struct _itc2_item {
	unsigned int SectionLength;		//Size of the entire section
	unsigned int SectionType;		//ITC2_TYPE_ITEM
	unsigned int HeaderLength;		//Offset to data from start of item, usually 0xD0
	unsigned int Unknowns_1[4];		//1, 2, 1, 0 respectively?
	unsigned char LibraryId[8];		//ID of the library
	unsigned char TrackId[8];		//Track ID
	unsigned int ItemLocation;		//location of the item, see ITC2_ITEM_LOCATION_XXX
	unsigned int ItemFormat;		//format of the item, see ITC2_ITEM_FORMAT_XXX
	unsigned int Unknowns_2[38];		//0, 0x76, 0x80, 0, 0, 0, 0x80, 0x80, 0...
	unsigned int Data;			//'data'
} ITC2Item; 

//The types of Data locations that we know about
#define ITC2_ITEM_LOCATION_LOCAL	(0x6C6F636C)	//'locl' Local image file? 

//The types of image formats that we know about
#define ITC2_ITEM_FORMAT_PNG		(0x504E4766)	//'PNGf' PNG file format

Before we are done, we should check the other two embedded items to see if we can get any other information. 

A view of the second item within the itc2 file


Looking at the second item at file offset 0x3D75, we first notice that this item is much larger than the last one (0xC7AF bytes) and that it is still a PNG file. The only other things that seem to have changed are the 2nd, 3rd, 7th and 8th values in Unknowns_2, which are now 236, 256, 256 and 256 respectively. Hmmm, the image size has increased about 3 times while these numbers have doubled; could they be the width and height of the image and its display area? Let’s assume so and check it out later. Our item structure now becomes

typedef struct _itc2_item {
	unsigned int	SectionLength;		//Size of the entire section
	unsigned int	SectionType;		//ITC2_TYPE_ITEM
	unsigned int	HeaderLength;		//Offset to data from start of item, usually 0xD0
	unsigned int	Unknowns_1[4];		//1, 2, 1, 0 respectively?
	unsigned char	LibraryId[8];		//ID of the library
	unsigned char	TrackId[8];		//Track ID
	unsigned int	ItemLocation;		//location of the item, see ITC2_ITEM_LOCATION_XXX
	unsigned int	ItemFormat;		//format of the item, see ITC2_ITEM_FORMAT_XXX
	unsigned int	Unknowns_2;		//0
	unsigned int	Width;			//The width of the image in pixels
	unsigned int	Height;			//The height of the image in pixels
	unsigned int 	Unknowns_3[3];		//0, 0, 0
	unsigned int	DisplayWidth;		//The amount of width the image takes on screen
	unsigned int	DisplayHeight;		//The amount of height the image takes on screen
	unsigned int	Unknowns_4[30];		//0...
	unsigned int	Data;			//'data'
} ITC2Item;
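
Overlaying a struct on the raw bytes depends on the compiler not adding padding and still needs every field byte-swapped, so an alternative is to read the fields by offset. The sketch below does that for the guessed fields; the offsets are simply the field positions in the structure above, while the ItemInfo type and the itc2_parse_item name are hypothetical. It is only as right as the guesses about what the fields mean.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offsets from the start of an 'item' section, derived from the
 * structure above (all earlier fields are 4 or 8 bytes wide). */
enum {
    ITEM_OFF_LOCATION       = 44,
    ITEM_OFF_FORMAT         = 48,
    ITEM_OFF_WIDTH          = 56,
    ITEM_OFF_HEIGHT         = 60,
    ITEM_OFF_DISPLAY_WIDTH  = 76,
    ITEM_OFF_DISPLAY_HEIGHT = 80,
    ITEM_MIN_SIZE           = 84    /* first byte after DisplayHeight */
};

/* The guessed fields, converted to host byte order. */
typedef struct {
    uint32_t location;
    uint32_t format;
    uint32_t width;
    uint32_t height;
    uint32_t display_width;
    uint32_t display_height;
} ItemInfo;

/* Big endian 32-bit read, independent of the host byte order. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fills *info from the item header. Returns 0 on success, or -1 if fewer
 * than ITEM_MIN_SIZE bytes are available. */
int itc2_parse_item(const unsigned char *item, size_t avail, ItemInfo *info)
{
    if (avail < ITEM_MIN_SIZE)
        return -1;

    info->location       = be32(item + ITEM_OFF_LOCATION);
    info->format         = be32(item + ITEM_OFF_FORMAT);
    info->width          = be32(item + ITEM_OFF_WIDTH);
    info->height         = be32(item + ITEM_OFF_HEIGHT);
    info->display_width  = be32(item + ITEM_OFF_DISPLAY_WIDTH);
    info->display_height = be32(item + ITEM_OFF_DISPLAY_HEIGHT);
    return 0;
}
```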

And we’re done for the time being; we still need to check our assumptions and the unknowns we came across.

Putting it all together

Now that we know how to read the itc2 file and extract the embedded image, let’s extract the images from all the itc2 files that we can find and test our unknown values. Instead of me posting all the code, download itc2-dump-1.zip, which contains the source code and a compiled executable (VS2008 wanted almost 9 MB for the project files, so they were scrapped).

After running the application we are presented with a stack of PNG images. Checking the image size against the one recorded in the filename shows that we got the width and height correct. Here is the output of one run. 

Dumping contents of 4E4144414C414252-FEEDDEADBEEF0321.itc2:
	dumping items...
		[ 1] wrote PNG image to file '89F13E74F2072465-01-118x128.png'(15241)
		[ 2] wrote PNG image to file '89F13E74F2072465-02-236x256.png'(50911)
		[ 3] wrote PNG image to file '89F13E74F2072465-03-276x300.png'(63462)
Dumping contents of 4E4144414C414252-BABEFEEDCAFE0321:
	dumping items...
		[ 1] different Unknowns_1s detected!
		[ 1] unknown file location 646F776E

And the extracted full size 276×300 PNG image

The extracted album artwork from an itc2 file. Copyright KTUH FM Honolulu (probably).

The file we were working on extracted successfully, however a file was detected that had a different ItemLocation to ‘locl’ and has different Unknowns_1 values. Loading the file into a hex editor and looking at the relevant sections shows that more than just the reported values have changed. 

An itc2 file with differences highlighted


The ItemLocation is ‘down’ (not ‘remt’ as previously guessed), yet the item still contains valid image data and not something like a URL. ‘JFIF’ gives the image format away as a JPEG, which can be verified by comparing the file header against a known JPEG image. Unknowns_1[3] is now 0x00000002, the DisplayWidth and DisplayHeight values are 0, and the file contains only one image. My guess here is that the JPEG decompressor inside the iPod/iPhone is able to resize the image as it is decompressing for relatively little cost, whereas the PNG decompressor cannot, so pre-resized files are stored. See the Future Work section for more information about this.
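
Two small helpers make this kind of comparison quicker; both are illustrative sketches with made-up names. The first turns a 32-bit identifier such as the reported 646F776E back into readable characters. The second guesses the image type from its leading bytes instead of trusting the ItemFormat field: the PNG signature is the one quoted earlier, and FF D8 is the marker that every JPEG stream starts with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes the four bytes of code, most significant first, as characters.
 * Bytes that are not printable ASCII are shown as '.'. */
void fourcc_to_string(uint32_t code, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(code >> (24 - 8 * i));
        out[i] = (c >= 0x20 && c < 0x7F) ? (char)c : '.';
    }
    out[4] = '\0';
}

/* Returns "png", "jpeg" or "unknown" based on the first bytes of data. */
const char *sniff_image(const unsigned char *data, size_t size)
{
    static const unsigned char png_sig[8] =
        {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    if (size >= sizeof png_sig && memcmp(data, png_sig, sizeof png_sig) == 0)
        return "png";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xD8)
        return "jpeg";
    return "unknown";
}
```

Feeding 0x646F776E to fourcc_to_string gives "down", and 0x6C6F636C gives "locl".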

A few extra defines are needed for this extra data. 

#define ITC2_ITEM_LOCATION_DOWNLOAD	(0x646F776E)	//'down'	Downloaded image file
#define ITC2_ITEM_FORMAT_JPEG		(0x0000000D)	//JPEG file format identifier

The final code can be downloaded as itc2-dump-2.zip for you to browse. The code now runs without reporting any errors or problems and successfully dumps the album artwork. When I ran it against my iTunes folder it generated 773 image files.

Future Work

There are a number of things that can be done if further knowledge is wanted/required about the itc2 file. 

  • Replace the different sized PNG files with noticeably different images (Red, Green and Blue sounds good to me) and note where they appear on your iPod/iPhone or in iTunes.
  • Play with values: set a value that is always 1 to 0 or to a random number (big ones are good) and see what happens. You will probably need to restart iTunes every time, then delete and re-upload the album to your iPod.

Modifications that cause the program to spit out error messages will tell you more about the file and the program itself.

Notes about this reversing journey

This work was performed completely off-line with no prior knowledge of the format or reverse engineering of applications. It covers the process that I use when looking at unknown files. I wouldn’t be surprised if there is documentation about this format on the net. An encrypted/compressed file would make life much harder and probably require reverse engineering the software application that uses it as well. I’ll probably save that for another day.

While this journey doesn’t really have an application and was overly detailed, I have had to reverse file formats for specific reasons. An example of this was processing data from a USB spectrometer: the application saved the data in a custom format and allowed conversion to csv one file at a time, but a bulk conversion utility was needed. Figuring that the data was stored in the file as a set of floats, I wrote a quick program to print out a float and its file offset ("0x%08x:%f\n") and increment that offset by one. Comparing the output to a converted csv file provided the starting address within the file for an array of floats. In this case I didn’t need any header information, just the raw data, so the batch conversion utility was fairly simple. That entire process took roughly an hour from start to finish.
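
That float-scanning trick can be sketched roughly as follows. This is a reconstruction from the description above, not the original program: the function names are mine, the floats are assumed to be stored in the byte order of the machine running the code, and the second function automates the "compare against the csv" step with a tolerance, because decimal text in a csv rarely matches the stored bits exactly.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Prints the float found at every byte offset, one offset per line,
 * using the format quoted in the text. */
void dump_floats(const unsigned char *buf, size_t size, FILE *out)
{
    for (size_t off = 0; off + sizeof(float) <= size; off++) {
        float f;
        memcpy(&f, buf + off, sizeof f);   /* safe for unaligned offsets */
        fprintf(out, "0x%08x:%f\n", (unsigned)off, f);
    }
}

/* Returns the first offset at which the count values in wanted[] appear
 * back to back, each within tol of the stored float, or -1 if none does. */
long find_float_run(const unsigned char *buf, size_t size,
                    const float *wanted, size_t count, float tol)
{
    size_t bytes = count * sizeof(float);
    if (count == 0 || bytes > size)
        return -1;

    for (size_t off = 0; off + bytes <= size; off++) {
        size_t i;
        for (i = 0; i < count; i++) {
            float f;
            memcpy(&f, buf + off + i * sizeof(float), sizeof f);
            float diff = f - wanted[i];
            if (!(diff <= tol && diff >= -tol))   /* also rejects NaN */
                break;
        }
        if (i == count)
            return (long)off;
    }
    return -1;
}
```

With the first few values from a converted csv passed in as wanted[], the returned offset is a candidate for the start of the float array.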


  1. #1 by William on March 15, 2010 - 5:56 am

    This was an interesting read. Thank you for going through the effort to document your thought process.

  2. #2 by Simon on March 25, 2010 - 7:16 pm

    On Windows XP the program you wrote won’t run… any ideas?

  3. #3 by Simon on March 25, 2010 - 7:23 pm

  4. #4 by nada on March 25, 2010 - 7:26 pm

    Hi Simon,

    Sorry about that, all program/code on here will need the C++ runtime from Microsoft. Thanks for posting the URL.

    -nada

  5. #5 by enj0ywh on March 27, 2010 - 11:16 pm

    Hi nada,

    very nice work! I was wondering whether you found out how iTunes generates the filename of the .itc2 cover file if it is downloaded from the web? As far as I see it, there is only one cover file for an album with several tracks.

  6. #6 by nada on March 28, 2010 - 10:12 am

    For all albums/tracks the itc2 filename is based on the library id (first set of characters) and the track id, which I believe to be randomly generated. For an album with multiple tracks there is only one itc2 artwork file, which is named as above for one of the tracks. In an album the itc2 filename is not always generated from the first track in the album.

    Hope this answered your question,

    -nada

  7. #7 by Slurpie on May 29, 2010 - 7:51 pm

    Probably a newbie question for this site, but how do I load the file on the dumper? I tried “itc2-dump sample.itc2” in cmd but nothing happens.

  8. #8 by nada on June 7, 2010 - 8:13 pm

    You need to provide the path to your iTunes folder, it doesn’t work with an individual file.

  9. #9 by mynab on January 5, 2011 - 2:23 pm

    Hello, I cannot find how the new iTunes names the itc2 file for an album. There is only one file and the ID used in the filename does not seem to match any of the Persistent IDs of the album tracks. I searched all the places I could think of for the album Persistent ID, but to no avail. Does anybody have more hints on that?

    Thanks!

  10. #10 by Mactub on January 10, 2011 - 11:38 pm

    Sorry for silly question but could you explain in which way do you represent TrackID in the first part of the name of extracted image? I can not get TrackID the same as in XML library.

  11. #11 by nada on January 11, 2011 - 8:18 am

    @mynab I haven’t upgraded my iTunes yet so I don’t know if the new version has changed the way files are stored. Do you only have one .itc2 file in your ‘iTunes\Album Artwork\Cache’ folder? I’m assuming that you have the artwork for more than one album loaded into iTunes. How large is this file? I may upgrade shortly and poke at the new formats.

    @Mactub: The first part of the image filename should be exactly the same as the string value of the ‘Persistent ID’ key. Have you tried opening the ‘iTunes Music Library.xml’ file in notepad and searching for the value, e.g. ‘FEEDDEADBEEF0321’?

    -nada

  12. #12 by Mactub on January 13, 2011 - 12:10 am

    Yes I did that and I know what are you talking about but my question was about names of images generated by your code. The first part of the file name has to be TrackId but it is in some format which I do not understand. Could you please tell how can I make it the same as TrackIDs in ‘iTunes Library.xml’ file ?

  13. #13 by Mactub on January 16, 2011 - 4:27 am

    In your example the Track ID is 17534 (in iTunes Library.xml) but in the name of the extracted file it is 89F13E74F2072465. I am wondering how to get from 89F13E74F2072465 to 17534?

  14. #14 by Arvin on February 24, 2011 - 4:22 am

    I thought itc2 files are encrypted whereas normal itc files are not, which seems very similar to what you have described here.

    I wonder if the added support of png is what makes the new itc2 format?

  15. #15 by Vivek Malik on May 6, 2015 - 4:57 am

    Hello,
    I have a problem with the Persistent ID. I have the track Persistent ID in iTunes Music Library.xml, and I also have the Library Persistent ID,
    but after many updates over the last year I can’t find an .itc2 file that contains the Track Persistent ID in its file name.
    Before the iTunes updates I could find the itc2 file named as “[Library Persistent ID – Track Persistent ID].itc2”
    Ex: “880C1A40D3D6BCE4-9AA50C39C44957F7.itc2”

    Now, for a Track Persistent ID from the iTunes xml (like 9F5D9953BFD0978E), I can’t find any such itc2 file.

    If I could find the itc2 file, I could extract the cover art and assign it to the track.
    Any help? I am sure iTunes used to work as [Library Persistent ID – Track Persistent ID].itc2 and I was getting the art correctly, but it’s not working now.

  16. #16 by nada on August 16, 2015 - 9:44 am

    Hi Vivek,

    I no longer use iTunes, or any Apple products, so I can’t help with the newer versions of iTunes file formats.

    -nada

  17. #17 by Jeff Mincy on May 24, 2016 - 4:23 pm

    Add the following item location:

    #define ITC2_ITEM_LOCATION_CLOUD (0x434C5055) //’CLPU’ Cloud Purchase
