So, I'm intrigued as to what exactly happens when you install a package. The way to work this out is to look at the /data filesystem before package installation, then again afterwards, and do a big recursive diff.
So let's get a relatively clean image:
emulator -wipe-data
adb push busybox ./
adb shell ./busybox tar c -f /tmp/data.tar /data
adb pull /tmp/data.tar .
mkdir original
cd original
tar xf ../data.tar
cd ..
Now that we have the clean image, let's compile the simple Hello World sample.
ant
Now we have HelloAndroid.apk. So we should install this, and then we can find the diff of what happened.
adb install HelloAndroid/bin/HelloAndroid.apk
adb shell ./busybox tar c -f /tmp/data.tar /data
adb pull /tmp/data.tar .
mkdir after_install
cd after_install
tar xf ../data.tar
cd ..
When we diff this we find that not much has happened. There is the new HelloAndroid.apk file installed in the data/app directory. There is also a new com.google.android.hello directory in data/data, but it is empty, so not that interesting. The only existing file that changed is data/system/packages.xml.
Only in after_install/data/app: HelloAndroid.apk
Only in after_install/data/data: com.google.android.hello
diff -ur original/data/system/packages.xml after_install/data/system/packages.xml
--- original/data/system/packages.xml 2007-11-17 14:27:17.000000000 +1100
+++ after_install/data/system/packages.xml 2007-11-17 14:34:20.000000000 +1100
@@ -5,6 +5,7 @@
+
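If you would rather script the comparison than read raw diff output, here is a minimal sketch using Python's standard filecmp module. The function name tree_diff is my own invention, and it assumes both trees have been extracted side by side as above. Note that dircmp compares files shallowly (by type, size and modification time first), so this is a rough first pass rather than a byte-for-byte check.

```python
import filecmp
import os


def tree_diff(left, right):
    """Return (only_left, only_right, changed) as sorted relative paths."""
    only_left, only_right, changed = [], [], []

    def walk(cmp, rel):
        # Entries present on one side only, and common files that differ.
        only_left.extend(os.path.join(rel, n) for n in cmp.left_only)
        only_right.extend(os.path.join(rel, n) for n in cmp.right_only)
        changed.extend(os.path.join(rel, n) for n in cmp.diff_files)
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(only_left), sorted(only_right), sorted(changed)
```

Calling tree_diff("original", "after_install") should then report the same three paths as the diff output above.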
This is pretty interesting to me. I'd really like to know how it finds the thing in the menu. As an experiment I'm going to put the original packages.xml file back to see if this removes it from the menu. This is easy to do:
adb push original/data/system/packages.xml /data/system/packages.xml
Changing the file doesn't update the menu immediately, and on reboot it seems that the packages file is regenerated. Not much luck here, so time to get a better idea of what is in the .apk file.
So, it turns out the .apk file isn't that difficult. Running file HelloAndroid.apk tells us it is just a zip file. After extracting the zip file, we see just 4 files:
AndroidManifest.xml
classes.dex
res/layout/main.xml
resources.arsc
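Since the .apk is an ordinary zip file, the same listing can be produced without unpacking anything. This is just a sketch using Python's standard zipfile module; list_apk is a name I made up for illustration.

```python
import zipfile


def list_apk(path):
    """Return (name, uncompressed size) for each entry in the archive."""
    with zipfile.ZipFile(path) as apk:
        return [(info.filename, info.file_size) for info in apk.infolist()]
```

Calling list_apk("HelloAndroid.apk") should return the four entries listed above along with their sizes.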
The XML files would presumably be the most obvious place to start, but they don't seem to be textual. file suggests it is a DBase 3 file, but that doesn't seem so likely. My guess is some kind of either Unicode or binary XML format, or actually both. I can't actually work it out, to be perfectly honest! So, some reverse engineering is required. The strings definitely look like UTF-16. hexdump helps to debug what is going on. The first part of the file looks like:
$ cat AndroidManifest.xml | hexdump -C
00000000 03 00 08 00 dc 05 00 00 01 00 1c 00 88 02 00 00 |................|
00000010 13 00 00 00 00 00 00 00 01 00 00 00 68 00 00 00 |............h...|
00000020 00 00 00 00 6a 00 00 00 c8 00 00 00 0a 01 00 00 |....j...........|
00000030 36 01 00 00 70 01 00 00 e8 00 00 00 00 00 00 00 |6...p...........|
00000040 8e 01 00 00 da 01 00 00 ce 00 00 00 c6 01 00 00 |................|
00000050 fc 00 00 00 94 00 00 00 12 00 00 00 52 01 00 00 |............R...|
00000060 28 01 00 00 6e 00 00 00 82 00 00 00 80 01 00 00 |(...n...........|
00000070 07 00 61 00 6e 00 64 00 72 00 6f 00 69 00 64 00 |..a.n.d.r.o.i.d.|
00000080 00 00 2a 00 68 00 74 00 74 00 70 00 3a 00 2f 00 |..*.h.t.t.p.:./.|
00000090 2f 00 73 00 63 00 68 00 65 00 6d 00 61 00 73 00 |/.s.c.h.e.m.a.s.|
000000a0 2e 00 61 00 6e 00 64 00 72 00 6f 00 69 00 64 00 |..a.n.d.r.o.i.d.|
000000b0 2e 00 63 00 6f 00 6d 00 2f 00 61 00 70 00 6b 00 |..c.o.m./.a.p.k.|
000000c0 2f 00 72 00 65 00 73 00 2f 00 61 00 6e 00 64 00 |/.r.e.s./.a.n.d.|
000000d0 72 00 6f 00 69 00 64 00 00 00 00 00 00 00 08 00 |r.o.i.d.........|
000000e0 6d 00 61 00 6e 00 69 00 66 00 65 00 73 00 74 00 |m.a.n.i.f.e.s.t.|
000000f0 00 00 07 00 70 00 61 00 63 00 6b 00 61 00 67 00 |....p.a.c.k.a.g.|
00000100 65 00 00 00 18 00 63 00 6f 00 6d 00 2e 00 67 00 |e.....c.o.m...g.|
00000110 6f 00 6f 00 67 00 6c 00 65 00 2e 00 61 00 6e 00 |o.o.g.l.e...a.n.|
00000120 64 00 72 00 6f 00 69 00 64 00 2e 00 68 00 65 00 |d.r.o.i.d...h.e.|
00000130 6c 00 6c 00 6f 00 00 00 01 00 20 00 00 00 0b 00 |l.l.o..... .....|
00000140 61 00 70 00 70 00 6c 00 69 00 63 00 61 00 74 00 |a.p.p.l.i.c.a.t.|
00000150 69 00 6f 00 6e 00 00 00 08 00 61 00 63 00 74 00 |i.o.n.....a.c.t.|
00000160 69 00 76 00 69 00 74 00 79 00 00 00 05 00 63 00 |i.v.i.t.y.....c.|
00000170 6c 00 61 00 73 00 73 00 00 00 0d 00 2e 00 48 00 |l.a.s.s.......H.|
00000180 65 00 6c 00 6c 00 6f 00 41 00 6e 00 64 00 72 00 |e.l.l.o.A.n.d.r.|
00000190 6f 00 69 00 64 00 00 00 05 00 6c 00 61 00 62 00 |o.i.d.....l.a.b.|
000001a0 65 00 6c 00 00 00 0c 00 48 00 65 00 6c 00 6c 00 |e.l.....H.e.l.l.|
000001b0 6f 00 41 00 6e 00 64 00 72 00 6f 00 69 00 64 00 |o.A.n.d.r.o.i.d.|
000001c0 00 00 0d 00 69 00 6e 00 74 00 65 00 6e 00 74 00 |....i.n.t.e.n.t.|
000001d0 2d 00 66 00 69 00 6c 00 74 00 65 00 72 00 00 00 |-.f.i.l.t.e.r...|
000001e0 06 00 61 00 63 00 74 00 69 00 6f 00 6e 00 00 00 |..a.c.t.i.o.n...|
000001f0 05 00 76 00 61 00 6c 00 75 00 65 00 00 00 1a 00 |..v.a.l.u.e.....|
00000200 61 00 6e 00 64 00 72 00 6f 00 69 00 64 00 2e 00 |a.n.d.r.o.i.d...|
00000210 69 00 6e 00 74 00 65 00 6e 00 74 00 2e 00 61 00 |i.n.t.e.n.t...a.|
00000220 63 00 74 00 69 00 6f 00 6e 00 2e 00 4d 00 41 00 |c.t.i.o.n...M.A.|
00000230 49 00 4e 00 00 00 08 00 63 00 61 00 74 00 65 00 |I.N.....c.a.t.e.|
00000240 67 00 6f 00 72 00 79 00 00 00 20 00 61 00 6e 00 |g.o.r.y... .a.n.|
00000250 64 00 72 00 6f 00 69 00 64 00 2e 00 69 00 6e 00 |d.r.o.i.d...i.n.|
00000260 74 00 65 00 6e 00 74 00 2e 00 63 00 61 00 74 00 |t.e.n.t...c.a.t.|
00000270 65 00 67 00 6f 00 72 00 79 00 2e 00 4c 00 41 00 |e.g.o.r.y...L.A.|
00000280 55 00 4e 00 43 00 48 00 45 00 52 00 00 00 00 00 |U.N.C.H.E.R.....|
00000290 80 01 08 00 54 00 00 00 00 00 00 00 00 00 00 00 |....T...........|
Well, this hasn't exactly been very informative. We've learned that basically all that happens on install is that the .apk file is copied to /data/app. We also learned that this directory is scanned on startup to find packages to start. The strings in the dump above are:
android
http://schemas.android.com/apk/res/android
manifest
package
com.google.android.hello
application
activity
class
.HelloAndroid
label
HelloAndroid
intent-filter
action
value
android.intent.action.MAIN
category
android.intent.category.LAUNCHER
We can note that there aren't any duplicated strings, and updating the AndroidManifest.xml source file and regenerating it confirms this.
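Reading the dump, each string appears to be stored as a 16-bit little-endian character count, then the UTF-16 characters, then a 16-bit zero. That layout is only my reading of the bytes, not something I have confirmed anywhere. Under that assumption, here is a minimal Python sketch that decodes two consecutive entries copied from offsets 0xde to 0x103 of the dump. The names SAMPLE and read_strings are hypothetical.

```python
import struct

# Bytes 0xde-0x103 of the hexdump above: two consecutive entries.
SAMPLE = bytes.fromhex(
    "0800 6d00 6100 6e00 6900 6600 6500 7300 7400 0000"
    "0700 7000 6100 6300 6b00 6100 6700 6500 0000"
)


def read_strings(buf):
    """Decode back-to-back entries: u16 length, UTF-16LE text, u16 zero.

    The layout is an assumption inferred from the dump.
    """
    strings = []
    pos = 0
    while pos + 2 <= len(buf):
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        strings.append(buf[pos:pos + 2 * length].decode("utf-16-le"))
        pos += 2 * length + 2  # skip the text and the trailing zero
    return strings


print(read_strings(SAMPLE))  # ['manifest', 'package']
```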
So WBXML looks like a likely candidate. The first byte is '03', which would indicate WBXML version 1.3. The next byte is '00', which indicates that the public identifier is described by a string, with string table index '08'. The charset is indicated by '00', which means unknown. Next comes the string table length, which looks like the bytes 'dc' and '05'. If I've decoded it right, that indicates 11781 bytes, which makes no sense at all since it is far more than the whole file.
So we try and guess something else. As a 32-bit little-endian integer, 'dc 05 00 00' is 0x5dc, which is exactly the length of the file. So that seems like a good guess for what that field is. In this case, I'm going to guess that '03 00 08 00' is a magic number to identify the file. The next 4 bytes are '01 00 1c 00'; I'm not sure what this is. It is followed by '88 02 00 00', which is 0x288 little-endian, and seems to mark the last character of what looks like a string table. Looking at other binary XML files seems to confirm this hypothesis. The next field appears to be 0x13, which is pretty close to the number of strings we found earlier. And this is about as far as I can be bothered working out right now.
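To make the two readings concrete, here is a small Python sketch that decodes the first 20 bytes of the dump both ways. The field names (magic, file_len, unknown, table_end, count) are just labels for my guesses above, not anything official, and mb_u_int32 is my own helper for the WBXML multi-byte integer encoding (7 bits per byte, high bit set means more bytes follow).

```python
import struct

# First 20 bytes of AndroidManifest.xml, copied from the hexdump.
HEADER = bytes.fromhex("03000800 dc050000 01001c00 88020000 13000000")


def mb_u_int32(data):
    """Decode a WBXML multi-byte integer; return (value, bytes consumed)."""
    value = 0
    for used, byte in enumerate(data, start=1):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, used
    raise ValueError("unterminated multi-byte integer")


# WBXML reading: bytes 'dc 05' as a multi-byte string table length.
wbxml_len, _ = mb_u_int32(HEADER[4:])
print(wbxml_len)  # 11781

# Alternative reading: a run of little-endian fields (names are guesses).
magic, file_len, unknown, table_end, count = struct.unpack("<4sI4sII", HEADER)
print(hex(file_len), hex(table_end), hex(count))  # 0x5dc 0x288 0x13
```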
The use of the binary encoding seems a little strange. It certainly doesn't reduce the size, but maybe that isn't really the point. It is probably the case that this makes it a lot easier to parse, but you would expect there to be existing XML parsers in the libraries. So beats me what is going on!
The classes.dex file included in the .apk file is already documented by Retrodev.
The final file is the resources.arsc file. This doesn't seem to have too much information in it. There are a few strings that we would expect, including res/layout/main.xml, HelloAndroid and com.google.android.hello.
And that is about it for now. Not so useful at the end of the day, but it may give you some information as a starting point for further reverse engineering.