Phase one: emulating Asteroids on the GBA. It doesn't support sound and it's slow, but it seems to work well.

Thanks to Steve Green, author of Atari Vector Simulator for the Mac, who helped greatly with the emulation, and to a university web site from which I borrowed a generic line-draw algorithm.

This was very straightforward: I took out the code specific to I/O on the Mac version, replaced it with GBA I/O, and that was it.



Phase two: translate 6502 instructions into equivalent sets of ARM instructions and run the result as an ARM program. It is like an emulator, but in ASM instead of C. Here is the idea: instead of running a C program that reads a 6502 opcode at run time and then does something like:
static void ora_zp () // opcode 0x05
{
    unsigned short savepc=get6502memoryFast(PC++);
    unsigned char value=get6502memory(savepc);
    A |= value;
    DO_Z (A);
    DO_N (A);

    clockticks += 3;
}
the ROM data/instructions are pre-processed and "translated" into native ARM instructions:
ROM73BC ;05 ora_zp
    MOV R0,#0x08
    ORR R0,R0,#0x3000000
    LDRSB R0,[R0]
    ORRS R5,R5,R0
    DONZA
    ADD R10,R10,#3
    BL CHECKTICKS

R5 was designated as the 6502 A register, and R10 keeps track of clock ticks. DONZA is a macro that updates a virtual flags register (P on the 6502, R8 in the translation).
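DONZA itself is ARM assembly that isn't shown here. As a rough C equivalent, this is a minimal sketch of what a "set N and Z from the result" update does to the 6502 P register, assuming the standard 6502 bit positions (N = bit 7, Z = bit 1). The name `update_nz` is hypothetical. It reproduces the P transitions in the register dump further down, e.g. P 23 becoming 21 when A is loaded with 01.

```c
#include <stdint.h>

/* Standard 6502 status-register bits. */
#define FLAG_N 0x80 /* negative: copy of bit 7 of the result */
#define FLAG_Z 0x02 /* zero: set when the result is 0 */

/* Hypothetical helper: returns P with N and Z recomputed from a result,
   leaving every other flag (C, V, I, D, ...) untouched. */
static uint8_t update_nz(uint8_t p, uint8_t result)
{
    p &= (uint8_t)~(FLAG_N | FLAG_Z);
    if (result == 0)
        p |= FLAG_Z;
    if (result & 0x80)
        p |= FLAG_N;
    return p;
}
```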

This was not a trivial project; it took about three weeks of a few hours a night. I would not have made any progress without the emulator above. I started by writing a program that generated a skeleton of the translator from the emulator's source. I took the opcode lookup table from the emulator, which mapped opcodes to C functions, and had the program build a huge switch statement containing, for each opcode, the emulator's C code as a comment plus a couple of lines common to every function:

        case 0x05: //ora_zp
            //static void ora_zp () // 0x05
            //{
            //    unsigned short savepc=get6502memoryFast(PC++);
            //    unsigned char value=get6502memory(savepc);
            //    A |= value;
            //    DO_Z (A);
            //    DO_N (A);
            //
            //    clockticks += 3;
            //}
            printf("ROM%04X ;05 ora_zp\n",0x6800+off);


            printf("    ADD R10,R10,#1\n");
            printf("    BL CHECKTICKS\n");
            off+=1;
            break;
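As a rough illustration of a program writing a program, here is a C sketch that emits one skeleton `case` from an opcode-table entry. The `op_entry` layout and `emit_case` are hypothetical, and unlike the real generator it does not copy the emulator's C function in as a comment; the tick count and `off` increment are placeholders to be fixed by hand, as in the skeleton above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical excerpt of an emulator's opcode table: opcode -> handler name. */
struct op_entry {
    unsigned char opcode;
    const char *name;
};

/* Writes one skeleton case into buf (size n): a printf that labels the
   translated output, plus placeholder tick/CHECKTICKS lines to edit by hand.
   Returns the number of characters snprintf wanted to write. */
static int emit_case(char *buf, size_t n, const struct op_entry *e)
{
    return snprintf(buf, n,
        "        case 0x%02X: //%s\n"
        "            printf(\"ROM%%04X ;%02X %s\\n\",0x6800+off);\n"
        "            printf(\"    ADD R10,R10,#1\\n\");\n"
        "            printf(\"    BL CHECKTICKS\\n\");\n"
        "            off+=1;\n"
        "            break;\n",
        (unsigned)e->opcode, e->name, (unsigned)e->opcode, e->name);
}
```

Looping `emit_case` over all 256 table entries and printing the results gives the huge switch statement to fill in.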

Then it was a simple matter of typing in the asm for each of the 256 opcodes... simple, but very time-consuming and error-prone.
Once a significant number of instructions were coded, I was able to start "running" the translated program. I found that the easiest way to debug it was to run the emulator and have it print out the opcode and all the registers, clock ticks, etc.:
6E80: D0  A 00  X 00  Y 01  P 23  00D2  02
6E82: AD  A 00  X 00  Y 01  P 23  00D6  02
6E85: 10  A 00  X 00  Y 01  P 23  00D9  02
6ED7: 60  A 00  X 00  Y 01  P 23  00DF  02
6858: 20  A 00  X 00  Y 01  P 23  00E5  02
703F: A5  A 01  X 00  Y 01  P 21  00E8  02
7041: F0  A 01  X 00  Y 01  P 21  00EB  02
7043: AD  A 01  X 00  Y 01  P 21  00EF  02
7046: 30  A 01  X 00  Y 01  P 21  00F2  02
7048: AD  A 00  X 00  Y 01  P 23  00F6  02
Then I added code to the translator that printed out the same information for each instruction executed (I printed to the console in Mappy), and compared the emulated and translated code instruction for instruction. Getting through the first 6000 instructions took a few days; getting from the 6000th to the 20000th instruction took a few minutes. Within a few more days I had it running thousands, then millions, of instructions without an error. I had to do things like "on the 50000th instruction press the fire button, and release it on instruction 55000" so that the emulation and the translation saw the same keystrokes at the same time. Today I reached the point where I just played it, and so far so good. If I find any significant problems from here, my plan is to keep a log of all the keystrokes, then go back and run the emulation and translation side by side using that list of keystrokes.
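The comparison above was done on printed dumps. As a hedged sketch of automating it, the C below holds each dump line in a struct (a layout assumed here, covering PC, opcode, A, X, Y, P and ticks; the dump's last column is left out because its meaning isn't described) and reports the first mismatch. It also shows one way to script input by instruction count, as in the press-at-50000, release-at-55000 example. `trace_rec`, `first_divergence`, `key_event` and `buttons_at` are all hypothetical names.

```c
#include <stdint.h>

/* One line of the register dump (assumed layout). */
struct trace_rec {
    uint16_t pc;
    uint8_t opcode, a, x, y, p;
    uint32_t ticks;
};

/* Returns the index of the first record where the two traces disagree,
   or -1 if the first n records match. */
static long first_divergence(const struct trace_rec *emu,
                             const struct trace_rec *xlat, long n)
{
    for (long i = 0; i < n; i++) {
        const struct trace_rec *e = &emu[i], *t = &xlat[i];
        if (e->pc != t->pc || e->opcode != t->opcode || e->a != t->a ||
            e->x != t->x || e->y != t->y || e->p != t->p ||
            e->ticks != t->ticks)
            return i;
    }
    return -1;
}

/* Scripted input: a button is held from instruction `press` up to, but not
   including, instruction `release`. Both builds query it with their own
   instruction count, so they see identical keystrokes. */
struct key_event {
    unsigned button;            /* hypothetical button bit, e.g. fire */
    unsigned long press, release;
};

static unsigned buttons_at(const struct key_event *script, int count,
                           unsigned long instr)
{
    unsigned held = 0;
    for (int i = 0; i < count; i++)
        if (instr >= script[i].press && instr < script[i].release)
            held |= script[i].button;
    return held;
}
```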


I am a bit disappointed in the performance. I think the killer was the vector graphics: it appears that the vector graphics hardware worked off its own instruction set as well (jump to subroutine, draw line, return from subroutine, etc.). My guess is that a particular rock is a subroutine that calls several line-draw subroutines, then it jumps to the next rock, the ship, the score, etc. After looking at the emulated C code for this, I figured there was no way I wanted to translate that to ASM, so I simply linked in the C code from the emulator. Ideally you would want to figure out this engine: instead of drawing each of the lines for each of the rocks, we know (from playing the game) that there is a limited number of rock shapes and they don't rotate, which makes them a perfect application for a tile or sprite. The ship itself does rotate; I would use a rotated sprite for that case. I think this would get the game up to full speed. The translated version is about 75% faster than the emulated version.
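To make the "sprite instead of lines" idea concrete, here is a small C sketch of a display-list interpreter with a return stack. The instruction set (`V_DRAW`, `V_JSR`, `V_RTS`, `V_HALT`) is invented for illustration and is not the actual Asteroids vector-generator encoding; the point is only that when a subroutine call targets a known rock shape, the interpreter can place one sprite instead of walking and drawing every line.

```c
#include <stdint.h>

/* Hypothetical, simplified display-list instructions (NOT the real
   Asteroids encoding): jump to subroutine, draw line, return, halt. */
enum vop { V_DRAW, V_JSR, V_RTS, V_HALT };

struct vinstr {
    enum vop op;
    int16_t dx, dy;     /* V_DRAW: relative line end point */
    uint16_t target;    /* V_JSR: index of the subroutine's first instruction */
};

struct vstats {
    int lines;          /* lines that would be rasterized */
    int sprites;        /* shapes replaced by a single sprite placement */
};

#define VSTACK 8        /* return-stack depth for nested calls */

/* Walks the display list from index 0. A V_JSR whose target is listed in
   sprite_shapes (e.g. a known rock outline) is not entered; it is counted
   as one sprite placement instead of being drawn line by line. */
static struct vstats run_display_list(const struct vinstr *prog,
                                      const uint16_t *sprite_shapes,
                                      int nshapes)
{
    struct vstats st = {0, 0};
    uint16_t stack[VSTACK];
    int sp = 0;
    uint16_t pc = 0;

    for (;;) {
        const struct vinstr *in = &prog[pc++];
        switch (in->op) {
        case V_DRAW:
            st.lines++;             /* a real renderer would draw (dx, dy) */
            break;
        case V_JSR: {
            int is_sprite = 0;
            for (int i = 0; i < nshapes; i++)
                if (sprite_shapes[i] == in->target)
                    is_sprite = 1;
            if (is_sprite) {
                st.sprites++;       /* place a pre-rendered sprite instead */
            } else if (sp < VSTACK) {
                stack[sp++] = pc;   /* remember where to resume */
                pc = in->target;
            } else {
                return st;          /* stack overflow: malformed list */
            }
            break;
        }
        case V_RTS:
            if (sp == 0)
                return st;          /* return with empty stack: stop */
            pc = stack[--sp];
            break;
        case V_HALT:
            return st;
        }
    }
}
```

With a three-line rock outline listed as a sprite shape, two rock calls cost two sprite placements instead of six line draws.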

The big question: would speeding up the graphics by using sprites make the emulated version run at full speed? Was the translation anything more than a programming exercise?

Note: As you would expect, not all of the bytes in the ROMs are opcodes; some of them are data read by the program. I found a couple of these memory locations early on and, as a quick fix, made the entire ROM available to the translated program. I did not go back and isolate the data from the instructions (ideally only the data would be present, and the opcodes would not be available at run time).
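One possible way to do that isolation, sketched under assumptions: have the emulator record which ROM bytes are read as data (as opposed to fetched as opcodes or operands) and build an image containing only those. The hook point, `readlist`, `mark_data_read`, `build_data_image`, and the 6 KB program ROM at 0x6800-0x7FFF are all assumptions. In the `ora_zp` excerpt, data reads appear to go through `get6502memory` while operand fetches use `get6502memoryFast`, but that is a guess from one function. Only bytes actually read during the recorded run are kept, so the run has to exercise the relevant code paths.

```c
#include <stdint.h>
#include <string.h>

#define ROMSTART 0x6800u   /* program ROM base used in the translator above */
#define ROMSIZE  0x1800u   /* assumption: 6 KB, 0x6800-0x7FFF */

/* Hypothetical read map, set from the emulator's data-read path whenever
   the program reads a ROM byte as data (not as an opcode/operand fetch). */
static uint8_t readlist[ROMSIZE];

static void mark_data_read(uint16_t addr)
{
    if (addr >= ROMSTART && addr < ROMSTART + ROMSIZE)
        readlist[addr - ROMSTART] = 1;
}

/* Builds a ROM image that keeps only bytes seen as data; everything else
   (opcodes, operands, never-touched bytes) is zeroed. Returns the number
   of bytes kept. */
static unsigned build_data_image(const uint8_t *rom, uint8_t *out)
{
    unsigned kept = 0;
    memset(out, 0, ROMSIZE);
    for (unsigned i = 0; i < ROMSIZE; i++) {
        if (readlist[i]) {
            out[i] = rom[i];
            kept++;
        }
    }
    return kept;
}
```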


Phase three: reverse engineer the program and write a C program that performs the same task. An initial path might be similar to phase two above: instead of generating asm instructions, generate C statements (hell, just use the code from the emulator). Then visually go through the result, simplify it, and find functional boundaries; eventually you should see the program checking the keyboard, rotating the ship as a result, and so on.

2003-08-13: WOW! I can't believe how close I was, yet so far. I was recently contacted by someone about the staticrecompilers group at Yahoo. In a nutshell, the group is dedicated to the same thing I am doing here. I did NOT follow the staticrecompilers HOWTO; I did my own thing:

1) You have to have a working emulator... one that you can modify.
2) Modify the emulator so that it marks the ROM addresses of valid opcodes as it runs. (Note: build this list in such a way that you can add to it later.)
3) Build the bulk of your translator: run through the ROM and build translated code ONLY for the opcodes in your opcode list from step 2.
4) Have the emulator and translator each run x cycles, and compare registers on a cycle-for-cycle basis.
5) Tweak, tweak, tweak. You will find bugs in your translation; you will find addresses that are not in your opcode list and will have to add them, and some of those new ones will be branches to other opcodes not in the list...
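Step 2 asks for a list you can keep adding to. One hedged way to do that in C: keep one flag byte per 6502 address, merge in the previously saved list at startup, and save the union at exit, so each run can only grow the list. The file format (one hex address per line) and the function names are assumptions for this sketch.

```c
#include <stdio.h>
#include <stdint.h>

#define ADDR_SPACE 0x10000u

/* One byte per 6502 address; nonzero means "an instruction started here". */
static uint8_t hitlist[ADDR_SPACE];

/* Called by the emulator at every opcode fetch. */
static void mark_opcode(uint16_t pc)
{
    hitlist[pc] = 1;
}

/* Merges a previously saved list into the current one. An empty file is
   fine. Returns the number of addresses read from the file. */
static unsigned hitlist_merge(FILE *f)
{
    unsigned added = 0;
    unsigned addr;
    while (fscanf(f, "%x", &addr) == 1) {
        if (addr < ADDR_SPACE) {
            hitlist[addr] = 1;
            added++;
        }
    }
    return added;
}

/* Writes every marked address, one hex value per line. */
static void hitlist_save(FILE *f)
{
    for (unsigned a = 0; a < ADDR_SPACE; a++)
        if (hitlist[a])
            fprintf(f, "%04X\n", a);
}
```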


An example:

Clearly I have a working emulator, so I went ahead and used it again. I already had it modified to dump registers; I added some code to mark ROM addresses as it executed. I let it run N instruction cycles and dumped the hitlist.

I took essentially the same skeleton switch statement I had built for the asm version above:

	/* Walk every ROM address the emulator marked as the start of an instruction. */
	for(opadd=ROMSTART;opadd < ROMSTART+ROMSIZE;opadd++)
	{
		if(hitlist[opadd]==0) continue;
		opcode=rom[opadd];
		/* Emit a goto label plus the raw bytes as a comment. */
		printf("L_%04X: //%02X %02X %02X\n",opadd,rom[opadd],rom[opadd+1],rom[opadd+2]);
		switch(opcode)
		{
			case 0x05: //ora_zp
				//static void ora_zp () // 0x05
				//{
				//    unsigned short savepc=get6502memoryFast(PC++);
				//    unsigned char value=get6502memory(savepc);
				//    A |= value;
				//    DO_Z (A);
				//    DO_N (A);
				//
				//    clockticks += 3;
				//}
				/* Placeholder until this opcode is translated. */
				printf("  crash(0x%04X,0x%02X);\n",opadd,opcode); break;
		}
	}
I added this crash(opadd,opcode) line for each instruction; when the translated code executes, it tells me which opcode is not yet implemented and where to go to work on it.
I literally started running right away, with no opcodes implemented, and implemented them as I went along.
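The text doesn't show `crash()` itself. A minimal sketch, assuming it only needs to report the address and opcode and then stop; the message wording and the `format_crash` helper are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats the "not implemented yet" message into buf (size n). Kept separate
   so the same text could go to stderr or to an emulator's debug console. */
static int format_crash(char *buf, size_t n, unsigned addr, unsigned opcode)
{
    return snprintf(buf, n, "unimplemented opcode %02X at %04X", opcode, addr);
}

/* Called from each not-yet-translated case in the generated code:
   report where translation work is still needed, then stop. */
static void crash(unsigned addr, unsigned opcode)
{
    char msg[64];
    format_crash(msg, sizeof msg, addr, opcode);
    fprintf(stderr, "%s\n", msg);
    exit(1);
}
```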

For the most part you just use the emulator's code as is:
        case 0x05: //ora_zp
            //static void ora_zp () // 0x05
            //{
            //    unsigned short savepc=get6502memoryFast(PC++);
            //    unsigned char value=get6502memory(savepc);
            //    A |= value;
            //    DO_Z (A);
            //    DO_N (A);
            //
            //    clockticks += 3;
            //}
            printf("    A |= ReadMemory(0x%04X);\n",rom[opadd+1]);
            printf("    DO_Z (A);\n");
            printf("    DO_N (A);\n");
            printf("    clockticks += 3;\n");
            printf("    showsystem(0x%04X,0x%02X);\n",opadd,opcode);
            break;
So this would build code that looks like this:
L_77C8: //05 60 D0
    A |= ReadMemory(0x0060);
    DO_Z (A);
    DO_N (A);
    clockticks += 3;
    showsystem(0x77C8,0x05);
showsystem() generated the register/system dump on an instruction-by-instruction basis; it also monitored the NMI for this 6502 (another story).
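`showsystem()` isn't shown either. Here is a hedged sketch of its dump half, with globals named after the emulator excerpts (A, X, Y, P, clockticks); `format_state` is a hypothetical helper, the NMI monitoring is left out, and the dump's final column is omitted because its meaning isn't described. Matching the emulator's format exactly is what lets the two traces be compared line for line.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 6502 state, named after the emulator excerpts. */
static uint8_t A, X, Y, P;
static unsigned long clockticks;

/* Formats one trace line in the emulator dump's layout, e.g.
   "703F: A5  A 01  X 00  Y 01  P 21  00E8" (last dump column omitted). */
static int format_state(char *buf, size_t n, unsigned pc, unsigned opcode)
{
    return snprintf(buf, n, "%04X: %02X  A %02X  X %02X  Y %02X  P %02X  %04lX",
                    pc, opcode, (unsigned)A, (unsigned)X, (unsigned)Y,
                    (unsigned)P, clockticks & 0xFFFFul);
}

/* Prints the current state after a translated instruction. */
static void showsystem(unsigned pc, unsigned opcode)
{
    char line[64];
    format_state(line, sizeof line, pc, opcode);
    puts(line);
}
```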

Borrow heavily from the emulator... keep going.

The first big hurdle was the NMI. On a real 6502, an NMI saves the return address and flags on the stack and calls the handler, which uses that stack information to return. We are not running like that; we are using hardcoded goto labels. What I did was pull out a second list of opcodes that were used by the NMI handler and build a second translator (in essence), so that the NMI handler is a call to a second piece of translated C code.
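A sketch of the NMI-as-a-function idea, assuming the NMI is driven by an elapsed-tick count. `NMI_PERIOD`'s value, `check_nmi` and `translated_nmi` are placeholders, not Asteroids' real timing or the author's code. Because the handler is an ordinary C function, returning from the interrupt is just a C return; no stacked return address has to be mapped back to a goto label.

```c
/* Assumption: an NMI every NMI_PERIOD clock ticks. The real interval comes
   from the Asteroids hardware; 6000 is only a placeholder. */
#define NMI_PERIOD 6000ul

static unsigned long clockticks;
static unsigned long next_nmi = NMI_PERIOD;
static int nmi_count;

/* Stand-in for the separately translated NMI handler: in practice its body
   would be the generated C for the handler's opcodes, returning where the
   6502 code executes RTI. */
static void translated_nmi(void)
{
    nmi_count++;
}

/* Called between translated instructions (for example from showsystem):
   run the handler once for each NMI period that has elapsed. */
static void check_nmi(void)
{
    while (clockticks >= next_nmi) {
        translated_nmi();
        next_nmi += NMI_PERIOD;
    }
}
```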
The second big hurdle was returning from subroutine calls, and this could still use more work. I first tried to have the translator keep track of return addresses, but that didn't work out. I went ahead and fully implemented the code that pushes the program counter onto the stack; when I pulled the PC off the stack for a return, it fell into an if-then-else chain:
    if(ret==0x6806) goto L_6806;
    if(ret==0x6809) goto L_6809;
    if(ret==0x680C) goto L_680C;
    if(ret==0x71A0) goto L_71A0;
Crude, but it worked.
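Here is a self-contained sketch of that JSR/RTS scheme in C, assuming the standard 6502 convention (JSR pushes the return address minus one, high byte first; RTS pulls low then high and adds one). The labels, addresses and the `A++` body are hypothetical, and the stack is a separate array rather than page 1 of emulated memory. It dispatches the pulled address with a `switch` instead of the if-then-else chain; that is a substitution, not what the text describes, but a compiler may turn it into a jump table or binary search rather than a linear chain of compares.

```c
#include <stdint.h>

static uint8_t stack_mem[256];   /* stands in for the 6502 stack page */
static uint8_t S = 0xFF;         /* stack pointer, grows downward */

static void push(uint8_t v) { stack_mem[S--] = v; }
static uint8_t pull(void)   { return stack_mem[++S]; }

/* JSR: push (return address - 1), high byte first. */
static void push_return(uint16_t ret)
{
    uint16_t r = (uint16_t)(ret - 1);
    push((uint8_t)(r >> 8));
    push((uint8_t)(r & 0xFF));
}

/* RTS: pull low byte, then high byte, then add 1. */
static uint16_t pull_return(void)
{
    uint16_t lo = pull();
    uint16_t hi = pull();
    return (uint16_t)(((hi << 8) | lo) + 1);
}

/* Hypothetical translated fragment: two JSRs to the same subroutine. */
static int run_translated(void)
{
    int A = 0;
    uint16_t ret;

    S = 0xFF;
    /* 6800: JSR $7000 (3 bytes, so it returns to 6803) */
    push_return(0x6803);
    goto L_7000;
L_6803: /* JSR $7000 */
    push_return(0x6806);
    goto L_7000;
L_6806:
    return A;

L_7000: /* subroutine body */
    A++;
    /* RTS: map the pulled address back to a translated label. */
    ret = pull_return();
    switch (ret) {
    case 0x6803: goto L_6803;
    case 0x6806: goto L_6806;
    }
    return -1;  /* unknown return address: where crash() would go */
}
```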

Note: There was never a need for a disassembler or a disassembled listing of the ROM. The key is the emulator; if it is working, you don't need anything else.

Now for the BIG news. I got the translation matching the emulator step for step up to 6500 instructions (just shy of the first write to video). I haven't tried it on the GBA yet (hopefully tonight), but I did try it on another ARM platform, using gcc 3.2.2 and ARM ADS v1.2 (build 842). With all the dumps and prints taken out, so that it just emulates as fast as it can:
ADS
emulator:  2148562 ticks   77576 bytes
recompile: 1145697 ticks   49056 bytes
GCC
emulator:  2770723 ticks  115140 bytes
recompile:  930901 ticks  145188 bytes
YEAH! You read that right: not only did the recompile improve ROM execution by almost 3x with GCC (about 1.9x with ADS), but GCC beat ARM's own compiler... Who knows, when I try this again on the GBA the numbers might reverse...

Also note that GCC took a really long time to compile the retargeted code but didn't complain about anything. ADS, on the other hand, had nothing nice to say about the code (a few hundred warnings), yet it ran. I did the initial development with MSVC and only switched to these other compilers at the last second. Working in C and letting the compiler optimize is truly the right way to retarget/recompile your favorite old-school arcade game.



Downloads (Phase Two)

You need to get the Asteroids ROMs yourself. Take the download below, unzip the files into a directory with the Asteroids ROMs, and run patch.exe from a command prompt; this will build roids.bin and roids2.bin (emulated and translated, respectively).

Both versions use the following keyboard commands:
Left = rotate left
Right = rotate right
A = fire
B = thrust
Start = start
Left and right shoulder buttons at the same time = hyperspace

DUH! Compiling with Thumb makes it faster... the *t.bin files are the Thumb builds. roids.zip, updated 2002-01-31

A little history
I am trying to get my foot in the door for GBA development. I figured emulating something would demonstrate the ability to do low-level work, since I am a driver guy and not much of an artsy/creative type. Asteroids was my first choice, as it's still one of my favorite games; another advantage is that it uses vector graphics, which are scalable and not locked into a fixed pixel size. Other games, such as Galaga and Centipede, use more pixels than the GBA has (according to the MAME docs).

I started with the MAME source, found it quite unreadable, gave it a few days, and gave up. I did some more research and found that Neil Bradley wrote a program called EMU, which is the first emulator, or one of the first, ever written. It emulated Asteroids, of course. I couldn't find source for EMU or for any other Asteroids emulator, for that matter (I did beg someone for their source and got my hands on one). During this surfing, it sounded like Neil Bradley and some others were working on these translators for x86 (and Windows, I assume). I figured I could do it for the GBA, as I am quite intimate with the ARM instruction set, and here we are...
Links:
Mappy, a GBA emulator for Windows
VisualBoyAdvance, which can play at real-time speeds
Jeff's site, speaks for itself

gba AT dwelch DOT com