Phase one: emulating Asteroids on the GBA. It doesn't support sound and it's slow, but it
seems to work well.
Thanks to Steve Green, author of Atari Vector Simulator for the Mac, who helped greatly with
the emulation, and to a university web site that I borrowed a generic line draw algorithm from.
This was very straightforward: I took out the code specific to I/O on the Mac version, replaced it
with GBA I/O, and that was it.
Phase two: translate 6502 instructions to equivalent sets of ARM instructions and run the result
as an ARM program. Like an emulator, but in ASM instead of C. Here is the idea: instead of running
a C program that reads a 6502 opcode at run time and then does something like:
static void ora_zp () // opcode 0x05
{
unsigned short savepc=get6502memoryFast(PC++);
unsigned char value=get6502memory(savepc);
A |= value;
DO_Z (A);
DO_N (A);
clockticks += 3;
}
The ROM data/instructions are pre-processed and "translated" into native ARM instructions:
ROM73BC ;05 ora_zp
MOV R0,#0x08
ORR R0,R0,#0x3000000
LDRSB R0,[R0]
ORRS R5,R5,R0
DONZA
ADD R10,R10,#3
BL CHECKTICKS
R5 was designated to be the 6502 register A, and R10 keeps track of clock ticks. DONZA is a macro
that updates a virtual flags register (P in the 6502, R8 in the translation).
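For reference, here is a minimal C sketch of the kind of flag update that DO_Z and DO_N (and DONZA in the ARM version) perform. The bit positions are the standard 6502 status register layout (Z is bit 1, N is bit 7). The names update_nz and ora are made up for illustration and are not taken from the emulator source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_Z 0x02u  /* zero flag: bit 1 of the 6502 status register P */
#define FLAG_N 0x80u  /* negative flag: bit 7 of P */

/* Return P with Z and N recomputed from an 8-bit result.
   Hypothetical helper; the emulator does this with DO_Z and DO_N. */
static uint8_t update_nz(uint8_t p, uint8_t value)
{
    p &= (uint8_t)~(FLAG_Z | FLAG_N);   /* clear both flags first */
    if (value == 0)
        p |= FLAG_Z;                    /* result was zero */
    if (value & 0x80u)
        p |= FLAG_N;                    /* bit 7 of the result is set */
    return p;
}

/* ORA: OR the operand into A, then refresh Z and N in *p.
   Returns the new value of A. */
static uint8_t ora(uint8_t *p, uint8_t a, uint8_t operand)
{
    assert(p != NULL);
    a |= operand;
    *p = update_nz(*p, a);
    return a;
}
```

With P = 0x23 and a result of 0x01 this gives P = 0x21, which matches the Z flag changing between the A 00 and A 01 lines in the register dumps further down.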
This was not a trivial project; it took about three weeks at a few hours a night. I would not
have made any progress without the emulator above. I started by writing a program that
generated a skeleton translator from the emulator. I took the opcode lookup table from the emulator,
which mapped opcodes to C functions, and the program built a huge switch statement with the
emulated C code for each opcode as a comment and a couple of common lines for each case:
case 0x05: //ora_zp
//static void ora_zp () // 0x05
//{
// unsigned short savepc=get6502memoryFast(PC++);
// unsigned char value=get6502memory(savepc);
// A |= value;
// DO_Z (A);
// DO_N (A);
//
// clockticks += 3;
//}
printf("ROM%04X ;05 ora_zp\n",0x6800+off);
printf(" ADD R10,R10,#1\n");
printf(" BL CHECKTICKS\n");
off+=1;
Then it was a simple matter of typing in the asm for each of the 256 opcodes... Simple, but very
time consuming and prone to errors.
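A rough sketch of what such a skeleton generator could look like is below. It only emits the case label and the placeholder printf lines; copying the emulator's C function in as a comment is left out to keep it short. The struct op_info table and the names emit_case and emit_skeleton are assumptions for illustration, not the program I actually wrote.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of a hypothetical opcode table; the real one came from the emulator. */
struct op_info {
    unsigned char opcode;   /* 6502 opcode byte, e.g. 0x05 */
    const char   *name;     /* emulator handler name, e.g. "ora_zp" */
};

/* Format the skeleton "case" for one opcode into buf and return what
   snprintf returns. The emitted printf lines are placeholders that are
   later replaced by hand with the real ARM assembly for that opcode. */
static int emit_case(char *buf, size_t size, const struct op_info *op)
{
    assert(buf != NULL && op != NULL);
    return snprintf(buf, size,
        "case 0x%02X: //%s\n"
        "printf(\"ROM%%04X ;%02X %s\\n\",0x6800+off);\n"
        "printf(\" ADD R10,R10,#1\\n\");\n"
        "printf(\" BL CHECKTICKS\\n\");\n"
        "off+=1;\n",
        (unsigned)op->opcode, op->name, (unsigned)op->opcode, op->name);
}

/* Emit one case per table row. If buf fills up, the partial case is
   dropped and generation stops. Returns the length of the generated text. */
static size_t emit_skeleton(char *buf, size_t size,
                            const struct op_info *table, size_t count)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        size_t used = strlen(buf);
        int n = emit_case(buf + used, size - used, &table[i]);
        if (n < 0 || (size_t)n >= size - used) {
            buf[used] = '\0';   /* drop the truncated case */
            break;
        }
    }
    return strlen(buf);
}
```

Writing into a buffer rather than straight to stdout is only there to make the sketch easy to check; printing directly works just as well.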
Once a significant number of instructions were coded I was able to start "running" the translated
program. I found the easiest way to debug it was to run the emulator and have it print out
the opcode and all the registers, clock ticks, etc.:
6E80: D0 A 00 X 00 Y 01 P 23 00D2 02
6E82: AD A 00 X 00 Y 01 P 23 00D6 02
6E85: 10 A 00 X 00 Y 01 P 23 00D9 02
6ED7: 60 A 00 X 00 Y 01 P 23 00DF 02
6858: 20 A 00 X 00 Y 01 P 23 00E5 02
703F: A5 A 01 X 00 Y 01 P 21 00E8 02
7041: F0 A 01 X 00 Y 01 P 21 00EB 02
7043: AD A 01 X 00 Y 01 P 21 00EF 02
7046: 30 A 01 X 00 Y 01 P 21 00F2 02
7048: AD A 00 X 00 Y 01 P 23 00F6 02
Then I added code to the translator that would print out the same stuff for each instruction
executed (I printed to the console on Mappy), and then
compared instruction for instruction between the emulated and translated code. Getting through
the first 6000 instructions took a few days; getting from the 6000th instruction to the 20000th
took a few minutes. Within a few more days I had it running thousands, then millions, of instructions
without an error. I had to do things like "on the 50000th instruction press the fire button and
release it on instruction 55000" so that both the emulation and translation performed the same
keystrokes at the same time. Today I reached the point where I just played it. And so far
so good. If I find any significant problems from here, my plan is to keep a log of all the keystrokes,
then go back and have the emulation and translation run side by side using that list of keystrokes.
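The two ideas above, scripted input keyed to the instruction count and an instruction-by-instruction comparison, can be sketched in C as follows. This is only an illustration under my own assumptions: the struct layouts and the names buttons_at and first_divergence are made up, and in practice I compared printed dumps rather than in-memory arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Snapshot of the CPU state for one instruction (illustrative layout). */
struct cpu_state {
    uint16_t pc;
    uint8_t  opcode, a, x, y, p;
    uint32_t ticks;
};

/* Scripted input: the button is held from instruction `press` up to,
   but not including, instruction `release`. */
struct key_event {
    unsigned long press, release;
    unsigned      button_mask;
};

/* Buttons held at a given instruction count. Feeding both the emulator
   and the translation from this keeps their input identical. */
static unsigned buttons_at(const struct key_event *ev, size_t n,
                           unsigned long icount)
{
    unsigned held = 0;
    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (icount >= ev[i].press && icount < ev[i].release)
            held |= ev[i].button_mask;
    return held;
}

static int same_state(const struct cpu_state *l, const struct cpu_state *r)
{
    return l->pc == r->pc && l->opcode == r->opcode &&
           l->a == r->a && l->x == r->x && l->y == r->y &&
           l->p == r->p && l->ticks == r->ticks;
}

/* Index of the first instruction where the two traces differ,
   or n if they agree for all n instructions. */
static size_t first_divergence(const struct cpu_state *emu,
                               const struct cpu_state *xlat, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!same_state(&emu[i], &xlat[i]))
            return i;
    return n;
}
```

The first mismatching index is the useful part: everything before it is known good, so the bug is in whatever opcode executed at that point.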
I am a bit disappointed in the performance. I think the killer was the vector graphics: it appears
that the vector graphics hardware worked off an instruction set as well (jump to subroutine, draw line,
return from subroutine, etc.). My guess is that a particular rock is a subroutine which calls several line
draw subroutines, then it jumps to the next rock, and the ship, and the score, etc. After looking
at the emulated C code for this I figured there was no way I wanted to translate that to ASM, so
I simply linked in the C code from the emulator. Ideally you would want to figure out this engine
and, instead of drawing each of the lines for each of the rocks, use what we know from playing the game:
there is a limited number of rock shapes and they don't rotate, a perfect application for a tile or
sprite. The ship itself does rotate; I would use a rotated sprite for that case. I think this would
get this game up to full speed. The translated version is about 75% faster than the emulated version.
The big question: would speeding up the graphics by using sprites make the emulated version run
at full speed? Was the translation anything more than a programming exercise?
Note: As you would expect, not all of the bytes in the ROMs are opcodes; some of them are data read
by the program. I found a couple of these memory locations early on and, as a quick fix, made the
entire ROM available to the translated program. I did not go back and isolate the data from the
instructions (ideally only the data would be present and the opcodes would not be available at
run time).
Phase three: reverse engineer the program and write a C program that performs the same task.
An initial path might be similar to phase two above: instead of generating asm instructions, generate
C instructions (hell, just use the code from the emulator). Then visually go through this,
simplify it and find functional boundaries. Eventually you should see the program checking the
keyboard, rotating the ship as a result, etc.
2003-08-13 WOW! I can't believe how close I was, yet so far. I was recently contacted by someone
about the staticrecompilers group at Yahoo. In a nutshell, the group is dedicated to the same thing
I am doing here. I did NOT follow the staticrecompilers HOWTO, I did my own thing:
- 1) gotta have a working emulator...that you can modify...
- 2) modify the emulator such that you can mark rom addresses of valid opcodes as it is running.
(note you need to build this list in such a way that you can add to it later).
- 3) build the bulk of your translator, run through the rom and ONLY build translated code for the
opcodes in your opcode list from step 2.
- 4) have the emulator and translator run x cycles, compare registers on a cycle for cycle basis.
- 5) tweak, tweak, tweak. You will find bugs in your translation, you will find addresses that are
not listed in your opcode list and will have to add those, some of those new ones will be branches
to other opcodes not in the list...
An example:
Clearly I have a working emulator, so I went ahead and used it again. I already had it modified to dump
registers; I added some code to mark ROM addresses as it executed. I let it run N instruction cycles
and dumped the hitlist.
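Marking addresses can be as small as the sketch below. It assumes a hit list indexed by 6502 address, which is how the translator loop that follows uses it. The names mark_hit, merge_hits and count_hits are made up here, and merging is just one way of building the list so that you can add to it later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 0x10000u            /* full 16-bit 6502 address space */

static uint8_t hitlist[MEM_SIZE];    /* nonzero = an opcode was fetched here */

/* Call from the emulator's fetch loop, just before the opcode at pc runs. */
static void mark_hit(uint16_t pc)
{
    hitlist[pc] = 1;
}

/* Fold in a hit list saved from an earlier run, so coverage only grows. */
static void merge_hits(const uint8_t *saved, size_t n)
{
    assert(saved != NULL);
    for (size_t i = 0; i < n && i < MEM_SIZE; i++)
        if (saved[i])
            hitlist[i] = 1;
}

/* Number of distinct addresses seen as opcodes so far. */
static size_t count_hits(void)
{
    size_t total = 0;
    for (size_t i = 0; i < MEM_SIZE; i++)
        if (hitlist[i])
            total++;
    return total;
}
```

Only the first byte of each instruction gets marked, so operand bytes and data stay at zero and the translator skips them.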
I essentially took the exact same skeleton switch statement I had built for the asm version above:
for(opadd=ROMSTART;opadd < ROMSIZE;opadd++)
{
if(hitlist[opadd]==0) continue;
printf("L_%04X: //%02X %02X %02X\n",opadd,rom[opadd],rom[opadd+1],rom[opadd+2]);
opcode=rom[opadd];
switch(opcode)
{
case 0x05: //ora_zp
//static void ora_zp () // 0x05
//{
// unsigned short savepc=get6502memoryFast(PC++);
// unsigned char value=get6502memory(savepc);
// A |= value;
// DO_Z (A);
// DO_N (A);
//
// clockticks += 3;
//}
printf(" crash(0x%04X,0x%02X);\n",opadd,opcode); break;
}
}
I added this crash(opadd,opcode) line for each instruction; when executing the translated code this
will tell me what opcode is not yet implemented and where to go to work on it.
I literally started running right away, with no opcodes implemented, and implemented them as I went along.
For the most part you just use the emulator's code as is:
case 0x05: //ora_zp
//static void ora_zp () // 0x05
//{
// unsigned short savepc=get6502memoryFast(PC++);
// unsigned char value=get6502memory(savepc);
// A |= value;
// DO_Z (A);
// DO_N (A);
//
// clockticks += 3;
//}
printf(" A |= ReadMemory(0x%04X);\n",rom[opadd+1]);
printf(" DO_Z(A);\n");
printf(" DO_N(A);\n");
printf(" clockticks += 3;\n");
printf(" showsystem(0x%04X,0x%02X);\n",opadd,opcode);
So this would build code that looks like this:
L_77C8: //05 60 D0
A |= ReadMemory(0x0060);
DO_Z (A);
DO_N (A);
clockticks += 3;
showsystem(0x77C8,0x05);
showsystem() generated the register/system dump on an instruction by instruction basis; it also monitored
the NMI for this 6502 (another story).
Borrow heavily from the emulator... Keep going.
The first big hurdle was the NMI. On a real 6502 you would save the return address and flags on the stack
and call the handler, which would use the stack info to return. Well, we are not running like that; we are using
hardcoded goto labels. What I did was pull out a second list of opcodes that were used for the NMI handler
and build a second translator (in essence), so that the NMI handler is a call to a second translated
bit of C code.
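In outline, that arrangement looks something like the sketch below. The period constant is a placeholder and not the real Asteroids NMI rate, and nmi_handler here only counts calls where the real one would contain the second translated body. The name check_nmi is also made up.

```c
#include <assert.h>
#include <stdint.h>

#define NMI_PERIOD 6000u      /* placeholder cycle count, NOT the real rate */

static uint32_t clockticks;   /* cycles accumulated since the last NMI */
static unsigned nmi_count;    /* how many times the handler has run */

/* Stands in for the second translated body built from the NMI hit list. */
static void nmi_handler(void)
{
    nmi_count++;
}

/* Called after every translated instruction. Because the handler is an
   ordinary C call, the C return takes care of getting back to the main
   translated code in this sketch. */
static void check_nmi(void)
{
    if (clockticks >= NMI_PERIOD) {
        clockticks -= NMI_PERIOD;
        nmi_handler();
        /* Holds as long as this runs after every instruction, since one
           instruction adds only a handful of cycles. */
        assert(clockticks < NMI_PERIOD);
    }
}
```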
The second big hurdle was returning from subroutine calls, and this could still use more work. I first
tried to have the translator keep track, but it didn't work out. I went ahead and fully implemented the code
that puts the program counter on the stack; when I pulled the PC off the stack for a return it fell into
an if-then-else:
if(ret==0x6806) goto L_6806;
if(ret==0x6809) goto L_6809;
if(ret==0x680C) goto L_680C;
if(ret==0x71A0) goto L_71A0;
Crude but it worked.
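Here is a tiny self-contained sketch of the same idea, using a switch where the translated code uses a chain of ifs. The addresses and the two-call "program" are invented for illustration. For simplicity the sketch pushes the address of the label to resume at; a real 6502 JSR pushes that address minus one and RTS adds the one back.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t stack_page[0x100];   /* the 6502 stack page, 0x0100-0x01FF */
static uint8_t S = 0xFF;            /* 6502 stack pointer */

static void push16(uint16_t v)
{
    stack_page[S--] = (uint8_t)(v >> 8);     /* high byte first */
    stack_page[S--] = (uint8_t)(v & 0xFF);
}

static uint16_t pull16(void)
{
    uint16_t lo = stack_page[++S];
    uint16_t hi = stack_page[++S];
    return (uint16_t)((hi << 8) | lo);
}

/* A made-up "translated program": two JSRs to the same subroutine.
   Returns how many times the subroutine ran, or -1 on an unknown
   return address. */
static int run(void)
{
    int calls = 0;
    uint16_t ret;

    push16(0x6806); goto L_7000;     /* JSR $7000, resume at L_6806 */
L_6806:
    push16(0x6809); goto L_7000;     /* JSR $7000, resume at L_6809 */
L_6809:
    assert(S == 0xFF);               /* every push was matched by a pull */
    return calls;

L_7000:
    calls++;                         /* body of the subroutine */
    ret = pull16();                  /* RTS: pull the return address */
    switch (ret) {                   /* dispatch to the matching label */
    case 0x6806: goto L_6806;
    case 0x6809: goto L_6809;
    default:     return -1;
    }
}
```

The dispatch needs one entry for every address that follows a JSR in the hit list, so the translator can generate it along with the rest of the code.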
Note: There was never a need for a disassembler or a disassembled listing of the ROM. The key is the
emulator; if it is working you don't need anything else.
Now for the BIG news. I got the translation matching the emulator step for step up to 6500
instructions (just shy of the first write to video). I did not try it on the GBA yet (hopefully tonight),
but I did try it on another ARM platform. I used gcc 3.2.2 and ARM ADS v1.2 (build 842). With all the
dumps and prints taken out, so that it just emulates as fast as it is going to go:
ADS
  emulator:  2148562 ticks   77576 bytes
  recompile: 1145697 ticks   49056 bytes
GCC
  emulator:  2770723 ticks  115140 bytes
  recompile:  930901 ticks  145188 bytes
YEAH! You read that right: not only did the recompile improve the ROM execution by almost 3x under GCC
(about 1.9x under ADS), but GCC beat ARM's own compiler... Who knows, when I try this again on the GBA
the numbers might reverse...
Also note that GCC took a really long time to compile the retargeted code, but didn't complain about anything.
ADS, on the other hand, had nothing nice to say about the code: a few hundred warnings, yet it ran. I
did the initial development with MSVC and only at the last second switched to these other compilers.
Working in C and letting the compiler optimize is truly the right way to retarget/recompile your favorite
old school arcade game.
Downloads (Phase Two)
You need to get the Asteroids ROMs yourself. Take the download below, unzip the files into a directory
with the Asteroids ROMs, and run patch.exe from a command prompt; this will build roids.bin and
roids2.bin (emulated and translated).
Both versions use the following keyboard commands:
- Left = rotate left
- Right = rotate right
- A = fire
- B = thrust
- Start = start
- left and right shoulder at the same time = hyperspace
DUH!, compiling with thumb makes it faster...the *t.bin files are thumb
roids.zip, updated 2002-01-31
A little history
I am trying to get my foot in the door for GBA development. I figured emulating something would
demonstrate the ability to do low level stuff, since I am a driver guy and not much of an
artsy/creative type. Asteroids was my first choice as it's still one of my favorite games;
another advantage is that it uses vector graphics, which are scalable and not locked into a fixed pixel
size. Other games like Galaga and Centipede use more pixels than the GBA has (according
to MAME docs).
I started with the MAME source, found it quite unreadable, gave it a few days and gave up. I did some
more research and found that Neil Bradley wrote a program called EMU, which is the first, or one of the
first, emulators written. It emulated Asteroids, of course. I couldn't find source for EMU or
for any other Asteroids emulator for that matter (I did beg someone for their source and got
my hands on one). During this surfing it sounded like Neil Bradley and some others were working
on these translators, for the x86 and Windows I assume. I figured I could do it for the GBA,
as I am quite intimate with the ARM instruction set, and here we are...
Links:
Mappy, a GBA emulator for windows
VisualBoyAdvance, can play at realtime speeds with this one
Jeff's site, speaks for itself
gba AT dwelch DOT com