Opcode | Description |
.org | Sets the address where the program will run; this is also the upload address |
.68000 | Allow Motorola 68000 instructions (disallow GPU & DSP) |
.gpu | Allow GPU instructions (disallow DSP & 68000) |
.dsp | Allow DSP instructions (disallow GPU & 68000) |
.text | Start text section |
.data | Start data section |
.bss | Start bss section |
.assert <expression> | If expression is false an error message is shown |
.include <file> | Includes a source file at current position |
.incbin <file> | Includes a binary file at current position |
.link <file> | Includes an object file and links it |
.link <address>, <file> | Includes an object file and links it at <address> |
(Data definition can be written without colon prefix)
Opcode | Description |
.dc.b <n>, ... | Define memory constants (byte size), a string is also allowed |
.dc.w <n>, ... | Define memory constants (word size) |
.dc.l <n>, ... | Define memory constants (longword size) |
.dcb.b <n>, <rep> | Define memory block filled by n (byte size) with <rep> items |
.dcb.w <n>, <rep> | Define memory block filled by n (word size) with <rep> items |
.dcb.l <n>, <rep> | Define memory block filled by n (longword size) with <rep> items |
.ds.b <n> | Define a storage block of <n> bytes (filled with zeros) |
.ds.w <n> | Define a storage block of <n> words (filled with zeros) |
.ds.l <n> | Define a storage block of <n> longwords (filled with zeros) |
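As a rough illustration of what the data directives above place in memory, the following Python sketch models them as byte strings. It assumes big-endian byte order (as used by the 68000) and takes <n> in .dcb as the fill value and <rep> as the item count, as the table states. The helper names dc, dcb and ds are hypothetical and not part of the assembler.

```python
import struct

# Hypothetical helpers that model the directives; they are not part of the assembler.
_FMT = {"b": ">B", "w": ">H", "l": ">I"}   # big-endian byte, word, longword
_SIZE = {"b": 1, "w": 2, "l": 4}           # item size in bytes


def dc(size, *values):
    """Model .dc.<size>: emit each constant in order (negative values wrap)."""
    mask = 256 ** _SIZE[size] - 1
    return b"".join(struct.pack(_FMT[size], v & mask) for v in values)


def dcb(size, n, rep):
    """Model .dcb.<size> <n>, <rep>: <rep> items, each holding the value n."""
    return dc(size, n) * rep


def ds(size, n):
    """Model .ds.<size> <n>: n zero-filled items."""
    return bytes(_SIZE[size] * n)


print(dc("w", 0x1234, 0xBEEF).hex())   # two words, high byte first
print(dcb("b", 0xFF, 3).hex())         # three bytes of 0xFF
print(len(ds("l", 2)))                 # two longwords = 8 bytes
```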
On the Jaguar many items must be long, phrase or double phrase aligned; use these directives to align your data. You can also omit the .align directive (thanks to ray//.tSCc.)
Opcode | Description |
.align .word | align to next 2 byte boundary |
.align .long | align to next 4 byte boundary |
.align .phrase | align to next 8 byte boundary |
.align .dphrase | align to next 16 byte boundary |
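The boundaries in the table can be read as a round-up of the current address. The Python sketch below shows that arithmetic; it assumes an address that is already on the boundary stays where it is, which the table does not state explicitly. The function name align is hypothetical.

```python
# Alignment sizes in bytes, taken from the table above.
ALIGNMENTS = {".word": 2, ".long": 4, ".phrase": 8, ".dphrase": 16}


def align(address, kind):
    """Round address up to the boundary named by kind.

    Assumption: an address already on the boundary is left unchanged.
    """
    size = ALIGNMENTS[kind]
    return (address + size - 1) & ~(size - 1)


for kind in ALIGNMENTS:
    print(kind, hex(align(0x4001, kind)))
```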
Opcode | Description |
label .macro [parameters] | Begin a macro definition |
label .endmacro | End a macro definition, the label must match the current macro |
<macroname> [parameters] | Expands a macro |
Unlike in most assemblers, labels here do not have to be written in the first column. Use them to reference your data or code.
Opcode | Description |
label | Defines a global symbol, accessible throughout this module |
.label | Defines a local symbol, tied to the last global label |
.label .equ <n> | Defines a constant with value n |
.label = <n> | Defines a constant with value n (thanks to ray//.tSCc) |
.label .equr <n> | Defines a register equate; the register must be valid in the current section |
.label .reg <n> | Defines a register equate; the register must be valid in the current section |
.unreg .label [, label] | Undefines a register label, or a list of labels |
.label .set <n> | Binds the label to value n; it can later be set to another value |
You can define structures (records if you prefer) with:
Opcode | Description |
.rsreset | Set structure counter to zero |
label .rs.b <n> | Set label to the current count and increment the counter by n bytes |
label .rs.w <n> | Set label to the current count and increment the counter by n words |
label .rs.l <n> | Set label to the current count and increment the counter by n longwords |
.rsset <n> | Set counter to n |
label .rscount | Set label to the current count, used to calculate a structure's length |
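To make the counter behaviour concrete, here is a small Python model of the structure directives as the table describes them. The class RsCounter and the field names (x, y, gfx, flags, sprite_size) are hypothetical and used only for illustration.

```python
class RsCounter:
    """Hypothetical model of the .rs* structure counter."""

    SIZES = {"b": 1, "w": 2, "l": 4}   # bytes per item

    def __init__(self):
        self.count = 0
        self.symbols = {}

    def rsreset(self):
        self.count = 0                 # .rsreset

    def rsset(self, n):
        self.count = n                 # .rsset <n>

    def rs(self, label, size, n):
        # label .rs.<size> <n>: bind label, then advance by n items
        self.symbols[label] = self.count
        self.count += self.SIZES[size] * n

    def rscount(self, label):
        self.symbols[label] = self.count   # label .rscount


sprite = RsCounter()
sprite.rsreset()
sprite.rs("x", "w", 1)        # x      .rs.w 1
sprite.rs("y", "w", 1)        # y      .rs.w 1
sprite.rs("gfx", "l", 1)      # gfx    .rs.l 1
sprite.rs("flags", "b", 2)    # flags  .rs.b 2
sprite.rscount("sprite_size") # sprite_size .rscount
print(sprite.symbols)
```

Each label receives the offset of its field, and the last label holds the total size of the structure in bytes.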
Some GPU/DSP opcodes have been added to make GPU/DSP coding easier. Some of the new opcodes smash r28 and r29; r29 is used as the return address and r31 as the stack pointer.
One example: load with post-increment and store with pre-decrement. For
load (rn)+,rm or store rm,-(rn)
the assembler will generate:
Original | Generated |
load (rn)+,rm | load (rn),rm |
 | addqt #4,rn |
store rm,-(rn) | subqt #4,rn |
 | store rm,(rn) |
This also works with byte, word and phrase loads/stores; the index register is incremented or decremented by the matching size.
One of the hardest problems of GPU/DSP coding is that there are no subroutine opcodes like bsr and rts. To fix this I have included new pseudo-opcodes. Most of the extended opcodes use r28 and r29 to save/restore some data, so take care.
Opcode | Description |
idiv rn, rm | Generates an integer div adding this code (thanks to ray//.tSCc.):
    move rn,r29
    abs rn
    xor rm,r29
    abs rm
    abs r29
    jr CC,.\~pos
    div rn, rm
    neg rm
~pos: |
label .proc | Defines a procedure. This directive must be used if you want to use the RISC scheduler; in 68000 code it has no effect. |
label .endproc | Marks the end of a procedure |
rts | Return from a subroutine using the stack; this code is added:
    load (r31),r29
    jump t,(r29)
    addqt #4,r31 |
fastrts | Fast return; the assembler will add:
    jump t,(r29)
    nop |
call [condition], label | Calls the subroutine at label if condition is true (if the condition is missing, a true condition is used); r28 and r29 are smashed:
    movei #label,r28
    move pc,r29
    subqt #4,r31
    addqt #10,r29
    jump jump_condition,(r28)
    store r29,(r31) |
call [condition], (rn) | Calls the subroutine pointed to by rn if condition is true (if the condition is missing, a true condition is used); r28 and r29 are smashed:
    move pc,r29
    subqt #4,r31
    addqt #10,r29
    jump jump_condition,(rn)
    store r29,(r31) |
fastcall [condition], label | Calls the fast subroutine at label (no stack is used) if condition is true (if the condition is missing, a true condition is used); r28 and r29 are smashed:
    movei #label,r28
    move pc,r29
    jump jump_condition,(r28)
    addqt #6,r29 |
fastcall [condition], (rn) | Calls the fast subroutine pointed to by rn (no stack is used) if condition is true (if the condition is missing, a true condition is used); r28 and r29 are smashed:
    move pc,r29
    jump jump_condition,(rn)
    addqt #6,r29 |
stop | Stop the GPU or DSP |
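The #10 and #6 constants in the call and fastcall expansions can be checked with a little arithmetic. The Python sketch below rests on two assumptions that are not stated in this document: every RISC instruction is 2 bytes long except movei, which carries a 32-bit immediate and is 6 bytes, and move pc,r29 reads the address of the move instruction itself. The instruction following a jump is taken to execute before the jump takes effect. All names in the code are hypothetical.

```python
def size(op):
    """Assumed instruction size in bytes: movei is 6, everything else 2."""
    return 6 if op == "movei" else 2


def return_offset(sequence):
    """Distance in bytes from 'move pc' to the first instruction after the sequence."""
    start = sequence.index("move pc")
    return sum(size(op) for op in sequence[start:])


CALL = ["movei", "move pc", "subqt", "addqt", "jump", "store"]
FASTCALL = ["movei", "move pc", "jump", "addqt"]

print(return_offset(CALL))      # compare with addqt #10,r29
print(return_offset(FASTCALL))  # compare with addqt #6,r29
```

Under those assumptions, r29 ends up pointing at the instruction right after the expansion, which is where rts or fastrts should resume.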
If GPU/DSP optimizations are enabled, the assembler will reorder the code to minimize pipeline stalls. A pipeline stall happens when you have a read after write (raw) or write after write (waw) dependency. For now it only works with straight-line code (no jumps). Some examples... well... one:
Original | Generated |
foo .proc | foo |
add r3,r0 | add r3,r0 |
shrq #1,r0 => stall raw (r0) | add r5,r1 |
add r0,r4 => stall raw (r0) | shrq #1,r0 |
add r5,r1 | shrq #1,r1 |
shrq #1,r1 => stall raw (r1) | add r0,r4 |
add r1,r6 => stall raw (r1) | add r1,r6 |
foo .endproc | |
Do NOT forget the .proc/.endproc directives, or the code will NOT be optimized.
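The stall annotations in the example can be reproduced with a very simplified model. The Python sketch below counts a raw stall whenever an instruction reads the register written by the instruction immediately before it. This is only an assumed approximation of the real pipeline and is not the assembler's actual scheduler; it also handles only two-operand instructions such as add and shrq, which both read and write their destination register.

```python
def parse(line):
    """Split 'op src,dst' into its three parts."""
    op, operands = line.split(None, 1)
    src, dst = [part.strip() for part in operands.split(",")]
    return op, src, dst


def count_raw_stalls(lines):
    """Count instructions that read the register written just before them."""
    stalls = 0
    prev_dst = None
    for line in lines:
        _, src, dst = parse(line)
        reads = {dst}                    # destination is read and written
        if not src.startswith("#"):      # immediates are not registers
            reads.add(src)
        if prev_dst in reads:
            stalls += 1
        prev_dst = dst
    return stalls


original = ["add r3,r0", "shrq #1,r0", "add r0,r4",
            "add r5,r1", "shrq #1,r1", "add r1,r6"]
generated = ["add r3,r0", "add r5,r1", "shrq #1,r0",
             "shrq #1,r1", "add r0,r4", "add r1,r6"]

print(count_raw_stalls(original), count_raw_stalls(generated))
```

Interleaving the two independent chains (r0 and r1) is what removes the dependencies between adjacent instructions.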
You can write numbers in the following formats:
666 | Decimal number |
$12be | Hexadecimal number |
0x4e1f | C/C++ like hexadecimal |
%110101 | Binary number |
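For reference, the four notations map to values as in this Python sketch. The function parse_number is a hypothetical illustration of the prefixes listed above, not the assembler's own parser.

```python
def parse_number(text):
    """Convert a literal in one of the four documented notations to an int."""
    text = text.strip()
    if text.startswith("$"):       # $12be   hexadecimal
        return int(text[1:], 16)
    if text.startswith("0x"):      # 0x4e1f  C/C++ style hexadecimal
        return int(text[2:], 16)
    if text.startswith("%"):       # %110101 binary
        return int(text[1:], 2)
    return int(text, 10)           # 666     decimal


for literal in ("666", "$12be", "0x4e1f", "%110101"):
    print(literal, "=", parse_number(literal))
```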
Opcode | Description |
* | If found at the beginning of a line, the rest of the line is a comment |
; | Just like * |
/* | Begin a C comment, can't be nested |
*/ | End of a C comment |
// | C++ comment, spans until end of line |
Special directives (not found on most assemblers, but very useful)
Opcode | Description |
.sine <one>, <slices> | Creates a full-sine-wave (2*PI) table divided in <slices> slices, with values from -one.w to one.w |
.hsine <one>, <slices> | Creates a half-sine-wave (PI) table divided in <slices> slices, with values from -one.w to one.w |
.qsine <one>, <slices> | Creates a quarter-sine-wave (PI/2) table divided in <slices> slices, with values from -one.w to one.w |
.cosine <one>, <slices> | Creates a full-cosine-wave (2*PI) table divided in <slices> slices, with values from -one.w to one.w |
.hcosine <one>, <slices> | Creates a half-cosine-wave (PI) table divided in <slices> slices, with values from -one.w to one.w |
.qcosine <one>, <slices> | Creates a quarter-cosine-wave (PI/2) table divided in <slices> slices, with values from -one.w to one.w |
Some examples
.sine $7fff, 2048 // 0.15 fixed point, full sine wave in 2048 steps
.hcosine $100, 256 // 8.8 fixed point, half cosine wave in 256 steps
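To give an idea of the tables these directives might produce, the Python sketch below generates comparable data. It assumes that slice i is sampled at angle span * i / slices (end point excluded) and that values are rounded to the nearest integer; the document does not specify either detail, so the assembler's actual output may differ. The function name sine_table is hypothetical.

```python
import math


def sine_table(one, slices, span=2 * math.pi, func=math.sin):
    """Build a table of `slices` values scaled so that 1.0 maps to `one`.

    Assumptions: sample i is taken at span * i / slices and rounded to nearest.
    """
    return [round(one * func(span * i / slices)) for i in range(slices)]


full = sine_table(0x7FFF, 2048)                                  # like .sine $7fff, 2048
half_cos = sine_table(0x100, 256, span=math.pi, func=math.cos)   # like .hcosine $100, 256

print(full[0], full[512], full[1024], full[1536])
print(half_cos[0], half_cos[128], half_cos[255])
```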
...
Parameter | Description |
-h<n> | Set header type for output file:
    0: No header (default)
    1: Atari PRG/TOS header
    2: Object header, generates a linkable file |
-l | List code to standard output. |
-o<file> | Set output file name. |
-d<symbol> | Define a symbol and include it in the symbol table; use this and the conditional assembly directive to build different versions. |
-s | Dump symbol table to <file.sym> |
-f<f|g> | Enable GPU/DSP optimizer |
Some features are still missing, and some of them are a must.
All code has been developed under BeOS Personal Edition by swap d0. The Windows version is a straight compile using DevC++ 4.0. I tried to use Visual C++ 6.0, Visual Studio .NET 2003 and Borland Builder 6.0, but they are not 100% ANSI compatible and still have some missing features.
Special thanks must go to ray//.tSCc. for finding some bugs (only present on the Windows version!?!?), testing, and some new features.