Atari Jaguar Assembler

by Roberto Nadal Martinez (swapd0@yahoo.es)

This package contains an assembler for the Atari Jaguar RISC processors and the Motorola 68000. It can be used alone or as the back end of my Jaguar C compiler project. I have tried to include all the features found in most assemblers; if an important feature is missing or buggy, e-mail me.

This two pass assembler handles

The following directives are understood by the assembler (all directives start with a dot):

Opcode / Description
 .org 
Sets the address where the program will run; this is also the upload address
 .68000 
Allow Motorola 68000 instructions (disallow GPU & DSP)
 .gpu 
Allow GPU instructions (disallow DSP & 68000)
 .dsp 
Allow DSP instructions (disallow GPU & 68000)
 .text 
Start text section
 .data 
Start data section
 .bss 
Start bss section
 .assert <expression> 
If expression is false an error message is shown
 .include <file> 
Includes a source file at current position
 .incbin <file> 
Includes a binary file at current position
 .link <file> 
Includes an object file and links it
 .link <address>, <file> 
Includes an object file and links it at <address>
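
As a rough sketch, the directives above might be combined like this in a GPU source file. The labels, the instructions and the address $f03000 are illustrative assumptions, not something prescribed by the assembler:

	.gpu				// allow GPU instructions only
	.org $f03000			// run/upload address (assumed: start of GPU RAM)
	.text
start	moveq #0,r0			// hypothetical label and code
	stop				// pseudo-opcode described in the GPU/DSP section
	.data
value	.dc.l $12345678			// hypothetical label and data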

Data definition

(Data definitions can be written without the dot prefix)

Opcode / Description
 .dc.b <n>, ... 
Define memory constants (byte size); a string is also allowed
 .dc.w <n>, ... 
Define memory constants (word size)
 .dc.l <n>, ... 
Define memory constants (longword size)
 .dcb.b <n>, <rep> 
Define a memory block of <rep> items (byte size), each filled with <n>
 .dcb.w <n>, <rep> 
Define a memory block of <rep> items (word size), each filled with <n>
 .dcb.l <n>, <rep> 
Define a memory block of <rep> items (longword size), each filled with <n>
 .ds.b <n> 
Define a storage block of <n> bytes (filled with zeros)
 .ds.w <n> 
Define a storage block of <n> words (filled with zeros)
 .ds.l <n> 
Define a storage block of <n> longwords (filled with zeros)
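
For instance, a few data definitions following the table above might look like this. The labels are hypothetical, and their placement in front of the directive is an assumption based on the other examples in this document:

message	.dc.b "Jaguar",0		// bytes from a string plus a terminating zero
coords	.dc.w 160, 120			// two words
magic	.dc.l $4a414755			// one longword
fill	.dcb.w $ffff, 8			// 8 words, each set to $ffff (<n> first, then <rep>)
buffer	.ds.l 64			// 64 longwords of zeros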

Data alignment

On the Jaguar, many items must be long, phrase or double-phrase aligned; use these directives to align your data. You can also omit the .align directive (thanks to ray//.tSCc.)

Opcode / Description
 .align .word 
Align to the next 2-byte boundary
 .align .long 
Align to the next 4-byte boundary
 .align .phrase 
Align to the next 8-byte boundary
 .align .dphrase 
Align to the next 16-byte boundary
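
A small sketch of aligning data after an odd-sized block (the labels and values are hypothetical):

flags	.dc.b 1, 2, 3			// 3 bytes, so the next address is probably not phrase aligned
	.align .phrase			// pad to the next 8-byte boundary
list	.dc.l 0, 0			// this data now starts on a phrase boundary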

Macros

Opcode / Description
 label .macro [parameters]
Begin a macro definition
 label .endmacro
label .endm
End a macro definition, the label must match the current macro
 <macroname> [parameters] 
Expands a macro
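
A minimal sketch of a macro without parameters follows. The macro name is hypothetical, and because the way parameters are referenced inside the body is not described here, the example does not use any:

pushr0	.macro				// begin the definition of "pushr0"
	subqt #4,r31			// make room on the stack (r31 is the stack pointer)
	store r0,(r31)			// save r0
pushr0	.endm				// the label matches the macro being defined

	pushr0				// expands to the two instructions above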

Labels and constants

Unlike in most assemblers, labels here do not have to be written in the first column. Use them to reference your data or code.

Opcode / Description
 label 
Defines a global symbol; you can access it anywhere in this module
 .label 
Defines a local symbol; it is tied to the last global label
 .label .equ <n> 
Defines a constant with value n
 .label = <n> 
Defines a constant with value n (thanks to ray//.tSCc)
 .label .equr <n> 
Defines a register equate; the register must be valid in the current section
 .label .reg <n> 
Defines a register equate; the register must be valid in the current section
 .unreg .label [, label]
Undefines a register label, or a list of labels
 .label .set <n> 
Binds the label to value n; it can be set to another value later

Structures

You can define structures (records, if you prefer) with:

Opcode / Description
 .rsreset 
Set structure counter to zero
 label .rs.b <n> 
Sets label to the current count and increments the counter by n bytes
 label .rs.w <n> 
Sets label to the current count and increments the counter by n words
 label .rs.l <n> 
Sets label to the current count and increments the counter by n longwords
 .rsset <n> 
Sets the counter to n
 label .rscount 
Sets label to the current count; used to calculate the length of a structure
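
As an illustration, a small object record could be laid out as follows. The field names are hypothetical, and the offsets in the comments assume the counter is kept in bytes:

	.rsreset			// counter = 0
obj_x	.rs.w 1				// obj_x = 0, counter becomes 2
obj_y	.rs.w 1				// obj_y = 2, counter becomes 4
obj_gfx	.rs.l 1				// obj_gfx = 4, counter becomes 8
obj_len	.rscount			// obj_len = 8, the length of the structure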

GPU/DSP

Some GPU/DSP opcodes have been added to make GPU/DSP coding easier. Some of the new opcodes smash r28 and r29; r29 is used as the return address and r31 as the stack pointer.

Pre-decrement/post-increment addressing is supported in loads and stores.

One example: load with post-increment, and store with pre-decrement

	load (rn)+,rm	or	store rm,-(rn)

the assembler will generate:

	load (rn),rm		subqt #4,rn
	addqt #4,rn		store rm,(rn)

This also works with byte, word and phrase loads/stores; the index register is incremented or decremented by the size of the access.
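
Following the same pattern, a byte load with post-increment and a word store with pre-decrement

	loadb (rn)+,rm	or	storew rm,-(rn)

would presumably expand to the code below. This is a sketch derived from the description above, not output captured from the assembler:

	loadb (rn),rm		subqt #2,rn
	addqt #1,rn		storew rm,(rn)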

Subroutine call/return:

One of the hardest problems of GPU/DSP coding is that there are no subroutine opcodes like bsr and rts. To fix this I have included new pseudo-opcodes. Most of the extended opcodes use r28 and r29 to save/restore some data, so take care.

Opcode / Description
 idiv rn, rm 
Generates an integer div adding this code (thanks to ray//.tSCc.):
	move rn,r29
	abs rn
	xor rm,r29
	abs rm
	abs r29
	jr CC,.\~pos
	div rn, rm
	neg rm
~pos:
 label .proc 
Defines a procedure. This directive must be used if you want to use the RISC scheduler; in 68000 code it has no effect.
 label .endproc 
Marks the end of a procedure
 rts 
Returns from a subroutine using the stack; this code is added:
	load (r31),r29
	jump t,(r29)
	addqt #4,r31
 fastrts 
Fast return, the assembler will add:
	jump t,(r29)
	nop
 call [condition], label 
Calls the subroutine at label if the condition is true (if the condition is omitted, an always-true condition is used); r28 and r29 are smashed:
	movei #label,r28
	move pc,r29
	subqt #4,r31
	addqt #10,r29
	jump jump_condition,(r28)
	store r29,(r31)
 call [condition], (rn) 
Calls the subroutine pointed to by rn if the condition is true (if the condition is omitted, an always-true condition is used); r28 and r29 are smashed:
	move pc,r29
	subqt #4,r31
	addqt #10,r29
	jump jump_condition,(rn)
	store r29,(r31)
 fastcall [condition], label 
Calls the fast subroutine at label (no stack is used) if the condition is true (if the condition is omitted, an always-true condition is used); r28 and r29 are smashed:
	movei #label,r28
	move pc,r29
	jump jump_condition,(r28)
	addqt #6,r29
 fastcall [condition], (rn) 
Calls the fast subroutine pointed to by rn (no stack is used) if the condition is true (if the condition is omitted, an always-true condition is used); r28 and r29 are smashed:
	move pc,r29
	jump jump_condition,(rn)
	addqt #6,r29
 stop 
Stop the GPU or DSP
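
A short sketch of how call and rts might be used together. The names stack_top and clrbuf are hypothetical, and r31 has to point to a valid stack before the first call:

	.gpu
	movei #stack_top,r31		// stack_top is a hypothetical symbol for the top of the stack
	call clrbuf			// condition omitted, so the call is always taken
	stop

clrbuf	moveq #0,r0			// hypothetical subroutine body
	rts				// pops the return address pushed by call

The #10 in the call expansion can be read as follows, assuming move pc,r29 yields the address of the move instruction itself: the move, subqt, addqt, jump and store instructions take 2 bytes each, so the instruction after the store in the delay slot is 10 bytes further on. In the fastcall expansion only move, jump and addqt follow, which gives the offset #6.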

GPU/DSP Optimizer

If GPU/DSP optimizations are enabled, the assembler will reorder the code to minimize pipeline stalls. A pipeline stall happens when there is a read-after-write (raw) or write-after-write (waw) dependency. For now it only works with straight-line code (no jumps). Some examples... well... one:

Original:

foo	.proc
	add r3,r0
	shrq #1,r0	=> stall raw (r0)
	add r0,r4	=> stall raw (r0)
	add r5,r1
	shrq #1,r1	=> stall raw (r1)
	add r1,r6	=> stall raw (r1)
foo	.endproc

Generated:

foo	add r3,r0
	add r5,r1
	shrq #1,r0
	shrq #1,r1
	add r0,r4
	add r1,r6

Do NOT forget the .proc/.endproc directives, or the code will NOT be optimized.

Numbers

You can write numbers in the following formats:

 666 
Decimal number
 $12be 
Hexadecimal number
 0x4e1f 
C/C++ like hexadecimal
 %110101 
Binary number

String constants can be written with single or double quotes:

	.dc.b 'Hello world!'
	.dc.b "Bye world!",0		// this is a null-terminated string
	.string "Hello again"		// generates a null-terminated string (like above)

Comments

Comments can be written as in C/C++ and as in most assemblers.

Opcode / Description
 * 
If found at the beginning of a line, the rest of the line is a comment
 ; 
Just like *
 /* 
Begin a C comment, can't be nested
 */ 
End of a C comment
 // 
C++ comment, spans until end of line
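
A short sketch mixing these comment styles with the documented number formats (the data values are arbitrary):

* a full-line comment
; another full-line comment
	.dc.w 666, $12be		// decimal and hexadecimal
	.dc.w 0x4e1f, %110101		/* C-like hexadecimal and binary */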

Special directives

Special directives (not found in most assemblers, but very useful)

Opcode / Description
 .sine <one>, <slices>
Creates a full-sine-wave (2*PI) table divided in <slices> slices, with values from -one.w to one.w
 .hsine <one>, <slices>
Creates a half-sine-wave (PI) table divided in <slices> slices, with values from -one.w to one.w
 .qsine <one>, <slices>
Creates a quarter-sine-wave (PI/2) table divided in <slices> slices, with values from -one.w to one.w
 .cosine <one>, <slices>
Creates a full-cosine-wave (2*PI) table divided in <slices> slices, with values from -one.w to one.w
 .hcosine <one>, <slices>
Creates a half-cosine-wave (PI) table divided in <slices> slices, with values from -one.w to one.w
 .qcosine <one>, <slices>
Creates a quarter-cosine-wave (PI/2) table divided in <slices> slices, with values from -one.w to one.w

Some examples

	.sine $7fff, 2048		// 0.15 fixed point, full sine wave in 2048 steps
	.hcosine $100, 256		// 8.8 fixed point, half cosine wave in 256 steps

Command line parameters

...

Parameter / Description
 -h<n>
Set header type for output file
0: No header (default)
1: Atari PRG/TOS header
2: Object header, generates a linkable file
 -l
List code to standard output.
 -o<file>
Set output file name.
 -d<symbol>
Define a symbol and include it in the symbol table; use this with the conditional assembly directive to build different versions.
 -s
Dump symbol table to <file.sym>
 -f<f|g>
Enable GPU/DSP optimizer

To Do List

Some features are still missing, and some of them are a must.

Credits & Thanks

All code has been developed under BeOS Personal Edition by swap d0. The Windows version is a straight compile using DevC++ 4.0. I tried to use Visual C++ 6.0, Visual Studio .NET 2003 and Borland Builder 6.0, but they are not 100% ANSI compatible and still have some missing features.

Special thanks must go to ray//.tSCc. for finding some bugs (only present in the Windows version!?!?), testing, and some new features.