Last modified on 20 January 2015, at 14:39

Super NES Programming/Super FX tutorial

When/if this tutorial is finished, it will show how to use the Super FX chip, which is included in 8 released games, most notably Star Fox, Doom, and Yoshi's Island.

Introduction

As the Super FX chip is itself a processor, programming for it is done in Super FX assembly language. The Super FX chip has 4 RAM banks of 64 kbit each, for a total of 256 kbit (32 kilobytes) of RAM.

Existing titles

The SuperFX chip was used in 8 released SNES games, in the unreleased Starfox 2, and in multiple tech demos. Binaries are available for 2 of the demos.

Title | SuperFX Version | ROM Size | Work RAM Size | Save RAM Size
Starfox/Starwing | Mario Chip | 8 Mbit | 256 kbit | None
Dirt Racer | GSU-1 | 4 Mbit | 256 kbit | None
Dirt Trax FX | GSU-1 | 4 Mbit | 512 kbit | None
Stunt Race FX | GSU-1 | 8 Mbit | 512 kbit | 64 kbit
Starfox 2 | GSU-1 | 8 Mbit | 512 kbit | 64 kbit
Vortex | GSU-1 | 4 Mbit | 256 kbit | None
Voxel (demo) | GSU-1 | 3 Mbit | 512 kbit | None
Powerslide (demo) | GSU-1 | 3 Mbit | 512 kbit | None
DOOM | GSU-2 | 16 Mbit | 512 kbit | None
Yoshi's Island | GSU-2-SP1 | 16 Mbit | 256 kbit | 64 kbit
Winter Gold | GSU-2 | 16 Mbit | 512 kbit | 64 kbit

Theory of Operation

The SuperFX is a co-processor for the SNES CPU. Its task is to execute complex mathematical calculations much faster than the SNES CPU can, and to generate bitmap pictures for the simple 3D rendering used in SuperFX games. The SuperFX and SNES processors share access to a common work RAM and game pak ROM bus. Only one of the two may access the game pak ROM or RAM at any time, and ownership is controlled by special registers. Deciding when each processor owns the buses is a large part of optimizing a program's efficiency.

The RAM inside the SuperFX cart is different from battery backup RAM. It can be used for storing results of calculations, for storing a SuperFX program, for storing bulk data, or for storing a PLOT picture the SuperFX is generating. The cartridge carries 256 kbit or 512 kbit (32 KB or 64 KB) of this RAM, depending on the title.

The SuperFX can fetch instructions in 3 ways: from game pak RAM, from the game pak ROM (reading straight out of the ROM chip), or from a special 512-byte instruction cache.

It is possible for the SuperFX to run in parallel with the SNES CPU when using the 512 byte instruction Cache. It involves loading a program in, and then setting the SuperFX to start its work. The 512 byte cache is in general 3x faster compared to running the program in the game pak RAM or ROM. The SuperFX can interrupt the SNES CPU after it finishes processing.

When using the special bitmap functions of the SuperFX, it is possible to quickly load the bitmap out of the game pak into the SNES video RAM and display it on the screen. The SNES is by default a tile- and sprite-based console, so the pixel-based scene construction used in 3D-rendered games is very inefficient on stock SNES hardware. In SuperFX games such as DOOM and Starwing, the SuperFX rapidly paints pixel-based scene bitmaps into the game pak RAM, and each bitmap is then copied into the SNES VRAM for display, many times per second.

Registers

The SuperFX registers are mapped from 0x3000 to 0x32FF. Some are 16-bit; some are 8-bit. The explanation of each register is shown in this section.

Overview

The Super FX chip has 16 general-purpose registers labeled R0 to R15. They are each 16 bits.

Register | Address | Description | Access from SNES
R0 | 3000 | default source/destination register | R/W
R1 | 3002 | pixel plot X position register | R/W
R2 | 3004 | pixel plot Y position register | R/W
R3 | 3006 | for general use | R/W
R4 | 3008 | lower 16 bit result of lmult | R/W
R5 | 300a | for general use | R/W
R6 | 300c | multiplier for fmult and lmult | R/W
R7 | 300e | fixed point texel X position for merge | R/W
R8 | 3010 | fixed point texel Y position for merge | R/W
R9 | 3012 | for general use | R/W
R10 | 3014 | for general use | R/W
R11 | 3016 | return address set by link | R/W
R12 | 3018 | loop counter | R/W
R13 | 301a | loop point address | R/W
R14 | 301c | ROM address for getb, getbh, getbl, getbs | R/W
R15 | 301e | program counter | R/W
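
The address column follows a regular pattern: each 16-bit register takes two bytes, starting at 0x3000. As a minimal sketch of that pattern in Python (the helper name `gsu_reg_address` is made up for illustration and is not part of any toolchain):

```python
GSU_REG_BASE = 0x3000  # SNES-side address of R0, taken from the table above


def gsu_reg_address(n):
    """Return the SNES-side address of general-purpose register Rn (hypothetical helper)."""
    if not 0 <= n <= 15:
        raise ValueError("register index must be in the range 0..15")
    # Each register is 16 bits wide, so consecutive registers are 2 bytes apart.
    return GSU_REG_BASE + 2 * n


for n in (0, 4, 15):
    print("R%d is at 0x%04x" % (n, gsu_reg_address(n)))
```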

There are also plenty of other internal registers:

Name | Address | Description | Size | Access from SNES
SFR | 3030 | status flag register | 16 bits | R/W
- | 3032 | unused | - | -
BRAMR | 3033 | backup RAM register | 8 bits | W
PBR | 3034 | program bank register | 8 bits | R/W
- | 3035 | unused | - | -
ROMBR | 3036 | ROM bank register | 8 bits | R
CFGR | 3037 | control flags register | 8 bits | W
SCBR | 3038 | screen base register | 8 bits | W
CLSR | 3039 | clock speed register | 8 bits | W
SCMR | 303a | screen mode register | 8 bits | W
VCR | 303b | version code register | 8 bits | R
RAMBR | 303c | RAM bank register | 8 bits | R
- | 303d | unused | - | -
CBR | 303e | cache base register | 16 bits | R

SFR Status Flag Register

The SFR is a very important register. It controls branching within the SuperFX after evaluating a calculation and can determine the status of the SuperFX when accessed from the SNES CPU.

Bit Description
0 -
1 Z Zero flag
2 CY Carry flag
3 S Sign flag
4 OV Overflow flag
5 G Go flag (set to 1 when the GSU is running)
6 R Set to 1 when reading ROM using R14 address
7 -
8 ALT1 Mode set-up flag for the next instruction
9 ALT2 Mode set-up flag for the next instruction
10 IL Immediate lower 8-bit flag
11 IH Immediate higher 8-bit flag
12 B Set to 1 when the WITH instruction is executed
13 -
14 -
15 IRQ Set to 1 when the GSU has caused an interrupt. Cleared to 0 when read by the 65C816 (the SNES CPU)
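
To illustrate how the SNES side might interpret a value read from the SFR, here is a small Python sketch that turns a 16-bit value into flag names using the bit positions from the table above. The function names are illustrative assumptions, and the sketch only models the bit layout, not the hardware read itself.

```python
# Bit positions of the named SFR flags, copied from the table above.
SFR_FLAGS = {
    1: "Z", 2: "CY", 3: "S", 4: "OV", 5: "G", 6: "R",
    8: "ALT1", 9: "ALT2", 10: "IL", 11: "IH", 12: "B", 15: "IRQ",
}


def decode_sfr(value):
    """Return the names of all flags set in a 16-bit SFR value, lowest bit first."""
    return [name for bit, name in sorted(SFR_FLAGS.items()) if value & (1 << bit)]


def gsu_is_running(value):
    """The Go flag (bit 5) is 1 while the GSU is running."""
    return bool(value & (1 << 5))


print(decode_sfr(0x0022))  # bits 1 and 5 are set: ['Z', 'G']
```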


BRAMR Backup RAM Register

Protects the save RAM inside the game pak against unintended writes. This should normally be set to 0 (write disable), and set to 1 (write enable) only when saving the game.

Bit Description
0 BRAM Flag (0 = write disable, 1=write enable)
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 Not Used
6 Not Used
7 Not Used

PBR Program Bank Register

When the SuperFX is loading code it references the PBR register to specify the bank being used. The LJMP instruction is the general method used to change this register.

Bit Description
0 A16 Address Select
1 A17 Address Select
2 A18 Address Select
3 A19 Address Select
4 A20 Address Select
5 A21 Address Select
6 A22 Address Select
7 A23 Address Select

ROMBR Game Pak ROM Bank Register

When using the ROM buffering system, this register specifies the bank of the game pak ROM being copied into the buffer. The ROMB instruction is the general method used to change this register.

Bit Description
0 A16 ROM Address Select
1 A17 ROM Address Select
2 A18 ROM Address Select
3 A19 ROM Address Select
4 A20 ROM Address Select
5 A21 ROM Address Select
6 A22 ROM Address Select
7 A23 ROM Address Select


CFGR Config Register

Controls the clock multiplier and interrupt mask.

Bit Description
0 Not used
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 MS0 (0=standard,1=high speed)
6 Not Used
7 IRQ (0=normal, 1=masked)

Note: If the chip is set to run at 21.4 MHz through the CLSR register (CLSR flag = 1), the MS0 flag should be set to 0.
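
The bit layout and the note above can be sketched in Python as follows. This is only an illustration under the assumptions stated in the comments: the function names are hypothetical, and the CLSR bit 0 meaning (1 selects 21.4 MHz) is taken from the CLSR section of this page.

```python
MS0_BIT = 5  # CFGR bit 5: multiplier speed (0 = standard, 1 = high speed)
IRQ_BIT = 7  # CFGR bit 7: interrupt mask (0 = normal, 1 = masked)


def encode_cfgr(high_speed_multiply, mask_irq):
    """Build a CFGR byte from the two documented flags (hypothetical helper)."""
    return (int(bool(high_speed_multiply)) << MS0_BIT) | (int(bool(mask_irq)) << IRQ_BIT)


def clock_config_is_valid(cfgr, clsr):
    """Check the rule from the note: at 21.4 MHz (CLSR bit 0 = 1), MS0 should be 0."""
    fast_clock = clsr & 1
    ms0 = (cfgr >> MS0_BIT) & 1
    return not (fast_clock and ms0)


print(hex(encode_cfgr(high_speed_multiply=True, mask_irq=True)))
```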

SCBR Screen Base Register

This register sets the starting address of the graphics storage area. It is written to directly, rather than through a specific instruction.

Bit Description
0 A10 Screen Base Select
1 A11 Screen Base Select
2 A12 Screen Base Select
3 A13 Screen Base Select
4 A14 Screen Base Select
5 A15 Screen Base Select
6 A16 Screen Base Select
7 A17 Screen Base Select
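
Because bit 0 of SCBR drives address line A10, the register value is effectively the screen base address divided by 1024. A minimal Python sketch of that relationship, with hypothetical helper names, assuming the address is an offset into game pak RAM:

```python
SCBR_SHIFT = 10  # SCBR bit 0 corresponds to address line A10


def screen_base_offset(scbr):
    """Return the graphics storage start offset selected by an 8-bit SCBR value."""
    return (scbr & 0xFF) << SCBR_SHIFT


def scbr_for_offset(offset):
    """Return the SCBR value for an offset, which must be a multiple of 1024."""
    if offset % (1 << SCBR_SHIFT) != 0:
        raise ValueError("screen base must be aligned to 1024 bytes")
    scbr = offset >> SCBR_SHIFT
    if not 0 <= scbr <= 0xFF:
        raise ValueError("offset does not fit in address lines A10..A17")
    return scbr


print(hex(screen_base_offset(0x08)))
```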

CLSR Clock Register

Controls the clock frequency of the Super FX chip.

Bit Description
0 CLSR, 0=10.7 MHz, 1=21.4 MHz
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 Not used
6 Not Used
7 Not used

SCMR Screen Mode Register

This register sets the number of colors and screen height for the PLOT graphics acceleration routine and additionally controls whether the Super FX or SNES has control of the game pak ROM and work RAM.

Bit Description
0 Color Mode MD0
1 Color Mode MD1
2 Screen Height HT0
3 Game pak Work RAM Access - RAN (0=SNES,1=SuperFX)
4 Game pak ROM Access - RON (0=SNES,1=SuperFX)
5 Screen Height HT1
6 Not used
7 Not used

Screen Height Truth Table

HT1 HT0 Mode
0 0 128 pixels
0 1 160 pixels
1 0 192 pixels
1 1 OBJ Mode

Color Mode Truth Table

MD1 MD0 Mode
0 0 4 colors
0 1 16 colors
1 0 Not used
1 1 256 colors
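
Note that the two screen height bits are not adjacent: HT0 is bit 2 and HT1 is bit 5. The following Python sketch packs an SCMR value from the two truth tables above. The function and dictionary names are illustrative assumptions.

```python
# Two-bit codes (MD1 MD0) and (HT1 HT0) from the truth tables above.
COLOR_MODES = {4: 0b00, 16: 0b01, 256: 0b11}
SCREEN_HEIGHTS = {128: 0b00, 160: 0b01, 192: 0b10, "obj": 0b11}


def encode_scmr(colors, height, ran=False, ron=False):
    """Pack color mode, screen height and bus ownership flags into an SCMR byte."""
    md = COLOR_MODES[colors]
    ht = SCREEN_HEIGHTS[height]
    value = md                         # MD0 and MD1 occupy bits 0 and 1
    value |= (ht & 1) << 2             # HT0 is bit 2
    value |= int(bool(ran)) << 3       # RAN: 1 gives the SuperFX the work RAM
    value |= int(bool(ron)) << 4       # RON: 1 gives the SuperFX the ROM
    value |= ((ht >> 1) & 1) << 5      # HT1 is bit 5
    return value


# 16 colors, 192 pixels high, SuperFX owns both RAM and ROM.
print(hex(encode_scmr(16, 192, ran=True, ron=True)))
```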

VCR Version Register

This read-only register reports the version of the SuperFX chip in use.

Bit Description
0 VC0
1 VC1
2 VC2
3 VC3
4 VC4
5 VC5
6 VC6
7 VC7

RAMBR Game Pak RAM Bank Register

When transferring data between the game pak work RAM and the Super FX registers, this register specifies the bank of the game pak RAM being used. The RAMB instruction is the general method used to change this register. Only one bit is used, selecting RAM bank 0x70 or 0x71.

Bit Description
0 A16 (0x70 when 0, 0x71 when 1)
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 Not Used
6 Not Used
7 Not Used

CBR Cache Base Register

This register specifies the address, in either the game pak ROM or the work RAM, from which data will be loaded into the cache. Both the LJMP and CACHE instructions change this register.

Bit Description
0 - (0 when read always)
1 - (0 when read always)
2 - (0 when read always)
3 - (0 when read always)
4 A4
5 A5
6 A6
7 A7
8 A8
9 A9
10 A10
11 A11
12 A12
13 A13
14 A14
15 A15


Memory Map

From Super NES CPU point of view

Super FX Interface: Mapped to 0x3000 to 0x32FF, in banks 0x00-0x3F and 0x80-0xBF
Game ROM: 2 Megabytes mapped at 0x8000-0xFFFF in banks 0x00-0x3F, stored in 32KB blocks. Mirror mapped from bank 0x40:0x0000 onwards.
Game Work RAM: Mapped to 128KB starting from bank 0x70:0x0000. An 8KB window is also mapped from 0x6000 in each of banks 0x00-0x3F. The RAM mirror is in banks 0x80-0xBF.
Game Save RAM: Mapped to 128KB from bank 0x78:0x0000
SNES CPU ROM: 6 MB ROM is mapped from bank 0x80:0x8000
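
As a rough illustration of the SNES-side map, the following Python sketch classifies a bank and offset pair into the register and RAM regions listed above. It deliberately leaves the ROM regions out and reports them as "other". The function name and the region labels are assumptions made for this example, and the 128KB regions are taken to span two 64KB banks each.

```python
def classify_snes_address(bank, offset):
    """Classify a SNES CPU address into a Super FX cartridge region (sketch only)."""
    # Banks 0x80-0xBF mirror banks 0x00-0x3F for the interface and RAM window.
    low_bank = bank & 0x7F
    if low_bank <= 0x3F:
        if 0x3000 <= offset <= 0x32FF:
            return "superfx_interface"
        if 0x6000 <= offset <= 0x7FFF:
            return "work_ram_window"  # 8KB view of the game work RAM
    if bank in (0x70, 0x71):
        return "work_ram"             # 128KB = two 64KB banks
    if bank in (0x78, 0x79):
        return "save_ram"             # 128KB = two 64KB banks
    return "other"                    # ROM and everything else: not modeled here


print(classify_snes_address(0x00, 0x3030))  # the SFR lives in the interface area
```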

From Super FX point of view

Game ROM: Mapped to 2 Megabytes from 0x0000-0x8000. 2 Megabyte mirror mapped from bank 0x40:0x0000 onwards, stored in 32KB blocks. Other memory locations viewable from the SNES should not be addressed.
Game Work RAM: Mapped to 128KB starting from Bank 0x70:0x0000.
Note: The Super FX accesses memory through three bank control registers: the Program Bank Register (PBR), the ROM Bank Register (ROMBR) and the RAM Bank Register (RAMBR).

Instruction Set

The SuperFX instruction set is distinct from the Super Nintendo's native instruction set. It allows faster, more sophisticated 16-bit mathematical functions and includes some specific graphics manipulation functions.

Some instructions can be assembled as a single byte, in which the instruction (one nibble) and its argument (one nibble) are combined into the same storage byte. This allows for faster execution and greater instruction density, both important objectives when designing a co-processor. One such instruction is "add", whose opcode nibble is 0x5 and whose argument is one of the 16 general purpose SuperFX registers (0x0-0xF). "adc" shares the same opcode nibble but needs an ALT1 prefix byte, as described next.
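
The nibble packing can be shown with a few lines of Python. The opcode nibbles below are copied from rows of the instruction set table on this page that need no ALT prefix and take a register argument. The function name is a hypothetical one chosen for this sketch, which is not a full assembler.

```python
# Opcode nibbles for unprefixed, single-byte instructions with a register
# argument, as listed in the instruction set table on this page.
NIBBLE_OPCODES = {"to": 0x1, "with": 0x2, "add": 0x5, "sub": 0x6, "from": 0xB}


def encode_nibble_op(mnemonic, reg):
    """Pack an opcode nibble and a register number (0..15) into one byte."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be in the range 0..15")
    return (NIBBLE_OPCODES[mnemonic] << 4) | reg


print(hex(encode_nibble_op("add", 3)))  # high nibble 0x5, low nibble 0x3
```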

Quite a few instructions require an "ALT" instruction to be executed before the opcode. This modifies the behavior of the same opcode to perform a slightly different operation. There are 3 possible ALT codes - ALT1(0x3D), ALT2(0x3E) and ALT1+ALT2(0x3F). In the table below, the specific ALT code is listed for each instruction.
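
As an example of the prefix scheme, the four instructions that share opcode nibble 0x5 differ only in their ALT prefix. The Python sketch below encodes them using the ALT codes listed in the table. The names `encode_add_family` and `VARIANTS_0X5` are assumptions made for this illustration.

```python
ALT_PREFIX = {None: b"", "alt1": b"\x3d", "alt2": b"\x3e", "alt3": b"\x3f"}

# (mnemonic, operand kind) -> ALT mode, from the table rows with CODE 0x5.
VARIANTS_0X5 = {
    ("add", "reg"): None,    # add Rn
    ("adc", "reg"): "alt1",  # adc Rn
    ("add", "imm"): "alt2",  # add #n
    ("adc", "imm"): "alt3",  # adc #n
}


def encode_add_family(mnemonic, kind, operand):
    """Return the byte sequence for add/adc with a register or 4-bit immediate."""
    if not 0 <= operand <= 15:
        raise ValueError("operand must fit in one nibble (0..15)")
    alt = VARIANTS_0X5[(mnemonic, kind)]
    return ALT_PREFIX[alt] + bytes([0x50 | operand])


print(encode_add_family("adc", "reg", 2).hex())  # prefix byte then opcode byte
```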

Most instructions rely on pre-defined pointers for the locations of calculation variables. These pointers are set by the FROM, TO and WITH instructions. FROM specifies the general purpose register that holds the variable, and TO specifies the register that receives the calculation result. WITH sets both the variable and the result to the same register in one command. The variable and result registers are known as the source and destination registers respectively.
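
The effect of these prefixes can be modeled with a toy Python class. This is a simplified sketch and not an emulator: it supports only `add`, ignores flags, and assumes that the source and destination selections fall back to R0 (the default source/destination register) after each instruction. The class and method names are made up for this example.

```python
class TinyGSU:
    """Toy model of source/destination register selection (illustration only)."""

    def __init__(self):
        self.r = [0] * 16  # R0..R15
        self.sreg = 0      # source register index, default R0
        self.dreg = 0      # destination register index, default R0

    def from_(self, n):
        self.sreg = n      # FROM sets the source register

    def to(self, n):
        self.dreg = n      # TO sets the destination register

    def with_(self, n):
        self.sreg = n      # WITH sets both at once
        self.dreg = n

    def add(self, n):
        # destination = source + Rn, truncated to 16 bits
        self.r[self.dreg] = (self.r[self.sreg] + self.r[n]) & 0xFFFF
        # Assumption: the selection applies to one instruction, then resets to R0.
        self.sreg = 0
        self.dreg = 0


gsu = TinyGSU()
gsu.r[1], gsu.r[3] = 5, 7
gsu.from_(1)
gsu.to(2)
gsu.add(3)       # R2 = R1 + R3
print(gsu.r[2])  # 12
```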

Instruction Set Table

Instruction Description ALT(Hex) CODE(Hex) ARG Length(B) B ALT1 ALT2 O/V S CY Z ROM RAM Cache Classification Note
adc Add with carry 3D 0x5 Rn 2 0 0 0 * * * * 6 6 2 Arithmetic Operation Instructions
adc Add with carry 3F 0x5 #n 2 0 0 0 * * * * 6 6 2 Arithmetic Operation Instructions
add Add None 0x5 Rn 1 0 0 0 * * * * 3 3 1 Arithmetic Operation Instructions
add Add 3E 0x5 #n 2 0 0 0 * * * * 6 6 2 Arithmetic Operation Instructions
alt1 Set ALT1 mode None 0x3d / 1 / 1 / / / / / 3 3 1 Prefix Flag Instructions
alt2 Set ALT2 mode None 0x3e / 1 / / 1 / / / / 3 3 1 Prefix Flag Instructions
alt3 Set ALT3 mode None 0x3f / 1 / 1 1 / / / / 3 3 1 Prefix Flag Instructions
and Logical AND None 0x7 Rn 1 0 0 0 / * / * 3 3 1 Logical Operation Instructions
and Logical AND 3E 0x7 #n 2 0 0 0 / * / * 6 6 2 Logical Operation Instructions
asr Arithmetic Shift Right None 0x96 / 1 0 0 0 / * * * 3 3 1 Shift Instructions
bcc Branch on carry clear None 0x0c e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bcs Branch on carry set None 0x0d e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
beq Branch on equal None 0x09 e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bge Branch on greater than or equal to zero None 0x07 e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bic Bit clear mask 3D 0x7 Rn 2 0 0 0 / * / * 6 6 2 Logical Operation Instructions
bic Bit clear mask 3F 0x7 #n 2 0 0 0 / * / * 6 6 2 Logical Operation Instructions
blt Branch on less than zero None 0x06 e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bmi Branch on minus None 0x0b e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bne Branch on not equal None 0x08 e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bpl Branch on plus None 0x0a e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bra Branch always None 0x05 e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bvc Branch on overflow clear None 0x0e e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
bvs Branch on overflow set None 0x0f e 2 / / / / / / / 6 6 2 "Jump, Branch and Loop Instructions"
cache Set cache base register None 0x02 / 1 0 0 0 / / / / 3-4 3-4 1 GSU Control Instructions
cmode Set Plot mode 3D 0x4e / 2 0 0 0 / / / / 6 6 2 Plot/related instructions
cmp Compare 3F 0x6 Rn 2 0 0 0 * * * * 6 6 2 Arithmetic Operation Instructions
color Set plot color None 0x4e / 1 0 0 0 / / / / 3 3 1 Plot/related instructions
dec Decrement None 0xe Rn 1 0 0 0 / * / * 3 3 1 Arithmetic Operation Instructions
div2 Divide by 2 3D 0x96 / 2 0 0 0 / * * * 6 6 2 Arithmetic Operation Instructions
fmult Fractional signed multiply None 0x9f / 1 0 0 0 / * * * 11 or 7 11 or 7 8 or 4 Arithmetic Operation Instructions Cycles Depends on CFGR Register
from Set Sreg None 0xb Rn 1 / / / / / / / 3 3 1 Prefix Register Instructions
getb Get byte from ROM buffer None 0xef / 1 0 0 0 / / / / 3-8 3-8 1-6 Data Transfer From game pak ROM to register Cycles varies due to ROM buffer
getbh Get high byte from ROM buffer 3D 0xef / 2 0 0 0 / / / / 6-10 6-9 2-6 Data Transfer From game pak ROM to register Cycles varies due to ROM buffer
getbl Get low byte from ROM buffer 3E 0xef / 2 0 0 0 / / / / 6-10 6-9 2-6 Data Transfer From game pak ROM to register Cycles varies due to ROM buffer
getbs Get signed byte from ROM buffer 3F 0xef / 2 0 0 0 / / / / 6-10 6-9 2-6 Data Transfer From game pak ROM to register Cycles varies due to ROM buffer
getc Get byte from ROM to color register None 0xdf / 1 0 0 0 / / / / 3-10 3-9 1-6 Data Transfer From game pak ROM to register Cycles varies due to ROM buffer
hib Value of high byte of register None 0xc0 / 1 0 0 0 / * / * 3 3 1 Byte transfer Instructions
ibt Load immediate byte data None 0xa "Rn, #pp" 2 0 0 0 / / / / 6 6 2 Data Transfer / Immediate data to register
inc Increment None 0xd Rn 1 0 0 0 / * / * 3 3 1 Arithmetic Operation Instructions
iwt Load immediate word data None 0xf "Rn, #xx" 3 0 0 0 / / / / 9 9 3 Data Transfer / Immediate data to register
jmp Jump None 0x9 Rn 1 0 0 0 / / / / 3 3 1 "Jump, Branch and Loop Instructions"
ldb Load byte data from RAM 3D 0x4 Rm 1 0 0 0 / / / / 11 13 6 Data Transfer From game pak RAM to register
ldw Load word data from RAM None 0x4 Rm 1 0 0 0 / / / / 10 12 7 Data Transfer From game pak RAM to register
lea Load effective address None 0xf "Rn, xx" 3 0 0 0 / / / / 9 9 3 Macro Instructions
link Link Return Address None 0x9 #n 1 0 0 0 / / / / 3 3 1 "Jump, Branch and Loop Instructions"
ljmp Long jump 3D 0x9 Rn 2 0 0 0 / / / / 6 6 2 "Jump, Branch and Loop Instructions"
lm "Load word data from RAM, using 16 bits" 3D 0xf "Rn, (xx)" 2 0 0 0 / / / / 20 21 11 Data Transfer From game pak RAM to register
lms "Load word data from RAM, short address" 3D 0xa "Rn, (yy)" 2 0 0 0 / / / / 17 17 10 Data Transfer From game pak RAM to register
lmult 16x16 signed multiply 3D 0x9f / 2 0 0 0 / * * * 10 or 14 10 or 14 5 or 9 Arithmetic Operation Instructions Cycles Depends on CFGR Register
lob Value of low byte of register None 0x9e / 1 0 0 0 / * / * 3 3 1 Byte transfer Instructions
loop Loop None 0x3c / 1 0 0 0 / * / * 3 3 1 "Jump, Branch and Loop Instructions"
lsr Logical shift right None 0x03 / 1 0 0 0 / 0 * * 3 3 1 Shift Instructions
merge Merge high byte of R8 and R7 None 0x70 / 1 0 0 0 / / / / 6 6 2 Byte transfer Instructions
move Move word data from Rn' to Rn None 0x2n1n' "Rn, Rn'" 2 0 0 0 / / / / 6 6 2 Data transfer register to register
moves Move word data from Rn' to Rn and set flags None 0x2nBn' "Rn, Rn'" 2 0 0 0 / / / / 6 6 2 Data transfer register to register
mult Signed multiply None 0x8 Rn 1 0 0 0 / * / * 3 or 5 3 or 5 1 or 2 Arithmetic Operation Instructions Cycles Depends on CFGR Register
mult Signed multiply 3E 0x8 #n 2 0 0 0 / * / * 6 or 8 6 or 8 2 or 3 Arithmetic Operation Instructions Cycles Depends on CFGR Register
nop No operation None 0x01 / 1 0 0 0 / / / / 3 3 1 GSU Control Instructions
not Invert all bits None 0x4f / 1 0 0 0 / / / / 3 3 1 Logical Operation Instructions
or Logical OR None 0xc Rn 1 0 0 0 / / / / 3 3 1 Logical Operation Instructions
or Logical OR 3E 0xc #n 2 0 0 0 / / / / 6 6 2 Logical Operation Instructions
plot Plot pixel None 0x4c / 1 0 0 0 / / / / 3-48 3-51 1-48 Plot/related instructions Cycles varies due to RAM buffer and program
ramb Set RAM data bank 3E 0xdf / 2 0 0 0 / / / / 6 6 2 Bank Set/up Instructions
rol Rotate left through carry None 0x04 / 1 0 0 0 / * * * 3 3 1 Shift Instructions
romb Set ROM Data bank 3F 0xdf / 2 0 0 0 / / / / 6 6 2 Bank Set/up Instructions
ror Rotate right through carry None 0x97 / 1 0 0 0 / * * * 3 3 1 Shift Instructions
rpix Read pixel color 3D 0x4c / 2 0 0 0 / * / * 24-80 24-78 20-74 Plot/related instructions
sbc Subtract with carry 3D 0x6 Rn 2 0 0 0 * * * * 6 6 2 Arithmetic Operation Instructions
sbk "Store word data, last RAM address used" None 0x9 / 1 0 0 0 / / / / 3-8 7-11 1-6 Data Transfer From register to game pak RAM
sex Sign extend register None 0x95 / 1 0 0 0 / * / * 3 3 1 Byte transfer Instructions
sm Store word data to RAM using 16 bits 3E 0xf "Rn, (xx)" 3 0 0 0 / / / / 12-17 16-20 4-9 Data Transfer From register to game pak RAM Cycles varies due to RAM buffer and program
sms "Store word data to RAM , short address" 3E 0xa "Rn, (yy)" 3 0 0 0 / / / / 9-14 13-17 3-8 Data Transfer From register to game pak RAM Cycles varies due to RAM buffer and program
stb Store byte data to RAM 3D 0x3 Rm 2 0 0 0 / / / / 6-9 8-14 2-5 Data Transfer From register to game pak RAM Cycles varies due to RAM buffer and program
stop Stop processor None 0x00 / 1 0 0 0 / / / / 3 3 1 GSU Control Instructions
stw Store word data to RAM None 0x3 Rm 1 0 0 0 / / / / 3-8 7-11 1-6 Data Transfer From register to game pak RAM Cycles varies due to RAM buffer and program
sub Subtract None 0x6 Rn 1 0 0 0 * * * * 3 3 1 Arithmetic Operation Instructions
sub Subtract 3E 0x6 #n 2 0 0 0 * * * * 6 6 2 Arithmetic Operation Instructions
swap Swap low and high byte None 0x4d / 1 0 0 0 / * / * 3 3 1 Byte transfer Instructions
to Set Dreg None 0x1 Rn 1 / / / / / / / 3 3 1 Prefix Register Instructions
umult Unsigned multiply 3D 0x8 Rn 2 0 0 0 / * / * 6 or 8 6 or 8 2 or 3 Arithmetic Operation Instructions Cycles Depends on CFGR Register
umult Unsigned multiply 3F 0x8 #n 2 0 0 0 / * / * 6 or 8 6 or 8 2 or 3 Arithmetic Operation Instructions ?
with Set Sreg and Dreg None 0x2 "Rn, ?" ? ? ? ? ? ? ? ? ? ? ? Prefix Register Instructions ?
xor Logical Exclusive Or 3D 0xc Rn 2 ? ? ? ? ? ? ? ? ? ? Logical Operation Instructions ?
xor Logical Exclusive Or 3F 0xc #n 2 ? ? ? ? ? ? ? ? ? ? Logical Operation Instructions ?

Special Functions

The SuperFX processor contains two special functions, Bitmap Emulation and Fast Multiply. The Bitmap Emulation mode is a special kind of pixel plotter which makes it quick and easy to send graphics generated by the SuperFX to the SNES PPU. The Fast Multiply function accelerates the vector and matrix calculations needed for the pseudo-3D worlds seen in SuperFX games.

Bitmap Emulation

TODO

Fast Multiply

TODO