In addition to tracking down how the fireball works in SMB3, dwangoAC has also asked me to give my interpretation of how the bug works in the recent SMB3 TAS run. This is all of my analysis after having spent many hours playing with it in a debugger.
If you haven't seen the SGDQ video of it yet, it's pretty game-breaking. You beat the game in about a second:
Like before, I used a combination of Binary Ninja (for sifting through disassembly) and FCEUX (for emulating and debugging) to do my analysis. I was also recently made aware of Southbird's amazing complete disassembly for SMB3 and used his work to supplement and check my own. I really wish I'd known about this before doing a lot of this analysis...but, such is life.
In order to replicate the glitch, you'll also need this lua script and this input file. You'll need to load the lua script in FCEUX by doing File -> New Lua Script Window... and then browsing to it. The inputs file needs to be in the same directory as the lua script to work. These files are not mine - they're the work of total_, who originally proved this bug could work.
The Bug
The input processing routine is located at $FEBE. It uses the function at $FF12 twice to poll the second controller (Y = 1 at $FEBE):
0F:FEBE: A001 LDY #$01 ; set controller to index 1 (controller 2)
0F:FEC0: 2012FF JSR $FF12 ; call $FF12 to poll the controller
0F:FEC3: A500 LDA $0000 ; load byte at $0000 ($FF12 return value) into A
0F:FEC5: 48 PHA ; push byte in A onto the stack
0F:FEC6: 2012FF JSR $FF12 ; call $FF12 to poll the controller again
0F:FEC9: 68 PLA ; pull byte from the stack back into A
0F:FECA: C500 CMP $0000 ; compare $0000 and A (the two poll results)
0F:FECC: D0F5 BNE $FEC3 ; if they don't match, poll again
; NOTE: each subsequent loop through this basic
; block will compare the new value with the
; previous and the oldest value will be tossed
Why do we poll the controller twice, you ask? It turns out there's a hardware problem with the NES. Specifically, use of DPCM audio during a read of the joypad inputs can cause "phantom" button presses. As the NESDev wiki's errata page notes, a common way to get around this is to poll for input twice. If you get two matching reads, you can be sure those were the correct buttons pressed by the player.
So, the authors of SMB3 poll the controller twice in an attempt to get the user's "real" input. Under normal circumstances, this works fine. Unfortunately, when you're able to press buttons at inhuman speeds, it falls apart: If we can cause each poll's result to never match the previous, this loop will never exit - it'll continue indefinitely. Or, rather, it would continue indefinitely if interrupts weren't a thing.
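To make the failure mode concrete, here is a minimal Python sketch of that read-until-two-match loop. It is a model, not the game's code: `debounced_read`, `make_poll`, and the `max_polls` cap are names and a safety limit invented for this illustration (the real loop has no cap), and the button bytes are arbitrary.

```python
from itertools import cycle

def debounced_read(poll, max_polls=1000):
    """Model of the loop at $FEC0-$FECC: poll until two consecutive reads match.

    Returns (value, polls_used), or (None, polls_used) if max_polls is reached.
    max_polls exists only so this sketch terminates; the real loop has no cap.
    """
    previous = poll()
    polls = 1
    while polls < max_polls:
        current = poll()
        polls += 1
        if current == previous:
            return current, polls
        previous = current  # the oldest reading is tossed
    return None, polls

def make_poll(readings):
    """Hypothetical controller that replays the given readings forever."""
    source = cycle(readings)
    return lambda: next(source)

steady = debounced_read(make_poll([0x80]))                  # button held steadily
one_glitch = debounced_read(make_poll([0x00, 0x80, 0x80]))  # one bad first read
mashing = debounced_read(make_poll([0x80, 0x00]))           # changes on every poll

print(steady, one_glitch, mashing)
```

With a steady input the loop exits after two polls, a single bad read costs one extra poll, and an input that changes on every poll never produces a match, so only the sketch's cap stops it.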
An interrupt is when the hardware stops executing the current code and switches to executing something else. There are many reasons why this can happen (like drawing the screen) and they happen pretty often on the NES. The NES supports two types of interrupts: maskable and non-maskable. A maskable interrupt (IRQ) is one the CPU can be told to ignore: while the interrupt-disable flag is set (usually with the sei instruction), IRQs are held off. A non-maskable interrupt (NMI) can't be blocked that way; it is serviced regardless of the flag.
The input processing routine at $FEBE is triggered by an interrupt. This means there's a hardware timer that will stop whatever code is executing and cause the input processing routine to be executed instead. The interrupts that trigger input processing are non-maskable. On entry to any interrupt handler, the CPU sets the interrupt-disable flag, so maskable interrupts can't fire while the handler runs. But, after they've done their other jobs, the handlers clear that flag with a cli instruction before calling the input processing routine! This means that the input processing can be interrupted by a maskable interrupt (you can see this at $F55A, $F60C, $F6F5, and $F779 on ROM bank 31).
So, rather than execute indefinitely in an infinite loop, our interrupt is, itself, interrupted. This can cause interesting things to happen like the SGDQ 2016 speed run I showed above.
The Exploit
Now that we understand the bug, let's look at how it can actually get us to the end credits sequence of the game.
When the game is started, we don't make any weird inputs for a little while. The game, at this point, runs normally. Then, on frame 33, we trigger the bug by mashing inputs at inhuman speed. This causes the input processing routine at $FEBE to call the input polling function at $FF12 over and over and over again while trying, in vain, to find two identical inputs (see the disassembly above for how this works).
Eventually, a maskable interrupt (IRQ) fires while the loop is still spinning, and the IRQ handler gets called:
0F:F795:78 SEI ; disable maskable interrupts
0F:F796:08 PHP ; save register state
0F:F797:48 PHA
0F:F798:8A TXA
0F:F799:48 PHA
0F:F79A:98 TYA
0F:F79B:48 PHA
0F:F79C:AD 64 79 LDA $7964 ; check the reset latch
0F:F79F:C9 5A CMP #$5A
0F:F7A1:F0 0D BEQ $F7B0 ; skip reset if it was #$5A
0F:F7B0:AD FF 7A LDA $7AFF
0F:F7B3:48 PHA
0F:F7B4:29 7F AND #$7F
0F:F7B6:8D 10 40 STA $4010
0F:F7B9:AD 01 01 LDA $0101 ; get status bar mode (#$20 = title screen)
0F:F7BC:C9 80 CMP #$80 ; if we're not in mode #$80...
0F:F7BE:D0 03 BNE $F7C3 ; ...skip to $F7C3
0F:F7C3:C9 40 CMP #$40 ; if we're not in mode #$40...
0F:F7C5:D0 03 BNE $F7CA ; ...skip to $F7CA
0F:F7CA:C9 20 CMP #$20 ; if we're not in mode #$20...
0F:F7CC:D0 03 BNE $F7D1 ; ...skip to $F7D1
0F:F7CE:4C 26 A8 JMP $A826 ; jump to $A826 because we are in mode #$20
The game expects ROM bank 24 to be loaded in memory here, because in that bank address $A826 contains the handler for the IRQ interrupt used during the title and ending screens. Unfortunately, bank 24 is not what's loaded. Instead, we have ROM bank 26 loaded. The code at $A826 on this ROM bank looks like this:
0D:A826:61 02 ADC ($02,X) @ $0000 ; last 2 bytes of an STA instruction
0D:A828:8D 6D 02 STA $026D ; store 3 values into sprite RAM
0D:A82B:8D 71 02 STA $0271
0D:A82E:8D 7D 02 STA $027D
0D:A831:B9 B5 A7 LDA $A7B5,Y @ $A7B6 ; load another value
0D:A834:8D 65 02 STA $0265 ; store 4 values into sprite RAM
0D:A837:8D 69 02 STA $0269
0D:A83A:8D 75 02 STA $0275
0D:A83D:8D 79 02 STA $0279
0D:A840:CE ED 03 DEC $03ED ; decrement a value
0D:A843:AD ED 03 LDA $03ED ; load that value
0D:A846:D0 03 BNE $A84B ; branch to $A84B if it != 0
0D:A84B:60 RTS ; return from subroutine
The rts instruction assumes the next 2 bytes on the stack are the previous location we were executing when we branched to this subroutine. In this case, they are not. Instead, they are the bytes we pushed onto the stack here (re-copied from above):
0F:F79A:98 TYA
0F:F79B:48 PHA
; ...snip...
0F:F7B0:AD FF 7A LDA $7AFF
0F:F7B3:48 PHA
As a result, we return to the stack (the rts instruction adds 1 to the address it finds, which is why we begin executing at $0101 and not $0100):
:0101:20 00 00 JSR $0000
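The address arithmetic behind that return can be sketched in Python. This is a rough model under stated assumptions: `rts_target` and `push` are hypothetical helper names, the starting stack pointer of $F0 is made up, and the pushed values ($01 from Y, $00 from $7AFF) are inferred from the $0100 address that rts finds, not read out of a debugger.

```python
def push(stack, sp, value):
    """6502 push: store at $0100+SP, then decrement SP (8-bit wraparound)."""
    stack[sp] = value & 0xFF
    return (sp - 1) & 0xFF

def rts_target(stack, sp):
    """6502 RTS: pull the low byte, then the high byte, then add 1."""
    sp = (sp + 1) & 0xFF
    low = stack[sp]
    sp = (sp + 1) & 0xFF
    high = stack[sp]
    return (((high << 8) | low) + 1) & 0xFFFF, sp

stack = [0] * 256         # models the stack page at $0100-$01FF
sp = 0xF0                 # hypothetical starting stack pointer
sp = push(stack, sp, 0x01)  # TYA / PHA: Y (assumed $01, the controller index)
sp = push(stack, sp, 0x00)  # LDA $7AFF / PHA: value assumed to be $00

pc, sp = rts_target(stack, sp)
print(f"${pc:04X}")
```

The last byte pushed becomes the low byte of the return address and the one before it becomes the high byte, which is how two unrelated register saves end up being read as $0100.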
The #$20 at $0101 is there because we are on the title screen (it's the status bar mode) and, hilariously, it is interpreted as a jsr (jump to subroutine) instruction by the processor. This transfers execution to the beginning of memory:
:0000:A8 TAY
After this instruction (which is from our last controller input that was read), we will mostly execute NULLs (#$00). These decode as brk (break) instructions. A brk triggers a software interrupt, which the interrupt-disable flag can't mask. When that interrupt returns, it will return to the next instruction plus one (so, the brk at $0001 will cause an interrupt to return to the brk at $0003). As a result of us directly executing our (mostly empty) RAM, we execute a large number of (mostly brk) instructions. Eventually, these lead to executing the memory at $00F5:
:00F5:50 98 BVC $008F
How did these get here? They're our controller inputs! Remember the code at the very beginning where I showed the bug and how it works? Here's what happens next under normal conditions (we receive two inputs that match from polling the joypad in the $FF12 subroutine):
0F:FECE: 0501 ORA $0001 ; OR the last controller input with the previous
; one we matched (no idea why we do this - they should be identical)
0F:FED0: 48 PHA ; push the overall poll result onto the stack
0F:FED1: 290F AND #$0F ; mask off the upper nibble of the poll result
0F:FED3: AA TAX ; copy the remainder into X (these are the directional buttons ONLY)
0F:FED4: 68 PLA ; pull the overall poll result back into A
0F:FED5: 29F0 AND #$F0 ; mask off the lower nibble of the poll result
0F:FED7: 1DAEFE ORA $FEAE, X ; use the value in X to index the table at $FEAE
; take the value out of that table and OR it with A to combine them
; table is: 00 01 02 00 04 05 06 04 08 09 0A 08 00 01 02 00
; this is done to filter out invalid inputs like simultaneous
; up and down presses and simultaneous left and right presses
0F:FEDA: 48 PHA ; push the normalized input value onto the stack
0F:FEDB: 8502 STA $0002 ; also store this value in $0002
0F:FEDD: 59F700 EOR $00F7, Y ; XOR the value at $00F7[Y] (previous input) with the current input
; to determine what new buttons were pressed/released
0F:FEE0: 2502 AND $0002 ; AND this value with $0002 (now only newly pressed buttons)
0F:FEE2: 99F500 STA $00F5, Y ; store this value at $00F5[Y]
0F:FEE5: 8518 STA $0018 ; also store this value at $0018
0F:FEE7: 68 PLA ; pull the previously pushed value (normalized input) into A
0F:FEE8: 99F700 STA $00F7, Y ; store this value at $00F7[Y]
0F:FEEB: 8517 STA $0017 ; also store this value at $0017
0F:FEED: 88 DEY ; decrement Y (the controller index)
0F:FEEE: 10D0 BPL $FEC0 ; if we are non-negative, branch back to $FEC0
; NOTE: this will re-run the processing loop above
; for the first controller when Y = 0
In a nutshell, the data at $00F6 is the current new button presses from the second controller and $00F5 is the current new button presses from the first controller. As the link above says, these button presses are encoded as:
bit: 7 6 5 4 3 2 1 0
button: A B Select Start Up Down Left Right
Thus, a value of #$50 (0b01010000) in $00F5 means we were pressing the B and Start buttons on the first controller. A value of #$98 (0b10011000) in $00F6 means we were pressing A, Start, and Up on the second controller. If you inspect the inputs file (linked above), you'll see this is the case.
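If you'd rather not decode those bytes by hand, a few lines of Python will do it. This is just a convenience sketch that follows the bit layout in the table above; `decode_buttons` is a name made up for this example.

```python
# Button names from bit 7 down to bit 0, as in the table above.
BUTTONS = ["A", "B", "Select", "Start", "Up", "Down", "Left", "Right"]

def decode_buttons(value):
    """Return the names of the buttons whose bits are set in an input byte."""
    return [name
            for bit, name in zip(range(7, -1, -1), BUTTONS)
            if value & (1 << bit)]

for byte in (0x50, 0x98):
    print(f"#${byte:02X} (0b{byte:08b}):", ", ".join(decode_buttons(byte)))
```

Running it on #$50 and #$98 gives B and Start for the first controller and A, Start, and Up for the second.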
After we execute the bvc above (which takes us back to $008F because our overflow flag is not set), we'll stumble back through to $00F5 again. Before we get there, however, another input polling interrupt fires. Because we are no longer mashing buttons to trigger the bug, but holding down specific buttons on each controller, the interrupt will return normally. The next time we reach $00F5, it looks like this:
:00F5:0A ASL
:00F6:20 5A B8 JSR $B85A
The asl here is irrelevant. What really matters is the jsr at $00F6 that we execute directly after it. These bytes, again, are here because of our controller inputs.
The #$5A (0b01011010) at $00F7 is the buttons we pressed during the last poll on the first controller (B, Start, Up, and Left). The #$B8 (0b10111000) at $00F8 is the buttons we pressed during the last poll on the second controller (A, Select, Start, and Up). The #$20 (0b00100000) at $00F6, as I mentioned earlier, is the new buttons we pressed since the last input on the second controller (which is just Select).
All of the above creates the jsr to $B85A (addresses are in little-endian order, so you flip the #$B8 and #$5A). This just so happens to be, according to Southbird's disassembly, the start of the routine that handles Mario's reunion with Princess Peach at the end of the game.
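As a sanity check, the four bytes at $00F5-$00F8 can be rebuilt from the button presses with a small Python model of the input routine. Treat it as a sketch with assumptions: `process_input` is a made-up name, the `ORA $0001` step is left out, and the previously held values (#$50 and #$98) are inferred by assuming the buttons newly pressed on the earlier poll were still being held.

```python
# D-pad filter table from $FEAE (cancels Up+Down and Left+Right).
DPAD_TABLE = [0x00, 0x01, 0x02, 0x00, 0x04, 0x05, 0x06, 0x04,
              0x08, 0x09, 0x0A, 0x08, 0x00, 0x01, 0x02, 0x00]

def process_input(raw, previous_held):
    """Model of $FED0-$FEEB: returns (newly_pressed, held) for one controller."""
    held = (raw & 0xF0) | DPAD_TABLE[raw & 0x0F]
    newly_pressed = (held ^ previous_held) & held
    return newly_pressed, held

# Controller 1: B, Start, Up, Left now held; B and Start assumed held before.
new1, held1 = process_input(0x5A, previous_held=0x50)
# Controller 2: A, Select, Start, Up now held; A, Start, Up assumed held before.
new2, held2 = process_input(0xB8, previous_held=0x98)

# Layout written by the routine: $F5/$F6 = new presses, $F7/$F8 = held buttons.
zero_page = {0xF5: new1, 0xF6: new2, 0xF7: held1, 0xF8: held2}

opcode = zero_page[0xF6]                          # #$20 is JSR
target = zero_page[0xF7] | (zero_page[0xF8] << 8)  # little-endian operand
print(" ".join(f"{zero_page[a]:02X}" for a in sorted(zero_page)))
print(f"opcode ${opcode:02X}, target ${target:04X}")
```

Under those assumptions the model produces 0A 20 5A B8, which matches the asl followed by the jsr to $B85A shown above.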
That's (apparently) all you need to beat SMB3! Crazy, right?