
The SMB3 Input Polling Glitch

In addition to tracking down how the fireball works in SMB3, dwangoAC has also asked me to give my interpretation of how the bug works in the recent SMB3 TAS run. This is all of my analysis after having spent many hours playing with it in a debugger.

If you haven't seen the SGDQ video of it yet, it's pretty game-breaking. You beat the game in about a second:

Like before, I used a combination of Binary Ninja (for sifting through disassembly) and FCEUX (for emulating and debugging) to do my analysis. I was also recently made aware of Southbird's amazing complete disassembly for SMB3 and used his work to supplement and check my own. I really wish I'd known about this before doing a lot of this analysis...but, such is life.

In order to replicate the glitch, you'll also need this Lua script and this input file. Load the Lua script in FCEUX by doing File -> New Lua Script Window... and then browsing to it. The input file needs to be in the same directory as the Lua script to work. These files are not mine - they're the work of total_, who originally proved this bug could work.

The Bug

The input processing routine is located at $FEBE. It uses the function at $FF12 twice to poll the second controller (Y = 1 at $FEBE):

0F:FEBE: A001    LDY #$01     ; set controller to index 1 (controller 2)
0F:FEC0: 2012FF  JSR $FF12    ; call $FF12 to poll the controller
0F:FEC3: A500    LDA $0000    ; load byte at $0000 ($FF12 return value) into A
0F:FEC5: 48      PHA          ; push byte in A onto the stack
0F:FEC6: 2012FF  JSR $FF12    ; call $FF12 to poll the controller again
0F:FEC9: 68      PLA          ; pull byte from the stack back into A
0F:FECA: C500    CMP $0000    ; compare $0000 and A (the two poll results)
0F:FECC: D0F5    BNE $FEC3    ; if they don't match, poll again
                              ; NOTE: each subsequent loop through this basic
                              ;       block will compare the new value with the
                              ;       previous and the oldest value will be tossed

Why do we poll the controller twice, you ask? It turns out there's a hardware problem with the NES. Specifically, use of DPCM audio during a read of the joypad inputs can cause "phantom" button presses. As the NESDev wiki's errata page notes, a common way to get around this is to poll for input twice. If you get two matching reads, you can be sure those were the correct buttons pressed by the player.

So, the authors of SMB3 poll the controller twice in an attempt to get the user's "real" input. Under normal circumstances, this works fine. Unfortunately, when you're able to press buttons at inhuman speeds, it falls apart: If we can cause each poll's result to never match the previous, this loop will never exit - it'll continue indefinitely. Or, rather, it would continue indefinitely if interrupts weren't a thing.
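Here's a minimal Python sketch of that loop, just to make the failure mode concrete. The `reads` iterator is a hypothetical stand-in for the $FF12 polling function, and the `max_polls` cap is my own addition so the sketch halts; the real loop has no such cap.

```python
from itertools import cycle


def poll_until_stable(reads, max_polls=1000):
    """Mimic the $FEC3-$FECC loop: poll until two consecutive reads match.

    Returns (value, polls_used), or (None, polls_used) if the cap was hit.
    """
    previous = next(reads)          # first JSR $FF12
    polls = 1
    while polls < max_polls:
        current = next(reads)       # second (or later) JSR $FF12
        polls += 1
        if current == previous:     # CMP $0000 / BNE $FEC3 falls through
            return current, polls
        previous = current          # oldest value is tossed, try again
    return None, polls


steady = poll_until_stable(iter([0x80, 0x80]))        # two matching reads
glitchy = poll_until_stable(iter([0x81, 0x80, 0x80])) # one phantom press, then stable
mashed = poll_until_stable(cycle([0x00, 0xFF]))       # never two equal reads in a row
```

A human player lands in the first or second case. Input that changes on every single poll lands in the third, where only the artificial cap stops the loop.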

An interrupt is when the hardware stops executing the current code and switches to executing something else. There are many reasons why this can happen (like drawing the screen), and they happen pretty often on the NES. The NES's CPU supports two types of interrupts: maskable (IRQ) and non-maskable (NMI). A maskable interrupt is one the CPU can be told to ignore: while the interrupt-disable flag is set (usually with the sei instruction), IRQs are held off. A non-maskable interrupt can't be blocked by that flag. Whenever the CPU enters any interrupt handler, it sets the interrupt-disable flag itself, so a handler won't be interrupted by an IRQ unless it clears the flag again with cli.

The input processing routine at $FEBE is triggered by an interrupt. This means the hardware will periodically stop whatever code is executing and cause the input processing routine to be executed instead. The interrupts that trigger input processing are non-maskable, and their handlers start out with the interrupt-disable flag set, so maskable interrupts can't get in. But, after they've done their other jobs, they clear that flag with a cli instruction before calling the input processing routine! This means that the input processing can be interrupted by a maskable interrupt (you can see this at $F55A, $F60C, $F6F5, and $F779 on ROM bank 31).
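To make the role of that cli concrete, here's a tiny Python model of the 6502's interrupt-disable flag. It's a sketch under heavy simplification: the `Cpu` class and its method names are mine, and nothing besides the flag is modelled (no stack, no vectors, no PPU).

```python
class Cpu:
    """Toy model of the 6502 interrupt-disable (I) flag and nothing else."""

    def __init__(self):
        self.i_flag = False
        self.log = []

    def nmi(self):
        # An NMI is taken regardless of the I flag.
        # Entering any interrupt handler sets the I flag.
        self.log.append("NMI taken")
        self.i_flag = True

    def irq(self):
        # An IRQ is only taken while the I flag is clear.
        if self.i_flag:
            self.log.append("IRQ held off")
            return False
        self.log.append("IRQ taken")
        self.i_flag = True
        return True

    def cli(self):
        self.i_flag = False   # allow IRQs again

    def sei(self):
        self.i_flag = True    # hold IRQs off


cpu = Cpu()
cpu.nmi()              # handler entered, I flag now set
blocked = cpu.irq()    # False: the IRQ has to wait
cpu.cli()              # what the game does before polling input
accepted = cpu.irq()   # True: the polling loop can now be interrupted
```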

So, rather than execute indefinitely in an infinite loop, our interrupt is, itself, interrupted. This can cause interesting things to happen like the SGDQ 2016 speed run I showed above.

The Exploit

Now that we understand the bug, let's look at how it can actually get us to the end credits sequence of the game.

When the game is started, we don't make any weird inputs for a little while. The game, at this point, runs normally. Then, on frame 33, we trigger the bug by mashing inputs at inhuman speed. This causes the input processing routine at $FEBE to call the input polling function at $FF12 over and over and over again while trying, in vain, to find two identical inputs (see the disassembly above for how this works).

Eventually, a maskable interrupt (IRQ) fires in the middle of this loop and the IRQ handler gets called:

0F:F795:78        SEI                             ; disable maskable interrupts
0F:F796:08        PHP                             ; save register state
0F:F797:48        PHA
0F:F798:8A        TXA
0F:F799:48        PHA
0F:F79A:98        TYA
0F:F79B:48        PHA
0F:F79C:AD 64 79  LDA $7964                       ; check the reset latch
0F:F79F:C9 5A     CMP #$5A
0F:F7A1:F0 0D     BEQ $F7B0                       ; skip reset if it was #$5A
0F:F7B3:48        PHA
0F:F7B4:29 7F     AND #$7F
0F:F7B6:8D 10 40  STA $4010
0F:F7B9:AD 01 01  LDA $0101                       ; get status bar mode (#$20 = title screen)
0F:F7BC:C9 80     CMP #$80                        ; if we're not in mode #$80...
0F:F7BE:D0 03     BNE $F7C3                       ; ...skip to $F7C3
0F:F7C3:C9 40     CMP #$40                        ; if we're not in mode #$40...
0F:F7C5:D0 03     BNE $F7CA                       ; ...skip to $F7CA
0F:F7CA:C9 20     CMP #$20                        ; if we're not in mode #$20...
0F:F7CC:D0 03     BNE $F7D1                       ; ...skip to $F7D1
0F:F7CE:4C 26 A8  JMP $A826                       ; jump to $A826 because we are in mode #$20

The game expects ROM bank 24 to be loaded in memory here, because that bank's $A826 holds the IRQ handler used during the title and ending screens. Unfortunately, bank 24 is not what's loaded. Instead, we have ROM bank 26 loaded. The code at $A826 in that bank looks like this:

0D:A826:61 02     ADC ($02,X) @ $0000             ; last 2 bytes of an STA instruction
0D:A828:8D 6D 02  STA $026D                       ; store 3 values into sprite RAM
0D:A82B:8D 71 02  STA $0271
0D:A82E:8D 7D 02  STA $027D
0D:A831:B9 B5 A7  LDA $A7B5,Y @ $A7B6             ; load another value
0D:A834:8D 65 02  STA $0265                       ; store 4 values into sprite RAM
0D:A837:8D 69 02  STA $0269
0D:A83A:8D 75 02  STA $0275
0D:A83D:8D 79 02  STA $0279
0D:A840:CE ED 03  DEC $03ED                       ; decrement a value
0D:A843:AD ED 03  LDA $03ED                       ; load that value
0D:A846:D0 03     BNE $A84B                       ; branch to $A84B if it != 0
0D:A84B:60        RTS ---------------------       ; return from subroutine

The rts instruction assumes the next 2 bytes on the stack are the return address left there by the jsr that called this subroutine. In this case, they are not: we got here with a jmp, which pushes nothing. Instead, they are the bytes we pushed onto the stack here (re-copied from above):

0F:F79A:98        TYA
0F:F79B:48        PHA
    ; ...snip...
0F:F7B3:48        PHA

As a result, we return to the stack (the rts instruction adds 1 to the address it finds, which is why we begin executing at $0101 and not $0100):

  :0101:20 00 00  JSR $0000
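Here's a rough Python sketch of the stack arithmetic behind that rts. The pushed values (Y = #$01, A = #$00) are my inference from the $0101 destination rather than something I'm reading off the trace, and the starting stack pointer is an arbitrary placeholder.

```python
def push(stack_page, sp, value):
    """6502 push: store at $0100+SP, then decrement SP."""
    stack_page[sp] = value & 0xFF
    return (sp - 1) & 0xFF


def rts_target(stack_page, sp):
    """6502 rts: pull low byte, pull high byte, then add 1 to the address."""
    sp = (sp + 1) & 0xFF
    low = stack_page[sp]
    sp = (sp + 1) & 0xFF
    high = stack_page[sp]
    return (((high << 8) | low) + 1) & 0xFFFF, sp


stack = bytearray(0x100)        # stands in for $0100-$01FF
sp = 0xF0                       # placeholder stack pointer
sp = push(stack, sp, 0x01)      # TYA / PHA at $F79A (assumed Y = #$01)
sp = push(stack, sp, 0x00)      # PHA at $F7B3 (assumed A = #$00)
target, sp = rts_target(stack, sp)
```

The last byte pushed becomes the low byte of the "return address", so the pair reads as $0100, and the +1 lands us on the jsr at $0101.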

The #$20 is here because we are on the title screen and, hilariously, is interpreted as a jsr instruction (jump to subroutine) by the processor. This transfers execution to the beginning of memory:

  :0000:A8        TAY

After this instruction (which is from our last controller input that was read), we will mostly execute NULLs (#$00). These decode as brk (break) instructions. A brk triggers a software interrupt, which goes through the same vector as a maskable interrupt but can't be masked. When that interrupt returns, it returns to the address two bytes past the brk, skipping the byte after it (so, the brk at $0001 will cause an interrupt to return to the brk at $0003). As a result of us directly executing our (mostly empty) RAM, we execute a long run of mostly brk instructions. Eventually, these lead to executing the memory at $00F5:

  :00F5:50 98     BVC $008F

How did these get here? They're our controller inputs! Remember the code at the very beginning where I showed the bug and how it works? Here's what happens next under normal conditions (we receive two inputs that match from polling the joypad in the $FF12 subroutine):

0F:FECE: 0501    ORA $0001            ; OR the last controller input with the previous
                                      ; one we matched (no idea why we do this - they should be identical)
0F:FED0: 48      PHA                  ; push the overall poll result onto the stack
0F:FED1: 290F    AND #$0F             ; mask off the upper nibble of the poll result
0F:FED3: AA      TAX                  ; copy the remainder into X (these are the directional buttons ONLY)
0F:FED4: 68      PLA                  ; pull the overall poll result back into A
0F:FED5: 29F0    AND #$F0             ; mask off the lower nibble of the poll result
0F:FED7: 1DAEFE  ORA $FEAE, X         ; use the value in X to index the table at $FEAE
                                      ; take the value out of that table and OR it with A to combine them
                                      ; table is: 00 01 02 00 04 05 06 04 08 09 0A 08 00 01 02 00
                                      ; this is done to filter out invalid inputs like simultaneous
                                      ; up and down presses and simultaneous left and right presses
0F:FEDA: 48      PHA                  ; push the normalized input value onto the stack
0F:FEDB: 8502    STA $0002            ; also store this value in $0002
0F:FEDD: 59F700  EOR $00F7, Y         ; XOR the value at $00F7[Y] (previous input) with the current input
                                      ; to determine what new buttons were pressed/released
0F:FEE0: 2502    AND $0002            ; AND this value with $0002 (now only newly pressed buttons)
0F:FEE2: 99F500  STA $00F5, Y         ; store this value at $00F5[Y]
0F:FEE5: 8518    STA $0018            ; also store this value at $0018
0F:FEE7: 68      PLA                  ; pull the previously pushed value (normalized input) into A
0F:FEE8: 99F700  STA $00F7, Y         ; store this value at $00F7[Y]
0F:FEEB: 8517    STA $0017            ; also store this value at $0017
0F:FEED: 88      DEY                  ; decrement Y (the controller index)
0F:FEEE: 10D0    BPL $FEC0            ; if we are non-negative, branch back to $FEC0
                                      ; NOTE: this will re-run the processing loop above
                                      ;       for the first controller when Y = 0
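If the assembly is hard to follow, here's my reading of it as a short Python sketch. The function and variable names are mine; only the table contents and the bit operations come from the disassembly, and the initial ORA with $0001 is left out (the sketch takes the already-combined poll result as `raw`).

```python
# Table at $FEAE, indexed by the low nibble (Up, Down, Left, Right bits).
DPAD_FILTER = [0x00, 0x01, 0x02, 0x00, 0x04, 0x05, 0x06, 0x04,
               0x08, 0x09, 0x0A, 0x08, 0x00, 0x01, 0x02, 0x00]


def process_poll(raw, prev_held):
    """Return (newly_pressed, held) for one controller.

    held          -> what ends up in $00F7[Y]
    newly_pressed -> what ends up in $00F5[Y]
    """
    held = (raw & 0xF0) | DPAD_FILTER[raw & 0x0F]   # drop opposing directions
    newly_pressed = (held ^ prev_held) & held       # changed AND currently down
    return newly_pressed, held


# Left + Right together cancel out; the A button passes through untouched.
new, held = process_poll(0x83, 0x00)
```

Holding a button across two polls gives a `newly_pressed` of zero for it on the second poll, since the XOR with the previous state clears the bit.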

In a nutshell, the data at $00F6 is the current new button presses from the second controller and $00F5 is the current new button presses from the first controller. As the link above says, these button presses are encoded as:

bit:     7     6     5     4     3     2     1     0
button:  A     B  Select Start  Up   Down  Left  Right

Thus, a value of #$50 (0b01010000) in $00F5 means B and Start were newly pressed on the first controller. A value of #$98 (0b10011000) in $00F6 means A, Start, and Up were newly pressed on the second controller. If you inspect the inputs file (linked above), you'll see this is the case.
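If you'd rather not decode these by hand, a few lines of Python will do it. This is just a convenience sketch; `decode_buttons` is a name I made up, and the bit layout is the one from the table above.

```python
# Index = bit number, per the encoding table (bit 0 = Right ... bit 7 = A).
BUTTONS = ["Right", "Left", "Down", "Up", "Start", "Select", "B", "A"]


def decode_buttons(value):
    """List the button names set in an input byte, highest bit first."""
    return [BUTTONS[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


first_controller = decode_buttons(0x50)    # ['B', 'Start']
second_controller = decode_buttons(0x98)   # ['A', 'Start', 'Up']
```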

After we execute the bvc above (which takes us back to $008F because our overflow flag is not set), we'll stumble back through to $00F5 again. Before we get there, however, another input polling interrupt fires. Because we are no longer mashing buttons to trigger the bug, but holding down specific buttons on each controller, the interrupt will return normally. The next time we reach $00F5, it looks like this:

  :00F5:0A        ASL
  :00F6:20 5A B8  JSR $B85A

The asl here is irrelevant. What really matters is the jsr at $00F6 that we execute directly after it. These bytes, again, are here because of our controller inputs.

The #$5A (0b01011010) at $00F7 is the buttons we pressed during the last poll on the first controller (B, Start, Up, and Left). The #$B8 (0b10111000) at $00F8 is the buttons we pressed during the last poll on the second controller (A, Select, Start, and Up). The #$20 (0b00100000) at $00F6, as I mentioned earlier, is the new buttons we pressed since the last input on the second controller (which is just Select).

All of the above creates the jsr to $B85A (addresses are in little-endian order, so you flip the #$B8 and #$5A). This just so happens to be, according to Southbird's disassembly, the start of the routine that handles Mario's reunion with Princess Peach at the end of the game.
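To double-check how those input bytes read as instructions, here's a deliberately tiny decoder in Python. It only knows the three opcodes that show up at $00F5 in this post, and the `decode` helper is my own sketch, not anything from FCEUX or Southbird's disassembly.

```python
def decode(memory, pc):
    """Decode one instruction at pc. Returns (text, address_of_next_instruction)."""
    opcode = memory[pc]
    if opcode == 0x0A:                       # asl on the accumulator, 1 byte
        return "ASL A", pc + 1
    if opcode == 0x20:                       # jsr absolute, little-endian operand
        target = memory[pc + 1] | (memory[pc + 2] << 8)
        return f"JSR ${target:04X}", pc + 3
    if opcode == 0x50:                       # bvc, signed 8-bit relative offset
        offset = memory[pc + 1]
        if offset >= 0x80:
            offset -= 0x100
        target = (pc + 2 + offset) & 0xFFFF
        return f"BVC ${target:04X}", pc + 2
    raise ValueError(f"opcode ${opcode:02X} is not handled by this sketch")


zero_page = bytearray(0x100)

# First pass: new presses of #$50 and #$98 sit at $00F5 and $00F6.
zero_page[0xF5:0xF7] = bytes([0x50, 0x98])
first_pass, _ = decode(zero_page, 0xF5)

# Second pass: $00F5-$00F8 now hold #$0A, #$20, #$5A, #$B8.
zero_page[0xF5:0xF9] = bytes([0x0A, 0x20, 0x5A, 0xB8])
asl_text, next_pc = decode(zero_page, 0xF5)
jsr_text, _ = decode(zero_page, next_pc)
```

The relative branch is computed from the address after the two-byte bvc, which is why an offset of #$98 (-104) from $00F7 comes out at $008F.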

That's (apparently) all you need to beat SMB3! Crazy, right?