NES
[Figure: the relation between the CPU and PPU. Port details are from the NESDEV wiki.]
Memory availability
The picture processing unit (PPU) generates the video signal for one frame of animation, then rests for a brief period called vertical blanking. The CPU can load graphics data into the PPU only during this rest period. From the interrupt signaling the end of the display to the beginning of the next frame, the PPU stays idle for 2273 machine cycles. Allowing for overhead from the interrupt, the program has roughly 2250 cycles. To get the most out of this limited time, the code that runs in it should do nothing but load data as fast as possible.

A good way to visualize this is to think of a clock face, with one frame as one hour. Vertical blank is a small portion of that hour, say the first 5 minutes (when the minute hand is between 12 and 1). No matter what your program does, the minute hand keeps sweeping around the clock, moving in and out of that 5-minute period, just as the PPU keeps moving in and out of vertical blank. To update the PPU in your game, you must make sure your drawing code falls within that small period. Failing to do so causes all sorts of display glitches.
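As a rough sanity check on that budget, the figures can be worked through in a few lines of Python (used here only as a calculator on a PC, not as NES code). The NTSC timing constants and the cost of 8 cycles per byte (a 4-cycle indexed load plus a 4-cycle store to the PPU data port) are assumptions brought in for illustration; they are not stated in the text above.

```python
# NTSC timing constants (assumed for this sketch, not taken from the text).
VBLANK_SCANLINES = 20      # scanlines the PPU spends in vertical blank
DOTS_PER_SCANLINE = 341    # PPU clock ticks per scanline
DOTS_PER_CPU_CYCLE = 3     # the CPU runs at one third of the PPU clock


def vblank_cpu_cycles():
    """CPU cycles that fit inside vertical blank, rounded down."""
    return VBLANK_SCANLINES * DOTS_PER_SCANLINE // DOTS_PER_CPU_CYCLE


def bytes_per_vblank(budget_cycles, cycles_per_byte):
    """Upper bound on bytes copied if every byte costs cycles_per_byte."""
    return budget_cycles // cycles_per_byte


print(vblank_cpu_cycles())        # 2273
print(bytes_per_vblank(2250, 8))  # 281
```

Under these assumptions even a perfectly tight copy moves fewer than 300 bytes per frame, which is why the rest of this page is about squeezing the loading code.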

Computation should be avoided during this time. Instead, prepare whatever the CPU will write to the PPU beforehand, then copy the results into video RAM during vertical blank. To achieve this, a buffer holds the prepared results until they are written.
[Diagram of the buffering scheme described above]
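A minimal sketch of that split between "prepare" and "write", modelled in Python on a PC rather than in 6502 code. The class and method names are hypothetical, and video RAM is simplified to a flat bytearray; the point is only that all decisions happen in queue() and flush() does nothing but copy.

```python
class UpdateBuffer:
    """Hypothetical model of the RAM buffer that sits between game logic and the PPU."""

    def __init__(self):
        self.pending = []  # (ppu_address, data) pairs prepared during the frame

    def queue(self, ppu_address, data):
        """Called while the frame is being displayed: records work, touches no VRAM."""
        self.pending.append((ppu_address, bytes(data)))

    def flush(self, vram):
        """Called during vertical blank: plain copies only, no computation."""
        for address, data in self.pending:
            vram[address:address + len(data)] = data
        self.pending.clear()


vram = bytearray(0x4000)           # simplified 16 KB PPU address space
buffer = UpdateBuffer()
buffer.queue(0x2000, [1, 2, 3])    # game logic decides what to draw
print(vram[0x2000])                # 0 (nothing is written until vertical blank)
buffer.flush(vram)                 # vertical blank: copy and nothing else
print(list(vram[0x2000:0x2003]))   # [1, 2, 3]
```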
The transfer of data from the buffer into the PPU must also be as fast as possible. Two techniques are often employed to streamline the process:
1.    RLE tile compression as commands for the loading code to use
2.    Unrolled loops for bulk transfers

The drawing routine should also follow a data format that tells the drawing code what to draw and how to draw it. This format should be as generic and flexible as possible. Most implementations use a chain of ‘strings’. Each ‘string’ tells the drawing code what to draw, where, and how. There are multiple ways to encode these strings, but a common one is a compact strip-and-plot based RLE. This format has 2 modes, depending on whether the draw code is working with a strip of tiles or plotting individual tiles.

Strips:
byte 1: command & length
byte 2: high byte of PPU address
byte 3: low byte of PPU address
byte 4-x: data

command & length format
DTCCCCCC
D = direction (0 = right, 1 = down)
T = type (0 = literal: read a new byte of data each time, 1 = run: copy the same byte multiple times)
C = count (2-63) [a count of 1 invokes the plot command]

Length Value (hex)   Meaning
00                   End of strings
01                   Plot mode
02 - 3F              Literal to right: copy n + 1 bytes to video memory addresses increasing to the right
40 - 7F              Run to right: copy one byte n - 63 times to video memory addresses increasing to the right
80 - BF              Literal down: copy n - 127 bytes to video memory addresses increasing down
C0 - FF              Run down: copy one byte n - 191 times to video memory addresses increasing down
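To make the bit layout concrete, here is a small Python sketch (a PC-side model, not the 6502 loader) that packs and unpacks a command byte. It follows the table of value ranges above, where every range works out to "low six bits plus one" copies; the function names are hypothetical.

```python
END_STRINGS = 0x00
PLOT_MODE = 0x01


def make_command(count, down=False, run=False):
    """Pack direction, type and count into one DTCCCCCC command byte."""
    if not 1 <= count <= 64:
        raise ValueError("count must be 1..64")
    command = (0x80 if down else 0) | (0x40 if run else 0) | (count - 1)
    if command in (END_STRINGS, PLOT_MODE):
        # 00 and 01 are reserved, so a literal-right strip needs 3+ bytes.
        raise ValueError("a literal-right strip needs at least 3 bytes")
    return command


def describe(command):
    """Unpack a command byte into (kind, direction, count)."""
    if command == END_STRINGS:
        return ("end", None, 0)
    if command == PLOT_MODE:
        return ("plot", None, 0)
    kind = "run" if command & 0x40 else "literal"
    direction = "down" if command & 0x80 else "right"
    return (kind, direction, (command & 0x3F) + 1)


print(hex(make_command(3)))                        # 0x2
print(hex(make_command(64, down=True, run=True)))  # 0xff
print(describe(0x45))                              # ('run', 'right', 6)
```

One side effect of reserving 00 and 01 is that a literal strip to the right cannot be shorter than 3 bytes; shorter updates would go through plot mode or a run instead.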

Plot:
byte 1: PLOT_TILES constant (length of 1)
byte 2: high byte of PPUADDR
byte 3: low byte of PPUADDR
byte 4: data
- Repeat bytes 2, 3 & 4 for each additional tile -
byte x: END_PLOT constant (negative number)
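Putting the strip and plot formats together, the loader is essentially a small interpreter. The sketch below models it in Python against a flat bytearray standing in for video RAM; it is not the 6502 routine. Several details are assumptions: END_PLOT is taken to be $FF (any byte with bit 7 set would work as a "negative number"), another string may follow a plot block, and "down" is taken to mean a step of 32 addresses, one nametable row.

```python
STEP_RIGHT = 1     # next tile in the same row
STEP_DOWN = 32     # assumed: one nametable row is 32 tiles wide
END_STRINGS = 0x00
PLOT_TILES = 0x01
END_PLOT = 0xFF    # assumed value; the format only requires a negative byte


def run_strings(data, vram):
    """Interpret a chain of strings, writing into vram. Returns bytes consumed."""
    i = 0
    while True:
        command = data[i]
        i += 1
        if command == END_STRINGS:
            return i
        if command == PLOT_TILES:
            while data[i] < 0x80:                # bit 7 set marks END_PLOT
                address = (data[i] << 8) | data[i + 1]
                vram[address & 0x3FFF] = data[i + 2]
                i += 3
            i += 1                               # step over END_PLOT
            continue
        address = (data[i] << 8) | data[i + 1]
        i += 2
        step = STEP_DOWN if command & 0x80 else STEP_RIGHT
        is_run = bool(command & 0x40)
        count = (command & 0x3F) + 1
        for n in range(count):
            vram[(address + n * step) & 0x3FFF] = data[i]
            if not is_run:
                i += 1                           # literal: fresh byte per tile
        if is_run:
            i += 1                               # run: one byte covers the strip


stream = bytes([
    0x02, 0x20, 0x00, 0x0A, 0x0B, 0x0C,              # literal right, 3 tiles at $2000
    0xC1, 0x20, 0x40, 0x07,                          # run down, 2 tiles from $2040
    PLOT_TILES, 0x21, 0x00, 0x55, 0x21, 0x05, 0x66,  # two plotted tiles
    END_PLOT,
    END_STRINGS,
])
vram = bytearray(0x4000)
print(run_strings(stream, vram))   # 19
print(vram[0x2040], vram[0x2060])  # 7 7
```

Using the high address byte to detect END_PLOT works because PPU addresses only go up to $3FFF, so a valid high byte never has bit 7 set.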

The other technique is loop unrolling. The goal is to increase the program’s speed, at the cost of space, by eliminating the instructions that control the loop, such as the “end of loop” test on each iteration, and by reducing branch penalties.
6502 Rolled Loop

    ldx    #8              ;2
-   jsr    SomeFunction    ;12 (6 for jsr + 6 for the matching rts)
    dex                    ;2
    bne    -               ;3 when taken / 2 on the final pass

Bytes: 8
Cycles: 137
6502 Unrolled Loop
    jsr    SomeFunction    ;12
    jsr    SomeFunction    ;12
    jsr    SomeFunction    ;12
    jsr    SomeFunction    ;12
    jsr    SomeFunction    ;12
    jsr    SomeFunction    ;12
    jsr    SomeFunction    ;12
    jsr    SomeFunction    ;12

Bytes: 24
Cycles: 96
In this example, this simple technique cuts the cycle count from 137 to 96, a speed boost of roughly 30%.
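The cycle totals above can be reproduced with a bit of arithmetic, shown here in Python as a calculator rather than as NES code. The per-instruction costs are the ones from the listing comments (12 for the call and return, 2 for ldx and dex, 3 for a taken branch, 2 for the final untaken one); the function names are hypothetical.

```python
def rolled_cycles(iterations, body=12, setup=2, dex=2, taken=3, not_taken=2):
    """Cycles for: ldx #n / loop: jsr body / dex / bne loop."""
    # Every pass pays body + dex + a taken branch, except that the
    # last branch falls through and is one cycle cheaper.
    return setup + iterations * (body + dex + taken) - (taken - not_taken)


def unrolled_cycles(iterations, body=12):
    """Cycles for the same body repeated back to back with no loop control."""
    return iterations * body


rolled = rolled_cycles(8)
unrolled = unrolled_cycles(8)
print(rolled, unrolled)                              # 137 96
print(round(100 * (rolled - unrolled) / rolled, 1))  # 29.9
```

The saving is about 5 cycles per iteration, so the relative gain is largest when the loop body itself is short, which is exactly the case for a tight copy into the PPU.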

My implementation:
PPULoading.asm


Useful Links:
        NESDEV wiki
        CPU simulator
