Amazing work! I previously researched this glitch in an exclusively black-box manner, since reversing anything lower-level than the schematics is beyond my skills and toolset, and my explanations of the glitch were mostly handwaving and educated guesses, so it's very nice to see some "real" research being put into it! Looking at my code, your finding seems to explain why certain rows (e.g. FE40) behave in a more glitched manner than others (on my DMG, read glitches on that row are non-deterministic, unlike other rows, which are fully deterministic): the transition from row FE38 to FE40 differs in 4 bits, meaning you're affected by propagation delays even more. It seems very similar to the LFSR corruption glitch I've been researching recently. Great work!
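As a quick check of the bit-difference argument, here is a minimal sketch that lists which address bits change between two row addresses. The helper name `differing_bits` is hypothetical and only meant for illustration; it says nothing about the order in which the lines actually settle on hardware.

```python
def differing_bits(old_addr: int, new_addr: int) -> list[int]:
    """Return the bit positions (0 = least significant) where two 16-bit addresses differ."""
    changed = old_addr ^ new_addr
    return [bit for bit in range(16) if changed & (1 << bit)]


# Transition into row FE40: four address bits have to change.
print(differing_bits(0xFE38, 0xFE40))  # [3, 4, 5, 6]
# Transition into row FE38: only one address bit changes.
print(differing_bits(0xFE30, 0xFE38))  # [3]
```

The more bits that change in one transition, the more intermediate addresses the OAM can briefly see.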
Hi,
I have some details on what determines which rows are affected by the OAM bug. I saw from some comments in `memory.c` that you've already figured out that, with some specific timings, the first two objects can get corrupted as well. Since you put so much work into figuring all this out, I thought what I found might be interesting to you.
Which rows are affected is actually decided by the address transition that the PPU performs while the CPU triggers the bug. The address lines inside the chip all have different lengths and therefore different signal propagation delays. So when the PPU changes the address it wants to access, the OAM can see multiple different addresses while the address lines change one by one. Those intermediate addresses determine which rows are affected by the corruption.
I figured this out because I've been working on an accurate simulation of a complete DMG-CPU B chip for some years now (see here: https://github.com/msinger/dmg-sim).
My simulation can predict which memory rows get corrupted, but it does not predict the correct value for the first word of the row. It just copies entire rows. I don't know (yet) what detail I'm missing for that. I ran this example code to trigger corruption:
On SameBoy and on a real Game Boy, this produces the following OAM content when it was initialized with Blargg's test pattern:
This is the wave diagram generated by the simulation:

You can see that the corruption happens around the red vertical marker. The CPU at the bottom is performing a read access. Its `rd` signal is high and the applied address is 0xFE00. When the CPU outputs an OAM address, the `fexx` signal near the top rises. This signal inhibits the bit line precharging of the two OAM RAMs (`bl_pch_n` signals). Because the CPU also performs a read, the word line driver precharging is inhibited as well (`wldrv_pch_n`). Word lines are the rows of the RAMs; bit lines are the columns.
I documented the SRAM blocks (and all other cells in the chip) here: https://iceboy.a-singer.de/doc/dmg_cells.html#sram
A look at the schematics may also help in understanding things: https://github.com/msinger/dmg-schematics/releases
OAMs are at pages 34 & 35 in the PDF.
The signal vector `wl` in the diagram above shows which word lines of the RAM are currently enabled. In normal operation, only one word line is enabled at a time. At the point of the corruption, however, `wl` is 7. This means that three word lines are enabled simultaneously: the first, second and third. This causes all bits in these rows to be short-circuited together vertically in their respective columns, forcing them to have an equal state.
We can zoom in to see how this came to be:

We see that the address lines `a[6:2]` and their complements `a_n[6:2]` are changing their state one after another. In normal operation, the word line drivers should be precharged during those transitions, which would force all word lines to be disabled. But since the precharge signal is inhibited, the enabled word lines just accumulate with each changing address line.
When we add one additional `nop` in the code above, only two rows are affected: the third row gets copied over the fourth row. On the real Game Boy, the first word in the third row is modified, as you implemented it in SameBoy, before it gets copied.
The following two diagrams show what is happening with this additional `nop`:


You can see in the zoomed-in version that only one address line needed to change in that situation. That's why there are only two rows involved now.
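The accumulation of enabled word lines can be sketched roughly as follows. This is only an illustration under stated assumptions, not how the simulation works: the function names are hypothetical, row indices stand in for the decoded address lines, bit 0 of the mask is taken to be the first word line, and the transitions and bit orders in the usage lines are made up for the example (the real order depends on the propagation delays visible in the diagrams).

```python
def word_line_mask(old_row: int, new_row: int, change_order: list[int], precharged: bool) -> int:
    """Return a bitmask of the word lines enabled after a row transition.

    change_order lists row-index bit positions in the (assumed) order
    in which the corresponding address lines settle.
    """
    if precharged:
        # Normal operation: all word lines are off during the transition,
        # so only the destination row ends up enabled.
        return 1 << new_row
    mask = 1 << old_row  # the old row stays enabled without precharge
    row = old_row
    for bit in change_order:
        if (old_row ^ new_row) & (1 << bit):
            row ^= 1 << bit   # one address line flips ...
            mask |= 1 << row  # ... and the row it now selects is added
    return mask


def enabled_rows(mask: int) -> list[int]:
    """List the row indices whose word line is set in the mask."""
    return [row for row in range(mask.bit_length()) if mask & (1 << row)]


# Hypothetical two-bit transition (row 1 -> row 2, bit 0 settling first).
print(word_line_mask(1, 2, [0, 1], precharged=False))  # 7
print(enabled_rows(7))                                  # [0, 1, 2]
# Hypothetical one-bit transition (row 2 -> row 3): only two rows involved.
print(enabled_rows(word_line_mask(2, 3, [0, 1], precharged=False)))  # [2, 3]
```

Reversing the assumed order for the two-bit case gives a different set of rows, which is why the order in which the address lines change matters here.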
When the CPU is not reading, but just outputting an OAM address, only the bit lines are missing their precharge. The word lines are precharged just fine, as we can see here in the case of an increment instruction operating on BC, which contains 0xFE00:

The diagram shows that the word lines are disabled during the address transition, but the bit lines keep the previous value and transfer it into the next row. In this case, the order in which the address lines change state doesn't matter; data always gets transferred from one row to the next.
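A minimal sketch of this bit-line-only case, assuming 8-byte rows and a whole-row copy (which is what the simulation currently produces; real hardware additionally modifies the first word of the row). The name `carry_row_forward` is hypothetical.

```python
ROW_SIZE = 8  # assumed row width in bytes


def carry_row_forward(oam: bytearray, old_row: int, new_row: int) -> None:
    """Copy the previously selected row into the newly selected one.

    Models bit lines that kept the old row's value because they were
    not precharged before the next word line was enabled.
    """
    src = old_row * ROW_SIZE
    dst = new_row * ROW_SIZE
    oam[dst:dst + ROW_SIZE] = oam[src:src + ROW_SIZE]


oam = bytearray(range(160))  # placeholder contents, not Blargg's test pattern
carry_row_forward(oam, old_row=2, new_row=3)
print(oam[24:32] == oam[16:24])  # True
```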
The last diagram shows PUSH BC with SP at 0xFE01:

There are two corruptions going on. In the first one, the CPU just applies the address 0xFE01. In the second, the CPU performs a write operation to 0xFE00. We can see that it doesn't make a difference whether the CPU actually writes or not. In both cases only the bit lines miss their precharging. The word lines precharge fine. Only when the CPU issues a read operation (like in the first example) does the word line precharge fail.
Like I said, I don't know how those changes in the first word come to be. Maybe I'll look into this at some point, but first I have to figure out why I can't pass the Blargg dmg_sound test.
I hope this information is useful. I could run some more experiments to see in what order all the address lines change, so that the affected rows can be determined.
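The three cases observed so far can be summarized in a small lookup. This is only a restatement of the observations above, assuming the CPU is driving an OAM address in each case; `CpuAccess` and `inhibited_precharges` are hypothetical names, while the signal names are the ones from the wave diagrams.

```python
from enum import Enum


class CpuAccess(Enum):
    ADDRESS_ONLY = "address only"  # e.g. the increment on BC
    WRITE = "write"                # e.g. PUSH BC
    READ = "read"                  # e.g. the first example


def inhibited_precharges(access: CpuAccess) -> set[str]:
    """Return the precharge signals that are inhibited for a CPU access to OAM."""
    inhibited = {"bl_pch_n"}  # bit lines miss their precharge in every case
    if access is CpuAccess.READ:
        inhibited.add("wldrv_pch_n")  # only reads also block the word line drivers
    return inhibited


for access in CpuAccess:
    print(access.value, sorted(inhibited_precharges(access)))
```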
I was reading your description about the bug here: https://gbdev.io/pandocs/OAM_Corruption_Bug.html
It still claims that the first two objects are not affected by the bug. Just wanted to let you know, in case you forgot about it.
Regards,
Michael