After years of work, SMP is finally working on the Wii U! If you run Wii U Linux and would like to take advantage of all 3 CPU cores, you’ll need to build a kernel using the rewrite-6.6-smp branch of the linux-wiiu kernel source found here. Unfortunately, as of this writing, there are no pre-built kernels with SMP support available for the Wii U. Fortunately, the readme for the default branch contains a very easy-to-follow set of instructions on how to build the kernel, although they do require Docker. You will, however, need to run git clone -b rewrite-6.6-smp https://gitlab.com/linux-wiiu/linux-wiiu instead of git clone https://gitlab.com/linux-wiiu/linux-wiiu, and make wiiu_smp_defconfig $LINUXMK instead of make wiiu_defconfig $LINUXMK (otherwise SMP won’t get enabled!).
(FYI: I’m working on a pre-built SMP kernel and patched Gentoo tarball for the Wii U, so stay tuned for that.)
I was originally going to write a long post about how we got here, but then I decided that I don’t actually want to write that, so instead, I’m just going to list some of the things we (me and Quarky) learned while working on this.
1: The EXI Bootstub is Not Used in Wii U Mode.
The primary source for all of our knowledge on how the CPU in the Wii U works is one article written by Marcan of fail0verflow. It is the reason we know how to start up and take control of the primary CPU core despite all of Nintendo’s safeguards, how to initialize several important CPU registers, and how to start up the secondary CPU cores. Because Marcan is a veteran console hacker, and because he was right about everything else, everyone (including us) took the information in this article as gospel; we assumed everything in it was true.
In order to start up the secondary CPU cores correctly, we need to know where they start executing code from, and according to Marcan, this location was a piece of hardware left over from the Wii called the EXI bootstub:
“Cores 1 and 2 boot with MSR[IP]=1, thus at the high vectors. The old Wii PPC EXI bootstub hardware is still present and controls the instructions read from there, so just perform the normal EXI bootstub configuration sequence from the Starbuck.” - Marcan, fail0verflow
So, as per Marcan’s instructions, we wrote our initialization code to the EXI bootstub and started up the secondary cores, but they never did what we expected. Then, one day, as I was staring at a dump of the decrypted kernel ancast image in Ghidra, I realized that, when mapped to the known location of the ancast image in Cafe OS (the Wii U’s built-in OS), my dump extended past the high boot vector, where the EXI bootstub was supposed to be. So I scrolled down and, lo and behold, there was a bootstub there! After discovering this, I tried writing our bootstub over that part of the ancast image, and it worked! The secondary cores were finally behaving as expected!
Conclusion: In Wii U mode at least, the high boot vector (the one used by cores 1 & 2) is not mapped to the EXI bootstub: it’s mapped to MEM0 as part of the ancast image, at address 0x08100100.
PS: it turns out that developers behind emulators like Cemu and WUFE already knew that 0xFFE00000 (the location of the kernel ancast image) is mapped to MEM0. Marcan must have known this too, as the Cafe OS kernel immediately jumps to a location above 0xFFE00000 (using an absolute jump), and then starts up the secondary cores soon afterwards, and there is clearly no code present to initialize the EXI bootstub. I assume, therefore, his statement quoted above was either a mistake, or in reference to vWii mode (which in fairness was the focus of his article.)
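The address arithmetic behind this conclusion can be sketched in a few lines of C. This is only an illustration, not code from the linux-wiiu tree: the helper name ancast_ea_to_mem0 is made up, and the MEM0 base of 0x08000000 is an assumption inferred from the 0x08100100 figure above. The 0xFFF00000 vector base and 0x100 reset offset are the standard PowerPC values when MSR[IP]=1.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address where Cafe OS maps the kernel ancast image (from the post). */
#define ANCAST_EA_BASE   0xFFE00000u
/* ASSUMPTION: physical base of MEM0, inferred from the 0x08100100 figure above. */
#define MEM0_PHYS_BASE   0x08000000u
/* Standard PowerPC high exception vector base (MSR[IP]=1) and reset offset. */
#define HIGH_VECTOR_BASE 0xFFF00000u
#define RESET_VECTOR_OFF 0x00000100u

/* Hypothetical helper: translate an address inside the ancast mapping
 * to the MEM0 location that backs it. */
static uint32_t ancast_ea_to_mem0(uint32_t ea)
{
    assert(ea >= ANCAST_EA_BASE);
    return MEM0_PHYS_BASE + (ea - ANCAST_EA_BASE);
}
```

Calling ancast_ea_to_mem0(HIGH_VECTOR_BASE + RESET_VECTOR_OFF) gives the spot in MEM0 where cores 1 and 2 begin executing, which is where our bootstub had to be written.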
2: CAR and/or BCR Control Cache Coherency
With the ability to run whatever code I wanted on the secondary cores, I was now able to send them into the Linux CPU initialization code. However, there was a problem: despite having proof that they were running, my debugging code was saying the secondary cores were not coming up at all! Specifically, I had a variable in RAM that each core would write its ID* number to upon starting up, and I wrote code to print out the value of that variable after starting each core. My code told me that the value of the variable wasn’t changing, but when I looked in /dev/mem afterwards, I saw that its value was exactly what I expected it to be if the cores were starting up. I eventually figured out this was due to cache coherency problems: cores 1 and 2 were coming online and writing to that variable exactly as they were supposed to, but core 0 wasn’t seeing the change, either because the new value was stuck in the write cache of the secondary core, or because the read cache of the primary core wasn’t being flushed. Adding some dcbst instructions to the CPU init code and a cache flush to the debugging code seemed to fix the issue, but similar problems just kept cropping up.
Out of ideas and desperate for a solution, I once again turned to the fail0verflow article written by Marcan, and I noticed something: there were 2 unknown CPU registers mentioned we weren’t using: “CAR” and “BCR”. The article gave no indication of what they did, but I decided it couldn’t hurt to try initializing them, and so I wrote some code to do so using the same values Cafe OS did, as per the article. Once again, it worked! Magically, all of my cache coherency problems disappeared, and, after dealing** with some driver bugs, I was able to get the cores fully started and (somewhat)** working in Linux.
Conclusion: CAR and BCR, in some way, control the new cross-core cache coherency mechanisms in the Wii U CPU.
*By ID number I mean the core number, so core 1 would write “1” and core 2 would write “2”. I also had another debugging variable that told me how far into the process of starting up each core was getting, and this variable told me that the secondary cores were getting quite far in despite the kernel claiming they’d never started.
**What I’m referring to here was a bug in our IPI driver that caused the wrong interrupt to be triggered. Fixing that allowed the cores to come up successfully, but the IPIs still weren’t working and Quarky ended up having to completely redo the IPI implementation in order to make the system usable.
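For readers who want to picture the manual workaround described above (the dcbst on the secondary cores plus a cache flush in the debugging code), here is a rough sketch in C. It is not our actual code: the variable and function names are made up for illustration, and using dcbf for the reader-side flush is an assumption. On non-PowerPC compilers the cache operations compile to nothing, so the sketch still builds. It also doesn’t touch CAR or BCR, which were the real fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical debug variable shared between cores (illustrative name). */
static volatile uint32_t last_core_up;

/* Write the cache line containing p back to memory (PowerPC dcbst). */
static inline void dcache_store_line(const volatile void *p)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile("dcbst 0,%0\n\tsync" : : "r"(p) : "memory");
#else
    (void)p; /* no-op on other architectures */
#endif
}

/* Write back and invalidate the line containing p (PowerPC dcbf),
 * so that the next load has to fetch it from memory again. */
static inline void dcache_flush_line(const volatile void *p)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile("dcbf 0,%0\n\tsync" : : "r"(p) : "memory");
#else
    (void)p; /* no-op on other architectures */
#endif
}

/* Secondary core: record its core number and push it out to RAM. */
static void report_core_up(uint32_t core_id)
{
    assert(core_id >= 1 && core_id <= 2);
    last_core_up = core_id;
    dcache_store_line(&last_core_up);
}

/* Primary core: drop any stale cached copy before reading. */
static uint32_t read_last_core_up(void)
{
    dcache_flush_line(&last_core_up);
    return last_core_up;
}
```

Sprinkling calls like these around every shared variable is exactly the kind of whack-a-mole that kept cropping up, which is why getting the hardware to keep the caches coherent was the only workable answer.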
3: The Bit Order of the ICI Bits of SCR is Backwards.
This one was discovered by Quarky. As I said above, in order to get the system stable enough to use, she ended up having to completely redo the IPI implementation. While doing so, she learned about a much easier method of implementing IPIs than the one we had previously been using: SCR already contained bits for sending interrupts between cores!
WiiUBrew calls these “ICIs” instead of “IPIs”, but they’re the same thing*. By flipping one of these bits from off to on, you could send an interrupt to the corresponding core, which was exactly what we needed. However, while picking through Cafe OS to figure out exactly how these IPI bits worked, she discovered that the order of the bits listed on WiiUBrew was backwards. Bit 18 controls the IPI to core 2, not core 0, and bit 20 controls the IPI to core 0, not core 2.
Conclusion: The order of the ICI bits of SCR (bits 18-20) is backwards.
*ICI = Inter-core Interrupt. IPI = Inter-processor Interrupt. Linux always calls them IPIs.
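The corrected bit order can be written down as a tiny C helper. Treat this as a sketch under a few assumptions rather than the actual driver code: the function names are invented, bit 19 for core 1 is inferred from it being the middle of the 18-20 range, and the mask helper assumes bit 0 is the least significant bit, which may not match the numbering convention WiiUBrew uses.

```c
#include <assert.h>
#include <stdint.h>

#define WIIU_NUM_CORES    3
/* Per the post: bit 20 is core 0 and bit 18 is core 2. */
#define SCR_ICI_BIT_CORE0 20

/* Hypothetical helper: SCR bit index of the ICI (IPI) bit for a core. */
static unsigned scr_ici_bit(unsigned core)
{
    assert(core < WIIU_NUM_CORES);
    return SCR_ICI_BIT_CORE0 - core; /* 0 -> 20, 1 -> 19 (inferred), 2 -> 18 */
}

/* ASSUMPTION: LSB-0 bit numbering, i.e. bit n has the value 1 << n. */
static uint32_t scr_ici_mask(unsigned core)
{
    return (uint32_t)1 << scr_ici_bit(core);
}
```

With the order listed on WiiUBrew, the mapping would have been 18 + core instead, which sends the interrupt meant for core 0 to core 2 and vice versa.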
Hopefully you found this information useful or interesting.