- Jan 05, 2022
-
-
Geo Ster authored
* Finally we can start executing some kernel functions. Executing syscalls is very straightforward: just raise an exception of type 8 and the handler will take care of the rest. This is the first time the exception handler is used on the EE, so I'm a bit anxious about whether it will be bug free.
* Log each syscall and print its name instead of just a number. Known syscalls along with their ids are listed on ps2devwiki [1].

[1] https://playstationdev.wiki/ps2devwiki/index.php?title=Syscalls
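A minimal sketch of what raising exception type 8 could look like, assuming the standard MIPS COP0 layout (ExcCode in Cause bits 2-6, EXL in Status bit 1, BEV in Status bit 22) and the usual general exception vectors. The `Cop0` struct and `raise_exception` are hypothetical names, not the emulator's actual code.

```cpp
#include <cstdint>

// Hypothetical COP0 state holding only the registers touched here.
struct Cop0 {
    uint32_t status = 0;  // bit 1 = EXL, bit 22 = BEV (assumed standard layout)
    uint32_t cause = 0;   // bits 2-6 = ExcCode, bit 31 = BD
    uint32_t epc = 0;
};

constexpr uint32_t EXC_SYSCALL = 8;

// Records the exception in COP0 and returns the PC of the handler.
uint32_t raise_exception(Cop0& cop0, uint32_t pc, uint32_t exc_code, bool in_delay_slot) {
    // Replace the ExcCode field with the new exception type.
    cop0.cause = (cop0.cause & ~(0x1Fu << 2)) | (exc_code << 2);

    // Only save the return address if we are not already in an exception.
    if (!(cop0.status & 0x2u)) {
        cop0.epc = in_delay_slot ? pc - 4 : pc;
        if (in_delay_slot) cop0.cause |= 0x80000000u;
        else               cop0.cause &= 0x7FFFFFFFu;
    }
    cop0.status |= 0x2u;  // set EXL

    const bool bev = (cop0.status >> 22) & 1u;
    return bev ? 0xBFC00380u : 0x80000180u;
}
```

The SYSCALL instruction would then simply call `raise_exception(cop0, pc, EXC_SYSCALL, in_delay_slot)` and continue from the returned address.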
-
Geo Ster authored
* The COP0 code was hacky and didn't report unknown TLB instructions. This caused an infinite loop when the program was calling ERET to exit the current syscall without me knowing.
* Fix up the code a little and add the DI and ERET instructions
-
Geo Ster authored
Shrinks the log file size from 4GB to 2GB
-
- Jan 04, 2022
-
-
Geo Ster authored
* The old code was pretty complicated and bad. Reorganize the register structs and simplify the reading/writing process.
-
Geo Ster authored
* VU0 tries to access address 0x4210 which maps to the VU1 I/O registers. Currently we don't have VU1 support so abort when trying to write to these locations
-
Geo Ster authored
* The interrupt raised flags should be cleared when mode is written. In addition the raised flag should not get set unless the interrupt has actually been fired
-
Geo Ster authored
* Instead of ticking every component on every cycle, which is expensive, most emulators run each component for a fixed batch of cycles at a time until a frame is complete, and render that frame at the end. This decreases the accuracy by a tiny margin, but no PS2 game should be cycle dependent enough to notice.
* The timing information was sourced from my PS2 Slim with the hardware test that was uploaded in a recent commit.
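As a rough illustration of batch ticking, here is a sketch in which every component is advanced by a block of cycles instead of one cycle at a time. The `Counter` component, the `run_frame` function and the batch size are all assumptions for the example; the real cycle counts come from the hardware test mentioned above.

```cpp
#include <algorithm>
#include <cstdint>

// Stand-in for any component (CPU, timers, ...) that can run for n cycles.
struct Counter {
    uint64_t cycles = 0;
    void run(uint32_t n) { cycles += n; }
};

// Runs both components in batches until one frame's worth of cycles is done.
// Returns the number of cycles executed.
uint64_t run_frame(Counter& cpu, Counter& timers,
                   uint64_t cycles_per_frame, uint32_t batch) {
    uint64_t done = 0;
    while (done < cycles_per_frame) {
        // The last batch may be shorter so the frame length stays exact.
        const uint64_t remaining = cycles_per_frame - done;
        const uint32_t n = static_cast<uint32_t>(std::min<uint64_t>(batch, remaining));
        cpu.run(n);
        timers.run(n);
        done += n;
    }
    return done;  // the caller would render the frame here
}
```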
-
Geo Ster authored
* This register is pretty undocumented even though it's crucial for EE <-> IOP synchronization. I asked a PCSX2 dev about this and was linked to the code PCSX2 uses, so I will use it here as well.
-
Geo Ster authored
* From now on, sometimes I have to rely on hardware tests to accurately measure timings or other info the emulator needs. This test was originally written by refraction for use with DobieStation. I touched up the code a bit and wrote a small build script for anyone who wants to run it on their own console.
-
- Jan 03, 2022
-
-
Geo Ster authored
-
- Jan 02, 2022
-
-
Geo Ster authored
* Add support for a variety of new bit and memory shifting instructions. These include LDL/LDR/SDL/SDR/DSRL32/DSRAV etc.
* Fix a small bug in SRA where the value would only be written to the lower word instead of the dword.
* Finally the BIOS has finished the initialization phase! # Initialize Done.
* Now the IOP has entered an infinite loop waiting for the EE to wake it up. This will be added in the upcoming SIF implementation.
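To illustrate the SRA fix, here is a sketch of the 64-bit MIPS behaviour as I understand it: the low word is shifted arithmetically and the 32-bit result is sign-extended into the whole doubleword. The function name is hypothetical.

```cpp
#include <cstdint>

// SRA on a 64-bit register: shift the low 32 bits arithmetically, then
// sign-extend the result to 64 bits so the upper word is written too.
uint64_t sra(uint64_t rt, uint32_t sa) {
    const int32_t word = static_cast<int32_t>(static_cast<uint32_t>(rt));
    // Right shift of a negative value is arithmetic on GCC/Clang/MSVC
    // (and guaranteed to be since C++20).
    const int32_t shifted = word >> (sa & 31);
    return static_cast<uint64_t>(static_cast<int64_t>(shifted));
}
```

Writing only `shifted` into the lower word, as the old code did, would leave stale bits in the upper half of the register.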
-
Geo Ster authored
* There's not much to explain about COP1, it's just an FPU that has similar quirks to the VUs. Currently very few of its instructions are required.
* Also capture IOP console output by handling calls to the putc function. Code "borrowed" from DobieStation [1]

[1] https://github.com/PSI-Rockin/DobieStation/blob/40c9b01f50bc04debad809b3bad1cf51e7e7e495/src/core/emulator.cpp#L1201
-
Geo Ster authored
* For now it's only a basic class that handles reads/writes to the IPU.
-
Geo Ster authored
* Next up the BIOS starts initializing the Vector Interface (VIF), which is used to communicate with VU1 and VU0 (when it's in micro mode). It has its own register set, which is easy to implement as shown, and a FIFO similar to the GIF FIFO. The FIFO receives 128bit qwords sequentially. Each VIF "packet" starts with a 32bit header that tells the VIF which command to execute. Some headers are standalone, meaning that another command header will follow them, while for others some 32bit data words will be sent after (for example STROW/STCOL).
* In the future I will change this to be an actual FIFO structure, because some FIFOs in the PS2 can be blocked, meaning that they queue up written data until the component that blocked them lifts the block. For now though this will do.
* Also allow writing to code and data memory for VU0 to reduce the logspam a little

Sources: https://psi-rockin.github.io/ps2tek/#vif
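A small sketch of decoding a VIF command header, assuming the field layout given in the ps2tek VIF section (IMMEDIATE in bits 0-15, NUM in bits 16-23, CMD in bits 24-30, interrupt bit 31) and that STROW (0x30) and STCOL (0x31) are each followed by four data words. The struct and function names are made up for the example, and only those two commands are handled.

```cpp
#include <cstdint>

// Decoded 32-bit VIF command header (field positions are assumptions
// taken from ps2tek, not from the emulator source).
struct VifCode {
    uint16_t immediate;
    uint8_t num;
    uint8_t cmd;
    bool irq;
};

VifCode decode_vifcode(uint32_t word) {
    VifCode code{};
    code.immediate = static_cast<uint16_t>(word & 0xFFFF);
    code.num = static_cast<uint8_t>((word >> 16) & 0xFF);
    code.cmd = static_cast<uint8_t>((word >> 24) & 0x7F);
    code.irq = (word >> 31) != 0;
    return code;
}

// Number of 32-bit data words that follow the header.
// Sketch only: every command other than STROW/STCOL is treated as standalone.
uint32_t payload_words(uint8_t cmd) {
    switch (cmd) {
        case 0x30:  // STROW
        case 0x31:  // STCOL
            return 4;
        default:
            return 0;
    }
}
```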
-
- Jan 01, 2022
-
-
Geo Ster authored
* The BIOS uses them to initialize the VU0 registers. For now I don't check for overflow but I think it's going to become necessary in the near future.
-
Geo Ster authored
* Early on the IOP continuously reads from 0x1F402005, which is the N command status register [1], to know if it's ready or not. To be safe, let's report that the drive is ready by returning 0x40 (bit 6 set)

[1] https://psi-rockin.github.io/ps2tek/#cdvdioports
-
Geo Ster authored
* The VUs are custom made SIMD processors used to accelerate math operations on vectors and matrices. This doesn't seem that bad, but in reality they are probably the hardest piece of hardware on the PS2 to emulate correctly. That is for two reasons:
  1. Not much documentation
  2. Complex and confusing pipeline

  The first is pretty self explanatory. The second reason, the pipeline, is what makes them so hard. Normally, even in LLE emulators, we don't care about the internal pipeline of the chips, as it doesn't affect the result of the instructions themselves, it just makes them run faster. The CPU doesn't expose its pipeline to the target program. Some architectures are different. MIPS for example has branch delay slots, which in reality are a pipeline quirk. Generally the more pipeline quirks you expose to the program, the more complex it is to emulate correctly. The VUs basically expose their full pipeline... For now we only support a portion of the macro mode instruction set. The pipeline is going to come when the VU starts executing micro programs and when I figure out how it works...
-
- Dec 29, 2021
-
-
Geo Ster authored
* The BIOS now continues by initializing the DMA Controller. This is one of the most important hardware components of the PS2, as it assists the EE with transferring data to where it needs to be. I've even read that at times it can do more work than the EE itself.
* Since the DMAC isn't used at this stage, we only really have to implement its registers and reads/writes to them, which is pretty easy. However one register, D_CTRL, is a bit quirky in the sense that writes to it clear/reverse its bits instead of overwriting them.
* To emulate this, an additional struct is added to the register unions and bitwise operators are used to write to the upper and lower parts of the register appropriately. You can look into the source code for more details.
* This allows the EE to start initializing the VU1, which is quite exciting!
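The clear/reverse write can be sketched with plain bitwise operators. This assumes that writing 1 to a bit in the lower 16 bits clears it and writing 1 to a bit in the upper 16 bits toggles it; the exact split and the function name are assumptions, so check them against the source code and ps2tek.

```cpp
#include <cstdint>

// Applies a "write 1 to clear" rule to the lower half and a
// "write 1 to reverse" rule to the upper half of a 32-bit register.
uint32_t write_clear_reverse(uint32_t current, uint32_t written) {
    const uint32_t lower = (current & 0x0000FFFFu) & ~(written & 0x0000FFFFu);
    const uint32_t upper = (current & 0xFFFF0000u) ^ (written & 0xFFFF0000u);
    return upper | lower;
}
```

For example, with a current value of 0x00000003, writing 0x00010001 clears bit 0 and sets bit 16, giving 0x00010002.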
-
Geo Ster authored
-
Geo Ster authored
* Allows us to progress further into the initialization phase
-
Geo Ster authored
* Yeah, timers again, my favourite topic... To be frank, the EE timers are a bit simpler than the IOP timers as they have less complexity in their configuration. However, since the BIOS starts to use them at this point, we can't get away with an extremely partial implementation like on the IOP.
* The Emotion Engine has four hardware timers, each of them having three registers (four on Timer 0 and 1). They are practically the same as the IOP ones in that regard, having a count, a compare/target and a mode register. Timer 0 and 1 have an additional register, Tn_HOLD, which keeps track of the count value when a peripheral on the SBUS generates an interrupt.
* All the timers increment based on the bus clock, which is exactly half of the EE clock. The timers can also be configured to count based on external sources, namely hblank and vblank. These are less accurate but can be used to keep track of when the screen refreshes. I had hoped that we could ignore hblank for now, but the BIOS configures Timer 3 (used for BIOS alarms) to use it, so implementing it is necessary. The timings were taken from the timer header [1] of the ps2sdk.
* An interesting fact as well is that the interrupts are edge triggered, which means that an interrupt is sent to the EE when the raised flags in Tn_MODE switch from 0 to 1 [2]. This is easy to implement, so I did it now to avoid any headaches in the future.
* Since the EE ticks the timers directly, we can't increment the counters each time the function gets called. To properly emulate the timer frequency, an internal counter is used; when its value equals the ratio between the EE frequency and the timer clock, the real counter is incremented.
* This can be expensive since the timer function gets called every EE cycle, so we will probably change it to cycle adding in the future, especially once the JIT is implemented.

[1] https://github.com/ps2dev/ps2sdk/blob/master/ee/kernel/include/timer.h#L53
[2] https://psi-rockin.github.io/ps2tek/#eetimers
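The internal counter and the edge-triggered interrupt could look roughly like this. It is a sketch under a few assumptions: the timer runs off the bus clock (ratio 2), the count register is 16 bits wide, and the struct layout and names are invented for the example rather than taken from the emulator.

```cpp
#include <cstdint>

// Hypothetical timer state; not the emulator's actual register structs.
struct Timer {
    uint32_t count = 0;
    uint32_t compare = 0;
    uint32_t ratio = 2;     // EE cycles per timer tick (bus clock = EE clock / 2)
    uint32_t internal = 0;  // EE cycles seen since the last timer tick
    bool raised = false;    // compare flag in the mode register
};

// Called once per EE cycle. Returns true only when the compare flag goes
// from 0 to 1, which is the moment an interrupt should be sent.
bool tick(Timer& t) {
    if (++t.internal < t.ratio) return false;
    t.internal = 0;

    t.count = (t.count + 1) & 0xFFFFu;  // assumed 16-bit counter
    if (t.count != t.compare) return false;

    const bool edge = !t.raised;  // no interrupt if the flag was already set
    t.raised = true;
    return edge;
}
```

The flag stays set until software clears it, so reaching the compare value again does not fire a second interrupt in the meantime.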
-
- Dec 27, 2021
-
-
Geo Ster authored
-
Geo Ster authored
* Firstly, I fixed a small bug in the Handler that caused data loss on 128bit operations.
* The GIF is a marvellous and complicated little piece of hardware that handles transfers between the EE and the GS. It can be "fed" by three paths: PATH1 is from the VPU1 memory, PATH2 is from the VPU1 FIFO and PATH3 is directly from the main bus. Since we don't have any VUs implemented we only care about PATH3 at this stage.
* Each primitive sent has the form of a linked list. The EE first sends a 128bit GIFTag that acts as the header and tells the GIF how much more data to expect and what to do with it. The loop ends when the EE sends a GIFTag with the EOP field set to 1. (EE User's Manual [150])
* Each data packet after a GIFTag can be processed in three different ways depending on the FLG field of the tag: PACKED, REGLIST or IMAGE mode. For now we only care about PACKED.
* When in PACKED mode, the EE will send NREG * NLOOP (specified in the GIFTag) qwords after the tag. Each qword can be processed in different ways depending on the descriptor in the REG field of the GIFTag. Page 152 of the EE User's Manual shows the different modes. The REG field though is in reality a bit array of 4-bit descriptors. To understand this better, here are the processing steps:
  1. The first qword after the GIFTag is processed based on the least significant bits of the REG field (64:67, the first descriptor) and is output
  2. The second qword is processed based on the next descriptor (68:71, the second descriptor) and is output
  3. This continues with the following descriptors until NREG qwords have been processed
  4. Steps 1-3 are repeated NLOOP times

There are more variables we have to take into account with PATH3, because it can also be masked by other PATHs which have higher priority. But that is for later. Don't worry though if you didn't get it completely. The GIF is nowhere near finished, so I will have more chances to explain how it works. For more info you can read the GIF chapter of the provided EE User's Manual.
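The steps above can be sketched as follows. The field positions (NLOOP in bits 0-14, EOP in bit 15, FLG in bits 58-59, NREG in bits 60-63, descriptors in bits 64-127) and the rule that NREG = 0 means 16 are taken from my reading of the EE User's Manual and should be treated as assumptions; the tag is passed as two 64-bit halves and all names are hypothetical.

```cpp
#include <cstdint>

// Decoded GIFTag header (only the fields needed for PACKED mode).
struct GifTag {
    uint32_t nloop;
    bool eop;
    uint32_t flg;   // 0 = PACKED, 1 = REGLIST, 2 = IMAGE
    uint32_t nreg;
    uint64_t regs;  // sixteen 4-bit descriptors (tag bits 64-127)
};

GifTag decode_giftag(uint64_t lo, uint64_t hi) {
    GifTag tag{};
    tag.nloop = static_cast<uint32_t>(lo & 0x7FFF);
    tag.eop = ((lo >> 15) & 1) != 0;
    tag.flg = static_cast<uint32_t>((lo >> 58) & 0x3);
    tag.nreg = static_cast<uint32_t>((lo >> 60) & 0xF);
    if (tag.nreg == 0) tag.nreg = 16;  // assumed: 0 encodes 16 descriptors
    tag.regs = hi;
    return tag;
}

// Total number of qwords that follow the tag in PACKED mode.
uint32_t packed_qwords(const GifTag& tag) { return tag.nreg * tag.nloop; }

// Descriptor that applies to the n-th qword after the tag (n starts at 0).
// The descriptors are walked in order and wrap around every NREG qwords.
uint32_t descriptor_for(const GifTag& tag, uint32_t n) {
    const uint32_t index = n % tag.nreg;
    return static_cast<uint32_t>((tag.regs >> (index * 4)) & 0xF);
}
```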
-
- Dec 26, 2021
-
-
Geo Ster authored
-
Geo Ster authored
* Since the components will never give pages directly, let them use addresses instead and compute the page in the register function to save some work on the component side.
-
Geo Ster authored
* Initially the LQ/SQ instructions were implemented to perform two sequential 64bit operations to emulate 128bit reads/writes. However this won't work well for us, especially when writing to the GIF FIFO. To mitigate this we can use the __int128 gcc extension (yay for switching to clang once again!), which provides us with an optimized way of storing 128bit data.
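As an illustration of the 128bit path, here is a sketch of quadword loads and stores built on `unsigned __int128`, which is available on GCC and Clang for 64-bit targets. It assumes LQ/SQ ignore the low four address bits; the helper names and the flat memory buffer are made up for the example.

```cpp
#include <cstdint>
#include <cstring>

using u128 = unsigned __int128;  // GCC/Clang extension

// Builds a quadword from its two 64-bit halves.
u128 make_qword(uint64_t lo, uint64_t hi) {
    return (static_cast<u128>(hi) << 64) | lo;
}

// SQ: a single 16-byte store. The low 4 address bits are masked off
// (assumed behaviour of the quadword instructions).
void store_qword(uint8_t* mem, uint32_t addr, u128 value) {
    std::memcpy(mem + (addr & ~0xFu), &value, sizeof(value));
}

// LQ: a single 16-byte load from the aligned address.
u128 load_qword(const uint8_t* mem, uint32_t addr) {
    u128 value;
    std::memcpy(&value, mem + (addr & ~0xFu), sizeof(value));
    return value;
}
```

A FIFO write handler then receives the whole quadword in one call instead of two unrelated 64bit halves.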
-
Geo Ster authored
* Until now the memory system didn't take into account the bit width of the data coming in and out of the handlers. Instead I assumed that 64bits would be large enough for everything. But alas I was wrong. Some addresses (notably the GIF/IPU FIFOs) are read/written with 128bit values. I don't want to force every function to return __int128 types as that would cripple performance, so some tweaks were needed. This isn't as hard as it might sound. The emulator read/write functions are templates, so we know which type we want beforehand. So it's as simple as abstracting the Handler with a bit more inheritance magic and we can cast HandlerBase to the type we want.
-
Geo Ster authored
Currently there's nothing really of note, it's just an empty class that handles reads/writes to some of the registers. The functionality will be explained in subsequent commits. Along with this I've added a new document, on which the GIF implementation will be based.
-
Geo Ster authored
* The handler table is dynamically allocated but the memory never gets deallocated. Plug the leak by freeing the memory in the destructor ;)
-
Geo Ster authored
They cause too much logspam for now and we don't use them
-
Geo Ster authored
* Remove cycle argument, we don't need it as we tick the timers each IOP cycle
* Make the code a bit cleaner
-
- Dec 25, 2021
-
-
Geo Ster authored
This commit fixes some issues preventing IOP interrupts from working correctly while also separating them into a separate class for convenience.
* Previously the pending flag was written to the first bit of cause.IP, which while correct was flawed. To understand why, let's look at how interrupts get triggered. COP0 has two 8-bit masks, IP (cause) and Im (status). On both of these registers the first 2 bits are ignored because they are used for software interrupts, which are unsupported on the IOP. However, while Im was including these unused bits, IP was not, thus causing mistaken comparisons. Below is a diagram that shows the issue. IP was bits 10-15 while Im was bits 8-15. Comparing different ranges like this doesn't work.

  Cause:  ... 00|111111| ...
  Status: ... |00111111| ...

  The fix was to make IP point to the 8-15 range and adjust the writing mechanism in the INTR::interrupt_pending function.
* In addition, the usage of >= instead of == in the timers caused a bug where the timer would continuously send interrupts after reaching the target, which is not the intended behaviour. Fix that as well.
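A sketch of the corrected comparison, with both masks taken from bits 8-15 so that they line up. It assumes the usual R3000A layout (interrupt enable in Status bit 0, the interrupt controller wired to Cause bit 10); the function names are hypothetical and not the actual INTR class.

```cpp
#include <cstdint>

// Marks the hardware interrupt line as pending or not in Cause.
// Bit 10 is IP[2] when IP covers bits 8-15 (bits 8-9 are the software ones).
uint32_t set_pending(uint32_t cause, bool pending) {
    constexpr uint32_t line = 1u << 10;
    return pending ? (cause | line) : (cause & ~line);
}

// True if the CPU should take an interrupt exception.
bool interrupt_ready(uint32_t status, uint32_t cause) {
    const uint32_t im = (status >> 8) & 0xFFu;  // Im, bits 8-15 of Status
    const uint32_t ip = (cause >> 8) & 0xFFu;   // IP, bits 8-15 of Cause
    const bool enabled = (status & 1u) != 0;    // global interrupt enable
    return enabled && (im & ip) != 0;
}
```

With IP extracted from bits 10-15 instead, the pending bit would sit at position 0 of the mask and be compared against one of the software interrupt bits of Im.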
-
Geo Ster authored
This is a pretty big commit, so the description is probably going to be a whole essay again explaining all the changes. Emulation is extremely complicated and thus I need to explain all of my reasoning and sources. This commit contains 3 major changes that all work together to form the new memory subsystem:
* New handler infrastructure
* Compiler switch to clang-cl
* Initial implementation of EE interrupts

Now you, reader, might wonder why I decided to redo the relatively simple and straightforward system we had before. Well, that system had some drawbacks that I think needed to be addressed early on. Firstly, it is highly centralized, which means that for every new component the read/write functions of the ComponentManger (now Emulator) need to be updated. This isn't as big of an issue as the second one though. The old system relies heavily on branches to figure out the destination of a read/write, which is bad for performance. Especially because our address ranges aren't contiguous, the compiler can't optimize the switch statement in any way. This leads to a lot of assembly code with many jumps.

The initial idea for this new system was taken from a PCSX2 devblog I read recently: https://pcsx2.net/developer-blog/218-so-maybe-it-s-about-time-we-explained-vtlb.html

It explains a system where the address range is divided into pages and each page is handled by a handler function. This is perfect for us, because it moves most of the work to the initialization phase (when the components register their handlers), while reads/writes are very fast, only having to look up the handler table and call the appropriate function. However, it isn't that easy to implement. The main problem was how to store member functions of different classes in a single array and call them without knowing their type. At first I thought of using std::function, which is perfect for this due to its type erasure, but it was quickly ruled out because of the very high overhead.
Next, I considered inheritance and virtual functions, which was a step in the right direction. However that also has the overhead of looking up the vtable. Finally, though, I discovered a neat little trick with function pointers. You can actually cast a pointer to a derived class member function to a pointer to a base class member function, as long as the function isn't ambiguous. So the final solution was to make all the components inherit from an empty (for now) Component class and store a common Component function pointer. The compiler will handle the rest, with some dose of magic and inheritance! The handler interface is located in the common/component.h file. You can check out the IOP DMA controller constructor for how a component can register handlers with this system.

This is very efficient, generating only 10-15 lines of assembly (with clang 12.0), which leads me to the second change, that of the compiler. The switch to clang-cl was made primarily for performance reasons. clang generates much more efficient code than MSVC does, so the switch will improve performance down the road. It also catches more warnings and code issues, allowing for cleaner code overall.

The next hurdle was figuring out the handler page size. This is more difficult than it seems, because there are additional "hidden" addresses the BIOS writes to, which aren't listed in the ps2tek memory map. Making the page size too big will lead to these garbage addresses being handled by our components, which defeats the purpose of this whole system. Making the page size too small, though, will both make the handler table massive and require components to register many handlers to cover their address ranges. So after studying the memory map for a while, I decided that 0x80 = 128 is the best size. For example in the DMAC (EE DMA) each channel takes up exactly 0x80, while in the IOP DMA each channel group is also exactly 0x80 in size. 0x80 is, in addition, small enough that garbage addresses don't get caught.
Even in case we have something like that, I have placed asserts on debug builds to capture them.

Our struggle isn't done though! The initial handler table ended up causing stack overflows because the array was too large. To mitigate this, the stack size was increased to 10MB and a small optimization was implemented. If you view all the addresses in the memory map of the PS2, a pattern emerges. It turns out that one hex digit inside the address is always zero, no matter the address (except for 0xfffe addresses which we don't care about). This means we can "squash" the address by removing that digit, allowing us to significantly reduce the handler table size:

0x100|0|3070 -> 0x1003070
0x120|0|0060 -> 0x1200060
0x1F4|0|2006 -> 0x1F42006
0x1F8|0|1120 -> 0x1F81120
0x1F9|0|01AC -> 0x1F901AC

This is implemented in the Emulator::calculate_page function. A debug assert is also placed here to ensure nothing out of the ordinary happens.

Finally, I also implemented EE interrupts because they are needed at this stage. Timer 5 should normally be ticking now (next commit I promise) and is waiting to cause an interrupt, thus we need to have those implemented. The implementation is taken from a new document I found, which is the same as the previous one, but more focused on the EE and its features, something that should help us a lot in the near future. Right now it's not finished, but that will come in the next commit.
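To make the handler table and the address squashing a bit more concrete, here is a minimal sketch. It is not the code from common/component.h: the `Bus`, `Handler` and `Dmac` types and their members are invented for illustration, only reads are covered, and physical addresses are assumed to stay below 0x20000000.

```cpp
#include <cstdint>
#include <vector>

struct Component {};  // empty common base class

// One pointer type that can refer to a read function of any component.
using Reader = uint32_t (Component::*)(uint32_t);

struct Handler {
    Component* owner = nullptr;
    Reader read = nullptr;
};

constexpr uint32_t kPageSize = 0x80;

// Removes the always-zero hex digit (bits 16-19): 0x1F402006 -> 0x1F42006.
constexpr uint32_t squash(uint32_t addr) {
    return ((addr >> 20) << 16) | (addr & 0xFFFFu);
}

constexpr uint32_t calculate_page(uint32_t addr) {
    return squash(addr) / kPageSize;
}

struct Bus {
    // Squashed addresses fit in 25 bits, so 0x2000000 / 0x80 pages suffice.
    // Heap allocated to avoid the stack overflow mentioned above.
    std::vector<Handler> table = std::vector<Handler>(0x2000000 / kPageSize);

    // Registers a member function of any Component subclass for one page.
    template <typename T>
    void add_reader(uint32_t addr, T* owner, uint32_t (T::*fn)(uint32_t)) {
        table[calculate_page(addr)] = {owner, static_cast<Reader>(fn)};
    }

    // Looks up the page and calls through the stored member pointer.
    uint32_t read(uint32_t addr) {
        const Handler& h = table[calculate_page(addr)];
        return h.owner ? (h.owner->*h.read)(addr) : 0;
    }
};

// Hypothetical component covering exactly one page of registers.
struct Dmac : Component {
    uint32_t regs[kPageSize / 4] = {};
    uint32_t read(uint32_t addr) { return regs[(addr & (kPageSize - 1)) >> 2]; }
};
```

Registration happens once, for example `bus.add_reader(0x10008000, &dmac, &Dmac::read)`, after which every read is an index computation plus one indirect call.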
-
- Dec 12, 2021
-
-
Geo Ster authored
Wtf, why did I miss this?
-
Geo Ster authored
* After a while the IOP starts setting up timer 5, so we need to start implementing timer support. Timers are pretty simple actually. Each one has 3 32bit registers (we use 64bit registers to check for overflow): a count register that counts the number of cycles, the mode register which configures the timer and the target register which generates an interrupt when count == target. For now that's all we need.
* In addition, interrupts are also implemented. These are a bit more complicated since they involve COP0, but not too difficult either. The IOP has 3 registers, I_MASK, I_STAT and I_CTRL. I_CTRL acts as a global enable/disable so it's pretty simple. I_STAT is a bit mask that states which interrupts are pending. I_MASK on the other hand has the ability to enable/disable specific interrupts. So to check if the interrupt will be executed we must do !I_CTRL && (I_MASK & I_STAT). All info can be found on ps2tek/nocash psx
* On the EE side, a few new instructions are added to progress further. Now the EE starts setting up the GIF, which is quite exciting!
* I think it's a good time to also elaborate on how we read structs. Instead of using switch statements I prefer pointers and structs because these generate a lot more compact code, even with compiler optimizations, and eliminate the need for branches. This should not be of concern in normal applications but we are special ;). Most registers on the console are 32bit, so structs are cast to uint32_t* to access them. And since the offsets are always in bytes we must divide them by sizeof(uint32_t) = 4 (I prefer >> 2 since it's more efficient)
* Some registers however are peculiar in the sense that parts of them are located in completely different address ranges. This is bad because, for example, timer 0 and timer 3 have the same offset of 0 from their relative address ranges. To fix this, we introduce a variable called group that records which "group" the write/read is referring to, with some simple bit masks. The result is cast to bool, which converts it to 0 or 1. Then the expression "offset + group * <number>" is used to access registers. In the timer example, accessing timer 0 will give group 0 and offset 0, so timer 0 will be accessed. With timer 3 though group will be 1, so 0 + 1 * 3 = 3 will be accessed. This is a convenient way to bypass branches.
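The group trick can be sketched like this. The address ranges (timers 0-2 from 0x1F801100, timers 3-5 from 0x1F801480, 0x10 bytes per timer) come from my reading of ps2tek/nocash and are assumptions, as are all the names. To stay within defined C++ behaviour the sketch indexes a plain word array with named indices in place of the struct-to-uint32_t* cast described above; the indexing arithmetic is the same.

```cpp
#include <array>
#include <cstdint>

// Word index of each register inside one timer's 0x10-byte block.
enum TimerReg : uint32_t { COUNT = 0, MODE = 1, TARGET = 2 };

struct Timers {
    // 6 timers, 4 words each (the 4th word is padding up to 0x10 bytes).
    std::array<std::array<uint32_t, 4>, 6> regs{};
};

// Branch-free mapping from a bus address to a register slot.
uint32_t& lookup(Timers& t, uint32_t addr) {
    const uint32_t group = static_cast<bool>(addr & 0x400u);  // 0 or 1
    const uint32_t timer = ((addr >> 4) & 0x3u) + group * 3;  // offset + group * 3
    const uint32_t reg = (addr & 0xFu) >> 2;                  // byte offset / 4
    return t.regs[timer][reg];
}

uint32_t read(Timers& t, uint32_t addr) { return lookup(t, addr); }
void write(Timers& t, uint32_t addr, uint32_t value) { lookup(t, addr) = value; }
```

With these ranges, 0x1F801100 resolves to timer 0 and 0x1F801480 to timer 3 even though both have offset 0 inside their own range.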
-
- Dec 10, 2021
-
-
Geo Ster authored
* The DMA routine on the IOP works similarly to the PSX version with a few additions. There are 7 channels from the PSX and an additional 6 new PS2 exclusive ones. On the PSX, each channel has 3 registers used to configure and use it, and there are 3 global registers.
* The PS2 contains all the older DMA registers, but it adds 6 more channels and duplicates the global registers (DPCR now has a counterpart called DPCR2). This is done because each global register can control up to 7 channels. An additional register on each channel (tadr) and 2 additional global registers have been added as well. For now we don't really care to implement them, only read/write to them.
* For reading and writing to the registers, structs are used to prevent the usage of switch and if statements.
-
- Dec 04, 2021
-
-
Geo Ster authored
* Due to load delay slots the target register doesn't get written immediately, so use the value instead to correctly display the loaded value in the logs
-
- Dec 02, 2021
-
-
Geo Ster authored
* Nothing to really say here, just looks better imo
-
- Dec 01, 2021
-
-
Geo Ster authored
* On reads/writes it is important to check the address alignment before proceeding with the operation. However, alignment errors almost never happen in real world games, so let the compiler know that these branches are unlikely to be taken to speed the common path up a bit.
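One way to give the compiler that hint on GCC and Clang is `__builtin_expect`; C++20 also offers the `[[unlikely]]` attribute. The sketch below is an assumption about how such a check might be written, with a made-up `read_checked` helper that reports the error through its return value.

```cpp
#include <cstdint>
#include <cstring>

// GCC/Clang builtin: tells the optimizer the condition is almost always false.
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Reads a T from emulated memory. Returns false on a misaligned address,
// where a real core would raise an address error exception instead.
template <typename T>
bool read_checked(const uint8_t* mem, uint32_t addr, T& out) {
    if (UNLIKELY((addr & (sizeof(T) - 1)) != 0)) {
        return false;  // cold path
    }
    std::memcpy(&out, mem + addr, sizeof(T));
    return true;
}
```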
-
- Nov 30, 2021
-
-
Geo Ster authored
* So after a week, it's finally here! The initial implementation of the IOP has been added to the emulator. You might wonder why it took so long. This was mostly because I wanted to make the implementation as complete as possible and also test it to ensure it's bug free. It is actually based on the MIPS R3000A interpreter I wrote last year for my PS1 emulator. So did I just copy the code and call it a day? Hell no, the code in that ancient project is awful, even if it works. I completely rewrote the interpreter using our modern techniques of storing state. Rewriting the old code also allowed me to test that it actually worked in that environment and could boot PSX games.
* Due to this, the implementation is a bit more complete than the EE as it includes interrupt support. In addition we have to account for the fact that the IOP runs at 36.864MHz, in contrast to the EE which clocks at 295MHz. This maps approximately to a 1/8 ratio, which means that 1 IOP instruction will run every 8 EE cycles. The current implementation of this is hacky and a bit inaccurate because some EE instructions can take more than 1 cycle to execute, but it's good enough for now (Play! assumes this as well and can boot 40%+ of games).
* Because both CPU emulators share a lot of naming conventions, each processor has been separated into its own namespace to avoid confusion, so we always know which CPU we are referring to. Finally, reads/writes other than those to the BIOS and IOP RAM haven't been implemented yet but will come soon.
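The 1/8 ratio can be sketched as a simple cycle counter. This mirrors the simplification described above (every EE instruction is counted as one cycle); the `Scheduler` struct and its counters are stand-ins for the two interpreters and not the emulator's actual code.

```cpp
#include <cstdint>

// Stand-in for the main loop state: counts how often each CPU was stepped.
struct Scheduler {
    uint32_t ee_cycles = 0;  // EE cycles since the last IOP step
    uint64_t ee_steps = 0;
    uint64_t iop_steps = 0;
};

constexpr uint32_t kEeCyclesPerIopCycle = 8;  // approx. 295 MHz / 36.864 MHz

// Executes one EE instruction and, every 8th call, one IOP instruction.
void step(Scheduler& s) {
    s.ee_steps++;  // the EE interpreter would execute one instruction here
    if (++s.ee_cycles == kEeCyclesPerIopCycle) {
        s.ee_cycles = 0;
        s.iop_steps++;  // the IOP interpreter would execute one instruction here
    }
}
```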
-