- Mar 26, 2022
Geo Ster authored
* Using Vulkan lets us be a lot more efficient than with OpenGL, giving an impressive 10-20 FPS performance boost. This is achieved by doing away with the redundant vertex buffer copy operations and submitting just a single command buffer. The depth comparison function is changed through the new dynamic states introduced in Vulkan 1.3.
Geo Ster authored
- Mar 16, 2022
- Mar 14, 2022
Geo Ster authored
* It currently doesn't work properly due to class-related stack quirks. I will probably separate the interpreter functions.
- Mar 12, 2022
Geo Ster authored
- Mar 11, 2022
Geo Ster authored
- Mar 10, 2022
Geo Ster authored
- Feb 13, 2022
Geo Ster authored
* This is something I had wanted to do for a very long time, but I held off due to lack of experience and for practical reasons, since older systems don't really see any performance benefit from a JIT. This is not the case with the PS2.
* The JIT will use the asmjit library for code generation, as it handles all the system calling conventions and lets us focus on the code generation itself. For now the JIT is in the prototyping phase.
* The general architecture will be: IRBuilder -> Passes -> CodeGenerator. I have some notes and I hope I can explain more things as I go.
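To make the IRBuilder -> Passes -> CodeGenerator split more concrete, here is a minimal C++ sketch. Everything in it is hypothetical: the `Op`/`IRInstr` types, the `constant_fold` pass and the `generate` function are illustrative names, not the emulator's classes, and the "code generator" is a plain interpreter closure standing in for asmjit, whose API is not used here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical IR, for illustration only.
enum class Op { LoadImm, Add, Store };

struct IRInstr {
    Op op;
    int dst;            // virtual register (or guest register index for Store)
    int a = -1, b = -1; // source virtual registers
    int64_t imm = 0;    // immediate for LoadImm
};

// Stage 1: the builder collects IR while a guest block is decoded.
class IRBuilder {
public:
    int load_imm(int64_t value) { code.push_back({Op::LoadImm, next, -1, -1, value}); return next++; }
    int add(int a, int b) { code.push_back({Op::Add, next, a, b, 0}); return next++; }
    void store(int guest_reg, int src) { code.push_back({Op::Store, guest_reg, src, -1, 0}); }
    int vreg_count() const { return next; }
    std::vector<IRInstr> code;
private:
    int next = 0;
};

// Stage 2: an example pass that folds an Add of two known constants into a LoadImm.
void constant_fold(std::vector<IRInstr>& code) {
    std::unordered_map<int, int64_t> constants;
    for (IRInstr& in : code) {
        if (in.op == Op::LoadImm) {
            constants[in.dst] = in.imm;
        } else if (in.op == Op::Add) {
            auto a = constants.find(in.a), b = constants.find(in.b);
            if (a != constants.end() && b != constants.end()) {
                int64_t sum = a->second + b->second;
                in = IRInstr{Op::LoadImm, in.dst, -1, -1, sum};
                constants[in.dst] = sum;
            }
        }
    }
}

// Stage 3: stand-in "code generator" that returns a callable block.
// A real backend would emit host code with asmjit here instead.
using CompiledBlock = std::function<void(int64_t* guest_regs)>;

CompiledBlock generate(const std::vector<IRInstr>& code, int num_vregs) {
    return [code, num_vregs](int64_t* guest_regs) {
        std::vector<int64_t> v(num_vregs);
        for (const IRInstr& in : code) {
            switch (in.op) {
            case Op::LoadImm: v[in.dst] = in.imm; break;
            case Op::Add:     v[in.dst] = v[in.a] + v[in.b]; break;
            case Op::Store:   guest_regs[in.dst] = v[in.a]; break;
            }
        }
    };
}
```

In a fuller pipeline, a dead-code pass would follow constant folding, since the two now-unused `LoadImm`s stay in the IR after folding.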
- Feb 03, 2022
Geo Ster authored
* With this, the logo texture no longer overlaps and destroys the font.
Geo Ster authored
* Depth testing in the GS is controlled by the TEST_1/TEST_2 registers. The problem is that nothing prevents the program from changing these registers mid-frame, meaning we have to flush our renderer and rebuild the VBO from scratch.
* In addition, enable OpenGL depth testing and configure it properly based on the TEST configuration. It's also important to give the depth buffer a default value of 0 rather than 1, otherwise GL_GREATER tests will always fail.
* With these changes the triangles render with proper depth. Texture transfers also need to be implemented to complete the demo. This is more complex than it sounds for multiple reasons that will hopefully be explained in the following commits. Currently the GS code is extremely hacky and unpleasant, but I will clean it up soon, I promise.
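A minimal sketch of the flush-on-TEST-change idea, assuming the ZTE (bit 16) and ZTST (bits 17-18) field layout of the TEST register from the GS User's Manual. `Renderer`, `DepthState` and the vertex layout are hypothetical names; the actual GL calls are only indicated in a comment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Depth comparisons selected by TEST.ZTST.
enum class DepthFunc { Never, Always, GreaterEqual, Greater };

struct DepthState {
    bool enabled;   // TEST.ZTE, bit 16
    DepthFunc func; // TEST.ZTST, bits 17-18
    bool operator==(const DepthState& o) const { return enabled == o.enabled && func == o.func; }
};

inline DepthState decode_depth(uint64_t test) {
    DepthState s;
    s.enabled = (test >> 16) & 1;
    s.func = static_cast<DepthFunc>((test >> 17) & 3);
    return s;
}

struct Vertex { float x, y, z; };

// Hypothetical batching renderer: vertices accumulate in a draw vector and are
// flushed whenever a TEST write changes the depth state mid-frame.
class Renderer {
public:
    void write_test(uint64_t value) {
        DepthState next = decode_depth(value);
        if (!(next == state)) {
            flush(); // draw everything queued with the old depth state first
            state = next;
        }
    }
    void push(const Vertex& v) { draw.push_back(v); }
    void flush() {
        if (draw.empty()) return;
        // A real backend would upload `draw` and issue the draw call here, using
        // glDepthFunc(GL_GEQUAL or GL_GREATER) against a depth buffer cleared to 0.
        ++flushes;
        draw.clear();
    }
    DepthState state{false, DepthFunc::Always};
    std::vector<Vertex> draw;
    int flushes = 0;
};
```

Writes that leave the depth state unchanged do not break the batch, so only real mid-frame changes pay for a flush.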
- Feb 02, 2022
Geo Ster authored
* This commit removes the hacky VBO rendering and now uses a draw vector to keep the vertices. This might change soon though, as I haven't decided what the final implementation will look like. Handling depth is very tricky with the GS, as it might change mid-frame, forcing a VBO flush.
* Now the stars render properly, albeit with many afterimages due to the lack of screen clearing.
- Jan 31, 2022
Geo Ster authored
* To test the emulator, I decided it's a good time to start booting some homebrew before the BIOS. This commit implements an extremely hacky OpenGL renderer that can only render a quad for now. To be expanded...
- Jan 30, 2022
Geo Ster authored
Geo Ster authored
* This allows us to finally pass the basic VIF test from ps2autotests, yay!
Geo Ster authored
* Accessing the VU memory is more complex than it seems. Both VUs have their memory mapped to the main bus between 0x11000000 and 0x11010000. However, the distinction between code and data once again makes this more complicated. Inside the VU, code and data are considered different memory spaces, so their address spaces are also different. This means that an address of 0x0 can refer to either, depending on the instruction making the access.
* Since the accessing instruction already tells us ahead of time which space is meant, we can use templates to make the compiler do the work for us, and only need a small branch when the EE writes something directly, which is pretty rare.
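A small C++ sketch of the compile-time split, assuming the usual EE mapping of VU0 micro/data memory at 0x11000000/0x11004000 and VU1 micro/data memory at 0x11008000/0x1100C000. The `VectorUnit` type and `ee_write_vu` function are hypothetical, and mirroring addresses within each 16 KB window (by masking with the memory size) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class Space { Code, Data };

// Hypothetical VU model: code (micro) and data memory are separate arrays,
// so address 0 means something different in each space.
struct VectorUnit {
    std::vector<uint8_t> micro, data;
    VectorUnit(size_t micro_size, size_t data_size) : micro(micro_size), data(data_size) {}

    // The space is a template parameter: VU instructions know it at compile time.
    // Sizes are powers of two and accesses are assumed naturally aligned.
    template <Space S, typename T>
    T read(uint32_t addr) const {
        const auto& mem = (S == Space::Code) ? micro : data;
        T value;
        std::memcpy(&value, &mem[addr & (mem.size() - 1)], sizeof(T));
        return value;
    }
    template <Space S, typename T>
    void write(uint32_t addr, T value) {
        auto& mem = (S == Space::Code) ? micro : data;
        std::memcpy(&mem[addr & (mem.size() - 1)], &value, sizeof(T));
    }
};

// The EE sees both VUs through 16 KB windows, so only this path needs a runtime branch.
template <typename T>
void ee_write_vu(VectorUnit& vu0, VectorUnit& vu1, uint32_t paddr, T value) {
    uint32_t offset = paddr - 0x11000000u;
    uint32_t local = offset & 0x3FFF;
    switch (offset >> 14) {
    case 0: vu0.write<Space::Code>(local, value); break; // 0x11000000: VU0 micro
    case 1: vu0.write<Space::Data>(local, value); break; // 0x11004000: VU0 data
    case 2: vu1.write<Space::Code>(local, value); break; // 0x11008000: VU1 micro
    case 3: vu1.write<Space::Data>(local, value); break; // 0x1100C000: VU1 data
    default: assert(false && "not a VU memory address");
    }
}
```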
Geo Ster authored
* The GIF, similarly to the VIF, contains a 16 qword FIFO for queueing data. The implementation is very similar to the VIF, and I used similar function names to make it easier to follow.
* In addition, introduce proper reset routines. Messing with the this pointer is not possible in our case, because it breaks the Handler infrastructure, which relies on the pointer not changing.
Geo Ster authored
* Also fix a bug in the Queue::size function that led to incorrect values, and adjust the FIFO size to 16 qwords. I thought the size was 64, but it turns out this isn't the case [1].
[1] EE User's Manual [144]
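For illustration, a fixed-capacity ring buffer can keep an explicit element count, which sidesteps the classic wrap-around mistake of computing the size as `tail - head`. The class below is a hypothetical sketch, not the emulator's Queue; `push` returns false when full so that the caller (for example a DMA channel) can stall instead of growing the buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical static FIFO: capacity N is fixed at compile time, like the hardware FIFOs.
template <typename T, size_t N>
class Queue {
public:
    bool push(const T& value) {
        if (full()) return false; // caller stalls until the consumer drains
        buffer[tail] = value;
        tail = (tail + 1) % N;
        ++count;
        return true;
    }
    bool pop(T& out) {
        if (empty()) return false;
        out = buffer[head];
        head = (head + 1) % N;
        --count;
        return true;
    }
    // Explicit count: correct even after head/tail wrap around.
    size_t size() const { return count; }
    bool empty() const { return count == 0; }
    bool full() const { return count == N; }
    static constexpr size_t capacity() { return N; }
private:
    std::array<T, N> buffer{};
    size_t head = 0, tail = 0, count = 0;
};
```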
Geo Ster authored
* Going this long without running any tests is like walking on hot coals in emulation development. So I wrote a little ELF loader that loads the ELF file into RAM and executes it once the BIOS has fully loaded, because most tests require some basic syscalls to be available.
* Using the EE tests, the following bugs were fixed:
1. Fix branch delays in BEQ/JALR instructions (NOTE: That test wasn't using valid MIPS code, but it's nice to know we are accurate to what the hardware does)
2. DIV/DIVU support for dividing by zero
3. Added missing sign extension to ADDU
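The DIV/DIVU and ADDU fixes can be sketched as below. The divide-by-zero results (LO = -1 for a non-negative dividend and 1 for a negative one, HI = dividend; LO = 0xFFFFFFFF for DIVU) and the 0x80000000 / -1 case are the values commonly used by PS2 emulators; treat them as assumptions rather than a quote of this commit's code. All 32-bit results are sign-extended into the 64-bit registers, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct HiLo { uint64_t lo, hi; };

// Sign-extend a 32-bit result into a 64-bit register.
inline uint64_t sext32(uint32_t v) { return (uint64_t)(int64_t)(int32_t)v; }

inline HiLo div32(uint32_t rs, uint32_t rt) {
    int32_t n = (int32_t)rs, d = (int32_t)rt;
    if (d == 0) // no trap: LO depends on the dividend's sign, HI = dividend
        return { sext32(n >= 0 ? 0xFFFFFFFFu : 1u), sext32(rs) };
    if (n == INT32_MIN && d == -1) // overflow case that is undefined in C++
        return { sext32(0x80000000u), 0 };
    return { sext32((uint32_t)(n / d)), sext32((uint32_t)(n % d)) };
}

inline HiLo divu32(uint32_t rs, uint32_t rt) {
    if (rt == 0)
        return { sext32(0xFFFFFFFFu), sext32(rs) };
    return { sext32(rs / rt), sext32(rs % rt) };
}

// ADDU: 32-bit add whose result is sign-extended into the 64-bit register.
inline uint64_t addu(uint64_t rs, uint64_t rt) {
    return sext32((uint32_t)rs + (uint32_t)rt);
}
```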
- Jan 29, 2022
Geo Ster authored
* I was scrolling through the CPU manual again and noticed this remark in the LWR instruction description: "If the word sign bit (bit 31) is loaded from memory into the register by the instruction, then the loaded word is sign-extended. If the sign bit is not loaded from memory by the LWR, then bits 63..32 of the destination are unchanged", TX79 Core Architecture [A-75]. Before implementing this I checked whether other emulators also handle it, and it seems they do [1]. So I added it, since it's not too difficult and should prevent headaches in the future.
[1] https://github.com/PSI-Rockin/DobieStation/blob/master/src/core/ee/emotioninterpreter.cpp#L796
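A sketch of the quoted rule for little-endian LWR on a 64-bit register. The function name and signature are hypothetical; `aligned_word` is assumed to be the 32-bit word read from `addr & ~3`. Bit 31 only comes from memory when the byte offset is 0, which is exactly the case that gets sign-extended.

```cpp
#include <cassert>
#include <cstdint>

inline uint64_t lwr(uint64_t rt, uint32_t addr, uint32_t aligned_word) {
    // Bits of rt's low word that survive for each byte offset.
    static const uint32_t keep_mask[4] = {0x00000000, 0xFF000000, 0xFFFF0000, 0xFFFFFF00};
    uint32_t offset = addr & 3;
    uint32_t low = ((uint32_t)rt & keep_mask[offset]) | (aligned_word >> (offset * 8));
    if (offset == 0) // whole word loaded, including bit 31: sign-extend
        return (uint64_t)(int64_t)(int32_t)low;
    // Bit 31 came from rt: bits 63..32 stay unchanged
    return (rt & 0xFFFFFFFF00000000ull) | low;
}
```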
Geo Ster authored
* This is a really big commit, as it expands the previous VIF implementation significantly. The VIF acts as a gateway between the EE and the VU, letting the former transfer both instructions and data to the latter. Each VIFcode consists of a 16-bit IMMEDIATE field, whose interpretation depends on the command, an 8-bit NUM field whose meaning also changes depending on the command, and the 8-bit CMD field.
* Interestingly, the VIF considers instructions and data different entities and handles them with different commands. MPG is used to transfer instructions, while UNPACK and its variants transfer data exclusively. The NUM field in MPG dictates the number of dwords transferred to the VU, while in UNPACK NUM states the number of qwords WRITTEN to the VU. This is important because the amount of data read and written by UNPACK is not necessarily equal.
* UNPACK basically works by reading some number of words from the FIFO and converting them into a qword. Some formats are straightforward, like V4-32 (implemented here), which reads 4 words and packs them into a qword. Other formats are more complex. Beyond the format, UNPACK has many other variables to consider. Most important is the CYCLE register, which enables skipping or filling writes depending on the relation between its CL and WL fields. Skipping write (CL >= WL) is simple; it basically means "write WL qwords and jump ahead (CL - WL) qwords". Filling write is another story that I won't explain here, since it's not needed at this point.
* In addition, the STL queue has been replaced with a custom, lightweight Queue class. This decision was made for several reasons:
1. std::queue is based on std::deque, which has overhead.
2. Most STL containers in general are too "safe" for us. In UNPACK, for example, several formats require pulling multiple words of data from the FIFO at once. This can be done with the STL queue, but is tedious.
3. The STL queue is dynamic. All PS2 FIFOs are static, either 32 or 64 qwords in size. When they fill up, the DMA stalls until the component starts draining the FIFO. This behaviour is crucial for some games and must be emulated. Using a dynamic std::queue turns this into constant size checks that hurt performance.
Sources
[1] https://psi-rockin.github.io/ps2tek/#vif
[2] EE User's Manual, chapter 6.3
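A sketch of VIFcode decoding and a V4-32 UNPACK with skipping write, following the field layout and CL/WL rule described above. The helper names are hypothetical, destination addresses are in qwords, and details such as NUM = 0 meaning 256, VU memory wrap-around and filling write are left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct VIFcode {
    uint16_t imm; // bits 0-15, meaning depends on the command
    uint8_t num;  // bits 16-23
    uint8_t cmd;  // bits 24-31
};

inline VIFcode decode_vifcode(uint32_t word) {
    return { (uint16_t)(word & 0xFFFF), (uint8_t)((word >> 16) & 0xFF), (uint8_t)(word >> 24) };
}

using Qword = std::array<uint32_t, 4>;

// V4-32: every 4 source words become one qword. With skipping write (CL >= WL),
// each group of WL written qwords is followed by a gap of CL - WL qwords.
// num counts qwords WRITTEN, as described above.
inline void unpack_v4_32_skipping(std::vector<Qword>& vu_mem, uint32_t addr,
                                  const uint32_t* words, uint32_t num,
                                  uint32_t cl, uint32_t wl) {
    assert(cl >= wl && wl > 0);
    for (uint32_t i = 0; i < num; ++i) {
        uint32_t dest = addr + (i / wl) * cl + (i % wl);
        vu_mem[dest] = Qword{ words[4 * i], words[4 * i + 1], words[4 * i + 2], words[4 * i + 3] };
    }
}
```

With CL = 4 and WL = 2, four written qwords land at offsets 0, 1, 4 and 5, leaving 2 and 3 untouched.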
- Jan 22, 2022
Geo Ster authored
* This is pretty similar to the previous commit, but it adds support for 16-bit textures. Currently they use different functions, but I plan to switch to templates. The first step towards this is abstracting format-specific configuration variables into a separate struct.
* Also fix minor bugs in the PSMCT32 texture writing.
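One way the format-specific struct could look, sketched in C++ under the assumption that pages are 8 KB and blocks 256 bytes (as in the PSMCT32 commit), which gives 64 x 32 pages of 8 x 8 blocks for PSMCT32 and 64 x 64 pages of 16 x 8 blocks for PSMCT16. The `locate` helper and its `width` parameter are hypothetical, and the per-format block arrangement tables are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Format constants gathered per format so one template can serve both.
struct PSMCT32 {
    static constexpr uint32_t bytes_per_pixel = 4;
    static constexpr uint32_t page_width = 64, page_height = 32;
    static constexpr uint32_t block_width = 8, block_height = 8;
};
struct PSMCT16 {
    static constexpr uint32_t bytes_per_pixel = 2;
    static constexpr uint32_t page_width = 64, page_height = 64;
    static constexpr uint32_t block_width = 16, block_height = 8;
};

// Sanity checks: every format must fill an 8 KB page with 256-byte blocks.
static_assert(PSMCT32::page_width * PSMCT32::page_height * PSMCT32::bytes_per_pixel == 8192, "");
static_assert(PSMCT16::page_width * PSMCT16::page_height * PSMCT16::bytes_per_pixel == 8192, "");
static_assert(PSMCT16::block_width * PSMCT16::block_height * PSMCT16::bytes_per_pixel == 256, "");

struct Location { uint32_t page, block_x, block_y; };

// Page index and block coordinates of pixel (x, y) in a buffer `width` pixels wide.
template <typename Format>
constexpr Location locate(uint32_t x, uint32_t y, uint32_t width) {
    uint32_t pages_per_row = width / Format::page_width;
    uint32_t page = (y / Format::page_height) * pages_per_row + x / Format::page_width;
    uint32_t bx = (x % Format::page_width) / Format::block_width;
    uint32_t by = (y % Format::page_height) / Format::block_height;
    return {page, bx, by};
}
```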
Geo Ster authored
* We are getting texture writes, exciting times. The BIOS initializes a 256 x 64 PSMCT32 texture. The DMAC sends the data over GIF PATH3 to the GS. So let me explain how texture writes work, because the GS is very peculiar in this regard.
* The VRAM in the GS is 4MB in size and is split into 8KB pages, each of which is a grid of 256-byte blocks. Each block is divided into 4 columns, and finally each column contains the actual pixel data. Here is a small diagram:

    - page
      - block
        - column
          - pixel
          - pixel
          ...
        - column
          - pixel
          - pixel
          ...
      - block
        - column
          - pixel
          - pixel
          ...

* The problem is that pixels and blocks aren't actually stored sequentially in VRAM. How they are ordered depends on the format, but here is an example for PSMCT32:

    <-------- 8 blocks / 64 pixels --------->
    |  0 |  1 |  4 |  5 | 16 | 17 | 20 | 21 |   ^
    |  2 |  3 |  6 |  7 | 18 | 19 | 22 | 23 |   | 4 blocks / 32 pixels
    |  8 |  9 | 12 | 13 | 24 | 25 | 28 | 29 |   |
    | 10 | 11 | 14 | 15 | 26 | 27 | 30 | 31 |   v

* Say you want to write to the block at coordinates (5, 2) in the above page. If the blocks were stored sequentially, the offset of the block in memory would be: x + width * y => 5 + 8 * 2 = 21
* In reality though, block 25 would be accessed. Personally I don't know why the GS organizes VRAM this way, but it makes emudevs like me struggle a lot. I guess it has something to do with texture swizzling, but I hope I will know more in the near future. I also need to start thinking about whether it is possible to read textures like this from shaders. I guess it could be.
* Other misc changes in this commit include:
  - Unaligned stores/loads were rewritten to be branchless
  - Fixed bug in GIF PATH3 that caused some packets to be ignored
  - Switched to explicit register writing in the GS. This is because most registers are used to perform special operations rather than just store data, and writing structs is very inconvenient in this case. I will probably switch other components (DMAC...) to this as well.
Sources:
[1] https://tcrf.net/User:Kojin/TIM2_Information
[2] GS User's Manual Version 6.0
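The PSMCT32 block lookup from the table above can be sketched as a small constexpr table. `block_address32`, its `base` (in bytes) and `width_in_pages` parameters are hypothetical, and the further swizzle of pixels inside a block's 4 columns is not handled here.

```cpp
#include <cassert>
#include <cstdint>

// Block arrangement inside one PSMCT32 page, transcribed from the table above:
// row = block y (0-3), column = block x (0-7).
constexpr uint8_t kBlockTable32[4][8] = {
    { 0,  1,  4,  5, 16, 17, 20, 21},
    { 2,  3,  6,  7, 18, 19, 22, 23},
    { 8,  9, 12, 13, 24, 25, 28, 29},
    {10, 11, 14, 15, 26, 27, 30, 31},
};

constexpr uint32_t kPageBytes = 8192; // 8 KB page
constexpr uint32_t kBlockBytes = 256; // 256-byte block = 8 x 8 PSMCT32 pixels

// Index of the block holding pixel (x, y) within its 64 x 32 page.
constexpr uint32_t block_in_page32(uint32_t x, uint32_t y) {
    return kBlockTable32[(y % 32) / 8][(x % 64) / 8];
}

// Byte offset of that block in VRAM; pages themselves are laid out row by row.
constexpr uint32_t block_address32(uint32_t base, uint32_t width_in_pages,
                                   uint32_t x, uint32_t y) {
    uint32_t page = (y / 32) * width_in_pages + x / 64;
    return base + page * kPageBytes + block_in_page32(x, y) * kBlockBytes;
}
```

For block (5, 2), that is pixel (40, 16), `block_in_page32` returns 25 rather than the sequential 21.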
- Jan 19, 2022
Geo Ster authored
* This commit doesn't implement any specific component or behaviour, rather it fixes another set of bugs that I found during testing.
- Jan 16, 2022
Geo Ster authored
* Implementing IOP components is probably the hardest task in a PS2 emulator, even compared to complex chips like the VUs. That is because almost every single component in the IOP is completely undocumented, so emulators have to make assumptions and reverse engineer them to figure out what they do. This commit adds support for three new IOP components.
* The DVD drive is quite simple in concept. It accepts two types of commands, S commands (synchronous) and N commands (asynchronous). Async commands are used mainly for seeks and reads, so the CPU doesn't have to wait for the drive to fetch the data. S commands, on the other hand, complete instantly and are used for miscellaneous operations.
* Both types of commands use three registers: one stores the current command, another acts as the status register, and the third either gives the current command's output (on read) or stores the parameters of the command (on write).
* SIO2 is very peculiar, since very little is actually known about it. It is responsible for managing peripherals like the memory card or the DualShock controller. The CPU first sends a peripheral byte that identifies the target peripheral, then a command, and waits for a reply. The SEND3 array contains the command size for each SIO2 command.
* Since in this case the gamepad gets accessed, it needs to be implemented too. The gamepad is very complex in its operation, so to keep this commit message from getting extremely long I will talk about it another time. Check out the new txt file in the docs folder for more info.
- Jan 10, 2022
Geo Ster authored
* Next, the IOP DMA controller initiates an SPU2 transfer. However, similarly to SIF0/SIF1, when a transfer finishes an IRQ is generated in the SPU STAT register.
* After searching on GitHub for any details on this particular register, I found a hacky snippet in DobieStation that seems to properly initialize the SPU2 (according to the logs), so I'll use it here.
- Jan 09, 2022
Geo Ster authored
* Fixed bug when writing to the DICR/DICR2 flag field
* Fixed definition of the DMAC tag
* SIF0 doesn't use the id field, apparently?
* Allow writing to some specific regions of the BIOS
With these changes OSDSYS has now started loading! The BIOS is initializing the SPU2, probably to play that boot-up chime the PS2 does.
- Jan 08, 2022
Geo Ster authored
* The original code used brute force to ensure correct results but was very expensive, doing 2 * 30 = 60 loop iterations. So I rewrote it to use __builtin_clz to count the bits instead, leading to a noticeable speedup.
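The commit doesn't say which routine was rewritten. Assuming it is something like the EE's PLZCW instruction (for each of the two 32-bit words, count the leading bits equal to bit 31, minus one), a `__builtin_clz` version could look like the sketch below. The function names are hypothetical; `__builtin_clz` is a GCC/Clang builtin that is undefined for 0, hence the guard.

```cpp
#include <cassert>
#include <cstdint>

// Leading bits equal to bit 31, not counting bit 31 itself.
inline uint32_t leading_sign_bits(uint32_t x) {
    if (x & 0x80000000u) x = ~x; // turn leading ones into leading zeros
    if (x == 0) return 31;       // all 32 bits match; __builtin_clz(0) is undefined
    return static_cast<uint32_t>(__builtin_clz(x)) - 1;
}

// Both 32-bit halves of the source register, one count per word.
inline uint64_t plzcw(uint64_t rs) {
    uint64_t lo = leading_sign_bits(static_cast<uint32_t>(rs));
    uint64_t hi = leading_sign_bits(static_cast<uint32_t>(rs >> 32));
    return lo | (hi << 32);
}
```

Each word now costs one builtin (a single instruction on most hosts) instead of a bit-by-bit loop.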
Geo Ster authored
* This commit was intended to start the DMA implementation, but quickly turned into a tedious bug-fixing journey. So in return for having my soul drained by all the debugging, the following bugs were fixed:
  * Fixed bug that caused incorrect IOP and EE timer writes
  * Fixed bug that caused interrupts to not trigger (NOTE: This is an error on ps2tek, more specifically regarding the use of the I_CTRL register)
  * Fixed handling of some virtual addresses (0x2*) (The program doesn't notify me when out-of-bounds writes happen, for some reason?)
  * Fixed handling of the DICR2 register in the IOP DMA
  * Fixed INT1 interrupts running endlessly
  * Fixed incorrect int1_pending field position in the COP0 status
In addition to the above bugfixes, some new additions were needed to make DMA work:
  * Added VBLANK ON/OFF interrupts
  * Added INT1 interrupt detection
  * Added some new EE instructions related to exception handling
  * Added syscall support
So after all of that, the program at least started running normally again and I could begin with the DMA implementation. SIF0/SIF1 are quite confusing, since nowhere in ps2tek does it mention where that data comes from. Thankfully I came across this Linux kernel commit [1], which specified that there exists a bidirectional FIFO in the SIF that sends data from either side to the other.
There are problems, however. The EE DMAC works exclusively with qwords (4-word packets), while the IOP DMA only sends words. Also, the DMAC tags are 128-bit while the IOP tags are 64-bit. So even though I would have liked to use qwords in the FIFO, I was forced to switch to 32-bit values to have finer control over how much data gets used.
Another thing to note is that the EE/IOP can start a DMA transfer without the FIFO having any data. In that case the transfer stalls and waits until the other side starts filling it. A typical sequence goes as follows:
1. The EE starts a SIF0 transfer, waiting for the IOP to send data.
2. The IOP starts a SIF0 transfer, pushing data from its RAM to the FIFO.
3. The EE notices and starts taking that data to form a DMA tag (once the FIFO holds at least 4 words, of course).
4. The transfer completes and INT1 is asserted.
5. An exception is triggered and the exception handler checks the data that was written.
If any of the steps above goes wrong (the IOP sends too much data, the EE receives it too early, the data is not written to the correct address), the whole process gets stuck in an infinite loop. Now figure out what happened in between the 1000+ instructions that were executed...
At least it seems to be working quite well now. Note that this is not the final implementation of the DMA. Normally only one channel can transfer data in a single cycle, while our current implementation checks all the channels. I have a request system in mind that should fix this, but right now I really want some graphics on screen as fast as possible.
[1] https://patchwork.kernel.org/project/linux-mips/patch/fb79dab2db2bfa9a06e96c211d27423d0c51399c.1567326213.git.noring@nocrew.org/
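A minimal sketch of a word-granular SIF FIFO and the stall behaviour described above. `SifFifo` and its capacity are hypothetical; the point is that the IOP side pushes single words while the EE side only pops once a full qword (4 words) is available, and either side simply reports a stall instead of blocking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical SIF FIFO holding 32-bit words, shared by both DMA controllers.
class SifFifo {
public:
    explicit SifFifo(size_t capacity_words) : capacity(capacity_words) {}

    // IOP DMA side: moves single words. Returns false when full (IOP channel stalls).
    bool iop_push(uint32_t word) {
        if (words.size() >= capacity) return false;
        words.push_back(word);
        return true;
    }

    // EE DMAC side: only moves whole qwords, so it stalls until 4 words are queued.
    bool ee_pop_qword(uint32_t out[4]) {
        if (words.size() < 4) return false;
        for (int i = 0; i < 4; ++i) {
            out[i] = words.front();
            words.pop_front();
        }
        return true;
    }

    size_t size() const { return words.size(); }

private:
    std::deque<uint32_t> words;
    size_t capacity;
};
```

In this model, step 1 of the sequence is an `ee_pop_qword` that keeps returning false until the IOP side (step 2) has pushed at least 4 words.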
- Jan 05, 2022
Geo Ster authored
* The read function of the SIF causes massive logspam, especially when components poll a specific register non-stop. Removing the logging also makes the code much faster.
* Some syscalls, especially 0x7A, get called really often and fill up the whole console very quickly. Don't log unknown syscalls, and add the ability to disable exception logging. Now the logs are much cleaner.
Geo Ster authored
* The ERET instruction doesn't have a branch delay slot, so the direct_jump function must be used.
* When skipping the branch delay slot, don't return, as that skips all the cycles left to execute.
* The CPU has now entered an infinite loop waiting for data from the 0x8c440 address. This probably means it's time for DMA.
Geo Ster authored
* Finally we can start executing some kernel functions. Executing syscalls is very straightforward; just raise an exception of type 8 and the handler will handle the rest. This is the first time the exception handler is used on the EE, so I'm a bit anxious about whether it will be bug-free.
* Log each syscall and print its name instead of just a number. Known syscalls, along with their IDs, are listed on ps2devwiki [1].
[1] https://playstationdev.wiki/ps2devwiki/index.php?title=Syscalls
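Raising an exception of type 8 can be sketched with the standard MIPS COP0 fields: set Cause.ExcCode, save EPC (pointing at the branch and setting Cause.BD when the faulting instruction sits in a delay slot), set Status.EXL, and jump to the common exception vector. The `Cop0` struct and `raise_exception` are hypothetical names, not the emulator's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical COP0 subset; bit positions follow the standard MIPS layout.
struct Cop0 {
    uint32_t status = 0; // bit 1 = EXL, bit 22 = BEV
    uint32_t cause = 0;  // bits 2-6 = ExcCode, bit 31 = BD
    uint32_t epc = 0;
};

constexpr uint32_t kStatusEXL = 1u << 1;
constexpr uint32_t kStatusBEV = 1u << 22;
constexpr uint32_t kCauseBD = 1u << 31;
constexpr uint32_t kExcSyscall = 8;

// Enter a level 1 exception and return the PC of the handler.
inline uint32_t raise_exception(Cop0& cop0, uint32_t pc, bool in_delay_slot, uint32_t code) {
    cop0.cause = (cop0.cause & ~(0x1Fu << 2)) | (code << 2);
    if (!(cop0.status & kStatusEXL)) {
        // EPC is only updated when not already at exception level
        cop0.epc = in_delay_slot ? pc - 4 : pc;
        cop0.cause = in_delay_slot ? (cop0.cause | kCauseBD) : (cop0.cause & ~kCauseBD);
        cop0.status |= kStatusEXL;
    }
    // Common vector: 0x80000180 normally, 0xBFC00380 while BEV is set
    return (cop0.status & kStatusBEV) ? 0xBFC00380u : 0x80000180u;
}
```

Since EPC points at the SYSCALL itself, the kernel handler is expected to advance it by 4 before returning with ERET.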
Geo Ster authored
* The COP0 code was hacky and didn't report unknown TLB instructions. This caused an infinite loop, without me knowing, when the program was calling ERET to exit the current syscall.
* Fix up the code a little and add the DI and ERET instructions.
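A sketch of ERET and DI, assuming the R5900 COP0 Status bits EXL (bit 1), ERL (bit 2), KSU (bits 3-4), EIE (bit 16) and EDI (bit 17). The names are hypothetical. Because ERET has no delay slot, the PC it returns would be applied with a direct jump.

```cpp
#include <cassert>
#include <cstdint>

struct Cop0State {
    uint32_t status = 0;
    uint32_t epc = 0, error_epc = 0;
};

constexpr uint32_t kEXL = 1u << 1, kERL = 1u << 2, kKSU = 3u << 3;
constexpr uint32_t kEIE = 1u << 16, kEDI = 1u << 17;

// ERET: leave the error level if set, otherwise the exception level; return the target PC.
inline uint32_t eret(Cop0State& c) {
    if (c.status & kERL) {
        c.status &= ~kERL;
        return c.error_epc;
    }
    c.status &= ~kEXL;
    return c.epc;
}

// DI: clears EIE only in kernel mode, at exception/error level, or when EDI allows it.
inline void di(Cop0State& c) {
    bool kernel_mode = (c.status & kKSU) == 0;
    if ((c.status & (kEDI | kEXL | kERL)) || kernel_mode)
        c.status &= ~kEIE;
}
```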
Geo Ster authored
Shrinks the log file size from 4GB to 2GB
- Jan 04, 2022
Geo Ster authored
* The old code was pretty complicated and bad. Reorganize the register structs and simplify the reading/writing process.
Geo Ster authored
* VU0 tries to access address 0x4210, which maps to the VU1 I/O registers. Currently we don't have VU1 support, so abort when trying to write to these locations.
Geo Ster authored
* The interrupt raised flags should be cleared when the mode register is written. In addition, the raised flag should not get set unless the interrupt has actually been fired.
Geo Ster authored
* Instead of ticking every component on every cycle, which is expensive, most emulators tick each component for a specific number of cycles at a time until a frame is complete, and then render it. This decreases accuracy by a tiny margin, but no PS2 game should be cycle-dependent enough to notice.
* The timing information was sourced from my PS2 Slim with the hardware test that was uploaded in a recent commit.
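The batched approach can be sketched as below. The component type, `run_frame` and the slice size are hypothetical; the 2:1 and 8:1 ratios come from the EE (294.912 MHz), bus (147.456 MHz) and IOP (36.864 MHz) clocks, while the actual cycles-per-frame figure should come from the hardware test rather than from this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical component with a "run for N cycles" entry point.
struct Component {
    uint64_t cycles_run = 0;
    void run(uint32_t cycles) { cycles_run += cycles; }
};

// Run one frame in slices of EE cycles instead of ticking everything each cycle.
// Bus-clock components get half the EE cycles and the IOP an eighth; any remainder
// from a slice that isn't a multiple of 8 is simply dropped in this sketch.
inline void run_frame(Component& ee, Component& bus, Component& iop,
                      uint32_t ee_cycles_per_frame, uint32_t slice) {
    uint32_t done = 0;
    while (done < ee_cycles_per_frame) {
        uint32_t step = std::min(slice, ee_cycles_per_frame - done);
        ee.run(step);
        bus.run(step / 2);
        iop.run(step / 8);
        done += step;
    }
    // The completed frame would be rendered here.
}
```

Larger slices mean fewer switches between components (faster) at the cost of coarser interleaving between them.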
Geo Ster authored
* This register is pretty undocumented, even though it's crucial for EE <-> IOP synchronization. I asked a PCSX2 dev about this and was linked to the code PCSX2 uses, so I will use it here as well.
Geo Ster authored
* From now on, sometimes I have to rely on hardware tests to accurately measure timings or other info the emulator needs. This test was originally written by refraction for use with DobieStation. I touched up the code a bit and wrote a small build script for anyone who wants to run it on their own console.