  1. Mar 31, 2022
  2. Mar 30, 2022
    • VULKAN: Add texture support · e7cd38e5
      Geo Ster authored
      * This allows sampling of textures from the fragment shader. Storage buffers are the only thing left to implement.
      e7cd38e5
  3. Mar 26, 2022
    • GS: Switch to a brand new vulkan renderer · ec9638f5
      Geo Ster authored
      * Using Vulkan allows us to be a lot more efficient than OpenGL, leading to an impressive 10-20 FPS
      boost in performance. This is achieved by doing away with the redundant vertex buffer copy operations
      and submitting just a single command buffer. The new Vulkan 1.3 dynamic states are used to change the
      depth comparison function.
      ec9638f5
    • Add README · b771d5c5
      Geo Ster authored
      b771d5c5
  4. Mar 16, 2022
  5. Mar 14, 2022
  6. Mar 12, 2022
  7. Mar 11, 2022
  8. Mar 10, 2022
  9. Feb 13, 2022
    • EE: Start work on MIPS JIT compiler · 4e5e8dcf
      Geo Ster authored
      * This is something that I have wanted to do for a very long time but
      haven't, due to lack of experience and for practical reasons, since older
      systems don't really see any performance benefits from JITs. This is not
      the case with the PS2.
      
      * The JIT will use the asmjit library for code generation as it handles
      all the system calling conventions and allows us to focus on the code
      generation part. For now the JIT is in the prototyping phase.
      
      * The general architecture will be:
          IRBuilder -> Passes -> CodeGenerator
        I have some notes and I hope I can explain more things as I go.
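
      The three stages above could be wired together roughly like this. This is only a toy
      sketch: the IR, the pass and every name in it are hypothetical, and a text emitter
      stands in for asmjit so the example stays self-contained.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Everything here is hypothetical: a toy IR that only shows the three stages.
enum class Op { LoadImm, Add, Nop };

struct Inst {
    Op op;
    int dst;
    int64_t a; // immediate for LoadImm, first source register for Add
    int64_t b; // second source register for Add
};

// Stage 1: the builder records instructions into a block.
struct IRBuilder {
    std::vector<Inst> block;
    void load_imm(int dst, int64_t value) { block.push_back({Op::LoadImm, dst, value, 0}); }
    void add(int dst, int lhs, int rhs) { block.push_back({Op::Add, dst, lhs, rhs}); }
    void nop() { block.push_back({Op::Nop, 0, 0, 0}); }
};

// Stage 2: an example pass that drops instructions with no effect.
inline void remove_nops(std::vector<Inst>& block) {
    std::vector<Inst> kept;
    for (const Inst& inst : block) {
        if (inst.op != Op::Nop) kept.push_back(inst);
    }
    block = kept;
}

// Stage 3: stand-in code generator, one line of pseudo-assembly per instruction.
inline std::vector<std::string> generate(const std::vector<Inst>& block) {
    std::vector<std::string> out;
    for (const Inst& inst : block) {
        if (inst.op == Op::LoadImm) {
            out.push_back("mov r" + std::to_string(inst.dst) + ", " + std::to_string(inst.a));
        } else if (inst.op == Op::Add) {
            out.push_back("add r" + std::to_string(inst.dst) + ", r" + std::to_string(inst.a) +
                          ", r" + std::to_string(inst.b));
        }
    }
    return out;
}
```

      Keeping the passes as plain functions over the block means new ones can be added
      without touching the builder or the generator.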
      4e5e8dcf
  10. Feb 03, 2022
    • GS: Fix VRAM writes with TRXPOS offsets · 8a3f97f7
      Geo Ster authored
      * With this the logo texture doesn't overlap and destroy the
      font anymore
      8a3f97f7
    • GS: [WIP] Handle depth properly · 6c39b1b7
      Geo Ster authored
      * Depth testing in the GS is controlled by the TEST_1/TEST_2
      registers. The problem is that nothing prevents the program
      from changing these registers mid-frame, meaning we have to
      flush our renderer and rebuild the VBO from scratch.
      
      * In addition enable OpenGL depth testing and configure it
      properly based on the TEST configuration. It's also important
      to adjust the depth buffer to have a default value of 0 rather
      than 1, otherwise GL_GREATER tests will always fail.
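
      The clear value issue is easy to see with a tiny software model of a "greater than"
      depth test. This is a sketch with hypothetical names, not the renderer's code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical software model of a "greater than" depth test, only meant to
// show why the clear value matters.
struct DepthBuffer {
    std::vector<float> depth;

    DepthBuffer(std::size_t pixel_count, float clear_value)
        : depth(pixel_count, clear_value) {}

    // Mirrors GL_GREATER: the fragment passes only if it is strictly greater
    // than the stored value, and the stored value is updated on a pass.
    bool test_greater(std::size_t index, float fragment_depth) {
        if (fragment_depth > depth[index]) {
            depth[index] = fragment_depth;
            return true;
        }
        return false;
    }
};
```

      With the buffer cleared to 1 no fragment in the [0, 1] range can pass, while a
      buffer cleared to 0 accepts the first fragment drawn on each pixel.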
      
      * With these changes the triangles render with proper depth. Texture
      transfers also need to be implemented to complete the demo. This is more complex
      than it sounds for multiple reasons that will hopefully be explained
      in the following commits. Currently the GS code is extremely hacky
      and unpleasant but I will clean it up soon, I promise.
      6c39b1b7
  11. Feb 02, 2022
    • GS: [WIP] Properly render the vertices and colors · a82d9f6c
      Geo Ster authored
      * This commit removes the hacky VBO rendering and now uses
      a draw vector to keep the vertices. This might change soon
      though as I haven't decided what the final implementation
      will look like. Handling depth is very tricky with the GS
      as it might change mid-frame, forcing a VBO flush.
      
      * Now the stars render properly, albeit with many afterimages
      due to lack of screen clearing
      a82d9f6c
  12. Jan 31, 2022
  13. Jan 30, 2022
    • DMAC: Don't transfer tag if TTE is not set · 9ed09e0d
      Geo Ster authored
      9ed09e0d
    • DMAC: Add VIF0 channel and fix many VIF bugs · aed51541
      Geo Ster authored
      * This allows us to finally pass the basic VIF test from ps2autotests, yay!
      aed51541
    • VU: Rework VU writes to be more generic · b9019c67
      Geo Ster authored
      * Accessing the VU memory is more complex than it seems.
      Both VUs have their memory mapped to the main bus between
      0x11000000 - 0x11010000. However, the distinction between
      code and data once again makes this more complicated.
      Inside the VU, code and data are considered different memory
      spaces, so their address spaces are also different. This means
      that an address of 0x0 can refer to either, depending on the
      calling instruction.
      
      * Using this ahead-of-time knowledge we can use templates to make
      the compiler do the work for us and just have a small branch when
      the EE wants to write something directly, which is pretty rare.
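
      A minimal sketch of the idea, with hypothetical names. The 4 KB memory sizes and the
      0x4000 code/data split inside the mapped window are assumptions made for the example:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class VuMemory { Code, Data };

struct VectorUnit {
    // Assumed sizes: 4 KB of code memory and 4 KB of data memory.
    std::array<uint8_t, 0x1000> code{};
    std::array<uint8_t, 0x1000> data{};

    // The memory space is a template parameter, so a caller that already
    // knows its target picks the array at compile time.
    template <VuMemory Space, typename T>
    void write(uint32_t address, T value) {
        auto& memory = (Space == VuMemory::Code) ? code : data;
        const std::size_t offset = address % memory.size();
        assert(offset + sizeof(T) <= memory.size());
        std::memcpy(&memory[offset], &value, sizeof(T));
    }

    template <VuMemory Space, typename T>
    T read(uint32_t address) const {
        const auto& memory = (Space == VuMemory::Code) ? code : data;
        const std::size_t offset = address % memory.size();
        assert(offset + sizeof(T) <= memory.size());
        T value;
        std::memcpy(&value, &memory[offset], sizeof(T));
        return value;
    }
};

// Runtime path for the rare direct EE write: one small branch on the bus
// offset. The 0x4000 boundary is an assumption of this sketch.
template <typename T>
void bus_write(VectorUnit& vu, uint32_t bus_offset, T value) {
    if (bus_offset < 0x4000) {
        vu.write<VuMemory::Code>(bus_offset, value);
    } else {
        vu.write<VuMemory::Data>(bus_offset - 0x4000, value);
    }
}
```

      The same address 0x0 then lands in a different array depending only on the template
      argument, which is the distinction described above.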
      b9019c67
    • GIF: Introduce the GIF FIFO · 96ac24fd
      Geo Ster authored
      * The GIF, similarly to the VIF, also contains a 16 qword FIFO
      for queueing data. The implementation is very similar to the VIF
      and I used similar function names to make it easier to follow.
      
      * In addition introduce proper reset routines. Messing with the
      this pointer is not possible in our case, because it breaks the
      Handler infrastructure that relies on the pointer not changing.
      96ac24fd
    • VIF: Report FIFO size from VIFn_STAT · 76bd5820
      Geo Ster authored
      * Also fix bug in the Queue::size function leading to
      incorrect values and adjust FIFO size to be 16 qwords.
      I thought the size was 64 but turns out this isn't the
      case [1]
      
      [1] EE User's Manual [144]
      76bd5820
    • EE: Add test elf loading support · 7f7d30b6
      Geo Ster authored
      * Going this long without running any tests is like walking
      on charcoal in emulation development. So I wrote a little
      ELF loader that loads the ELF file into RAM and executes
      it once the BIOS has fully loaded, because most tests
      require some basic syscalls to be available.
      
      * Using the EE tests the following bugs were fixed:
      
      1. Fix branch delays in BEQ/JALR instructions
      (NOTE: That test wasn't using valid MIPS code, but
      it's nice to know we are accurate to what the hardware does)
      2. DIV/DIVU support for dividing by zero
      3. Added missing sign extension to ADDU
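
      For the third fix, a minimal sketch of what the sign extension means on a 64-bit
      register file (the function name is hypothetical):

```cpp
#include <cstdint>

// ADDU on a 64-bit MIPS core: add the low 32 bits (wrapping, no overflow
// trap) and sign-extend the 32-bit result into the 64-bit destination.
inline uint64_t addu(uint64_t rs, uint64_t rt) {
    const uint32_t sum = static_cast<uint32_t>(rs) + static_cast<uint32_t>(rt);
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(sum)));
}
```

      Without the last cast chain the upper 32 bits would stay zero whenever bit 31 of the
      sum is set.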
      7f7d30b6
  14. Jan 29, 2022
    • EE: Implement missed LWR behaviour · a0198a75
      Geo Ster authored
      * Was scrolling through the CPU manual again and noticed this
      remark on the LWR instruction description:
      
      "If the word sign bit (bit 31) is loaded from memory into the register by the instruction,
      then the loaded word is sign-extended. If the sign bit is not loaded from memory by the
      LWR, then bits 63..32 of the destination are unchanged", TX79 Core Architecture [A-75]
      
      Before implementing this I checked if other emulators also handle this
      and it seems the answer is positive [1]. So I added this, since
      it's not too difficult and should prevent headaches in the future.
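
      A sketch of the quoted behaviour for a little-endian core. The function name and
      signature are hypothetical; `aligned_word` is assumed to be the word already read
      from (address & ~3):

```cpp
#include <cstdint>

// Little-endian LWR on a 64-bit register.
inline uint64_t lwr(uint64_t reg, uint32_t address, uint32_t aligned_word) {
    const uint32_t shift = (address & 3) * 8;
    if (shift == 0) {
        // The whole word, including bit 31, is loaded: sign-extend to 64 bits.
        return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(aligned_word)));
    }
    // Only the low bytes are replaced. Bits 63..32 and the untouched upper
    // bytes of the low word keep their previous value.
    const uint64_t loaded_mask = 0xFFFFFFFFull >> shift;
    return (reg & ~loaded_mask) | (aligned_word >> shift);
}
```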
      
      [1] https://github.com/PSI-Rockin/DobieStation/blob/master/src/core/ee/emotioninterpreter.cpp#L796
      a0198a75
    • VIF: Refactor code and implement more functions · 1aa7dc74
      Geo Ster authored
      * This is a really big commit as it expands the previous VIF
      implementation significantly. The VIF acts as a gateway between
      the EE and the VU, letting the former transfer both instructions
      and data to the latter.
      Each VIFcode consists of a 16bit IMMEDIATE field, the interpretation
      of which depends on the command, an 8bit NUM field that also changes
      depending on the command and the 8bit CMD field.
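
      Decoding the three fields could look like this. The bit positions (IMMEDIATE in the
      low half, then NUM, then CMD in the top byte) are an assumption of this sketch, and
      the struct and function names are hypothetical:

```cpp
#include <cstdint>

struct VifCode {
    uint16_t immediate; // assumed bits 0-15
    uint8_t num;        // assumed bits 16-23
    uint8_t cmd;        // assumed bits 24-31
};

// Splits a raw 32-bit VIFcode into its three fields.
inline VifCode decode_vifcode(uint32_t raw) {
    VifCode code;
    code.immediate = static_cast<uint16_t>(raw & 0xFFFF);
    code.num = static_cast<uint8_t>((raw >> 16) & 0xFF);
    code.cmd = static_cast<uint8_t>((raw >> 24) & 0xFF);
    return code;
}
```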
      
      * Interestingly the VIF considers instructions and data different
      entities and handles them differently, with different instructions.
      MPG is used to transfer instructions while UNPACK and its variants
      transfer data exclusively. The NUM field in the MPG instruction
      dictates the number of dwords transferred to the VU, while in UNPACK
      NUM states the number of qwords WRITTEN to the VU. This is important
      because the amount of data read and written by UNPACK is not necessarily equal.
      
      * UNPACK basically works by reading some amount of words from the FIFO
      and converting them into a qword. Some formats are straightforward like
      V4-32, which is implemented here, that reads 4 words and packs them into
      a qword. Other formats are more complex. In addition, UNPACK
      has many other variables to consider. Most important is the CYCLE
      register that enables skipping or filling writes according to the
      relation between the CL and WL fields. Skipping write (CL >= WL) is simple;
      it basically means "write WL qwords and jump ahead (CL - WL) qwords".
      Filling write is another story that I won't explain here, since it's
      not needed at this point.
      
      * In addition the STL queue has been replaced in favour of a custom-written
      lightweight Queue class. This decision was made for several reasons:
      
      1. std::queue is based on std::deque, which has overhead
      2. Most STL containers in general are too "safe" for us.
      In UNPACK for example, there are several formats that require
      pulling multiple words of data from the FIFO at once. This can
      be done with the STL queue, but is tedious.
      3. STL queue is dynamic. All PS2 FIFOs are static, either 32 or 64 qwords
      in size. When they fill up the DMA stalls until the component starts
      draining the FIFO. This behaviour is crucial for some games and must
      be emulated. Using a dynamic std::queue turns this into constant
      size checks that hurt performance.
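
      To illustrate the three points, here is a minimal fixed-capacity FIFO. It is a
      sketch with hypothetical names, not the Queue class from this commit:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t Capacity>
class Queue {
public:
    // Returns false instead of growing, which is the point where a DMA
    // transfer would have to stall.
    bool push(const T& value) {
        if (count == Capacity) return false;
        storage[(head + count) % Capacity] = value;
        ++count;
        return true;
    }

    // Pops N elements at once, e.g. the 4 words that a V4-32 UNPACK turns
    // into one qword. The caller must check size() first.
    template <std::size_t N>
    std::array<T, N> pop() {
        assert(count >= N);
        std::array<T, N> out{};
        for (std::size_t i = 0; i < N; ++i) {
            out[i] = storage[head];
            head = (head + 1) % Capacity;
        }
        count -= N;
        return out;
    }

    std::size_t size() const { return count; }
    bool full() const { return count == Capacity; }

private:
    std::array<T, Capacity> storage{};
    std::size_t head = 0;
    std::size_t count = 0;
};
```

      The storage is a plain std::array, so there are no allocations, and the "is it
      full?" check is a single comparison against a compile-time constant.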
      
      Sources
      [1] https://psi-rockin.github.io/ps2tek/#vif
      [2] EE User's Manual, chapter 6.3
      1aa7dc74
  15. Jan 22, 2022
    • GS: Add support for 16bit PSMCT16 textures · 439ca357
      Geo Ster authored
      * This is pretty similar to the previous commit but it adds
      support for 16bit textures. Currently they use different functions
      but I plan to switch to templates. The first step towards this is abstracting
      format-specific configuration variables into a separate struct.
      
      * Also fix minor bugs in the PSMCT32 texture writing.
      439ca357
    • GS: Add PSMCT32 texture support · 307cc678
      Geo Ster authored
      * We are getting texture writes, exciting times. The BIOS initializes
      a 256 x 64 PSMCT32 texture. The DMAC sends data through GIF PATH3
      to the GS. So let me explain how texture writes work, because the GS
      is very peculiar in this regard.
      
      * The VRAM in the GS is 4MB in size and is split into 8KB pages,
      each of which is a grid of 256 byte blocks. Each block is divided
      into 4 columns and finally each column contains the actual pixel data.
      Here is a small diagram:
      
      - page
        - block
          - column
            - pixel
            - pixel
            ...
          - column
            - pixel
            - pixel
            ...
        - block
          - column
            - pixel
            - pixel
            ...
      
      * The problem is that pixels and blocks aren't actually stored sequentially in
      video ram. How they are ordered depends on the format, but here is an
      example for PSMCT32:
      
        <------- 8 blocks/64 pixels ------->
        | 0| | 1| | 4| | 5| |16| |17| |20| |21| ^
        | 2| | 3| | 6| | 7| |18| |19| |22| |23| | 4 blocks/32 pixels
        | 8| | 9| |12| |13| |24| |25| |28| |29| |
        |10| |11| |14| |15| |26| |27| |30| |31| |
      
      * Say you want to write to the block at coordinates (5, 2) in the above page.
      If the blocks were sequentially stored, then the offset of the block in memory would be:
      
      x + width * y => 5 + 8 * 2 = 21
      
      * In reality though, block 25 would be accessed. Personally I don't
      know why the GS organizes VRAM this way, but it makes emudevs like me
      struggle a lot. I guess it has something to do with texture swizzling
      but I hope I will know more in the near future. I also need to start
      thinking about whether it is possible to read textures like this from shaders.
      I guess it could be possible.
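
      The lookup from the (5, 2) example can be written as a small table. The table
      contents are copied from the PSMCT32 diagram above; the names are hypothetical:

```cpp
#include <cstdint>

// Block arrangement inside one PSMCT32 page (4 rows of 8 blocks).
constexpr uint8_t kBlockLayoutPSMCT32[4][8] = {
    { 0,  1,  4,  5, 16, 17, 20, 21},
    { 2,  3,  6,  7, 18, 19, 22, 23},
    { 8,  9, 12, 13, 24, 25, 28, 29},
    {10, 11, 14, 15, 26, 27, 30, 31},
};

constexpr uint32_t kBlockSizeBytes = 256;

// What a plain row-major layout would give.
constexpr uint32_t linear_block(uint32_t x, uint32_t y) {
    return x + 8 * y;
}

// What the page layout actually stores at block coordinates (x, y).
constexpr uint32_t swizzled_block(uint32_t x, uint32_t y) {
    return kBlockLayoutPSMCT32[y][x];
}

// Byte offset of the block from the start of its 8 KB page.
constexpr uint32_t block_offset_in_page(uint32_t x, uint32_t y) {
    return swizzled_block(x, y) * kBlockSizeBytes;
}
```

      Other pixel formats would need their own table, which is one reason to move the
      format-specific data out of the write routine.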
      
      * Other misc changes in this build include:
      - Unaligned stores/loads were rewritten to be branchless
      - Fixed bug in GIF PATH3 that caused some packets to be ignored
      - Switched to explicit register writing in the GS. This is
        because most registers are used to perform special operations
        rather than just store data, and writing structs is very inconvenient
        in this case. Will probably switch other components to this as well
        (DMAC...)
      
      Sources:
      [1] https://tcrf.net/User:Kojin/TIM2_Information
      [2] GS User's Manual Version 6.0
      307cc678
  16. Jan 19, 2022
    • All round improvements/bug fixes · 05efa5ed
      Geo Ster authored
      * This commit doesn't implement any specific component or behaviour,
      rather it fixes another set of bugs that I found during testing.
      05efa5ed
  17. Jan 16, 2022
    • Implement SIO2/PAD communication and CDVD commands · 5a39532b
      Geo Ster authored
      * Implementing IOP components is probably the hardest task in a PS2
      emulator, even compared to complex chips like the VUs. That is because
      almost every single component in the IOP is completely undocumented,
      so emulators have to make assumptions and reverse engineer them to
      figure out what they do. This commit adds support for three new IOP components.
      
      * The DVD drive is quite simple in concept. It accepts two types of
      commands, S commands (synchronous) and N commands (asynchronous).
      Async commands are used mainly for seeks and reads, so the CPU
      doesn't have to wait for the drive to fetch the data. On the other
      hand S commands complete instantly and are used for more misc
      operations.
      
      * Both types of commands use three registers: the first stores
      the current command, the second acts as the status register and
      the third either gives the current command output
      (read) or stores the parameters of the command (write).
      
      * SIO2 is very peculiar since very little is actually known about it.
      It is responsible for managing peripherals like the memory card or
      the DualShock controller. The CPU first sends a peripheral byte that
      informs about the target peripheral, then a command and waits for
      a reply. The SEND3 array contains the command size for each SIO2 command.
      
      * Since in this case the gamepad gets accessed, that needs to be implemented.
      To prevent this commit message from getting extremely long, because the
      gamepad is very complex in its operation, I will talk about it
      another time. Check out the new txt file in the docs folder for more info.
      5a39532b
  18. Jan 10, 2022
    • DMA: Add SPU2 interrupts · df2591e9
      Geo Ster authored
      * Next the IOP DMA controller initiates an SPU2
      transfer. However similarly to SIF0/SIF1 when a
      transfer finishes, an irq is generated in the SPU
      STAT register.
      
      * After searching on GitHub for any details on this particular
      register, I found a hacky snippet in DobieStation that seems to
      properly initialize the SPU2 (according to the logs) so I'll use
      it here.
      df2591e9
  19. Jan 09, 2022
    • Fix some additional bugs, OSDSYS now loading! · 7a4b8e93
      Geo Ster authored
      * Fixed bug when writing to DICR/DICR2 flag field
      * Fixed definition of the DMAC tag
      * SIF0 doesn't use the id field, apparently?
      * Allow writing to some specific regions of the BIOS
      
      With these changes OSDSYS has now started loading! The BIOS
      is initializing the SPU2, probably to play that boot-up chime
      the PS2 does.
      7a4b8e93
  20. Jan 08, 2022
    • Optimize PLZCW instruction · 16464a36
      Geo Ster authored
      * The original code was using brute force to ensure correct
      results but was very expensive, doing 2 * 30 = 60 loops. So I
      rewrote it to use __builtin_clz to count the bits instead, leading
      to a noticeable speedup.
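
      A sketch of the per-word computation, assuming PLZCW returns the number of leading
      bits equal to the sign bit, minus one. The helper name is hypothetical and
      __builtin_clz is the GCC/Clang builtin:

```cpp
#include <cstdint>

// Leading sign-matching bits minus one for a single 32-bit word. PLZCW would
// apply this to each of the two low words of the source register.
inline uint32_t plzcw_word(uint32_t word) {
    if (word & 0x80000000u) {
        word = ~word; // turn leading ones into leading zeros
    }
    if (word == 0) {
        return 31; // all 32 bits match; __builtin_clz(0) is undefined
    }
    return static_cast<uint32_t>(__builtin_clz(word)) - 1;
}
```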
      16464a36
    • Major bug fixes + EE <-> IOP communication support · 073cba80
      Geo Ster authored
      * This commit was intended to start the DMA implementation, but
      quickly turned to a tedious bug fixing journey. So in return for
      having my soul drained from all the debugging,
      the following bugs were fixed:
      
      * Fixed bug that caused incorrect IOP and EE timer writes
      * Fixed bug that caused interrupts to not trigger
      (NOTE: This is an error on ps2tek and more specifically with the
      use of the I_CTRL register)
      * Fixed handling of some virtual addresses (0x2*)
      (The program doesn't notify me when out of bounds writes
      happen for some reason?)
      * Fixed handling of DICR2 register in the IOP DMA and
      * Fixed INT1 interrupts running endlessly
      * Fixed incorrect int1_pending field position in COP0 status
      
      In addition to the above bugfixes some new additions needed to be
      made to make DMA work:
      
      * Added VBLANK ON/OFF interrupts
      * Added INT1 interrupt detection
      * Added some new EE instructions related to exception handling.
      * Added syscall support
      
      So after all of that, at least the program started running normally
      again and I could begin with the DMA implementation. SIF0/SIF1 are
      quite confusing since nowhere in ps2tek does it mention where that data
      comes from. Thankfully I came across this Linux kernel commit [1] which
      specified that there exists a bidirectional FIFO in the SIF that sends
      data from either side to the other.
      
      There are problems however. The EE DMAC works exclusively with qwords
      or 4 word packets, while the IOP DMA only sends words. Also the DMAC
      tags are 128bit while the IOP tags are 64bit. So even though I would
      have liked to use qwords in the fifo, I was forced to switch to 32bit
      values to have finer control on how much data gets used.
      
      Other things to note are that the EE/IOP can start a DMA transfer
      without the FIFO having any data. In that case the transfer stalls
      and waits until the other side starts filling it. A typical sequence
      goes as follows:
      
      1. The EE starts a SIF0 transfer, waiting for the IOP to send data.
      2. The IOP starts a SIF0 transfer, pushing data from its RAM to the fifo.
      3. The EE notices and starts taking that data to form a DMA tag
      (when the fifo reaches a size bigger than 4 words of course)
      4. The transfer completes and INT1 is asserted.
      5. An exception is triggered and the exception handler checks the
      data that was written.
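
      The word-granular FIFO and the stall from the sequence above can be sketched as
      follows. The names are hypothetical, a std::deque is used only to keep the example
      short, and the EE side is assumed to proceed once at least 4 words are available:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>

struct SifFifo {
    std::deque<uint32_t> words;

    // IOP side: pushes one 32-bit word at a time.
    void push_word(uint32_t word) { words.push_back(word); }

    // EE side: consumes whole qwords only. Returns false (the transfer
    // stalls) while fewer than 4 words are queued.
    bool pop_qword(std::array<uint32_t, 4>& out) {
        if (words.size() < 4) return false;
        for (std::size_t i = 0; i < 4; ++i) {
            out[i] = words.front();
            words.pop_front();
        }
        return true;
    }
};
```

      Storing words instead of qwords is what lets the two sides disagree on transfer
      granularity without losing track of how much data is pending.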
      
      If any of those steps above go wrong (IOP sends too much data, EE receives
      it too early, the data is not written to the correct address) the whole
      process will be stuck in an infinite loop. Now figure out what happened
      in between the 1000+ instructions that were executed...
      At least it seems to be working quite well now.
      
      Note that this is not the final implementation of the DMA. Normally only
      one channel can transfer data in a single cycle while in our current
      implementation all the channels are checked. I have a request system
      in mind, that should fix this but right now I really want some graphics
      on screen as fast as possible.
      
      [1] https://patchwork.kernel.org/project/linux-mips/patch/fb79dab2db2bfa9a06e96c211d27423d0c51399c.1567326213.git.noring@nocrew.org/
      073cba80
  21. Jan 05, 2022
    • Reduce logging in some areas · 39407eea
      Geo Ster authored
      * The read function of the SIF causes massive logspam, especially when
      components poll a specific register non-stop. Removing the logging also
      makes the code much faster.
      
      * Some syscalls, especially 0x7A get called really often and fill up
      the whole console really quickly. Don't log unknown syscalls and add
      ability to disable exception logging. Now the logs are much cleaner
      39407eea
    • Fix misc CPU bugs · e1de8e78
      Geo Ster authored
      * The ERET instruction doesn't have a branch delay slot
      so the direct_jump function must be used
      
      * When skipping the branch delay slot don't return as that
      skips all the cycles left to execute
      
      * The CPU has now entered an infinite loop waiting for data from
      the 0x8c440 address. This probably means it's time for DMA
      e1de8e78
    • Implement syscall support · 1ead43d3
      Geo Ster authored
      * Finally we can start executing some kernel functions now. Executing
      syscalls is very straightforward; just create an exception of type 8
      and the handler will handle the rest. This is the first time the exception
      handler is used on the EE so I'm a bit anxious whether it will be bug free.
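
      Raising the exception could look roughly like this. It is a heavily simplified
      sketch with hypothetical names: the 0x80000180 vector, the EXL bit and the ExcCode
      position follow the usual MIPS convention and are assumptions here, and branch
      delay slots are ignored.

```cpp
#include <cstdint>

// Hypothetical, minimal COP0 state.
struct Cop0 {
    uint32_t cause = 0;
    uint32_t status = 0;
    uint32_t epc = 0;
};

constexpr uint32_t kExceptionSyscall = 8; // exception type used for SYSCALL

// Records the exception and returns the address execution continues from.
inline uint32_t raise_exception(Cop0& cop0, uint32_t pc, uint32_t type) {
    cop0.cause = (cop0.cause & ~0x7Cu) | (type << 2); // ExcCode assumed in bits 2-6
    cop0.epc = pc;                                    // where ERET should return to
    cop0.status |= 1u << 1;                           // EXL: now in exception level
    return 0x80000180u;                               // assumed common exception vector
}
```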
      
      * Log each syscall and print its name instead of just a simple number. Known
      syscalls along with their ids are on ps2devwiki [1]
      
      [1] https://playstationdev.wiki/ps2devwiki/index.php?title=Syscalls
      1ead43d3
    • Improve COP0 instruction decoding · f427f844
      Geo Ster authored
      * The COP0 code was hacky and didn't report unknown TLB instructions.
      This caused an infinite loop when the program was calling ERET to exit
      the current syscall without me knowing.
      
      * Fix up the code a little and add the DI and ERET instructions
      f427f844
    • Only log instructions from the kernel and beyond · 3a7a6b80
      Geo Ster authored
      Shrinks the log file size from 4GB to 2GB
      3a7a6b80
  22. Jan 04, 2022
    • Rework IOP DMA reads/writes · 6592212f
      Geo Ster authored
      * The old code was pretty complicated and bad. Reorganize the register
      structs and simplify the reading/writing process.
      6592212f
    • Don't write out of bounds · 668d5417
      Geo Ster authored
      * VU0 tries to access address 0x4210 which maps to the VU1 I/O registers.
      Currently we don't have VU1 support so abort when trying to write to these
      locations
      668d5417