    Memory subsystem rewrite + EE IRQs! · ec313120
    Geo Ster authored
    This is a pretty big commit so the description is probably going
    to be a whole essay again explaining all the changes. Emulation is extremely
    complicated and thus I need to explain all of my reasoning and sources.
    This commit contains 3 major changes that all work together to form the new memory subsystem:
    
    * New handler infrastructure
    * Compiler switch to clang-cl
    * Initial implementation of EE interrupts
    
    Now you, reader, might wonder why I decided to redo the relatively simple
    and straightforward system we had before. Well, that system had some
    drawbacks that I think needed to be addressed early on. Firstly, it was
    highly centralized, which means that for every new component the read/write
    functions of the ComponentManger (now Emulator) needed to be updated. This isn't
    as big an issue as the second one though. The old system relied heavily
    on branches to figure out the destination of a read/write, which is bad for
    performance. Because our address ranges aren't contiguous, the
    compiler can't turn the switch statement into a compact jump table. This leads to a lot
    of assembly code with many compare-and-jump sequences.
    
    The initial idea for this new system was taken from a PCSX2 devblog I read
    recently: https://pcsx2.net/developer-blog/218-so-maybe-it-s-about-time-we-explained-vtlb.html
    It explains a system where the address range is divided into pages and each
    page is handled by a handler function. This is perfect for us, because it moves
    most of the work to the initialization phase (when the components register
    their handlers), while reads/writes are very fast: they only have to look up
    the handler in the table and call it.
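
To make the idea concrete, here is a minimal sketch of such a page table using plain function pointers. All names (HandlerTable, register_range, timer_read) and the tiny 16-page address space are assumptions made for illustration, not the code in this commit, and the 0x80 page size is only a placeholder.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants: a toy address space of 16 pages, 0x80 bytes each.
constexpr std::uint32_t kPageSize = 0x80;
constexpr std::size_t kPageCount = 16;

using ReadHandler = std::uint32_t (*)(std::uint32_t addr);

// Fallback installed on every page that no component has claimed.
inline std::uint32_t unmapped_read(std::uint32_t) { return 0xFFFFFFFFu; }

struct HandlerTable
{
    std::array<ReadHandler, kPageCount> readers;

    HandlerTable() { readers.fill(&unmapped_read); }

    // Initialization phase: claim every page that overlaps [start, start + length).
    void register_range(std::uint32_t start, std::uint32_t length, ReadHandler handler)
    {
        assert(length > 0);
        const std::uint32_t first = start / kPageSize;
        const std::uint32_t last = (start + length - 1) / kPageSize;
        assert(last < kPageCount);
        for (std::uint32_t page = first; page <= last; ++page)
            readers[page] = handler;
    }

    // Hot path: one table lookup and one indirect call instead of a chain of range checks.
    std::uint32_t read(std::uint32_t addr) const
    {
        assert(addr / kPageSize < kPageCount);
        return readers[addr / kPageSize](addr);
    }
};

// Stand-in for a real component: returns the offset of the address inside its page.
inline std::uint32_t timer_read(std::uint32_t addr) { return addr & (kPageSize - 1); }
```

After a call such as table.register_range(0x100, 0x100, &timer_read), every read between 0x100 and 0x1FF is routed to timer_read, and all other pages still hit the fallback.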
    
    However, it isn't that easy to implement. The main problem
    was how to store member functions of different classes in a single array
    and call them without knowing their type. At first I thought of using
    std::function, which suits this well thanks to its type erasure, but it
    was quickly ruled out because of its high call overhead. Next, I considered inheritance
    and virtual functions, which was a step in the right direction. However, that
    also has the overhead of looking up the vtable. Finally, though, I discovered
    a neat little trick with member function pointers. You can actually static_cast a pointer to
    a derived class member function to a pointer to a base class member function, as long as the
    base class isn't ambiguous. So the final solution was to make all the components
    inherit from an empty (for now) Component class and store a common Component member function pointer.
    The compiler handles the rest, with a dose of magic and inheritance!
    The handler interface is located in the common/component.h file.
    You can check out the IOP DMA controller constructor to see how a component registers
    handlers with this system.
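
The trick can be sketched roughly as follows. This is not the interface in common/component.h: the Handler struct, make_handler, and the Dma and Timer classes are hypothetical names, and the sketch assumes every component derives non-virtually and unambiguously from Component.

```cpp
#include <cassert>
#include <cstdint>

// Empty base class that every component derives from.
struct Component {};

// One common pointer-to-member type that all handlers are stored as.
using Reader = std::uint32_t (Component::*)(std::uint32_t);

struct Handler
{
    Component* owner;
    Reader reader;

    // Calls the stored member function on the owning component.
    std::uint32_t operator()(std::uint32_t addr) const
    {
        assert(owner != nullptr && reader != nullptr);
        return (owner->*reader)(addr);
    }
};

// static_cast converts a derived-class member pointer to the base-class
// member pointer type. Calling through it is well defined as long as the
// owner really is an object of the class that declared the function.
template <typename T>
Handler make_handler(T* object, std::uint32_t (T::*reader)(std::uint32_t))
{
    return Handler{ object, static_cast<Reader>(reader) };
}

// Two unrelated (hypothetical) components sharing the same handler type.
struct Dma : Component
{
    std::uint32_t control = 0x1234;
    std::uint32_t read(std::uint32_t) { return control; }
};

struct Timer : Component
{
    std::uint32_t count = 7;
    std::uint32_t read(std::uint32_t addr) { return count + addr; }
};
```

Both make_handler(&dma, &Dma::read) and make_handler(&timer, &Timer::read) produce the same Handler type, so they can sit side by side in one array, and the call costs one indirect call with no vtable lookup.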
    
    This is very efficient, generating only 10-15 lines of assembly (with clang 12.0), which
    leads me to the second change, that of the compiler. The switch to clang-cl was made primarily
    for performance reasons. clang generates noticeably more efficient code than MSVC for this project, so the switch
    should improve performance down the road. It also reports more warnings and code issues, allowing for
    cleaner code overall.
    
    The next hurdle was figuring out the handler page size. This is more difficult than it seems, because there
    are additional "hidden" addresses the BIOS writes to, which aren't listed in the ps2tek
    memory map. Making the page size too big will lead to these garbage addresses being handled
    by our components, which defeats the purpose of this whole system. Making the page size too
    small, though, will both make the handler table massive and require components to register
    many handlers to cover their address ranges. So after studying the memory map for a while, I
    decided that 0x80 = 128 bytes is the best size. For example, in the DMAC (EE DMA) each channel takes up
    exactly 0x80, while in the IOP DMA each channel group is also exactly 0x80 in size.
    0x80 is, in addition, small enough that garbage addresses don't get caught.
    In case something like that does happen, I have placed asserts in debug builds to catch it.
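
The trade-off can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are made up, the 0x20000000 span (everything up to 0x1FFFFFFF) is an assumed upper bound rather than the real table size, and 0x1F801080 is used as an assumed base address for an 0x80-byte register block.

```cpp
#include <cstdint>

// Number of table entries needed to cover `span` bytes with `page_size`-byte pages.
constexpr std::uint64_t table_entries(std::uint64_t span, std::uint64_t page_size)
{
    return (span + page_size - 1) / page_size;
}

// Number of pages a component must register to cover [start, start + length).
constexpr std::uint64_t pages_touched(std::uint64_t start, std::uint64_t length,
                                      std::uint64_t page_size)
{
    return (start + length - 1) / page_size - start / page_size + 1;
}

// Assumed address span for the comparison.
constexpr std::uint64_t kSpan = 0x20000000;

// Smaller pages mean a much larger table...
static_assert(table_entries(kSpan, 0x1000) == 131072, "4 KiB pages");
static_assert(table_entries(kSpan, 0x80) == 4194304, "0x80-byte pages");
static_assert(table_entries(kSpan, 0x10) == 33554432, "0x10-byte pages");

// ...and more registrations per component: an 0x80-byte register block that is
// 0x80-aligned needs one page at 0x80 but eight pages at 0x10.
static_assert(pages_touched(0x1F801080, 0x80, 0x80) == 1, "one page at 0x80");
static_assert(pages_touched(0x1F801080, 0x80, 0x10) == 8, "eight pages at 0x10");
```

Even at 0x80 the assumed span needs millions of entries, which is why the table size still has to be reduced further.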
    
    Our struggle isn't done though! The initial handler table ended up causing stack
    overflows because the array was too large. To mitigate this, the stack size was increased
    to 10MB and a small optimization was implemented. If you view all the addresses in the memory map of
    the PS2, a pattern emerges. It turns out that one hex digit of the address (bits 16-19) is always zero, no matter the address
    (except for 0xfffe addresses, which we don't care about). This means we can "squash"
    the address by removing that digit, which shrinks the handler table by a factor of 16:
    
    0x100|0|3070 -> 0x1003070
    0x120|0|0060 -> 0x1200060
    0x1F4|0|2006 -> 0x1F42006
    0x1F8|0|1120 -> 0x1F81120
    0x1F9|0|01AC -> 0x1F901AC
    
    This is implemented in the Emulator::calculate_page function.
    A debug assert is also placed here to ensure nothing out of the ordinary happens.
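
As a rough sketch of what such a function might look like (the real Emulator::calculate_page may differ), the squash can be done with a shift and two masks. The free functions below and the kPageSize constant are assumptions for illustration; the static_asserts reproduce the five examples listed above.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPageSize = 0x80;

// Drops the hex digit in bits 16-19: the upper part moves down by four bits,
// the low 16 bits stay where they are.
constexpr std::uint32_t squash(std::uint32_t addr)
{
    return ((addr >> 4) & 0xFFFF0000u) | (addr & 0x0000FFFFu);
}

// Index into the handler table for a given address.
inline std::uint32_t calculate_page(std::uint32_t addr)
{
    // Debug check: the digit being removed must really be zero.
    assert((addr & 0x000F0000u) == 0 && "address does not fit the squash pattern");
    return squash(addr) / kPageSize;
}

static_assert(squash(0x10003070) == 0x1003070, "0x100|0|3070");
static_assert(squash(0x12000060) == 0x1200060, "0x120|0|0060");
static_assert(squash(0x1F402006) == 0x1F42006, "0x1F4|0|2006");
static_assert(squash(0x1F801120) == 0x1F81120, "0x1F8|0|1120");
static_assert(squash(0x1F9001AC) == 0x1F901AC, "0x1F9|0|01AC");
```

With this layout, 0x10003000 through 0x1000307F all map to the same page, and 0x10003080 starts the next one.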
    
    Finally, I also implemented EE interrupts because they are needed at this stage. Timer 5 should normally
    be ticking by now (next commit, I promise) and is waiting to raise an interrupt, so we need to have those implemented.
    The implementation is taken from a new document I found, which is the same as the previous one, but more focused on
    the EE and its features, something that should help us a lot in the near future. Right now it's not finished, but
    that will come in the next commit.
main.cc 1.19 KiB
#include <glad/glad.h>
#include <GLFW/glfw3.h>
#include <common/emulator.h>
#include <thread>

int main()
{
    glfwInit();
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 6);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);

    GLFWwindow* window = glfwCreateWindow(800, 600, "PS2 Emulator", NULL, NULL);
    if (window == NULL)
    {
        glfwTerminate();
        return -1;
    }

    glfwMakeContextCurrent(window);
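    // Keep the GL viewport in sync with the framebuffer when the window is resized
    glfwSetFramebufferSizeCallback(window, [](GLFWwindow*, int width, int height)
    {
        glViewport(0, 0, width, height);
    });

    // Load the OpenGL function pointers; shut GLFW down again if that fails
    if (!gladLoadGLLoader((GLADloadproc)glfwGetProcAddress))
    {
        glfwTerminate();
        return -1;
    }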

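    // Run the emulator on its own thread so the render loop below never blocks it.
    // stop_thread is shared between both threads, so it needs to be atomic to avoid a data race.
    common::Emulator emulator;
    std::thread thread([&]() { while (!emulator.stop_thread) { emulator.tick(); } });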

    while (!glfwWindowShouldClose(window))
    {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS)
            glfwSetWindowShouldClose(window, true);

        glClearColor(0.2f, 0.3f, 0.3f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        glfwSwapBuffers(window);
        glfwPollEvents();
    }

    emulator.stop_thread = true;
    thread.join();

    glfwTerminate();
    return 0;
}