Shader compilation stutter is nothing new to most emulator users, especially on RPCS3. However it is worth clearing up some misconceptions that go around regarding how RPCS3 shaders work. I’ll try to quickly go over the history of shader compilation on RPCS3 and hopefully explain why the shader compilation stutter appeared and why some people believe RPCS3 did not have shaders before.
Shader complexity and custom vertex fetch
In early 2017, I embarked on a task to remove the very expensive vertex preprocessing step from the CPU side of RPCS3. This basically meant implementing all those custom vertex types and vertex reading techniques to the vertex shader and providing only raw memory view that the ps3 hardware would be viewing. This greatly improved RPCS3 performance, more than tenfold in some applications. This change is what made RPCS3 usable for playing real commercial games with playable framerates without needing HEDT system. However, the new fetch technique increased the size of the vertex shader and added a complex function to extract vertex data from the memory block. This made the graphics drivers take very long to link the programs, even without optimizations, likely due to use of vector indexing, switch blocks and loops with dynamic exits. Extra operations including bitshifts and masking were also needed to decode the vertex layout block. The code runs very fast, but the linking step is very slow. A shader cache system already existed before and if you ran an area for the first time, there was slight microstuttering that some users did not notice; its this stutter that got much worse. The solution to this: preload the shaders so that you don’t need to compile them next time. This lead to the infamous “Loading Pipeline Object…” screen and the “Compiling shaders…” notification.
There are several challenges to tackling RSX shaders. First, the RSX is not a unified architecture like most programmers are used to today. It has separate vertex and fragment pipelines, both with their own separate ISA. They are also very limited and larger programs or more complex programs can result in very messy binaries. One of the largest problems is that the bytecode itself does not contain all the information required to run the program, extra configuration is configured via registers as draw calls are passed in. A good example is that the TEX instruction does not differentiate between texture types, but a texture configuration register exists that allows using the same program to read 1D, 2D, 3D, or CUBE textures as well as their shadow comparison variants. This means you can only know the generated shader once the texture register has been set up. There are other examples of things like these that make it so that you need the game to set up the program environment before the program itself is compilable.
Continue reading Eliminating Stutter with Asynchronous Shader Implementation!