Rewriting Vertex Processing for Massive Performance Gains

Greetings. I am kd-11, graphics developer for RPCS3, with a mid-month update on the latest developments in the emulator.

As many are already aware, a lot has been going on lately with the new changes to the RSX (the PS3 GPU) emulation, dubbed the vertex rewrite. This change moves a lot of vertex processing duties from the CPU to the GPU, where they rightly belong. As a result, there are massive performance gains in geometry-heavy scenes, especially with OpenGL but also with Vulkan.


As most users are probably aware by now, a dedicated graphics card sits on a physically separate board. This means data has to be moved to and from it over the PCI-E bus. The bus is quite fast, but while it is high-bandwidth, it is also high-latency. That means you cannot just send something over and expect it to be immediately available for the next draw call. Instead, the GPU has to wait for the data to be prepared and for a signal that it is ready for processing before drawing begins. This is a general simplification, but it helps illustrate the point.

The RSX on the PS3 doesn’t work the same way, however. It has near-direct access to the PS3’s XDR main memory and ‘pulls’ data straight from it as though it were local memory, somewhat like integrated graphics. That means data is not ‘pre-packaged’ for transport to the PS3 GPU, since memory is virtually unified from the point of view of the RSX. When using Vulkan, drawing is not scheduled until the whole command queue is flushed, which mitigates the impact of the transfer since the data will likely have been uploaded beforehand. For OpenGL, however, this was a big bottleneck.

The second issue was that the emulator was doing a lot of computation on the CPU to work out how to read vertex data from main memory, essentially pre-packaging the data into formats that are easy for GPUs to use. This is a very slow process and also very memory intensive (hence the ‘Working buffer not enough’ crashes). Enabling a debug overlay with the old method shows some games taking up to 200 ms to prepare vertex data for one frame (Hellboy: The Science of Evil). This is obviously not optimal. The impact could be lowered by using more threads for vertex processing, but with the number of threads already needed to emulate the PS3’s multi-core processor, that was a problem. Spawning 8+ vertex processing threads reduced the time spent processing vertices, but caused other threads to starve, and performance would drop significantly. The solution was to shift the work to the GPU and leave the vertex data untouched on the CPU: just copy the data block and let the GPU fetch the data it needs for itself, mimicking the behaviour of the real hardware.
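The difference between the two approaches can be sketched roughly as follows. This is not RPCS3 code: `AttributeLayout`, `prepackage_on_cpu`, `copy_raw_block` and `fetch_like_gpu` are hypothetical names, the sketch assumes the attribute is stored as big-endian 32-bit floats, and `fetch_like_gpu` is a plain CPU function standing in for what a vertex shader would do.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description of one vertex attribute inside a raw memory block.
struct AttributeLayout {
    std::uint32_t offset;  // byte offset of the first vertex's data
    std::uint32_t stride;  // bytes between consecutive vertices
    std::uint32_t count;   // float components per vertex
};

// Read one big-endian 32-bit float (assumed storage format for this sketch).
float read_be_float(const std::uint8_t* p) {
    const std::uint32_t bits = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                               (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Old-style idea: the CPU visits every vertex and writes a tightly packed
// array in a GPU-friendly format. Work grows with vertices * components.
std::vector<float> prepackage_on_cpu(const std::uint8_t* src, const AttributeLayout& a,
                                     std::size_t vertex_count) {
    std::vector<float> packed;
    packed.reserve(vertex_count * a.count);
    for (std::size_t v = 0; v < vertex_count; ++v) {
        const std::uint8_t* vertex = src + a.offset + v * a.stride;
        for (std::uint32_t c = 0; c < a.count; ++c)
            packed.push_back(read_be_float(vertex + c * 4));
    }
    return packed;
}

// New-style idea: the CPU copies the block untouched, a single bulk copy.
std::vector<std::uint8_t> copy_raw_block(const std::uint8_t* src, std::size_t size) {
    return std::vector<std::uint8_t>(src, src + size);
}

// Stand-in for the shader side: decode one component on demand, using the
// layout to locate it inside the raw block.
float fetch_like_gpu(const std::vector<std::uint8_t>& block, const AttributeLayout& a,
                     std::size_t vertex, std::uint32_t component) {
    return read_be_float(block.data() + a.offset + vertex * a.stride + component * 4);
}
```

In this sketch the CPU-side cost of the second path is one copy regardless of how the attributes are laid out, while the per-vertex decoding is deferred to wherever `fetch_like_gpu` runs, which on real hardware would be spread across the GPU's many shader units.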


Progress Report: July 2017

July, like every month before it this year, set a new record in the number of improvements. These mostly centered on bug fixes and compatibility improvements, and it is safe to say that if every single improvement were covered in great detail, this progress report would take more than a month to finish. The format is therefore going to change a bit. This report will focus on some major emulation improvements and explain what they entail in general. After that, a select few of the more interesting games, and how they were improved, will be covered. Not every improved game will be covered, because there were simply too many, and evaluation of earlier reports indicates that such a list isn’t interesting content either.

First up are the compatibility database statistics for the month of July. Take note that the last database update was performed a day before the major emulation improvement known as “LLE gcm” was merged, meaning the hundreds of games improved by it are not reflected in the figures below, or even in the compatibility database yet.

Game Compatibility: Game Status
Game Compatibility: Monthly Improvements (June 2017)

Looking at the GitHub statistics, 18 authors have pushed 201 commits to the master branch. In total, 257 files have changed, with 14,559 additions and 5,088 deletions of lines of code. What improvements came from these changes? Let’s take a look:
