Rewriting Vertex Processing for Massive Performance Gains


Greetings. I am kd-11, graphics developer for RPCS3, with a mid-month update on the latest developments in the emulator.

As many are already aware, a lot has been going on lately with the new changes to the RSX (the PS3 GPU) emulation, dubbed vertex rewrite. This change moves a lot of vertex processing duties from the CPU to the GPU where they rightly belong and as a result there are massive performance gains especially with OpenGL but also with Vulkan in geometry heavy scenes.

Background

Most if not all users are probably aware by now, but dedicated graphics cards exist on a physically separate board. This means data has to be moved to and from it through the PCI-E bus which is quite fast. However, while it is high bandwidth, it is also high-latency. That means you cannot just send something over there and expect to get it immediately available for the next draw call. Instead, the GPU has to wait for data to be prepared and then signaled that data is ready for processing before drawing begins. This is a general simplification, but it helps illustrate the point. The RSX on the PS3 doesn’t work the same way however. It has near direct access to the XDR main memory on a PS3 and ‘pulls’ data directly from main memory as though it were local memory. It is somewhat similar to integrated graphics memory in this case. That means data is not ‘pre-packaged’ for transport to the PS3 GPU since the memory is virtually unified from the point of view of the RSX. When using Vulkan, drawing is not scheduled until the whole command queue is flushed mitigating the impact of transfer since data will likely have been uploaded beforehand, but for OpenGL this was a big bottleneck.

The second issue was that the emulator was doing a lot of computation on the CPU to work out how to read vertex data from main memory, essentially pre-packaging the data into formats easy for GPUs to use. This is a very slow process and also very memory intensive (hence the ‘Working buffer not enough’ crashes). Enabling a debug overlay with the old method shows some games taking up to 200ms to prepare vertex data for one frame (Hellboy: The Science of Evil). This is obviously not optimal. The impact could be lowered by using more threads for vertex processing, but with the number of threads already needed to emulate the PS3’s multi-core processor, it was a problem. Spawning 8+ vertex processing threads reduced the time spent processing vertices, but caused other threads to starve, and performance would drop significantly. The solution was to shift the work to the GPU and leave the data untouched on the CPU: just copy the data block and let the GPU fetch the data it needs for itself, mimicking the behaviour of the real hardware.

Initial tests

The first task was to put this theory to the test. I started by writing routines to identify contiguous memory used by games to store vertex data and found most use interleaving to help speed up data transfer even on a real PS3. This is good since copying is very easy, boiling down to a single memcpy operation. I quickly got OpenGL to use this method whenever large blocks of data were in use and fired up a test case I had been using for some time to benchmark vertex processing – the TNT Racers title screen. The scene is very simple but throws up vertices using immediate draws and does over 2000 draws. This was a problem on RPCS3, where OpenGL was stuck at 10fps for a long time. Using the simplified processing, FPS went up to around 17, so the experiment was a success. Vertex upload times went down significantly from about 50ms per frame to under 10ms. However, the change in frame rate was not so impressive, so I had to do some more investigation.
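To make the idea concrete, here is a minimal sketch of the kind of check described above. It is not the actual RPCS3 code: the `Attribute` record, the function names and the byte layout are all assumptions made for illustration. It treats a set of vertex attributes as interleaved when they share one stride and each element fits inside a single stride-sized vertex record, in which case the whole range can be copied in one go.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical description of one vertex attribute stream."""
    offset: int  # byte address of the first element in emulated memory
    stride: int  # bytes between consecutive elements
    size: int    # bytes occupied by one element


def interleaved_block(attrs, vertex_count):
    """Return (start, length) of the one block holding every attribute,
    or None if the attributes are not interleaved in a shared block."""
    if not attrs or vertex_count <= 0:
        return None
    stride = attrs[0].stride
    base = min(a.offset for a in attrs)
    for a in attrs:
        # Interleaved attributes share one stride, and each element must
        # fit inside a single stride-sized vertex record.
        if a.stride != stride or a.offset - base + a.size > stride:
            return None
    # The last vertex only needs to reach the end of its furthest attribute.
    last_end = max(a.offset - base + a.size for a in attrs)
    return base, stride * (vertex_count - 1) + last_end


def upload(memory, attrs, vertex_count):
    """Copy the vertex data as one block, or return None so the caller
    can fall back to per-attribute processing."""
    block = interleaved_block(attrs, vertex_count)
    if block is None:
        return None
    start, length = block
    return bytes(memory[start:start + length])  # the single copy
```

When the check fails, the sketch simply reports that and leaves the slower per-attribute path to the caller; how the real renderer handles that case is not covered here.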

Research and second attempt

After getting feedback from ssshadow that the performance boost was not as expected, I sought to find out how drivers handle memory writes to external PCI-E devices. For this I turned to the Mesa open source drivers that I use on my PC and pored over the code. I confirmed the use of write-combined memory to improve throughput and went to work redesigning the rendering pipeline to make effective use of this. The simplified single writes to a small contiguous block already gave a good boost, so I tried reordering where the vertex upload happened to give it some time before the draw call requiring the data was issued. Memory writes were placed very early in the submission chain and other computations moved after this point but before the draw call. This way, the GPU could get the data before it is needed, improving efficiency of the renderer. With this change, I hit a GPU limit at ~25 fps which was 2.5x the initial test result and 10fps over the initial implementation. A driver update later brought the frame rate up to ~28fps but that is where my hardware maxed out.

Vulkan and disappointing performance

After the research was completed, I had a good idea of what to do to maximize throughput on Vulkan as well. I quickly applied the changes to Vulkan and refactored shared code between renderers to make future work easier. I then ran the same benchmark on Vulkan expecting 30fps but only got a +1 fps gain, from 14 fps to 15 fps. After measuring time spent in different parts of the pipeline I found another bottleneck that had been holding back Vulkan for the longest time – every frame was waiting for the previous frame to complete before beginning rendering. This is not a very efficient use of the graphics hardware or the API. Since the baseline framebuffer uses a double-buffered setup, there are two surfaces to write to. There is no need to wait on previous commands if they write to a different surface. A quick restructure later and the renderer was rewritten to support asynchronous frame processing.
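The benefit of not waiting on the previous frame can be seen in a toy timing model. This is only a sketch under simplifying assumptions (fixed, made-up CPU and GPU costs per frame, and a GPU that executes frames strictly in order); it does not use the Vulkan API and the function name is hypothetical. With one surface the CPU must wait for the previous frame before recording the next one; with two surfaces it only waits for the frame that last used the same surface.

```python
def total_time(frames, cpu_cost, gpu_cost, surfaces):
    """Simulated time to finish `frames` frames when the CPU may only start
    recording frame i once frame i - surfaces has finished on the GPU."""
    gpu_done = []  # completion time of every submitted frame, in order
    cpu_clock = 0
    for i in range(frames):
        if i >= surfaces:
            # Wait only for the frame that last wrote to this surface.
            cpu_clock = max(cpu_clock, gpu_done[i - surfaces])
        cpu_clock += cpu_cost  # record and submit commands for frame i
        # The GPU starts once the work is submitted and it is idle.
        gpu_start = max(cpu_clock, gpu_done[-1]) if gpu_done else cpu_clock
        gpu_done.append(gpu_start + gpu_cost)
    return gpu_done[-1] if gpu_done else 0


# Four frames, 2 time units of CPU work and 3 of GPU work each.
synchronous = total_time(4, 2, 3, surfaces=1)   # waits on the previous frame
overlapped = total_time(4, 2, 3, surfaces=2)    # double-buffered
```

In this model the synchronous case takes 20 time units and the double-buffered case 14, because CPU recording of one frame overlaps with GPU execution of the one before it. Real gains depend on how the CPU and GPU costs compare, as the benchmark numbers in this post show.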

Final results

Combining the two optimizations and running the same benchmark yielded about 24 fps. This was about 10fps higher than the baseline performance provided on master. I fired up another benchmark – Hellboy: The Science of Evil – and confirmed fps jumped from 3-5 fps before to 20-30 fps on both OpenGL and Vulkan. As an aside, the memory access behaviour of Intel integrated graphics gives 40 fps in that TNT Racers scene using the new OpenGL backend – almost double the Vulkan performance of a 270x. This shows that things can be better with more optimization.

Before screenshots from Hellboy: The Science of Evil:

After screenshots from Hellboy: The Science of Evil:

Before screenshot from TNT Racers with Vulkan:
 

After screenshots from TNT Racers. The first two from a dedicated GPU with Vulkan and OpenGL. The last from integrated Intel graphics.

Another side effect of unifying vertex processing and handling things correctly was that many games that did not have any visual output before started working almost immediately. The most notable cases here are Metal Gear Solid 2 and Metal Gear Solid 3. Shadow of the Colossus also got partially fixed, and there were improvements to other games such as Sleeping Dogs and the Yakuza games.

Other minor bugfixes were also added that fixed Ni No Kuni’s flickering graphics when using Vulkan as well as broken depth in that title. Unreal Engine 3 games also had broken color in intro cinematics and logos, which was fixed as well.

Things were looking good but there was one more problem – shader compilation stutter was making things unbearable.

Shader cache reorganization

RPCS3 has always had a JIT shader cache that slowly builds as the emulator runs. It was however very broken on Vulkan, causing programs to constantly rebuild. This manifested as microstutter, where the game did not feel smooth although the fps counter showed high fps. With the new more complex shaders, the stutter went from a few milliseconds per shader to several seconds in some cases. Even after fixing the microstutter on Vulkan due to bad key hashing, it still took unbearably long to build the internal cache into something useful for gameplay, interspersed with minute-long pauses in places. It was quickly apparent that something had to be done. As such I decided to have the shader cache dumped to disk and preloaded when the emulator was started up again. This significantly changed the feel of games run from within the emulator. A few workarounds were needed to make Intel work at all. It’s still not working 100% but improvements will be coming.
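The general shape of a cache that is dumped to disk and preloaded at startup can be sketched as follows. This is an illustration only, not the RPCS3 implementation: the `ShaderCache` class, the `.bin` file naming, the SHA-256 key and the `compile_fn` callback are all assumptions. The key is a stable hash of the shader sources, so the same pair of shaders always maps to the same entry across runs.

```python
import hashlib
from pathlib import Path


class ShaderCache:
    """Hypothetical disk-backed cache of compiled shader programs."""

    def __init__(self, directory, compile_fn):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.compile_fn = compile_fn  # slow step: (vertex, fragment) -> bytes
        self.programs = {}
        # Preload everything that earlier runs saved to disk.
        for path in self.directory.glob("*.bin"):
            self.programs[path.stem] = path.read_bytes()

    @staticmethod
    def key(vertex_src, fragment_src):
        """Stable key derived from the contents of both shaders."""
        digest = hashlib.sha256()
        for src in (vertex_src, fragment_src):
            data = src.encode("utf-8")
            # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
            digest.update(len(data).to_bytes(8, "little"))
            digest.update(data)
        return digest.hexdigest()

    def get(self, vertex_src, fragment_src):
        """Return the compiled program, compiling and saving it on a miss."""
        k = self.key(vertex_src, fragment_src)
        program = self.programs.get(k)
        if program is None:
            program = self.compile_fn(vertex_src, fragment_src)
            self.programs[k] = program
            (self.directory / f"{k}.bin").write_bytes(program)
        return program
```

A second `ShaderCache` created on the same directory finds the saved entries during construction, so `get` does not call `compile_fn` again for shaders seen in an earlier run. Whether a driver accepts precompiled program binaries, and in what form, is outside the scope of this sketch.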

Bugs and technical issues

While the changeset does improve the core rendering pipeline a lot with the new systems, it brings with it new baggage. First is the aforementioned increase in shader compilation times. The second, more serious issue is more relevant to NVIDIA users – high memory usage when the number of precompiled shaders rises. This issue was brought to my attention by a user on Discord who mentioned that Cemu works the same way and that NVIDIA users experience the same problem with memory usage going up very high. On RPCS3, this means you can see the emulator consume 5+ GB of RAM when compiling the shader cache. An interim solution for those with less RAM would be to clean the shader cache periodically until we find a suitable workaround. More information can be found here.

Lastly, Intel drivers (at least on Windows) have buggy GLSL generation. I have a workaround in place specific to Intel but it won’t fix all problems on that platform. More work is certainly needed.

What does this mean?

First, it means OpenGL is now within striking distance of Vulkan in terms of performance and faster in some cases. Vulkan however suffers more from the first-time shader compilation stutter. This will be improved in the near future, but for now, you may want to give OpenGL a try if the stutter is too distracting, or avoid the new nightly builds. Overall Vulkan is still faster now with the reworked framework. I have seen suggestions about disabling the shader cache – that won’t help with the stutter since it’s caused by the linking step that is done by the drivers. I would ask those negatively affected to be a little more patient as we work to resolve the issues.

Closing Words

If you want to help out and make RPCS3 progress even faster you can check out the Patreon page here. Right now we are coming closer and closer to the $3000 goal of having me, kd-11, work on RPCS3 full time which would greatly increase the rate of progress in the future.

Also, come check out our YouTube channel and Discord to stay up-to-date with any big news.

Progress Report: July 2017

July, like every month before it this year, set a new record in the number of improvements that happened. Since these mostly centered around bug fixes and compatibility improvements, it is safe to say that if every single improvement were to be covered in great detail this progress report would take more than a month to finish. Therefore the format is now going to change a bit. This report will focus on some major emulation improvements and explain what these entail in general. Thereafter a few select, more interesting games and how they were improved will be covered. Not every improved game will be covered because there were simply too many, and evaluation of earlier reports indicates that it isn’t interesting content either.

First of all are the compatibility database statistics for the month of July. Take note that the last database update was performed a day before the major emulation improvement known as “LLE gcm” was merged, meaning the hundreds of games improved from this are not listed in the figures below, or even on the compatibility database yet.


RPCS3 AppImages are now available for Linux!


Nearly six months after the Patreon launched the RPCS3 team have finally improved RPCS3 on Linux to the point where it has reached compatibility and stability parity with Windows. Thanks to the hard work by hcorion we can finally start to provide pre-compiled binaries in the form of AppImages for easy installation on your favorite distribution.

What Was the Issue?

There were a lot of problems. Back in January, quite literally nothing was working. RPCS3 would crash instantly upon booting any game, if the program itself would even start at all. Moreover, additional functionality like the debugger, framerate counter, and firmware installer were completely broken too. While many of these auxiliary issues were quickly identified and fixed, the fact of the matter was that almost every game would hang after running for a few seconds. This turned out to be much more difficult to fix. It was caused by several different bugs in thread synchronization which were fixed one by one over the past few months. Finally, one last relatively small commit in early April fixed the last bug and suddenly RPCS3 on Linux went from basically nothing to Demon’s Souls and Persona 5. Or so we thought, but we quickly found out that the LLVM recompiler was completely broken for a lot of users, who just got complete nonsense errors. We encountered strange and esoteric bugs and oddities about LLVM and how various Linux software, including the Mesa drivers, were using it. These problems made RPCS3 unusable for a lot of people. A lot of false flags and red herrings later the bugs were fixed not by changing any code, but by using rare compiler flags. RPCS3 on Linux is now working as intended for everyone, including AMD and Intel graphics users with modern Mesa. Even Vulkan with Mesa is now working!

PlayStation 3 Games on Linux

Below are some popular PlayStation 3 games showcased running on Linux. Performance is about the same as on Windows, perhaps even a few percent better in some very intensive games like Persona 5. But take note: these images were captured either on a laptop with a very old i7-2670qm CPU, or on a fast desktop with an i7-4770 CPU.


Persona 5 Is Now Playable in RPCS3

Last week the Persona 5 video seen below was posted showing how RPCS3 had been improved to fix various emulation issues with the game. The issue where Persona 5 had broken bloom and depth of field, making various parts of it have incorrect colors and also look blurry, was fixed. Additionally performance had been improved thanks to the reworked PPU recompiler a few days earlier.

While the game certainly looks playable in the video, it really wasn’t until today. Two critical issues persisted: the game would crash every time when starting a battle if the frame rate was higher than about 10-15 fps, and without opening the RPCS3 debugger and pausing a few special threads performance would suffer. But these problems are now fixed, and using the work in progress build linked below Persona 5 is playable.

Is Persona 5 Really Playable?

Yes, a few people have already beaten the entire 100+ hours long game in RPCS3 so essentially every issue is known to the developers, and all of the critical problems have been fixed. As the game looks more or less fine, performs more or less fine, and is provably beatable now without any strange hacks and workarounds we classify it as playable, but not perfect. One important aspect of this is that the Persona 5 engine is “framerate unlocked” so to speak, that is the game runs at full speed at almost any frame rate, like most PC games do for example. The actual interval where it runs at full speed is 15 – 60 fps, and for example an overclocked Haswell i5 can expect to play the game at about 10 – 20 fps in school and around Tokyo, and at about 20 – 30 fps in dungeons and battles.


Progress Report: June 2017

Another month, another colossal RPCS3 progress record. But first things first, take a look at this beautiful introduction to RPCS3 video that reznoire made:

Moving on to some statistics, the numbers alone this month are quite impressive, and that is not even counting games that improved from emulation improvements that are still work in progress.
Game Compatibility: Game Status

Game Compatibility: Monthly Improvements (June 2017)

Looking at the GitHub statistics, 16 authors have pushed 140 commits to the master branch. Here 353 files have changed and there have been 19,729 additions and 17,660 deletions. These numbers are much larger than normal, and several exciting changes are behind it. This month RPCS3 transitioned away from the previous GUI toolkit Wx to Qt, which in turn also led to several user-facing interface improvements. This separate report goes into significantly greater detail regarding what work has already been done on the GUI, and what work is planned to be done in the future. On a lower level, Nekotekina reworked the entire PPU LLVM recompiler to greatly increase its compatibility. kd-11 fixed various graphical issues affecting hundreds of games, most notably broken shadows and depth of field in various titles. Moreover kd-11 also enabled Vulkan on Linux and pushed support for it to the master branch. Nekotekina and Numan also worked on .sprx relocations in general and in the LLVM recompiler, which in simple terms is a compatibility and performance improvement.

The talented developer Jarves is working on a very important core improvement. It is not yet finished or merged to the master branch, but as work on this has been going on in public, and a lot of people have already tested it and submitted results this report will cover the work anyway. Jarves is working on “LLE gcm” which is huge and here is why:

LLE stands for Low Level Emulation. In RPCS3 this means that a PS3 operating system module is run directly as it is via lower level emulation methods. This is great for compatibility because it doesn’t involve much guesswork and is very accurate. As far as the games are concerned, they are working with the exact same operating system methods with the exact same implementations here as on a real PS3.

gcm or “graphics command management” is the part of the PS3 operating system responsible for creating and managing various graphics commands, including everything from how to set up vsync to graphics memory allocation.

LLE gcm is huge. A very core part of emulating games is now done exactly the same as on a real PS3, and the result is amazing. For example Red Dead Redemption no longer hangs almost instantly. The Last of Us doesn’t crash immediately and actually shows a loading screen and goes through a lot of initialisation successfully. Persona 5 no longer randomly crashes. Various high end games that did nothing before are now starting to boot, for example the Yakuza series where Yakuza 3 went from nothing to ingame, and Yakuza 4 and 5 went from nothing to loading screens. One could go on and on about what a substantial compatibility improvement LLE gcm is (and I will). Now combine this with the greatly improved LLVM recompiler and over 40 commits of graphics fixes and you have a month of quite insane progress. This would be a great segue into looking at improved games and such, but Jarves made one other important contribution that deserves its own section.


RPCS3 Has Moved to Qt

June has been an exciting month for RPCS3. Quite a few new features have been added to the emulator, with a healthy focus on the somewhat outdated GUI. This post will break down some of changes brought by Qt, as well as new features introduced since the transition. This mini progress report will only focus on GUI changes. The main progress report to be published tomorrow will cover the rest.

First up is, obviously, the actual transition from the user interface toolkit wxWidgets to Qt. This transition has been a long and time-consuming process, but it added a host of new functionality, both visible and not so visible. A non-exhaustive list of changes from the transition pull requests, the main one of which can be found here, is:

  • Appveyor and Travis now build with the Qt project, thanks to hcorion. This switches the nightly builds, and eventually the Linux builds, to use the Qt interface.
  • Made some design changes to the GUI, such as progress bars and taskbars now showing percent completion and slight improvements to the Vulkan/DX12 adapter selection box to make it easier to use.
  • Added support for layouts, which allows you to move your docked widgets as you please.
  • Added support for booting games in fullscreen.
  • Added a Welcome screen to point new users to the Quickstart setup guide.
  • Support for themes! An example can be found further down.

As with all major changes, this caused a few hiccups, but they are being worked on as they are found, and issues can be reported here.

This transition to Qt brought a wave of GUI improvements with it. First up was a recent games list, implemented in #2843 and #2847. This saves the last nine games launched in an easy to use list that has hotkeys, so that those with large libraries, or those using Blu-ray disc games, can easily launch their favorites. The list can be frozen, though items cannot yet be pinned.

Second was a new viewing mode and a tool bar. The game list can now be viewed as a simple grid:


Progress Report: May 2017


May was a very eventful month for RPCS3 as we saw significant core and performance improvements. The goal of this progress report is to highlight some of the more notable or interesting developments of the project. The report will start by showcasing a selection of games that were improved in one way or another. Thereafter we will summarize what work each contributor did this month.

Game Compatibility: Game Status

Game Compatibility: Monthly Improvements (April 2017)

kd-11 joined Nekotekina’s Patreon

You can read about this in more detail here.
The short version is that kd-11 is an extremely talented graphics developer who has helped RPCS3 since January of 2016. Without his work RPCS3 would not be here today. It goes the other way around too of course: without the significant core accuracy and performance improvements by Nekotekina we would not be here today either. Therefore it makes sense for two great minds to join forces. With the generous Patreon support kd-11 will be able to acquire new hardware for development and testing. On the short-term purchasing list is a modern NVIDIA GPU, a modern AMD GPU, and possibly a Skylake+ laptop of some kind. This would allow kd-11 to fix specific issues on these platforms. Not many people use RPCS3 with integrated Intel graphics but they are technically fast enough (for now at least). The problems lie in very buggy Intel graphics drivers, which is why it would help to have direct access to the hardware.

Check out this video below, it highlights some games that saw performance improvements from “the secret build” which was a huge general graphics rework, with focus in particular on Vulkan.


Progress Report: April 2017


April was a very eventful month for RPCS3. The goal of this progress report is to highlight some of the more notable or interesting developments of the project. The report will start by showcasing a selection of games that were improved in one way or another. Thereafter we will summarize what work each contributor did this month.

Since the last progress report, approximately 18 authors have made 104 commits, added 9,621 new lines of code and deleted 1,904 lines of code.

Game Compatibility: Game Status

Game Compatibility: Monthly Improvements (April 2017)

Persona 5

This is the last real PlayStation 3 game and it also happens to be highly popular. Let us take a look at a few things.

People often ask what kind of CPU is needed to run this game. The answer is that no CPU is truly fast enough right now, but if you enjoy playing games at 10 FPS or so, then feel free to get a CPU with a lot of physical cores. See the screenshot below for the motivation behind this statement. Of course in the future when RPCS3 performance is improved such an extreme CPU likely will not be required.


Progress Report: March 2017

March 2017 beat the previous record set by February 2017 as the most eventful month in the history of the project. So much happened that even this colossal progress report can only begin to scratch the surface.

First of all, lead RPCS3 developer Nekotekina reached the $1000 goal on Patreon, securing his long-term commitment to work on RPCS3 full time. This was the direct result of the massive amount of attention the project got when two popular games were drastically improved this month. First, Demon’s Souls went from crashing in zero seconds to going ingame and almost being playable. Second, the cult classic Catherine received significant performance improvements and now runs with practically perfect graphics and performance. The two videos of these games received over 200 000 views on the official RPCS3 YouTube channel. Even the famous YouTube personality TotalBiscuit was impressed with the progress, and the killing of the notorious Demon’s Souls boss Vanguard in RPCS3.

Of course the hundreds of videos posted by the community on YouTube, different forum posts, Reddit submissions, and so on, have contributed greatly to the massive growth of the RPCS3 community. In fact, new developer Inviuz aka Numan was one of these people who recently discovered RPCS3, spent a lot of hours reading the code and debugging Demon’s Souls, and finally got it to boot for the first time. Oh and Red Dead Redemption also went to the main menu thanks to this, and some graphics improvements by kd-11. This showcases one of the most important strengths of RPCS3 being free and open-source software and it is very likely that more and more people will join the project in the future and contribute changes both big and small.

Since the last progress report, approximately 17 authors have made 115 commits with 6,831 lines of code added, and 4,471 lines of code deleted. And some significant improvements are still in the pipeline but have yet to be merged.

The progress report is mainly split into three different parts. First we will take a closer look at what each pull request this month did, and show a few practical examples. Thereafter we will take a look at a selection of some interesting games that were improved this month, though this is just a small slice of the hundreds upon hundreds of games that received small or significant improvements this month. Lastly we will take a look at some of the upcoming changes before rounding off this monthly progress report.


Progress Report: February 2017

February 2017 was one of the most eventful months in RPCS3 history. Earlier this month we reached the first Patreon goal of $500, thus ensuring that Nekotekina can continue to work on RPCS3 full time for the time being. A total of 17 authors have pushed 127 commits to the master branch, with 9882 lines of code added and 6575 lines of code deleted. This represents several hundred, if not thousands, of hours of work on the project, and it really shows.

In this progress report we will take a look at what each person has been working on for the past month, and highlight some of the more noteworthy changes. This is however far from a complete list of contributions and improvements. Several people in the community have tested hundreds of games, reported several issues, made YouTube videos and supported people on Discord. Nekotekina, kd-11, and the rest of the RPCS3 team would like to thank everyone for their contributions to the project.
