Shader compilation stutter is nothing new to most emulator users, especially on RPCS3. However, it is worth clearing up some misconceptions about how RPCS3 shaders work. I’ll quickly go over the history of shader compilation on RPCS3 and hopefully explain why the shader compilation stutter appeared and why some people believe RPCS3 did not have shaders before.
Shader complexity and custom vertex fetch
In early 2017, I embarked on a task to remove the very expensive vertex preprocessing step from the CPU side of RPCS3. This basically meant implementing all those custom vertex types and vertex reading techniques in the vertex shader and providing only the raw memory view that the PS3 hardware would see. This greatly improved RPCS3 performance, more than tenfold in some applications. This change is what made RPCS3 usable for playing real commercial games at playable framerates without needing a HEDT system. However, the new fetch technique increased the size of the vertex shader and added a complex function to extract vertex data from the memory block. This made the graphics drivers take very long to link the programs, even without optimizations, likely due to the use of vector indexing, switch blocks and loops with dynamic exits. Extra operations including bitshifts and masking were also needed to decode the vertex layout block. The code runs very fast, but the linking step is very slow. A shader cache system already existed before, and if you ran an area for the first time, there was slight microstuttering that some users did not notice; it’s this stutter that got much worse. The solution to this: preload the shaders so that you don’t need to compile them next time. This led to the infamous “Loading Pipeline Object…” screen and the “Compiling shaders…” notification.
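As a rough illustration of what “decoding the vertex layout block with bitshifts and masking” involves, here is a minimal sketch in Python rather than shader code. The packed layout word, its bit positions, and the names `pack_layout` and `fetch_attribute` are all assumptions made up for this example; they are not the real RSX register format or RPCS3’s shader code. The only hardware fact relied on is that the PS3 is big-endian.

```python
import struct

# Hypothetical packed layout word (NOT the real RSX format):
#   bits 0-7   : stride in bytes between consecutive vertices
#   bits 8-15  : byte offset of the attribute inside one vertex
#   bits 16-19 : component count (1-4)
#   bits 20-23 : component type (0 = float32, 1 = unsigned normalized byte)
TYPE_F32 = 0
TYPE_UNORM8 = 1


def pack_layout(stride, offset, count, ctype):
    """Build the hypothetical layout word consumed by fetch_attribute."""
    return stride | (offset << 8) | (count << 16) | (ctype << 20)


def fetch_attribute(memory, layout, vertex_index):
    """Decode one attribute of one vertex straight from a raw byte block."""
    # Bitshifts and masks recover the fields of the layout word.
    stride = layout & 0xFF
    offset = (layout >> 8) & 0xFF
    count = (layout >> 16) & 0xF
    ctype = (layout >> 20) & 0xF

    base = vertex_index * stride + offset
    if ctype == TYPE_F32:
        # Big-endian floats, because the PS3 is a big-endian machine.
        values = list(struct.unpack_from(">%df" % count, memory, base))
    elif ctype == TYPE_UNORM8:
        values = [b / 255.0 for b in memory[base:base + count]]
    else:
        raise ValueError("unknown component type %d" % ctype)

    # Pad to four components with the usual (0, 0, 0, 1) defaults.
    defaults = [0.0, 0.0, 0.0, 1.0]
    return values + defaults[count:]


# Two interleaved vertices: 3 float position + 4 byte colour = 16 bytes each.
memory = (struct.pack(">3f4B", 1.0, 2.0, 3.0, 255, 0, 0, 255)
          + struct.pack(">3f4B", 4.0, 5.0, 6.0, 0, 255, 0, 255))
position_layout = pack_layout(16, 0, 3, TYPE_F32)
colour_layout = pack_layout(16, 12, 4, TYPE_UNORM8)

second_position = fetch_attribute(memory, position_layout, 1)
first_colour = fetch_attribute(memory, colour_layout, 0)
```

The CPU only hands over `memory` untouched; all interpretation happens at fetch time. The branch on the component type is the kind of dynamic control flow that, in a real shader, would plausibly contribute to the long link times described above.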
There are several challenges to tackling RSX shaders. First, the RSX is not a unified architecture like most programmers are used to today. It has separate vertex and fragment pipelines, each with its own ISA. They are also very limited, and larger or more complex programs can result in very messy binaries. One of the largest problems is that the bytecode itself does not contain all the information required to run the program; extra state is configured via registers as draw calls are passed in. A good example is that the TEX instruction does not differentiate between texture types, but a texture configuration register exists that allows the same program to read 1D, 2D, 3D, or CUBE textures as well as their shadow comparison variants. This means you can only know the generated shader once the texture register has been set up. There are other cases like this, which means the game has to set up the program environment before the program itself can be compiled.
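To make the consequence concrete, here is a minimal sketch of a shader cache whose key combines the program bytecode with the texture types taken from register state. Everything in it (`ShaderCache`, `program_key`, the string texture kinds, the generated declarations) is a hypothetical illustration, not RPCS3’s actual cache; shadow comparison variants are left out for brevity. The GLSL sampler type names are real.

```python
import hashlib

# Texture kinds a per-unit configuration register could select (illustrative).
SAMPLER_FOR_TYPE = {
    "1D": "sampler1D",
    "2D": "sampler2D",
    "3D": "sampler3D",
    "CUBE": "samplerCube",
}


def program_key(bytecode, texture_types):
    """Key a compiled shader by its bytecode plus the register state it needs."""
    digest = hashlib.sha1(bytecode)
    digest.update(",".join(texture_types).encode("ascii"))
    return digest.hexdigest()


def sampler_declarations(texture_types):
    """Emit GLSL-style sampler declarations, which differ per texture type."""
    return [
        "uniform %s tex%d;" % (SAMPLER_FOR_TYPE[kind], unit)
        for unit, kind in enumerate(texture_types)
    ]


class ShaderCache:
    def __init__(self):
        self.compiled = {}
        self.compile_count = 0

    def get(self, bytecode, texture_types):
        key = program_key(bytecode, texture_types)
        if key not in self.compiled:
            # Stand-in for the expensive driver compile and link step.
            self.compile_count += 1
            self.compiled[key] = sampler_declarations(texture_types)
        return self.compiled[key]


cache = ShaderCache()
bytecode = b"\x01\x02\x03\x04"            # the same program every time
as_2d = cache.get(bytecode, ["2D"])       # first compile
as_cube = cache.get(bytecode, ["CUBE"])   # same bytecode, new register state
again_2d = cache.get(bytecode, ["2D"])    # cache hit, no compile
```

Identical bytecode produces two different host shaders here, which is why the bytecode alone cannot be compiled ahead of time: the key is only complete once the game has written the texture registers.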
One of the most anticipated features has just been added to RPCS3! High resolution rendering allows users to play at resolutions far exceeding what the PS3 could handle. If you thought your favourite PS3 games were starting to look a bit dated, just wait until you get to experience them in up to 10k! Although we doubt many users will have the setup necessary to benefit from 10k today, emulation is all about preserving for tomorrow.
The video above showcases various popular titles with a side-by-side comparison from 720p (native PS3 resolution) to 4k rendering with 16x anisotropic filtering in RPCS3. The difference is quite incredible in titles that have high quality assets. There is a lot of detail that just wasn’t visible at 720p. This feature is available for every PlayStation 3 game running on RPCS3. The only exception is that it does not yet work with Strict Rendering Mode. Games that require this setting will have to wait a bit longer before they can benefit from high resolution rendering.
Rendering a modern PC game at high resolutions such as 4k, while beautiful, is quite taxing on your hardware, and there is often a massive hit in performance. However, since most of the workload for RPCS3 is on the CPU and GPU usage is low, there is a lot of untapped performance just waiting to be used. All processing is done CPU side, and as far as the GPU is concerned it is simply rendering 2006-era graphics (yes, the PS3 is 11 years old now). We’re happy to report that anyone with a dedicated graphics card that has Vulkan support can expect performance at 4k to be identical to performance at native resolution.
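To put the jump from 720p to 4k in numbers, here is a small worked calculation. It assumes “720p” means 1280x720 and “4k” means 3840x2160 (UHD); the post itself does not spell out the exact dimensions.

```python
def pixel_count(width, height):
    """Number of pixels the GPU has to shade for one full frame."""
    return width * height


native = pixel_count(1280, 720)   # assumed 720p dimensions
uhd = pixel_count(3840, 2160)     # assumed 4k (UHD) dimensions

scale_per_axis = 3840 / 1280      # 3.0: each axis is three times larger
pixel_ratio = uhd / native        # 9.0: nine times as many pixels per frame
```

Nine times the pixel work lands entirely on the GPU, which is the component described above as mostly idle, while the CPU-side emulation cost does not depend on the output resolution.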
Anisotropic Filtering (AF)
High resolution support wasn’t the only thing that was added in this update! Another reason for such a massive upgrade in visual fidelity is having full 16x AF support. This greatly improves how textures look, especially when viewed at an angle. Take a look at the screenshots of Ni no Kuni below as an example of the default AF vs forced 16x. The difference is especially noticeable on the ground inside the gate.
Ni no Kuni with default AF (left) and forced 16x AF (right)
Greetings. I am kd-11, graphics developer for RPCS3, with a mid-month update on the latest developments in the emulator.
As many are already aware, a lot has been going on lately with the new changes to the RSX (the PS3 GPU) emulation, dubbed the vertex rewrite. This change moves a lot of vertex processing duties from the CPU to the GPU, where they rightly belong, and as a result there are massive performance gains in geometry heavy scenes, especially with OpenGL but also with Vulkan.
Most if not all users are probably aware by now, but dedicated graphics cards exist on a physically separate board. This means data has to be moved to and from the card through the PCI-E bus, which is quite fast. However, while it is high-bandwidth, it is also high-latency. That means you cannot just send something over there and expect it to be immediately available for the next draw call. Instead, the GPU has to wait for data to be prepared and then be signaled that the data is ready for processing before drawing begins. This is a general simplification, but it helps illustrate the point. The RSX on the PS3 doesn’t work the same way, however. It has near direct access to the XDR main memory on a PS3 and ‘pulls’ data directly from main memory as though it were local memory. It is somewhat similar to integrated graphics memory in this case. That means data is not ‘pre-packaged’ for transport to the PS3 GPU, since the memory is virtually unified from the point of view of the RSX. When using Vulkan, drawing is not scheduled until the whole command queue is flushed, which mitigates the impact of the transfer since data will likely have been uploaded beforehand, but for OpenGL this was a big bottleneck.
The second issue was that the emulator was doing a lot of computation on the CPU to work out how to read vertex data from main memory, essentially pre-packaging the data into formats that are easy for GPUs to use. This is a very slow process and also very memory intensive (hence the ‘Working buffer not enough’ crashes). Enabling a debug overlay with the old method shows some games taking up to 200ms to prepare vertex data for one frame (Hellboy: The Science of Evil). This is obviously not optimal. The impact could be lowered by using more threads for vertex processing, but with the number of threads already needed to emulate the PS3’s multi-core processor, it was a problem. Spawning 8+ vertex processing threads reduced the time spent processing vertices, but caused other threads to starve, and performance would drop significantly. The solution was to shift the work to the GPU instead and not touch the data on the CPU in any way: just copy the data block and let the GPU fetch the data it needs for itself, mimicking the behaviour of the real hardware.
Nearly six months after the Patreon launched, the RPCS3 team has finally improved RPCS3 on Linux to the point where it has reached compatibility and stability parity with Windows. Thanks to the hard work of hcorion, we can finally start to provide pre-compiled binaries in the form of AppImages for easy installation on your favorite distribution.
What Was the Issue?
There were a lot of problems. Back in January, quite literally nothing was working. RPCS3 would crash instantly upon booting any game, if the program itself would even start at all. Moreover, additional functionality like the debugger, framerate counter, and firmware installer was completely broken too. While many of these auxiliary issues were quickly identified and fixed, the fact of the matter was that almost every game would hang after running for a few seconds. This turned out to be much more difficult to fix. It was caused by several different bugs in thread synchronization, which were fixed one by one over the past few months. Finally, one last relatively small commit in early April fixed the last bug, and suddenly RPCS3 on Linux went from basically nothing to Demon’s Souls and [redacted]. Or so we thought, but we quickly found out that the LLVM recompiler was completely broken for a lot of users, who just got complete nonsense errors. We encountered strange and esoteric bugs and oddities in LLVM and in how various Linux software, including the Mesa drivers, was using it. These problems made RPCS3 unusable for a lot of people. A lot of false flags and red herrings later, the bugs were fixed not by changing any code, but by using rare compiler flags. RPCS3 on Linux is now working as intended for everyone, including AMD and Intel graphics users with modern Mesa. Even Vulkan with Mesa is now working!
PlayStation 3 Games on Linux
Below are some popular PlayStation 3 games showcased running on Linux. Performance is about the same as on Windows, perhaps even a few percent better in some very intensive games like [redacted]. But take note: these images were captured either on a laptop with a very old i7-2670QM CPU or on a fast desktop with an i7-4770 CPU.
June has been an exciting month for RPCS3. Quite a few new features have been added to the emulator, with a healthy focus on the somewhat outdated GUI. This post will break down some of the changes brought by Qt, as well as new features introduced since the transition. This mini progress report will only focus on GUI changes. The main progress report to be published tomorrow will cover the rest.
First up is, obviously, the actual transition from the user interface toolkit wxWidgets to Qt. This transition has been a long and time consuming process, but it added a host of new functionality, both visible and not so visible. A non-exhaustive list of what the transition pull requests brought, the main one of which can be found here, is:
– AppVeyor and Travis now build the Qt project, thanks to hcorion. This switches the nightly builds, and eventually the Linux builds, to use the Qt interface.
– Made some design changes to the GUI, such as progress bars and taskbars now showing percent completion and slight improvements to the Vulkan/DX12 adapter selection box to make it easier to use.
– Added support for layouts, which allows you to move your docked widgets as you please.
– Added support for booting games in fullscreen.
– Added a Welcome screen to point new users to the Quickstart setup guide.
– Support for themes! An example can be found further down.
As with all major changes, this caused a few hiccups, but they are being worked on as they are found, and issues can be reported here.
This transition to Qt brought a wave of GUI improvements with it. First up was a recent games list, implemented in #2843 and #2847. This saves the last nine games launched in an easy to use list that has hotkeys, so that those with large libraries, or those using Blu-ray disc games, can easily launch their favorites. The list can be frozen, though items cannot yet be pinned.
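As a rough sketch of how such a list could behave, here is a small most-recently-used container capped at nine entries. The class `RecentGames`, its method names, and the interpretation of “frozen” as “ignores new launches” are assumptions for illustration only; this is not the code from the pull requests mentioned above.

```python
class RecentGames:
    """Hypothetical most-recently-used list; not RPCS3's actual implementation."""

    def __init__(self, capacity=9):
        self.capacity = capacity
        self.entries = []        # most recent launch first
        self.frozen = False

    def record_launch(self, path):
        if self.frozen:
            return               # assumed meaning of "frozen": no updates
        if path in self.entries:
            self.entries.remove(path)     # relaunch moves the game to the front
        self.entries.insert(0, path)
        del self.entries[self.capacity:]  # drop anything past the ninth slot


recent = RecentGames()
for number in range(10):
    recent.record_launch("game%d" % number)   # ten launches, only nine kept

recent.record_launch("game3")                 # relaunch an existing entry
recent.frozen = True
recent.record_launch("new_game")              # ignored while frozen
```

Keeping the newest entry at index 0 means a hotkey for slot N could map directly to `entries[N - 1]`, though how the real hotkeys are wired up is not described in the post.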
Second was a new viewing mode and a tool bar. The game list can now be viewed as a simple grid:
As you can see in the screenshot provided by user Talkashie (which also shows a custom-made theme!), the new toolbar allows for searching of large libraries, as well as directly filtering the list by game type. For example, you can filter out audio/video apps and home apps, or you can choose to view only HDD games. Thanks to user rutantan for creating the very nice button icons in the toolbar.