Welcome to the January 2019 progress report! This month saw significant improvements to the core components of RPCS3 such as the introduction of multithreaded cache compilation for the SPU LLVM recompiler, reimplementation of the graphics framebuffer management, overhaul of the audio backend and much more. We also saw RPCS3’s version bump to 0.0.6 to better showcase the state of the emulator.
In addition to the following report, further details of Nekotekina and kd-11’s work during January and upcoming contributions can be found in their weekly reports on Patreon. This month’s Patreon reports are:
Over at our forums, a few users and moderators have come together to acquire and test titles that have not been tested recently. Our developers have also been hard at work debugging niche issues that seem to prevent a few titles from progressing ingame. The results of their targeted efforts give us more accurate compatibility statistics, in which we can see a big decrease in the Intro and Loadable categories and a corresponding increase in the Playable and Ingame categories.
On Git statistics, there have been 8179 lines of code added and 5073 removed through 36 pull requests by 9 authors.
Major RPCS3 Improvements
Before we get started, here’s a video showcasing some of the major improvements made to RPCS3 from December 2018 to January 2019:
Multithreaded SPU LLVM Compilation (#5586)
In January, Nekotekina implemented multithreaded SPU cache compilation during startup as well as at runtime. This greatly improved startup times when using the SPU LLVM recompiler, especially for CPUs with high core/thread counts. In Red Dead Redemption, a title known for having an excessive number of SPU programs, the compile time of about 75,000 objects on a Ryzen R7 1700 @3.9GHz went from 12 minutes 25 seconds to just 1 minute 34 seconds! Users can expect much faster startup times when using the SPU LLVM recompiler, where slow startup was previously a big complaint. This improvement should also benefit the SPU ASMJIT recompiler, although there the gains would manifest as a more stable framerate as opposed to an increase in FPS.
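To give a rough feel for the idea, here is a minimal Python sketch (RPCS3 itself is written in C++, and this is not its code). `compile_program` and `compile_cache` are hypothetical names and the "compilation" is only a placeholder. Python threads will not actually speed up CPU-bound work, so the sketch shows only the structure of handing independent programs to a pool of workers. The `speedup` helper reproduces the arithmetic behind the Red Dead Redemption numbers above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def compile_program(program: bytes) -> str:
    """Hypothetical stand-in for compiling one SPU program.

    Returns a fake object name derived from the program's length and byte sum.
    """
    return f"obj_{len(program)}_{sum(program) % 256:02x}"


def compile_cache(programs, workers=None):
    """Compile every cached program on a pool of worker threads.

    Results are returned in the same order as the input programs.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_program, programs))


def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio between the old and the new compile time."""
    return before_seconds / after_seconds
```

With the timings quoted above, `speedup(12 * 60 + 25, 60 + 34)` comes out at a little under 8x.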
To implement the above feature, Nekotekina also improved adjacent logic such as SPU ubertrampolines and refactored the spu_runtime class. SPU ubertrampolines are a simple primitive performing a binary search over multiple program candidates in order to select the right one without knowing which one is currently loaded into memory. While the generated output was effective, it was simply not generated fast enough to meet RPCS3’s needs, even with the SPU ASMJIT recompiler. Hence, Nekotekina implemented SPU ubertrampoline generation using raw assembly. Also, spu_runtime was refactored into a common class unifying the code used for both the ASMJIT and LLVM recompilers in this regard.
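The binary search at the heart of an ubertrampoline can be illustrated with a short Python sketch. This is only an illustration under our own assumptions: the real thing is emitted as raw machine code, and the names `build_dispatch_table` and `dispatch`, as well as the tuple-of-words representation of a program, are made up for the example.

```python
from bisect import bisect_left


def build_dispatch_table(programs):
    """Sort (code_words, compiled_function) pairs by their code words.

    code_words is a tuple of integers standing in for the program's
    instructions; sorting makes the table binary-searchable.
    """
    table = sorted(programs, key=lambda entry: entry[0])
    keys = [code for code, _ in table]
    functions = [function for _, function in table]
    return keys, functions


def dispatch(table, loaded_words):
    """Pick the compiled candidate matching the code currently in memory.

    Returns None when no candidate matches, in which case a real runtime
    would have to compile the program first.
    """
    keys, functions = table
    index = bisect_left(keys, loaded_words)
    if index < len(keys) and keys[index] == loaded_words:
        return functions[index]
    return None
```

The lookup cost grows only logarithmically with the number of candidates, which matters for titles that ship thousands of SPU programs.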
In December 2018, kd-11 had aimed to merge two major sets of changes to the RSX emulation. The first was improvements to the framebuffer object (FBO) management and the second was improvements to double assignments, conditional execution and shader inputs, dubbed GT Fixes after the major series that benefited from them. While we’ve covered GT Fixes in last month’s report, this month we’ll be going over the FBO improvements, which had two major components: the removal of unnecessary framebuffer attachments for attachments that are likely unused, and the rewrite of memory inheritance transfers.
The first improvement reorganised the framebuffer object management to avoid touching memory (a technique used to cheaply track memory usage) if a command was set up such that the data would not have changed. This allows RPCS3 to avoid generating framebuffer attachments for attachments that are likely unused and prevents memory corruption when Write Color Buffers/Write Depth Buffers are not used, since we rely on attachment memory tags to watch for events. Previously, these corruptions affected multiple games such as Skate 3, where performance would abnormally drop below 1FPS while ingame and the emulator would crash when entering the skater edit screen.
Soon after, kd-11 rewrote the memory inheritance transfers to implicitly guarantee correct results even when Strict Rendering Mode is disabled, by invoking a memory barrier when actively reading from an unsynchronized texture. This simplified the memory transfer operations and allowed many games that previously required Strict Rendering Mode to display accurate graphics without the option enabled. Other notable improvements were a fix for a texture cache deadlock caused by incorrect internal size computations and an increase in the maximum number of compute invocations when using the Vulkan renderer from 120 to 1024. The latter was a result of manual benchmarking of various games, which demonstrated that it is entirely possible to trigger a very high number of readback operations during loading screens.
For those of you wondering why this pull request was not merged in December 2018 itself: as part of kd-11’s original plan, the FBO improvements were slated to precede GT Fixes. However, while these changes did fix a variety of issues across a plethora of games, they also caused a large number of games to regress in multiple ways, leading kd-11 to undertake a deeper analysis than what was initially envisioned. The main cause of these regressions was the large number of permutations possible in the framebuffer setup. This resulted in over 20 days of back and forth between kd-11 and our community of testers, spiraling into a cycle of reporting and squashing bugs. Thanks to the continuous testing of almost every game and kd-11’s persistence in fixing all the issues reported, FBO improvements became the most commented pull request in RPCS3’s history!
However, all efforts paid off as a plethora of games benefitted with improved graphics or fixed crashes. MotorStorm and MotorStorm Pacific Rift/3D Rift no longer crash frequently, Tekken Tag Tournament 2 shows accurate colors on clothes, Midnight Club: Los Angeles now renders rain effects, Saints Row IV now displays graphics when using the OpenGL renderer, Ratchet & Clank: QForce no longer suffers from the ghosting effect, Saint Seiya: Brave Soldiers has a fixed depth buffer and much more. While the below images are a small selection of improvements, check out the improvements video linked above to see the improvements in many other titles!
Not one to slow down, kd-11 quickly began working on a follow-up pull request with various fixes for issues that were identified from the FBO improvements. These fixes included rewriting the index buffer base offsets to preserve the use of u16 index buffers for consistency and reduced memory footprint, partially reverting changes made in the FBO improvements that affected asynchronous shader compilation, fixing red-blue color inversion when using the OpenGL renderer, avoiding a potential deadlock in RSX FIFO control, improving alpha-to-coverage transparency with a better approximation technique and rewriting the vertex attribute divisor logic.
These changes targeted various components of the RSX emulation and the resulting synergy was further improvement to graphical rendering and color accuracy. Midnight Club: Los Angeles saw tree shadows rendering for the first time and an improvement to foliage while MotorStorm games now display tire tracks accurately.
Audio Buffering & Backend Improvements (#5456)
When it comes to audio, RPCS3 has official support for 4 backends. On Windows, XAudio2 is available, while ALSA and PulseAudio are available on Linux. Finally, OpenAL is a cross-platform audio backend that works on both operating systems. While the first three backends saw improvements over time, the OpenAL backend was largely ignored and consequently was relegated to a Do Not Use category (much like DirectX 12). One of our contributors, ruipin, evaluated the current audio setup and the various issues plaguing it and decided to undertake a large-scale refactoring of how audio is handled by RPCS3.
Issues relating to audio always posed a challenge to debug as many issues faced by users were the result of insufficient CPU resources. Since threads relating to RSX, SPU and PPU have higher priority, audio threads are delayed during bottlenecks leading to crackling or complete loss of sound. So in cases where the games couldn’t hit the target FPS, audio issues were prevalent even when gameplay was tolerable. However, titles like NieR had stuttery audio even when running at full speed on an i9 9900K. This revealed a deeper issue of games not being able to deliver audio in a timely fashion even when sufficient hardware resources were available. The PlayStation 3 provides audio in fixed intervals of 5.3 milliseconds but due to various factors, games could take longer than this once in a while, causing noticeable crackle and stutter.
To address these issues, ruipin implemented two new features, audio buffering and dynamic audio period. Together, these features aim to avoid intermittent stutter in games and provide users a smoother gameplay. To better understand these features let’s take a look at what each does.
Audio buffering is a pretty straightforward affair, with the audio being buffered into memory and played with a configurable delay (default being 100 milliseconds). However, dynamic audio period is where the magic truly happens. With it enabled, if the audio logic detects that the game is falling behind on audio mixing, it will dynamically increase the audio period to give the game some extra time. On the other hand, if the game seems to have audio ready earlier than necessary, the audio logic will dynamically shrink the audio period in order to maintain the average audio buffer duration and consequently ensure synchronization. This can continue with (almost) no stutter as long as the buffer does not run out. The below clip showcases the benefits of these changes:
However, ruipin went a step further and also introduced a time stretching algorithm which can be used to smooth out crackles or stutter if the buffer fill level is falling dangerously low. This is meant to smooth out rare stutter caused by situations such as a background task kicking off or a loading screen using more CPU resources than usual. However, there is no pitch correction with this algorithm and hence, if a game constantly has issues maintaining a full audio buffer (common on low thread count CPUs), this option will cause a huge reduction in audio quality. For this reason, this option is disabled by default.
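To see why the lack of pitch correction matters, consider one simple way of stretching audio: plain resampling with linear interpolation. Whether RPCS3 does exactly this is an assumption on our part, and `stretch` is a hypothetical helper, but the sketch shows the trade-off: the same samples are spread over more output samples, so the sound lasts longer and its pitch drops by the same factor.

```python
def stretch(samples, factor):
    """Resample `samples` so the result is about `factor` times as long.

    Uses linear interpolation between neighbouring samples. There is no
    pitch correction: a factor above 1 plays the audio slower and lower.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = max(1, round(len(samples) * factor))
    if len(samples) == 1 or out_len == 1:
        return [samples[0]] * out_len
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        # Map the output index back onto a fractional input position.
        pos = i * last / (out_len - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

A factor of 1.1 turns 100 samples into 110, buying the game roughly 10% more time at the cost of an audibly lower pitch if it has to be applied constantly.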
On the other hand, audio buffering and dynamic audio period do not have such drawbacks and are enabled by default (on Windows) for all users to enjoy. However, these changes are not intended to be an end-all-be-all solution that magically fixes audio in games that do not run at full speed. Your CPU needs to be fast enough to drive a full audio buffer every 5.3 milliseconds (on average).
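The dynamic audio period described above can be sketched roughly as follows. This is a simplified illustration, not ruipin’s implementation: `next_period_ms` and the 10% `step` are hypothetical, and we assume blocks of 256 samples at 48 kHz, which is what yields a period of about 5.3 milliseconds.

```python
SAMPLE_RATE_HZ = 48_000   # assumption: output sample rate
SAMPLES_PER_BLOCK = 256   # assumption: samples mixed per audio period
BASE_PERIOD_MS = SAMPLES_PER_BLOCK / SAMPLE_RATE_HZ * 1000  # about 5.33 ms


def next_period_ms(game_is_late, buffered_ms, target_ms=100.0, step=0.1):
    """Choose the length of the next audio period.

    - Game missed its deadline: lengthen the period to give it extra time.
    - Game is on time but the buffer is below target: shorten the period
      so the buffer refills towards its average duration.
    - Otherwise: run at the nominal period.
    """
    if game_is_late:
        return BASE_PERIOD_MS * (1 + step)
    if buffered_ms < target_ms:
        return BASE_PERIOD_MS * (1 - step)
    return BASE_PERIOD_MS
```

As long as the lengthened and shortened periods balance out, the buffer hovers around its target duration and playback stays in sync.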
Apart from these features, the OpenAL backend was also revived and brought up to the same quality levels as the XAudio2 backend. The entire cellAudio thread was also extensively refactored for easier maintenance and improvement in the future. All audio backends now automatically expose a list of capabilities so that the cellAudio algorithm can decide how to drive audio depending on what is supported. While audio buffering is enabled by default on XAudio2 and OpenAL backends (with a buffer duration of 100 milliseconds), PulseAudio and ALSA do not currently support these new features. Since the ALSA backend is the default option on Linux, users are required to enable the OpenAL backend to take advantage of these new features. Finally, Megamouse jumped in and added all relevant options to the GUI for the features mentioned above. You can find these options under the Audio tab of the Settings menu.
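As an illustration of the capability idea, here is a small Python sketch. The `Capability` flags, the `BACKEND_CAPABILITIES` table and `can_buffer` are hypothetical names of our own; the only facts taken from this report are which backends currently support the new features.

```python
from enum import Flag, auto


class Capability(Flag):
    """Hypothetical feature flags a backend could advertise."""
    NONE = 0
    BUFFERING = auto()
    TIME_STRETCHING = auto()


# Support as described in this report: only XAudio2 and OpenAL
# currently offer the new buffering and time stretching features.
BACKEND_CAPABILITIES = {
    "XAudio2": Capability.BUFFERING | Capability.TIME_STRETCHING,
    "OpenAL": Capability.BUFFERING | Capability.TIME_STRETCHING,
    "ALSA": Capability.NONE,
    "PulseAudio": Capability.NONE,
}


def can_buffer(backend: str, user_enabled: bool = True) -> bool:
    """Buffering is used only if the user wants it and the backend supports it."""
    return user_enabled and Capability.BUFFERING in BACKEND_CAPABILITIES[backend]
```

The benefit of this design is that the audio logic never needs to know which backend it is talking to, only what that backend says it can do.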
In January, kd-11 announced that he would implement an On-Screen Keyboard (OSK) directly into the Native UI to allow users to enter text using the controller itself instead of reaching for an actual keyboard. However, a number of issues were identified with the existing Qt OSK dialog and Native UI that posed a challenge to the native On-Screen Keyboard implementation. Fortunately, our resident Quality-of-Life developer, Megamouse, decided to fix these issues to allow for a seamless integration of kd-11’s native OSK implementation.
The first set of improvements targeted bugs across various dialog systems in RPCS3 (Qt and Native UI):
- A regression from December’s controller configuration improvements that caused games to sometimes freeze when opening or closing Native UI interfaces was fixed.
- Next, titles such as Diablo 3 would assume that the dialog was accepted, even when it was actually cancelled. To address this, Megamouse ran tests on the PlayStation 3 to determine how the original hardware handled cancellations and implemented the same behaviour accurately in RPCS3.
- After this, there was an issue where the game would continue to recognise input from the controller even when the text input dialog was open. With the Qt implementation, this did not pose much of an issue as all text was fed with a keyboard while input to the game was given with a controller. But once the native OSK is implemented, games incorrectly recognising input that was meant for the OSK would cause unwanted issues for users. To address this, proper system dialog input interception was implemented to discard any input made to the game while a system dialog is active.
- A missing check for empty strings was implemented which fixed a crash when users tried to confirm an input dialog without entering any text. Some games such as Diablo 3 will now use this check to identify whether the string is empty and notify the user accordingly after closing the dialog.
- Finally, a few constants and checks were added to improve emulation accuracy.
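The input interception from the list above boils down to a simple routing rule, sketched here in Python. `PadRouter` and its attributes are hypothetical names used purely for illustration and do not mirror RPCS3’s actual classes.

```python
class PadRouter:
    """Routes controller input either to the game or to an open system dialog."""

    def __init__(self):
        self.dialog_active = False
        self.game_events = []
        self.dialog_events = []

    def on_input(self, button: str) -> None:
        if self.dialog_active:
            # The dialog consumes the input; the game never sees it.
            self.dialog_events.append(button)
        else:
            self.game_events.append(button)
```

While `dialog_active` is set, a press of the cross button confirms a letter on the keyboard instead of, say, making the character jump in the background.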
The above improvements fixed issues affecting multiple titles such as Lair, South Park: The Stick of Truth, Sengoku Basara 4, Diablo 3 and Godzilla. However, a number of other titles such as Skate 3 and NieR began to crash unexpectedly when interacting with game dialogs. Investigating these regressions, Megamouse discovered that these issues were not true regressions but were in fact various unimplemented features that were brought to the forefront thanks to the improved accuracy of the message dialog system. Once these missing permutations were pinned down, the features required to address these “regressions” were summarily implemented.
To aid in debugging and facilitate further improvements, Megamouse also improved the error code logging for all system dialogs and separated the existing OSK dialog code into its own class. This will also help the smooth implementation of the Native UI On-Screen Keyboard. Finally, the HLE implementation of the OSK dialog was improved by reimplementing state checks (open, closed, aborted) and making sure that data isn’t written to unallocated memory whilst passing the user input to the game’s buffer. This finally addressed a longstanding issue in titles such as Class of Heroes 2G.
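The kind of bounds check involved when handing text back to a game can be sketched as follows. This is a hedged illustration only: `write_osk_result`, the UTF-16 encoding and the two-byte terminator are assumptions made for the example, not a description of the actual cellOskDialog code.

```python
def write_osk_result(text: str, buffer: bytearray, max_chars: int) -> int:
    """Copy `text` into the game's buffer without ever writing past its end.

    The text is stored as UTF-16LE code units followed by a two-byte
    terminator. Returns the number of code units written (terminator
    excluded). Truncation may split a surrogate pair; a real
    implementation would need to handle that.
    """
    # Leave room for the terminator and respect the game's own limit.
    capacity = min(max_chars, len(buffer) // 2 - 1)
    if capacity < 0:
        return 0  # not even room for a terminator: write nothing
    encoded = text.encode("utf-16-le")[: capacity * 2]
    buffer[: len(encoded)] = encoded
    buffer[len(encoded): len(encoded) + 2] = b"\x00\x00"
    return len(encoded) // 2
```

However long the user’s input is, the write stays within the memory the game actually allocated.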
The third installment in the Skate franchise, Skate 3, is now fully playable! This title previously suffered from major slowdowns in the free skate mode but thanks to improvements made by kd-11 to the framebuffer management, this issue has now been resolved. Check out the performance of this title on RPCS3 in the video below:
Dragon Ball Z: Battle of Z
One of the last titles in the Dragon Ball series to grace the PlayStation 3, Dragon Ball Z: Battle of Z has now become playable! This title also benefited from kd-11’s framebuffer management improvements which fixed a regression that caused the game to crash at the title screen.
Ni no Kuni: Wrath of the White Witch
Ni no Kuni: Wrath of the White Witch is now completely Playable on RPCS3 at a good framerate even on mid-range systems thanks to recent improvements from Nekotekina and Eladash. While Ni no Kuni has run well on good systems for a long time, the recent graphical, performance and stability improvements have finally pushed it over the edge into the playable category.
No More Heroes: Heroes’ Paradise
While this console exclusive title had good performance and graphics, it suffered from a bug preventing users from being able to save their progress. Thanks to the recent improvements to save data handling in RPCS3, this issue has been fixed and No More Heroes: Heroes’ Paradise is now fully playable! A user from our forum, MilkManEX, managed to finish the game from start to end on RPCS3 without any major issues.
Onechambara Z: Kagura with NoNoNo!
Did you have your PlayStation 3 set-up in the living room? Were you denied the chance to hack down zombies whilst wearing a bikini because of this? Well then, your wait is over. This month, Onechambara Z: Kagura with NoNoNo! was found to progress ingame! While the game has stable performance and graphics, it does suffer from crashes during boss battles keeping it from being playable.
As is customary, here’s another weeb school/mecha/action/shooter title that is now fully playable on RPCS3.
Ragnarok Odyssey Ace
This JRPG previously suffered from significant graphical glitches keeping it from being playable. However, with recent improvements to RSX emulation, these glitches have been fixed allowing this title to be fully playable.
WWE All Stars
This critically acclaimed WWE spin-off is now fully playable! Previously, this title would crash on the loading screen before going ingame. With the recent stability improvements, these issues have now been fixed.
Absolute Supercars & Ferrari titles
Thanks to the RSX vertex base type improvements by eladash, multiple racing titles from Eutechnyx studio (Absolute Supercars, Ferrari Challenge: Trofeo Pirelli and Ferrari: The Race Experience) went ingame for the first time this month. However, these titles still suffer from a few graphical issues and low performance, keeping them from being playable.
Elevator Action Deluxe
Thanks to kd-11’s improvements, the last remaining bug affecting the HUD was fixed, allowing the game to be fully playable now!
Vampire Rain: Altered Species
Thanks to the recent stability improvements made to the emulator, this survival horror title went from Loadable to Ingame!
There have been numerous other pull requests merged during the month that couldn’t make it to the Major Improvements section. We have collected a list of all such improvements here, and attached a brief overview to each. Make sure to check out the links provided for them if you are interested, as their GitHub pages usually uncover further details, as well as the code changes themselves. To see the complete list of pull requests directly on GitHub, click here.
5453 – Changed the cache location to a dedicated folder. For the PPU module cache a new hash-based location was implemented similar to SPU cache. Also, the structure was changed to decrease size of each module resulting in a bump in the PPU cache version; Fixed a minor race condition in cellVdec and other misc. improvements;
5572 – Changed audio backend priority on Linux to ALSA. Also, fixed a few cases of error spam with cellMsgDialog on CELL_OK;
5586 – Implemented support for parallel compilation of SPU code, both at startup and runtime; Removed the obsolete “SPU Shared Runtime” option; Refactored spu_runtime into a common class for both ASMJIT and LLVM recompilers; Implemented SPU ubertrampoline generation in raw assembly for the SPU LLVM recompiler. See coverage in major improvements here;
5599 – Improved compilation speed and performance in a few games when using the SPU ASMJIT recompiler;
5614 – Fixed a rare crash in cellSaveData by retrying to move directories on FILE_ACCESS_ERROR.
5509 – Updated Vulkan descriptor pool init sizes to prevent descriptor pools from running out of resources with only 25% occupancy; VK_WHOLE_SIZE will not be used for the null resource bind since most drivers have uniformBufferRange limits that are much smaller; Clamps swapchain resources to allowable surface extents;
5565 – Rewrote the index buffer base offsets to preserve the use of u16 index buffers for consistency and reduced memory footprint; Partial revert of a change in the above pull request affecting asynchronous shaders; Fixed red-blue color inversion when using OpenGL; Avoid potential deadlock in RSX FIFO control; Reimplemented alpha-to-coverage transparency with a better approximation technique; Rewrote vertex attribute divisor logic. See coverage in major improvements here;
5612 – Set preference for the slower FIFO mode for VSync instead of Mailbox mode as the latter is broken on recent Nvidia drivers when using Vulkan on Windows. Also implemented fence timeouts to detect dead renderers if a crash happens in submit before vkGetFenceStatus can return relevant data.
5618 – Fixed a regression from the above pull request where game window couldn’t enter fullscreen mode on Linux when using Vulkan.
5540 – Allowed an invalid NV4097_NOTIFY context to pass execution, which addressed an incorrect exception thrown in Ni no Kuni: Wrath of the White Witch!
5522 – Allowed more than one SPU MFC list to execute in a tag group when one list is stalled, adding support for stalling and unstalling multiple lists in a tag group. With this improvement, RPCS3’s support for MFC list stalling matches that of an actual PlayStation 3;
5435 – Improved LR event handling of SPU GETLLAR, PUTLLUC and PUTLLC transactions resulting in higher accuracy. Also, reduced lifetime of vm::writer_lock in PUTLLC transactions for non-TSX CPUs, replaced cache line spinlocks with a new passive range lock in MFC PUT DMA transfers and other improvements which resulted in a significant performance increase in multiple titles for CPUs without TSX;
5568 – Patched the PPU main thread priority: in cases where the priority specified by the executable file is invalid, the default value is now used, as on an actual PlayStation 3. With this improvement, White Album 2 went from Loadable to Ingame!
5585 – Allowed vertex base type 0 to pass execution which was not allowed previously due to its unknown behaviour. With this mode correctly implemented, games such as PixelJunk Eden, Absolute Supercars, Ferrari Challenge: Trofeo Pirelli and Ferrari: The Race Experience were able to reach ingame!
5579 – Fixed spurious failure of try_rlock/wait methods in lv2 semaphore by restarting the atomic loop and waiting for condition confirmation instead of aborting prematurely on failed CMPXCHG;
5595 – Improved accuracy by making sys_ppu_thread_isjoinable return an error with the proper error code when trying to join the main thread;
5605 – Fixed potential crash in begin_occlusion_query() while closing games;
5502 – Improved compatibility of DECR mode by fixing the 512MB memory area allocation by sys_memory in 1MB pages mode. This fixed WSC Real 09: World Snooker Championship and allowed the game to progress ingame! However, further improvements are necessary for the DECR mode allocations to be fully accurate.
5493 – Implemented an “Exit game?” dialog that lets you abort the exit process and also warns about loss of progress. A similar dialog will also spawn when you boot a game while another game is running. The dialogs can be disabled via “Don’t show again” checkbox and re-enabled in the GUI settings. All other info dialog settings were added to the same location as well;
5491 – Improved the builtin debugger to now only allow hex values with length 8 in the “Go to address” pop-up. Also improved handling of the “No Thread” option to prevent some crashes;
5531 – Fixed the macOS Travis builds and upgraded Travis to use Xcode 10.1. Also, homebrew packages for macOS are now installed using travis.yml instead of build-mac.bash;
5528 – Made dock-widget title bars optional by adding a new checkbox to Menu & View;
5529 – Fixed an issue in cellGameGetSizeKb during installation of a game when a directory wasn’t found by returning size=0 instead of an error. Allowed Tom Clancy’s H.A.W.X 2 to progress from Loadable to Intro;
5548 – Added error code log messages to cellSaveData in order to improve debugging of issues relating to game saves;
5574 – Improved the game list by using the correct patch version for PS3 disc games. Previously, the game list only showed 1.0 for all disc games and the actual version was only found in game data entries;
5588 – Fixed an obscure error in the shader compilation dialog that displayed the wrong progress bar status. Also, set the error for the notorious underrun message to log level warning;
5587 – Changed the default location for all per-game custom configurations to config/custom_configs/ and modified their naming format to include the game ID as well such as config_ABCD12345.yml. This feature is backwards compatible with the old locations and old config files will still be considered but the new config files will take precedence over the old ones. Also, added the option to boot games either with their custom configurations or with the global configurations.
5456 – Refactored the audio backends and introduced audio buffering and time stretching which helps avoid intermittent stutter (currently only for XAudio2 and OpenAL). See coverage in major improvements here;
5539 – Fixed regressions caused in the above pull request.
5461 – Fixed issues with initial trophy set-up by accurately simulating the PlayStation 3’s behavior through a mix of forced granularity and timeouts for games that don’t check the progress while processing the trophy entries. This improved compatibility for games such as Diablo 3, Odin Sphere and Mahjong * Dream C Club;
5525 – Updated the Readme file to show Qt5Qml as a dependency when using Arch Linux;
5619 – Bumped the RPCS3 version to 0.0.6.
5558 – Fixed a bug where the GUI option for removal of shader cache, PPU cache and SPU cache incorrectly pointed to the old cache location.
5607 – Added appropriate styling to QDoubleSpinBox when using the Kuroi (Dark) and YoRHa stylesheets.
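For the curious, the lookup order for per-game configurations described under 5587 above can be sketched like this. It is a minimal illustration with hypothetical names (`find_custom_config`, `root`, `legacy_paths`); the old locations are passed in by the caller because this report does not spell them out.

```python
from pathlib import Path


def find_custom_config(root, game_id, legacy_paths=(), exists=Path.is_file):
    """Return the custom config to use for a game, or None if there is none.

    The new location, config/custom_configs/config_<GAME ID>.yml, is
    checked first so that it takes precedence over any old config file.
    `exists` can be swapped out to test the logic without touching disk.
    """
    new_path = Path(root) / "config" / "custom_configs" / f"config_{game_id}.yml"
    for candidate in (new_path, *map(Path, legacy_paths)):
        if exists(candidate):
            return candidate
    return None
```

If nothing is found, the emulator would simply fall back to the global configuration.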
We hope you liked this report and look forward to the next! If you would like to contribute to the project, you can do so either by contributing code, helping the community or becoming a patron. RPCS3 has two full-time developers working on it who greatly benefit from the continued support of the many generous patrons. In exchange, patrons also get special support over on our Discord server and get access to early updates directly from our lead developers. If you are interested in supporting us, consider visiting our Patreon page at the link below and becoming a patron, or join our Discord server to learn about other ways to contribute.
We’re always looking for dedicated writers to help us write these reports. If you have the skill, time and are willing to help, please apply here. Also, come check out our YouTube channel and Discord to stay up-to-date with any big news.
This report was written by HerrHulaHoop, DigitalDude555, Megamouse, eladash and GalCiv.