August has been an amazing month for RPCS3 as we crossed multiple new milestones. This month saw massive performance improvements to many AAA titles, accuracy and performance enhancements to SPU LLVM, support for C++ 2017, laying the foundation for macOS support and much more!
In addition to the following report, further details of Nekotekina and kd-11’s work during August and upcoming contributions can be found in their weekly reports on Patreon. This month’s Patreon reports are:
The Playable category has finally crossed the 1,000-title milestone! Considering that this time last year the Playable category held only a little over 400 titles, this truly demonstrates the amazing pace of development. For all other categories, we can see the metrics moving in the right direction, with the elusive Nothing category dropping by 1, leaving only 5 games in it. For a more detailed look, you can view the compatibility history page to see exactly which games had their status changed this month.
On Git statistics, there have been 7,086 lines of code added and 4,298 removed through 127 commits by 21 authors.
Major RPCS3 Improvements
SPU Realms (#4920)
This month Nekotekina merged a crucial update to the SPU LLVM recompiler, which fixed physics, sound and graphics anomalies in countless titles. The changes were twofold: on one side, they affected how the SPU’s MFC (Memory Flow Controller) reaches out to virtual memory that isn’t mapped by the SPU, and on the other, they enhanced the accuracy of handling extended floats while using SPU LLVM.
The former works by exploiting how such MFC commands are usually issued. By default, accessing non-SPU assigned virtual memory is actually quite expensive, but since both the commands and their parameters are known to the recompiler and remain constant, these transfer codes can now be simplified and made to execute much faster. This improvement is currently only enabled for TSX-compatible CPUs, though Nekotekina has mentioned that it might be carried over to non-TSX CPUs in the future, if found feasible.
The latter, however, is not a speed improvement but an accuracy improvement, meaning that when enabled, it does reduce performance. To put it simply, it modifies how extended floats (a simplified single-precision floating-point format used on the SPU) are handled, resulting in accuracy that outclasses the ASMJIT recompiler’s while maintaining comparable performance.
When SPU LLVM was first introduced in May, it had various stability issues which, at first glance, appeared to be unique to it, but were in fact also present with SPU ASMJIT in early 2017. These issues stemmed from the difference in how floating point calculations are handled by x86-64 architecture CPUs compared to the Cell’s SPUs. In order to appropriately handle such calculations, you had to simulate these calculations manually as done by the precise SPU interpreter. While this resulted in virtually perfect accuracy, it also proved to be quite expensive and degraded performance significantly. To mitigate them on SPU ASMJIT, kd-11 had implemented a quick workaround, which, by effectively discarding certain results (treating infinities and NaNs as zeroes), fixed most of these stability issues that were present without the severe performance degradation. However, as this workaround was not an accurate solution, certain problems still remained.
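As a rough illustration of the workaround described above (treating infinities and NaNs as zeroes), a minimal C++ sketch might look like the following. The function names `flush_non_finite` and `filtered_multiply` are hypothetical and are not taken from RPCS3; the real recompiler applies this idea to generated machine code rather than to a helper function.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not RPCS3 code): replace any non-finite result
// with zero, mirroring the "discard certain results" idea.
inline float flush_non_finite(float value)
{
    // std::isfinite is false for +inf, -inf and every NaN.
    return std::isfinite(value) ? value : 0.0f;
}

// Hypothetical example of an arithmetic result passing through the filter.
inline float filtered_multiply(float a, float b)
{
    return flush_non_finite(a * b);
}
```

The filter is cheap, but it throws away information: a result that overflowed becomes indistinguishable from a genuine zero, which is one way to see why such a shortcut cannot be fully accurate.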
With the introduction of SPU LLVM and the massive performance benefit it brought, an opportunity to implement a new solution was found. Through the use of LLVM, it is possible to “adjust” the x86-64 architecture’s method of handling float calculations to match the Cell’s SPUs’. While this method still entails a performance penalty, the new implementation provides us with accuracy benefits similar to the precise SPU interpreter’s, while also enabling performance levels comparable to the current ASMJIT SPU recompiler.
The results of this improvement were clearly evident, as it addressed a wide variety of crashes and physics issues that users had encountered previously. A few known examples include fixing: muteness in Mirror’s Edge, Folklore and Guilty Gear Xrd REV 2; falling under the map in GTA IV; crashing when breaking boxes in Drakengard 3; crashes on shooting barrels, boxes and other destructible objects in Shadows of the Damned; ragdoll and throwable physics in Uncharted 1; invisible body parts / broken character models in Infamous 1 and Captain America; some graphics errors in GoW III; and many more.
Because this improvement brings accuracy at the cost of performance, it has been retained as a toggleable option in the CPU tab of the settings menu. Users who wish to make use of this accuracy improvement can enable the “Accurate xfloat” option. Please do note that this option is only available when using the SPU LLVM recompiler.
AAA Graphics Fixes (#4973)
In the middle of August, kd-11 made major improvements to RPCS3, which most notably almost doubled performance in Uncharted: Drake’s Fortune, and fixed the majority of God of War III’s remaining graphical issues. If you haven’t already, watch the showcase video below which covers most of the notable changes.
But how was this done? Well, let’s start with the performance improvements. The biggest cause of poor RSX performance was synchronization between the Cell processor and RSX and, less notably, synchronization between RSX units. kd-11 began by testing a beloved classic, Uncharted 1, with all synchronization disabled and a forced resolution of 2560×1440. During these tests he noticed nearly double the performance, which clearly indicated a synchronization bottleneck.
After carefully identifying the major types of synchronization barriers on PS3, kd-11 set about eliminating unnecessary stalling in the emulator and finished up with a massive ~70% RSX performance improvement. This greatly improved performance in games that had an RSX bottleneck, which is why Uncharted: Drake’s Fortune saw such a massive performance improvement.
To check your RSX usage in a game, simply enable the performance overlay. If your RSX usage is very high, it’s probably safe to say that this change would’ve given you a bump in performance.
But the performance improvements didn’t stop there. kd-11 continued by making significant optimisations to RPCS3’s ZCULL synchronization tuning: arbitrary ‘cycles’ were replaced with real-world microsecond measurements to give more consistent results, and the algorithm was tuned to avoid unnecessary work until RPCS3 is certain the renderer has finished blocking tasks. This results in a major performance improvement in games with heavy ZCULL usage. Most notably, Skate 3 received a performance increase of around 70%, bringing it even closer to becoming playable!
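To illustrate the general idea of budgeting by elapsed time instead of by a loop or ‘cycle’ count, here is a minimal C++ sketch. The `sync_budget` type and the 500 µs figure in the usage note are hypothetical and are not RPCS3’s actual ZCULL code or tuning values; the sketch only shows why a wall-clock budget behaves the same on fast and slow hosts.

```cpp
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Hypothetical policy object: an expensive synchronization is deferred
// until a wall-clock budget has elapsed since the query was issued.
struct sync_budget
{
    std::chrono::microseconds budget;

    // Returns true once at least `budget` has elapsed since `issued`.
    bool expired(clock_type::time_point issued, clock_type::time_point now) const
    {
        return (now - issued) >= budget;
    }
};
```

A caller could construct `sync_budget{std::chrono::microseconds(500)}`, record `clock_type::now()` when a query is issued, and only force a synchronization once `expired` returns true. A cycle counter, by contrast, would expire sooner or later depending on how fast the host happens to spin.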
Moving on to rendering improvements, kd-11 implemented fixes for framebuffer memory tracking. This includes write cascading and memory aliasing contention, where an address is used as both a color target and a depth target at the same time, with the contents being determined by the state of configuration registers. As this algorithm gets more and more accurate with every game that is seen to violate some rules, we get closer to perfect graphics in many AAA titles. Also, the “yellow filter” effect affecting a few games, such as the Prince of Persia series, when using the Vulkan renderer has now been fixed.
We’re sure there are plenty of other games not mentioned here that benefit from these improvements. If you find games that have improved or regressed, please notify us via the appropriate channels.
Mouse Movement Binding (#4957)
After careful tweaking throughout the month, Megamouse finally implemented mouse to controller button binding! This is huge news to everyone who, for example, wanted to look around or aim in games the way they usually do in PC titles! While making the mouse-to-controller mapping possible, he also vastly improved on the ‘Basic’ mouse handler, which passes the user’s mouse directly and lets the game handle it natively, if supported. Some titles such as Unreal Tournament 3 allow natively using the mouse this way.
While these improvements do make virtually any game compatible with a mouse, they are not without fault. Since console games are usually (not-so-surprisingly) not designed with a mouse-controlled camera in mind, oddities may appear. As game developers usually utilize custom deadzones and custom acceleration multipliers, Megamouse made it possible to tweak these variables with a handful of keyboard shortcuts, which can be found in the PR’s description. By fine-tuning these parameters, some games can reach near-perfect mouse-based camera control!
This, however, also means that end-users will have to find the best settings on their own, matching both their games’ internal settings and their mouse’s sensitivity. Do note that with the sticks’ maximum speed value being capped at 255, and games usually utilizing other techniques (such as directly changing the camera angle on reaching certain locations or auto-following the character), many titles will simply never respond the same way a regular PC game would with native mouse support. These games will either need a more robust solution or mods for this functionality to be ironed out further.
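To make the interplay of multiplier, deadzone and the 255 cap more concrete, here is a small C++ sketch of how a mouse delta could be turned into a stick axis value. Everything in it is an assumption for illustration: the `mouse_stick_config` struct, its default values, the choice of 128 as the centred position, and the way small movements are pushed past the deadzone are not taken from Megamouse’s implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Hypothetical tuning values; in practice these are adjusted per game.
struct mouse_stick_config
{
    float multiplier = 4.0f; // scales raw mouse movement
    int deadzone = 20;       // minimum deflection so the game registers motion
};

// Converts a signed mouse delta (pixels since the last poll) into an 8-bit
// axis value, assuming 128 is centred and 0 / 255 are the two extremes.
inline std::uint8_t mouse_delta_to_axis(int delta, const mouse_stick_config& cfg)
{
    if (delta == 0)
        return 128; // no movement: the stick stays centred

    const int magnitude = static_cast<int>(std::lround(std::abs(delta) * cfg.multiplier));
    // Push small movements past the game's deadzone so they are not ignored.
    const int deflection = std::max(magnitude, cfg.deadzone);
    const int direction = delta > 0 ? 1 : -1;
    // Saturate at the ends of the 8-bit range.
    const int axis = std::min(255, std::max(0, 128 + direction * deflection));
    return static_cast<std::uint8_t>(axis);
}
```

The saturation at the end is the important part: once the axis hits 0 or 255, moving the mouse faster cannot turn the camera any faster, which is why a stick-emulated mouse never feels quite like native mouse support.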
As far as current compatibility goes, there are currently two modes with which the user can hook up their mouse to a game. One is the aforementioned native ‘Basic’ mouse handler (which you can find in the I/O tab of the Settings menu), and the other is setting the Mouse Handler type to ‘Null’, then binding the mouse movements to the right stick instead. For this, you’ll need to click on one of the Right Analog’s directional buttons (in the Pad settings), then click-and-hold, while moving the mouse in the appropriate direction. Do this for all four buttons and you’re set. (Just make sure not to mix the mouse to controller mapping with the ‘Basic’ handler, as they will interfere.) Or alternatively, you can also map the mouse to other buttons, making, let’s say, an entirely new Souls experience possible!
This month, a user named @kvark from the gfx-rs team hinted on our GitHub that he was working on porting RPCS3 to macOS. Previous attempts at porting the emulator to macOS were unsuccessful, as Apple’s proprietary graphics API, Metal, suffered from various technical limitations. Also, such a port would require significant time and effort to develop and maintain a third graphics backend, which would be exclusive to a single platform.
However, with the introduction of Vulkan Portability libraries such as MoltenVK and gfx-rs, it is now possible to efficiently map the Vulkan API to native APIs (Metal). Using portability libraries completely negates the need for RPCS3 to maintain a separate Metal backend and instead allows us to focus on improving the Vulkan backend, which benefits all platforms. However, while this addressed the hard limitations, more work was required to address the technical limitations.
kvark then began the work of allowing RPCS3 to use gfx-rs using the MoltenVK extensions. Once the Vulkan Portability code-path was open, it was time to address limitations. The first major technical limitations were the complete lack of hardware support for texture format swizzling in the Metal API and the absence of support for viewing texture buffers from shared memory. kd-11 worked around these limitations by implementing a software decoder for format swizzles and double-buffered heaps as a fallback path on Apple platforms.
The next hurdle was a supposed texture buffer size limit of 16K, which was simply too low for RPCS3. However, upon further investigation, it was found that the limitation was actually caused by the incorrect implementation of texture buffers in gfx-rs (the buffers were assumed to support only small 1D textures and not large heaps). As this issue was addressed on their side, kd-11 improved RPCS3’s check for the viewable heap size of texture buffers, which was previously set to a constant 64M. The porting of RPCS3 to macOS proved particularly beneficial to gfx-rs, as they were able to identify and fix many bugs throughout the process. Once all these issues were addressed, the first signs of RPCS3 running on macOS were visible!
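For readers unfamiliar with the term, a format swizzle is a reordering of a texel’s channels, and a software fallback simply performs that reordering on the CPU before the data is uploaded. The C++ sketch below shows the idea for tightly packed 4-byte texels; `swizzle_map` and `apply_swizzle` are hypothetical names, and the sketch makes no claim about how kd-11’s decoder is actually structured.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical remap description: output channel i is read from
// input channel source[i] (each entry is expected to be 0..3).
using swizzle_map = std::array<std::uint8_t, 4>;

// Reorders the channels of every complete 4-byte texel in place.
inline void apply_swizzle(std::vector<std::uint8_t>& texels, const swizzle_map& source)
{
    for (std::size_t i = 0; i + 4 <= texels.size(); i += 4)
    {
        // Copy the texel first so that writes do not clobber pending reads.
        const std::array<std::uint8_t, 4> in = {texels[i], texels[i + 1], texels[i + 2], texels[i + 3]};
        for (std::size_t c = 0; c < 4; ++c)
            texels[i + c] = in[source[c] & 3]; // mask keeps the index in range
    }
}
```

Doing this on the CPU costs time on every upload, which is why it makes sense only as a fallback where the graphics API cannot do the remap in hardware.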
While this is a major step towards bringing RPCS3 to more platforms, the macOS port still has quite a way to go. At the current stage, the PPU and SPU recompilers do not work and users will have to use the respective interpreters. More work and research is required to allow the recompilers to work on macOS. Additionally, continuous builds for macOS are not yet available, as Apple only offers certain necessary C++ features in their latest beta OS.
Army of TWO series
Army of TWO and Army of TWO: The 40th Day both became playable this month. These titles improved significantly with the introduction of the recent Asynchronous Shader Implementation made by kd-11. One user from our discord, Cheat_Codes_On_Life, finished Army of TWO: The 40th Day on RPCS3 and even played most of it in split-screen!
While Yakuza Kenzan!, a fan-favorite PlayStation 3 title, is still present in the Ingame category, MsLow, a user from our discord, managed to finish the game from start to end on RPCS3 without any crashes or game-breaking bugs. However, it is still not categorised as playable due to micro-stuttering present even with high-end CPUs (i7-8700K @ 4.5GHz) and graphical issues such as exploding vertices.
Nonetheless, it allows us to appreciate how far RPCS3 development has progressed to be able to emulate such AAA titles with remarkable performance. As a celebration of his feat, MsLow made a fan trailer of the game which you can check out below!
Teenage Mutant Ninja Turtles: Turtles in Time Re-Shelled
Thanks to kd-11’s “AAA fixes” this month, the remaining graphical issues in the pirate ship stage have been fixed, moving this title from ingame to playable!
Supersonic Acrobatic Rocket-Powered Battle-Cars
This PlayStation 3 exclusive, which served as the prequel to the famous Rocket League, is now fully playable!
Kamen Rider Battride War
This console exclusive has had a rocky history as it managed to get ingame early on, but froze during a particular mission. However, this game has now become fully playable and works extremely well! Do note, strict rendering mode needs to be enabled for enemies to render correctly.
50 Cent: Blood on the Sand
Thanks to our testers, 50 Cent: Blood on the Sand was found to be playable this month on RPCS3. You must fire your gun when you start a level or else your character won’t be able to move, but other than that the game runs well.
Sengoku Basara 4: Sumeragi
Although this console exclusive has managed to get ingame in the past, it had suffered from various issues since the beginning of 2018. However, with these issues fixed it is now fully playable with only a few minor graphical glitches!
GTI Club+: Rally Côte d’Azur
This PlayStation 3 exclusive did not progress for quite a long time, but finally manages to go ingame! However, the unstable frame rate and stuttering audio are keeping this title from being considered playable.
Monster Jam: Path of Destruction
Thanks to elad335’s improvements to RSX method registers, this console exclusive now finally goes ingame! As seen in the image below, the game does suffer from low performance and some fairly obvious graphical issues that keep it from being playable.
Tears to Tiara Gaiden: Avalon no Nazo
This console exclusive now manages to go ingame and with a stable FPS. However, as seen below, it does suffer from graphical issues keeping it from being playable.
Tomb Raider: Underworld
Although not a console exclusive, the classic Tomb Raider: Underworld now goes ingame with seemingly stable FPS and correctly rendered graphics. Further testing will reveal whether it is playable or not.
As always, this may not be a complete list of PRs or commits, nor does it necessarily list every single thing done by a given PR. For a full list, see PRs merged in August.
4920 – SPU Realms, see coverage in major improvements;
4975 – Allows sys_fs_open to identify and run split internal game files. This fixed a common issue faced by users when dumping games using multiMAN as it would split the files which were larger than 4GB. Also improves PS3 userspace memory allocations by correcting the base allocation addresses;
4987 – Fixed a regression to sys_vm_memory_map from the above pull request;
5002 – Minor improvements to virtual memory emulation;
5031 – Various coding improvements to support C++ 2017. The minimum compiler requirements for Linux are consequently now GCC 7.3+ or Clang 5.0+, with Travis CI using the GCC 8 compiler;
5033 – Fixed a few bugs introduced by the above PR;
5041 – Improved vm::stack by allowing 4k-aligned allocations. Also, improves split files support in sys_fs_stat.
5008 – For the Vulkan renderer, newer Nvidia drivers do not expose ‘Immediate present mode’ of VSync without enabling the option in the Nvidia control panel. To work around this deficiency, fallback preference is given to ‘Mailbox present mode’ over ‘FIFO relaxed present mode’ to preserve performance closer to true ‘Immediate present mode’;
5013 – Improved the RSX texture cache implementation of handling false positives. This fixed the flickering bug noticed in Demon’s Souls and other titles!
5024 – Improve portability of the Vulkan renderer to macOS by implementing a software decoder for format swizzles and fallback paths for buffer types that may not support host access. See coverage in major improvements here;
4947 – Allowed the detection of RSX FIFO stack overflows and invalid commands;
4966 – Improves the speed and accuracy of RSX address translation by using an offset table instead of searching for corresponding mapping entries to the address given for every translation. Also, fixed virtual memory management regression caused by PR 4795;
5022 – Fixed a regression when the corresponding translated RSX address to the given one was negative;
4995 – Improved consistency of displayed address and fixed branch commands shown on the RSX debugger;
5038 – This laid the groundwork for treating unknown values written to the RSX method registers, by specifically implementing the behavior when an unknown cull face mode is set. This ‘feature’ is used by a lot of games. However, it is currently unknown how each register behaves when writing invalid values into it. This improvement allowed Monster Jam: Path of Destruction to finally go ingame!
5048 – Use cellGcm’s (PS3 GPU’s API) default register values instead of overriding them and doing the same for every flip command. This implemented all the missing default values, by using the ones from the original source, while also removing the reset of registers that should not be reset. Fixed God Of War 3’s intro scenes and Conan’s loading screens;
5054 – Fix typos in the register offsets from the above Pull Request.
5049 – Bump CMake requirement to 3.8.2+ in readme on account of C++ 2017 support.
5046 – Add a comma to readme description.
5043 – Fixed Git Revision string to always show 8 characters.
4676 – Improved cellMusic, cellMouse, cellKb and cellPad accuracy. These improvements addressed an issue where the emulator would fail to detect the controller, thereby allowing TMNT: Mutants in Manhattan and The Legend of Korra to move from Nothing directly into Ingame!
4977 – Added “Accurate xfloat” option to the CPU tab;
4953 – Reduced lag when using cellMouse and addressed an issue which prevented mouse movements in fullscreen mode;
4998 – Minor Fix to stylesheet warnings;
5018 – Fixed bug with basic mouse input while switching between applications while in fullscreen mode;
4749 – Improvements to cellGame by implementing cellHddGameExitBroken and cellGameDataExitBroken as well as adding additional parameter checks to cellGameExec. Also, improvements to cellMsgDialog by moving cell message dialogs to their own function and simplified cellGameContentErrorDialog;
4957 – Implemented Mouse Movement for keyboard pad handler. Users can now map the right analog stick to the mouse!
4970 – Addressed texture cache bugs in the overlaps_page, get_intersecting_set and invalidate_range_impl_base methods. In overlaps_page, the bug caused a few textures to be re-uploaded unnecessarily; get_intersecting_set would skip the first cache entry due to an off-by-one error; and invalidate_range_impl_base failed to mark some unprotected textures as dirty in certain cases. Also, get_intersecting_set was refactored to avoid unnecessary looping.
4930 – Improved documentation of dependencies in the readme.
4899 – Compilation fixes for Mingw64 (MSYS2).
4886 – Updated names of LV2 functions.
4871 – Fixed a bug in a few games where only trophies installed are called when the trophies are already in place. However, this solution has caused a few regressions among other games. A more robust solution is under development.
4787 – Fixed overflow in PPUThread stack frame dump which caused the debugger to randomly crash when open for an extended period of time.
4546 – Fixed a few compilation errors with VS2017 (v141 build tools).
4467 – This PR includes some of the initial work in reverse engineering cellMic, the module responsible for microphone support. While these changes don’t receive input from an actual mic, they allow games that require microphone input to progress further.
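The offset-table approach mentioned for PR 4966 in the list above can be sketched in C++ as follows. This is only an illustration of the technique (one array lookup per translation instead of a search through mapping entries); the `offset_table` class, the 1 MiB page size and the 32-bit address space are assumptions and do not describe RPCS3’s actual data structures.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: 1 MiB pages over a 32-bit address space,
// which gives 4096 table entries.
class offset_table
{
    static constexpr std::uint32_t page_shift = 20;
    static constexpr std::uint32_t page_mask = (1u << page_shift) - 1;
    static constexpr std::int32_t unmapped = -1;

    std::array<std::int32_t, 4096> m_pages; // destination page index, or -1

public:
    offset_table() { m_pages.fill(unmapped); }

    // Records that source page `src_page` maps to destination page `dst_page`.
    // Both indices must be below 4096; .at() throws std::out_of_range if
    // src_page is not.
    void map(std::uint32_t src_page, std::uint32_t dst_page)
    {
        m_pages.at(src_page) = static_cast<std::int32_t>(dst_page);
    }

    // Translates `address` with a single table lookup. Returns false and
    // leaves `out` untouched if the page is not mapped.
    bool translate(std::uint32_t address, std::uint32_t& out) const
    {
        const std::int32_t page = m_pages[address >> page_shift];
        if (page == unmapped)
            return false;
        out = (static_cast<std::uint32_t>(page) << page_shift) | (address & page_mask);
        return true;
    }
};
```

The trade-off is a fixed 16 KiB of table memory in exchange for constant-time translation, which matters when the translation runs for every address the emulated GPU touches.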
Contributor @ruipin has been working on a significant refactor of the texture cache part of the codebase since last month, which will make the code much more maintainable and a bit more efficient, while also extending the debug capabilities significantly. His work enables kd-11 to focus on more patron/end-user-facing features, and will aid the project by providing extra functionality for catching certain types of regressions long before they even hit the PR stage.
This refactor has been a long time coming, and with ruipin taking it up, both he and kd-11 have already profited from it by locating deadlocks and other niche bugs in several titles. Due to the nature of a significant refactor, however, it still requires further careful testing and fixing before it can be merged. You can follow the progression of this PR here.
If you like in-depth technical reports, early access to information, or you simply want to contribute, consider becoming a patron! All donations are greatly appreciated. RPCS3 now has two full-time coders that greatly benefit from the continued support of over 800 generous patrons.
This report was written by Asinine, elad335, HerrHulaHoop, KoDa and nitrohigito.