Welcome to November’s Progress Report! As 2018 now comes to a close, we hope you’re enjoying the holiday season. In this report, we will be detailing kd-11’s work on rewriting the FIFO and draw call processing, which provides a noticeable improvement in performance in quite a few games. This month also saw the return of another long-time contributor, GalCiv, whose contributions fixed a large number of regressions in multiple games. We also saw exclusive titles such as Gran Turismo 5 and MLB: The Show 16 go ingame for the first time on RPCS3. We’ve got lots to share so let’s jump straight into it!
In addition to the following report, further details of Nekotekina and kd-11’s work during November and upcoming contributions can be found in their weekly reports on Patreon. This month’s Patreon reports are:
For the first time, the percentage of games in the Ingame and Playable categories has risen above 80%. This marks yet another milestone in game compatibility for RPCS3. As accuracy and performance improve, we will see this ratio improve even further! Thanks to the intensive testing done by our testers this month, 95 games were moved out of the Intro and Loadable categories. Also, multiple duplicate titles across all categories were identified and merged, improving the overall accuracy of the compatibility list. Finally, the elusive Nothing category has dropped to just 4 games. For a more detailed look, you can view the compatibility history page to see exactly which games had their status changed this month.
On Git statistics, there have been 4,229 lines of code added and 1,843 removed through 30 pull requests by 11 authors.
Major RPCS3 Improvements
RSX FIFO processing rewrite (#5315)
As anyone who has followed the RSX improvements in the past few months would know, kd-11 had placed a higher importance on bug-fixing and code cleanup in various areas of the graphics pipeline. While this exercise brought massive improvements to graphics, its true purpose was to lay the foundation for a major overhaul to the RSX rendering pipeline. Though smaller bug-fixes were necessary, kd-11 had surmised that to truly break past the performance limitations seen in certain AAA titles, there must be a significant rewrite to how RPCS3 handles draw calls and the FIFO command queue.
Before we step into what exactly was done, here’s a basic exposition to keep this section understandable. Objects in 3D computer graphics are rendered as a mesh, which is a collection of vertices, edges and faces that define the shape. A draw call is a command given by the CPU to the GPU to render one mesh. However, since the CPU and GPU process data at different speeds, a command buffer (or queue) exists for the draw calls to avoid potential slowdowns. This command queue works on a FIFO (First In First Out) processing mechanism. With regard to the PlayStation 3, the FIFO processing mechanism is the command queue that Cell uses to send instructions to the RSX chip.
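To make the idea concrete, here is a minimal Python sketch of a FIFO command queue. It is only an illustration: the `CommandFifo` class, the `drain` helper and the mesh names are made up for this example and do not correspond to RPCS3's code or to the real RSX command format.

```python
from collections import deque


class CommandFifo:
    """Toy command queue: the producer appends at the back, the consumer reads from the front."""

    def __init__(self):
        self._commands = deque()

    def put(self, command):
        self._commands.append(command)

    def get(self):
        # Return the oldest pending command, or None if nothing is queued.
        if self._commands:
            return self._commands.popleft()
        return None


def drain(fifo):
    """Consume every queued command in arrival order and return them as a list."""
    executed = []
    command = fifo.get()
    while command is not None:
        executed.append(command)
        command = fifo.get()
    return executed


fifo = CommandFifo()
for mesh in ("terrain", "character", "skybox"):
    fifo.put(("draw", mesh))  # the "CPU" queues one draw call per mesh

executed = drain(fifo)  # the "GPU" processes them first in, first out
print(executed)
```

The commands come out in exactly the order they were queued, which is what lets the producer run ahead of the consumer without reordering any work.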
The goal of kd-11’s effort was to create a new approach that would be more efficient and cut out unnecessary work that eats up draw call time. His initial approach focused on optimizing the FIFO command queue itself but experiments showed this was not actually necessary and it was far more important to improve how the draw calls are processed. Most games have about 1,000 – 3,000 draw calls per frame but certain games (for example, games from the Ratchet and Clank series) have over 10,000 draw calls per frame. These titles being PlayStation 3 exclusives were heavily unoptimised and would generally bring RPCS3 down to a crawl.
To handle this, firstly, the RSX code was refactored to make the FIFO processing a standalone namespace, including separation of the main loop from the flattener. This helped improve readability of the code and eased the process of implementing optimisations. Next, the pipeline functionality was streamlined so that RSX back-ends work more like a typical game, with minimal updates between separate draw calls. Certain aspects of both graphics renderers (OpenGL and Vulkan) were also rewritten to improve efficiency and keep them in line with the new pipeline functionality. These preliminary changes succeeded in easing the CPU burden by minimizing data transfer load between the CPU and GPU as much as possible. A performance uplift was already visible in draw call heavy titles.
Once these changes were finalised, kd-11 reworked how the draw calls themselves were processed and applied optimisations. There are two parts to this, the first being to completely ignore the FIFO begin/end commands and use our own custom mechanisms to decide where to mark the scope of draw calls. The custom mechanism places emphasis on optimising the command queue and improving efficiency of the command processor. These two processes go together with hints being injected into the FIFO stream by the front-end processor to maximize draw call submission efficiency.
The second change was to make the FIFO flattener engage dynamically depending on load, to avoid slowdown when the extra processing overhead is not required. The emulator tries to detect when FIFO preprocessing is beneficial and only enables optimizations if the benefit outweighs the cost. The current threshold is at least 500 draw calls saved in a frame with over 2,000 draw calls to justify the overhead.
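The load-dependent decision can be sketched roughly as follows. This is a simplified Python illustration under our own assumptions, not the actual flattener: the `flatten` function merges consecutive draws that share a hypothetical state key, and `should_flatten` encodes one reading of the threshold quoted above (more than 2,000 draw calls in the frame, at least 500 of them saved).

```python
MIN_DRAW_CALLS = 2000   # only consider flattening above this many draw calls
MIN_CALLS_SAVED = 500   # and only if at least this many calls would be saved


def flatten(draws):
    """Merge runs of consecutive draws that share the same state key.

    Each draw is a (state, vertex_count) tuple; the state key is hypothetical.
    """
    merged = []
    for state, count in draws:
        if merged and merged[-1][0] == state:
            merged[-1] = (state, merged[-1][1] + count)
        else:
            merged.append((state, count))
    return merged


def should_flatten(total_draws, draws_saved):
    """Return True when the saving is large enough to pay for the preprocessing."""
    return total_draws > MIN_DRAW_CALLS and draws_saved >= MIN_CALLS_SAVED


busy_frame = [("A", 3)] * 1500 + [("B", 3)] * 1000
busy_saved = len(busy_frame) - len(flatten(busy_frame))

light_frame = [("A", 3), ("A", 3), ("B", 3)]
light_saved = len(light_frame) - len(flatten(light_frame))

print(should_flatten(len(busy_frame), busy_saved))    # heavy frame: worth flattening
print(should_flatten(len(light_frame), light_saved))  # light frame: not worth the overhead
```

Keeping the check separate from the merging step mirrors the idea that the extra pass should only run when it is expected to pay for itself.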
Finally, with a couple more miscellaneous improvements to optimise the draw call throughput, we could see massive performance improvements in games such as Uncharted: Drake’s Fortune, Infamous 1 & 2, Resistance: Fall of Man and more. Check out our longest showcase video below which includes improvements from other changes this month, including improvements in God of War 3, Gran Turismo 5, MotorStorm and more!
Improved PPU lwmutex locking and thread scheduler (#5314)
While the emulation of the PPU has seen great strides in terms of performance and compatibility, there always exists a chance that certain games would behave in a way that the emulator does not correctly handle. Eladash had encountered one such exception with Le Tour de France 2012 and promptly went about digging for the issue. What he found were rare inaccuracies in the PPU emulation regarding PPU lwmutex locking and PPU thread scheduling.
Speaking of lwmutex, a mutex is a locking mechanism used to synchronize access to a resource. While the mutex is locked, only the thread that acquired it can access the resource. The PlayStation 3 provides a complex structure for lwmutex through its system module “liblv2”. To request an lwmutex, a game would typically call the liblv2 function, which in turn would call the mutex syscalls. However, it was discovered that certain games call the mutex syscalls directly with different behaviour/parameters, which was not handled correctly by the emulator.
Once this was discovered, eladash implemented the functionality to handle direct calls from games to lwmutex. To further address multiple assertion failures, a simple flag was added to identify the state of lwmutex locking, which prevented unnecessary actions such as unlocking the same resource twice. This helped games that directly called lwmutex during boot up to progress further.
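As a rough illustration of how a state flag can guard against a double unlock, consider the Python sketch below. The `LightweightMutex` class, its `_owner` field and the `EPERM` value are all hypothetical and chosen for this example; they are not the lv2 syscall interface or RPCS3's implementation.

```python
import threading

EPERM = 1  # hypothetical error code returned for an invalid unlock


class LightweightMutex:
    """Toy mutex that remembers its owner so a second unlock can be rejected."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # state flag: None means "not locked"

    def lock(self, thread_id):
        self._lock.acquire()
        self._owner = thread_id
        return 0

    def unlock(self, thread_id):
        # Reject the call if this thread does not currently hold the mutex,
        # which also covers unlocking the same mutex twice.
        if self._owner != thread_id:
            return EPERM
        self._owner = None
        self._lock.release()
        return 0


mutex = LightweightMutex()
first_lock = mutex.lock(thread_id=1)
first_unlock = mutex.unlock(thread_id=1)
second_unlock = mutex.unlock(thread_id=1)  # rejected instead of corrupting state
print(first_lock, first_unlock, second_unlock)
```

Without the owner check, the second `unlock` would try to release a lock that is no longer held, which in Python raises an error and in an emulator is the kind of state that trips assertions.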
Though the lwmutex issue was addressed, Le Tour de France 2012 still refused to load, which prompted eladash to debug further. Further investigation revealed that the game was suffering from a race condition where the emulator was not acquiring the PPU thread scheduler lock when changing thread priority. The PPU thread scheduler uses the thread priority as a hint to allocate CPU resources. Similar to a PC, a thread with higher priority is placed closer to the beginning of the thread queue whereas a thread with lower priority is pushed towards the end. Another important function of the thread priority order is to detect duplicate threads. When the emulator did not correctly lock the PPU thread scheduler’s mutex, it led to faulty thread insertion that created duplicate threads.
With the problem now identified, eladash ensured the PPU thread scheduler’s mutex would be locked before changing the priority of PPU threads. While this finally fixed Le Tour de France 2012, it also fixed a plethora of crashes in other titles such as Skate, Gran Turismo 5, Super Street Fighter IV, Beyond: Two Souls, White Album, Gundam Breaker 2, MotorStorm, Borderlands, Backbreaker Vengeance and many more!
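The shape of the fix can be sketched in Python as below. This is a toy model, not RPCS3's scheduler: the `Scheduler` class and thread names are invented, and we assume that a lower number means a higher priority. The point is only that removing and re-inserting a thread happens while the scheduler's mutex is held, so concurrent priority changes cannot leave a thread queued twice.

```python
import threading


class Scheduler:
    """Toy priority-ordered run queue protected by a single mutex."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._queue = []  # (priority, name) pairs, highest priority first

    def _insert(self, name, priority):
        # Caller must hold self._mutex. Insert after entries of equal priority.
        index = 0
        while index < len(self._queue) and self._queue[index][0] <= priority:
            index += 1
        self._queue.insert(index, (priority, name))

    def add(self, name, priority):
        with self._mutex:
            self._insert(name, priority)

    def set_priority(self, name, priority):
        # Remove and re-insert as one step under the lock.
        with self._mutex:
            self._queue = [(p, n) for p, n in self._queue if n != name]
            self._insert(name, priority)

    def order(self):
        with self._mutex:
            return [name for _, name in self._queue]


names = ["ppu0", "ppu1", "ppu2"]
scheduler = Scheduler()
for position, name in enumerate(names):
    scheduler.add(name, position)


def worker(offset):
    for step in range(200):
        for index, name in enumerate(names):
            scheduler.set_priority(name, (offset + step + index) % 10)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_order = scheduler.order()
print(sorted(final_order))
```

The final ordering depends on thread timing, but every thread appears exactly once, which is the property the missing lock had broken.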
Ensure SPU threads are stopped in sys_spu_thread_group_join (#5310)
Amongst all the massive improvements, November saw the return of GalCiv (RipleyTom), a developer who took a break from RPCS3 in August 2017. With his return, he brought fixes for a tricky regression that had plagued RPCS3 for months. The issue first surfaced after Nekotekina’s famous SPU ASMJIT v2.0 pull request back in April that brought a massive increase to both performance and compatibility. But as with most big updates, these changes caused a regression in titles like Catherine and Valkyria Chronicles, making them hang before reaching the menus. However, the issue only affected certain users with low thread count CPUs (such as 4C4T) or weak laptop CPUs. Complicating matters further, for some users this issue occurred every time, while for other users it only occurred intermittently. This pointed to the existence of a race condition which only manifested under certain precise circumstances, commonly with inadequate CPU resources. This volatility made the issue quite a challenge to debug.
For whatever reason, GalCiv decided to make his comeback by fixing this very issue. But to understand exactly how he did it, here’s a quick primer on how SPU threads are handled on the PlayStation 3. The sequence in which the SPU threads are used by the PlayStation 3 is as follows:
1. Create the SPU thread group (sys_spu_thread_group_create)
2. Initialize each SPU thread individually (sys_spu_thread_initialize * num_threads)
3. Start all the SPU threads in the SPU group (sys_spu_thread_group_start)
4. Wait for the SPU group execution to terminate (sys_spu_thread_group_join)
5. Issue a STOP instruction in one of the SPU threads (sys_spu_thread_group_exit)
On a real PlayStation 3, the STOP instruction instantly stops all the SPU threads, but the same is not possible on RPCS3 as instructions that are atomic on the PlayStation 3 are not atomic in RPCS3. Stopping an SPU thread while it is executing an operation such as an MFC operation (a memory transaction) will lead to disastrous outcomes. So instead of instantly killing all SPU threads in the group, when each SPU thread reaches sys_spu_thread_group_exit, RPCS3 sets the state variable to “cpu_flag::stop” to tell the thread to stop. The emulator also sets the SPU group status to SPU_TGJSF_GROUP_EXIT. When sys_spu_thread_group_join sees the status changed to SPU_TGJSF_GROUP_EXIT, the function finishes and the game continues on.
For the most part, this worked fine as the SPU threads would terminate and the game would continue on its merry way. But sometimes, certain games immediately call sys_spu_thread_group_start again using the SPU group that was just “stopped”. sys_spu_thread_group_start would then remove the “cpu_flag::stop” flag and the threads would start again. This worked fine as long as all SPU threads had actually stopped by the time the group was reused. If some SPU threads were still mid-execution, they would not be restarted and would do nothing after completing their initial operation. This is obviously undesirable, as those SPU threads would then sit in a wait (such as a loop that jumps to itself) and never do anything, while other SPU threads waited on them, causing the game to hang.
Once the root of the problem was identified, GalCiv simply implemented a check to ensure that all SPU threads have actually stopped running in sys_spu_thread_group_join before reusing the SPU group. This fixed the regression in Catherine, Naruto Shippuden: Ultimate Ninja Storm 2, Devil May Cry 4 and many other titles making these games playable once again on 4C4T CPUs.
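The idea behind the check can be modelled with ordinary threads. The Python sketch below is an analogy under our own assumptions, not the emulator's code: `SpuGroup`, its `running` counter and its methods are hypothetical, the group creates fresh worker threads on each start rather than reusing them, and `exit` is called from the main thread for simplicity. What it shows is that `join` waits not only for the exit status but also for every worker to have really finished before the group can be started again.

```python
import threading
import time


class SpuGroup:
    """Toy thread group whose join() waits until every worker has truly stopped."""

    def __init__(self, num_threads):
        self._num_threads = num_threads
        self._stop_flag = threading.Event()   # stands in for the per-thread stop flag
        self._exited = threading.Event()      # stands in for the group exit status
        self._cond = threading.Condition()
        self.running = 0                      # workers that have not finished yet

    def _worker(self):
        while not self._stop_flag.is_set():
            time.sleep(0.001)  # stands in for work that must not be interrupted
        with self._cond:
            self.running -= 1
            self._cond.notify_all()

    def start(self):
        self._stop_flag.clear()
        self._exited.clear()
        with self._cond:
            self.running = self._num_threads
        for _ in range(self._num_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def exit(self):
        self._stop_flag.set()  # ask the workers to stop
        self._exited.set()     # publish the group exit status

    def join(self):
        self._exited.wait()
        # The extra check: do not return until all workers have actually stopped.
        with self._cond:
            self._cond.wait_for(lambda: self.running == 0)


group = SpuGroup(num_threads=4)
running_after_join = []
for _ in range(3):  # reuse the same group several times in a row
    group.start()
    group.exit()
    group.join()
    running_after_join.append(group.running)

print(running_after_join)
```

If `join` returned as soon as the exit status was set, the next `start` could clear the stop flag while an old worker was still looping, and that worker would never notice it had been asked to stop.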
The first instalment in the much-beloved Skate franchise is now fully playable thanks to eladash’s improvements! You can now experience this classic in glorious 4K resolution. A user from our Discord server, Jotain, played this game from start to finish on RPCS3 with no major issues. Check out our gameplay video below:
Gran Turismo series
Racing fans rejoice! Thanks to improvements by eladash this month, two PlayStation 3 exclusives from the Gran Turismo series go ingame, Gran Turismo 5 and Gran Turismo 5 Prologue. Both titles get decent performance even on mid-range CPUs. The games however still suffer from graphical issues and incorrect display of text preventing them from being playable. Check out the gameplay footage from Gran Turismo 5:
Gran Turismo 5 Prologue suffers from a unique set of graphical issues. Setting the base resolution to 720p gives us clean graphics but only occupies a portion of the screen (left image) whereas setting the base resolution to 1080p gives us a strobe tint during the race (right image). However, fixes for most of the issues affecting these games are already in the works.
College Hoops 2K8
This month saw multiple sports titles progress ingame and even become playable. Among them is the console exclusive College Hoops 2K8 which is now playable even on mid-range CPUs! Fans who missed this title on PC can now play it in full 4K resolution.
Major League Baseball series
This month saw improvements to both Major League Baseball series: the PlayStation exclusive MLB: The Show series and the console exclusive Major League Baseball 2K#. Four titles from MLB: The Show (13 – 16) and two titles from Major League Baseball (2K7 and 2K8) went ingame for the first time this month! However, the games still suffer from low performance and a few graphical glitches which keep them from being playable.
Clash of the Titans
Clash of the Titans is another title that became playable this month. Previously, the game would not progress past the menus when attempting to start a new game. Thanks to the accuracy improvements made this month, this title jumped from Intro straight into the Playable category!
Fight Night series
To wrap up the titles for this month, the remaining two games from the Fight Night series, Fight Night Champion and Fight Night Round 4 now go ingame! However, both titles suffer from very broken audio and are not considered playable yet.
Gundam Breaker 2
The Gundam series has seen constant improvements over the past few months and this month is no exception. Thanks to eladash’s fixes, Gundam Breaker 2 now goes Ingame with a stable frame rate and gameplay. However, some users have reported that the game suffers from minor texture corruption and crashes, preventing it from being Playable. Check out gameplay footage below:
Marvel vs. Capcom: Origins
For retro fans, Marvel vs. Capcom: Origins is now playable! Be sure to enable Strict Rendering Mode to correctly display a few textures.
Fatal Inertia EX
Another console exclusive title, Fatal Inertia EX made its way into the Playable category this month.
Little League World Series Baseball 2010
To wrap up the sport titles for this month, Little League World Series Baseball 2010 was tested this month and found to be fully playable on RPCS3.
While Quantum Theory reached ingame back in September, its performance has improved significantly this month. The game now has good performance and graphics but further testing is necessary to ascertain if the game is indeed playable. In the meantime, check out the current state of gameplay of this title on RPCS3:
Finally, we have the console exclusive Fuse, developed by Insomniac Games, going ingame for the first time. While there aren’t any major graphical glitches, the game does suffer from low performance and all character models are stuck in a T-pose, keeping this title from being playable on RPCS3.
There have been numerous other pull requests merged during the month that just couldn’t make it to the Major Improvements section. We have collected a list of all other improvements here, and attached a brief overview to each. Make sure to check out the links provided for them if you are interested, as their GitHub pages usually uncover further details as well as the code changes themselves. To see this whole list right on GitHub, click here.
5298 – Added specific optimizations to a few condition variable usages; Made cellSaveData operations atomic to avoid race conditions that may corrupt saves; Fixed a few corner-case errors with SPU thread termination;
5320 – Improved accuracy of shared_mutex implementation by increasing max_readers to 16,383 as the standard requires at least 10,000 and miscellaneous maintenance efforts;
5366 – Fixed a bug with the 16-bit audio mode where the audio sent to a local buffer would immediately go out of scope before it was sent to the audio backend with certain compilers and audio backends;
5372 – Optimisations for condition variables by implementing helper functions such as balanced_wait_until and balanced_awaken (introduced in Windows 8.1). This is now used by shared_mutex, cond_variable, cond_one and cond_x16; Replaced most occurrences of semaphore<> with shared_mutex;
5359 – Dynamically check the loaded Qt library version in case of mismatch;
5214 – Fixed starting offset from PPU stack allocation in stack register, giving the missing stack space that was necessary for a few games to operate. Fixed a crash in High Velocity Bowling when any update is applied;
5261 – Fixed nullptr arguments dereferencing in cellGameCreateData function. If the parameter is a null pointer, the library simply skips writing into it;
5302 – Fixed draw call validation regression of inlined arrays when using unwritten registers. Fixed a regression in Test Drive Unlimited 2;
5311 – Fixed cellPadGetData buffer fill of pad state buffer by writing only parts that need writing. Some games scan through the controller data received and are very sensitive to buffer changes; when the emulator performs unnecessary or incorrect writes, they detect wrong input changes and as a result do not respond to button changes. Many games such as Mad Riders, Call of Juarez: Gunslinger, Dead Island: Riptide and Chime Super Deluxe, which previously did not recognise button inputs, now progress further!
5330 – Fixed a regression from PR 5302 where reserved and unused arrays for vertex buffer are used for padding with inlined arrays, adding their stride as well to the starting offset of the preceding arrays;
5326 – Fixed a typo in error code checking of the PS3 files API function for creating folders (sys_fs_mkdir). When trying to create an existing directory, this operation previously returned a success status, while in reality it should return an error code indicating the directory could not be created because it already exists. Fixes installation process errors in a few titles such as Devil May Cry 4 and Marvel vs. Capcom 3;
5331 – Implemented missing sys_spu_thread_tryreceive_event receiving functionality for SPUs. Fixed a crash in College Hoops 2K7 and All-Pro Football 2K8 allowing the titles to progress a little further;
5317 – Fixed a regression from PR 5211 in the libcamera initialization process when using “fake” camera input setting. Previously the emulator would crash when games tried to initialise camera with this setting;
5365 – Fixed a cellFsOpen flag combination that was not handled when opening files through the PS3 API which was used by the updated version of Gran Turismo 6 in its installation process.
5242 – Fixed a minor issue which caused games to not launch when compiling RPCS3 with the -fsanitize=address flag on Linux;
5300 – Added the libnsl.so.1 library to RPCS3’s AppImages. Research for this was submitted upstream, and it allows AppImages which require libnsl.so.1 to work on Fedora 28!
5303 – Minor fix for Xbox One controllers over Bluetooth on Linux, where Bluetooth returns Select as KEY_BACK;
5321 – Added an extra check to avoid an exception if the firmware .PUP file cannot be opened (generally due to antivirus interference);
5325 – Default stack allocation for a sysutil callback was increased as it was overwriting the previous function stack. This fix helped Afrika progress further ingame;
5310 – Fixed a bug in sys_spu_thread_group_join where the SPU threads were only notified and not properly stopped by the end of the function, which caused issues when an SPU group was reused; see the coverage under Major RPCS3 Improvements above;
5341 – Minor accuracy improvements to error checking for parameters of cellVdecGetPicture. With this fix, Ridge Racer 7 3D no longer requires LLE cellvdec to go ingame!
5338 – Added null alloc_addr checks to sys_memory_allocate and sys_memory_allocate_from_container. This fixed crashes affecting a few games including Ridge Racer 7.
5323 – Fixed macOS compilation on Travis CI.
5313 – Fixed the update message on OpenCorrectionDialog – it was incorrectly printing the new setting in the place of the old one (e.g. “The config entry ‘Render’ was corrected from ‘Vulkan’ to ‘Vulkan'”).
5299 – Loads trophy data in another thread and adds dialog box with progress bar, previously the emulator would become unresponsive when loading too many trophies.
5293 – Updates firmware check to latest version.
As you may have seen from the RPCS3 Improvements video already, the PlayStation 3’s Graphical User Interface (GUI) known as the XMB started to work in WIP builds from Ruipin this month!
This wouldn’t have been possible without earlier work in the year by Jarves and Farseer. Together they started trying to hack together a build that could boot into the XMB menu. However, there is a lot involved in making this possible because while the XMB is just an interface, it’s built upon the PS3’s shell known as VSH. As such, it requires accurate emulation of the VSH in order to boot.
The developers were actually interested in getting the XMB working because it allows us to LLE (Low Level Emulation) some firmware modules that we previously had to HLE (High Level Emulation). For example, our trophy HLE module is not perfect and breaks some games, but if we could LLE the module it would be emulated with perfect accuracy.
Fast forward a few months and Juhn (one of our contributors) picked it up, rebased it and cleaned up the code. A few days later, Ruipin also became interested in the XMB and made some changes to allow us to have only a single dumped file from the PS3 instead of a whole HDD image. If you were to download this build, all you would need to dump from your PS3 is an xRegistry.sys file.
In its current state, audio does not work, you cannot boot games from the XMB, and networking is limited to apps that don’t check for connectivity, such as the web browser. Overall, it still requires a lot of work before it can be merged into the main builds of RPCS3.
This sums up our November report. If you like in-depth technical reports, early access to information, or you simply want to contribute, consider becoming a patron! All donations are greatly appreciated. RPCS3 now has two full-time coders that greatly benefit from the continued support of over 800 generous patrons.
This report was written by HerrHulaHoop, GalCiv, Asinine, Juhn, Digitaldude555 and elad335.