Contents
- 1 Introduction
- 1.1 TLDR
- 1.1.1 Fast g480 aq0 vs VerySlow
- 1.1.2 x264 Medium vs Slow
- 1.2 Progress
- 1.2.1 Older posts that led to this study
- 1.2.2 720p 60 fps
- 1.2.3 1080p 60 fps
- 1.2.4 1440p 60 fps
- 1.3 Test Bench
- 2 Testing Strategy
- 3 Results
- 3.1 x264 Fast Preset
- 3.1.1 VMAF
- 3.1.2 MS-SSIM
- 3.1.3 PSNR
- 3.2 x264 Medium Preset
- 3.2.1 VMAF
- 3.2.2 MS-SSIM
- 3.2.3 PSNR
- 3.3 x264 Slow Preset
- 3.3.1 VMAF
- 3.3.2 MS-SSIM
- 3.3.3 PSNR
- 3.4 x264 Very Slow Preset
- 3.4.1 VMAF
- 3.4.2 MS-SSIM
- 3.4.3 PSNR
- 4 Finalists
- 4.1 VMAF
- 4.2 PSNR
- 4.3 MS-SSIM
- 5 Sample frames
- 5.1 Fast g480 aq0 vs VerySlow
- 5.2 x264 Medium vs Slow
- 6 Conclusion
- 7 Data
Introduction
This is the first post in a long series I plan on making to hash out as much data as possible. After the release of Turing RTX cards, I started to evaluate NVENC’s potential as an x264 replacement. Some people expressed a desire to see similar tests done with some variations, such as additional settings or other quality metrics.
Note that I’m currently only collecting data for single-pass or equivalent encode methods, to remain relevant for live streaming. For example, NVENC 2-pass can use CUDA or the CPU for the first pass and NVENC for the second. Because both passes run simultaneously, it is still live encoding and suitable for streaming. x264 2-pass, by contrast, requires scanning the entire input before starting again for the second pass, which is not possible on a live stream. Therefore, I only list single-pass encodes here.
And thus the quest begins!
TLDR
If you’d like to see some images before deciding whether to read all the results, here’s a preview:
Fast g480 aq0 vs VerySlow
[Image comparison: Fast g480 aq0 at 3195kbps (left) vs VerySlow at 3197kbps (right)]
x264 Medium vs Slow
[Image comparison: Medium at 3202kbps (left) vs Slow at 3203kbps (right)]
Progress
Older posts that led to this study
- This is a test designed to imitate the new NVENC settings introduced with OBS v23.
- Here is a broader set of encoders tested under varying conditions on 3 bitrates.
720p 60 fps
- Apex Legends: H.264 – x264 (this post), H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Overwatch: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Heroes of the Storm: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
1080p 60 fps
- Apex Legends: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Overwatch: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Heroes of the Storm: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
1440p 60 fps
- Apex Legends: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Overwatch: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Heroes of the Storm: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
Test Bench
The quality metrics used are:
- VMAF. The metric used by Netflix to select which version of a video to stream to its users.
- MS-SSIM. An older metric that compares quality at several scales to allow for variation in users’ monitors and viewing conditions.
- PSNR. The classic peak signal-to-noise ratio metric. More concerned with mathematical purity than with what humans can see.
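PSNR is simple enough to compute by hand. A minimal sketch for 8-bit samples, assuming two equally sized frames given as flat lists of luma values (real scoring tools such as FFmpeg work on whole planes and average per frame; the function name is hypothetical):

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample arrays."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


# Every sample off by 1 gives MSE = 1, so PSNR = 20 * log10(255), about 48.13 dB.
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))  # 48.13
```

Because of the logarithm, halving the mean squared error only adds about 3 dB, which is why PSNR differences between good encodes tend to look small.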
The test bench isn’t particularly relevant in a quality analysis. For a given encoder and settings, you will get identical quality no matter what hardware you use, so long as it actually works. Speed tests, where hardware matters much more, will follow later.
- CPU = Intel i7-8086K at stock clocks. Coffee Lake QuickSync Version 6.
- Graphics Card = NVIDIA RTX 2070. Turing NVENC 6th Generation.
- RAM = 16GB
- Encoding by FFmpeg v4.1.1 (12th February 2019) with libx264 core 157 r2935 545de2f by VideoLAN http://www.videolan.org/x264.html
- Scoring by FFmpeg version N-93394-g14eea7c with libvmaf v1.3.14, built with gcc 7 (Ubuntu 7.3.0-27ubuntu1~18.04), all compiled from source on 17th March 2019.
- libvmaf model vmaf_v0.6.1.pkl
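For anyone wanting to reproduce the scoring step, here is a sketch of an FFmpeg invocation that reports VMAF, PSNR and MS-SSIM in one pass. It assumes the libvmaf filter option names of FFmpeg 4.x builds like the one listed above (newer FFmpeg and libvmaf releases changed this syntax, so check your build’s filter documentation), a model file at a local path, and hypothetical file names. The helper only builds the argument list; it does not run FFmpeg.

```python
from typing import List


def vmaf_score_command(distorted: str, reference: str,
                       model_path: str = "vmaf_v0.6.1.pkl",
                       log_path: str = "vmaf_log.json") -> List[str]:
    """Build an FFmpeg argument list scoring `distorted` against `reference`.

    Assumes the FFmpeg 4.x libvmaf filter, where the first input is the
    distorted ("main") video and the second is the reference.
    """
    filter_opts = ":".join([
        f"model_path={model_path}",  # assumed local path to the model file
        "psnr=1",                    # also report PSNR
        "ms_ssim=1",                 # also report MS-SSIM
        "log_fmt=json",
        f"log_path={log_path}",
    ])
    return [
        "ffmpeg",
        "-i", distorted,    # first input: the encode being scored
        "-i", reference,    # second input: the lossless original
        "-lavfi", f"libvmaf={filter_opts}",
        "-f", "null", "-",  # decode and score only, write no output file
    ]


# Hypothetical file names, for illustration only.
print(" ".join(vmaf_score_command("fast_g480_aq0_3200k.mp4", "apex_720p60_lossless.mkv")))
```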
All original sources are recorded lossless from original gameplay at the resolution in question. Instead of scaling the same video down for alternative resolutions, I recorded each one individually. This is to avoid introducing any scaling artifacts.
Testing Strategy
Feedback from previous x264 tests included a request for a fixed GOP length. I tried 4, 8 and 16 seconds on a couple of test bitrates. 8 seconds (-g 480) showed an improvement over 4 seconds (-g 240). However, 16 seconds (-g 960) made no difference and was sometimes worse. So I ran the full test with 8 seconds (-g 480). This is indicated in the data by “g480”.
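The -g value is just the GOP duration multiplied by the frame rate. A trivial sketch for the 60 fps sources used here (the function name is hypothetical; note that in FFmpeg, -g sets the maximum keyframe interval passed to x264):

```python
def gop_frames(seconds: float, fps: float) -> int:
    """Keyframe interval in frames for a GOP of the given duration."""
    frames = seconds * fps
    if frames != int(frames):
        raise ValueError("GOP duration does not map to a whole number of frames")
    return int(frames)


# At 60 fps: 4 s -> 240, 8 s -> 480, 16 s -> 960 (the value passed to -g).
for seconds in (4, 8, 16):
    print(f"{seconds:>2} s -> -g {gop_frames(seconds, 60)}")
```

At 30 fps the same 8-second GOP would be -g 240, so the flag value has to change with the frame rate even when the GOP duration does not.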
Another idea was to include variations in adaptive quantization. Disabling adaptive quantization (-aq-mode 0) improved VMAF and PSNR scores, but lowered MS-SSIM scores. So I included it so that people can make their own informed choices. This is indicated in the data by “aq0”.
One of the more controversial results from earlier tests was that faster presets beat slower presets in VMAF. Spoiler alert: this test shows the same. I know it’s not intuitive, but it is indeed what actually happens. Bear in mind that this may change with different resolutions and different source gameplay footage. So stay tuned for 720p60 in both Overwatch and Heroes of the Storm, followed by the same again at higher resolutions.
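To make the variant names concrete, here is a sketch of how one might assemble the FFmpeg/libx264 arguments for each test variant. The -preset, -g and -aq-mode flags are the ones named in this post; the helper name, the file names, and the use of plain average-bitrate rate control (-b:v) are assumptions for illustration, not necessarily the exact commands used for these results.

```python
from typing import List


def x264_variant_args(src: str, out: str, preset: str, bitrate_kbps: int,
                      g480: bool = False, aq0: bool = False) -> List[str]:
    """FFmpeg/libx264 arguments for one single-pass test variant (hypothetical helper)."""
    args = ["ffmpeg", "-i", src, "-c:v", "libx264",
            "-preset", preset,
            "-b:v", f"{bitrate_kbps}k"]  # assumed: average-bitrate rate control
    if g480:
        args += ["-g", "480"]            # 8 s keyframe interval at 60 fps
    if aq0:
        args += ["-aq-mode", "0"]        # disable adaptive quantization
    return args + ["-an", out]           # video only


# The "Fast g480 aq0" variant at 3200 kbps (hypothetical file names).
print(" ".join(x264_variant_args("apex_720p60_lossless.mkv", "fast_g480_aq0.mp4",
                                 "fast", 3200, g480=True, aq0=True)))
```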
Results
x264 Fast Preset
VMAF
- Disabling adaptive quantization (aq0) provides a decent buff to the quality per bitrate.
- Setting Group of Pictures length to 8 seconds (g480) provides a tiny additional buff. This affects both the standard Fast preset and the aq0 variant.
- x264 Fast preset with both aq0 and g480 performs better than the other variants on VMAF.
MS-SSIM
- g480 provides a minimal buff to the MS-SSIM score.
- aq0 hurts the score a great deal. aq0 at 5500kbps scores the same as 4400kbps without the aq0 flag.
- Fast preset with only g480 performs better than the other variants on MS-SSIM.
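Put in bitrate terms, the aq0 penalty on MS-SSIM above is large: by that metric, aq0 needs 25% more bitrate to match the non-aq0 encode (5500 / 4400 = 1.25). A trivial sketch of the arithmetic, with a hypothetical function name:

```python
def extra_bitrate(needed_kbps: float, baseline_kbps: float) -> float:
    """Fractional extra bitrate needed to match the baseline's score."""
    return needed_kbps / baseline_kbps - 1


# aq0 needs 5500 kbps to match the MS-SSIM of 4400 kbps without aq0.
print(f"{extra_bitrate(5500, 4400):.0%}")  # 25%
```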
PSNR
- The quality improvement by bitrate according to PSNR is much closer to linear than the other metrics.
- aq0 provides a decent buff to PSNR scores over the non-aq0 variants.
- g480 provides a minimal buff to the scores.
- x264 Fast preset with both aq0 and g480 gives the best results by a tiny margin.
x264 Medium Preset
VMAF
- g480 outperforms the other variants for the middle of the curve.
- aq0 & g480 together win on the low and high end of the curve.
MS-SSIM
- As before, MS-SSIM performs worse with aq0 and a tiny bit better with g480.
PSNR
- On Medium, the PSNR scores do the same as they did on Fast preset. The variant with both aq0 and g480 performs the best.
x264 Slow Preset
VMAF
- aq0 begins to show a more consistent detriment to x264 when we get to the Slow preset.
- g480 still provides a small buff over the non-g480 variants.
- Slow g480 WITHOUT aq0 will be selected as the best performer from this graph.
MS-SSIM
- The MS-SSIM scores work exactly as in the previous presets.
PSNR
- The PSNR scores on the Slow preset show the same pattern as with the previous presets. The only thing to notice is that the lines are getting closer together as the preset gets slower.
x264 Very Slow Preset
VMAF
- For the most part, all variants are quite similar in scores. The largest separation is at the very lowest end of the curve, where aq0 provides a quality buff.
- The variant with both g480 and aq0 manages to achieve the best scores overall.
- In summary, different x264 presets at different bitrates have varying degrees of success with VMAF. Generally speaking, g480 and aq0 both provide a buff, just not in EVERY circumstance.
MS-SSIM
- Again, the MS-SSIM scores follow the same pattern. g480 provides a tiny buff and aq0 hurts the score.
- In summary, MS-SSIM agrees with VMAF and PSNR regarding the small quality buff of g480. However it disagrees with the other metrics regarding aq0.
PSNR
- PSNR also consistently views each variant as either a buff or nerf in all tested presets at all tested bitrates. aq0 is a decent buff, and g480 is a very minor one.
- The only thing that changes as the x264 preset gets slower is the gap between the variants.
Finalists
Here we select the best curve from each preset and compare them per quality metric.
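The curves here were compared by inspecting the graphs. One simple way to automate a comparable choice, assuming every variant was scored at the same list of bitrates, is to keep the variant with the highest mean score. This is a hypothetical helper and only one possible criterion, not necessarily how the finalists below were picked; where curves cross, a single average can hide which variant wins at a particular bitrate.

```python
from statistics import mean
from typing import Dict, List


def best_variant(scores: Dict[str, List[float]]) -> str:
    """Return the variant label whose scores, averaged over bitrates, are highest.

    `scores` maps a variant label (e.g. "Fast g480 aq0") to its metric
    scores at the same list of bitrates.
    """
    lengths = {len(values) for values in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all variants must be scored at the same bitrates")
    return max(scores, key=lambda label: mean(scores[label]))
```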
VMAF
The variant with aq0 AND g480 is used for all presets except Slow, where the g480-only variant is selected. The VMAF scores for x264 presets are the exact opposite of what most people expect. Fast comes out as the winner, followed by Medium, then Slow and finally VerySlow. Other resolutions and games may not follow exactly this order, since different resolutions see different rankings between the presets. But that is a graph for another day. Stay tuned!
PSNR
All presets for PSNR use the aq0 & g480 variant, since it consistently scored the best. Compared together, they are remarkably close in scores. The Slow preset performs the worst all round, while Fast performs the best. Medium starts equal to Fast at low bitrates, but drops off. On the other hand, VerySlow catches up with Fast at the higher bitrates. VerySlow and Medium swap places around 4350kbps.
MS-SSIM
This graph shows the ranking most people would expect. VerySlow scores the highest across all bitrates. Slow and Medium are approximately equal from start to finish. Fast starts off well below, but catches up to Slow and Medium at around 3200kbps.
Sample frames
Webspace limitations currently prevent me from making a bunch of videos available for download. This may change in the future, and this post may get updated. For now, at least we can look at some sample frames.
Fast g480 aq0 vs VerySlow
[Image comparison: Fast g480 aq0 at 3195kbps (left) vs VerySlow at 3197kbps (right)]
To illustrate how VMAF and PSNR disagree with MS-SSIM, I picked a high-motion frame. VMAF and PSNR believe that the left side is better, while MS-SSIM believes that the right side is better. In my personal opinion, MS-SSIM makes the better choice in this instance.
Unfortunately, there is a catch, and it is one of the problems with subjective opinions on quality. I believe that the MS-SSIM pick is better because I can see the detail that it is trying to preserve. In the VMAF pick, some of this detail is blended and blurred away. The MS-SSIM winner does seem to retain more detail; however, those small details look blocky and pixelated. The VMAF winner tends to look smoother and more consistent. Personally, the blockiness bothers me less than the blurriness in this particular video. But that is because I have access to the original, while a stream viewer would not!
I have to admit something here. If I did not have both versions, AND the original lossless source material, AND play Apex on a regular basis, I may very well not have noticed the detail that is lost in the VMAF-preferred frame. A random viewer who doesn’t have the original to check, AND is not aware of the detail that is blurred out, COULD easily reach the opposite conclusion: that the VMAF-preferred video looks smoother and cleaner, while the MS-SSIM video looks blocky and pixelated. This is the catch with being subjective, and the reason why we need metrics. It’s good to learn which differences each metric prefers so we can form better conclusions.
x264 Medium vs Slow
[Image comparison: Medium at 3202kbps (left) vs Slow at 3203kbps (right)]
This is the same frame, but from two more encodes. This time, I chose encodes that point out what VMAF sees that MS-SSIM does not. With neither the g480 nor the aq0 option, MS-SSIM rates Medium and Slow almost exactly the same at this bitrate. However, VMAF and PSNR both say that Medium is at least a little better than Slow at this bitrate.
A good place to look in this frame is the rocks and grass on the far right-hand side, as well as the flare from the gunfire just to the left of the weapon. The frame looks much the same in both encodes, but the subtle difference is most apparent in these two places. Similar to the comparison above, the Slow version retains slightly more detail at the price of some slight blockiness. MS-SSIM considers these two versions effectively the same, with a SLIGHT preference for Medium. VMAF and PSNR both show a clear preference for the Medium version. MS-SSIM doesn’t penalise a little blockiness much, whereas VMAF and PSNR both prefer eliminating blockiness by smoothing out some detail.
Conclusion
For this test video, Apex Legends at 720p and 60 fps, the differences are honestly not particularly large. Choosing between a slower preset (favoured by MS-SSIM) and aq0 (favoured by VMAF and PSNR) is just a matter of which metric you prefer. The only consistent way of improving ALL scores is to take whatever preset you use and add g480. Even then, the benefit is very minor and not easily noticeable.
I would hazard a guess that the gradual optimisation of x264 over the past decade and a half has largely been based on SSIM as a benchmark. Adaptive quantization, as well as the search range, direction and weighting across pixels and between frames, has likely been developed on the basic premise of “if it improves the SSIM then it’s good; if it doesn’t, then it’s not helping”, at least to some degree. While PSNR has been around the entire time, VMAF was only introduced by Netflix, building on earlier video quality assessment (VQA) research, in mid 2016. This is well after the release of HEVC and VP9, as well as several alternative and hardware encoders for the AVC (H.264) format.
In the end, for this resolution and game, you simply need to select your own preference. Would you pay the price of blockiness to retain some detail? Or do you prefer the image to be smoother at the price of blurring out some detail? In this case, I personally prefer the MS-SSIM results, but I can understand why some people prefer the other. Especially if they don’t have access to the original for comparison.
Data
Agamemnus has a passion for gaming and an eye for tech. You can see him streaming occasionally on twitch.tv/unrealaussies and catch him on the Unreal Aussies Discord. Evidence > Opinion.