Contents
- 1 Introduction
- 1.1 TLDR
- 1.1.1 x264 Fast g480 aq0 4390kbps vs VerySlow 4898kbps
- 1.1.2 x264 Medium 3946kbps vs 4914kbps
- 1.2 Progress
- 1.2.1 Older posts that led to this study
- 1.2.2 720p 60 fps
- 1.2.3 1080p 60 fps
- 1.2.4 1440p 60 fps
- 1.3 Test Bench
- 2 x264 Testing Strategy
- 2.1 g480
- 2.2 aq0
- 2.3 Harmonic Mean
- 3 Results
- 3.1 x264 Fast Preset
- 3.1.1 VMAF
- 3.1.2 MS-SSIM
- 3.1.3 PSNR
- 3.2 x264 Medium Preset
- 3.2.1 VMAF
- 3.2.2 MS-SSIM
- 3.2.3 PSNR
- 3.3 x264 Slow Preset
- 3.3.1 VMAF
- 3.3.2 MS-SSIM
- 3.3.3 PSNR
- 3.4 x264 Very Slow Preset
- 3.4.1 VMAF
- 3.4.2 MS-SSIM
- 3.4.3 PSNR
- 4 Finalists
- 4.1 VMAF
- 4.2 PSNR
- 4.3 MS-SSIM
- 5 Sample frames
- 5.1 x264 Fast g480 aq0 vs VerySlow
- 5.2 x264 Medium 3946kbps vs 4914kbps
- 6 Conclusion
- 7 Data
Introduction
Welcome to the second post in this series. This time I ran x264 encodes on a 1080p 60 fps Apex Legends clip. The main differences compared to 720p are with PSNR scores, while VMAF and MS-SSIM tend to maintain their patterns.
Note that I’m currently only collecting data for single-pass or equivalent encode methods, so that the results stay relevant for live streaming. For example, NVENC 2-pass can use CUDA or the CPU for the first pass and NVENC for the second. Because both passes run simultaneously, it is still live encoding and suitable for streaming. x264 2-pass, by contrast, has to scan the entire input and then start again for a second pass, which is not possible on a live feed. Therefore, I only list single-pass encodes here.
TLDR
If you’d like to see some images before deciding whether to read all the results, here’s a preview:
x264 Fast g480 aq0 4390kbps vs VerySlow 4898kbps
[Image comparison slider: left = Fast g480 aq0 4390kbps, right = VerySlow 4898kbps]
x264 Medium 3946kbps vs 4914kbps
[Image comparison slider: left = Medium 3946kbps, right = Medium 4914kbps]
Progress
Older posts that led to this study
- This is a test designed to imitate the new NVENC settings introduced with OBS v23.
- Here is a broader set of encoders tested under varying conditions on 3 bitrates.
720p 60 fps
- Apex Legends: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Overwatch: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Heroes of the Storm: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
1080p 60 fps
- Apex Legends: H.264 – x264 (this post), H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Overwatch: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Heroes of the Storm: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
1440p 60 fps
- Apex Legends: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Overwatch: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
- Heroes of the Storm: H.264 – x264, H.264 – NVENC Turing, H.264 – QuickSync Coffee Lake.
Test Bench
- VMAF. The metric used by Netflix to select the video version to stream to its users.
- MS-SSIM. An older metric that compares quality at different scales to allow for user monitor variation.
- PSNR. The classic peak signal-to-noise ratio metric. More concerned with mathematical purity than with what humans can see.
The test bench isn’t particularly relevant in a quality analysis. No matter what hardware you use, you will get identical quality so long as it actually works. Speed tests, where hardware matters more, will start coming through later.
- CPU = Intel i7-8086K stock clock. Coffee Lake QuickSync Version 6.
- Graphics Card = NVIDIA RTX 2070. Turing NVENC 6th Generation.
- RAM = 16GB
- Encoding by FFMPEG v4.1.1 12th February 2019 with libx264 core 157 r2935 545de2f by VideoLAN http://www.videolan.org/x264.html
- Scoring by FFMPEG version N-93394-g14eea7c with libvmaf v1.3.14 built with gcc 7 (Ubuntu 7.3.0-27ubuntu1~18.04) all compiled from source on 17th March 2019.
- libvmaf model vmaf_v0.6.1.pkl
All original sources are recorded lossless from original gameplay at the resolution in question. Instead of scaling the same video down for alternative resolutions, I recorded each one individually. This is to avoid introducing any scaling artifacts.
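For anyone wanting to repeat the scoring step on their own source, the command can be sketched roughly as below. This is not the exact command used for this post: the file names and the helper name are hypothetical, and the `model_path`, `psnr`, `ms_ssim` and `log_path` options are the ones I’d expect from the libvmaf filter in FFmpeg builds of this era (newer builds have changed them, so check your own build first).

```python
import shlex

def build_score_command(encoded, lossless, log_path,
                        model_path="vmaf_v0.6.1.pkl"):
    """Return an ffmpeg argument list that scores `encoded` against `lossless`."""
    filter_graph = (
        f"libvmaf=model_path={model_path}"
        f":psnr=1:ms_ssim=1:log_path={log_path}"
    )
    return [
        "ffmpeg",
        "-i", encoded,       # first input: the encode being judged
        "-i", lossless,      # second input: the lossless reference
        "-lavfi", filter_graph,
        "-f", "null", "-",   # discard the decoded video, keep only the scores
    ]

# Hypothetical file names, for illustration only.
cmd = build_score_command("fast_4000k.mp4",
                          "apex_1080p60_lossless.mkv",
                          "fast_4000k.xml")
print(shlex.join(cmd))
```

The function only builds the argument list, so you can print and inspect it before handing it to `subprocess.run`.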
x264 Testing Strategy
g480
Following feedback from earlier tests, I applied some variations to the presets. The first is setting the group of pictures (GOP) size to 480 frames (-g 480), which is 8 seconds at 60 fps. I tried 4, 8 and 16 seconds on a couple of test bitrates. 8 seconds (-g 480) showed an improvement over 4 seconds (-g 240), but 16 seconds made no difference or sometimes gave worse results. So I ran the full test with 8 seconds (-g 480). This is indicated in the data by “g480”.
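The -g value is simply the keyframe interval in seconds multiplied by the frame rate. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def gop_frames(seconds, fps=60):
    """Convert a GOP length in seconds to a frame count for ffmpeg's -g option."""
    return int(round(seconds * fps))

for seconds in (4, 8, 16):
    print(f"{seconds} s at 60 fps -> -g {gop_frames(seconds)}")
```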
aq0
Another idea from feedback was regarding adaptive quantization. Disabling adaptive quantization (-aq-mode 0) improved VMAF and PSNR scores, but lowered MS-SSIM scores. So I included it for people to make their own informed choices. This is indicated in the data by “aq0”.
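To make the four variants of each preset concrete, here is a rough sketch of how the encode commands could be generated. The -g and -aq-mode flags are the ones described above; the bitrate flag (-b:v), the file names and the helper name are my assumptions for illustration and are not copied from the test scripts.

```python
from itertools import product

def build_encode_command(source, preset, bitrate_kbps, g480=False, aq0=False):
    """Return an ffmpeg argument list for one single-pass x264 test encode."""
    cmd = ["ffmpeg", "-i", source,
           "-c:v", "libx264",
           "-preset", preset,
           "-b:v", f"{bitrate_kbps}k"]   # assumed rate-control flag
    if g480:
        cmd += ["-g", "480"]             # 8-second GOP at 60 fps
    if aq0:
        cmd += ["-aq-mode", "0"]         # disable adaptive quantization
    suffix = ("_g480" if g480 else "") + ("_aq0" if aq0 else "")
    cmd.append(f"{preset}{suffix}_{bitrate_kbps}k.mp4")  # hypothetical output name
    return cmd

# Plain, aq0, g480, and g480 + aq0 for the Fast preset at a 4000 kbps target.
variants = [
    build_encode_command("apex_1080p60_lossless.mkv", "fast", 4000, g480=g, aq0=a)
    for g, a in product((False, True), repeat=2)
]
for variant in variants:
    print(" ".join(variant))
```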
Harmonic Mean
Another piece of feedback concerned how the scores are averaged. The default method is the arithmetic mean, but I’ve also included the harmonic mean as a dotted line in the same colour for each result. This has almost no effect on MS-SSIM scores, but it does lower the overall score for VMAF and PSNR. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. As a simple example, for scores of [4, 4, 1] the arithmetic mean is 3, while the harmonic mean is 2. Low outliers pull the harmonic mean down more than they pull the arithmetic mean, so the odd frame that is completely warped hurts the overall score greatly. The gap between the harmonic mean and the regular average therefore gives an indication of how often the worst frames appear.
As long as the harmonic mean stays at a consistently small distance from the arithmetic mean, you can be reasonably confident that the very worst frames in a video are not too far from the average quality.
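The [4, 4, 1] example is easy to check with Python’s standard library. The second list is a made-up set of per-frame scores (not data from this test) showing how a single bad frame widens the gap between the two means:

```python
from statistics import harmonic_mean, mean

scores = [4, 4, 1]
arithmetic = mean(scores)
harmonic = harmonic_mean(scores)
print(arithmetic, harmonic)

# Hypothetical per-frame scores: 99 good frames and one badly warped frame.
frames = [90] * 99 + [10]
gap = mean(frames) - harmonic_mean(frames)
print(f"gap between the means: {gap:.2f}")
```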
Results
x264 Fast Preset
VMAF
- Just like with 720p60, disabling adaptive quantization (aq0) provides a decent buff to the quality per bitrate.
- Setting Group of Pictures length to 8 seconds (g480) has varying success. On the regular Fast preset it provides a small buff while under 3.6Mbps. At higher bitrates it performs worse. With aq0 set, g480 provides only a marginal buff.
- x264 Fast preset with both aq0 and g480 performs better than the other variants on VMAF.
MS-SSIM
- g480 provides an indiscernible buff to the MS-SSIM score with aq0. Conversely, g480 gives a small buff on the regular Fast below 3600kbps which then reverses for higher bitrates.
- aq0 decreases MS-SSIM a great deal. aq0 at 6500kbps scores the same as 5400kbps without the aq0 flag.
- Fast preset with neither g480 nor aq0 performs better than the other variants on MS-SSIM.
PSNR
- The PSNR metric has the most linear quality per bitrate rise of all the metrics.
- aq0 provides a decent buff to PSNR scores from the standard fast preset.
- g480 provides no discernible difference to the aq0 variant.
- Although g480 without aq0 gives better scores than the standard Fast preset, it also causes the otherwise identical command to use more bits, so its curve ends up lower for the most part.
- x264 Fast preset with both aq0 and g480 gives the best PSNR by a tiny margin.
x264 Medium Preset
VMAF
- g480 provides an imperceptible increase on Medium, while aq0 gives a larger one.
MS-SSIM
- This graph shows MS-SSIM scoring worse with aq0 and only negligibly better with g480.
PSNR
- On Medium, g480 makes no discernible difference to PSNR.
- aq0 has a consistent positive buff to PSNR on Medium preset.
x264 Slow Preset
VMAF
- g480 still provides only a tiny improvement, but aq0 provides slightly more.
- The variant with both aq0 and g480 still performs the best on VMAF on x264 Slow.
MS-SSIM
- The MS-SSIM scores behave much the same as in the previous presets, with aq0 lowering the score significantly and g480 improving it only marginally.
PSNR
- The PSNR scores on the Slow preset show the same pattern but with the lines getting closer together as the preset gets slower.
x264 Very Slow Preset
VMAF
- g480 on VerySlow now provides around half the buff of aq0.
- The variant with both g480 and aq0 achieves the best VMAF scores again.
MS-SSIM
- As usual, aq0 hurts the MS-SSIM score drastically.
- g480 provides a small but consistent improvement across bitrates.
PSNR
- Each variant is consistently either a buff or nerf in all tested presets and bitrates.
Finalists
Here we select the best curve from each preset and compare them per quality metric.
VMAF
The variant with aq0 AND g480 is used for all presets. The VMAF scores for x264 presets are mixed. For the most part, Fast is the winner, followed by Medium, then Slow and finally VerySlow. The only exception, and where this differs from the 720p 60 fps Apex Legends test, is at very low bitrates, where Medium beats Fast and VerySlow beats Slow. However, these bitrates give scores that are well below acceptable quality. At bitrates this low, it may be better to lower the resolution or frame rate and accept scaling artifacts instead of encoder quantization artifacts.
PSNR
All presets for the PSNR metric use the aq0 & g480 variant since they consistently scored the best. Compared together, they are remarkably close in scores. The Slow preset scores the lowest below 3200kbps. VerySlow scores the highest for the entire curve. Medium is second best until 7.5Mbps where Fast catches up to it and they even out.
MS-SSIM
MS-SSIM is the metric that agrees most closely with the x264 preset ordering, where slower presets search more thoroughly. VerySlow scores the highest and Fast the lowest across all bitrates. Slow and Medium are approximately equal starting from 4Mbps.
Sample frames
There are 256 encoded files occupying nearly 6GB, and the original lossless version is 2GB, so I can’t really just pop them up here for download. What I can do is make some frame comparisons and provide all the log files in case somebody wants to analyse them, or repeat the test themselves on another source.
x264 Fast g480 aq0 vs VerySlow
[Image comparison slider: left = Fast g480 aq0 4390kbps, right = VerySlow 4898kbps]
I selected a frame with medium amounts of movement, not too fast, not too slow. The left side is Fast aq0 g480 and the right side is VerySlow default. In VMAF’s opinion, the left side is a tiny bit better than the right side. PSNR rates the right side as a tiny bit better than the left. However, according to MS-SSIM, the right side is considerably better than the left.
x264 obviously agrees with MS-SSIM since that’s the one with adaptive quantization and a wider search space through the preset. In my personal opinion, the right side image is the better one.
In this comparison, VMAF appears to favour smooth blurring while MS-SSIM favours fine detail. When looking at the above image, take a close look around the bullet cartridge. In the right-side image it retains more detail, but some light bleeds into the surroundings. In the left-side image it’s blurred over, but looks cleaner at the same time. Another good example is the label on the gun. The left side is smoothed over, while the right side retains more detail, but it looks somewhat distorted in a different way.
With this in mind, pick your preferred metric for Apex 1080p60 and use the finalists graphs to examine the best preset for you.
x264 Medium 3946kbps vs 4914kbps
[Image comparison slider: left = Medium 3946kbps, right = Medium 4914kbps]
Above is the same frame from 2 other encodes. Both are the Medium preset with no additional flags; the only difference is the bitrate. The left side was set to 4000kbps and came out at 3946kbps, while the right side was set to 5000kbps and came out at 4914kbps. The difference is not blatantly obvious, but the extra bitrate does give a visible improvement.
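As a quick check on how far the encoder landed from its target in these two encodes, the shortfall can be worked out as a percentage. The helper name is hypothetical; the numbers are the target and achieved bitrates quoted above.

```python
def percent_below_target(target_kbps, achieved_kbps):
    """Return how far the achieved bitrate fell short of the target, in percent."""
    return (target_kbps - achieved_kbps) / target_kbps * 100

for target, achieved in ((4000, 3946), (5000, 4914)):
    shortfall = percent_below_target(target, achieved)
    print(f"target {target} kbps, achieved {achieved} kbps: {shortfall:.2f}% under")
```

Both encodes land within 2% of their target.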
Conclusion
There is a little variation in the scoring in this test compared with the same test at 720p. The differences between the MS-SSIM and VMAF ratings are more obvious, and you’ll need to choose which one you prefer.
Apex Legends actually scores pretty low across the board compared to other games. It is indeed a beautiful game, and its visual detail combines with the speed it plays at to make the stream harder to compress.
In the end, for this resolution and game, you simply need to select your own preference. Would you pay the price of blockiness to retain some detail? Or do you prefer the image to be smoother at the price of blurring out some detail? In this case, I personally prefer the MS-SSIM results, but I can understand why some people prefer the other.
Data
Agamemnus has a passion for gaming and an eye for tech. You can see him streaming occasionally on twitch.tv/unrealaussies and catch him on the Unreal Aussies Discord. Evidence > Opinion.