PC/NAS Upgrade: Re-Evaluating Video and Audio Encoding in OBS and Vegas

Phase 1 of my big upgrade/migration project was a good chance to reevaluate my recording/rendering settings, so here’s an extra write-up relating to that.

Left: x264 Medium CRF20. Right: x264 Very Slow CRF14. Note: both integer-scaled to 2x.

This is part of The Big 2022 PC and NAS Upgrade.

 

Quick definitions

Feel free to skip this bit if you know what the bold terms mean.

A video container (e.g., mp4, mkv) is a box holding an encoded video stream, some quantity of encoded audio streams, and some metadata (and sometimes other stuff too, which we’re ignoring here).

A video stream can be encoded in a specific format (e.g., H.264) using a specific encoder (e.g., x264, which encodes H.264).

NVENC is a hardware encoder on modern Nvidia GPUs. It can encode different formats including H.264 and H.265. As of writing, NVENC on RTX 4000 cards can also encode AV1.

CBR stands for constant bit rate. CBR video maintains the same bit rate throughout, regardless of whether the current scene is complex (requiring more bits to encode accurately), or simple (requiring fewer bits to encode accurately). This tends to result in visually complex scenes looking comparatively low quality. CBR is mostly used for real-time applications like streaming. Increasing bit rate when using CBR will increase quality and file size.

VBR stands for variable bit rate. VBR video tries to allocate more bits to complex scenes and fewer bits to simple scenes, whilst still targeting a rough average bit rate (see also ABR: average bit rate). Although common in audio, it’s not used much in the specific niches of video which I’m involved in, and is mentioned here mostly for completeness. Increasing bit rate when using VBR will increase quality and file size.

CRF and CQP/QP stand for Constant Rate Factor and [Constant] Quantization Parameter. Without being too technical, they try to maintain a consistent image quality regardless of scene complexity (there’s slightly more to it than this, but it’s approximately accurate). This means that they will use a higher bit rate for complex scenes, and a lower bit rate for simpler scenes. A lower CRF or CQP/QP value will increase quality and file size.

Encoder presets like “fast” and “medium” basically describe a tradeoff between encoding speed and compression efficiency. Slower presets use more encoding features, which gives better quality for a given file size but uses more processing power. Real-time encoding (such as when recording or streaming) requires you to give up some of that efficiency so the encoder can keep up. For NVENC the presets are “P1” to “P7”, with a lower number providing lower quality but faster encoding (less taxing on the encoder).

What I’m working with

My video-related workflow specifically involves:

  • High-quality screen recordings (games, tutorials)
  • Passable-quality game streaming [1]
  • Editing and then rendering the edited video
  • Sharing clips of any of these sources without re-encoding the video stream whenever possible

The first two are done exclusively in OBS Studio, the third is mostly done in Vegas (currently Vegas 18, [2] although I’m considering DaVinci Resolve), and the last is mostly embeds on Discord.

I needed something to break up the wall of text but have no relevant images on hand, so here’s a cat.

High quality recording

Everyone’s definition of “high quality” is different. For example, FFmpeg’s H.264 documentation considers x264 CRF 17 or 18 “visually lossless or nearly so”, yet I don’t personally think it’s that close to meeting that description. I personally was still slightly disappointed by the quality of x264 at CRF 14, which produces files that are like twice as large as CRF 18! Like don’t get me wrong, it looks pretty good — but there’s no way I’m confusing it for the original if I’m familiar with the source material.

Quite recently before porting to my new install, I had switched to using NVENC H.265 for local recording as it presented the best compromise between these factors roughly ranked in order:

  1. Game performance (I’m not willing to experience choppy/laggy gameplay for the sake of recording).
  2. Video smoothness (e.g., is it choppy at all during playback?)
  3. Video image quality
  4. File size
  5. Usability / compatibility

Remember that I’m testing this on my specific hardware. If you have an RTX card, your NVENC encoding quality will be better. If you have a slower/faster CPU, your x264 / x265 results will be different. Etc.

Here’s a little table for my evaluation of x264 and x265:

Encoder setting | Performance Penalty | Smoothness | Image Quality | Size | Usability
x264 CRF 18 Fast | Tolerable | Some minor choppiness under heavy load | Good | Tolerable | Great
x264 CRF 18 Faster | Tolerable | Smooth | Good | Tolerable | Great
x264 CRF 14 Faster | Tolerable | Smooth | Very Good | Painful | Great
x265 (any equivalent setting) | Unacceptable | N/A | N/A | N/A | N/A

For a while I was using x264 CRF 18 Fast, but the minor choppiness eventually put me off it. I switched to CRF 18 Faster but was a little disappointed with the quality after watching back my XCOM: EW playthrough. [3] Complex particle effects and high-complexity lighting (think holograms) were where I especially wanted to see a better result. Basically the most complex/detailed parts of a scene still weren’t quite what I wanted quality-wise.

As such, my later XCOM 2 playthrough used x264 CRF 14 Faster. I was content with the quality, but the file size is prohibitive for those longer recordings: it takes like an hour to upload an hour of footage at my usable-but-underwhelming upload speed, so a full year later I still haven’t finished uploading the files. Saturating my upload also makes it a real pain to use the connection for anything else.
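For a rough sense of why this hurts, upload time scales with the recording’s average bit rate divided by the upload speed. Here is a minimal sketch; the 20 Mbps figures are purely hypothetical placeholders rather than measured values, and protocol overhead is ignored:

```python
def upload_hours(video_mbps, duration_hours, upload_mbps):
    """Hours needed to upload a recording, ignoring protocol overhead."""
    total_megabits = video_mbps * duration_hours * 3600
    return total_megabits / upload_mbps / 3600


# Hypothetical example: a 20 Mbps recording on a 20 Mbps upload link
# takes about as long to upload as it took to record.
print(upload_hours(video_mbps=20, duration_hours=1, upload_mbps=20))
```

Halving the recording’s bit rate (or doubling the upload speed) halves the upload time, which is the whole appeal of a more efficient codec here.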

Cue my recent experimentation with H.265.

It didn’t seem like x265 was competitive on my CPU for real-time encoding (not enough processing power), so after discovering that I could make use of NVENC H.265, [4] I started testing that.

I still haven’t finished evaluating it, but right now I’m trying CQP23 at P4, with single-pass and no look-ahead (two-pass and look-ahead both seemed to overload the encoder for me).

Xaymar — the person behind StreamFX — recommends using P1 for recording but in my testing so far I haven’t seen an obvious difference in frame delivery between P1 and P4.

What I did quickly notice was how choppy the P6 (default) setting was. The weird thing is that the result wasn’t dropped or duplicated frames. Instead, it seems like you would get frame timing like:

X-------X-----X---------X----X-----X---------X---X

instead of a smoothly delivered:

X------X------X------X------X------X------X------X

Often a frame would be late rather than dropped. I didn’t verify whether this was indeed an encoder load problem; [5] I tentatively assumed it was and switched down to test at P5 and P4.
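If you can get per-frame timestamps out of a capture, one rough way to quantify this kind of pacing problem is to flag intervals that are much longer than the average. That “if” is a real assumption: with constant frame rate output, the container timestamps may look perfectly even while the content itself is late. The function names and the 25% tolerance below are made up for illustration, and the sample timestamps loosely mirror the uneven pattern sketched above:

```python
from statistics import mean


def frame_intervals(timestamps):
    """Gaps between consecutive frame timestamps (any consistent unit)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def late_frames(timestamps, tolerance=0.25):
    """Indices of frames that arrived noticeably later than the average gap."""
    intervals = frame_intervals(timestamps)
    expected = mean(intervals)
    return [
        index + 1
        for index, gap in enumerate(intervals)
        if gap > expected * (1 + tolerance)
    ]


# Hypothetical timestamps: uneven delivery vs perfectly even delivery.
uneven = [0, 8, 14, 24, 29, 35, 45, 49]
smooth = [0, 7, 14, 21, 28, 35, 42, 49]
print(late_frames(uneven))  # [3, 6]
print(late_frames(smooth))  # []
```

Both sequences contain the same number of frames over the same total time, which matches the observation that nothing was dropped or duplicated; only the spacing differs.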

I checked whether there was any in-game performance penalty between P1 and P4 with my setup by running 3DMark Time Spy while recording, but the difference between the scores was within margin of error. Note that I did lose ~7% of my score with either option compared to the control run with no recording at all.

Recording setting | Time Spy score | % of baseline
No recording (OBS closed) | 6411 | 100.00%
NVENC HEVC CQP23 P4 | 5984 | 93.34%
NVENC HEVC CQP23 P1 | 5950 | 92.81%
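The percentage column is just each score divided by the no-recording baseline. A small sketch reproducing it from the scores in the table (the function name is only for illustration):

```python
def percent_of_baseline(score, baseline):
    """Score as a percentage of the baseline, rounded to two decimals."""
    return round(100 * score / baseline, 2)


baseline = 6411  # No recording (OBS closed)
scores = {"NVENC HEVC CQP23 P4": 5984, "NVENC HEVC CQP23 P1": 5950}

for setting, score in scores.items():
    print(setting, percent_of_baseline(score, baseline))
```

The gap between P4 and P1 works out to roughly half a percentage point, which is why I’m treating it as margin of error.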

Nvidia’s own NVENC settings guide unfortunately uses older configuration options from before the P1-P7 presets were in use. They recommended “Quality” or “Max Quality” out of what I think were only four or five total options (with “Max Quality” being the highest and “Quality” being the notch below it), [6] suggesting that “Quality” would map to perhaps P5 or so? There isn’t a 1:1 translation between these two sets of configurations, so it’s hard to say. [7]

There are a couple of interesting tidbits in the Nvidia guide. They say that HEVC (H.265) is only 15% more efficient than AVC (H.264), which is an extremely low figure next to most other comparisons, which typically say ~50%. To me this implies the NVENC HEVC encoder is comparatively worse than the NVENC AVC encoder, which wouldn’t be too surprising given streamers almost all stream with AVC, so Nvidia probably spent more resources on optimizing that encoder.

They also recommend encoding at CQP 15. The size of long recordings would be fucking enormous. [8]

However, that extremity’s got me thinking about having a separate recording profile for short vs long recordings. I’d strongly prefer to use one single profile for both if possible because inevitably one time I’ll forget to switch profiles, but I might toy with it. Maybe clips at CQP16~20 and then longer recordings at CQP 23~24 for example.

I should be able to gain some ground with either CPU-encoding on my future 5950X upgrade, or whatever hardware encoders are available on my GPU upgrade. A few weeks ago I would’ve said that I would definitely be swapping to using AV1 for local recordings, but after doing test renders in my video editor, I’m not so sure about that anymore. I’ll discuss that more in the editing section.

An example frame from my XCOM: EW playthrough. It looks decent, but even at just 100% zoom I’m not going to be mistaking it for a proper-quality game screenshot. Part of this is 4:4:4 chroma vs 4:2:0 chroma, which is likely the main culprit in why the red text looks comparatively mediocre.

Passable-quality streaming

I barely stream so I’m not nearly as worried about optimal settings here, plus I fully expect to be using AV1 for streaming via GPU upgrade within the next few years anyway.

I currently target a combined video+audio bit rate of 4Mbps (4,000kbps), with 3840 for video and 160 for audio. Image quality at 3840kbps is obviously quite a ways off my local recordings. [9]
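Because a CBR stream holds roughly the same bit rate throughout, the data it produces is easy to estimate. A quick sketch of the budget above, using decimal units (1 GB = 10^9 bytes) and ignoring container overhead; the function name is just for illustration:

```python
VIDEO_KBPS = 3840
AUDIO_KBPS = 160


def gigabytes_per_hour(total_kbps):
    """Approximate data produced by one hour at a constant bit rate."""
    bits = total_kbps * 1000 * 3600
    return bits / 8 / 1e9


total_kbps = VIDEO_KBPS + AUDIO_KBPS
print(total_kbps)                      # 4000
print(gigabytes_per_hour(total_kbps))  # 1.8
```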

Nonetheless, I’ve briefly tested dual-encoding while streaming: NVENC H.264 for the stream, and NVENC H.265 for the local recording. [10] It seemed to work fine, and NVENC (even on my GTX 1070, which has slightly worse encoding quality than the newer RTX-era NVENC) isn’t too far off the quality I would be getting out of x264 either.

To pre-empt why I didn’t just use x264: because I don’t want to worry about performance impact on the game. Since I’m not regularly using it for recording anymore, I’m not really used to having that modest performance dip right now and using NVENC instead lets me get around that (granted fairly minor) issue.

Editing / rendering

Vegas — at least as of version 18 — is supposed to be able to conditionally import MKV files if the video and audio streams are using supported codecs. However, despite an H.265 video + AAC audio being fine to import in an MP4, Vegas flatly refuses to import the same thing if it’s in an MKV. Ugh.

I don’t do much editing right now so I’m trying not to dedicate too much time to neatly solving this; instead I’m basically just moving the MKV’s data into an MP4 file for anything I need to import into Vegas right now. OBS does have the capability to automatically do this each time you finish a recording, but for long files it takes quite a while to complete and you still have to remember to manually delete the original MKV after the MP4 is made. Given my limited current editing, I’d prefer to just do the conversion manually as-needed rather than have to deal with constant manual maintenance of all recorded files.
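For reference, “moving the MKV’s data into an MP4” is a remux: the streams are copied as-is rather than re-encoded. A minimal sketch of doing that with the ffmpeg command-line tool from Python, assuming ffmpeg is installed and on your PATH (the helper name is hypothetical, and this is not necessarily what OBS does internally):

```python
import subprocess
from pathlib import Path


def remux_command(mkv_path):
    """Build an ffmpeg command that copies the streams from MKV into MP4."""
    source = Path(mkv_path)
    target = source.with_suffix(".mp4")
    # "-c copy" copies the existing streams without re-encoding them.
    # Note: by default ffmpeg picks one video and one audio stream;
    # adding "-map 0" before "-c" would try to keep every stream.
    return ["ffmpeg", "-i", str(source), "-c", "copy", str(target)]


def remux(mkv_path):
    """Run the remux and raise an error if ffmpeg reports a failure."""
    subprocess.run(remux_command(mkv_path), check=True)


print(remux_command("recording.mkv"))
```

Since nothing is re-encoded there’s no quality loss, and the time taken is mostly down to disk speed.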

On the render/export side I thankfully discovered Voukoder, a free external renderer for Vegas (and some other popular editors). Having this external renderer be integrated is so much more convenient than having to use a multi-step process such as with intermediate files, which is the kind of stuff I had to do back when my upload speed was too low to get a good quality upload from Vegas’ inferior native rendering options.

I experimented with both HEVC (using x265 as the encoder) and AV1 (using SVT-AV1 as the encoder) and… was surprised to find that AV1, for my usage with large, high quality recordings, seems to be slightly worse at the same render times and file sizes.

I don’t want to bore you with the whole testing process I did (which was fine for my needs but not exhaustive, I might add). The difference isn’t night-and-day huge (some people might not even notice it at 100% zoom), but AV1 just didn’t preserve details as well as HEVC at high bit rates in my tests. It just goes to show that you can’t generalize strengths in one area (lower, used-for-streaming bit rates) into all areas, despite what I commonly see online. [11]

I don’t like calling out people who are trying to be helpful, but I feel obligated to say that I’m extremely skeptical about the presets built into Voukoder. They’re not provided by the developer, but rather integrated from community member iAvoe (this is a link to a comment, you can ignore the video itself).

iAvoe probably has a better overall understanding of the common video encoders and their settings than I do, but they lose a lot of credibility with me by recommending specific settings which are considered “not sane”; ones that increase encoding time significantly whilst providing almost no improvement in encoding efficiency.

For example, for the “very slow, good quality high compression” preset with x264, they use --bframes 13 --b-adapt 2, yet they do this at CRF 19.5. What the hell?

MeGUI (a video converter) has this in their (granted somewhat older) documentation:

“Recommendation: Unless you use --b-adapt 2 choose --bframes 16, the maximum. This is the fastest and most flexible option for the encoder. If you use --b-adapt 2 much lower --bframes is reasonable, like 2-5. Higher values will significantly slow the encoding without major benefit.”

In the same custom preset they use --me esa, a search pattern which, as per one of the x264 developers:

“only exists as a reference for me to compare the real motion estimation algorithms to”

You can see an x264 developer visually describe the differences between diamond, hexagon, uneven multihexagon (umh), and exhaustive (esa) in a thread from 2005. As one other commenter put it:

“Exhaustive searching is intended to find the upper bound as every position is checked. It is far too slow for any reasonable use other than this.”

x264 dev’s explanation of the uneven multihexagon search pattern. Image credit: akupenguin via doom9.

Obviously computational power has improved a lot since 2005, but I’m still extremely skeptical that esa / tesa are suited for normal usage, even if you’re trying to improve compression. Sure, your video will compress a little better, but holy encode times batman. This isn’t a “very slow” set of encoding options, this is a “comically slow” set of encoding options, and they probably still can’t match the quality of just notching the CRF down a bit. If you’re encoding at CRF 19.5 and want better quality, don’t quadruple your render time – just lower your CRF to 18 or something.

I more-or-less agree with the premise of “the original presets are not necessarily optimal” — particularly in specific cases like high-motion gameplay recording — but this is way too far in the opposite direction imo.

Unfortunately unless you want to deep-dive into testing settings on your own system (i.e., at least days of work), you’re stuck relying on the presets and/or other people’s recommendations. In this case it’s not clear that the custom presets are superior to the default presets for general use, and importantly no data has been provided to assert otherwise. They might be better, but it’s just not practical to set aside 50 hours to test a hundred different combinations of settings.

My recommendation for Voukoder users: if you have a bit (but not a lot) of time, benchmark the custom preset vs some of the default presets using some relatively short clips (ideally at least a few minutes of footage that’s representative of what kind of video you’re usually working with). Record the encoding time, the file size, check the VMAF, and then look at the clips yourself (THIS LAST PART IS IMPORTANT: VMAF is not magically perfect at universally quantifying viewer perception). Then decide if the settings make sense.
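A rough sketch of the bookkeeping side of that benchmark, assuming you’ve exported each test clip to a file and have an ffmpeg build with libvmaf enabled (not all builds include it). The helper names are hypothetical. The VMAF score is printed in ffmpeg’s log output, and both clips need matching resolution and frame count for the comparison to be meaningful:

```python
import os
import subprocess
import time


def vmaf_command(encoded_path, reference_path):
    """ffmpeg command comparing an encoded clip against its reference."""
    # With the libvmaf filter, the first input is the encoded (distorted)
    # clip and the second input is the reference clip.
    return ["ffmpeg", "-i", encoded_path, "-i", reference_path,
            "-lavfi", "libvmaf", "-f", "null", "-"]


def timed_run(command):
    """Run a command and return how many seconds it took."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def summarize(label, encode_seconds, encoded_path):
    """One line per preset: encode time and resulting file size."""
    size_mb = os.path.getsize(encoded_path) / 1_000_000
    return f"{label}: {encode_seconds:.1f} s, {size_mb:.1f} MB"


print(vmaf_command("custom_preset.mp4", "reference.mkv"))
```

None of this replaces actually watching the clips; it just keeps the numbers in one place so the comparison is less hand-wavy.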

Between when I started working on these articles and actually publishing them, I decided to do a short sanity check on the presets. While I can’t commit the time to exhaustively refute / support each of iAvoe’s setting recommendations, I can say that I tested their x265 presets against ordinary x265 presets that provided similar encode times and similar file sizes. In my tests iAvoe’s settings were not clearly better or worse than the normal presets. This DOES NOT generalize to all situations, but is enough that I’ll personally be sticking with the more widely-used presets until I get clear evidence in favor of using specific custom settings.

Sharing clips

Because I’ve chosen a Discord-incompatible format for recording, I currently just re-encode clips with x264 before sharing them. I think AV1 is coming to Discord soon™, so I might swap to encoding clips-for-sharing with e.g. SVT-AV1 when the time comes, but I’ll evaluate that when it’s actually become immediately relevant. [12]
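As an illustration of that re-encode step, here is a sketch that builds an ffmpeg command to cut a clip and encode it with x264 and AAC. The CRF, preset, and file names are placeholder assumptions rather than my actual settings:

```python
def clip_command(source, start, duration, output, crf=20, preset="medium"):
    """Build an ffmpeg command that cuts a clip and re-encodes it to H.264."""
    return [
        "ffmpeg",
        "-ss", start,            # seek to the clip start in the input
        "-i", source,
        "-t", duration,          # keep this much footage
        "-c:v", "libx264", "-crf", str(crf), "-preset", preset,
        "-pix_fmt", "yuv420p",   # 4:2:0 chroma for broad player support
        "-c:a", "aac",
        output,
    ]


# Hypothetical example: a 30 second clip starting 12m34s into a recording.
print(clip_command("session.mkv", "00:12:34", "30", "clip.mp4"))
```

The resulting list can be passed to `subprocess.run(..., check=True)` to actually perform the encode.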

I usually keep both the original source and the re-encoded clip, so this process is a bit space-inefficient. However, as a proportion of all of my recorded videos it’s not particularly significant, as a minute or two here and there is nothing compared to hours of recordings.

Audio formats

(this is sort of a bonus that I tacked onto the main thing which was originally just about video and not audio)

There are two audio formats I investigated for recording: AAC and Opus.

If you don’t care about small differences, then let me save you the reading time: if you’re using an encoder of at least average quality (you probably are), it’s difficult to hear differences between these two when using at least moderate bit rates (128kbps+). I wouldn’t personally recommend that most people bother to optimize further.

The results of a ~96kbps listening test done back in 2014. Because our bit rates are higher, it’s not strictly fair to extrapolate these results upwards. Image credit: https://listening-test.coresv.net/results.htm

For the three people still left, here are the condensed results of my investigation, beginning with usage in OBS:

  • OBS by default will use AAC audio encoded with FFmpeg, which is an average quality encoder at time of writing.
  • Because OBS supports custom FFmpeg outputs, you could change this to Opus output (using whatever the default encoder is) if you really wanted to, but using custom output introduces additional limitations and complexity (and so I don’t think it’s worth it for normal usage). [13]
  • Although Opus does have difficulties with specific types of audio samples (so called “killer samples”) because of technical design reasons, AAC is not immune to this issue either and has its own struggles with specific types of audio. For general usage at higher bit rates (192kbps+) it’s unlikely to be an issue with either format though.
  • The Apple AAC encoder is considered to be the best quality AAC encoder, and interestingly can be installed for use with OBS fairly easily.

On balance, my current recommendation for high quality recording with OBS is to use AAC encoded with the Apple AAC encoder, although there are certainly some use cases at low bit rate where Opus is likely to be superior — I just don’t personally need that so haven’t bothered to look into it in detail.

The justification essentially boils down to quality gained vs complexity added. If switching from Apple-encoded AAC to Opus, it looks like at high-ish bit rates you gain only minimal quality in the original recording, yet you most definitely gain extra complexity in order to use Opus.

Onto the choice in Vegas:

  • Vegas is much easier to work with if importing AAC rather than Opus (even if put into a natively-OGG container), [14] which is already a good enough reason for me to use it regardless of any of the above.
  • For exporting, Voukoder does support the FDK AAC encoder (with some extra steps), which performs slightly better than the default FFmpeg encoder in listening tests.
  • Despite having better overall quality, one tradeoff which the FDK encoder makes is completely slicing off some of the highest frequencies. [15] To be extra clear: it will still typically outperform the FFmpeg encoder despite doing this, or perhaps even partly because of doing this. [16]
  • Voukoder also natively supports exporting Opus audio with no extra fuss.

On balance my personal preference here is definitely to use Opus. It’s basically a flipped version of my OBS recommendation: we can use it here because it’s easily accessible without any extra hassle. Playback compatibility is slightly worse (fewer things natively support Opus), but YouTube will ingest it just fine.

There is a not-completely-trivial argument that whatever codec is being used in OBS should be re-used in Vegas, because this is likely to cause less generational information loss than using a different codec for each stage, particularly if using AAC. Unlike what you might be familiar with regarding JPEG memes, AAC appears to handle multiple passes of compression with comparatively little additional information loss for each pass beyond the first.

However, I’m normally only re-encoding once or twice so AAC’s repeat-encode advantage is likely minimal. Further, Voukoder’s best AAC encoder is FDK, which is inferior to (and has more quirks than) Apple AAC. I could request the latter as a feature but it seems so comparatively unimportant, particularly in light of the “Voukoder successor” in development.

Although it’s not a 1:1 comparison because it’s been converted from one channel to two as part of the encode, this white noise sample clearly shows FDK’s frequency cutoff.

So in summary for audio, at least for my needs:

  • Most users don’t need to worry about audio encoding technicalities in OBS, but if you care you can use Apple AAC instead of the default AAC encoder for slightly improved quality. Using Opus would require more effort and I don’t personally think the minor quality advantages are worth it for typical usage, although there are niches where it might make sense.
  • For Vegas (using Voukoder), Opus is the best option overall, but AAC (either FDK or FFmpeg) is also “good enough” quality if you need AAC for compatibility reasons.

Final notes

It’s worth being sure what your goals are with your choice of video / audio codecs and encoders. For example, I want video image quality good enough that I can take a freeze-frame and that frame could pass as an original screenshot at a glance. If you A/B with an actual high-quality screenshot (even if it’s lossy) it’ll be obvious that my video frame is lower quality, but without it you might believe that I pulled the image straight from the game and then just saved it as a JPEG with slightly-too-aggressive compression.

For both audio and video I also want to maintain a high enough quality source for later re-encoding (e.g. for editing). Similarly to the above, my desired result is that twice or thrice encoded video/audio will still seem, at a glance, almost like I used the original full-quality source as my input.

However, if maintaining maximum quality and eliminating edge cases is actually quite important to you, don’t mess around so much: use a lossless audio codec (0% chance of killer samples) and pair it with a well-researched video codec + encoder with enough bits to make your eyes water. [17] Everything is a tradeoff, so have an idea of what you value most so that you can make the decisions that’ll lead to the best outcome for you.

 


 


 

  1. Not that I stream much though 😅
  2. Just upgraded to this thanks to Humble’s most recent recurring Vegas bundle, having mostly used Vegas 14 until just prior.
  3. I most likely would have felt the same if I had used CRF 18 Fast for it; the difference between them is obviously not enormous.
  4. Initially using StreamFX, though it seems it was natively integrated into OBS itself later on? I couldn’t find a specific timeline for when it was added.
  5. I’m not sure how to quantitatively track that – OBS’ stats window didn’t note any issues but they were clearly present during playback and checking frame-by-frame playback there were no duplicate frames either.
  6. Seemingly corroborated by this post.
  7. Xaymar says that “High Quality” translates to P4 though (see notes at bottom of page).
  8. I assume their recommendation assumes you’re only using this for clips rather than recording full gameplay sessions.
  9. Unless you’re a partnered streamer you have to be willing to make some compromises for watchability. Non-partners (even affiliates) don’t get guaranteed transcodes, so you have to ensure your source bitrate balances between people on good connections and people on subpar connections.
  10. The much higher quality recording can be uploaded to YouTube later.
  11. There’s quite a few “just use AV1 for everything lol” type of people floating around.
  12. AV1 encoders are still being actively developed and improved, so there’s not much point locking in settings well before you actually use them.
  13. For example, NVENC HEVC/H.265 doesn’t seem to be natively supported for FFmpeg’s custom output in OBS (I don’t know why). You can get around this specific issue by using StreamFX but then you’re relying on third party development for a key part of your recording pipeline.
  14. The Opus container is basically just the OGG container, so for situations where behaviour is hardcoded based on file extension, you can actually rename a file from filename.opus to filename.ogg to “gain” support if an Opus data stream is supported but the Opus container itself isn’t recognised.
  15. The cutoff varies, but the highest it can go is ~17 kHz.
  16. Most adults have little or no hearing in the relevant frequency range, so not allocating much/any bits to this audio allows for more to be spent elsewhere.
  17. I don’t actually know what codec + encoder that would be for video, but I know that very few people should be targeting it.
