A Few Tidbits on Syncing Music Timing in Videos

Over the years I’ve spent a bit of time during various projects matching music tracks to onscreen events. Emotional tone is generally easy enough – putting sad music during a sad scene isn’t exactly a revelation.

What’s more difficult to get right is matching specific elements of a particular backing track with specific things happening on screen. A gunshot, an explosion, a dramatic standoff. A pause, a look, a glance, a reaction of joy, or a reaction of sorrow. Trying to match that to a beat, a strum, or a chord progression – that’s where things get particularly interesting to me.

To make a long backstory shorter, I recently saw this, found the song in the comments, and then thanks to those same comments discovered that the song was used here because it was being memed to death.

The song, For the Damaged Coda, was used a couple of times on Rick and Morty – once in April 2014, and again in September 2017. It seems as though this second usage is what sparked its more widespread usage. This little journey of discovery got me thinking about music syncs again, and I wanted to share a few things I’ve picked up by going through some examples.

Unfortunately my artistic abilities are mostly limited to stick figures[1] at this stage, so a storyboard that properly communicates the intended expressions and scenes would be... challenging for me, particularly for any visually complex or difficult-to-frame scenes. Instead I’m going to borrow an old clip I’ve used before for the sake of convenience – but we’ll get to that later.

Now, there are a few parts of the song that need to be factored into the video sync to make it work optimally. Let’s return to For the Damaged Coda for our example.

  • The intro in the song has a specific tone to it. It’s solemn and somber, with a sprinkling of general sadness and light despair for good measure. As shown in the Rick and Morty episode it works for simple narration quite well in the right emotional contexts, but could also potentially be a voiceless scene following up from a section with dialogue
  • The leadup to the “drop” is actually quite a long almost-silence – longer than you’d usually find in most songs. Getting a great match on the visuals of a dramatic pause like this can be a bit of a challenge because it needs to contrast with the drop to really drive the emotional hit
  • The drop itself works best if it coincides with an immediate dramatic change on the screen. I feel the Rick and Morty sequence is a little lacking here, since it does a slow pan in the same scene with the same camera shot. It’s hard to have that hit of drama if you use the same camera shot without any new elements being immediately introduced, because the camera can only pan and zoom, neither of which can match the speed of the drop hitting after the silence.
  • The first second or so following the drop works best if the initial sudden visual switch (which has already been introduced on-screen with the drop) increases in scope to the viewer. The easiest way to do this is literally, such as zooming out to see more of the scene (similarly to how it’s done in the Rick and Morty episode).
  • For the longer-term followup (the 10+ seconds after the drop, but mostly excluding the first second or so following it), you have two options. 1) the drop is so hard hitting that you don’t need much of anything on the screen (such as if you’re rolling credits after presenting a distressing shock onscreen) and the viewer simply processes what just happened without having to be distracted by more on-screen action. 2) the drop is only a piece of the larger emotional hit, in which case you can further play things out on the screen to keep adding to things.

I’ve put together a few examples to demonstrate how the feel of the scene can differ by just changing the timing / sync of the onscreen visuals with the supporting music.

This, my friends, is the opening cinematic for MechWarrior 4: Vengeance, released when I was still learning the alphabet. We’re only interested in the scenes up to 4:27 in this upload, since the uploader has put the campaign cinematic in there as well even though that’s a separate thing.

The article doesn’t require you to watch the full cinematic as a prerequisite, but it’s there if you want it. If you use speakers/headphones with recessed mids it might be helpful to reference the dialogue because in my mixes a few dB difference can make it too difficult to understand.

One thing I discovered a few years back when syncing music to onscreen action: you often want to delay the music by a couple of frames or so (depending on the frame rate and what’s on screen). If you actually sync the music frame-perfectly with no offset or delay, the music usually feels like it starts too soon. I’m guessing this is something to do with the time it takes to initially process a visual change vs initially process the sound, but it’s just a working theory.
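To put rough numbers on "a couple of frames", here is a minimal sketch that converts a frame delay into time and audio samples. The function names, the 48 kHz sample rate, and the example frame rates are assumptions for illustration only; the right delay still has to be judged by eye and ear for each shot.

```python
def frames_to_seconds(frames: float, fps: float) -> float:
    """Convert a number of video frames into seconds at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


def delayed_music_start(event_time_s: float, delay_frames: float, fps: float) -> float:
    """Timeline position (seconds) for the music hit, nudged later than the visual event."""
    return event_time_s + frames_to_seconds(delay_frames, fps)


def seconds_to_samples(seconds: float, sample_rate: int = 48000) -> int:
    """Convert seconds into a whole number of audio samples (48 kHz is an assumed rate)."""
    return round(seconds * sample_rate)


if __name__ == "__main__":
    # Show how long a two-frame delay lasts at a few common frame rates.
    for fps in (24, 30, 60):
        delay_s = frames_to_seconds(2, fps)
        print(f"2 frames at {fps} fps = {delay_s * 1000:.1f} ms "
              f"= {seconds_to_samples(delay_s)} samples at 48 kHz")
```

The same two-frame nudge is a much longer stretch of time at 24 fps than at 60 fps, which is one reason the offset depends on the frame rate.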

In any case, here’s the first variation of the sync with the MW4 clip:

I tried to sync the drop with the impact of the mech with the ground. It works.. okay, but there are a few issues that significantly reduce the emotional impact of this variation.

First, the mech doesn’t have a single “thud” moment where the whole thing hits the ground at once (think like a big rock or something hitting the ground in a cartoon). The mech falls more realistically, with a somewhat gradual collapse. This might be fine in an action sequence (which is what the original scene is), but makes it difficult to get the audio drop to sync cleanly. Because it doesn’t pair nicely the emotional hit is lessened.

The second thing is that the second or so after the music hits, the visual followup doesn’t work great. The explosion is okay (some good, some not so much so), but the pacing of the Vulture (second mech) waddling away afterwards definitely has a tone mismatch.

The longer-term followup afterwards is pretty poor as well, but that’s something that can’t realistically be changed if the visual media is prebaked like this. All of these variations suffer from this third point, so I’m only going to mention it here once, but it does apply to all of them more or less equally. One thing that can be a crappy drop-in replacement for a real visual followup is taking the pan towards the end and slowing it down and fading out there. If you really used a shot like this it would obviously look a little different, as the pacing is all wrong when it’s slowed down compared to when it’s played at normal speed (it spends way too long looking at the sky etc). You can get the gist of it though.

Something that does work quite well with this window of timing is the dialogue. During the almost-silence before the music kicks back in you have the frantic yelling of “EJECT! EJECT!” and you can see and hear the other pilot panicking leading up to the explosions. As long as this type of dialogue is placed within the almost-silent section of the music, its specific timing usually isn’t as important, provided its tone and the other elements (particularly the drop itself) sync nicely.

Here’s number two. In this variation I wanted to see how well it worked when synced to the camera switch right after the missiles hit.

Turns out, not that well. The drop itself lacks the onscreen punch required for this shot to be particularly powerful, and since the mech doesn’t fall suddenly but instead somewhat realistically (the mech is initially falling quite slowly for the first handful of frames), the first quarter second of onscreen action feels way too slow for the dramatic change that the musical drop brings.

If you pull the music to be a frame or two earlier it feels like it plays too early. If you push it a frame or two later, it feels like it’s playing too late. It’s a lose-lose-lose situation with timing, where the best available timing still doesn’t work well.

The followup two seconds is the dramatic explosion, but that’s both a blessing and a curse. It shifts attention away from the original music drop which would usually be bad because it lacks the sudden juxtapositional impact that you get with preceding silence. In this case however, it’s probably better than having a weaker followup since the drop itself is so weak already.

If the viewer had a more significant emotional investment in the now-dead pilot, that would have certainly helped to make up for poor “sync-punch” by having just straight up emotional distress and shock. Since we don’t really know the pilot, we don’t actually care that much what happens to them. In any case, for the clip being used, this timing is a poor choice for a sync. Onwards to number three!

Here I revisited timing similar to the first iteration, making sure to not repeat the fatal mistake of having a weak drop like in the previous iteration. I’ve tried to make the drop sync with the explosion after the mech has already hit the ground, which resolves the issue of the falling mech not having a concrete (hur hur) frame where the mech specifically impacts the ground.

Unfortunately doing that introduces its own issue. There also isn’t one nice clean explosion to sync with here, but instead around four discrete events that occur after the mech is on the ground. First the right side sparks out, then the left side explodes, then the middle explodes, and finally the explosion flares up afterwards. The flaring afterwards is minor compared to the two explosions prior, so it isn’t a concern for the drop itself. However, because there are multiple discrete explosions, you miss out a little bit on the idea of a “big single hit” that coincides with the drop. I chose to ignore the first minor sparking explosion on the right side (mech’s left) and instead sync the drop with the explosion on the left side, which sort of allows the middle explosion to follow along afterwards fairly naturally.

The drop only works so-so here (mainly due to the reasons mentioned above), but there’s a saving grace – the flaring works really well here with the drop’s immediate followup. While you’re still processing that the initial drop didn’t time perfectly, you suddenly get the distraction of a sync that does work well and suddenly it seems okay. This really underlines the importance of the immediate followup to the drop itself, as it can save a mediocre drop as shown here, or help make an already-good drop amazing.

Something unique to this sync is that you get some extra tension from the mech falling and you thinking “you know, maybe she could survive that?”. Then the whole thing just explodes and you get a bit of an “oh shit..” moment. None of the other syncs have something comparable to this extra element, and it’s a large factor in why I like this sync the most out of all of them.

The fourth and final iteration is one I made later, as a sort of spinoff of #2 to see if I could keep a somewhat similar timing while solving its problems. Instead of syncing to the camera shift of the mech dropping, I synced to the explosion that happens right before it.

The answer is sadly no, not really, but it lets us walk through why it doesn’t work.

First, the large explosion I’ve synced to isn’t the first explosion you see in that shot, which immediately shaves off some of the “shock” factor. Secondly, the large explosion cuts to the mech-falling shot which unfortunately kills some of the momentum of the sequence if you start the drop before it. It’s not awful, but better syncs are available. A small positive is that the grimace of the surviving pilot as he turns away from the explosion works pretty well, it’s just overwhelmed by other elements around it.

The immediate followup is also so-so, which adds up to a total sync that’s much the same. An issue unique to this variation is that putting the drop this early cuts off (or rather drowns out) the end of the pilot’s panicking dialogue, which I think is a bit of a loss.

Let’s hop back to the Ricky Ponting and Mortimer episode that seemed to make this track popular and walk through the timing in that.

It might not be immediately noticeable if you haven’t listened to the track a few dozen times[2], but the audio here is actually tweaked in at least three places by my count.

  1. The start of the track has the first two beats a combination of lopped off and extremely faded out. This more smoothly introduces the track into the mix without resorting to a much slower fade in (which makes the initial introduction have no punch). The fade is partly covered by the blaster shots as well, which gives you a “free” transition into the mix without making it seem too abrupt.
  2. The audio leading up to the almost-silence is cut short. Maybe this was done intentionally for pacing reasons (the pre-drop does last quite a while), or maybe it was just for brevity. Either way, not much is lost by doing it in this context.
  3. Unless it’s just an oddity of this upload (it’s not an official clip upload), the drop has had its volume upped slightly, and then the volume returns to par before the vocals kick in. If this was intended, it’s probably just for extra dramatic effect, both on the drop and the very-immediate followup.

Here’s a comparison of the two waveforms, which makes points 2 and 3 easier to notice (the peak at the start of the red line is the drop). There’s a little bit of extra “noise” in the Rick and Morty one because there’s dialogue and other sounds happening at the same time, but you can still see the differences. The audio clips are synced to where the piano starts to play using both hands, which falls between when Morty says “politics” and “order”, and is indicated by the yellow dotted line.

Notice how in the original (top) the audio peaks are all uniform, whereas there’s a large discrepancy between the drop and the subsequent sound levels in the Rick and Morty “version” of it.

You can also see the couple of seconds missing at the start compared to the original.[3] The part missing before the drop is also noticeable here, although since Morty is speaking during the silence (changing the waveform) it’s harder to see exactly what part is cut off without also listening to it.
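For anyone wanting to reproduce this kind of comparison numerically rather than by eye, here is a rough sketch. It is not how the waveform image above was produced. It assumes each clip is already decoded into a plain list of samples in the range -1.0 to 1.0, and the helper names are hypothetical. It lines two clips up at a shared anchor sample (such as the point where the piano starts using both hands), padding with silence as needed, and then measures peak levels in fixed windows.

```python
import math


def align_at_anchor(clip_a, anchor_a, clip_b, anchor_b):
    """Pad the clip with the earlier anchor with leading silence so both anchors share one index."""
    shift = anchor_a - anchor_b
    a, b = list(clip_a), list(clip_b)
    if shift > 0:
        b = [0.0] * shift + b      # clip_b's anchor was earlier, so pad clip_b
    elif shift < 0:
        a = [0.0] * (-shift) + a   # clip_a's anchor was earlier, so pad clip_a
    return a, b, max(anchor_a, anchor_b)


def window_peaks(samples, window):
    """Peak absolute level of each consecutive window of `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [max(abs(s) for s in samples[i:i + window])
            for i in range(0, len(samples), window)]


def level_db(peak, reference):
    """Level of `peak` relative to `reference`, in decibels."""
    if peak <= 0 or reference <= 0:
        raise ValueError("levels must be positive")
    return 20 * math.log10(peak / reference)


if __name__ == "__main__":
    # Made-up sample values purely for illustration, not measurements from either clip.
    original = [0.1, 0.5, 0.5, 0.5]
    episode = [0.2, 0.3, 0.7, 0.5, 0.5]
    a, b, anchor = align_at_anchor(original, 1, episode, 2)
    print("aligned anchor index:", anchor)
    print("drop vs. following level (dB):", level_db(b[anchor], b[anchor + 1]))
```

A drop window that sits a few dB above the windows after it in one clip, but not in the other, would be consistent with the volume bump described in point 3.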

Moving on to the sync itself, I think it’s done pretty well, but is still a touch lacking. It’s possible I’m missing an element of the emotional impact because I don’t have the full context of the episode (which a normal viewer would have) or even the series (which many viewers would still have).

One luxury which you enjoy when not using prebaked media for the visuals is that you actually have a leadup with a better tone than whatever you can scrounge together from a game cinematic. Said leadup works quite well here, and a monologue is much the same as narration pacing-wise.

My issue comes mostly from the way the pause and the drop interact here. If using the pause for the dramatic moment, I favor keeping the following scene low-impact so as to not take away from what’s just been seen (such as rolling credits). If you want a dramatic moment with onscreen following, I’m of the opinion it works best to have the apex of the dramatic moment at the drop.

In this clip however, the drop coincides with increasing the scope of the situation — in quite a literal sense — but because the scope increase has no suddenness to it, I personally feel like this is one of the weaker uses of the drop.

An alternative sequence that maintains the same timeline would’ve been to see someone dramatically press a large button to eject the bodies during the pause, then switch to the outside camera during the drop. You’d probably need to start the shot slightly more zoomed out as well so as to actually give you a sense of how bad the area outside truly is right from the get-go (otherwise the drop is too weak).

Imagine if Morty pressed the button right after he says “action” and the pause begins. The ejection sound plays and Morty looks through a porthole (with a narrow view by making sure the camera is further away from it) and sees the bodies outside (camera still inside at this point). They begin floating a bit further out, perhaps even outside of the view of this narrow porthole. THEN the drop hits as the camera switches to the scene outside with the just-ejected bodies floating alongside previous bodies.

In any case, the not-my-favorite drop is followed up longer-term very effectively, and the use of bodies at varying distances that are all visually different (particularly with different visible injuries) really drives things home nicely. The floating papers towards the end of the clip really seal things for me, and I’m guessing if you have more context that they work even better.

I’ll close out with a bonus clip: a sync I stumbled upon many years ago while in high school, coincidentally using the same MW4 cinematic. The impressive thing about this particular sync is how well it works over the duration of an entire song – that song being “Power of the Horde” from Warcraft 3. I cringe a little looking back at this, but damn if it doesn’t sync amazingly for something so long. Sure, a lot of the syncs are approximate only, but they’re still close enough to definitely work. This is a big part of what got me into playing around with syncs like this, and I’m glad I found it way back then.[4]

 


 

  1. Or building upon an existing piece.
  2. Perhaps while writing an article..
  3. I’ve only sampled the audio starting from when the blasters fire, which is the earliest time the song could possibly be playing – the silence before that has been added for comparison purposes.
  4. I have no idea how I discovered this sync though.
