AfterDawn Blu-ray Encoding Tutorial Lesson 7
Advanced Audio Processing
Although MeGUI's audio encoding options are extensive, they are by no means exhaustive, particularly when it comes to Blu-ray discs, where the available space often permits uncompressed audio streams. Additionally, because MeGUI is primarily a video-oriented tool, there may be times when you need another program to combine separate files, each containing just a single channel of a surround source. In this guide you will learn how to use a free audio editing tool called Audacity for these jobs, and even how to use it for Dolby Digital (AC-3) encoding.
The Complete AfterDawn Blu-ray Encoding Tutorial
Encoding video for Blu-ray is easy, but not necessarily simple. To make it easier to learn we have divided this tutorial into several individual lessons, each of which addresses a single step in the process. At the top and bottom of each lesson is a navigation menu where you can jump to any other lesson in the series. You can easily return to a previous section for review or skip over any future section. It is recommended that you read the entire series at least the first time through.
Official AfterDawn Blu-ray Encoding Tutorial feedback thread
We have created a dedicated discussion in our forums for feedback on this tutorial. We would love to hear whatever thoughts you have. Tell us what you liked or what you didn't like. Let us know if there was something you didn't understand or even something that was just plain wrong. We strive for 100 percent accuracy in our guides, but nobody's perfect. Any help you can give us in getting a little closer to that goal is appreciated. Our goal is to help you out, and anything we can change to do a better job of that is an improvement.
The BDMV (Blu-ray) specifications include support for LPCM audio, which is uncompressed and lossless, in a variety of channel, frequency, and bit-depth configurations. If you are encoding for a BD9 (BDMV on dual layer DVD) disc, leaving the audio uncompressed may not be a viable option due to space limitations. If you are making a BD5 (BDMV on single layer DVD) disc there almost certainly won't be room for uncompressed audio. However, the drastically increased capacity of an actual Blu-ray disc, whether single or dual layer, means that even a surround encoded LPCM stream will often fit easily.
An Introduction To Audio Resolution
You may never have thought of audio in terms of resolution, and with good reason. Prior to Blu-ray, audio on the two widely used consumer disc formats had a set resolution. Technically DVD-Video allowed high resolution audio, but didn't require players to support it or use media with enough capacity to make it practical. In reality, though, they are more similar than different.
A Sampling Primer
As complex as video encoding may be, in many ways it pales in comparison to even the simplest uncompressed audio formats. The problem isn't so much that audio is complicated, although it certainly can be. For the most part the confusion comes from the impossibility of making audio completely digital.
The problem is that audio is inherently analog. A digital signal has defined states and discrete pieces. A digital camera's only analog technology is the circuitry required to measure light. Once that is done it is digital forever. Even modern TVs use individual pixels designed to match up to the resolution of a video frame. But audio is different. Whether it's a voice, a special effect, or a note from a musical instrument, sound is nothing more than variations in air pressure. It does not start and stop to match the clock of a computer. It begins as an analog signal, and after being sampled and processed it must return to the analog world to be played.
But since the purpose of this guide is working with audio which has already been digitized, you will need to understand the properties of a simple digital audio stream. At the same time, you cannot ignore its analog properties, so the best place to start is with an explanation of how it is sampled. But before jumping into how audio is sampled, let's begin with a more straightforward analog to digital conversion process to get a simpler point of reference. We'll begin by exploring digital video.
1. Photosensors Measure Light
- Even though it seems like a digital video camera captures an entire frame all together, in reality there are thousands of individual photosensors independently capturing as little as a third of the information for a single pixel. Each sensor is sort of like a miniature solar cell fitted with a filter to make sure only a single color, either red, green, or blue, gets through. The more intense that particular color is, the greater the charge that's built up.
2. Combining Colors
- Once the red, green, and blue intensity is measured by the photosensors, it must be combined to generate a single RGB value for each pixel, charging another component underneath them to one of 256 unique voltages. In reality the voltages probably wouldn't be combined just yet, but to keep things simple we'll do it now.
Now that the light has been measured and converted into a voltage, the charge for each pixel is measured and an RGB value is assigned to the corresponding pixel in a video file. This is a sample. The process is repeated until the entire frame is saved. Although every sample for a single frame will be separated from the samples belonging to other frames by a header in the file, they are also separate and distinct from each other. In fact, rather than saying the resolution is 1920×1080, we could say each frame has a resolution of 2,073,600 samples. Assuming the camera is recording at the standard film framerate, you could also say the resolution of this video stream is 49,766,400 samples per second. In reality neither of those numbers is useful though. Instead we say there is a spatial resolution of 1920×1080 samples and a temporal resolution of 24fps, or if you prefer 24 samples per second, because that's the information our video encoder will use to encode the file.
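The arithmetic in that paragraph is easy to check for yourself. Here is a small Python sketch of it. The variable names are my own, and 24fps is simply the film framerate assumed in the text.

```python
# Spatial and temporal resolution of a 1080p film-rate video stream.
width, height = 1920, 1080
frames_per_second = 24  # standard film framerate, as assumed in the text

# Every pixel in a frame is one sample.
samples_per_frame = width * height

# Every frame is captured 24 times per second.
samples_per_second = samples_per_frame * frames_per_second

print(f"{samples_per_frame:,} samples per frame")
print(f"{samples_per_second:,} samples per second")
```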
Audio sampling at least begins in much the same way. Rather than a specialized measurement component, audio sampling can be demonstrated with a garden variety microphone. Even though microphones predate digital audio by decades, their very nature makes them suitable for the analog to digital conversion process known as sampling. Unlike, say, an old analog camera, microphones are designed specifically for converting sound to electricity. That conversion, which allows the analog signal to be represented by variations in voltage, is the basic building block of the process. In fact the voltmeter is arguably the most basic measurement tool in electronics, and certainly in analog to digital conversion.
The next step, sampling our (still analog) electrical signal, is very different from what we did with the video camera. Because the audio is still an analog signal at this point, we don't have the luxury of taking a bunch of samples, then waiting and taking more. Since the analog signal doesn't stop, our sampling must follow it. Instead of getting thousands of samples all at once, we will need to get one sample thousands of times.
In fact we will be recording 48,000 samples every second. This is necessary so we can return the audio to its analog form later. We can do this thanks to the power of the sampling theorem. Algorithms developed from the sampling theorem give us a way to perfectly reproduce an analog waveform provided we sample it at a rate of at least twice the highest frequency we need to reproduce. Since the cutoff for digital audio is typically 20,000Hz (cycles per second), the highest frequency most people can hear, we need at least 40,000 samples per second.
Fortunately we already know we are collecting 48,000 samples per second. This will easily be enough to re-create the analog audio perfectly. That rate will be the temporal resolution of our digital audio file. And instead of continuing to refer to it in samples per second, let's start calling it 48,000Hz or 48kHz (thousands of Hertz). So we have a temporal resolution, but there is one more type of resolution we need to figure out and it is arguably the most confusing part of the audio sampling process, at least for someone whose background is rooted in the purely digital world.
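To make the idea of taking one sample thousands of times a little more concrete, here is a minimal Python sketch. It is only an illustration and not how a real analog to digital converter works: a sine function stands in for the voltage coming from the microphone, and the function and variable names are my own.

```python
import math

SAMPLE_RATE = 48_000        # samples per second, as used in the text
HIGHEST_FREQUENCY = 20_000  # Hz, the audible limit quoted in the text

# The sampling theorem asks for at least twice the highest frequency.
minimum_rate = 2 * HIGHEST_FREQUENCY
rate_is_sufficient = SAMPLE_RATE >= minimum_rate


def sample_sine(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Measure a sine wave once per sample period.

    Returns a list of floats between -1.0 and 1.0, one per sample.
    """
    count = int(round(duration_s * sample_rate))
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]


# One millisecond of a 1,000Hz tone is exactly one cycle of the wave.
one_millisecond = sample_sine(1_000, 0.001)
print(minimum_rate, rate_is_sufficient, len(one_millisecond))
```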
The natural assumption about bit-depth, the number of binary digits used to "describe" a digital audio sample, is that it represents the frequency, much like the binary RGB code which defines the color of a video sample. But that isn't true at all. The frequency of every sample has already been determined using the sampling theorem. What the bit-depth represents is the volume of each sample. In other words, in a 16-bit digital audio file there are 65,536 possible volumes.
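The number of possible volumes follows directly from the number of bits. As a quick sketch (the function names are my own, and I am assuming the usual signed integer layout for PCM samples):

```python
def quantization_levels(bit_depth):
    """Number of distinct values a sample of the given bit-depth can hold."""
    return 2 ** bit_depth


def signed_range(bit_depth):
    """Smallest and largest values of a signed integer sample."""
    half = 2 ** (bit_depth - 1)
    return -half, half - 1


for bits in (16, 20, 24):
    low, high = signed_range(bits)
    print(f"{bits}-bit: {quantization_levels(bits):,} levels, {low:,} to {high:,}")
```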
Putting that all together, our audio file's resolution is 48,000 16-bit samples per second. To use a standard shorthand for it, that translates to a 16/48 audio file, or alternatively 16/48k. Hopefully you are familiar with this notation and recognize that 16-bit/48kHz audio is the standard format used for DVD-Video and has also been inherited by Blu-ray. It is what we will call standard resolution audio. Now that you have a basic foundation in audio resolution and sampling, it's time to install some software and start working with some files.
Download Audacity and run the installer, following the steps in the wizard.
If you plan to encode your audio to Dolby Digital you will also need to download a version of FFmpeg which has been compiled specifically for Audacity. Once installed it will be available as an export format. There is also a library available for LAME, the open source MP3 encoder, but since MP3 audio is not Blu-ray compliant it won't be covered here. You can easily download and install LAME yourself using the same process, substituting the LAME download button for the FFmpeg download button.
Open Audacity's Preferences.
Click on Libraries and then the Download button next to FFmpeg Library to open the Audacity webpage which links to the FFmpeg download.
Click on the download page link.
Locate the FFmpeg download corresponding to the version of Audacity you are running. An installer will be downloaded. Run the installer to extract the FFmpeg libraries.
Once FFmpeg is installed you should see version information in the Preferences window. Leave the Preferences dialog open to continue with some other basic configuration.
Basic Preferences Configuration
Before you close the Preferences window there are a number of settings you should look at.
The Quality preferences can be used to set options related to changing the resolution of your audio. Make sure to read the explanation for each option closely because the captions in the Preferences dialog are oriented specifically toward capturing (recording) and may not mean what you think for opening or importing an audio file.
1. Default Sample Rate
- Although this setting appears to determine the number of samples per second Audacity will use when exporting your audio, the initial Sample Rate for your project will automatically be set to match the first file loaded in your project.
2. Default Sample Format
- This bit-depth is used if you perform a Mix and Render operation. It will not affect the Sample Format used for processing such as applying filters. That is always done in 32-bit floating point precision. I will explain more about that a little later.
3. Real-time Sample Rate Converter
- If you are previewing a project which is set to be resampled, this is the method which will be used to resample on the fly. Unless you find that the audio does not play smoothly, you should always have this set to High-quality Sinc Interpolation to use the same settings as your final project.
4. Real-time Dither
- When previewing, this is the setting Audacity will use for Dithering. Shaped is always the recommended setting unless you find that it causes quality problems, in which case try one of the other settings and see if it fixes the problem.
5. High-quality Conversion
- This is the setting Audacity will use to resample your project when saving. High-quality Sinc Interpolation is always recommended.
6. High-quality Conversion Dither
- When resampling a file you will almost always want to dither, a process used to hide the minor errors which commonly result from the process. Set it to Shaped, but make sure to test your file in case you need to try one of the other settings.
Show track name in waveform display
- Check this box if you want the name of each audio file (each channel if they are in separate files) displayed on it. Initially this will be taken from the filename, but you can also set the name for each one manually.
1. Copy Uncompressed Files or Direct Read
- When you import an uncompressed PCM audio file Audacity can either copy it and use the copy as its source file or read directly from the file you select for import. As it says in the dialog, reading directly is faster but making a copy is safer since your original file won't even be opened, let alone modified. Generally this should only become an issue if Audacity crashes, and generally not even then. It does crash from time to time though, so think about this carefully.
2. Use custom mix
- This must be selected in order to output a file with more than 2 channels. That would include anything in this lesson.
3. Show Metadata Editor prior to export step
- If this is checked you will be prompted to provide information like artist and title. Since this information won't be used for your Blu-ray disc anyway I recommend unchecking this to save yourself a mouse click when saving your file.
Basic Audacity Workflow
We'll begin with a basic walk-through of Audacity to familiarize you with how things work. It's a little different than most programs so it may take some getting used to, but once you get started it's fairly intuitive.
1. Import Audio File
- Since Audacity automatically creates a new project file for you, to add an audio file to it use the Import button.
2. Save Project
- Audacity has its own format for saving project files in a way that makes them simpler (faster) to load by breaking them into smaller pieces.
3. Effect menu
- The Effect menu includes a large number of filters you can use for processing your audio files. These include high pass and low pass filters, a denoising tool, and even a filter we will use later for changing the speed of an audio track to match the framerate change on a video file.
One of the more interesting features in Audacity is the bit-depth it uses for processing audio files. Regardless of your file's bit-depth, Audacity does all processing in 32-bit floating point mode.
Going back to my earlier explanation about bit-depth of an uncompressed audio file, typically audio editing is done using 24-bit files. The reason for this is that it provides what is called headroom. That refers to extra volume steps (unused bits) at the top of the scale. That way if you perform some processing which unexpectedly increases the volume you won't create clipping. Clipping occurs when the volume of one or more samples is raised beyond the maximum allowed based on the file's bit-depth.
Audacity takes this a step further by processing in 32-bit floating point. A 32-bit floating point sample carries about the same precision as a 24-bit integer sample, but it also stores an exponent which scales the value up or down. Where a standard integer-based file has a fixed ceiling at 0dB, Audacity's floating point mode allows samples to rise above that level during processing without clipping, so the volume can be brought back down before you export.
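Here is a rough Python sketch of why that matters. It is not Audacity's code, just an illustration with names of my own choosing. Python's floats are 64-bit rather than 32-bit, but the principle is the same: an integer sample is clipped the moment it passes full scale, while a floating point sample can overshoot and come back.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def apply_gain_int16(sample, gain):
    """Scale a 16-bit integer sample; anything past full scale is clipped."""
    scaled = int(round(sample * gain))
    return max(INT16_MIN, min(INT16_MAX, scaled))


def apply_gain_float(sample, gain):
    """Scale a floating point sample (1.0 is full scale); nothing is clipped."""
    return sample * gain


loud_int = 30_000
loud_float = loud_int / 32768  # the same sample on a -1.0 to 1.0 scale

# Double the volume, then halve it again, as two effects in a chain might.
int_result = apply_gain_int16(apply_gain_int16(loud_int, 2.0), 0.5)
float_result = apply_gain_float(apply_gain_float(loud_float, 2.0), 0.5)

print(int_result)                   # damaged by clipping in the middle step
print(round(float_result * 32768))  # the original value survives
```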
Creating A 5.1 Channel WAV From 6 Mono Source Files
In many ways it could be argued that HD DVD was superior, from a consumer's point of view, to Blu-ray. Certainly from a DRM perspective it was nowhere near as restrictive. However, there is one important part of the Blu-ray spec which offers consumers an option HD DVD could not. That's support for uncompressed LPCM high resolution surround audio streams.
While Blu-ray supports a pair of proprietary lossless audio compression formats from DTS and Dolby, the encoders required are priced far beyond the means of a normal consumer. However, the quality available from LPCM audio is every bit as good, with the only drawback being a lack of compression. LPCM is the type of audio we looked at in the beginning of this lesson.
Even though LPCM is uncompressed, resulting in extremely inefficient files, thanks to the way Blu-ray was designed, with media sizes large enough for the horribly inefficient video encoding typically found on Hollywood releases, there is actually plenty of room for this oversized audio. The bitrate savings you can achieve by encoding with x264 using optimized settings should more than make up for the difference between a format like DTS HD Master Audio and LPCM. If there's one caveat, it would be that Hollywood releases use dual layer media. While you can get dual layer recordable Blu-ray discs, they are still a little on the pricey side. However, for most home video applications you should have plenty of space on a single layer disc.
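To get a feel for how much room LPCM really needs, you can work it out from the sample rate, bit-depth, and channel count. This is only a back-of-the-envelope sketch. It ignores any overhead added when the disc is authored, the two hour running time is just an example, and 25GB is the nominal capacity of a single layer disc.

```python
BITS_PER_BYTE = 8
SINGLE_LAYER_BYTES = 25_000_000_000  # nominal single layer Blu-ray capacity


def lpcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Bits per second for an uncompressed LPCM stream."""
    return sample_rate_hz * bit_depth * channels


def lpcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Approximate audio payload in bytes, ignoring authoring overhead."""
    return lpcm_bitrate(sample_rate_hz, bit_depth, channels) * seconds // BITS_PER_BYTE


two_hours = 2 * 60 * 60  # example running time in seconds

for label, rate, bits, channels in (
    ("16/48 stereo", 48_000, 16, 2),
    ("24/48 5.1", 48_000, 24, 6),
    ("24/96 5.1", 96_000, 24, 6),
):
    size = lpcm_size_bytes(rate, bits, channels, two_hours)
    share = 100 * size / SINGLE_LAYER_BYTES
    print(f"{label}: {lpcm_bitrate(rate, bits, channels) / 1e6:.3f} Mbit/s, "
          f"{size / 1e9:.2f} GB for two hours ({share:.0f}% of a single layer disc)")
```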
If you have six mono WAV files which make up the individual channels of a surround sound audio stream, it's simple to combine them into a single 5.1 channel file to import into your Blu-ray authoring application of choice. Begin by importing each file using the instructions above.
The key to this process, aside from the option you should have set earlier which will allow you to export audio files with more than 2 channels, is getting your tracks lined up in the right order. Unlike simple stereo files which have a Left and Right channel, the channels in surround files are simply assigned numbers. In Audacity those numbers run from the top of the project window to the bottom.
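If you are curious what combining the channels actually involves, the sketch below does roughly the same job with Python's standard wave module instead of Audacity. Treat it as an illustration rather than a replacement for the steps in this lesson. It assumes every input is a plain PCM mono WAV with the same bit-depth, sample rate, and length, and the order of the list plays the role of the top to bottom track order in the project window. Python writes a basic PCM header, which some software may not accept for files with more than 2 channels.

```python
import wave


def interleave_mono_wavs(mono_paths, output_path):
    """Combine mono WAV files into one multichannel WAV.

    The order of mono_paths becomes the channel order, with the first
    path becoming channel 1.
    """
    readers = [wave.open(path, "rb") for path in mono_paths]
    try:
        first = readers[0]
        width = first.getsampwidth()   # bytes per sample, 3 for 24-bit
        rate = first.getframerate()
        frames = first.getnframes()
        for reader in readers:
            if reader.getnchannels() != 1:
                raise ValueError("every input must be mono")
            if (reader.getsampwidth(), reader.getframerate(),
                    reader.getnframes()) != (width, rate, frames):
                raise ValueError("inputs must share bit-depth, rate and length")

        channels = len(readers)
        with wave.open(output_path, "wb") as out:
            out.setnchannels(channels)
            out.setsampwidth(width)
            out.setframerate(rate)
            remaining = frames
            while remaining > 0:
                count = min(4096, remaining)  # frames handled per pass
                mixed = bytearray(count * width * channels)
                for index, reader in enumerate(readers):
                    block = reader.readframes(count)
                    for byte in range(width):
                        # Copy this byte of every sample from this channel
                        # into its slot within each interleaved frame.
                        start = index * width + byte
                        mixed[start::width * channels] = block[byte::width]
                out.writeframes(bytes(mixed))
                remaining -= count
    finally:
        for reader in readers:
            reader.close()


# Example call with hypothetical filenames, first channel first:
# interleave_mono_wavs(["ch1.wav", "ch2.wav", "ch3.wav",
#                       "ch4.wav", "ch5.wav", "ch6.wav"], "surround.wav")
```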
Identifying Specific Channels
Before you do anything else, take a look at the screenshot above and you will see some characteristics which can help you identify particular types of channels. They won't help you figure out whether a channel belongs on the right or left, but with just a little practice you should be able to quickly figure out which ones are the Surround (back) channels, or the Center, or LFE, or Front. Even though I have them labeled, which I will show you how to do next, I don't need the labels to give me most of the information.
Starting from the top, you'll notice that the Left and Right (front) channels have by far the most audio in them. You would expect that, considering that's where most of the audio comes from. All the way at the bottom you will see the two Surround channels. In this particular case they have almost identical audio, which isn't necessarily that unusual. The main thing to consider is that they have significantly less audio than the Left, Right, or Center channels, but more than the LFE (.1) channel. So let's look at the two channels in the middle. The bottom channel is clearly the LFE (subwoofer) channel because it has less audio than any other channel. Likewise, the Center channel above it has more audio than any but the front channels. Those patterns will be found in most of the surround streams you run across so it doesn't hurt to remember them.
Labelling & Ordering Channels
You have probably realized by now that I have labelled all the channels in my project. There are two reasons for that. The first is that the filenames, which were the initial labels, were not particularly descriptive. The second is that I knew I would have to make sure they were in the right order before exporting, so I labelled them with their channel names. Then I moved them into the correct positions for a 5.1 channel WAV file.
1. Open Channel Options
- Clicking the arrow to the left of any channel in your project will open a dropdown list of options.
2. Name
- Click here for a dialog where you can change the name of this channel. By default it will be the filename. This is a good way to change it to something more descriptive. If you set the option earlier to display the label over the waveform it will change there as well.
3. Move Track Up or Down
- These options will move the channel up or down in your project. It's important that you have your channels in the right order when you export a surround stream or else the channels will be mapped to the wrong speakers when you play your Blu-ray disc.
Exporting As A 24-bit 5.1 Channel Lossless File
Once all your channels are in the correct order (remember, Channel 1 is at the top), you can export to a lossless file. Since Audacity is not a Windows-specific program, there are a number of different lossless formats you could choose to export to. The main things are getting the right channel order and selecting the proper export option. Here are the channel orders for some other formats you may find it useful to export to.
Left, Left Surround, Center, Right, Right Surround
RAW LPCM (Blu-ray Channel Order)
Left, Right, Center, Left Surround, Right Surround, LFE
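If you keep track of your channels by label, putting them in order is just a matter of looking each label up in the list for your target format. The sketch below uses the RAW LPCM order given above. The function name and the filenames are hypothetical and only there for illustration.

```python
# Channel order quoted above for RAW LPCM (Blu-ray channel order).
BLURAY_LPCM_ORDER = ["Left", "Right", "Center",
                     "Left Surround", "Right Surround", "LFE"]


def order_tracks(tracks_by_label, target_order=BLURAY_LPCM_ORDER):
    """Return the tracks arranged top to bottom in the target order.

    tracks_by_label maps a channel label to whatever identifies the
    track, for example a filename.
    """
    missing = [label for label in target_order if label not in tracks_by_label]
    if missing:
        raise KeyError("no track labelled: " + ", ".join(missing))
    return [tracks_by_label[label] for label in target_order]


# Hypothetical filenames, listed in no particular order.
tracks = {
    "LFE": "movie_lfe.wav",
    "Center": "movie_c.wav",
    "Right Surround": "movie_rs.wav",
    "Left": "movie_l.wav",
    "Left Surround": "movie_ls.wav",
    "Right": "movie_r.wav",
}

for position, filename in enumerate(order_tracks(tracks), start=1):
    print(f"Channel {position}: {filename}")
```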
Export Your Project
Once your file is ready for exporting, select Export from the File menu.
1. Save as Type
- To save an uncompressed 5.1 file, or any 24-bit file actually, you will need to select Other uncompressed as the file type.
- Then click the options button to select the appropriate encoding. Make sure the format you select fits your channel order.
- The Header option is synonymous with the file type. If you try to save an LPCM file you will notice there is no such option. Select RAW instead. You will also want to make sure your Blu-ray authoring program allows RAW LPCM audio to be imported as an asset.
- Make sure to select the correct bit-depth. In this case it is 24-bits but yours could be different.
Is High Resolution LPCM Worth It?
In addition to the standard resolution 16/48 audio streams allowed in Blu-ray, it also supports several high resolution options, including 16/96, 20/96, 24/96, as well as 16/192, 20/192, and 24/192 with no more than six channels. Now that all sounds impressive. A higher sample rate or bit-depth is certainly interesting on the surface and high resolution audio is getting hot right now. But it's worth taking a step back and thinking things over before you get too excited. Although there are many people who claim to be able to hear the difference compared to CD, many others claim there's no benefit to anything above 16/48 audio. So what's the real story?
Actually it's hard to say just yet. Certainly the people who claim it's nonsense have some good points. But looking at both sides of the argument, it's striking that lots of people claim to be able to prove there's no quality gain from any of these formats, but only by using math. On the other hand, a lot of other people have offered specific examples where one of these formats is a big improvement. If anything stands out about that to me, it's that it seems both desperate and more than a little disingenuous to rely on the math behind the sampling theorem when it would be easy to hold up those specific examples as cases of someone claiming to hear differences which aren't there.
The Math Works If There Are No Mistakes
One thing I've noticed a lot of credible people bringing up is the fact that sometimes there are quality problems due to issues the sampling theorem doesn't address. For example, almost all professional recording is now done in 24-bit PCM, which means it has to be reduced to 16-bit for CD. Downsampling like that always introduces what are called quantization errors. Just as we saw in Audacity's settings earlier, you use Dithering to cover up the errors. It's basically artificially generated noise which obscures the problems.
One of the most common defenses of high resolution audio which is sampled at 96kHz, twice the rate of standard Blu-ray or DVD audio streams, is that the resulting oversampling takes care of quantization errors which haven't been wiped out by dithering. Other people claim it compensates for poor quality analog filters used in CD production to eliminate the top and bottom frequencies, a step which is required to ensure 100% accurate sampling. Although I can't personally say one side or the other is correct, it seems to me that opponents of high resolution audio are basing their position on the assumption that it's not possible for any errors to creep into the CD production process, when in fact that's an absurd position with respect to any industry. It seems that 96kHz sampling is pretty good at compensating for problems so it's at least worth considering and testing for yourself. On the other hand, I'm not an expert on the matter and I could be wrong.
But What About 24-bit?
One area which does seem to be surrounded by misinformation, at least among the general public, is 24-bit audio. As I already mentioned, downsampling to 16-bit audio from 24-bit is known to always result in errors. The big question is how often those errors are hidden completely, the way they are supposed to be, by dithering. Without a lot of experimenting I certainly wouldn't make a claim either way, but it seems unlikely that it's never a problem, even though 24-bit audio itself has no advantages in terms of sound quality. Remember, those bits don't control anything but volume, and since standard CDs already have more dynamic range than you could listen to without your ears bleeding, there's no advantage beyond potentially avoiding errors due to downsampling.
Downsampling 24-bit LPCM To 16-bits
Since I happen to have 24-bit sources already loaded, I'll go ahead and walk through the downsampling process in Audacity.
The simplest way to do it is to open the dropdown menu for each channel, one at a time, go to Set Sample Format, and select 16-bit. The selected channel will immediately be converted to the lower bit-depth.
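For a look at what happens to each sample during that conversion, here is a simplified Python sketch. It is not what Audacity does internally. In particular, it uses plain triangular (TPDF) dither as a simpler stand-in for the Shaped setting recommended earlier, and the function name is my own.

```python
import random


def reduce_24_to_16(samples_24, dither=True, rng=None):
    """Convert signed 24-bit integer samples to signed 16-bit integers.

    One 16-bit step equals 256 steps at 24 bits. With dither enabled,
    triangular noise of up to one 16-bit step either way is added before
    rounding, so the rounding error is no longer tied to the signal.
    """
    rng = rng or random.Random()
    step = 256  # 2 ** (24 - 16)
    result = []
    for sample in samples_24:
        noise = 0.0
        if dither:
            # The difference of two uniform values is triangular.
            noise = (rng.random() - rng.random()) * step
        value = round((sample + noise) / step)
        # Keep the result inside the 16-bit range.
        result.append(max(-32768, min(32767, value)))
    return result


source = [0, 100, 256, -256, 1_000_000, 8_388_607]
print(reduce_24_to_16(source, dither=False))
print(reduce_24_to_16(source, dither=True, rng=random.Random(1)))
```

Without dither, the quiet sample of 100 simply rounds down to silence every time. With dither it will sometimes come out as a small non-zero value, which is how the noise trades a fixed error for a random one.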
Encoding To Dolby Digital
If you don't care about lossless audio, but instead just want to combine your mono WAV files to encode to Dolby Digital, you can do that as well using Audacity's FFmpeg plugin. But before you can do that the channel order will need to be rearranged as follows.
I'll go back to Export on the File menu.
1. Save as type
- Select AC3 Files (FFmpeg)
- Select Options to set the bitrate.
I'll set the bitrate to 640kbps, the maximum allowed for Blu-ray, and then click OK.
This time a new window comes up for setting the channel order. Although you can adjust the channels here, I've found the controls to adjust the connections between input and output channels a little buggy so I prefer to do it in the project window instead.
Convert 23.976fps To 24fps
The last part of this lesson is actually going to be more of an interesting trick than anything else. I will be using one of the built in Effects called Change Speed to speed up an audio file just the slightest bit to compensate for a video framerate change from 23.976fps to 24fps. I call it a trick because there really isn't any need to perform such a speed up, although if you were re-encoding the video anyway it wouldn't hurt anything. Back in the days of DVD, when I occasionally needed to convert from PAL to NTSC, this was probably my favorite feature in Audacity. Now, with HDTVs and Blu-ray giving us international standards, it simply isn't that important.
First I'll select Change Speed from the Effect menu.
Since a film source is slowed down by a factor of 1.001 to get to 23.976fps, I will speed it up by the same amount.
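The numbers behind that speed up can be worked out exactly. In this sketch I am assuming 23.976fps is shorthand for 24000/1001 frames per second, and the two hour running time is just an example. If the effect asks you for a percentage, the percent change is the figure you want.

```python
from fractions import Fraction

NTSC_FILM_RATE = Fraction(24000, 1001)  # the exact rate usually written as 23.976fps
FILM_RATE = Fraction(24, 1)

# How much faster the audio must play to stay in sync with 24fps video.
speed_factor = FILM_RATE / NTSC_FILM_RATE

# The same change expressed as a percentage.
percent_change = (speed_factor - 1) * 100


def new_duration(seconds, factor=speed_factor):
    """Length of the audio after it has been sped up by the given factor."""
    return seconds / factor


two_hours = 2 * 60 * 60
print(float(speed_factor), float(percent_change))
print(f"{float(two_hours - new_duration(two_hours)):.2f} seconds shorter over two hours")
```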
Continue To Lesson 8
Video and audio may be the heart of the Blu-ray experience, but of course it also supports subtitles. If you have subtitles ripped from a Blu-ray originally they should already be ready for authoring. But what if they started out in one of the common text formats, like SRT or SSA? By using a simple conversion program you can encode them in the same image-based format used for professionally authored Blu-ray discs, as you will learn in the next lesson.
Last updated: 13 August 2012