GSoC: Mid-term update

The work on the MLP bitstream verifier had to be paused for a while, as higher-priority things came up.

Bug #1

The encoder was unable to produce a lossless bitstream at transients, which was not a good thing. To fix it, I finally had to understand how FIR filters are used as predictors in the encoder.

The encoder uses Linear Predictive Coding (LPC) to predict the nth sample as a linear combination of the preceding samples. Then, rather than storing the actual value of the nth sample, we store the difference between the actual value and our prediction, called the residual.

This does not reduce the size of the file on its own, i.e. it doesn't compress anything: we still need to store each sample's residual as well as the LPC predictor coefficients. But it helps the entropy coding step that follows produce better results, because the variance of the residuals is lower than that of the original samples.
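To make this concrete, here is a minimal sketch of the idea (illustrative only, not the actual encoder code; the function names and the fixed-point shift are my assumptions):

#include <stdint.h>

/* Illustrative sketch: predict sample n as a linear combination of
 * the `order` preceding samples. `shift` is an assumed fixed-point
 * scaling factor for the integer coefficients. */
static int32_t predict(const int32_t *s, int n,
                       const int32_t *coeffs, int order, int shift)
{
    int64_t acc = 0;
    for (int i = 0; i < order; i++)
        acc += (int64_t)coeffs[i] * s[n - 1 - i];
    return (int32_t)(acc >> shift);
}

/* Store only the residuals; the decoder recovers each sample as
 * prediction + residual. The first `order` samples would be kept
 * verbatim as warm-up. */
static void compute_residuals(const int32_t *s, int32_t *res, int len,
                              const int32_t *coeffs, int order, int shift)
{
    for (int n = order; n < len; n++)
        res[n] = s[n] - predict(s, n, coeffs, order, shift);
}

For smooth signals the predictions land close to the actual values, so the residuals cluster near zero and entropy-code well.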

My mentor (atomnuker) helped greatly in explaining how LPC and FIR filters work, which saved many hours I would otherwise have spent understanding it from scratch.

Coming back to the lossless check failures: they occurred only at transients, while for simple audio samples like a sine wave the encoder worked flawlessly. This was a clear indication that something was going wrong in the prediction stage. After thoroughly reading every line of the LPC-related code, I finally spotted the bug in the apply_filter function.

if (residual < INT24_MIN || residual > INT24_MAX)
    return -1;

This is where we check whether the residual overflows the word length (aka bit depth) of a sample. If it does overflow, the prediction isn't good, and it's better to keep the samples as-is and skip the prediction step for that block. All fine, except that these lines assumed the audio has a bit depth of 24 bits, which isn't always true. My test samples were 16-bit, so even when the residual overflowed the bit depth, the encoder still used the prediction filters. This caused lossless errors at decode time, as prediction + residual != original, because the residual value was incorrect due to overflow.

Fixing it was easy: don't assume 24-bit; use the bit depth of the input audio instead. Here's the commit fixing the bug.
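In essence, the corrected check looks something like this (a sketch of the idea, not the actual patch; `bit_depth` stands for whatever field the encoder keeps the input word length in):

#include <stdint.h>

/* Sketch: derive the valid sample range from the stream's real bit
 * depth instead of the hardcoded 24-bit limits. */
static int check_residual(int64_t residual, int bit_depth)
{
    int32_t sample_max = (1 << (bit_depth - 1)) - 1; /*  32767 for 16-bit */
    int32_t sample_min = -(1 << (bit_depth - 1));    /* -32768 for 16-bit */

    if (residual < sample_min || residual > sample_max)
        return -1; /* prediction overflows: keep raw samples for this block */
    return 0;
}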

Bug #2

The encoder should, in theory, work for mono (1-channel) audio samples. But it didn't: the bitstream it produced didn't play and was full of errors.

I had discussed this on the ffmpeg-devel mailing list, and michaelni's suggestion to match the encoder's bitstream writes to the decoder's bitstream reads saved the day.

Every restart header in the bitstream carries important info on how to decode each substream (like the number of channels in the audio, LSBs to bypass, etc.). It also stores max_matrix_channel, i.e. the highest channel used in the rematrixing stage.

This was hardcoded to 1 (i.e. the 2nd channel, due to 0-based indexing).

rh->min_channel        = 0;
rh->max_channel        = avctx->channels - 1;
rh->max_matrix_channel = 1;

For mono audio, which has only channel 0, max_matrix_channel should be 0, not 1. Setting it to rh->max_channel solved the bug, since the encoder currently supports only mono and stereo audio. This will change when support for more channels is added, which is the next task.
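With that change, the restart header setup becomes:

rh->min_channel        = 0;
rh->max_channel        = avctx->channels - 1;
rh->max_matrix_channel = rh->max_channel; /* 0 for mono, 1 for stereo */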

Next task

Add support for encoding audio with more than two channels. After this task, I'm planning to submit the patch for review and inclusion into the current FFmpeg codebase. Then, as my mentor suggests, I will work on supporting TrueHD before resuming work on the verifier.