Homebrew, open source, repurposed, hacked, software defined, open hardware

Monday 7 September 2015

Codec2 700B mode trellis decoding experiments and Reverend Thomas Bayes

The codec2 700B mode has been an improvement on the 700 mode, and employs vector quantisation (VQ) rather than scalar quantisation.

The characteristics of the 700B frames can be summarised as follows:

  bits_per_frame = 28;       % number of bits/frame for "700B" mode
  bit_fields = [1 5 3 6 6 6 1]; % number of bits in each field
  field_labels = ["voicing"; "logWo"; "energy"; "LSPmelVQ1"; "LSPmelVQ2"; "LSPmelVQ3"; "spare"];

which contrasts with the features of the 700 mode:

  bits_per_frame = 28;
  bit_fields = [1 5 3 3 2 4 3 3 2 2];
  field_labels = ["voicing"; "logWo"; "energy"; "LSP1"; "LSP2"; "LSP3"; "LSP4"; "LSP5"; "LSP6"; "spare"];
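To make the two layouts concrete, here is a minimal Octave sketch that splits one 28 bit frame into integer field values using the bit_fields vectors above. The helper name unpack_fields, the example frame and the most-significant-bit-first ordering are assumptions for illustration only; they are not taken from the codec2 sources.

```octave
1;  % marks this file as a script so a function can be defined inline

% Hypothetical helper: split a frame (row vector of 0/1) into integer
% field values. MSB-first bit order within each field is an assumption.
function values = unpack_fields(frame_bits, bit_fields)
  assert(length(frame_bits) == sum(bit_fields));
  values = zeros(1, length(bit_fields));
  st = 1;
  for f = 1:length(bit_fields)
    en = st + bit_fields(f) - 1;
    weights = 2.^(bit_fields(f)-1:-1:0);      % e.g. [4 2 1] for a 3 bit field
    values(f) = sum(frame_bits(st:en) .* weights);
    st = en + 1;
  end
end

bit_fields_700B = [1 5 3 6 6 6 1];
bit_fields_700  = [1 5 3 3 2 4 3 3 2 2];

% Arbitrary example frame: voicing=1, logWo=00101, energy=110, rest zero, last bit 1
frame = [1, 0 0 1 0 1, 1 1 0, zeros(1, 18), 1];

values_700B = unpack_fields(frame, bit_fields_700B)
values_700  = unpack_fields(frame, bit_fields_700)
```

Both layouts sum to 28 bits, so the same frame length can be unpacked either way; only the field boundaries after the first three fields differ.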

Naturally, trellis decoding seemed worth trying on 700B.

Here are the transition probability plots for the 700B bit fields, when the ve9qrp_10s sample is processed.





The difference in appearance is quite striking: the 700B VQ bit field transition probability mesh plots are very uniformly dispersed, whereas the 700 scalar LSP bit field mesh plots are "peakier", with more marked central tendencies (see: codec2-700-mode-trellis-decoding).

This suggests that the VQ is doing a pretty good job of encoding information without much redundancy, which is likely to have implications for maximum likelihood decoding strategies. It is harder to derive a useful measure of the central tendency and then meaningfully apply it when the mesh plot looks like a square of uniformly cut lawn, as opposed to a nice mound in the middle of the lawn.
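One way to put a number on "lawn" versus "mound" is to estimate the transition probabilities from a sequence of field indices and look at the average entropy of each row. The sketch below is illustrative only: the function names are hypothetical, and the two test sequences are synthetic, not derived from the ve9qrp_10s sample.

```octave
1;  % marks this file as a script so functions can be defined inline

% Estimate P(next = j | current = i) from a sequence of indices in 0..2^nbits-1.
function tp = transition_probs(idx, nbits)
  nstates = 2^nbits;
  counts = zeros(nstates);
  for n = 1:length(idx)-1
    counts(idx(n)+1, idx(n+1)+1) += 1;
  end
  rowsum = sum(counts, 2);
  rowsum(rowsum == 0) = 1;        % states never visited stay as all-zero rows
  tp = counts ./ rowsum;          % normalise each row (broadcasting)
end

% Mean entropy in bits over the rows that were actually visited.
% A value close to nbits means the next index is nearly unpredictable.
function h = mean_row_entropy(tp)
  h = 0;
  visited = find(sum(tp, 2) > 0)';
  for i = visited
    p = tp(i, tp(i, :) > 0);
    h -= sum(p .* log2(p));
  end
  h /= length(visited);
end

% Synthetic 2 bit examples: a slowly changing index, and a sequence in
% which every one of the 16 possible transitions occurs equally often.
idx_peaky = floor((0:199) / 50);
idx_flat  = repmat([0 0 1 0 2 0 3 1 1 2 1 3 2 2 3 3], 1, 10);
idx_flat(end+1) = 0;

h_peaky = mean_row_entropy(transition_probs(idx_peaky, 2))
h_flat  = mean_row_entropy(transition_probs(idx_flat, 2))
```

The flat sequence reaches the maximum of 2 bits for a 2 bit field, while the slowly changing one stays well below it. A field whose measured row entropy sits near nbits leaves a trellis decoder very little to work with.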

codec2's author, David Rowe, also discusses this issue of information redundancy in his blog.

It seems to be all about that hoary old chestnut that was first discussed in 1763 by the Reverend Thomas Bayes in "An Essay towards solving a Problem in the Doctrine of Chances". Our posterior estimate of the most likely transmitted codeword can only improve on the raw channel likelihood if we have useful information regarding the prior distribution of codewords. In the absence of such information, we are left with a non-informative, entropy-maximizing (uniform) prior, and the posterior simply follows the likelihood.
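The role of the prior can be shown with a small Octave sketch of maximum a posteriori (MAP) decoding for a single 3 bit field. This is not the trellis decoder used for the results in this post; the function name, the BPSK mapping (bit 0 to +1, bit 1 to -1), the noise variance and the example priors are all assumptions made for illustration.

```octave
1;  % marks this file as a script so a function can be defined inline

% MAP estimate of an nbits-wide index from soft received symbols.
% rx     : 1 x nbits received symbols (assumed BPSK, bit 0 -> +1, bit 1 -> -1)
% prior  : 1 x 2^nbits prior probabilities of each index
% sigma2 : assumed noise variance per symbol
function [best, post] = map_decode(rx, prior, sigma2)
  nbits = length(rx);
  nstates = 2^nbits;
  loglik = zeros(1, nstates);
  for s = 0:nstates-1
    bits = mod(floor(s ./ 2.^(nbits-1:-1:0)), 2);   % MSB first
    tx = 1 - 2*bits;
    loglik(s+1) = -sum((rx - tx).^2) / (2*sigma2);  % Gaussian log likelihood
  end
  logpost = loglik + log(prior);                    % Bayes: posterior ~ likelihood * prior
  post = exp(logpost - max(logpost));
  post /= sum(post);
  [~, k] = max(post);
  best = k - 1;
end

rx = [0.8 -0.1 0.9];                 % the middle symbol is barely negative

uniform_prior = ones(1, 8) / 8;      % "uniformly cut lawn"
peaky_prior   = [0.65, 0.05*ones(1, 7)];   % "mound" centred on index 0

best_uniform = map_decode(rx, uniform_prior, 1.0)
best_peaky   = map_decode(rx, peaky_prior, 1.0)
```

With the uniform prior the result is just the hard decision on each symbol (index 2 here), so nothing is gained. With the peaky prior the weak middle symbol is overruled and index 0 is chosen instead.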



Here are some summary statistics from the first decoding runs:

Passing through bitfield 4 LSPmelVQ1 without trellis decoding
Passing through bitfield 5 LSPmelVQ2 without trellis decoding
Passing through bitfield 6 LSPmelVQ3 without trellis decoding
Passing through bitfield 7 spare     without trellis decoding
processing 700B mode parameter: 1, nbits: 1, label: voicing                         
Eb/No: 0.00 dB nerrors 25 13 BER: 0.10 0.05 std dev: 0.31 0.23
processing 700B mode parameter: 2, nbits: 5, label: logWo                           
Eb/No: 0.00 dB nerrors 121 82 BER: 0.10 0.07 std dev: 4.71 4.67
processing 700B mode parameter: 3, nbits: 3, label: energy                          
Eb/No: 0.00 dB nerrors 67 63 BER: 0.09 0.08 std dev: 1.20 1.32

And an attempt to trellis decode the VQ encoded bit fields (four hours of Octave on a dedicated 64-bit 2.7 GHz Intel CPU):

Passing through bitfield 7 spare     without trellis decoding
processing 700B mode parameter: 1, nbits: 1, label: voicing                         
Eb/No: 0.00 dB nerrors 25 13 BER: 0.10 0.05 std dev: 0.31 0.23
processing 700B mode parameter: 2, nbits: 5, label: logWo                           
Eb/No: 0.00 dB nerrors 121 82 BER: 0.10 0.07 std dev: 4.71 4.67
processing 700B mode parameter: 3, nbits: 3, label: energy                          
Eb/No: 0.00 dB nerrors 67 63 BER: 0.09 0.08 std dev: 1.20 1.32
processing 700B mode parameter: 4, nbits: 6, label: LSPmelVQ1                       
Eb/No: 0.00 dB nerrors 246 115 BER: 0.17 0.08 std dev: 15.57 10.97
processing 700B mode parameter: 5, nbits: 6, label: LSPmelVQ2                       
Eb/No: 0.00 dB nerrors 277 115 BER: 0.19 0.08 std dev: 15.42 10.80
processing 700B mode parameter: 6, nbits: 6, label: LSPmelVQ3
Eb/No: 0.00 dB nerrors 285 114 BER: 0.19 0.08 std dev: 14.55 10.52

Judging by the large increases in errors from our attempt to apply maximum likelihood decoding to the VQ bit fields, it seems reasonable to conclude that the VQ encoded bit fields are doing an excellent job of conveying a lot of information with minimal redundancy. Unfortunately for us, this would suggest that we can't profitably employ maximum likelihood decoding for the VQ bit fields directly.

While experimenting, we have also seen that Reverend Bayes' insights are quite relevant to low bit rate audio codec R&D some 250 years later!

Here are some .wav files of the ve9qrp_10s sample, to which additive white Gaussian noise was applied after 700B encoding before decoding with and without trellis decoding of the bit fields listed, in keeping with the method used for the other codec2 mode trellis decoding experiments:

No bit fields trellis decoded:
ve9qrp_10s_700B_0.50_simple_700B.wav

Bit field 1 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec1.wav

Bit field 2 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec2.wav

Bit field 3 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec3.wav

Bit fields 1,2 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec12.wav

Bit fields 1,3 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec13.wav

Bit fields 2,3 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec23.wav

Bit fields 1,2,3 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec123.wav

Bit fields 1,2,3,4,5,6 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec123456.wav

Bit field 4 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec4.wav

Bit field 5 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec5.wav

Bit field 6 trellis decoded:
ve9qrp_10s_700B_0.50_trellis_700B_dec6.wav

The intelligibility of the samples with bit fields 4, 5, or 6 trellis decoded is not improved, as would be expected from the very uniform distribution of the encoded VQ values evident in the transition probability plots, and also from the significantly increased number of bit errors and standard deviation seen in the summary statistics.

From the above samples, it is clear that bit fields 1, 2 and 3, either singly or in combination, benefit from direct trellis decoding, but the VQ bit fields would need to be decoded to their underlying values before any trellis decoding of the information they carry is attempted.
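As a rough illustration of what "decoding first" could look like, along the lines David suggests in the comments, the sketch below takes a received VQ index, forms candidate indices by flipping single bits, looks each candidate up in a codebook, and picks the one whose decoded vector is closest in mean squared error to the previous frame's vector. Everything here is hypothetical: the function names, the toy 8 entry, 2 dimensional codebook and the single-bit-flip candidate set are assumptions, and the real 700B codebooks are not used.

```octave
1;  % marks this file as a script so functions can be defined inline

% Candidate indices: the received index plus every single-bit flip of it.
function c = single_bit_flips(idx, nbits)
  w = 2.^(0:nbits-1);
  b = mod(floor(idx ./ w), 2);           % current value of each bit
  c = [idx, idx + (1 - 2*b) .* w];       % add or subtract each bit weight
end

% Choose the candidate whose decoded vector is nearest (MSE) to prev_vec.
% codebook is nentries x p, with row (index + 1) holding the decoded vector.
function [best, mse] = closest_decoded(candidates, prev_vec, codebook)
  vecs = codebook(candidates + 1, :);
  mse  = mean((vecs - prev_vec).^2, 2)';
  [~, k] = min(mse);
  best = candidates(k);
end

% Toy 3 bit codebook, purely for illustration.
codebook = [0 0; 1 1; 2 2; 3 3; 4 4; 5 5; 6 6; 7 7];

prev_vec = [2.1 1.9];        % decoded vector from the previous frame
rx_idx   = 6;                % received index, possibly corrupted

candidates  = single_bit_flips(rx_idx, 3)
[best, mse] = closest_decoded(candidates, prev_vec, codebook)
```

The distance is measured between decoded vectors rather than between indices, so correlation in the underlying parameter trajectories can be used even when the indices themselves look uniformly distributed.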


Comments:

  1. Hi Erich,

    The VQ indexes will be uniformly distributed, however the LSP trajectories themselves will be correlated. So we could try adjusting the VQ index, then decoding to LSPs, then measuring the MSE on the decoded LSPs. You would get distances for all p=6 LSPs at once. Would suggest just applying to the first LSP VQ stage and see if the results are useful.

    Cheers, David

  2. Thanks for the insights. I will definitely look at decoding the VQ prior to trellis decoding as you suggest. I will probably have a quick look at the 1300 mode underlying the 1600 mode first, to round off the fairly simple set of initial octave script modifications I am making before I delve into the VQ decoding.

    Cheers,

    Erich
