Codec2 is an open source low bit rate voice coder (vocoder) that enables voice to be carried on data channels at very low data rates.

Low bit rate vocoders are distinct from their higher bit rate encoder cousins, such as MP3, which seek to reproduce more than just voice, e.g. music.

Codec2 is particularly exciting owing to its potential to revolutionise HF voice communications, until now dominated by Single Sideband Transmission (SSB). Codec2 already offers better robustness than SSB in low signal to noise conditions. Codec2 also has significant potential in VHF and above amateur radio communications, where single frequency time division multiple access (TDMA) technologies already in use commercially for mobile communications and telephony can be introduced with significant spectrum saving benefits in amateur bands and on amateur repeaters.

Anyway, after playing with trellis.m in the codec2-dev/octave directory, support for trellis decoding of the 1600 bit/s mode was implemented.

Maximum likelihood decoding is used, using the raw hts sample audio file encoded at 1600 bit/s as the source of the training database.
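To make the idea of a training database concrete, here is a minimal Python sketch of how frame-to-frame transition statistics for one bitfield might be gathered from an encoded training file. It is not the trellis.m code; the function names `transition_counts` and `transition_probabilities`, and the choice to treat unseen values as uniformly likely, are assumptions made for illustration.

```python
def transition_counts(values, nbits):
    """Count how often each field value is followed by each other value.

    values: the value of one bitfield in successive frames.
    counts[x][y] is the number of times value x was followed by value y.
    """
    n = 2 ** nbits
    counts = [[0] * n for _ in range(n)]
    for prev, cur in zip(values, values[1:]):
        counts[prev][cur] += 1
    return counts


def transition_probabilities(counts):
    """Normalise each row of counts into probabilities.

    Rows that were never observed fall back to a uniform distribution
    (an assumption of this sketch, not something taken from trellis.m).
    """
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            probs.append([1.0 / len(row)] * len(row))
        else:
            probs.append([c / total for c in row])
    return probs


# Toy training sequence for a 2 bit field (illustrative values only).
training_values = [0, 1, 1, 2, 1, 1]
counts = transition_counts(training_values, nbits=2)
probs = transition_probabilities(counts)
print(probs[1])
```

With a real training file, `values` would hold one entry per frame for the chosen bitfield, so a longer and more varied training sample gives better populated rows.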

David Rowe has pointed out that the 1600 bit/s mode uses an underlying 1300 bit/s bit stream plus a 300 bit/s forward error correction (FEC) bit stream, for a total of 1600 bits per second.

Accordingly, experiments on the underlying 1300 bit/s mode, in the absence of FEC, are planned.

Additive white Gaussian noise (AWGN) has been added to the encoded ve9qrp_10s audio sample, followed by trellis decoding of various combinations of the bit fields.
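As a rough illustration of what adding Gaussian noise to a bitstream involves, the Python sketch below maps bits to +1/-1 symbols, adds noise for a chosen Eb/No, and compares the measured bit error rate with the textbook figure for uncoded BPSK. The BPSK mapping and all function names here are assumptions for the sake of the example; the post does not say exactly how the noise was applied. For reference, the textbook BPSK BER at an Eb/No of 0 dB is about 0.079, and several of the BER values in the summary statistics further down fall in that region, which is consistent with, but does not confirm, a model of this kind.

```python
import math
import random


def add_awgn_to_bits(bits, ebno_db, rng):
    """Map bits to unit-energy BPSK symbols (+1/-1) and add Gaussian noise."""
    ebno = 10.0 ** (ebno_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebno))  # noise standard deviation per symbol
    return [(1.0 if b else -1.0) + rng.gauss(0.0, sigma) for b in bits]


def hard_decision(soft_symbols):
    """Slice noisy symbols back into bits with no help from a trellis."""
    return [1 if s > 0.0 else 0 for s in soft_symbols]


def theoretical_bpsk_ber(ebno_db):
    """Textbook BER for uncoded BPSK in AWGN: 0.5 * erfc(sqrt(Eb/No))."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebno))


rng = random.Random(1)  # fixed seed so the run is repeatable
tx_bits = [rng.randint(0, 1) for _ in range(64 * 250)]  # 250 frames of 64 bits
soft_symbols = add_awgn_to_bits(tx_bits, 0.0, rng)
rx_bits = hard_decision(soft_symbols)

n_errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
measured_ber = n_errors / len(tx_bits)
print(f"measured BER {measured_ber:.3f}, theory {theoretical_bpsk_ber(0.0):.3f}")
```

Keeping the noisy symbols as soft values, rather than slicing them straight to bits, is what gives a maximum likelihood decoder something to weigh against its transition statistics.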

This is the result of decoding after the addition of noise to the codec2 bitstream, with no trellis decoding:

ve9qrp_10s_1600_0.50_simple_1600.wav

This is the result of decoding after the addition of noise to the codec2 bitstream, with trellis decoding of bitfields 3,6,7,8,9,10,11,12,13,14,15,16:

ve9qrp_10s_1600_0.50_trellis_1600_dec3678910111213141516.wav

This is the result of decoding after the addition of noise to the codec2 bitstream, with trellis decoding of bitfields 1,3,4,6,7,8,9,10,11,12,13,14,15,16:

ve9qrp_10s_1600_0.50_trellis_1600_dec134678910111213141516.wav

This is the result of decoding after the addition of noise to the codec2 bitstream, with trellis decoding of bitfields 7,8,9,10,11,12,13,14,15,16:

ve9qrp_10s_1600_0.50_trellis_1600_dec78910111213141516.wav

This is the result of decoding after the addition of noise to the codec2 bitstream, with trellis decoding of bitfields 1,4,7,8,9,10,11,12,13,14,15,16:

ve9qrp_10s_1600_0.50_trellis_1600_dec1478910111213141516.wav

1600 mode bitfields are as follows:

bits_per_frame = 64;

bit_fields = [2 7 5 2 7 5 4 4 4 4 4 4 4 3 3 2];

field_labels = ["voicing1"; "scalarWo1"; "energy1"; "voicing2"; "scalarWo2"; "energy2"; "LSP1"; "LSP2"; "LSP3"; "LSP4"; "LSP5"; "LSP6"; "LSP7"; "LSP8"; "LSP9"; "LSP10"];
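The listing above can be turned into a small frame unpacker. The following Python sketch splits one 64 bit frame into its sixteen labelled fields. It assumes the fields are packed in the order listed and most significant bit first; neither detail is stated in this post, and `split_frame` is a hypothetical helper rather than a codec2 function.

```python
BITS_PER_FRAME = 64
BIT_FIELDS = [2, 7, 5, 2, 7, 5, 4, 4, 4, 4, 4, 4, 4, 3, 3, 2]
FIELD_LABELS = [
    "voicing1", "scalarWo1", "energy1", "voicing2", "scalarWo2", "energy2",
    "LSP1", "LSP2", "LSP3", "LSP4", "LSP5", "LSP6", "LSP7", "LSP8", "LSP9",
    "LSP10",
]

# The field widths should account for every bit in the frame.
assert sum(BIT_FIELDS) == BITS_PER_FRAME


def split_frame(frame_bits):
    """Return a dict of field label -> integer value for one frame.

    Assumes fields appear in listing order, MSB first (an assumption).
    """
    if len(frame_bits) != BITS_PER_FRAME:
        raise ValueError("expected %d bits per frame" % BITS_PER_FRAME)
    values = {}
    pos = 0
    for label, nbits in zip(FIELD_LABELS, BIT_FIELDS):
        value = 0
        for bit in frame_bits[pos:pos + nbits]:
            value = (value << 1) | bit
        values[label] = value
        pos += nbits
    return values


all_ones = split_frame([1] * BITS_PER_FRAME)
print(all_ones["scalarWo1"], all_ones["LSP1"])
```

Working on integer field values rather than raw bits is what lets each field be treated as a single trellis state.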

Intelligibility in the presence of noise seems to be enhanced the most by maximum likelihood decoding of the LSPs and voicing bits, with next best being maximum likelihood decoding of just the LSPs.

Maximum likelihood decoding of just the LSPs in the 1600 bit/s mode is not too demanding given the small bitfield lengths. Based on this admittedly limited sample set, maximum likelihood decoding of the voicing bits in addition to the LSPs seems to reduce the occasional "stutter" type artifacts.

Errors introduced by maximum likelihood decoding of the energy bitfields seem to have an adverse effect on intelligibility. Interestingly, there were dense probabilities in the high order bits and sparse probabilities in the low order bits (see the transition probability plots below); the summary statistics also show significantly increased errors with trellis decoding.

Maximum likelihood decoding of the scalar Wo bitfields has not been performed because each of these fields is 7 bits long, which makes processing in Octave quite challenging. This is due to the exponential (i.e. 2^(bitfield length)) demands of maximum likelihood decoding. Further experimentation with maximum likelihood decoding of the scalar Wo bitfields in C is planned. Also of interest were the uniformly dense probabilities in the low order bits and sparse probabilities in the high order bits (see the transition probability plots below), which may have an impact on trellis decoding effectiveness. This may, however, reflect the effects of FEC, in which case it is actually desirable. Furthermore, the ability to adequately encode outliers is also important to convey and preserve intelligibility.
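The exponential growth is easy to put numbers on. The short Python sketch below tabulates, for each field width in the 1600 bit/s listing above, the number of possible values (2^nbits) and the number of previous-to-current value pairs that a decoder comparing every pair would have to consider per frame. Counting all pairs is an assumption about how the search is organised, not a description of trellis.m.

```python
# Field widths copied from the 1600 bit/s listing above.
BIT_FIELDS = [2, 7, 5, 2, 7, 5, 4, 4, 4, 4, 4, 4, 4, 3, 3, 2]


def trellis_cost(nbits):
    """Return (states, transitions) for a field of nbits bits."""
    states = 2 ** nbits
    transitions = states * states  # every previous value paired with every current value
    return states, transitions


for nbits in sorted(set(BIT_FIELDS)):
    states, transitions = trellis_cost(nbits)
    print(f"{nbits} bits: {states} states, {transitions} transitions per frame")
```

On these figures a 7 bit scalar Wo field has 128 states and 16384 transitions per frame, 64 times the 256 transitions of a 4 bit LSP field.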

A couple of the bitfield transition probability plots also showed marked clustering into four peaks, perhaps suggesting that quantising could take better advantage of the available bits, although it may just be evidence of the FEC at work, adding robustness, or again, reflecting a required ability to adequately encode the occasional outlier.

The following plots are labeled and are presented in order of the 1600 bit/s bitfields. Basically, the Z axis shows the frequency at which a bitfield value on the X axis maps to another value on the Y axis. Accordingly, the more densely clustered the maxima, the more predictable changes from one bitfield to the next will be, and the greater the ability of the trellis decoding to make informed guesses about the most likely codeword:
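To show how such transition statistics can help a decoder, here is a minimal Python sketch of a Viterbi search over the values of a single bitfield across successive frames. It is a generic textbook formulation rather than the trellis.m algorithm: it assumes BPSK symbols (+1/-1) in Gaussian noise, MSB-first bit order, and that each bitfield is decoded independently of the others. The toy transition table and received symbols are made up for illustration.

```python
import math


def bits_of(value, nbits):
    """MSB-first list of bits for a field value (bit order is an assumption)."""
    return [(value >> (nbits - 1 - i)) & 1 for i in range(nbits)]


def log_likelihood(soft, value, nbits, sigma):
    """Log P(soft | value) for BPSK in Gaussian noise, up to a constant."""
    total = 0.0
    for s, b in zip(soft, bits_of(value, nbits)):
        expected = 1.0 if b else -1.0
        total -= (s - expected) ** 2 / (2.0 * sigma ** 2)
    return total


def viterbi_decode(soft_frames, nbits, log_trans, sigma):
    """Return the most likely sequence of field values, one per frame.

    soft_frames: per-frame lists of nbits noisy symbols.
    log_trans[p][c]: log probability that value c follows value p.
    """
    n = 2 ** nbits
    score = [log_likelihood(soft_frames[0], v, nbits, sigma) for v in range(n)]
    back = []
    for soft in soft_frames[1:]:
        new_score, pointers = [], []
        for c in range(n):
            # Best predecessor for candidate value c in this frame.
            best_p = max(range(n), key=lambda p: score[p] + log_trans[p][c])
            new_score.append(score[best_p] + log_trans[best_p][c]
                             + log_likelihood(soft, c, nbits, sigma))
            pointers.append(best_p)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda v: score[v])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy 2 bit field whose value usually repeats from frame to frame.
NBITS = 2
log_trans = [[math.log(0.97 if p == c else 0.01) for c in range(4)]
             for p in range(4)]
# Value 2 (bits 1,0) sent three times; the middle frame is badly hit by noise
# and would slice to value 0 on a hard decision.
soft_frames = [[0.9, -1.1], [-0.1, -0.8], [1.0, -0.9]]
decoded = viterbi_decode(soft_frames, NBITS, log_trans, sigma=math.sqrt(0.5))
print(decoded)  # [2, 2, 2]
```

In this toy case hard decisions alone would give 2, 0, 2, while the strongly clustered transition table pulls the middle frame back to 2. A flat transition table would offer no such help, which matches the observation that densely clustered maxima are what make the trellis effective.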

The summary statistics produced by Octave:

Passing through bitfield 2 scalarWo1 without trellis decoding

Passing through bitfield 5 scalarWo2 without trellis decoding

processing parameter: 1, nbits: 2, label: voicing1

Eb/No: 0.00 dB nerrors 34 36 BER: 0.07 0.07 std dev: 0.63 0.57

processing parameter: 3, nbits: 5, label: energy1

Eb/No: 0.00 dB nerrors 133 98 BER: 0.11 0.08 std dev: 4.11 4.66

processing parameter: 4, nbits: 2, label: voicing2

Eb/No: 0.00 dB nerrors 31 40 BER: 0.06 0.08 std dev: 0.61 0.65

processing parameter: 6, nbits: 5, label: energy2

Eb/No: 0.00 dB nerrors 126 88 BER: 0.10 0.07 std dev: 4.69 4.96

processing parameter: 7, nbits: 4, label: LSP1

Eb/No: 0.00 dB nerrors 83 72 BER: 0.08 0.07 std dev: 2.44 2.24

processing parameter: 8, nbits: 4, label: LSP2

Eb/No: 0.00 dB nerrors 98 95 BER: 0.10 0.10 std dev: 2.46 2.55

processing parameter: 9, nbits: 4, label: LSP3

Eb/No: 0.00 dB nerrors 104 75 BER: 0.10 0.08 std dev: 2.63 2.89

processing parameter: 10, nbits: 4, label: LSP4

Eb/No: 0.00 dB nerrors 80 83 BER: 0.08 0.08 std dev: 2.43 2.64

processing parameter: 11, nbits: 4, label: LSP5

Eb/No: 0.00 dB nerrors 66 75 BER: 0.07 0.08 std dev: 2.02 2.34

processing parameter: 12, nbits: 4, label: LSP6

Eb/No: 0.00 dB nerrors 67 81 BER: 0.07 0.08 std dev: 1.54 2.82

processing parameter: 13, nbits: 4, label: LSP7

Eb/No: 0.00 dB nerrors 82 81 BER: 0.08 0.08 std dev: 2.71 2.41

processing parameter: 14, nbits: 3, label: LSP8

Eb/No: 0.00 dB nerrors 86 61 BER: 0.12 0.08 std dev: 1.36 1.29

processing parameter: 15, nbits: 3, label: LSP9

Eb/No: 0.00 dB nerrors 17 50 BER: 0.02 0.07 std dev: 0.43 1.14

processing parameter: 16, nbits: 2, label: LSP10

Eb/No: 0.00 dB nerrors 25 29 BER: 0.05 0.06 std dev: 0.31 0.49

In conclusion, early indications are that trellis decoding has the potential to improve the performance of the 1600 bit/s codec2 mode in the presence of noise, as envisaged by codec2's author, David Rowe. Further investigation is planned of the 1300 bit/s mode prior to the addition of FEC in the 1600 bit/s mode, and also the lower bit rate 700B mode that does not employ FEC.

See also:

Rowetel blog posting "Trellis Decoding for Codec 2"

and for information on codec2 in general:

http://www.rowetel.com/blog/?page_id=452
