David’s FreeDV Update – June 2024

This month I’ve been working on the DSP detail work required for a practical HF waveform based on RADAE. Not as interesting as the Machine Learning (ML) work, but something we need to grind through for a real world HF speech system.

Acquisition

Acquisition is where we determine (a) is a received signal present and (b) if so what is its frequency offset and where each frame of “data” starts (coarse timing). The general approach is to search for the pilot symbols at the start of each frame over a grid of time and frequency points. The problem is complicated by the presence of noise, multipath, and high power ML data symbols.
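As an illustration of the grid search idea, here is a minimal sketch that correlates a known pilot sequence against the received samples at every candidate time and frequency offset. It is not the RADAE acquisition code: the function name `acquire`, the normalised correlation metric, the threshold, and the random-phase pilot are all assumptions made for the example.

```python
import numpy as np

def acquire(rx, pilots, fs, freq_grid_hz, threshold=0.5):
    """Search a time/frequency grid for a known pilot sequence.

    Returns (detected, best_time_index, best_freq_hz, best_metric).
    """
    n = len(pilots)
    t = np.arange(n) / fs
    pilot_energy = np.sum(np.abs(pilots) ** 2)
    best = (0.0, 0, 0.0)
    for f in freq_grid_hz:
        # Pilot as it would appear with a frequency offset of f Hz
        ref = pilots * np.exp(2j * np.pi * f * t)
        for k in range(len(rx) - n + 1):
            seg = rx[k:k + n]
            seg_energy = np.sum(np.abs(seg) ** 2)
            if seg_energy == 0.0:
                continue
            # Normalised correlation: 1.0 for a perfect, noise-free match
            metric = np.abs(np.vdot(ref, seg)) ** 2 / (pilot_energy * seg_energy)
            if metric > best[0]:
                best = (metric, k, f)
    metric, k, f = best
    return metric > threshold, k, f, metric

# Demo (hypothetical signal): pilot starts 100 samples in, with a 20 Hz offset
rng = np.random.default_rng(1)
fs = 8000
pilots = np.exp(2j * np.pi * rng.random(256))
offset_hz, start = 20.0, 100
t = np.arange(len(pilots)) / fs
rx = 0.1 * (rng.standard_normal(456) + 1j * rng.standard_normal(456))
rx[start:start + 256] += pilots * np.exp(2j * np.pi * offset_hz * t)
detected, k, f, metric = acquire(rx, pilots, fs, np.arange(-50.0, 51.0, 10.0))
print(detected, k, f)
```

The brute force double loop is fine for a sketch; a practical receiver would need something more efficient, and a metric that copes with multipath and the higher power data symbols mentioned above.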

In my earlier FreeDV work I built some ad-hoc acquisition algorithms but this time I took a more mathematical approach. The problem with RADAE is that it operates at very low SNRs which makes acquisition using traditional DSP difficult. Due to the PAPR optimisation the RMS power of the ML data symbols is higher than the classical DSP pilot symbols used for acquisition. While reduced PAPR is in general a good thing, it complicates detection of the pilots.

So I needed a deep dive into the math behind acquisition to get an extra boost in performance. Anyway, the sums showed me two ways I can improve acquisition performance, and it seems to be working well in simulation down to reasonably low SNRs. I’m pretty sure we can do better with ML-based acquisition, but that’s a significant side-project that I’ll put in the “further work” basket for now.

Automated Tests

There has been a lot of RADAE code developed over the course of 2024, so much that I’m starting to lose track of it myself. So I’ve added a set of automated tests to make sure everything keeps working and help trap any bugs I might introduce as the code develops. It’s also a neat framework to guide future refactoring and a real time/C port.

Chirp SNR estimator

The April Over the Air (OTA) test campaign showed the need for a way to measure the SNR of off-air samples. It needs to work on HF multipath channels which tend to notch out various frequencies. After a few false starts, I’ve built a “chirp” based SNR estimator. At the start of a transmission, I send a few seconds of chirp signal that sweeps over a range of frequencies. The receiver script knows where this signal is and using a little math can come up with a good estimate of the actual channel SNR.

Chirp at the beginning of the spectrogram, followed by SSB, then RADAE. The chirp allows us to measure signal power across a range of frequencies, averaging out the effects of frequency selective fading.
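To make the idea concrete, here is a minimal sketch of one way a known chirp could be used to estimate SNR. It is an assumed approach, not the estimator used in the receiver script: it fits one complex channel gain per short block of the chirp (each block covers a narrow slice of the sweep), treats what is left over as noise, and sums signal and noise power across blocks. The function names, block length, and sweep range are all hypothetical, and the SNR here is measured over the full sampled bandwidth.

```python
import numpy as np

def make_chirp(fs, duration_s, f_start_hz, f_stop_hz):
    """Complex linear chirp with unit amplitude (0 dB PAPR)."""
    t = np.arange(int(fs * duration_s)) / fs
    sweep_rate = (f_stop_hz - f_start_hz) / duration_s
    phase = 2 * np.pi * (f_start_hz * t + 0.5 * sweep_rate * t ** 2)
    return np.exp(1j * phase)

def chirp_snr_db(rx, chirp, block_len=400):
    """Estimate SNR by fitting one complex gain per short block of the chirp."""
    sig_power = 0.0
    noise_power = 0.0
    for start in range(0, len(chirp) - block_len + 1, block_len):
        c = chirp[start:start + block_len]
        r = rx[start:start + block_len]
        gain = np.vdot(c, r) / np.vdot(c, c)   # least-squares channel gain
        sig_power += np.mean(np.abs(gain * c) ** 2)
        noise_power += np.mean(np.abs(r - gain * c) ** 2)
    return 10 * np.log10(sig_power / noise_power)

# Demo: chirp plus white noise at a true SNR of 3 dB
rng = np.random.default_rng(0)
chirp = make_chirp(8000, 2.0, 400.0, 2000.0)
noise_var = 10 ** (-3.0 / 10)
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(len(chirp))
                                  + 1j * rng.standard_normal(len(chirp)))
est_db = chirp_snr_db(chirp + noise, chirp)
print(f"estimated SNR {est_db:.2f} dB")
```

Fitting the gain block by block means a notch at one frequency lowers the signal power for that block only, which is the averaging effect described in the caption above.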

As further work I’d also like to develop a way to measure delay spread, to help optimise the waveform to handle long distance paths.

Interesting Bugs

The previous round of OTA tests was in April. After thinking about the results I found some bugs in the waveform we tested.

I accidentally omitted the cyclic prefix in the waveform tested in April. The cyclic prefix protects us from intersymbol interference, so it “shouldn’t have worked” on HF channels. Exploring just why it worked (and worked rather well) is on the TODO list, and might explain the poor performance on DX channels (e.g. Japan to Australia). Sometimes accidents lead to “light bulb” moments.

Another possible bug is the use of a fixed timing estimate for the entire 10 second sample (we don’t adjust timing after the initial estimate). The ionosphere is changing all the time, and the Tx DAC and Rx ADC sample clocks are also slightly different, which means the correct timing estimate varies over time. So a fixed timing estimate is a bad idea, and I was kind of lucky it worked on most of the samples we collected.

Recent Progress and OTA Low PAPR Tests

So I figure the last few months of work is probably enough for this round of development:

  • Two new low PAPR waveforms (750 and 1500Hz RF bandwidth)
  • Acquisition system improvements
  • Addressing some bugs from the April 2024 test campaign
  • Chirp based SNR measurement to calibrate our OTA tests

While there are many possibilities for further development, I don’t want to go too far down any R&D rabbit holes without checking against real world performance. So I’m preparing for some more stored file OTA tests, to see how we are performing against our stated goals of low and high SNR performance that is competitive with SSB.

Here are some initial samples (using a sample of my voice) of the 1500Hz low PAPR waveform (model17) over a 2000km path at 14.250 MHz, at a few watts transmit power:

Low SNR (0.5dB peak) SSB over 2000km path at 14.250 MHz
Low SNR (0.5dB peak) RADAE over 2000km path at 14.250 MHz
Low SNR spectrogram – significant “barber pole” fading can be seen on the RADAE sample

The SNR is measured from the chirp. The chirp signal has 0dB PAPR, so this is the SNR at the peak power of the SSB and RADAE signals. The RMS power and hence average SNR of the SSB signal would be about 6dB lower (-5.5dB), and the RADAE about 0.8dB lower (-0.3 dB). So with the same power amplifier, RADAE delivers about 5dB more power to the receiver than SSB.
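The arithmetic above can be checked with a few lines of Python. This is just a worked example using the figures quoted in the text; the function name is made up for illustration.

```python
def average_snr_db(peak_snr_db, papr_db):
    """Average (RMS) SNR is the peak SNR less the signal's PAPR, all in dB."""
    return peak_snr_db - papr_db

ssb_snr = average_snr_db(0.5, 6.0)     # SSB, about 6 dB PAPR
radae_snr = average_snr_db(0.5, 0.8)   # RADAE, about 0.8 dB PAPR
advantage_db = radae_snr - ssb_snr     # extra RMS power RADAE delivers

print(f"SSB {ssb_snr:.1f} dB, RADAE {radae_snr:.1f} dB, advantage {advantage_db:.1f} dB")
```

The advantage works out to 5.2 dB, which is the "about 5dB" quoted above.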

An hour or so later I turned up the power to get a high SNR sample over the same 2000km path:

High SNR (18.5dB peak) SSB sample
High SNR (18.5dB peak) RADAE sample

While much easier to understand, even at high SNR there is quite a bit of background noise with SSB (this could possibly be improved with DSP noise reduction). However there is some “vocoder” distortion on the RADAE signal as well – it’s not totally clean. You actually have to listen fairly carefully to hear differences between the low and high SNR RADAE samples. This might mean we’ve biased the training towards “low SNR”, rather than “highest quality”. These results also suggest we can run 1.5W rather than 100W, for similar speech quality, as 10log10(100/1.5) = 18dB.
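A quick worked check of the power figures, using only the numbers quoted in this post (the helper name is hypothetical): the 18dB gap between the high and low SNR samples corresponds to roughly the ratio between 100W and 1.5W.

```python
import math

def power_ratio_db(p1_watts, p2_watts):
    """Ratio of two powers expressed in dB."""
    return 10 * math.log10(p1_watts / p2_watts)

snr_gap_db = 18.5 - 0.5                            # high SNR sample vs low SNR sample
ratio_db = power_ratio_db(100, 1.5)                # 100W compared to 1.5W
low_power_watts = 100 / 10 ** (snr_gap_db / 10)    # 100W reduced by the SNR gap

print(f"{ratio_db:.2f} dB, {low_power_watts:.2f} W")
```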

While performing these tests I noticed a bunch of little things to look into:

  • A pop artifact in one of my samples that goes away when the input speech level changes. Suggests the ML is entering territory it hasn’t seen in training.
  • I’m not sure if my Tx power from my SSB radio is staying constant as intended with a low PAPR waveform – need to sample the actual Tx power and plot on the spec-an. I need to confirm all three signals are at the same peak power.
  • The high SNR RADAE speech quality isn’t consistent across samples, some speakers sound a bit better. This is subjective of course so needs a further look.

Tx Spurious

At high SNRs there is some out of band spurious Tx energy (e.g. from 2000 to 3000 Hz) in the PAPR optimised RADAE signal. We should remove this if possible. A naive approach would be to filter this at the RADAE Tx output. However from experience we know that a side effect of filtering will be to increase the PAPR that we have so carefully tried to reduce, hence reducing the SNR at the receiver. So while the filtering approach would be acceptable for a high SNR link on crowded bands, it would cost you a few dB at low SNRs. A better approach would be to include spurious reduction in the ML training, for example train the network to reduce the out of band energy or insert a filter in the training loop. Another interesting topic for further work!

Next Steps

Every time I put this technology over real radio channels I learn a lot and have a bunch more questions and tasks added to my TODO list. However I do feel it’s time to focus on building a real time system that we can test with real PTT conversations. Even a rudimentary system that has some teething problems will teach us a lot. We have several ML models we can try (e.g. high and low PAPR, 750 and 1500 Hz wide waveforms), and it’s quite easy to try others as our experience improves.

So I will continue working towards a real time implementation so we can get on the air and test this technology with real time PTT conversations. Some challenges ahead are (a) a state machine sync system that can acquire and determine when an over is complete (b) refactoring the code to run on modem frame size chunks rather than several seconds of samples (c) some way for anyone to run RADAE in real time (either in Python or a C port) with streaming audio (d) other chunks of DSP like tracking frequency, amplitude, and timing offsets as they evolve (e) a way to perform controlled tests and evaluate quality automatically – subjective reports and ad-hoc testing are not very reliable.

David’s FreeDV Update – May 2024

The last few months have been focused on building up the DSP code required to try the Radio Auto-encoder (RADAE) over the air. In order to answer the big question of “does it really work” as quickly as possible, I had to skim over many intriguing topics. So now that we have a qualified “yes” to the big question – I’ve returned to some Machine Learning (ML) R&D to explore some intriguing ideas:

  • Reduction of the “latent dimension” and hence RF bandwidth of the RADAE signal.
  • Encouraging the network to train 2 dimensional constellations rather than 1D.
  • Training for low Peak to Average Power Ratio (PAPR) – a potential 6dB improvement.

To date RADAE has used a “latent dimension” of 80 symbols every 40ms, which are mapped to 20 OFDM carriers at 50 symbols/s, resulting in a RF bandwidth of 1000 Hz. I spent some time exploring how to reduce this to dimension 40, i.e. a 10 carrier, 500 Hz bandwidth signal. This would result in more efficient use of spectrum. With fewer carriers our pilot based equalization works better as there would be more power per pilot symbol. Fewer carriers also helps reduce PAPR. On the negative side, classical communications theory predicts a narrower bandwidth signal will perform worse on HF channels, and may be less power efficient (e.g. BER performance of 8PSK versus QPSK).
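The relationship between latent dimension, carriers, and bandwidth can be sketched as below. This rests on two assumptions that are my reading of the numbers rather than something stated outright: pairs of real latent values form one complex symbol, and the bandwidth is roughly the number of carriers times the symbol rate (pilots, cyclic prefix, and other overheads are ignored).

```python
def radae_carriers_and_bandwidth(latent_dim, frame_period_ms=40, symbol_rate_hz=50):
    """Return (number of carriers, approximate RF bandwidth in Hz)."""
    real_symbols_per_s = latent_dim * 1000 / frame_period_ms
    complex_symbols_per_s = real_symbols_per_s / 2   # two real values per complex symbol
    carriers = complex_symbols_per_s / symbol_rate_hz
    return carriers, carriers * symbol_rate_hz

dim80 = radae_carriers_and_bandwidth(80)   # 20 carriers, 1000 Hz
dim40 = radae_carriers_and_bandwidth(40)   # 10 carriers, 500 Hz
print(dim80, dim40)
```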

The original RADAE design has a one dimensional bottleneck that limits the amplitude of real valued symbols to +/-1. Given additive noise, the network would always place constellation points at +/-1 in order to minimize the effect of noise. As the dimension reduced, distortion increased as there was nowhere in 1D space to place additional constellation points without being unduly affected by noise. I reasoned that encouraging the network to train two dimensional constellations would help. For example in classical digital systems, we can use an 8PSK constellation, each point is equal distance away from the origin. If the SNR is high enough, this can send more information per symbol than QPSK.

So I arranged the elements of the latent vector in complex number pairs (e.g. 20 complex valued symbols for a 40 element latent vector), and set up a two dimensional bottleneck that constrained the magnitude of the complex symbols trained by the network. This worked, I can now obtain good performance from a dimension 40 system. Curiously, the resulting constellations are circles, rather than discrete points.

Constellation of PSK symbols when trained with a 2D bottleneck on the symbol magnitude.
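Here is a minimal NumPy sketch of what a two dimensional bottleneck could look like. The post only says the magnitude of the complex symbols is constrained; applying tanh to the magnitude, the even/odd pairing of latent elements, and the function name are all assumptions for illustration, and the real model would do this inside the PyTorch training graph.

```python
import numpy as np

def bottleneck_2d(latent):
    """Pair real latent elements into complex symbols and limit their magnitude to 1."""
    z = latent[0::2] + 1j * latent[1::2]
    mag = np.abs(z)
    # tanh squashes the magnitude towards 1 while leaving the phase untouched
    scale = np.where(mag > 0, np.tanh(mag) / np.maximum(mag, 1e-12), 1.0)
    return z * scale

rng = np.random.default_rng(0)
latent = 3.0 * rng.standard_normal(40)     # dimension 40 latent vector
symbols = bottleneck_2d(latent)            # 20 complex valued symbols
print(len(symbols), np.max(np.abs(symbols)))
```

Because only the magnitude is limited, the network is free to use any phase, which is consistent with the circular constellations described above.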

Also this month I developed a method for comparing ML models objectively. The method runs the training database through a trained model at a range of SNRs, and produces curves of model “loss against Eq/No” for the model (Eq is the energy of one PSK symbol). I feel there is a reasonable match between these curves and the subjective speech quality. Having an objective method of measuring a model’s performance lets me know if I’m on the right track with a ML model design without tedious listening tests.

Loss v Eq/No curves for 4 models. model05 (m5) is the control – this was used for the recent OTA test campaign, and is a dim=80 1D bottleneck. Model 17 looks comparable (PAPR optimised 2D bottleneck), however m14 & m18 are not so great.
As above, but loss v C/No. This normalizes for the different symbol rates. Now m18 is dim=40, so only has half as many symbols to send across the channel. Given the same Tx power, we therefore have twice the energy per symbol. It now looks competitive to m5 and m17.
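The normalisation between the two plots can be sketched as follows. Carrier power is energy per symbol times symbols per second, so in dB C/No = Eq/No + 10log10(Rs). The symbol rates of 1000 and 500 complex symbols/s for the dim=80 and dim=40 models are assumptions derived from the carrier counts, and the function names are made up for the example.

```python
import math

def to_c_over_no_db(eq_no_db, symbol_rate_hz):
    """C/No in dB-Hz from Eq/No in dB and the symbol rate."""
    return eq_no_db + 10 * math.log10(symbol_rate_hz)

def to_eq_over_no_db(c_no_db, symbol_rate_hz):
    """Eq/No in dB from C/No in dB-Hz and the symbol rate."""
    return c_no_db - 10 * math.log10(symbol_rate_hz)

c_no_db = 30.0                                    # same Tx power and noise for both models
eq_no_dim80 = to_eq_over_no_db(c_no_db, 1000)     # dim=80 model
eq_no_dim40 = to_eq_over_no_db(c_no_db, 500)      # dim=40 model, half the symbols
print(f"dim=40 gains {eq_no_dim40 - eq_no_dim80:.2f} dB of energy per symbol")
```

Halving the symbol rate at the same power gives 3 dB more energy per symbol, which is why m18 moves left relative to m5 when plotted against C/No.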

OK, so now we have an objective measure for comparing models, a way of training lower dimensional models, and some understanding of 2D constellations: i.e. how to train them, and what to expect from the 2D constellations developed by training.

Using these tools, I attempted to build a PAPR optimised ML model. I estimate a low PAPR waveform has the potential to provide a further 6dB improvement at the receiver compared to a classical DSP OFDM waveform – so this is definitely worth exploring. This requires a “time domain” 2D bottleneck that simulates the way a power amplifier saturates. Combining this with multipath training is tricky, and I have tried several different approaches. At the time of writing I believe I have a way forward with a hybrid time-frequency domain model, and am currently evaluating the results. The design uses OFDM and classical DSP for equalisation, and ML for PAPR optimisation, and achieves a PAPR of less than 1 dB.

Here are some samples that show the PAPR optimised waveform over a simulated multipath poor (MPP) fast fading channel. They both have the same “peak power to noise” P/No ratio. Imagine them both being transmitted from the same radio with 100W peak power, over the same (really bad) HF radio channel, to the same receiver.

Peter, VK5APR, using SSB at a P/No of 39dB (Rx SNR -2.4dB)
Peter, VK5APR, using RADAE model18 also at a P/No of 39dB (Rx SNR 3.4dB)

Note the difference in the receiver SNR. The “S” in S/N is the RMS power at the receiver, which is lower for SSB as the SSB PAPR is higher (around 6dB, after compression). The goal of most radio systems is to maximise the RMS power at the receiver. So with the same transmitter, we have achieved around 6dB higher SNR at the Rx by carefully minimising the PAPR of the RADAE waveform.
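The numbers quoted for the two samples hang together if we write SNR = P/No - PAPR - 10log10(B), all in dB. The sketch below works backwards from the quoted P/No and SNR to the implied PAPR. The 3000 Hz noise bandwidth B is an assumption (the post does not state it), as is the function name.

```python
import math

def implied_papr_db(p_over_no_db, snr_db, noise_bw_hz=3000.0):
    """PAPR implied by SNR = P/No - PAPR - 10*log10(B)."""
    return p_over_no_db - 10 * math.log10(noise_bw_hz) - snr_db

ssb_papr = implied_papr_db(39.0, -2.4)     # SSB sample
radae_papr = implied_papr_db(39.0, 3.4)    # RADAE model18 sample
print(f"SSB PAPR {ssb_papr:.2f} dB, RADAE PAPR {radae_papr:.2f} dB")
```

Under that assumption the SSB PAPR comes out at about 6.6 dB and RADAE at about 0.8 dB, in line with the "around 6dB" and "less than 1 dB" figures in the text.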

Here are the spectrograms, note the model18 dim 40 RADAE signal uses only about 750 Hz of RF bandwidth (500 Hz for the ML PSK symbols plus some bandwidth for OFDM overheads). The moth-eaten effect is the multipath channel wiping out chunks of the signal.

There are many other areas we could explore (e.g. ML based equalization), but as we don’t have infinite time, I’m choosing to time box the ML R&D before we lock in a V1.0 design, and proceed to real time implementation.

Next month I will round out the ML design work, address a few other bugs, and attempt to arrive at a RADAE design suitable for our first real time implementation.

The Right to Innovate in the HF Data Space

On the HF data front, I’ve been working with Simon DJ2LS to test and merge several libcodec2 PRs to support FreeDATA. This work has improved protocol efficiency and enabled Simon to “homebrew” his own custom OFDM waveforms. His first attempt at a new waveform has roughly doubled the highest data transfer speed of FreeDATA. Simon is working on a new FreeDATA release that includes these improvements. We also have a 16QAM prototype waveform under development which, in high SNR channels, will double the speed again.

One of the PRs supports custom configuration of the OFDM modem, for example you can plug in the number of carriers, symbol rate, and number of bits per frame at “init time” without writing any C code. Empowering Hams (and indeed anyone) to build their own HF data waveforms is important. This work “preserves the right to innovate” in the HF data space, a key value of the ARDC.

David’s FreeDV Update – April 2024

Breakfast on the Nullarbor Plain

This month I took a vacation, so less work than usual on FreeDV. I traveled by road from my home in Adelaide to Western Australia (WA), reaching the South-Western tip of Australia, about 3000km away. South Western WA is a lovely part of the country, and the trip included the adventure of crossing the 1200km Nullarbor Plain (translated: no trees).

But back to the business of HF digital voice. Given the encouraging results from our initial Radio Autoencoder (RADAE) over the air (OTA) tests, we have expanded our program of testing to include Hams from different countries. It takes a lot of work to develop a new speech communication system, so it’s important to validate the design as early as possible. Much smarter to do this in the current simulation form, rather than put in ten times the work on a real time implementation, release it, and find out it falls over in a common use case. This is experience talking – we’ve learned many lessons after a decade of FreeDV development.

So we are testing the prototype RADAE design using crowd sourcing. I have approached several Hams for help in testing RADAE signals over their local radio channels and in different languages. They provide me with a 10s speech sample in their language, I send them back a file of RADAE samples that they can transmit over the air. The received RADAE signal is recorded off air by them (e.g. using a KiwiSDR), we then decode and evaluate. This is all done in non real time using stored files being emailed back and forth.

In particular I would like to thank Kanda JH0PCF, Yuichi JH0VEQ, and Simon DJ2LS for your help. Some takeaways:

  1. RADAE works well in Japanese and German, as well as English.
  2. It handles Near Vertical Incidence Skywave (NVIS) channels quite well. This is typical of local (several hundred km) HF communication in countries like Japan and the UK. The signal goes straight up and straight down.
  3. However on the samples tested so far, RADAE is falling over with long distance (e.g. Japan to Australia) communications, so some more work required there.
  4. The speech quality is competitive with SSB at high and low SNRs, however we need a better way to measure the actual SNR of off air signals to “calibrate” the results. So I need to hit the math on that one to develop a suitable algorithm.

Some comments from the test team:

I listened to the voice that was replied. RADAE has no problem with Japanese demodulation. In fact, I feel that RADAE’s demodulated audio is easier to hear than SSB. I thought the sound quality was close to the 2020 mode implemented in the current version. It’s a strange feeling to be able to experience FM mode quality audio with SSB. (Kanda, JH0PCF)

It feels, if we can hear SSB clearly, then RADAE also works, but as soon as we are coming closer to the edge of ability of hearing SSB, also RADAE struggles. But the radae output is really nice, much better voice output compared to SSB. Using your last example, I first thought, you’ve send me the default audio example (ie source sample), so yes, its really nice. (Simon, DJ2LS)

Here is an example from Kanda at low SNR:

Kanda JH0PCF, 20W, received by Tokyo KiwiSDR (SSB)

Kanda JH0PCF, 20W, received by Tokyo KiwiSDR (RADAE)

The two signals are transmitted one after the other, so get (more or less) the same channel conditions. The spectrogram below shows (left to right) the sine wave tone, SSB, then RADAE. Note about half the RADAE signal is wiped out at the start, but it seems to sound OK.

Here’s a medium SNR (8dB-ish) sample from Simon, DJ2LS, over a 1100km path at about 10W peak (1-2W RMS):

SSB
RADAE

Yuichi performed a couple of novel experiments that I hadn’t thought of. Here is RADAE compared to existing FreeDV modes, transmitted a few seconds apart, so over more or less the same channel at roughly the same power. SNRs are around 8dB.

SSB
RADAE
FreeDV 2020B
FreeDV 700E

Unfortunately 2020B didn’t sync on this channel/SNR. I feel the 700E sample could possibly be improved (e.g. levels, microphone, filtering), but it does illustrate the problems with current FreeDV modes – the speech quality borders on the lower level required for communication and it can be a struggle to get consistent results. This is something we have recognized and are attempting to address with our ARDC funded R&D program.

As a fairer comparison, here is the sample used for RADAE testing passed through a FreeDV 700E simulation with no channel errors:

Project Plan Pivot

Given the encouraging results with RADAE, we’ve pivoted our ARDC project plan to focus on RADAE, and have paused development of Codec 2 and FreeDV modes. RADAE appears to be our strongest candidate for satisfying the top three goals we set for ourselves when applying for the ARDC grant:

  1. Improve speech quality to a level comparable to commercial codecs.
  2. Develop a “rag chew” FreeDV mode with subjective speech quality comparable to SSB at high SNRs.
  3. Improve low SNR operation such that FreeDV is superior to SSB over poor HF channels.

We are on track to meet (and indeed exceed) the first two goals, but I think the final goal has yet to be demonstrated (e.g. SSB and the current incarnation of RADAE fall over at roughly the same SNR). There are a few bugs and many practical issues to work through before we have a real world version of RADAE that anyone can use. Plus there will be a few “gotchas” we haven’t thought of yet. Plenty for me to do in the coming months!

I’ve also been working on HF data modem software with Simon from the FreeDATA project. In next month’s report we hope to present a new FreeDATA release incorporating this work, resulting in a significant boost in FreeDATA performance.

David’s FreeDV Update – March 2024

This month was spent building up the “classical” DSP support code around the Radio Autoencoder, so I could test it over the air using real radio signals. Can we repeat the impressive low SNR results from simulation over real radio channels? This meant coding up an OFDM modem in PyTorch, lots of testing, and a bunch of support scripts to drive the radio hardware.

The algorithms we developed last year on improving FreeDV acquisition and designing filters came in handy, especially at the low SNRs required for this work.

The first test was “over the cable” (OTC) at VHF (144.5 MHz) using a HackRF transmitter, switchable attenuator as the “channel”, and a RTLSDR as the receiver. The noise (N) is injected by the physical properties (noise figure) of the RTLSDR receiver, so the S/N is controllable by the level (S) presented by the switchable attenuator. My calculations indicated it should work around the -135dBm level, and sure enough it did – sounding just like the simulations (see Feb 2024 for examples). This was a great confidence boost as it’s hard to argue with real world noise, but easy to mess up the calibration of noise simulated by software.

For comparison a narrow band FM signal will fall over at around -120dBm, and a first generation digital VHF radio (using proprietary speech codecs) perhaps a few dB lower. Although to be fair the digital VHF systems also transmit ancillary digital data at the same time, which consumes some of their power and bandwidth.

Radio Autoencoder VHF signal on my spectrum analyser – getting hard to measure as it’s close to the noise floor of the spec-an

Next step was HF radio, a somewhat tougher channel. This required quite a bit more work on the OFDM sync algorithms, but eventually I was ready to transmit a signal using my HF radio. I sent a 5W signal over a 500km HF path to a KiwiSDR, and passed the received signal through the Radio Autoencoder system. Much to my surprise, it worked first time! Good quality audio over several different paths and channel types, up to 2000 km away. It seems quite robust to the channels I have tested so far, including NVIS, EMI corrupted receivers, and SNRs below 0dB.

Plot of test signal sent over HF radio – a sine wave tone, compressed SSB, Radio Autoencoder signal. All signals have the same RMS power.
Sine wave header and compressed SSB received over a 475km HF path
Radio Autoencoder signal received over the same 475km HF path
Spectrogram of received signals over a 30km Near Vertical Incidence (NVIS) path – chosen as the fading is pretty bad when the ground and sky wave mix. Note the “barber pole” effect on the Radio Autoencoder signal RHS.

These are encouraging results for the Radio Autoencoder. I’m now pondering next steps. I think it makes sense to test the system with some more samples and over different channels. Plus so many things we could do with the Machine Learning side, like using ML instead of classical DSP for synchronisation, and trying our PAPR reduction system over the air. Also, at some point we need a C port so this can be used in real time by anyone.

FreeDATA Update

Part of our ARDC grant activities is to support the FreeDATA project. Simon and team have recently completed a major re-write and FreeDATA is back on the air. This month I’ve been working with Simon on a faster modem waveform for “ACK” packets, that will help speed up the FreeDATA protocol. I’m also pleased to see FreeDATA working over real HF channels, including this 7 hour 1.44Mbyte file transfer over an 800km path.

David’s FreeDV Update Feb 2024

This month I’ve been working on a feasibility study using an autoencoder derived from RDOVAE [1], based on code originally written by Jean-Marc Valin and Jan Büthe for an Opus application. The goal is to see if we can send good quality speech over HF multipath channels at low SNRs.

The autoencoder takes as input a typical set of vocoder features (short term spectrum, pitch, voicing), then applies time based prediction and transforms to arrive at a small number of parameters that can be sent over a channel. This is similar to an old school vocoder that uses classical DSP, except Machine Learning (ML) allows us to learn non-linear transforms and prediction, which tend to be more powerful.

Usually, after the transformation/prediction stage we then quantise to a low bit rate, then use Forward Error Correction (FEC) and modems to send the bits over a channel. However this latest work takes a novel twist – we train the autoencoder to generate PSK symbols that we send over the channel. It effectively combines quantisation, channel coding, and modulation. The symbols tend to cluster around +/-1 like BPSK but are continuously valued. So it’s like a discrete time, continuously valued (analog) PSK.

Scatter Plot showing the signal constellation from the Radio Autoencoder, with symbols mapped to two dimensions like QPSK. Unlike conventional PSK, they are continuously valued.
A 3D scatter plot makes the picture clearer. Most symbols are at the +/- 1 points – the network has learned that in the presence of noise, these are the best points.


This month I’ve been building up the code required to test the idea over multipath (HF) channels. This meant reshaping the PSK symbols into an OFDM modem frame, and adding a multipath simulation. The initial results are encouraging, with speech quality better than any existing FreeDV mode, and competitive with SSB at low SNRs. At high SNRs the quality is also quite good, better than analog FM.

Simulated SSB with compressor at -3dB SNR on an AWGN channel.

The Radio Autoencoder at -3dB SNR on an AWGN channel.

Spectrogram of received signal at -3dB SNR, the autoencoder output has been mapped to an OFDM signal about 1000 Hz wide.
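The reshaping of autoencoder symbols into an OFDM frame can be sketched roughly as below. This is an illustration only: it assumes an 80 element latent vector paired into 40 complex symbols, 20 carriers spaced 50 Hz apart at an 8000 Hz sample rate, and an arbitrary first carrier at 500 Hz. It produces a complex baseband signal with no pilots or cyclic prefix, and none of the names come from the RADAE code.

```python
import numpy as np

def symbols_to_ofdm(latent, n_carriers=20, fs=8000, symbol_rate=50, first_carrier=10):
    """Map a real latent vector onto OFDM carriers, one IDFT per OFDM symbol."""
    z = latent[0::2] + 1j * latent[1::2]     # pair real values into complex symbols
    frame = z.reshape(-1, n_carriers)        # rows are OFDM symbols, columns are carriers
    n_fft = fs // symbol_rate                # 160 samples, so carriers are 50 Hz apart
    out = []
    for row in frame:
        bins = np.zeros(n_fft, dtype=complex)
        bins[first_carrier:first_carrier + n_carriers] = row
        out.append(np.fft.ifft(bins))        # frequency domain to time domain
    return np.concatenate(out)

rng = np.random.default_rng(0)
latent = rng.standard_normal(80)
tx = symbols_to_ofdm(latent)

# A receiver with perfect timing recovers the first OFDM symbol with a DFT
recovered = np.fft.fft(tx[:160])[10:30]
print(len(tx), np.allclose(recovered, (latent[0::2] + 1j * latent[1::2])[:20]))
```

With 20 carriers at 50 Hz spacing the signal occupies about 1000 Hz, matching the width of the OFDM signal in the spectrogram above.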

However this is all early days. To expedite answering the key questions, the current simulation ignores a lot of real world issues like acquisition, phase, frequency and timing offset correction. I reasoned that we have classical DSP solutions to these problems that work pretty well, so instead I focused on multipath performance as experience has shown that is the toughest issue with HF digital speech.

The ML code used for training includes a channel model. As an experiment, I added a saturating HF power amplifier model. The output was an OFDM modem waveform with a 1dB Peak to Average Power Ratio (PAPR), which is an excellent result. Our FreeDV waveforms run at around 4.5 dB, and SSB with a good compressor 4-6dB.
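For reference, PAPR is simply peak power over mean power, expressed in dB. The sketch below measures it for two synthetic signals: a single complex tone (0 dB) and the worst case of 20 equal carriers all aligned in phase (10log10(20), about 13 dB). The test signals and function name are illustrative assumptions, not the waveforms discussed in the post.

```python
import numpy as np

def papr_db(x):
    """Peak to Average Power Ratio of a sampled signal, in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(np.max(power) / np.mean(power))

n = np.arange(160)
tone = np.exp(2j * np.pi * 10 * n / 160)
# 20 carriers in bins 10..29, all starting at zero phase (worst case alignment)
aligned = sum(np.exp(2j * np.pi * k * n / 160) for k in range(10, 30))

print(f"tone {papr_db(tone):.2f} dB, aligned carriers {papr_db(aligned):.2f} dB")
```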

ML systems tend to work well until they experience conditions outside what they have been trained for. So I’m taking small steps, and planning to test a variety of channel impairments one by one, looking for that “ML gotcha”. I’m also spending way too much time checking my channel model calculations – handling the shift from digital PSK to analog has taken some careful thought and is a bit mind bending after 35 years of work in digital PSK!

The next step is to build up acquisition and synchronization code, and get to a point where we can send and receive signals over real RF channels. I’ll start with an Over The Cable (OTC) test on the bench, and work up to the point where we can play stored files over real HF channels.

[1] J.-M. Valin, J. Büthe, A. Mustafa, Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.

David’s FreeDV Update Jan 2024

This month I’ve been trying to apply Machine Learning (ML) techniques to quantise Codec 2 features.

At low bit rates, Codec 2 [1] sends a smoothed version of the speech spectrum, that is represented by 20 spectrum samples, which is updated every 10-40ms. The first sample covers 200-300 Hz, the second 300-450 Hz, up to the last sample at 3700 Hz. These 20 spectral samples are grouped together into a vector, or in ML-speak “features”, that we need to somehow send over the radio channel.

Here is a 3D plot of those 20 samples, plotted over 250 frames (2.5 seconds) of time. The Y axis is amplitude in dB. Turns out that adjacent samples are similar (or correlated) along both the time and frequency axes.

The key advantage of ML over classical DSP is it can find linear and non-linear correlations, leaving us less information to transmit over the channel. If there is less information, then we require fewer bits for a given speech quality level.

I worked in two stages. First I built a simple autoencoder that reduced the number of Codec 2 features that needed to be quantised from 20 to 10. This means the ML network has worked out how to reduce the amount of information by about half. It does this by working out that some of the adjacent samples are similar, so we only need to send the information common to both of them. I then applied Vector Quantisation (VQ) to quantise the dimension 10 vectors, and obtained reasonable speech quality at 24 bits/frame. I’ve documented the work in [2], including lessons learned.

The following three samples show a single sample after passing through each processing stage. The idea is to have minimal change between each sample. In the final sample we have quantised the speech spectrum with 24 bits/frame. If we send the 24 bits/frame over the channel every 20ms this would result in 24/0.02 = 1200 bit/s.

Dimension 20 Vector
Autoencoder Dimension 10
Autoencoder Dimension 10 and 24 bit/frame Vector Quantiser (VQ)
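For readers new to Vector Quantisation, here is a minimal sketch of quantising a dimension 10 vector with 24 bits. The split into two 12 bit stages, the random (untrained) codebooks, and the function names are all assumptions for illustration; the design actually used is documented in [2].

```python
import numpy as np

def vq_search(codebook, target):
    """Index of the codebook entry closest (in squared error) to target."""
    return int(np.argmin(np.sum((codebook - target) ** 2, axis=1)))

def multistage_encode(codebooks, vec):
    """Quantise vec stage by stage, each stage working on the previous residual."""
    indexes, residual = [], vec.copy()
    for cb in codebooks:
        i = vq_search(cb, residual)
        indexes.append(i)
        residual = residual - cb[i]
    return indexes

def multistage_decode(codebooks, indexes):
    """Rebuild the vector by summing the selected entry from each stage."""
    return sum(cb[i] for cb, i in zip(codebooks, indexes))

rng = np.random.default_rng(0)
# Placeholder codebooks: a real system would train these on speech features
codebooks = [rng.standard_normal((4096, 10)), 0.3 * rng.standard_normal((4096, 10))]
vec = rng.standard_normal(10)

indexes = multistage_encode(codebooks, vec)
decoded = multistage_decode(codebooks, indexes)
bits_per_frame = sum(int(np.log2(len(cb))) for cb in codebooks)
bit_rate = bits_per_frame * 1000 / 20          # one frame every 20ms
print(bits_per_frame, bit_rate)
```

Only the indexes are sent over the channel; the decoder holds the same codebooks and looks the vectors up again.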

These techniques could be used to build an improved quality Codec 2 mode in the 1200-2400 bit/s range. As a next step I’d like to work out how to include the pitch and voicing information in the same vector, and take advantage of correlation across time, which might lead to techniques applicable for even lower bit rates.

Towards the end of the month, I started to investigate the latest LPCNet technology (a high quality ML vocoder), and how it can be applied to our goal of high quality speech over HF radio channels. There has been some interesting work in LPCNet quantisation [3], that may be useful for HF radio. To explore this I’ve kicked off a feasibility study, and have built up a PC with a RTX4090 GPU card, as it requires some serious ML resources for training.

[1] Codec2
[2] ML Quantisation of Codec 2 Features
[3] Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder



David’s FreeDV update Dec 2023

Codec 2 Algorithm Description

This month I finished the Codec 2 algorithm description document. It was quite a lot of work, and ended up being 30 pages long. Thanks jimt for proof reading and Mooneer for helping with the automation (we rebuild the doc as part of our automated tests).

There was some discussion on this Codec 2 mailing list thread around the need for a formal specification versus the documentation/reference code/test approach I have taken in explaining Codec 2. I haven’t had any comments or questions on the technical content yet, I guess the audience interested in the DSP is small.

Codec 2 Machine Learning and Male Speech

For a few months I’ve been exploring the use of Machine Learning (ML) with Codec 2. This has been something of a side project, as I don’t feel competent in ML. While the field shows a lot of promise, in the past I have struggled to build any ML systems that actually do something useful. However my side project appears to work and I have some meaningful improvements in speech quality of the core Codec 2 vocoder. As I’m on a learning curve, I’ve only billed a fraction of the hours spent on this work to the ARDC grant. Thanks Jean-Marc for your tips on ML and inspiring work.

The project involved building a filter that “narrows” the bandwidth of vocal tract resonances (or formants) for low pitched male speakers. It’s these spectral peaks that convey the information in speech. The filter addresses the problem with energy distribution that I mentioned in the October report.

The following plot shows the ML “inference” in action. The aqua plot is the smoothed spectrum (ML input), and red is the ML output, which is pretty close to the ideal (green).

The smoothed spectrum (aqua) is an intermediate processing step that reduces the information (and hence bit rate) we need to transmit. Unfortunately it messes up low pitched, male speakers, while high pitched speakers such as females pass through OK. This is puzzling, as it seems reasonable to assume that male and female speech have the same amount of information. I have a theory that the formant bandwidth is important for male speech. So I designed an ML system to recover the narrow formant bandwidths for males from the smoothed spectrum. It’s a bit like un-blurring an image. This is tricky using traditional DSP that can only do linear transformations, but finding complex, non-linear relationships is something that ML is meant to be good at.

Here are some speech samples, with Codec 2 3200 as a reference (used for M17, and roughly the same quality as AMBE/MELP). Especially through headphones, the ML input sounds buzzy and muffled. Please note the ML output sample is not quantised (that’s the next step), but previous work suggests this could be quantised to about 1600 bits/s at the same quality using traditional DSP, and perhaps lower with ML based quantisation.

ML Input, smoothed spectrum, wide bandwidth formants
ML Output, narrow formant bandwidths restored
Codec 2 3200 bit/s mode (reference anchor)

This addresses a long standing mystery (to me at least) of why low pitched male speakers sound poorer than females when the spectrum is coarsely represented. I had previously addressed this issue with a poorly understood “post filter” that used parameters based on educated guesses.

Next step is to see if we can use ML to help quantise the Codec 2 model parameters. If I can gain skills in this area we may be able to improve speech quality at a given bit rate and perhaps robustness to channel errors.

David’s FreeDV Update Nov 2023

This month I’ve been busy documenting the Codec 2 algorithm. Codec 2 evolved from some code I developed in the 1990s when I studied speech coding. Around 2009 I pulled that code off the shelf and turned it into a practical speech codec, adding bits and pieces over the next decade. So now I’m making an effort to pull all the algorithm details together into one document. It’s a work in progress, currently located in this PR.

One mission of this project is to explain how speech codecs work, as they are often shrouded in mystery and indeed understanding is often discouraged by various closed source strategies.

I’m explaining Codec 2 at two levels. The first is aimed at the Radio Amateur, at a technical level that could be published in a Ham Radio magazine. Codec 2 was written by Hams for Hams so this is important. The second level is a deeper dive into the DSP, using math where appropriate, and assuming a familiarity with signal processing.

As I pull the various building blocks together and write about each one, I realise it is rather complicated. There are a lot of moving parts, and it’s been a while since I looked at it as a whole. So it’s important to document the algorithm to make it easier to understand for others, and as a baseline to help reach our ARDC project goals like improved speech quality and robustness over HF channels.

I feel the algorithm is best described by a combination of the source code, text description and math. Sometimes the source code does a better job of explaining than text or math, and I want to avoid describing things twice (don’t repeat yourself principle).

Automated tests can be useful in explaining the algorithm, so I have made a list of tests and source code clean ups I’d like to work on. I’m also including some GNU Octave simulations that let us peer into the algorithm and run it step by step – like a software oscilloscope.

Some of the code is 30 years old now, and still builds and runs cleanly. A testament to the longevity of C I guess. Signal processing, like math and physics, tends to remain relevant over time. Our human speech production hardware is somewhat older than 30 years and isn’t likely to change soon.

ARDC Grant & Project Plan

In early 2023, we were fortunate to receive a grant from the ARDC, to support a two year program of FreeDV development. This post is an excerpt from our grant application that describes the project plan we are now busy executing.



I have a background in project planning, so together with the FreeDV Project Leadership team (PLT) I designed the plan in work package form, much as I would plan a commercial project.

The work package breakdown is provided below; however, we expect this will evolve over the course of the project. The grant funding will be directed by the FreeDV PLT to best accomplish our mission of creating great free software and hardware for digital voice communication over HF radio.

WP1000 Project Management. Budget at 10% of project. Coordinate resources, hiring a DSP engineer, and communication with stakeholders.  Evaluate progress against the plan, revise plan as project evolves.

WP2000 Codec 2 Improvement.  In this WP we will attempt to improve the speech quality of two Codec 2 modes, at a low (around 700 bit/s) and a moderate (1200-2400 bit/s) rate. Establish requirements for Codec 2 (bit rate, voice quality, background noise robustness, CPU load). The most promising areas are spectral quantisation and a better excitation model.  Develop algorithms for handling background noise. The moderate bit rate modes in Codec 2 (above 700 bits/s) have not been actively developed for many years, so there is likely some low hanging fruit here.  Progress in this WP will be measured by conducting tests using speech samples processed with the revised modes and SSB/commercial codecs as references.

WP3000 Rag chew mode.  A waveform for HF digital voice will be designed to use the new 1200-2400 bits/s mode from WP2000. Waveform design involves designing the modem, FEC code selection, and implementation.  The new waveform will be supported by automated tests, and integrated into libcodec2 and freedv-gui for use over the air.  An option for a high quality mode is to employ neural speech coding (as per the experimental FreeDV 2020 modes).

WP4000 Low SNR mode. Using our existing 700D/700E modes, carefully investigate modem performance over HF multipath channels (real and simulated), measure performance compared to theory, and look for potential optimisations.  Investigate the use of non-pilot symbol synchronization methods to lower waveform overheads. Investigate methods for Peak to Average Power Ratio (PAPR) reduction by Vector Quantiser (VQ) and interleaver optimisation.  With the new 700 bit/s codec from WP2000, develop a new low SNR waveform, integrate into libcodec2 and freedv-gui for use over the air.  Success will be measured by conducting controlled over the air experiments where we compare SSB to the low SNR mode using the same speech samples and Peak power.  The goal is to outperform SSB at low SNRs.

WP5000 Commercial radio integration.  When WP3000 and WP4000 are mature, reach out to commercial radio companies.  Promote the benefits of FreeDV: open standards, no license fees, and performance competitive with SSB.  Options for integration include linking libcodec2 into their radio’s DSP/Host CPU, or a plug-in OEM FreeDV module using an STM32 or ESP32 microcontroller.  Success will be measured against the goal of integrating FreeDV into at least 2 COTS HF radios serving the Amateur Radio market.

WP6000 HF Data Modes: Extend the current suite of HF data waveforms by developing and testing a high bit rate/high SNR QAM mode and a low SNR mode that operates below 0 dB SNR. Work with FreeDATA to integrate and test Over the Air (OTA).  Conduct an automated test campaign over many months that provides objective evidence of system performance over real world channels.

WP7000 ezDV Development: Extend ezDV with additional functionality and for ease of use. Develop a usable enclosure for the ezDV board and optimize the hardware for production, as well as manufacture a small run of devices for general ham use and testing. 

WP8000 Packaging Improvements: Improve the codebase and workflow to enable easier packaging of official releases by the major Linux distributions. Code signing (for platforms that heavily encourage/require it) will be put in place.

WP9000 CPU load optimizations: These are items intended to reduce the CPU required to run FreeDV. This activity is scheduled to start after the new modes have been developed.

WP10000 freedv-gui improvements: These are items intended to improve the general usability of the FreeDV application.

WP11000 Ongoing development and maintenance: libcodec2, freedv-gui, and embedded platforms (ezDV, the SM1000 successor). This covers ongoing maintenance as well as documentation of the Codec 2 algorithm and FreeDV waveforms to a professional level, which will encourage commercial adoption.

WP12000 Ongoing promotion: Provide a presence at major hamventions in the form of a FreeDV/Codec2 booth and/or talks given by the group. This presence would have live demos of FreeDV (both using the application and with embedded devices such as ezDV). There would be up to five hamventions targeted in this grant, with the ones selected based on historical attendance figures.

The project will also participate in various forms of other promotion (for example, on-air events, mailing lists, social media and discussion forums). Usage figures can be measured from online reporting during events (e.g. reports that are sent from the FreeDV PC application to the PSK Reporter service).

David’s FreeDV Update Oct 2023

This post is a summary of the work I have performed in October 2023 for our Enhancing HF Digital Voice With FreeDV ARDC grant.

Acquisition

My deep dive into the OFDM modem algorithms continues, this month focusing on the algorithms used by a FreeDV receiver to lock onto off air signals. The goal is fast sync with up to +/- 200 Hz frequency offset on low SNR multipath fading channels. In the past this has been quite a challenge, and was a source of problems with the earlier versions of FreeDV 700D.
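To give a feel for what locking onto a signal with an unknown frequency offset involves, here is a generic sketch of a brute-force search for a known pilot sequence over a grid of timing and frequency offsets. It is not the FreeDV OFDM modem algorithm: the 8 kHz sample rate, the 64 sample random-phase pilot, the 25 Hz grid step, and all function names are assumptions made for illustration, and noise and multipath are left out.

```python
import cmath
import math
import random

FS = 8000  # sample rate in Hz (assumed for this sketch)


def make_pilot(n_samples, seed=1):
    """Unit-magnitude pilot sequence with pseudo-random phases (hypothetical)."""
    rng = random.Random(seed)
    return [cmath.exp(2j * math.pi * rng.random()) for _ in range(n_samples)]


def make_rx(pilot, t0, f0, total_len):
    """Place the pilot at sample t0, then apply a frequency offset of f0 Hz."""
    rx = [0j] * total_len
    for n, p in enumerate(pilot):
        rx[t0 + n] = p
    return [x * cmath.exp(2j * math.pi * f0 * n / FS) for n, x in enumerate(rx)]


def acquire(rx, pilot, f_grid):
    """Return (timing, frequency offset, metric) that maximise the correlation."""
    best = (0, 0.0, -1.0)
    for f in f_grid:
        # Reference pilot shifted to the candidate frequency, conjugated
        # so that multiplying by it correlates against the received samples.
        ref = [(p * cmath.exp(2j * math.pi * f * n / FS)).conjugate()
               for n, p in enumerate(pilot)]
        for t in range(len(rx) - len(pilot) + 1):
            corr = sum(rx[t + n] * ref[n] for n in range(len(pilot)))
            if abs(corr) > best[2]:
                best = (t, f, abs(corr))
    return best


pilot = make_pilot(64)
rx = make_rx(pilot, t0=37, f0=-125.0, total_len=160)
f_grid = [-200.0 + 25.0 * k for k in range(17)]  # -200 Hz to +200 Hz
t_est, f_est, metric = acquire(rx, pilot, f_grid)
print(t_est, f_est)
```

In this noise-free case, with the true offset sitting exactly on the grid, the search returns the timing and frequency offset used to generate the signal. The hard part on real channels is making the same decision reliably when the correlation peak is buried in noise and smeared by multipath.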

This time around I’m using a little probability to describe the chances of successfully acquiring the FreeDV signal. A 90% chance means that 9 times out of 10 you will acquire the FreeDV signal in one frame (about 180ms). On poorer channels it may take 2 or 3 frames. Some other problems to watch out for are false acquisition – we don’t want to output speech when there is random noise on the channel, or acquire on a carrier when someone is tuning up. So there is a trade off between being sensitive enough to detect weak signals and ignoring signals that aren’t valid FreeDV signals.
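The per-frame probability can be turned into an estimate of how long acquisition takes. The sketch below assumes each frame is an independent attempt with the same probability of success, which is a simplification (fading channels are correlated from frame to frame). The function names and the 95% target are illustrative; the 180 ms frame duration is the figure quoted above.

```python
FRAME_S = 0.18  # frame duration from the text (about 180 ms)


def p_acquired_within(p_frame, n_frames):
    """Probability of acquiring within n_frames, assuming independent attempts."""
    return 1.0 - (1.0 - p_frame) ** n_frames


def frames_needed(p_frame, target):
    """Smallest number of frames giving at least `target` chance of acquisition."""
    if not (0.0 < p_frame <= 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p_frame must be in (0, 1] and target in (0, 1)")
    n = 1
    while p_acquired_within(p_frame, n) < target:
        n += 1
    return n


for p in (0.9, 0.5):
    n = frames_needed(p, 0.95)
    print(f"p={p}: {n} frames ({n * FRAME_S:.2f} s) for a 95% chance of acquisition")
```

Under these assumptions a 90% per-frame probability reaches 99% after two frames, while a poorer channel with a 50% per-frame probability needs five frames to pass 95%.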

Anyway, I’ve been working through those issues one by one, doing a little math, running some simulations, and writing up the results. The acquisition work is documented in Section 6 of this report. It’s getting close to the point where we can update the OFDM modem code and try it over the air.

Codec 2

I’ve started working on Codec 2 again, revisiting some of the algorithms I prototyped in the ratek resampler study. In particular I’m looking at why low pitched speakers like males require a higher bit rate to quantise the spectrum than high pitched speakers (females or children). I have a theory that it is related to the distribution of energy over a pitch cycle.

Here’s a plot of some speech that shows the problem. At top we have the original (input) speech, at bottom after it’s passed through Codec 2. Notice how at the bottom the signal decays before the end of each cycle? The energy is confined to the start of each pitch cycle, rather than being more evenly distributed like the top plot. This appears to be related to a drop in speech quality with male speakers.
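One simple way to put a number on “energy confined to the start of each pitch cycle” is to measure what fraction of a cycle’s energy falls in its first half. The sketch below does this for two synthetic pitch cycles. The metric, the waveforms, and the 8 kHz sample rate are assumptions chosen for illustration; this is not a measurement taken from Codec 2.

```python
import math


def first_half_energy_fraction(cycle):
    """Fraction of a pitch cycle's energy that falls in its first half."""
    half = len(cycle) // 2
    total = sum(x * x for x in cycle)
    return sum(x * x for x in cycle[:half]) / total


P = 80  # samples per pitch cycle: 100 Hz pitch at an assumed 8 kHz sample rate

# Energy spread evenly across the cycle (constant-envelope resonance)
even = [math.sin(2 * math.pi * n / 8) for n in range(P)]

# Energy concentrated at the start of the cycle (rapidly decaying resonance)
decaying = [math.exp(-n / 10) * math.sin(2 * math.pi * n / 8) for n in range(P)]

print(f"even:     {first_half_energy_fraction(even):.2f}")
print(f"decaying: {first_half_energy_fraction(decaying):.2f}")
```

The evenly distributed cycle has half its energy in each half, while nearly all the energy of the decaying cycle falls in the first half, similar to the bottom trace described above.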


If we can understand the sources of distortion, we can improve Codec 2 speech quality at all bit rates.

Administration

On the admin side I’ve been writing up some Work Package descriptions – chunks of work we need to get done to reach the goals of our project. They contain a list of tasks to be done and a description of deliverables. The idea is to be very clear about the work we need done so it slots in neatly with the rest of the project. If you are interested in working on any of these (as a paid or volunteer team member), please contact us.

The development of various policies is continuing – we’ve put a freeze on maintenance for existing features that are likely to be superseded, in order to focus on the new and exciting work. We’re also working on a process for reviewing feature requests – we want to make sure any significant work we do is lined up with our project goals like improving Codec 2 and enhancing HF Digital voice. As our resources are finite we need to “filter” feature requests somehow.

If you want to contribute code or have a feature request for this project, it’s a very good idea to contact us before writing any code or raising a PR. We have many years of experience and a very good idea of what needs to be done. We could really use your skills and enthusiasm, but would like to make sure we work together in ways that will most benefit this project.