David’s FreeDV Update – March 2024

This month was spent building up the “classical” DSP support code around the Radio Autoencoder, so I could test it over the air using real radio signals. Can we repeat the impressive low SNR results from simulation over real radio channels? This meant coding up an OFDM modem in PyTorch, lots of testing, and a bunch of support scripts to drive the radio hardware.

The algorithms we developed last year on improving FreeDV acquisition and designing filters came in handy, especially at the low SNRs required for this work.

The first test was “over the cable” (OTC) at VHF (144.5 MHz) using a HackRF transmitter, a switchable attenuator as the “channel”, and an RTLSDR as the receiver. The noise (N) comes from the physical properties (noise figure) of the RTLSDR receiver, so the S/N is controlled by the signal level (S) set by the switchable attenuator. My calculations indicated it should work around the -135dBm level, and sure enough it did – sounding just like the simulations (see Feb 2024 for examples). This was a great confidence boost as it’s hard to argue with real world noise, but easy to mess up the calibration of noise simulated by software.
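
As a rough illustration of the S/N arithmetic in a test like this, the sketch below estimates the SNR at the receiver from the signal level, the receiver noise figure and the noise bandwidth. The 6 dB noise figure and 3000 Hz noise bandwidth are assumptions chosen for the example, not measured values from this test; only the -135dBm and -120dBm levels come from the text above.

```python
import math

# Thermal noise density kT at about 290 K, expressed in dBm per Hz.
THERMAL_NOISE_DBM_HZ = -174.0


def noise_power_dbm(noise_figure_db, bandwidth_hz):
    """Receiver noise power in dBm for a given noise figure and bandwidth."""
    return THERMAL_NOISE_DBM_HZ + noise_figure_db + 10.0 * math.log10(bandwidth_hz)


def snr_db(signal_dbm, noise_figure_db, bandwidth_hz):
    """SNR in dB: signal level minus receiver noise power."""
    return signal_dbm - noise_power_dbm(noise_figure_db, bandwidth_hz)


# Assumed (hypothetical) receiver parameters, for illustration only.
NF_DB = 6.0
BW_HZ = 3000.0

for level_dbm in (-120.0, -135.0):
    print(f"{level_dbm:7.1f} dBm -> SNR {snr_db(level_dbm, NF_DB, BW_HZ):6.2f} dB")
```

Stepping the attenuator changes only the signal level, so each extra dB of attenuation lowers the SNR by one dB while the noise term stays fixed.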

For comparison a narrow band FM signal will fall over at around -120dBm, and a first generation digital VHF radio (using proprietary speech codecs) perhaps a few dB lower. Although to be fair the digital VHF systems also transmit ancillary digital data at the same time, which consumes some of their power and bandwidth.

Radio Autoencoder VHF signal on my spectrum analyser – getting hard to measure as it’s close to the noise floor of the spec-an

Next step was HF radio, a somewhat tougher channel. This required quite a bit more work on the OFDM sync algorithms, but eventually I was ready to transmit a signal using my HF radio. I sent a 5W signal over a 500km HF path to a KiwiSDR, and passed the received signal through the Radio Autoencoder system. Much to my surprise, it worked first time! Good quality audio over several different paths and channel types, up to 2000 km away. It seems quite robust to the channels I have tested so far, including NVIS, EMI corrupted receivers, and SNRs below 0dB.

Plot of test signal sent over HF radio – a sine wave tone, compressed SSB, Radio Autoencoder signal. All signals have the same RMS power.
Sine wave header and compressed SSB received over a 475km HF path
Radio Autoencoder signal received over the same 475km HF path
Spectrogram of received signals over a 30km Near Vertical Incidence (NVIS) path – chosen as the fading is pretty bad when the ground and sky wave mix. Note the “barber pole” effect on the Radio Autoencoder signal RHS.

These are encouraging results for the Radio Autoencoder. I’m now pondering next steps. I think it makes sense to test the system with some more samples and over different channels. Plus there are so many things we could do on the Machine Learning side, like using ML instead of classical DSP for synchronisation, and trying our PAPR reduction system over the air. Also, at some point we need a C port so this can be used in real time by anyone.

FreeDATA Update

Part of our ARDC grant activities is to support the FreeDATA project. Simon and team have recently completed a major re-write and FreeDATA is back on the air. This month I’ve been working with Simon on a faster modem waveform for “ACK” packets, that will help speed up the FreeDATA protocol. I’m also pleased to see FreeDATA working over real HF channels, including this 7 hour 1.44Mbyte file transfer over an 800km path.

David’s FreeDV Update Feb 2024

This month I’ve been working on a feasibility study using an autoencoder derived from RDOVAE [1], based on code originally written by Jean-Marc Valin and Jan Büthe for an Opus application. The goal is to see if we can send good quality speech over HF multipath channels at low SNRs.

The autoencoder takes as input a typical set of vocoder features (short term spectrum, pitch, voicing), then applies time based prediction and transforms to arrive at a small number of parameters that can be sent over a channel. This is similar to an old school vocoder that uses classical DSP, except Machine Learning (ML) allows us to learn non-linear transforms and prediction, which tend to be more powerful.

Usually, after the transformation/prediction stage we then quantise to a low bit rate, then use Forward Error Correction (FEC) and modems to send the bits over a channel. However this latest work takes a novel twist – we train the autoencoder to generate PSK symbols that we send over the channel. It effectively combines quantisation, channel coding, and modulation. The symbols tend to cluster around +/-1 like BPSK but are continuously valued. So it’s like a discrete time, continuously valued (analog) PSK.
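
To make “discrete time, continuously valued” a little more concrete, here is a toy sketch that passes such symbols through an additive white Gaussian noise (AWGN) channel. The symbol generator is a stand-in invented for the example (it is not the autoencoder), and SNR here simply means average symbol power over average noise power, which may not match the SNR definition used for the results in this post.

```python
import math
import random


def mean_power(symbols):
    """Average power of a list of complex symbols."""
    return sum(abs(s) ** 2 for s in symbols) / len(symbols)


def awgn_channel(symbols, snr_db, rng):
    """Add complex Gaussian noise so that signal power / noise power = snr_db."""
    noise_power = mean_power(symbols) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # noise split equally between I and Q
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for s in symbols]


rng = random.Random(1)

# Toy "continuously valued PSK": I and Q cluster near +/-1 but are not
# restricted to exactly +/-1 (hypothetical stand-in for the network output).
tx = [complex(rng.choice((-1.0, 1.0)) * rng.uniform(0.8, 1.0),
              rng.choice((-1.0, 1.0)) * rng.uniform(0.8, 1.0))
      for _ in range(20000)]
rx = awgn_channel(tx, -3.0, rng)

# Check the channel by measuring the SNR from the known transmit symbols.
noise = [r - t for r, t in zip(rx, tx)]
measured_snr_db = 10.0 * math.log10(mean_power(tx) / mean_power(noise))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

Measuring the SNR back from the simulated signals is a cheap way to catch calibration mistakes in a software noise source.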

Scatter Plot showing the signal constellation from the Radio Autoencoder, with symbols mapped to two dimensions like QPSK. Unlike conventional PSK, they are continuously valued.
A 3D scatter plot makes the picture clearer. Most symbols are at the +/- 1 points – the network has learned that in the presence of noise, these are the best points.


This month I’ve been building up the code required to test the idea over multipath (HF) channels. This meant reshaping the PSK symbols into an OFDM modem frame, and adding a multipath simulation. The initial results are encouraging, with speech quality better than any existing FreeDV mode, and competitive with SSB at low SNRs. At high SNRs the quality is also quite good, better than analog FM.
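
For readers unfamiliar with OFDM, the reshaping step can be sketched roughly as follows: a flat list of symbols is cut into rows (one row per OFDM symbol, one column per carrier), and each row is turned into time domain samples with an inverse DFT. The carrier count, DFT size and function names are illustrative assumptions, not the parameters of the actual modem, and the sketch leaves out pilots, cyclic prefix and synchronisation.

```python
import cmath


def reshape_to_frame(symbols, n_carriers):
    """Split a flat list of complex symbols into rows of n_carriers symbols."""
    if len(symbols) % n_carriers:
        raise ValueError("symbol count must be a multiple of n_carriers")
    return [symbols[i:i + n_carriers] for i in range(0, len(symbols), n_carriers)]


def ofdm_modulate(row, n_fft):
    """Inverse DFT of one row; carrier k goes on DFT bin k+1 (bin 0 is unused)."""
    if len(row) >= n_fft:
        raise ValueError("n_fft must be larger than the number of carriers")
    samples = []
    for n in range(n_fft):
        acc = 0j
        for k, sym in enumerate(row):
            acc += sym * cmath.exp(2j * cmath.pi * (k + 1) * n / n_fft)
        samples.append(acc / n_fft)
    return samples


def ofdm_demodulate(samples, n_carriers):
    """Forward DFT on the bins used by ofdm_modulate, recovering the row."""
    n_fft = len(samples)
    return [sum(s * cmath.exp(-2j * cmath.pi * (k + 1) * n / n_fft)
                for n, s in enumerate(samples))
            for k in range(n_carriers)]


# Eight continuously valued symbols -> 2 OFDM symbols of 4 carriers each.
symbols = [1 + 1j, -1 + 1j, 0.8 - 0.9j, -1 - 1j,
           0.9 + 1j, 1 - 1j, -0.7 + 0.8j, -1 - 0.95j]
frame = reshape_to_frame(symbols, n_carriers=4)
tx_samples = [ofdm_modulate(row, n_fft=16) for row in frame]
recovered = ofdm_demodulate(tx_samples[0], n_carriers=4)
print([f"{z.real:+.2f}{z.imag:+.2f}j" for z in recovered])
```

On an ideal channel the demodulator returns the original row, so any difference seen in a simulation comes from the channel model rather than the modem mapping.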

Simulated SSB with compressor at -3dB SNR on an AWGN channel.

The Radio Autoencoder at -3dB SNR on an AWGN channel.

Spectrogram of received signal at -3dB SNR, the autoencoder output has been mapped to an OFDM signal about 1000 Hz wide.

However this is all early days. To expedite answering the key questions, the current simulation ignores a lot of real world issues like acquisition, phase, frequency and timing offset correction. I reasoned that we have classical DSP solutions to these problems that work pretty well, so instead I focused on multipath performance as experience has shown that is the toughest issue with HF digital speech.

The ML code used for training includes a channel model. As an experiment, I added a saturating HF power amplifier model. The output was an OFDM modem waveform with a 1dB Peak to Average Power Ratio (PAPR), which is an excellent result. Our FreeDV waveforms run at around 4.5 dB, and SSB with a good compressor 4-6dB.
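
PAPR is simply the peak sample power divided by the mean sample power, usually quoted in dB. The sketch below measures it for two reference signals; the helper name `papr_db` is made up for this example.

```python
import cmath
import math


def papr_db(samples):
    """Peak to Average Power Ratio of real or complex samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


N = 1000

# A complex exponential has a constant envelope, so peak power = mean power.
tone = [cmath.exp(2j * cmath.pi * 5 * n / N) for n in range(N)]

# A real sine wave peaks at 1.0 but has a mean power of 0.5.
sine = [math.sin(2.0 * math.pi * n / N) for n in range(N)]

print(f"complex tone PAPR: {papr_db(tone):.2f} dB")
print(f"real sine PAPR:    {papr_db(sine):.2f} dB")
```

A lower PAPR means more average power can be sent for a given peak power limit of the transmitter.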

ML systems tend to work well until they experience conditions outside what they have been trained for. So I’m taking small steps, and planning to test a variety of channel impairments one by one, looking for that “ML gotcha”. I’m also spending way too much time checking my channel model calculations – handling the shift from digital PSK to analog has taken some careful thought and is a bit mind bending after 35 years of work in digital PSK!

The next step is to build up acquisition and synchronization code, and get to a point where we can send and receive signals over real RF channels. I’ll start with an Over The Cable (OTC) test on the bench, and work up to the point where we can play stored files over real HF channels.

[1] J.-M. Valin, J. Büthe, A. Mustafa, Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.

David’s FreeDV Update Jan 2024

This month I’ve been trying to apply Machine Learning (ML) techniques to quantise Codec 2 features.

At low bit rates, Codec 2 [1] sends a smoothed version of the speech spectrum, that is represented by 20 spectrum samples, which is updated every 10-40ms. The first sample covers 200-300 Hz, the second 300-450 Hz, up to the last sample at 3700 Hz. These 20 spectral samples are grouped together into a vector, or in ML-speak “features”, that we need to somehow send over the radio channel.

Here is a 3D plot of those 20 samples, plotted over 250 frames (2.5 seconds) of time. The Y axis is amplitude in dB. It turns out that adjacent samples are similar (or correlated) along both the time and frequency axes.

The key advantage of ML over classical DSP is it can find linear and non-linear correlations, leaving us less information to transmit over the channel. If there is less information, then we require fewer bits for a given speech quality level.

I worked in two stages. First I built a simple autoencoder that reduced the number of Codec 2 features that needed to be quantised from 20 to 10. This means the ML network has worked out how to reduce the amount of information by about half. It does this by working out that some of the adjacent samples are similar, so we only need to send the information common to both of them. I then applied Vector Quantisation (VQ) to quantise the dimension 10 vectors, and obtained reasonable speech quality at 24 bits/frame. I’ve documented the work in [2], including lessons learned.
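
For anyone new to Vector Quantisation, the core operation is a nearest neighbour search: the encoder sends the index of the codebook entry closest to the input vector, and the decoder looks that entry up. The sketch below uses a tiny hand-written 2 dimensional codebook purely for illustration; it is not the trained codebook or the search structure used in [2].

```python
import math


def vq_encode(vec, codebook):
    """Return the index of the codebook entry nearest to vec (squared error)."""
    def sq_error(i):
        return sum((a - b) ** 2 for a, b in zip(vec, codebook[i]))
    return min(range(len(codebook)), key=sq_error)


def vq_decode(index, codebook):
    """Look up the codebook entry for a received index."""
    return codebook[index]


def bits_per_index(codebook):
    """Number of bits needed to send one codebook index."""
    return math.ceil(math.log2(len(codebook)))


# Toy codebook: 4 entries, so each index costs 2 bits.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [9.0, 9.0]]

index = vq_encode([3.5, 4.2], codebook)
print(index, vq_decode(index, codebook), bits_per_index(codebook), "bits")
```

The bit cost depends only on the number of codebook entries, not on the vector dimension, which is why reducing the dimension first makes the quantiser's job easier.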

The following three samples show a single sample after passing through each processing stage. The idea is to have minimal change between each sample. In the final sample we have quantised the speech spectrum with 24 bits/frame. If we send the 24 bits/frame over the channel every 20ms this would result in 24/0.02 = 1200 bit/s.
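
The bit rate arithmetic generalises to other update rates. A small sketch, using the 10-40ms update range mentioned earlier (the function name is just for this example):

```python
def bit_rate(bits_per_frame, frame_period_ms):
    """Bit rate in bit/s for a given frame size and frame period in ms."""
    return bits_per_frame * 1000.0 / frame_period_ms


for period_ms in (10, 20, 40):
    print(f"24 bits every {period_ms} ms = {bit_rate(24, period_ms):.0f} bit/s")
```

Note this counts only the spectrum bits; pitch, voicing and any other parameters would add to the total.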

Dimension 20 Vector
Autoencoder Dimension 10
Autoencoder Dimension 10 and 24 bit/frame Vector Quantiser (VQ)

These techniques could be used to build an improved quality Codec 2 mode in the 1200-2400 bit/s range. As a next step I’d like to work out how to include the pitch and voicing information in the same vector, and take advantage of correlation across time, which might lead to techniques applicable for even lower bit rates.

Towards the end of the month, I started to investigate the latest LPCNet technology (a high quality ML vocoder), and how it can be applied to our goal of high quality speech over HF radio channels. There has been some interesting work in LPCNet quantisation [3] that may be useful for HF radio. To explore this I’ve kicked off a feasibility study, and have built up a PC with an RTX4090 GPU card, as it requires some serious ML resources for training.

[1] Codec2
[2] ML Quantisation of Codec 2 Features
[3] Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder



David’s FreeDV Update Dec 2023

Codec 2 Algorithm Description

This month I finished the Codec 2 algorithm description document. It was quite a lot of work, and ended up being 30 pages long. Thanks jimt for proof reading and Mooneer for helping with the automation (we rebuild the doc as part of our automated tests).

There was some discussion on this Codec 2 mailing list thread around the need for a formal specification versus the documentation/reference code/test approach I have taken in explaining Codec 2. I haven’t had any comments or questions on the technical content yet, I guess the audience interested in the DSP is small.

Codec 2 Machine Learning and Male Speech

For a few months I’ve been exploring the use of Machine Learning (ML) with Codec 2. This has been something of a side project, as I don’t feel competent in ML. While the field shows a lot of promise, in the past I have struggled to build any ML systems that actually do something useful. However my side project appears to work and I have some meaningful improvements in speech quality of the core Codec 2 vocoder. As I’m on a learning curve, I’ve only billed a fraction of the hours spent on this work to the ARDC grant. Thanks Jean-Marc for your tips on ML and inspiring work.

The project involved building a filter that “narrows” the bandwidth of vocal tract resonances (or formants) for low pitched male speakers. It’s these spectral peaks that convey the information in speech. It addresses the problem with energy distribution that I mentioned in the October report.

The following plot shows the ML “inference” in action. The aqua plot is the smoothed spectrum (ML input), and red is the ML output, which is pretty close to the ideal (green).

The smoothed spectrum (aqua) is an intermediate processing step that reduces the information (and hence bit rate) we need to transmit. Unfortunately it messes up low pitched, male speakers, while high pitched speakers such as females pass through OK. This is puzzling, as it seems reasonable to assume that male and female speech have the same amount of information. I have a theory that the formant bandwidth is important for male speech. So I designed an ML system to recover the narrow formant bandwidths for males from the smoothed spectrum. It’s a bit like un-blurring an image. This is tricky using traditional DSP that can only do linear transformations, but finding complex, non-linear relationships is something that ML is meant to be good at.

Here are some speech samples, with Codec 2 3200 as a reference (used for M17, and roughly the same quality as AMBE/MELP). Especially through headphones, the ML input sounds buzzy and muffled. Please note the ML output sample is not quantised (that’s the next step), but previous work suggests this could be quantised to about 1600 bits/s at the same quality using traditional DSP, and perhaps lower with ML based quantisation.

ML Input, smoothed spectrum, wide bandwidth formants
ML Output, narrow formant bandwidths restored
Codec 2 3200 bit/s mode (reference anchor)

This addresses a long standing mystery (to me at least) of why low pitched male speakers sound poorer than females when the spectrum is coarsely represented. I had previously addressed this issue with a poorly understood “post filter” that used parameters based on educated guesses.

Next step is to see if we can use ML to help quantise the Codec 2 model parameters. If I can gain skills in this area we may be able to improve speech quality at a given bit rate and perhaps robustness to channel errors.

David’s FreeDV Update Nov 2023

This month I’ve been busy documenting the Codec 2 algorithm. Codec 2 evolved from some code I developed in the 1990s when I studied speech coding. Around 2009 I pulled that code off the shelf and turned it into a practical speech codec, adding bits and pieces over the next decade. So now I’m making an effort to pull all the algorithm details together into one document. It’s a work in progress, currently located in this PR.

One mission of this project is to explain how speech codecs work, as they are often shrouded in mystery and indeed understanding is often discouraged by various closed source strategies.

I’m explaining Codec 2 at two levels, the first is aimed at the Radio Amateur, at a technical level that could be published in a Ham Radio magazine. Codec 2 was written by Hams for Hams so this is important. The second level is a deeper dive into the DSP, using math where appropriate, and assuming a familiarity with signal processing.

As I pull the various building blocks together and write about each one, I realise it is rather complicated. There are a lot of moving parts, and it’s been a while since I looked at it as a whole. So it’s important to document the algorithm to make it easier to understand for others, and as a baseline to help reach our ARDC project goals like improved speech quality and robustness over HF channels.

I feel the algorithm is best described by a combination of the source code, text description and math. Sometimes the source code does a better job of explaining than text or math, and I want to avoid describing things twice (don’t repeat yourself principle).

Automated tests can be useful in explaining the algorithm, so I have made a list of tests and source code clean ups I’d like to work on. I’m also including some GNU Octave simulations that let us peer into the algorithm and run it step by step – like a software oscilloscope.

Some of the code is 30 years old now, and still builds and runs cleanly. A testament to the longevity of C I guess. Signal processing, like math and physics, tends to remain relevant over time. Our human speech production hardware is somewhat older than 30 years and isn’t likely to change soon.

ARDC Grant Project Plan

In early 2023, we were fortunate to receive a grant from the ARDC, to support a two year program of FreeDV development. This post is an excerpt from our grant application that describes the project plan we are now busy executing.



I have a background in project planning, so together with the FreeDV Project Leadership Team (PLT) I designed the plan in work package form, much as I would plan a commercial project.

The work package breakdown is provided below, however we expect this will evolve over the course of the project. The grant funding will be directed by the FreeDV PLT to best accomplish our mission of creating great free software and hardware for digital voice communication over HF radio.

WP1000 Project Management. Budget at 10% of project. Coordinate resources, hiring a DSP engineer, and communication with stakeholders.  Evaluate progress against the plan, revise plan as project evolves.

WP2000 Codec 2 Improvement. In this WP we will attempt to improve the speech quality of two Codec 2 modes: a low bit rate mode (around 700 bit/s) and a moderate bit rate mode (1200-2400 bit/s). Establish requirements for Codec 2 (bit rate, voice quality, background noise robustness, CPU load). The most promising areas are spectral quantisation and a better excitation model. Develop algorithms for handling background noise. The moderate bit rate modes in Codec 2 (above 700 bit/s) have not been actively developed for many years, so there is likely some low hanging fruit here. Progress in this WP will be measured by conducting tests using speech samples processed with the revised modes and SSB/commercial codecs as references.

WP3000 Rag chew mode. A waveform for HF digital voice will be designed to use the new 1200-2400 bit/s mode from WP2000. Waveform design involves designing the modem, FEC code selection, and implementation. The new waveform will be supported by automated tests, and integrated into libcodec2 and freedv-gui for use over the air. An option for a high quality mode is to employ neural speech coding (as per the experimental FreeDV 2020 modes).

WP4000 Low SNR mode. Using our existing 700D/700E modes, carefully investigate modem performance over HF multipath channels (real and simulated), measure performance compared to theory, and look for potential optimisations.  Investigate the use of non-pilot symbol synchronization methods to lower waveform overheads. Investigate methods for Peak to Average Power Ratio (PAPR) reduction by Vector Quantiser (VQ) and interleaver optimisation.  With the new 700 bit/s codec from WP2000, develop a new low SNR waveform, integrate into libcodec2 and freedv-gui for use over the air.  Success will be measured by conducting controlled over the air experiments where we compare SSB to the low SNR mode using the same speech samples and Peak power.  The goal is to outperform SSB at low SNRs.

WP5000 Commercial radio integration. When WP3000 and WP4000 are mature, reach out to commercial radio companies. Promote the benefits of FreeDV: open standards, no license fees, and performance competitive with SSB. Options for integration include linking libcodec2 into their radio’s DSP/host CPU, or a plug-in OEM FreeDV module using a STM32 or ESP32 microcontroller. Success will be measured against the goal of integrating FreeDV into at least 2 COTS HF radios serving the Amateur Radio market.

WP6000 HF Data Modes: Extend the current suite of HF data waveforms by developing and testing a high bit rate/high SNR QAM mode and a low SNR mode that works below 0dB SNR. Work with FreeDATA to integrate and test Over the Air (OTA). Conduct an automated test campaign over many months that provides objective evidence of system performance over real world channels.

WP7000 ezDV Development: Extend ezDV with additional functionality and improve its ease of use. Develop a usable enclosure for the ezDV board and optimize the hardware for production, as well as manufacture a small run of devices for general ham use and testing.

WP8000 Packaging Improvements: Improve the codebase and workflow to enable easier packaging of official releases by the major Linux distributions. Code signing (for platforms that heavily encourage/require it) will be put in place.

WP9000 CPU load optimizations: These are items intended to reduce the CPU required to run FreeDV. This activity is scheduled to start after the new modes have been developed.

WP10000: freedv-gui improvements: These are items intended to improve the general usability of the FreeDV application.

WP11000 Ongoing development and maintenance: libcodec2, freedv-gui, and embedded platforms (ezDV, the SM1000 successor). This includes ongoing maintenance tasks such as documenting the Codec 2 algorithm and FreeDV waveforms to a professional level, which will encourage commercial adoption.

WP12000 Ongoing promotion: Provide a presence at major hamventions in the form of a FreeDV/Codec2 booth and/or talks given by the group. This presence would have live demos of FreeDV (both using the application and with embedded devices such as ezDV). There would be up to five hamventions targeted in this grant, with the ones selected based on historical attendance figures.

The project will also participate in various forms of other promotion (for example, on-air events, mailing lists, social media and discussion forums). Usage figures can be measured from online reporting during events (e.g. reports that are sent from the FreeDV PC application to the PSK Reporter service).

David’s FreeDV Update Oct 2023

This post is a summary of the work I have performed in October 2023 for our Enhancing HF Digital Voice With FreeDV ARDC grant.

Acquisition

My deep dive into the OFDM modem algorithms continues, this month focusing on the algorithms used by a FreeDV receiver to lock onto off air signals. The goal is fast sync with up to +/- 200 Hz frequency offset on low SNR multipath fading channels. In the past this has been quite a challenge, and was a source of problems with the earlier versions of FreeDV 700D.

This time around I’m using a little probability to describe the chances of successfully acquiring the FreeDV signal. A 90% chance means that 9 times out of 10 you will acquire the FreeDV signal in one frame (about 180ms). On poorer channels it may take 2 or 3 frames. Some other problems to watch out for are false acquisition – we don’t want to output speech when there is random noise on the channel, or acquire on a carrier when someone is tuning up. So there is a trade off in being sensitive enough to detect weak signals, but ignore those that aren’t valid FreeDV signals.
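
A quick sketch of how the per-frame probability turns into the chance of acquiring within a few frames. It assumes each frame is an independent attempt with the same success probability, which is a simplification: on a real fading channel consecutive frames are correlated. The 180ms frame length is taken from the text above.

```python
FRAME_S = 0.18  # about 180 ms per frame


def prob_acquired_within(p_frame, n_frames):
    """Chance of at least one successful acquisition in n_frames attempts."""
    return 1.0 - (1.0 - p_frame) ** n_frames


def mean_time_to_acquire_s(p_frame, frame_s=FRAME_S):
    """Average time to acquire, given independent attempts every frame."""
    return frame_s / p_frame


for p in (0.9, 0.5):
    within = [round(prob_acquired_within(p, n), 3) for n in (1, 2, 3)]
    print(f"p={p}: within 1/2/3 frames {within}, "
          f"mean {mean_time_to_acquire_s(p):.2f} s")
```

The same arithmetic applies in reverse to false acquisition: even a small per-frame false alarm probability adds up over the many frames of noise a receiver sees while idle.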

Anyway, I’ve been working through those issues one by one, doing a little math, running some simulations, and writing up the results. The acquisition work is documented in Section 6 of this report. It’s getting close to the point where we can update the OFDM modem code and try it over the air.

Codec 2

I’ve started working on Codec 2 again, revisiting some of the algorithms I prototyped in the ratek resampler study. In particular I’m looking at why low pitched speakers like males require a higher bit rate to quantise the spectrum than high pitched speakers (females or children). I have a theory that it is related to the distribution of energy over a pitch cycle.

Here’s a plot of some speech that shows the problem. At top we have the original (input) speech, at bottom after it’s passed through Codec 2. Notice how at the bottom the signal decays before the end of each cycle? The energy is confined to the start of each pitch cycle, rather than being more evenly distributed like the top plot. This appears to be related to a drop in speech quality with male speakers.


If we can understand the sources of distortion, we can improve Codec 2 speech quality at all bit rates.

Administration

On the admin side I’ve been writing up some Work Package descriptions – chunks of work we need to get done to reach the goals of our project. They contain a list of tasks to be done and a description of deliverables. The idea is to be very clear about the work we need done so it slots in neatly with the rest of the project. If you are interested in working on any of these (as a paid or volunteer team member), please contact us.

The development of various policies is continuing – we’ve put a freeze on maintenance for existing features that are likely to be superseded, in order to focus on the new and exciting work. We’re also working on a process for reviewing feature requests – we want to make sure any significant work we do is lined up with our project goals like improving Codec 2 and enhancing HF Digital voice. As our resources are finite we need to “filter” feature requests somehow.

If you want to contribute code or have a feature request for this project, it’s a very good idea to contact us before writing any code or raising a PR. We have many years of experience and a very good idea of what needs to be done. We could really use your skills and enthusiasm, but would like to make sure we work together in ways that will most benefit this project.