David’s FreeDV Update – June 2024

This month I’ve been working on the DSP detail work required for a practical HF waveform based on RADAE. Not as interesting as the Machine Learning (ML) work, but something we need to grind through for a real world HF speech system.

Acquisition

Acquisition is where we determine (a) whether a received signal is present and (b) if so, what its frequency offset is and where each frame of “data” starts (coarse timing). The general approach is to search for the pilot symbols at the start of each frame over a grid of time and frequency points. The problem is complicated by the presence of noise, multipath, and high power ML data symbols.
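To make the grid search idea concrete, here is a minimal Python sketch. It is not the RADAE acquisition code: the function name `acquire`, the random-phase pilot, the 8000 Hz sample rate, the 25 Hz frequency step, and the 0.5 detection threshold are all assumptions chosen for illustration. It correlates a known pilot against the received samples at every candidate time and frequency offset and keeps the largest normalised peak.

```python
import numpy as np

def acquire(rx, pilot, fs, freq_grid_hz, threshold=0.5):
    """Search a time/frequency grid for a known pilot sequence.

    Returns (detected, t_est, f_est, peak). t_est is a sample index,
    f_est is in Hz, peak is the normalised correlation (0..1).
    """
    n = len(pilot)
    k = np.arange(n)
    pilot_energy = np.sum(np.abs(pilot) ** 2)
    best_peak, best_t, best_f = 0.0, 0, 0.0
    for f in freq_grid_hz:
        # The pilot as it would look with a frequency offset of f Hz
        ref = pilot * np.exp(2j * np.pi * f * k / fs)
        for t in range(len(rx) - n + 1):
            seg = rx[t:t + n]
            seg_energy = np.sum(np.abs(seg) ** 2)
            if seg_energy == 0.0:
                continue
            # np.vdot conjugates its first argument, giving a correlation
            peak = np.abs(np.vdot(ref, seg)) / np.sqrt(pilot_energy * seg_energy)
            if peak > best_peak:
                best_peak, best_t, best_f = peak, t, float(f)
    return best_peak > threshold, best_t, best_f, best_peak

# Hypothetical test signal: pilot buried in noise at about 3 dB SNR
rng = np.random.default_rng(1)
fs = 8000
n_pilot = 256
pilot = np.exp(2j * np.pi * rng.random(n_pilot))
rx = 0.5 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
true_t, true_f = 300, 25.0
rx[true_t:true_t + n_pilot] += pilot * np.exp(2j * np.pi * true_f * np.arange(n_pilot) / fs)

detected, t_est, f_est, peak = acquire(rx, pilot, fs, np.arange(-100.0, 101.0, 25.0))
print(detected, t_est, f_est)
```

This sketch ignores multipath and the higher power ML data symbols, which are exactly the effects that make the real problem hard.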

In my earlier FreeDV work I built some ad-hoc acquisition algorithms but this time I took a more mathematical approach. The problem with RADAE is that it operates at very low SNRs which makes acquisition using traditional DSP difficult. Due to the PAPR optimisation the RMS power of the ML data symbols is higher than the classical DSP pilot symbols used for acquisition. While reduced PAPR is in general a good thing, it complicates detection of the pilots.

So I needed a deep dive into the math behind acquisition to get an extra boost in performance. Anyway, the sums showed me two ways I can improve acquisition performance, and it seems to be working well in simulation down to reasonably low SNRs. I’m pretty sure we can do better with ML-based acquisition, but that’s a significant side-project that I’ll put in the “further work” basket for now.

Automated Tests

There has been a lot of RADAE code developed over the course of 2024, so much that I’m starting to lose track of it myself. So I’ve added a set of automated tests to make sure everything keeps working and help trap any bugs I might introduce as the code develops. It’s also a neat framework to guide future refactoring and a real time/C port.

Chirp SNR estimator

The April Over the Air (OTA) test campaign showed the need for a way to measure the SNR of off-air samples. It needs to work on HF multipath channels which tend to notch out various frequencies. After a few false starts, I’ve built a “chirp” based SNR estimator. At the start of a transmission, I send a few seconds of chirp signal that sweeps over a range of frequencies. The receiver script knows where this signal is and using a little math can come up with a good estimate of the actual channel SNR.

Chirp at the beginning of the spectrogram, followed by SSB, then RADAE. The chirp allows us to measure signal power across a range of frequencies, averaging out the effects of frequency selective fading.
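One way a chirp-based estimate could work is sketched below in Python. This is an illustration under stated assumptions, not the estimator used in the RADAE scripts: `make_chirp` and `chirp_snr_db` are hypothetical names, the chirp parameters are made up, and the channel is modelled as a flat gain plus white noise. The known chirp is split into blocks (each covering a different part of the sweep), a complex gain is fitted to each block, and the fitted signal power is compared with the residual power.

```python
import numpy as np

def make_chirp(fs, duration_s, f_start_hz, f_end_hz):
    """Unit-amplitude complex linear chirp (0 dB PAPR)."""
    t = np.arange(int(fs * duration_s)) / fs
    rate = (f_end_hz - f_start_hz) / duration_s
    return np.exp(2j * np.pi * (f_start_hz * t + 0.5 * rate * t ** 2))

def chirp_snr_db(rx, chirp, n_blocks=20):
    """Estimate SNR by fitting a complex gain to each block of the known chirp."""
    sig_power = 0.0
    noise_power = 0.0
    for r, c in zip(np.array_split(rx, n_blocks), np.array_split(chirp, n_blocks)):
        g = np.vdot(c, r) / np.vdot(c, c)               # least-squares gain for this block
        sig_power += np.sum(np.abs(g * c) ** 2)         # power explained by the chirp
        noise_power += np.sum(np.abs(r - g * c) ** 2)   # whatever is left over
    return 10 * np.log10(sig_power / noise_power)

# Hypothetical test: chirp through a flat channel with 3 dB SNR
rng = np.random.default_rng(2)
fs = 8000
chirp = make_chirp(fs, 2.0, 400.0, 2000.0)
true_snr_db = 3.0
noise_var = 10 ** (-true_snr_db / 10)
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(len(chirp))
                                  + 1j * rng.standard_normal(len(chirp)))
rx = 0.7 * np.exp(1j) * (chirp + noise)   # arbitrary gain and phase

snr_est_db = chirp_snr_db(rx, chirp)
print(round(snr_est_db, 1))
```

Note the noise power here is measured across the whole sampled bandwidth. A practical estimator would reference a fixed noise bandwidth and would also need to locate the chirp in time and frequency first.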

As further work I’d also like to develop a way to measure delay spread, to help optimise the waveform to handle long distance paths.

Interesting Bugs

The previous round of OTA tests was in April. After thinking about the results I found some bugs in the waveform we tested.

I accidentally omitted the cyclic prefix in the waveform tested in April. The cyclic prefix protects us from intersymbol interference, so it “shouldn’t have worked” on HF channels. Exploring just why it worked (and worked rather well) is on the TODO list, and might explain the poor performance on DX channels (e.g. Japan to Australia). Sometimes accidents lead to “light bulb” moments.

Another possible bug is the use of a fixed timing estimate for the entire 10 second sample (we don’t adjust timing after the initial estimate). The ionosphere is changing all the time, and the Tx DAC and Rx ADC sample clocks are also slightly different, which means the correct timing drifts over time. So a fixed timing estimate is a bad idea, and I was kind of lucky it worked on most of the samples we collected.
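A rough feel for the sample clock part of this drift can be had with a little arithmetic, sketched in Python below. The 10 ppm clock offset and 8000 Hz sample rate are assumed values for illustration, not measurements from these tests, and the sketch ignores the ionospheric contribution entirely.

```python
def timing_drift_samples(clock_offset_ppm, duration_s, fs_hz):
    """Timing drift (in samples) accumulated from a Tx/Rx sample clock offset."""
    return clock_offset_ppm * 1e-6 * duration_s * fs_hz

fs_hz = 8000.0                                    # assumed sample rate
drift = timing_drift_samples(10.0, 10.0, fs_hz)   # assumed 10 ppm over a 10 second sample
drift_us = 1e6 * drift / fs_hz                    # same drift in microseconds
print(drift, drift_us)
```

With these assumed numbers the drift stays under one sample over 10 seconds, but it grows linearly with time, so a longer over would need timing tracking.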

Recent Progress and OTA Low PAPR Tests

So I figure the last few months of work is probably enough for this round of development:

  • Two new low PAPR waveforms (750 and 1500Hz RF bandwidth)
  • Acquisition system improvements
  • Addressing some bugs from the April 2024 test campaign
  • Chirp based SNR measurement to calibrate our OTA tests

While there are many possibilities for further development, I don’t want to go too far down any R&D rabbit holes without checking against real world performance. So I’m preparing for some more stored file OTA tests, to see how we are performing against our stated goals of low and high SNR performance that is competitive with SSB.

Here are some initial samples (using a sample of my voice) of the 1500Hz low PAPR waveform (model17) over a 2000km path at 14.250 MHz, at a few watts transmit power:

Low SNR (0.5dB peak) SSB over 2000km path at 14.250 MHz
Low SNR (0.5dB peak) RADAE over 2000km path at 14.250 MHz
Low SNR spectrogram – significant “barber pole” fading can be seen on the RADAE sample

The SNR is measured from the chirp. The chirp signal has 0dB PAPR, so this is the SNR at the peak power of the SSB and RADAE signals. The RMS power and hence average SNR of the SSB signal would be about 6dB lower (-5.5dB), and the RADAE about 0.8dB lower (-0.3 dB). So with the same power amplifier, RADAE delivers about 5dB more power to the receiver than SSB.
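The arithmetic behind those figures can be written out as a short Python sketch. The 0.5 dB peak SNR and the 6 dB and 0.8 dB PAPR figures are taken from the paragraph above; the function name is just for illustration.

```python
def average_snr_db(peak_snr_db, papr_db):
    """Average SNR is the peak SNR reduced by the signal's PAPR."""
    return peak_snr_db - papr_db

peak_snr_db = 0.5                               # measured from the chirp
ssb_avg = average_snr_db(peak_snr_db, 6.0)      # SSB, about 6 dB PAPR
radae_avg = average_snr_db(peak_snr_db, 0.8)    # RADAE, about 0.8 dB PAPR
advantage_db = radae_avg - ssb_avg              # RADAE power advantage at the receiver
print(ssb_avg, radae_avg, advantage_db)
```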

An hour or so later I turned up the power to get a high SNR sample over the same 2000km path:

High SNR (18.5dB peak) SSB sample
High SNR (18.5dB peak) RADAE sample

While much easier to understand, even at high SNR there is quite a bit of background noise with SSB (this could possibly be improved with DSP noise reduction). However there is some “vocoder” distortion on the RADAE signal as well – it’s not totally clean. You actually have to listen fairly carefully to hear differences between the low and high SNR RADAE samples. This might mean we’ve biased the training towards “low SNR”, rather than “highest quality”. These results also suggest we can run 1.5W rather than 100W, for similar speech quality, as 10log10(100/1.5) ≈ 18dB.
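The power comparison is a plain decibel conversion, sketched here in Python with hypothetical helper names. The 18 dB gap is the difference between the 18.5 dB and 0.5 dB peak SNR samples.

```python
import math

def power_ratio_db(p1_watts, p2_watts):
    """Ratio of two powers expressed in dB."""
    return 10 * math.log10(p1_watts / p2_watts)

def watts_after_drop(p_watts, drop_db):
    """Power remaining after reducing p_watts by drop_db."""
    return p_watts * 10 ** (-drop_db / 10)

ratio_db = power_ratio_db(100.0, 1.5)              # 100 W versus 1.5 W
low_power_watts = watts_after_drop(100.0, 18.0)    # 100 W reduced by 18 dB
print(round(ratio_db, 2), round(low_power_watts, 2))
```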

While performing these tests I noticed a bunch of little things to look into:

  • A pop artifact in one of my samples that goes away when the input speech level changes. Suggests the ML is entering territory it hasn’t seen in training.
  • I’m not sure if my Tx power from my SSB radio is staying constant as intended with a low PAPR waveform – need to sample the actual Tx power and plot on the spec-an. I need to confirm all three signals are at the same peak power.
  • The high SNR RADAE speech quality isn’t consistent across samples, some speakers sound a bit better. This is subjective of course so needs a further look.

Tx Spurious

At high SNRs there is some out of band spurious Tx energy (e.g. from 2000 to 3000 Hz) in the PAPR optimised RADAE signal. We should remove this if possible. A naive approach would be to filter this at the RADAE Tx output. However from experience we know that a side effect of filtering will be to increase the PAPR that we have so carefully tried to reduce, hence reducing the SNR at the receiver. So while the filtering approach would be acceptable for a high SNR link on crowded bands, it would cost you a few dB at low SNRs. A better approach would be to include spurious reduction in the ML training, for example train the network to reduce the out of band energy or insert a filter in the training loop. Another interesting topic for further work!
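The filtering side effect is easy to reproduce with a toy signal. The Python sketch below does not use a RADAE waveform: the constant envelope signal with a random phase walk is a hypothetical stand-in for a PAPR optimised signal, and the 750 to 2250 Hz brick-wall filter is an assumed passband. It simply shows that removing out of band energy from a 0 dB PAPR signal brings the PAPR back up.

```python
import numpy as np

def papr_db(x):
    """Peak to average power ratio in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(np.max(power) / np.mean(power))

def brickwall_filter(x, fs, f_low_hz, f_high_hz):
    """Zero all FFT bins outside [f_low_hz, f_high_hz]."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1 / fs)
    X[(f < f_low_hz) | (f > f_high_hz)] = 0
    return np.fft.ifft(X)

rng = np.random.default_rng(0)
fs = 8000
n = np.arange(8000)
# Stand-in signal: 1500 Hz carrier with a random phase walk, constant envelope
phase = np.cumsum(rng.uniform(-0.6, 0.6, len(n)))
tx = np.exp(1j * (2 * np.pi * 1500 * n / fs + phase))
tx_filtered = brickwall_filter(tx, fs, 750.0, 2250.0)

print(round(papr_db(tx), 2), round(papr_db(tx_filtered), 2))
```

With a fixed peak power at the amplifier, every dB of PAPR added by the filter is a dB of average power lost at the receiver.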

Next Steps

Every time I put this technology over real radio channels I learn a lot and have a bunch more questions and tasks added to my TODO list. However I do feel it’s time to focus on building a real time system that we can test with real PTT conversations. Even a rudimentary system that has some teething problems will teach us a lot. We have several ML models we can try (e.g. high and low PAPR, 750 and 1500 Hz wide waveforms), and it’s quite easy to try others as our experience improves.

So I will continue working towards a real time implementation so we can get on the air and test this technology with real time PTT conversations. Some challenges ahead are (a) a state machine sync system that can acquire and determine when an over is complete; (b) refactoring the code to run on modem frame size chunks rather than several seconds of samples; (c) some way for anyone to run RADAE in real time (either in Python or a C port) with streaming audio; (d) other chunks of DSP like tracking frequency, amplitude, and timing offsets as they evolve; and (e) a way to perform controlled tests and evaluate quality automatically, as subjective reports and ad-hoc testing are not very reliable.

2 Replies to “David’s FreeDV Update – June 2024”

  1. Great work! I think you can refer to some methods in 3GPP, such as adding “comfort noise” in the speech processing stage, and then adding clipping processing, after which the sound will be clear and improve intelligibility.
