David’s FreeDV Update – May 2024

The last few months have been focused on building up the DSP code required to try the Radio Auto-encoder (RADAE) over the air. In order to answer the big question of “does it really work” as quickly as possible, I had to skim over many intriguing topics. So now that we have a qualified “yes” to the big question – I’ve returned to some Machine Learning (ML) R&D to explore some intriguing ideas:

  • Reduction of the “latent dimension” and hence RF bandwidth of the RADAE signal.
  • Encouraging the network to train 2 dimensional constellations rather than 1D.
  • Training for low Peak to Average Power Ratio (PAPR) – a potential 6dB improvement.

To date RADAE has used a “latent dimension” of 80 symbols every 40ms, which are mapped to 20 OFDM carriers at 50 symbols/s, resulting in an RF bandwidth of 1000 Hz. I spent some time exploring how to reduce this to dimension 40, i.e. a 10 carrier, 500 Hz bandwidth signal. This would result in more efficient use of spectrum. With fewer carriers our pilot based equalization would work better, as there would be more power per pilot symbol. Fewer carriers also helps reduce PAPR. On the negative side, classical communications theory predicts a narrower bandwidth signal will perform worse on HF channels, and may be less power efficient (e.g. BER performance of 8PSK versus QPSK).
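The arithmetic linking latent dimension to RF bandwidth can be sketched as follows. This is an illustration only: it assumes each pair of real latent elements forms one complex OFDM symbol, and it ignores pilot and cyclic prefix overheads. The function name and parameters are hypothetical.

```python
def radae_bandwidth(latent_dim, frame_period_ms=40, carrier_symbol_rate=50):
    """Return (n_carriers, bandwidth_hz) for a given latent dimension.

    Assumption: two real latent elements make one complex symbol, and
    bandwidth is simply n_carriers * carrier_symbol_rate (no overheads).
    """
    complex_symbols_per_frame = latent_dim / 2
    symbols_per_second = complex_symbols_per_frame * 1000 / frame_period_ms
    n_carriers = symbols_per_second / carrier_symbol_rate
    bandwidth_hz = n_carriers * carrier_symbol_rate
    return round(n_carriers), float(bandwidth_hz)


for dim in (80, 40):
    n_carriers, bandwidth_hz = radae_bandwidth(dim)
    print(f"dim={dim}: {n_carriers} carriers, {bandwidth_hz:.0f} Hz")
```

Under these assumptions, dimension 80 gives 20 carriers and 1000 Hz, and dimension 40 gives 10 carriers and 500 Hz, matching the figures above.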

The original RADAE design has a one dimensional bottleneck that limits the amplitude of real valued symbols to +/-1. Given additive noise, the network would always place constellation points at +/-1 in order to minimize the effect of noise. As the dimension reduced, distortion increased as there was nowhere in 1D space to place additional constellation points without being unduly affected by noise. I reasoned that encouraging the network to train two dimensional constellations would help. For example, in classical digital systems we can use an 8PSK constellation, where each point is the same distance from the origin. If the SNR is high enough, this can send more information per symbol than QPSK.

So I arranged the elements of the latent vector in complex number pairs (e.g. 20 complex valued symbols for a 40 element latent vector), and set up a two dimensional bottleneck that constrained the magnitude of the complex symbols trained by the network. This worked: I can now obtain good performance from a dimension 40 system. Curiously, the resulting constellations are circles, rather than discrete points.
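One way to picture a 2D bottleneck is sketched below. The post does not say which limiting function is used, so applying `tanh` to the magnitude is an assumption, and the function names are hypothetical. The point is only that the magnitude is constrained while the phase is left free.

```python
import cmath
import math


def pair_to_complex(latent):
    """Group a real latent vector [a0, b0, a1, b1, ...] into complex symbols."""
    if len(latent) % 2 != 0:
        raise ValueError("latent vector length must be even")
    return [complex(latent[i], latent[i + 1]) for i in range(0, len(latent), 2)]


def bottleneck_2d(symbols):
    """Limit each symbol's magnitude to below 1 while preserving its phase.

    tanh() is an assumed limiter, not necessarily what RADAE uses.
    """
    return [cmath.rect(math.tanh(abs(z)), cmath.phase(z)) for z in symbols]


latent = [3.0, 4.0, -0.1, 0.2, 0.0, -5.0, 2.0, 2.0]  # toy dim=8 latent vector
tx = bottleneck_2d(pair_to_complex(latent))
for z in tx:
    print(f"|z|={abs(z):.3f}  phase={math.degrees(cmath.phase(z)):7.1f} deg")
```

With a limiter like this, large inputs saturate onto the unit circle at whatever phase the network chose, which fits the observation that the trained constellations look like circles.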

Constellation of PSK symbols when trained with a 2D bottleneck on the symbol magnitude.

Also this month I developed a method for comparing ML models objectively. The method runs the training database through a trained model at a range of SNRs, and produces curves of “loss against Eq/No” for the model (Eq is the energy of one PSK symbol). I feel there is a reasonable match between these curves and the subjective speech quality. Having an objective method of measuring a model’s performance lets me know if I’m on the right track with an ML model design without tedious listening tests.

Loss v Eq/No curves for 4 models. model05 (m5) is the control – this was used for the recent OTA test campaign, and is a dim=80 1D bottleneck. m17 looks comparable (PAPR optimised 2D bottleneck), however m14 & m18 are not so great.

As above, but loss v C/No. This normalizes for the different symbol rates. Now m18 is dim=40, so only has half as many symbols to send across the channel. Given the same Tx power, we therefore have twice the energy per symbol. It now looks competitive with m5 and m17.
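The conversion between the two plots can be sketched as below. It assumes C/No = Eq/No + 10·log10(Rq), where Rq is the PSK symbol rate, and that Rq is 1000 symbols/s for dim=80 and 500 symbols/s for dim=40 (one complex symbol per pair of latent elements, 40 ms frames). These rates are inferred from the figures quoted earlier rather than stated outright.

```python
import math

# Assumed PSK symbol rates in symbols/s, inferred from the text.
SYMBOL_RATE_HZ = {"dim80": 1000.0, "dim40": 500.0}


def eq_no_to_c_no(eq_no_db, symbol_rate_hz):
    """Convert energy-per-symbol to noise density (dB) into C/No (dB-Hz)."""
    return eq_no_db + 10.0 * math.log10(symbol_rate_hz)


for eq_no_db in (0.0, 5.0, 10.0):
    c_no_80 = eq_no_to_c_no(eq_no_db, SYMBOL_RATE_HZ["dim80"])
    c_no_40 = eq_no_to_c_no(eq_no_db, SYMBOL_RATE_HZ["dim40"])
    print(f"Eq/No={eq_no_db:4.1f} dB  C/No dim80={c_no_80:.2f}  dim40={c_no_40:.2f} dB-Hz")
```

Halving the symbol rate shifts a curve by about 3 dB on the C/No axis, which is the “twice the energy per symbol” effect described above.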

OK, so now we have an objective measure for comparing models, a way of training lower dimensional models, and some understanding of 2D constellations: i.e. how to train them, and what to expect from the 2D constellations developed by training.

Using these tools, I attempted to build a PAPR optimised ML model. I estimate a low PAPR waveform has the potential to provide a further 6dB improvement at the receiver compared to a classical DSP OFDM waveform – so this is definitely worth exploring. This requires a “time domain” 2D bottleneck that simulates the way a power amplifier saturates. Combining this with multipath training is tricky, and I have tried several different approaches. At the time of writing I believe I have a way forward with a hybrid time-frequency domain model, and am currently evaluating the results. The design uses OFDM and classical DSP for equalisation, and ML for PAPR optimisation, and achieves a PAPR of less than 1 dB.
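For readers unfamiliar with PAPR, the sketch below shows how it can be measured on a block of complex samples. The toy waveform is an assumption chosen to make the numbers easy to check: equal amplitude carriers all starting at zero phase, which is a worst case and not a RADAE or FreeDV signal.

```python
import cmath
import math


def papr_db(samples):
    """Peak to Average Power Ratio of a complex sample block, in dB."""
    powers = [abs(x) ** 2 for x in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


def multicarrier(n_carriers, n_samples=1000):
    """Toy waveform: sum of equal amplitude, zero phase complex carriers."""
    return [
        sum(cmath.exp(2j * math.pi * (k + 1) * n / n_samples) for k in range(n_carriers))
        for n in range(n_samples)
    ]


single = multicarrier(1)   # constant envelope
ten = multicarrier(10)     # carriers add coherently at n=0
print(f"1 carrier:   PAPR = {papr_db(single):.2f} dB")
print(f"10 carriers: PAPR = {papr_db(ten):.2f} dB")
```

The single carrier has a PAPR of roughly 0 dB and the ten carrier worst case roughly 10 dB. A transmitter limited by peak power has to back off its average power by about the PAPR, which is why a low PAPR waveform delivers more RMS power to the receiver.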

Here are some samples that show the PAPR optimised waveform over a simulated multipath poor (MPP) fast fading channel. They both have the same “peak power to noise” P/No ratio. Imagine them both being transmitted from the same radio with 100W peak power, over the same (really bad) HF radio channel, to the same receiver.

Peter, VK5APR, using SSB at a P/No of 39dB (Rx SNR -2.4dB)
Peter, VK5APR, using RADAE model18 also at a P/No of 39dB (Rx SNR 3.4dB)

Note the difference in the receiver SNR. The “S” in S/N is the RMS power at the receiver, which is lower for SSB as the SSB PAPR is higher (around 6dB, after compression). The goal of most radio systems is to maximise the RMS power at the receiver. So with the same transmitter, we have achieved around 6dB higher SNR at the Rx by carefully minimising the PAPR of the RADAE waveform.
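The relationship between the quoted P/No and SNR figures can be checked with a little arithmetic. The sketch assumes SNR is measured in a 3000 Hz noise bandwidth, which the post does not state, and that SNR = P/No − PAPR − 10·log10(B). Under those assumptions the PAPR implied for each signal can be backed out.

```python
import math

NOISE_BW_HZ = 3000.0  # assumed noise bandwidth for the SNR figures


def implied_papr_db(p_no_db, snr_db, noise_bw_hz=NOISE_BW_HZ):
    """PAPR implied by a peak power to noise density ratio and an RMS SNR."""
    peak_snr_db = p_no_db - 10.0 * math.log10(noise_bw_hz)
    return peak_snr_db - snr_db


ssb_papr = implied_papr_db(39.0, -2.4)
radae_papr = implied_papr_db(39.0, 3.4)
print(f"SSB   implied PAPR: {ssb_papr:.2f} dB")
print(f"RADAE implied PAPR: {radae_papr:.2f} dB")
print(f"SNR advantage:      {ssb_papr - radae_papr:.1f} dB")
```

With the 3000 Hz assumption this gives roughly 6.6 dB for SSB and 0.8 dB for RADAE, consistent with the “around 6dB” SSB figure and the “less than 1 dB” PAPR mentioned earlier.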

Here are the spectrograms. Note the model18 dim 40 RADAE signal uses only about 750 Hz of RF bandwidth (500 Hz for the ML PSK symbols plus some bandwidth for OFDM overheads). The moth-eaten effect is the multipath channel wiping out chunks of the signal.

There are many other areas we could explore (e.g. ML based equalization), but as we don’t have infinite time, I’m choosing to time box the ML R&D before we lock in a V1.0 design, and proceed to real time implementation.

Next month I will round out the ML design work, address a few other bugs, and attempt to arrive at a RADAE design suitable for our first real time implementation.

The Right to Innovate in the HF Data Space

On the HF data front, I’ve been working with Simon DJ2LS to test and merge several libcodec2 PRs to support FreeDATA. This work has improved protocol efficiency and enabled Simon to “homebrew” his own custom OFDM waveforms. His first attempt at a new waveform has roughly doubled the highest data transfer speed of FreeDATA. Simon is working on a new FreeDATA release that includes these improvements. We also have a 16QAM prototype waveform under development, which in high SNR channels, will double the speed again.

One of the PRs supports custom configuration of the OFDM modem. For example, you can plug in the number of carriers, symbol rate, and number of bits per frame at “init time” without writing any C code. Empowering Hams (and indeed anyone) to build their own HF data waveforms is important. This work “preserves the right to innovate” in the HF data space, a key value of the ARDC.

Mooneer’s FreeDV Update – April 2024

This month, freedv-gui got the following bug fixes and feature enhancements:

  • Resolved a memory leak in the FreeDV Reporter window.
  • Fixed an issue causing the GUI to prevent stopping with PTT input enabled.
  • Fixed broken links in the README file.
  • Added build logic to allow building the application without LPCNet.
  • Reverted previous fix for a delayed filtering bug and adopted an alternate solution.
  • Partially reverted audio device discovery optimization due to Windows-specific bug.
  • Released versions 1.9.9 and 1.9.9.1.
  • Fixed display bug where 800XA radio button is still enabled in RX Only mode.
  • Fixed display bug causing tooltip to block squelch and TX Attenuation values.

ezDV also got the following changes:

  • Added additional debugging options in “make menuconfig”.
  • Fixed a bug where ezDV maintains a connection to FreeDV Reporter even after clearing the callsign and grid square.
  • Added support for Wi-Fi roaming.
  • Updated previous websocket task workaround due to an update to esp_websocket_client to only kill the task when ezDV is powering down.
  • Added initial logic for IPv6 support.
  • Added various memory and performance optimizations.

More information can be found in the commit history below:

(Note that all commit logs above were generated with the following command line:)

git log --author="member@email" --after "Month 1, 2024" --before "Month 31, 2024" --all > commit.log

David’s FreeDV Update – April 2024

Breakfast on the Nullarbor Plain

This month I took a vacation, so less work than usual on FreeDV. I traveled by road from my home in Adelaide to Western Australia (WA), reaching the South-Western tip of Australia, about 3000km away. South Western WA is a lovely part of the country, and the trip included the adventure of crossing the 1200km Nullarbor Plain (translated: no trees).

But back to the business of HF digital voice. Given the encouraging results from our initial Radio Autoencoder (RADAE) over the air (OTA) tests, we have expanded our program of testing to include Hams from different countries. It takes a lot of work to develop a new speech communication system, so it’s important to validate the design as early as possible. Much smarter to do this in the current simulation form, rather than put in ten times the work on a real time implementation, release it, and find out it falls over in a common use case. This is experience talking – we’ve learned many lessons after a decade of FreeDV development.

So we are testing the prototype RADAE design using crowd sourcing. I have approached several Hams for help in testing RADAE signals over their local radio channels and in different languages. They provide me with a 10s speech sample in their language, and I send them back a file of RADAE samples that they can transmit over the air. The received RADAE signal is recorded off air by them (e.g. using a KiwiSDR), and we then decode and evaluate it. This is all done in non real time using stored files being emailed back and forth.

In particular I would like to thank Kanda JH0PCF, Yuichi JH0VEQ, and Simon DJ2LS for your help. Some takeaways:

  1. RADAE works well in Japanese and German, as well as English.
    It handles Near Vertical Incidence Skywave (NVIS) channels quite well. This is typical of local (several hundred km) HF communication in countries like Japan and the UK. The signal goes straight up and straight down.
  2. However on the samples tested so far, RADAE is falling over with long distance (e.g. Japan to Australia) communications, so some more work required there.
  3. The speech quality is competitive with SSB at high and low SNRs; however, we need a better way to measure the actual SNR of off air signals to “calibrate” the results. So I need to hit the math on that one to develop a suitable algorithm.

Some comments from the test team:

I listened to the voice that was replied. RADAE has no problem with Japanese demodulation. In fact, I feel that RADAE’s demodulated audio is easier to hear than SSB. I thought the sound quality was close to the 2020 mode implemented in the current version. It’s a strange feeling to be able to experience FM mode quality audio with SSB. (Kanda, JH0PCF)

It feels, if we can hear SSB clearly, then RADAE also works, but as soon as we are coming closer to the edge of ability of hearing SSB, also RADAE struggles. But the radae output is really nice, much better voice output compared to SSB. Using your last example, I first thought, you’ve send me the default audio example (ie source sample), so yes, its really nice. (Simon, DJ2LS)

Here is an example from Kanda at low SNR:

Kanda JH0PCF, 20W, received by Tokyo KiwiSDR (SSB)

Kanda JH0PCF, 20W, received by Tokyo KiwiSDR (RADAE)

The two signals are transmitted one after the other, so they get (more or less) the same channel conditions. The spectrogram below shows (left to right) the sine wave tone, SSB, then RADAE. Note about half the RADAE signal is wiped out at the start, but it seems to sound OK.

Here’s a medium SNR (8dB-ish) sample from Simon, DJ2LS, over a 1100km path at about 10W peak (1-2W RMS):

SSB
RADAE

Yuichi performed a couple of novel experiments that I hadn’t thought of. Here is RADAE compared to existing FreeDV modes, transmitted a few seconds apart, so over more or less the same channel at roughly the same power. SNRs are around 8dB.

SSB
RADAE
FreeDV 2020B
FreeDV 700E

Unfortunately 2020B didn’t sync on this channel/SNR. I feel the 700E sample could possibly be improved (e.g. levels, microphone, filtering), but it does illustrate the problems with current FreeDV modes – the speech quality borders on the lower level required for communication and it can be a struggle to get consistent results. This is something we have recognized and are attempting to address with our ARDC funded R&D program.

As a fairer comparison, here is the sample used for RADAE testing passed through a FreeDV 700E simulation with no channel errors:

Project Plan Pivot

Given the encouraging results with RADAE, we’ve pivoted our ARDC project plan to focus on RADAE, and have paused development of Codec 2 and FreeDV modes. RADAE appears to be our strongest candidate for satisfying the top three goals we set for ourselves when applying for the ARDC grant:

  1. Improve speech quality to a level comparable to commercial codecs.
  2. Develop a “rag chew” FreeDV mode with subjective speech quality comparable to SSB at high SNRs.
  3. Improve low SNR operation such that FreeDV is superior to SSB over poor HF channels.

We are on track to meet (and indeed exceed) the first two goals, but I think the final goal has yet to be demonstrated (e.g. SSB and the current incarnation of RADAE fall over at roughly the same SNR). There are a few bugs and many practical issues to work through before we have a real world version of RADAE that anyone can use. Plus there will be a few “gotchas” we haven’t thought of yet. Plenty for me to do in the coming months!

I’ve also been working on HF data modem software with Simon from the FreeDATA project. In next month’s report we hope to present a new FreeDATA release incorporating this work, resulting in a significant boost in FreeDATA performance.