David Dec 2024 – Testing RADE with Automatic Speech Recognition

An important goal of our project is improved speech quality over SSB at both low and high SNRs. We have anecdotal reports of good performance of RADE compared to SSB, but need an objective, controlled way of comparing performance. For speech systems this generally means subjective testing based on the ITU-T P.800 or P.808 standards. However, this is complex and requires skills, experience and resources not available to our team.

A few months ago Simon, DJ2LS, suggested the use of Automatic Speech Recognition (ASR). More recently, when discussing the issue of subjective testing, Jean-Marc Valin also suggested ASR and provided suggestions for a practical test system. So I spent much of December building up a framework for ASR tests.

The general idea is to take a dataset of speech samples, pass them through simulations of SSB and RADE over HF radio channels, then use an ASR engine to detect the words in the received speech. A post-processing system then compares the detected words to the original words and determines the Word Error Rate (WER) as a performance metric. Our work uses the Librispeech dataset and the Whisper ASR system.
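As a rough illustration of the post-processing step, WER is commonly computed as the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the ASR output, divided by the number of reference words. The sketch below is not the code used in our test framework; the function names and the simple lowercase/whitespace normalisation are assumptions made for the example, and a real pipeline would also need to strip punctuation from the ASR output.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words).

    Assumption: text is normalised by lowercasing and splitting on
    whitespace only; punctuation handling is left out of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs):
    """WER over many (reference, hypothesis) pairs, weighted by word count."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    return total_errors / total_words if total_words else 0.0
```

Summing errors and words over the whole set, rather than averaging per-sentence WERs, stops short sentences from dominating the result.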

The Librispeech samples are complex English sentences, spoken quickly with no contextual cues. I have trouble understanding many of them on the first listen. This is a much tougher test than the typical low SNR Amateur Radio contact where someone shouts their callsign 5 times then reports “5 by 9”. For example, here is one sample from the Librispeech dataset processed with SSB/RADE/original (listen to the original last); SSB and RADE were at about 6dB SNR on an MPP (fading) channel.

The plot below shows some initial results over 500 sentences. The x-axis is receiver SNR measured in a 3kHz noise bandwidth. The y-axis is the word error rate (WER). Green is RADE, and blue is SSB. The solid lines are for an AWGN channel, the dashed lines for the multi-path poor (MPP) fading channel. The dots (placed arbitrarily on the x-axis) in the lower right are controls, e.g. the FARGAN synthesizer used by RADE with no encoding, 4kHz band limited speech, and the original, clean speech.


A low word error rate (WER), say 5%, would correspond to an effortless “armchair copy”; a 30% WER could be the limits of practical voice communication (1 in 3 words can’t be understood). The distance between the RADE and SSB curves shows the benefits of RADE, at least using this test.

For example, if you draw a line across the 10% WER level, RADE achieves this (dashed MPP curves) at 3dB, SSB at 12dB. The x-axis doesn’t include the PAPR advantage of RADE, which is roughly an additional 5dB when using a transmitter with the same peak power output (depending on how hard the SSB is compressed).
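To put those numbers together: the gap between the two MPP curves at 10% WER is 12 - 3 = 9dB, and adding the roughly 5dB PAPR advantage suggests something like 14dB for transmitters with the same peak power. The short sketch below just does that arithmetic and converts dB to linear power ratios. The values are read from the text above, the 5dB figure is approximate, and the variable names are only for illustration.

```python
def db_to_power_ratio(db):
    """Convert a gain in dB to a linear power ratio."""
    return 10 ** (db / 10)


# Values taken from the text (MPP channel, 10% WER); approximate plot readings
ssb_snr_db = 12.0
rade_snr_db = 3.0
papr_advantage_db = 5.0  # rough figure, depends on SSB compression

snr_gain_db = ssb_snr_db - rade_snr_db             # gap between the curves
total_gain_db = snr_gain_db + papr_advantage_db    # same peak power transmitter

print(f"SNR gain: {snr_gain_db:.0f} dB "
      f"(about {db_to_power_ratio(snr_gain_db):.1f}x in power)")
print(f"With PAPR advantage: {total_gain_db:.0f} dB "
      f"(about {db_to_power_ratio(total_gain_db):.0f}x in power)")
```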

Also this month I have been working on SNR measurement of received RADE signals. This is quite challenging, due to the lack of structure in the ML-generated RADE constellation. At present I’m attempting to use a classical DSP approach using the pilot symbols. This will be the last feature we will add to RADE V1, as we’d like to use the lessons learned to start designing RADE V2.
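For readers unfamiliar with pilot-based estimation, here is a generic textbook sketch, not the RADE implementation. It assumes the receiver knows the transmitted pilot symbols and that the channel is a single complex gain across the pilots being averaged (which will not hold exactly on a fading channel). The gain is estimated by least squares, and whatever is left over after removing the scaled pilots is treated as noise. The result is a per-pilot-symbol SNR, not the 3kHz-bandwidth SNR used in the plot. All function names and the simulation helper are hypothetical.

```python
import cmath
import math
import random


def estimate_snr_db(received, pilots):
    """Least-squares, pilot-aided SNR estimate (per pilot symbol).

    Assumption: received[k] = g * pilots[k] + noise[k], with one unknown
    complex gain g shared by all pilots in the block.
    """
    n = len(pilots)
    pilot_energy = sum(abs(p) ** 2 for p in pilots)
    # Least-squares estimate of the complex channel gain g
    g = sum(r * p.conjugate() for r, p in zip(received, pilots)) / pilot_energy
    signal_power = abs(g) ** 2 * pilot_energy / n
    # Residual after removing the scaled pilots is attributed to noise;
    # n - 1 accounts for the one complex parameter already estimated
    residual = sum(abs(r - g * p) ** 2 for r, p in zip(received, pilots))
    noise_power = residual / (n - 1)
    return 10 * math.log10(signal_power / noise_power)


def simulate_pilots(snr_db, n, seed=0):
    """Hypothetical test helper: QPSK-like pilots, fixed gain, AWGN."""
    rng = random.Random(seed)
    pilots = [cmath.exp(1j * math.pi / 2 * rng.randrange(4)) for _ in range(n)]
    gain = 0.8 * cmath.exp(0.3j)
    noise_var = abs(gain) ** 2 / 10 ** (snr_db / 10)
    sigma = math.sqrt(noise_var / 2)  # per real/imaginary component
    received = [gain * p + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
                for p in pilots]
    return received, pilots


received, pilots = simulate_pilots(snr_db=6.0, n=2000, seed=1)
print(f"Estimated SNR: {estimate_snr_db(received, pilots):.2f} dB")
```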

Mooneer’s FreeDV Update – December 2024

This month involved more improvements to the FreeDV GUI application. One improvement involved the unit test framework; it’s now possible to capture the features decoded by RADE (prior to being fed into the FARGAN codec). This is useful for quantifying changes in the receive pipeline and ensuring that what’s encoded by RADE is also mostly returned by the decoder on a clean channel.

The biggest improvement, however, is the implementation of the same LDPC based callsign encoding and decoding system that’s used in the legacy FreeDV modes. This data is placed in what’s known as the End Of Over (EOO) block at the end of the RADE transmission and allows the application to report received callsigns to FreeDV Reporter and PSK Reporter, albeit only at the end of the transmission. FreeDV Reporter specific logic was added to mitigate this by reporting that a RADE signal is being received once a second while still in sync (just with no callsign), hopefully still allowing people to see that someone’s possibly decoding them in real time.

Since we’re touching the FreeDV Reporter logic, it was also a good opportunity to make some significant changes to the FreeDV Reporter service and website. First, the “left the chat”/“entered the chat” messages were removed by PLT request in order to make it easier to see actual chat messages. Next, the separate popup window for viewing who’s in the chat was removed in favor of an always-visible bar at the bottom of the chat tab containing the callsigns of the users that are logged into chat. The message backlog was also extended to 30 days (from 7 days) and is now saved to a database so that the chat messages aren’t lost in the event that the FreeDV Reporter server needs to be restarted.

Besides the above, there were some other minor fixes to the Windows installer/uninstaller along with logic added to detect whether microphone permissions have been granted. RADE is also now called RADEV1 in the FreeDV application to differentiate it from a future version 2 of RADE. Some infrastructure was also added to be able to sign macOS builds (required to avoid errors involving “damaged” applications in newer versions of macOS).

In any case, we’re now going to focus on additional testing prior to releasing a new preview build of FreeDV for general usage. Hopefully we’ll have additional updates on that soon.

More information can be found in the commit history below:

(Note that all commit logs above were generated with the following command line:)

git log --author="member@email" --after "Month 1, 2024" --before "Month 31, 2024" --all > commit.log