Tibor Bece and George Karan are collaborating with me on the baseband FM (BBFM) project. Tibor and George are veterans of the land mobile radio (LMR) industry, having worked together for many years and helped develop commercial VHF and UHF radio hardware with over 2 million units manufactured. They are pretty excited about the Radio Autoencoder work and what it could mean for LMR.
George has managed to build the RADE V1 stack and run the ctests on a variety of embedded platforms, including the AM625 – a high end embedded processor with enough power to run RADE (including the FARGAN stack) – and even a Librem 5 phone!
Tibor has been interfacing the BBFM ML stack to a COTS LMR radio, using a modified conventional digital voice frame structure to carry the “analog” BBFM symbols. Unlike my passband demo, this implementation has direct access to the FM modulator and discriminator so it’s a “DC coupled” arrangement – closer to what a real world, commercial implementation would look like.
Like me, Tibor was initially thinking the speech quality and low SNR performance of this technology was in the “too good to be true” category. However he has now performed controlled experiments on his (very well equipped) RF work bench, and was quite surprised to be getting high quality speech at RX signal levels down to -125dBm, several dB lower than analog FM or digital LMR systems like P25 would allow. At this low RF level the cut off is due to framing of the RADE symbols (not BBFM), as he never dreamed it would be necessary to operate at such a low SNR.
Tibor writes:
The 11dB SINAD point (around -121dBm) is where the squelch would normally fail to open, and a P25 frame would start dropping out. The RADE decoder munches through this with great ease, there is some barely perceptible degradation.
All I can say – WOW!
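For readers unfamiliar with the 11dB SINAD figure of merit, here is a minimal sketch of how it can be estimated: inject a test tone, notch the tone out of the received audio spectrum, and take the ratio of total power to what remains. The 1 kHz tone and 50 Hz notch width below are conventional choices for illustration, not details from Tibor's bench.

```python
# Sketch of a SINAD measurement: SINAD = 10*log10((S+N+D)/(N+D)),
# estimated by notching the test tone out of the spectrum.
import numpy as np

def sinad_db(x, fs, tone_hz=1000.0):
    """Estimate SINAD (dB) of audio x containing a test tone."""
    X = np.fft.rfft(x * np.hanning(len(x)))
    power = np.abs(X) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    # Notch a narrow band around the test tone (the "signal" bins)
    notch = np.abs(freqs - tone_hz) < 50.0
    total = power.sum()
    residual = power[~notch].sum()  # noise + distortion only
    return 10 * np.log10(total / residual)

# Example: a clean 1 kHz tone plus additive noise roughly 20 dB down
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 1000 * t) + 0.07 * rng.standard_normal(fs)
print(f"SINAD: {sinad_db(x, fs):.1f} dB")
```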
Here are samples (over the same radios) of analog FM and BBFM at various RF input levels from Tibor’s workbench:
This month was focused on improving the integration of RADE with the freedv-gui application, in part through the creation of an automated test framework in the latter. This framework allows audio to be injected into the receive or transmit chain and the result analyzed. Currently, we can retrieve the number of times FreeDV goes in and out of sync, as well as compare the loss of the freedv-gui decode against that of the RADE reference decoder.
Another benefit of this automated test framework is that we can now automate testing of the FreeDV receive and transmit chain as part of our Continuous Integration process (CI). CI allows FreeDV developers to get immediate feedback when a change breaks existing functionality versus waiting until a user reports breakage after a release, improving the user experience. That said, there was significant initial effort involved in getting virtual audio devices working in our CI environment (and in the case of Linux testing, getting a working virtual GUI environment running).
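As an illustration of the kind of analysis this framework enables, here is a hypothetical helper (not actual freedv-gui code) that counts sync acquisitions and losses from a per-frame sync flag:

```python
# Count sync acquisitions/losses from a per-frame sync flag, as the
# automated test framework might. count_sync_transitions is a
# hypothetical helper for illustration, not part of the freedv-gui API.

def count_sync_transitions(sync_flags):
    """Return (acquisitions, losses) given a sequence of 0/1 sync
    flags, one per received frame."""
    acquisitions = losses = 0
    prev = 0  # assume we start out of sync
    for flag in sync_flags:
        if flag and not prev:
            acquisitions += 1
        elif prev and not flag:
            losses += 1
        prev = flag
    return acquisitions, losses

# A run that syncs, drops through a fade, then recovers:
print(count_sync_transitions([0, 0, 1, 1, 0, 0, 1, 1, 1]))
```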
On the RADE side, some minor work was done as part of the C port to ensure that freedv-gui could still compile. This involved ensuring that symbols weren’t defined more than once, as well as removing the version of libopus built by FreeDV in favor of the RADE version.
Further improvements will be made in our testing process over the next few months to ensure that freedv-gui produces the best result from RADE and integrates functionality currently missing from RADE (such as reporting of received callsigns).
More information can be found in the commit history below:
This month I conducted a successful test of the Baseband FM (BBFM) waveform, over a short UHF radio link on my bench. This demonstrates high quality, 8000 Hz audio bandwidth speech being transmitted over the air (OTA) using commodity FM UHF radios and machine learning. It’s early days, but the speech quality already appears competitive with analog FM and any VHF/UHF digital voice system I am aware of.
Here is a sample of the “ideal” BBFM audio (a perfect channel), and the audio through the UHF radio link. The initial word “G” is missing due to a sync issue that will be cleaned up soon.
The experimental system was a Yaesu FT-817 with a Rigblaster USB sound interface as the transmitter into a dummy load, and a Yaesu VX3 handheld with a USB dongle sound card as the receiver. I used the Python “passband” modem developed last month so the signal could be sent over the regular 300-3000 Hz audio bandwidth that commodity FM radios provide (i.e. no DC coupling to the FM modulator or special mods).
To test the modem I can send BPSK symbols instead of ML symbols – in this case I could see a bit of distortion on the scatter diagram. However when I plug the ML symbols back in the audio sounds just fine, indicating the system is quite robust to noise as expected. It’s early days so I haven’t set the deviation carefully or fine tuned the system, but this is a fine start.
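The BPSK sanity check described above can be sketched as follows; the noise level and the error vector magnitude (EVM) metric below are my illustrative stand-ins for the real radio link and the scatter diagram inspection:

```python
# Sketch of the BPSK sanity check: replace ML symbols with known +/-1
# BPSK, pass them through a (here, simulated) noisy channel, and
# measure how far the scatter diagram spreads.
import numpy as np

rng = np.random.default_rng(42)

def evm_percent(rx, ref):
    """RMS error vector magnitude of received symbols against the
    reference constellation, as a percentage."""
    err = rx - ref
    return 100 * np.sqrt(np.mean(np.abs(err) ** 2) / np.mean(np.abs(ref) ** 2))

bits = rng.integers(0, 2, 1000)
tx = 2.0 * bits - 1.0                       # BPSK mapping: 0 -> -1, 1 -> +1
rx = tx + 0.1 * rng.standard_normal(1000)   # stand-in for the radio link
decoded = (rx > 0).astype(int)

print(f"EVM: {evm_percent(rx, tx):.1f}%, bit errors: {(decoded != bits).sum()}")
```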
C Port of Core ML
The next chunk of work from November was a C port of the Python core encoder and decoder at the heart of the RADE system. Fortunately, this is very close to RDOVAE that is now part of Opus, so much of the Opus ML code could be re-used, with the main change being a new set of weights. The C port resulted in a significant drop in CPU load, in fact it’s now hard to measure on my laptop.
Profiling suggests the remaining receiver Python DSP now dominates the CPU load. However I am reluctant to port this to C as (a) it’s complicated so this would take months and (b) I have some improvements planned for RADE V2 which, if successful, will make much of this DSP unnecessary.
End of Over Text
Unlike earlier FreeDV modes, RADE V1 does not at present have a way of sending small amounts of text over the channel (alongside the voice). This is particularly useful for “spotting” RADE signals, e.g. on FreeDV Reporter and PSK Reporter. We have plans for a text system in RADE V2, but this is several months away. As an interim solution for RADE V1, we are building up a text system that uses the currently empty “End of Over” frame to send digital data. It turns out we have room for 180 bits there. So every time an over ends, a chunk of text can be sent by the system. I have developed the modem DSP side of this, and it seems to work OK on simulated fading channels at moderate SNRs (e.g. 6dB).
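To get a feel for what 180 bits buys, here is a sketch of one possible packing. The 6-bit character set (and the callsign in the example) are purely illustrative assumptions, not the actual RADE V1 text format; with 6 bits per character, 180 bits gives room for 30 characters:

```python
# Illustrative packing of end-of-over text into a 180 bit frame,
# assuming a 6-bit character set (40 symbols used, up to 64 possible).
# This is NOT the actual RADE text format.
CHARSET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789/-."

def pack_text(text, n_bits=180):
    """Pack up to n_bits//6 characters into a list of 0/1 bits."""
    bits = []
    for ch in text.upper()[: n_bits // 6]:
        idx = CHARSET.index(ch) if ch in CHARSET else 0
        bits.extend((idx >> b) & 1 for b in range(5, -1, -1))  # MSB first
    bits.extend([0] * (n_bits - len(bits)))  # pad remainder of the frame
    return bits

def unpack_text(bits):
    """Inverse of pack_text; trailing padding decodes to spaces."""
    chars = []
    for i in range(0, len(bits), 6):
        idx = sum(b << (5 - j) for j, b in enumerate(bits[i : i + 6]))
        chars.append(CHARSET[idx] if idx < len(CHARSET) else " ")
    return "".join(chars).rstrip()

msg = "CQ CQ DE VK5XYZ PF95"  # hypothetical callsign and grid square
frame = pack_text(msg)
print(len(frame), unpack_text(frame))
```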
Conference Paper
Finally, I have been working on a conference paper on the HF RADE system. This is new technology for HF speech communications, and combines several disparate technologies, e.g. machine learning, speech coding, OFDM, and HF radio. So I am putting in some effort to document and publish the work in a paper, hopefully at a conference (TBD) in 2025.
This month, additional work was done to clean up bugs encountered in the C API for the RADE library. One bug in particular involved an interaction between the threading already present in freedv-gui and the threads Python itself creates (i.e. for PyTorch); this bug resulted in extremely slow operation and even deadlocks in some cases.
Once this work was completed, it was time to integrate everything into freedv-gui, culminating in the first preview release of FreeDV with RADE support. Initial feedback thus far has been extremely positive, indicating that we’re on the right track toward meeting the goals set out in the ARDC grant.
Further work over the next few months will involve fixing bugs discovered by users of this preview release as well as work on adding missing functionality (such as received callsign reporting) and a port of the main logic in the library to C to reduce/eliminate the need for Python.
More information can be found in the commit history below:
The working acronym for the Radio Autoencoder has been changed from RADAE to the more user-friendly RADE (pronounced “raid”).
This month I continued working on getting RADE V1 into a form where it can be used in real time. The hybrid Python/C model seems to be working out quite well, meeting our goal of enabling hams to use the waveform early, and allowing us to spot any significant bugs that we may have missed with the stored file test campaign. It also makes development really fast and efficient compared to coding in C.
To support the RADE release I wrote a RADE Introduction document, that covers the basics of the mode and a description of the waveform. Thank you to the test team for providing feedback on this document, and Yuichi, JH0VEQ, for a Japanese translation.
Initial reports of on air performance are encouraging, and match our experience from the stored file test campaign. This represents a significant step towards our goals for our ARDC funded project:
Improve speech quality to a level comparable to commercial codecs.
Develop a “rag chew” FreeDV mode with subjective speech quality comparable to SSB at high SNRs.
Improve low SNR operation such that FreeDV is superior to SSB over poor HF channels.
We are making good progress on all three goals, although it would be useful to perform some formal subjective tests to support the anecdotal reports. There is some work we could do to improve the usability of real world RADE, e.g. reducing PTT turn-around delays, improving acquisition, and integrating with SDRs.
RADE V1 is an interim step, and we need to keep moving forward. While a neat way to get the mode “on air” quickly, the hybrid Python model is not the end goal, nor is it intended for widespread packaging and distribution. The next step will be a C port of the core RADE encoder/decoder, which will significantly lower the CPU load and bring us one step closer to a more general purpose C library version of RADE, suitable for distribution and integration into SDRs.
The Baseband FM (BBFM) work (see demos from last month) is also moving along nicely. This project is aimed at high quality speech over VHF/UHF radio. This month I have been developing a single carrier PSK modem that can be used over DC coupled or bandpass filtered FM radio channels. This will support on-air experiments with high quality speech using off the shelf FM radios and handsets.
This is the first preview release of FreeDV containing the new RADE mode. For more information about RADE’s development, check out the blog posts on the FreeDV website:
As this is the first preview release, there are some limitations:
As RADE doesn’t currently return the signal’s signal to noise ratio (SNR), it’s not possible to receive it and the other FreeDV modes at the same time. That is, if you choose RADE and push Start, that’s the only mode you can work; you’ll need to stop, choose another mode, and start again to work the existing FreeDV modes.
Squelch cannot currently be disabled with RADE. It’s unknown at this time whether disabling squelch is possible.
Due to compilation problems, 2020/2020B modes are disabled.
There is currently no Windows ARM build; this will hopefully be included in a future preview build. You may be able to use the 64-bit Intel/AMD Windows build in the meantime.
Minimum hardware requirements haven’t been fully outlined, so your system currently may not be able to use RADE. Future planned optimizations may improve this.
FreeDV Reporter does not currently report receiving RADE signals, but will report that you are using it and when you’re transmitting.
Other notes:
These preview builds are significantly bigger than previous releases. This is due to needing to include Python and the modules that RADE requires. Planned porting to C/C++ will eventually negate the need for Python.
The Windows build includes Python but not the modules that RADE requires. As part of the install process, the version of Python built into FreeDV will go out to the internet to download the needed modules.
As development is expected to happen quickly, these preview builds have a six month expiry date (currently April 18, 2025).
32-bit Windows is no longer supported due to its likely inability to work with RADE.
From mid-August to mid-September, we conducted a Radio Autoencoder (RADAE) test campaign in two phases (a) stored files and (b) a prototype real time system. Ten people joined our test group, with many submitting stored file and real time test results. In particular I would like to thank Mooneer K6AQ, Walter K5WH, Rick W7YC, Yuichi JH0VEQ, Lee BX4ACP, and Simon DJ2LS for posting many useful samples, and for collecting samples of voices other than their own to test.
We are quite pleased with the results, here is a summary:
It works well with most speakers, with the exception of one voice tested. We will look into that issue over the next few months.
Some of the samples suggest acquisition issues on certain very long distance channels, but this issue seems to be an outlier, perhaps an area for further work.
RADAE works well on high and low SNR channels. In both cases the speech quality is competitive with SSB.
It works on local (groundwave), NVIS, and International DX channels. It works well for (most) males and females, across several languages.
Prototype real time/PTT tests suggest it also works well for real time QSOs, no additional problems were encountered compared to the stored files tests. Mooneer will tell you more about that in his September report!
Selected Samples
I estimate we collected around 50 samples, here are just a few that I have selected as good examples. I apologise that I don’t have room to present samples from all our testers, however your work is much appreciated and has contributed greatly to moving this project forward.
Our stored file test system sent SSB and RADAE versions immediately after each other, so the channel is approximately the same. Both SSB and RADAE have the same peak power, and the SSB is compressed to around 6dB Peak to Average Power Ratio (PAPR). In each audio sample below, SSB is presented first.
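For reference, PAPR is simply the ratio of peak to mean power of the transmitted waveform, in dB. A quick sketch, with a sine wave (the classic ~3 dB reference) as the example input:

```python
# Peak to Average Power Ratio (PAPR) in dB: compressed SSB sits around
# 6 dB; a sine wave is the classic ~3 dB reference.
import numpy as np

def papr_db(x):
    """PAPR of a real or complex waveform, in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

t = np.arange(8000) / 8000.0
sine = np.sin(2 * np.pi * 440 * t)
print(f"sine PAPR: {papr_db(sine):.2f} dB")  # ~3 dB for a sine wave
```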
Here is a sample of Joey K5CJ, provided by Rick W7YC. The path is over 13,680km, from Texas, USA to New South Wales, Australia (VK2), on just 25W. Measured SNR was 4dB. Note the fading in the spectrogram; you can hear RADAE lose sync then recover through the fade.
Using another sample of Joey, K5CJ (also at 25W), Rick has provided a novel way to compare the samples:
He writes:
RADAE is in the right (R) channel & analog SSB is in the left (L) channel. Listen using stereo speakers, and slide the balance control L-R to hear the impact. Or, listen to it on your smart phone & alternately remove the L & R earbuds – wow. It demonstrates how very well RADAE does over a 13,680 km path!
Here is Lee, BX4ACP, sending signals from Taiwan to Thailand in a mixture of English and Chinese using 100W. The measured SNR was 5dB, and frequency selective “barber pole” fading can be seen on the spectrogram.
Here is Yuriko (XYL of Yuichi JH0VEQ) using 100W over an 846 km path from Niigata Prefecture to Oita Prefecture in Japan. The reported SNR was just 2dB. From the spectrogram of the RADAE signal, the channel looks quite benign with no obvious fading. However I note the chirp at the start has a few “pieces missing”, which suggests the reported SNR was lower than the SNR experienced by the RADAE signal a few seconds later.
Next Steps for HF RADAE
Encouraged by these results, the FreeDV Project Leadership Team (PLT) has decided to press on with the real time implementation of RADAE, and integration into freedv-gui, so that any ham with a laptop and rig interface can enjoy the mode. This work will take a little time, and involves porting (or linking) some of the Python code to C. Once again, we’ll start with a small test team to get the teething problems worked out before making a general release.
ML Applied to Baseband FM
To date the Radio Autoencoder has been applied to the HF radio channel and OFDM radio architectures. We have obtained impressive results when compared to classical DSP (vocoders + FEC + OFDM modems) and analog (SSB).
A common radio architecture for Land Mobile Radio (LMR) at VHF and UHF is the baseband FM (BBFM) radio, which is used for analog FM, M17, DMR, P25, DStar, C4FM etc. For the digital modes, the bits are converted to baseband pulses (often multi-level) that are fed into an analog FM modulator, passed through the radio channel, and converted back into a sequence of pulses by an analog FM demodulator. Channel impairments include AWGN and Rayleigh fading due to vehicle movement. Unlike HF, low SNR operation is not a major requirement; instead voice quality, spectral occupancy (channel spacing), flat fading, and the use of a patent free vocoder are key concerns.
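The pulses-through-analog-FM chain described above can be sketched in a few lines; the sample rate, deviation, and symbol rate below are illustrative choices, not the actual BBFM parameters:

```python
# Minimal simulation of the baseband-pulses-through-analog-FM chain:
# pulses frequency-modulate a carrier, pass through an AWGN channel,
# and an FM discriminator recovers them. Parameters are illustrative.
import numpy as np

fs = 48000      # sample rate, Hz
rs = 2000       # symbol rate, symbols/s
dev = 3000.0    # peak deviation, Hz (illustrative)
rng = np.random.default_rng(0)

symbols = rng.choice([-1.0, 1.0], 200)
pulses = np.repeat(symbols, fs // rs)  # rectangular pulse shaping

# FM modulator: integrate instantaneous frequency to get phase
phase = 2 * np.pi * np.cumsum(dev * pulses / fs)
tx = np.exp(1j * phase)

# AWGN channel
rx = tx + 0.05 * (rng.standard_normal(len(tx)) + 1j * rng.standard_normal(len(tx)))

# FM discriminator: differentiate the phase, normalize by deviation
demod = np.angle(rx[1:] * np.conj(rx[:-1])) * fs / (2 * np.pi * dev)

# Crude symbol decisions: average over each symbol period
sps = fs // rs
decisions = np.sign([demod[i * sps:(i + 1) * sps].mean() for i in range(len(symbols) - 1)])
errors = np.sum(decisions != symbols[:len(decisions)])
print(f"symbol errors: {errors} / {len(decisions)}")
```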
We have been designing a hybrid machine learning (ML) and DSP system to send high quality voice over the BBFM channel. This is not intended to be a new protocol like those listed above, rather a set of open source building blocks (equivalent to vocoder, modulation and channel coding) that could be employed in a next generation LMR protocol.
It’s early days, but here are some samples from our simulated BBFM system, with an analog FM simulation for comparison.
Original
BBFM, CNR=20dB
BBFM, CNR=20dB, Rayleigh Fading at 60 km/hr
Analog FM, CNR=20dB
CNR=20dB is equivalent to a Rx level of -107dBm (many LMR contacts operate somewhat above that). The analog FM sample has a 300-3100Hz audio bandwidth, 5kHz deviation, and some Hilbert compression. For the BBFM system we use a pulse train at 2000 symbols/s, trained using a simulation of the BBFM channel. As with HF RADAE, the symbols tend to cluster at +/-1, but are continuously valued. Compared to the HF work, we have ample link margin, which can be traded off for spectral occupancy (channel spacing and adjacent channel interference).
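The CNR-to-dBm conversion is straightforward link budget arithmetic. In the sketch below, the 12.5 kHz channel bandwidth and 6 dB receiver noise figure are assumptions I have chosen to make the numbers line up, not measured values:

```python
# Back-of-envelope CNR to Rx level conversion. The channel bandwidth
# and receiver noise figure below are illustrative assumptions.
import math

def rx_level_dbm(cnr_db, bandwidth_hz=12500, noise_figure_db=6.0):
    """Rx level needed for a given CNR: thermal noise floor (kTB,
    -174 dBm/Hz at room temperature) plus receiver noise figure
    plus the required CNR."""
    noise_floor_dbm = -174 + 10 * math.log10(bandwidth_hz) + noise_figure_db
    return noise_floor_dbm + cnr_db

print(f"Rx level for CNR=20 dB: {rx_level_dbm(20):.0f} dBm")
```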
This work is moving quite quickly, so more next month!
This month was spent continuing the RADAE prototyping and testing efforts started last month. I focused primarily on creating a prototype of the FreeDV application that is able to route audio to/from separate processes that can actually handle the RADAE modulation and demodulation. By doing so, technically-inclined users can get an idea as to how RADAE would work with actual two way QSOs on the air.
One major challenge was getting transmit working reliably. With the prototype scripts, there was a delay of a few seconds on startup before a modulated signal could actually go out over the air. This was fixed by simply never stopping the TX or RX scripts. There’s still a delay at the beginning, but for the current test effort it’s tolerable.
Another challenge is that forking processes on Windows works significantly differently than on Unix/Linux (especially if you want to route stdin/stdout through your application). RADAE has a feature where on the end of transmission, a special signal is sent out that immediately causes squelch to close. This prevents the R2D2 sounding audio at the end of transmissions that’s common with the existing FreeDV software. Unfortunately, I wasn’t able to get this working on Linux during my testing as something was still keeping file handles open and not in an EOF state (despite my efforts to force the latter).
Regardless, I was able to get enough of a prototype working (along with instructions) to have a two way communication with Walter K5WH, audio of which is below:
Several other users in the test campaign were also able to successfully set up this prototype and have two way contacts as well, helping to prove out RADAE in real-world conditions. We did discover that some of us were accidentally using voice keyer files encoded at 8 kHz sample rate, which impacted audio quality on RADAE (since it was trained on 16 kHz samples).
Additionally, for the above contact (and other testing done during the campaign), Speex noise suppression was disabled. This caused some background noise to enter my side of the contact. One possible future avenue of investigation is the use of RNNoise for noise suppression instead of Speex, as it promises better performance than the latter.
Next up on the list is to actually integrate RADAE in a manner that doesn’t require significant setup (i.e. by just installing FreeDV as is the case today). This will require at least an API wrapper written in C/C++ to accomplish, with possibly some additional C code as required to maintain reasonable performance.
More information can be found in the commit history below:
This month was spent primarily assisting David with RADAE testing. Most of this involved using a web-based interface to generate a file to send out over the air (and subsequently record from a remote SDR and decode).
I also did some work on a version of freedv-gui that is able to use the existing RADAE scripts to have a two-way QSO with someone else also running the same software. So far this appears to work fine on Linux and macOS, but I am running into challenges on Windows. The main challenge is that PyTorch and/or Python seem to run significantly slower in the Windows VM that I’m using than on other platforms, which means that decoding unfortunately can’t happen in real time using this setup. I’ll investigate this further, time permitting, but it’s possible that Windows users will need to use a PC with an NVIDIA GPU to use the modified version of freedv-gui.
Other than that, some minor bugs and GUI tweaks were done for ezDV and freedv-gui, namely adding the configuration filename to the titlebar (for the latter) and increasing maximum HTTP header length (for the former).
More information can be found in the commit history below:
Many digital voice systems have the ability to send small amounts of digital data in parallel with the compressed voice. For example in FreeDV we allocate a few bits/frame for call sign and grid square (location) information. This is a bit complex with RADAE, as we don’t actually send any “bits” over the system – it’s all analog PSK symbols.
So I’ve worked out a way to inject 25 bits/s of data into the ML network alongside the vocoder features. The ML magic spreads these bits across the OFDM carriers and appears to apply some sort of error protection, as I note the BER is quite low and shows some robustness to multipath. I can tune the bit error rate (BER) by adjusting the loss function and bit rate; a few percent BER at low SNRs (where the voice link falls over) is typical.
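One way to tune BER against voice quality is a weighted sum in the training loss. The sketch below is my guess at the general shape (feature mean square error plus a weighted binary cross entropy on the data bits); it is illustrative only, not the actual RADAE training code:

```python
# Illustrative combined training loss: feature reconstruction error
# plus a weighted penalty on the auxiliary data bits. Raising
# data_weight lowers BER at some cost in voice quality. This is a
# guess at the general shape, not the actual RADAE loss function.
import numpy as np

def combined_loss(feat_hat, feat, bit_logits, bits, data_weight=0.1):
    """Feature MSE plus weighted binary cross entropy on data bits."""
    feature_loss = np.mean((feat_hat - feat) ** 2)
    p = 1.0 / (1.0 + np.exp(-bit_logits))  # sigmoid of bit logits
    bce = -np.mean(bits * np.log(p + 1e-9) + (1 - bits) * np.log(1 - p + 1e-9))
    return feature_loss + data_weight * bce

rng = np.random.default_rng(3)
feat = rng.standard_normal((10, 20))          # stand-in vocoder features
bits = rng.integers(0, 2, 8).astype(float)    # stand-in auxiliary data
loss = combined_loss(feat + 0.1 * rng.standard_normal(feat.shape), feat,
                     rng.standard_normal(8), bits)
print(f"loss: {loss:.3f}")
```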
The plot below shows the “loss” (mean square error) of the vocoder features as a function of SNR (energy per symbol/noise density). The vertical axis is the mean square error of the vocoder features through the system – lower is better. It’s useful for comparing networks.
So “red” is model17, which is our control with no auxiliary data. Yellow was my first attempt at injecting data, and purple the final version. You can see purple and red are almost on top of each other, which suggests the vocoder speech quality has barely changed, despite the injection of the data. Something for nothing? Or perhaps this suggests the data bits consume a small amount of power compared to the vocoder features.
Much of this month was spent preparing for the August test campaign. I performed a dry run of some over the air (OTA) tests, leading to many tweaks and bug fixes. As usual, I spent a lot of time on making acquisition reliable. Sigh.
The automated tests (ctests) were invaluable, as they show up any effects of tuning one parameter on other system functions. They also let me test in simulation, rather than finding obscure problems through unrepeatable OTA tests. The loss function is a very useful measure for trapping subtle issues. A useful objective measure of speech quality is something I have been missing in many years of speech coding development. It’s sensitive to small errors, and saves a lot of time with listening tests.
I have developed a test procedure for the stored file phase of the August 2024 test campaign. The first phase of testing uses stored files (just like the April test campaign) but this time using the new PAPR optimised waveform and with a chirp header that lets us measure SNR. To make preparation and processing easier, I have developed a web based system for processing the Tx and Rx samples. This means the test team can decode RADAE samples by themselves, without using the command line Linux tools. A test team of about 10 people has been assembled and a few of them have already posted some interesting samples (thanks Yuichi, Simon, and Mooneer).
If you would like to actively participate in RADAE testing, please see this post.
The next phase of testing is real time PTT. The Python code runs in real time, so I have cobbled together a bash script based system (ptt_test.sh) – think of it as a crude command-line version of freedv-gui. It works OK for me – I can transmit in real time using my IC-7200 to KiwiSDRs, and receive off air from the IC-7200. By using loopback sound devices I can also receive from a KiwiSDR. The script only runs on Linux and requires some knowledge of sound cards, but if I can find a few Linux-savvy testers we can use ptt_test.sh to obtain valuable early on-air experience with RADAE. This is an opportunity for someone to make the first live RADAE QSO.
An interesting side project was working with Mooneer to establish the feasibility of running RADAE on ezDV. Unfortunately, this looks unlikely. Modern machine learning systems really require a bit more CPU (like a 1GHz multi-core machine). Fortunately, this sort of CPU is pretty common now (e.g. a Raspberry Pi or cell phone). Once RADAE matures, we will need to reconsider our options for a “headless” adapter type platform.