David’s FreeDV Update – August 2024

Many digital voice systems have the ability to send small amounts of digital data in parallel with the compressed voice. For example in FreeDV we allocate a few bits/frame for call sign and grid square (location) information. This is a bit complex with RADAE, as we don’t actually send any “bits” over the system – it’s all analog PSK symbols.

So I’ve worked out a way to inject 25 bits/s of data into the ML network alongside the vocoder features. The ML magic spreads these bits across OFDM carriers and appears to do some sort of error protection, as I note the bit error rate (BER) is quite low and it shows some robustness to multipath. I can tune the BER by adjusting the loss function and bit rate; a few percent BER at low SNRs (where the voice link falls over) is typical.
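As a rough illustration of how a figure like “a few percent BER” can be measured, the sketch below maps bits to ±1 values, passes them through a stand-in noisy channel, and counts the errors. The ±1 mapping, the Gaussian noise model and all function names are assumptions made for illustration; they are not taken from the RADAE code, where the ML network decides how the bits are carried.

```python
import random

def bits_to_symbols(bits):
    """Map 0/1 bits to -1.0/+1.0 values (assumed representation)."""
    return [1.0 if b else -1.0 for b in bits]

def symbols_to_bits(symbols):
    """Hard decision: positive values decode to 1, otherwise 0."""
    return [1 if s > 0 else 0 for s in symbols]

def bit_error_rate(tx_bits, rx_bits):
    """Fraction of positions where the received bit differs from the sent bit."""
    if len(tx_bits) != len(rx_bits):
        raise ValueError("bit sequences must be the same length")
    errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    return errors / len(tx_bits)

def noisy_channel(symbols, sigma, rng):
    """Hypothetical stand-in for the real link: add Gaussian noise to each value."""
    return [s + rng.gauss(0.0, sigma) for s in symbols]

if __name__ == "__main__":
    rng = random.Random(1)
    tx = [rng.randint(0, 1) for _ in range(25 * 60)]  # one minute at 25 bits/s
    rx = symbols_to_bits(noisy_channel(bits_to_symbols(tx), 0.5, rng))
    print(f"BER: {bit_error_rate(tx, rx):.3f}")
```

In a real test the received bits would come from the decoder output rather than from this toy channel; only the counting step carries over.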

The plot below shows the “loss” (RMS error) of the vocoder features as a function of SNR (Energy per symbol/noise density). The vertical axis is the mean square error of the vocoder features through the system – lower is better. It’s useful for comparing networks.

So “red” is model17, which is our control with no auxiliary data. Yellow was my first attempt at injecting data, and purple the final version. You can see purple and red are almost on top of each other, which suggests the vocoder speech quality has barely changed, despite the injection of the data. Something for nothing? Or perhaps this suggests the data bits consume a small amount of power compared to the vocoder features.

Much of this month was spent preparing for the August test campaign. I performed a dry run of some over the air (OTA) tests, leading to many tweaks and bug fixes. As usual, I spent a lot of time on making acquisition reliable. Sigh.

The automated tests (ctests) were invaluable, as they show up any effects of tuning one parameter on other system functions. They also let me test in simulation, rather than finding obscure problems through unrepeatable OTA tests. The loss function is a very useful measure for trapping subtle issues. A useful objective measure of speech quality is something I have been missing in many years of speech coding development. It’s sensitive to small errors, and saves a lot of time with listening tests.

I have developed a test procedure for the stored file phase of the August 2024 test campaign. The first phase of testing uses stored files (just like the April test campaign) but this time using the new PAPR optimised waveform and with a chirp header that lets us measure SNR. To make preparation and processing easier, I have developed a web based system for processing the Tx and Rx samples. This means the test team can decode RADAE samples by themselves, without using the command line Linux tools. A test team of about 10 people has been assembled and a few of them have already posted some interesting samples (thanks Yuichi, Simon, and Mooneer).

If you would like to actively participate in RADAE testing, please see this post.

The next phase of testing is real time PTT. The Python code runs in real time, so I have cobbled together a bash script based system (ptt_test.sh) – think of it as a crude command line version of freedv-gui. It works OK for me – I can transmit in real time using my IC-7200 to KiwiSDRs, and receive off air from the IC-7200. By using loopback sound devices I can also receive from a KiwiSDR. The script only runs on Linux and requires some knowledge of sound cards, but if I can find a few Linux-savvy testers we can use ptt_test.sh to obtain valuable early on-air experience with RADAE. This is an opportunity for someone to make the first live RADAE QSO.

An interesting side project was working with Mooneer to establish the feasibility of running RADAE on ezDV. Unfortunately, this looks unlikely. Modern machine learning systems really require a bit more CPU (like a 1GHz multi-core machine). Fortunately, this sort of CPU is pretty common now (e.g. a Raspberry Pi or cell phone). Once RADAE matures, we will need to reconsider our options for a “headless” adapter type platform.

Radio Auto Encoder Test Team

We are ready to start another test campaign for the radio autoencoder (RADAE). This will consist of stored file tests (like the April campaign), and some real time PTT testing. The draft test procedure is here.

If you would like to join the team testing RADAE, please reach out to us directly or via the comments below.

Measuring ESP32-S3 Performance for RADAE

To use FreeDV with commercial radios we have developed a series of “rig adapters” such as the SM1000 and now ezDV. These are embedded devices that run “headless” (no GUI) and connect between your SSB radio and a microphone/headset to allow it to run FreeDV.

Our latest prototype speech waveform is RADAE, which is showing promise of improved voice quality and robustness over our existing FreeDV modes and indeed SSB. RADAE uses machine learning (ML) and requires significantly more CPU and memory than existing FreeDV modes.

We would like to know if we can run RADAE on the ezDV, which is based around an ESP32-S3.

The RADAE “stack” consists of the RADAE encoder and decoder, and the FARGAN vocoder. The RADAE encoder and decoder each require around 80 MMAC/s (million multiply-accumulates per second) and 1 Mbyte of RAM for the weights. The FARGAN vocoder (used only on receive) requires 1 Mbyte of weights and around 300 MMAC/s of CPU, so the CPU load is dominated by the FARGAN vocoder. As the weights are quantised to 8 bits, the MACs can use 8 bit multiply-accumulates, which suits the many machines with 8 bit SIMD support.

In practice, you want plenty of overhead, so for a 300 MMAC/s algorithm a machine with more than 3x this capability will make the port “easy” (e.g. a recompile with a little SIMD assembly language for the heavy lifting). It also allows you to tweak the algorithm, and run other code on the same machine without worrying about real time issues. If the CPU is struggling you will spend a great deal of time optimizing the code and the algorithm – time that could be better spent elsewhere.
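The budget described above can be written out as a few lines of arithmetic. This is only a sketch using the figures quoted in this post. It assumes half-duplex operation (so the transmit and receive loads are never added together) and that the 3x headroom rule of thumb can be applied to the whole receive path as well as to FARGAN alone; the function name is made up for illustration.

```python
# Figures quoted in the post (MMAC/s = million multiply-accumulates per second).
RADAE_ENCODER_MMACS = 80
RADAE_DECODER_MMACS = 80
FARGAN_MMACS = 300  # vocoder, receive only

def required_capability(load_mmacs, headroom=3.0):
    """Machine capability needed for a comfortable port, given a headroom factor."""
    return load_mmacs * headroom

# Assumption: half duplex, so transmit and receive never run at the same time.
tx_load = RADAE_ENCODER_MMACS
rx_load = RADAE_DECODER_MMACS + FARGAN_MMACS

print(f"transmit load: {tx_load} MMAC/s")
print(f"receive load:  {rx_load} MMAC/s")
print(f"FARGAN alone with 3x headroom: {required_capability(FARGAN_MMACS):.0f} MMAC/s")
print(f"receive path with 3x headroom: {required_capability(rx_load):.0f} MMAC/s")
```

On these assumptions the receive side is the one that sets the target, at somewhere around 1000 MMAC/s for a comfortable port.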

ezDV is based on an ESP32-S3 CPU which has two cores that run at about 240 MHz, has 512 kbytes of local (fast) memory, and 8 Mbytes of slower PSRAM that is accessed over an SPI bus. It does have hardware acceleration for integer multiply-accumulates.

To answer our question, we developed a simple test program to characterize the ESP32. Many ML operations are “dot products”, or multiplying two vectors together. So we generated a 1Mbyte matrix in PSRAM, and performed a dot product with it one “row” at a time. The other input vector in the dot product was in fast internal memory. The idea was to exercise both the CPU and memory access performance in a way similar to RADAE, but without the hassle of porting the entire algorithm across.
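The shape of such a test can be sketched in a few lines. The version below is a host-side Python illustration only, not the program that ran on the ESP32: the function names, the test data and the use of time.perf_counter are all assumptions. What it does share with the description above is the row-at-a-time dot product and the way a raw time in microseconds is turned into an MMACS figure.

```python
import time

def dot(row, vec):
    """Multiply-accumulate two equal-length integer vectors."""
    acc = 0
    for a, b in zip(row, vec):
        acc += a * b
    return acc

def mmacs(num_macs, elapsed_us):
    """MACs per microsecond, which equals million MACs per second."""
    return num_macs / elapsed_us

def benchmark(rows, cols):
    """Time a matrix-vector product performed one row at a time."""
    matrix = [[(r + c) % 127 for c in range(cols)] for r in range(rows)]
    vec = [c % 127 for c in range(cols)]
    start = time.perf_counter()
    out = [dot(row, vec) for row in matrix]
    elapsed_us = (time.perf_counter() - start) * 1e6
    return out, mmacs(rows * cols, elapsed_us)

if __name__ == "__main__":
    _, rate = benchmark(128, 128)
    print(f"{rate:.1f} MMACS (host Python, not an ESP32 figure)")
```

The same conversion applies to the measured results: 1024 x 1024 multiply-accumulates in 34887 us works out to about 30 MMACS.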

Data type   SIMD?                                     Raw time (us)   MMACS
int8        No                                        34887           30
int8        Yes                                       8490            123
int16       No                                        47608           22
int16       Yes                                       16563           63
int16       Yes (using ESP-DSP matrix multiply)       16479           63
int32       No                                        42689           24

Results using matrix containing 1M elements (1024 x 1024) for various datatypes. This does not fit entirely within the 32-64 KB of on-chip cache, so the ESP32-S3 needs to repeatedly access PSRAM to complete the operation. PSRAM was configured to execute at 120 MHz (currently experimental per Espressif).

Data type   SIMD?                                     Raw time (us)   MMACS
int8        No                                        486             33
int8        Yes                                       84              195
int16       No                                        631             25
int16       Yes                                       98              167
int16       Yes (using ESP-DSP matrix multiply)       88              186
int32       No                                        419             39

Results using matrix containing 16384 (128 x 128) elements for various datatypes. This smaller matrix fits entirely within the ESP32-S3’s cache, reducing the number of times that it has to go out to PSRAM.

Here is the source code for the program used to measure the ezDV performance.

As shown above, the performance of the matrix multiplication operation on the ESP32-S3 is highly dependent on the size of the matrices involved. For matrices that fit entirely within its internal RAM (either because it can fit within the internal RAM-backed PSRAM cache without many cache misses or because it was originally allocated entirely within internal RAM), performance is fairly reasonable for a micro-controller. In other applications, the ESP32-S3 is able to perform inference on smaller ML models with good performance.

Unfortunately, with larger matrices, the system becomes memory bandwidth limited extremely quickly. For instance, with int16, ESP-DSP’s matrix multiplication function is slightly faster than handwritten SIMD assembly when the dataset fits entirely in internal RAM, but both are limited to approximately the same MMACS when the system repeatedly has to go out to PSRAM. int8 using SIMD performs about 2x better than int16 in that case, because it has to move only half as many bytes from PSRAM.
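One way to check the bandwidth-limited reading is to convert the large-matrix timings into an effective PSRAM read rate. The sketch below does this for the SIMD cases, using the raw times from the results. It assumes each matrix element is fetched from PSRAM exactly once and ignores the second input vector and any cache effects; the function name is made up for illustration.

```python
def psram_read_rate(elements, bytes_per_element, elapsed_us):
    """Effective matrix read rate in bytes per microsecond (millions of bytes/s)."""
    return elements * bytes_per_element / elapsed_us

ELEMENTS = 1024 * 1024  # large-matrix case from the results

# (bytes per element, raw time in us) for the SIMD rows of the large-matrix results
cases = {
    "int8 SIMD": (1, 8490),
    "int16 SIMD": (2, 16563),
    "int16 ESP-DSP": (2, 16479),
}

for name, (size, t_us) in cases.items():
    rate = psram_read_rate(ELEMENTS, size, t_us)
    print(f"{name}: {rate:.0f} million bytes/s")
```

All three cases come out in the region of 125 million bytes per second, which is what you would expect if PSRAM throughput, rather than the multiply-accumulate hardware, sets the limit.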

These results suggest we will not be able to run the RADAE stack on ezDV. While unfortunate, it is useful to reach this conclusion early so we can consider alternatives for an adapter style implementation of RADAE.

We thought this characterization testing might be useful for others using the ESP32 for ML and other CPU-intensive applications, so as part of our open source project philosophy, we have written it up here to share.

This post was jointly written by Mooneer and David.

Mooneer’s FreeDV Update – July 2024

This month, the FreeDV application got a few updates:

  • The previous work on updating the Voice Keyer feature was finally completed and merged into the repository. This mainly consisted of updating the appearance of the voice keyer file’s name in the Voice Keyer button based on user feedback.
  • wxWidgets inside the Windows and macOS binary builds was updated to version 3.2.5.
  • Adjustment dials for the monitor volume (for both Voice Keyer and standard PTT) were added to their respective right-click menus.
  • Logic to automatically adjust the audio configuration upon detection of missing devices was removed by user request (mainly due to the feature never working properly).

ezDV also got the following updates:

  • The in-progress work on Ethernet support for ezDV was finally merged. This resulted in version 1.1.0 of the firmware being released as well as additional content added to the User’s Guide to document the required hardware modifications.
  • Minor code cleanup of the I2C bus handling due to deprecation of the “legacy” I2C driver by Espressif.
  • Updated the minimum ESP-IDF version to 5.3.
  • Reenabled asynchronous HTTP request handling (previously disabled due to an ESP-IDF bug that is now fixed).

More information can be found in the commit history below:

(Note that all commit logs above were generated with the following command line:)

git log --author="member@email" --after "Month 1, 2024" --before "Month 31, 2024" --all > commit.log