An important goal of our project is improved speech quality over SSB at both low and high SNRs. We have anecdotal reports of good RADE performance compared to SSB, but need an objective, controlled way of comparing the two. For speech systems this generally means subjective testing based on the ITU-T P.800 or P.808 standards. However, this is complex and requires skills, experience, and resources not available to our team.
A few months ago Simon, DJ2LS, suggested using Automatic Speech Recognition (ASR). More recently, when we were discussing subjective testing, Jean-Marc Valin also suggested ASR and offered ideas for a practical test system. So I spent much of December building a framework for ASR tests.
The general idea is to take a dataset of speech samples, pass them through simulations of SSB and RADE over HF radio channels, then use an ASR engine to detect the words in the received speech. A post-processing step then compares the detected words to the original words and computes the Word Error Rate (WER) as a performance metric. Our work uses the LibriSpeech dataset and the Whisper ASR system.
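As a concrete illustration of the scoring step, here is a minimal sketch assuming the openai-whisper and jiwer Python packages; the filename and reference transcript are placeholders rather than our actual test harness:

```python
# Sketch of the ASR scoring step: transcribe received audio with Whisper,
# then score it against the reference transcript with the word error rate.
# Assumes the openai-whisper and jiwer packages; filenames are placeholders.
import whisper
from jiwer import wer

model = whisper.load_model("base")  # larger models are slower but more accurate

reference = "the quick brown fox jumps over the lazy dog"  # placeholder transcript
result = model.transcribe("rade_mpp_6dB.wav")  # audio after the channel simulation
hypothesis = result["text"].strip().lower()  # a real harness would also strip punctuation

print(f"WER: {100 * wer(reference, hypothesis):.1f}%")
```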
The LibriSpeech samples are complex English sentences, spoken quickly with no contextual cues; I have trouble understanding many of them on a first listen. This is a much tougher test than the typical low-SNR Amateur Radio contact, where someone shouts their callsign 5 times then reports “5 by 9”. For example, here is one sample from the LibriSpeech dataset processed with SSB, RADE, and the original (listen to the original last); SSB and RADE were at about 6 dB SNR on an MPP (fading) channel.
The plot below shows some initial results over 500 sentences. The x-axis is receiver SNR measured in a 3 kHz noise bandwidth, and the y-axis is the word error rate (WER). Green is RADE, blue is SSB. The solid lines are for an AWGN channel, the dashed lines for the multipath poor (MPP) fading channel. The dots in the lower right (placed arbitrarily on the x-axis) are controls, e.g. the FARGAN synthesizer used by RADE with no encoding, 4 kHz band-limited speech, and the original clean speech.
A low WER, say 5%, would correspond to an effortless “armchair copy”; a 30% WER could be the limit of practical voice communication (1 in 3 words can’t be understood). The distance between the RADE and SSB curves shows the benefit of RADE, at least on this test.
For example, if you draw a horizontal line at the 10% WER level, RADE achieves it (dashed MPP curves) at 3 dB while SSB needs 12 dB. The x-axis doesn’t include the PAPR advantage of RADE, which is worth roughly an additional 5 dB when using a transmitter with the same peak power output (depending on how hard the SSB is compressed).
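In round numbers, and with the usual caveats about combining measurements, that suggests a total advantage on this channel of roughly (12 − 3) + 5 ≈ 14 dB for RADE over compressed SSB at the same peak power.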
Also this month I have been working on SNR measurement of received RADE signals. This is quite challenging due to the lack of structure in the ML-generated RADE constellation. At present I’m attempting a classical DSP approach using the pilot symbols. This will be the last feature added to RADE V1, as we’d like to apply the lessons learned to the design of RADE V2.
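To illustrate the classical approach (a sketch only, not the actual RADE estimator): with known pilot symbols we can fit a complex channel gain by least squares, treat the fitted pilots as signal and the residual as noise, and form an SNR estimate from the two power measurements:

```python
# Sketch of classical pilot-based SNR estimation (illustrative only, not the
# RADE implementation). Given received pilot symbols r and the known
# transmitted pilots p, fit a single complex gain g by least squares,
# then treat g*p as signal and the residual as noise.
import numpy as np

def pilot_snr_db(r: np.ndarray, p: np.ndarray) -> float:
    """Estimate SNR (dB) from received pilots r and known pilots p."""
    g = np.vdot(p, r) / np.vdot(p, p)   # least-squares channel gain estimate
    signal = g * p
    noise = r - signal
    snr = np.mean(np.abs(signal) ** 2) / np.mean(np.abs(noise) ** 2)
    return 10 * np.log10(snr)

# Example: QPSK pilots through a flat channel with additive noise
rng = np.random.default_rng(0)
p = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
n = 0.2 * (rng.standard_normal(64) + 1j * rng.standard_normal(64)) / np.sqrt(2)
r = 0.8 * np.exp(1j * 0.3) * p + n
print(f"Estimated SNR: {pilot_snr_db(r, p):.1f} dB")
```

A real estimator would also need to scale the noise estimate to the 3 kHz measurement bandwidth and average over fading.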
The third one is perfect, that’s the one on the bottom.
As stated in the text, the third one is the original.
Being able to objectively quantify RADE speech quality will be a catalyst for improvement. A wise adage says: “When performance is measured, performance improves…when performance is measured and reported, improvement accelerates.”
Nice work, David!
Thanks,
Rick
Greetings, this is CE4NBE. I installed freedv v2.0.0; when I run it with RADE enabled the program closes and is impossible to run. Also, I can’t hear anything.
For help: https://freedv.org/#getting-help