TIME-SCALE MODIFICATIONS BASED ON A FULL-BAND ADAPTIVE HARMONIC MODEL
George P. Kafentzis, Gilles Degottex, Olivier Rosec, and Yannis Stylianou
Abstract - This paper presents a simple method for time-scale modification of speech
based on a recently proposed model for AM-FM decomposition of
speech signals, referred to as the adaptive
Harmonic Model (aHM). A full-band speech analysis/synthesis
system based on the aHM representation is built, without the necessity
of separating a deterministic and/or a stochastic component
from the speech signal. The aHM models speech as a sum of harmonically
related sinusoids that can adapt to the local characteristics
of the signal and provide accurate instantaneous amplitude, frequency,
and phase trajectories. Because it represents and reconstructs
speech with high quality, aHM can provide high-quality
time-scale modifications. Informal listening tests show that the synthetic
time-scaled waveforms are natural and free of some common artifacts
encountered in other state-of-the-art models, such as “metallic
quality”, chorusing, or musical noise.
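As a rough illustration of the idea in the abstract, the sketch below time-scales a signal that is already described by a fundamental-frequency track and per-harmonic amplitude tracks. It is not the authors' aHM analysis/synthesis system. The function names (`time_scale_tracks`, `synth_harmonic`), the per-sample parameter tracks, and the convention that a factor above 1 lengthens the signal are all assumptions made for this example.

```python
import numpy as np

def time_scale_tracks(f0, amps, factor):
    """Resample parameter tracks onto a time axis stretched by `factor`.

    f0:   (N,) fundamental frequency in Hz, one value per sample (assumed).
    amps: (K, N) amplitude of each of the K harmonics (assumed).
    factor > 1 lengthens the signal; factor < 1 shortens it (assumed convention).
    """
    n_in = f0.shape[0]
    n_out = int(round(factor * n_in))
    t_in = np.arange(n_in)
    t_out = np.linspace(0, n_in - 1, n_out)  # stretched time axis
    f0_s = np.interp(t_out, t_in, f0)
    amps_s = np.vstack([np.interp(t_out, t_in, a) for a in amps])
    return f0_s, amps_s

def synth_harmonic(f0, amps, fs):
    """Sum of harmonically related sinusoids.

    The phase of harmonic k is k times the running integral of f0, so the
    pitch is preserved even though the duration has changed.
    """
    phase0 = 2.0 * np.pi * np.cumsum(f0) / fs          # fundamental phase
    k = np.arange(1, amps.shape[0] + 1)[:, None]       # harmonic numbers
    return np.sum(amps * np.cos(k * phase0[None, :]), axis=0)

# Toy example: 0.1 s glide from 100 Hz to 120 Hz with 3 harmonics, slowed by 2.
fs = 16000
n = 1600
f0 = np.linspace(100.0, 120.0, n)
amps = np.vstack([np.full(n, 1.0 / h) for h in (1, 2, 3)])
f0_s, amps_s = time_scale_tracks(f0, amps, factor=2.0)
y = synth_harmonic(f0_s, amps_s, fs)
```

Because the phase is recomputed from the stretched frequency track rather than copied from the original frames, the harmonics stay phase-coherent with one another, which is one plausible reason a harmonic model can avoid chorusing-type artifacts.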
Thank you for your time!
In this test, the goal is to evaluate the perceptual quality of time-scaled reconstructions of speech recordings, produced by several algorithms and for several time-scale factors.
You will listen to several artificially time-scaled speech waveforms. The time-scale factor varies from 0.5 to 6. Quality is judged by the absence of artifacts such as chorusing, buzziness, whispering, or a metallic/robotic voice.
First, play the original sound (labeled Original) as many times as you want. Then play the other sounds, again as many times as you want, and select the time-scaled speech signal with the highest quality. The original signal is given as a reference.
Recommendations
- If there is any technical problem with one sound, select Prob.
- Absolutely use headphones. Do not use earphones or speakers.
- Verify that the sound is loud enough to hear the details properly.
- Do the test in a quiet place.
- Take the time to listen!
- Please do not stop a sound before it finishes!
- Please do not play audio files simultaneously!
- Before starting the test, do not hesitate to ask me any questions.
The following time scaling modifications have a factor of 0.5 to 6, respectively.
The test
Please select the sound that is most naturally time-scaled relative to the Original.