Glottal Inverse Filtering using Stabilised Weighted Linear Prediction

George P. Kafentzis, Yannis Stylianou, Paavo Alku
Conference Papers International Conference on Acoustics, Speech, and Signal Processing, Prague, Czech Republic, May 22-27, 2011

Abstract

This paper presents and evaluates an inverse filtering technique for the speech signal based on Stabilized Weighted Linear Prediction (SWLP). SWLP emphasizes the speech samples that fit the underlying speech production model well by imposing temporal weighting on the square of the residual signal. The performance of SWLP is compared to conventional Linear Prediction-based inverse filtering techniques, namely the Autocorrelation and Closed Phase Covariance methods. All the inverse filtering approaches are evaluated on a database of speech signals generated by a physical model of the voice production system. Results show that the glottal flows estimated using SWLP are closer to the original glottal flow than those estimated by the Autocorrelation approach, while its performance is comparable to the Closed Phase Covariance approach.
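The weighted-LP idea at the heart of SWLP can be sketched in a few lines of Python. This is a simplified illustration only: the actual SWLP method uses a specific short-time-energy-based weighting function and a stabilization step, both omitted here, and the function name is ours.

```python
import numpy as np

def weighted_lp(x, order, weights):
    """Weighted linear prediction: minimize sum_n w[n] * e[n]^2, where
    e[n] = x[n] - sum_k a[k] * x[n-k]. Solves the weighted normal equations
    in covariance (unwindowed) form."""
    N = len(x)
    R = np.zeros((order, order))
    r = np.zeros(order)
    for n in range(order, N):
        past = x[n - order:n][::-1]          # [x[n-1], ..., x[n-order]]
        R += weights[n] * np.outer(past, past)
        r += weights[n] * past * x[n]
    return np.linalg.solve(R, r)             # predictor coefficients a[1..order]
```

With uniform weights this reduces to ordinary covariance-method LP; SWLP's contribution is choosing the weights so that samples fitting the production model dominate the fit.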

An Extension of the Adaptive Quasi Harmonic Model

George P. Kafentzis, Yannis Pantazis, Olivier Rosec, Yannis Stylianou
Conference Papers International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan, March 25-30, 2012

Abstract

In this paper, we present an extension of a recently developed AM-FM decomposition algorithm, which will be referred to as the extended adaptive Quasi-Harmonic Model (eaQHM). It was previously shown that the adaptive Quasi-Harmonic Model (aQHM) is an efficient AM-FM decomposition algorithm with applications in speech analysis. In this paper, we show that a simple extension of the aQHM algorithm to include not only frequency but also amplitude adaptation results in higher performance in terms of Signal-to-Reconstruction-Error Ratio (SRER). To support our hypothesis, eaQHM is tested both on synthetic signals and on a subset of the ARCTIC database of speech. Overall, compared with aQHM, eaQHM improves the SRER by more than 2 dB, on average.
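The SRER figure of merit used here (and in several of the papers below) is straightforward to compute; a minimal sketch, with a helper name of our choosing:

```python
import numpy as np

def srer(original, reconstruction):
    """Signal-to-Reconstruction-Error Ratio in dB: the standard deviation of
    the original signal relative to that of the modeling error."""
    err = np.asarray(original) - np.asarray(reconstruction)
    return 20.0 * np.log10(np.std(original) / np.std(err))
```

A 2 dB SRER improvement thus corresponds to shrinking the reconstruction-error standard deviation by a factor of about 1.26.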

On the Modeling of Voiceless Stop Sounds of Speech using Adaptive Quasi-Harmonic Models

George P. Kafentzis, Olivier Rosec, Yannis Stylianou
Conference Papers Interspeech, Portland, U.S.A, September 9-13, 2012

Abstract

In this paper, the performance of recently proposed adaptive signal models in modeling voiceless stop sounds of speech is presented. Stop sounds are transient parts of speech that are highly non-stationary in time. State-of-the-art sinusoidal models fail to model them accurately and efficiently, thus introducing an artifact known as the pre-echo effect. The adaptive QHM and the extended adaptive QHM (eaQHM) are tested against this effect, and it is shown that highly accurate, pre-echo-free representations of stop sounds are possible using adaptive schemes. Results on a large database of voiceless stops show that, on average, eaQHM improves by 100% the Signal to Reconstruction Error Ratio (SRER) obtained by the standard sinusoidal model.

Time-scale Modifications based on a Full-Band Adaptive Harmonic Model

George P. Kafentzis, Gilles Degottex, Olivier Rosec, Yannis Stylianou
Conference Papers International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada, May 26-31, 2013.

Abstract

In this paper, a simple method for time-scale modification of speech is presented, based on a recently proposed model for AM-FM decomposition of speech signals. This model is referred to as the adaptive Harmonic Model (aHM). A full-band speech analysis/synthesis system based on the aHM representation is built, without the need to separate a deterministic and/or a stochastic component from the speech signal. The aHM models speech as a sum of harmonically related sinusoids that can adapt to the local characteristics of the signal and provide accurate instantaneous amplitude, frequency, and phase trajectories. Because of its high-quality representation and reconstruction of speech, aHM can provide high-quality time-scale modifications. Informal listening tests show that the synthetic time-scaled waveforms are natural and free of some common artifacts encountered in other state-of-the-art models, such as “metallic quality”, chorusing, or musical noise.

Adaptive Sinusoidal Modeling of Percussive Musical Instrument Sounds

Marcelo Caetano, George P. Kafentzis, Athanasios Mouchtaris, Yannis Stylianou
Conference Papers European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, September 9-13, 2013

Abstract

Percussive musical instrument sounds figure among the most challenging to model with sinusoids, particularly due to their characteristic attack, which features a sharp onset and transients. Attack transients exhibit a highly nonstationary, inharmonic behaviour that is very difficult to capture with traditional sinusoidal models built on slowly varying sinusoids, commonly introducing an artifact known as pre-echo. In this work, we use an adaptive sinusoidal model dubbed eaQHM to model percussive sounds from musical instruments such as plucked strings or percussion, and investigate how eaQHM handles the sharp onsets and the nonstationary, inharmonic nature of the attack transients. We show that adaptation renders a virtually perceptually identical sinusoidal representation of percussive sounds from different musical instruments, improving the Signal to Reconstruction Error Ratio (SRER) obtained with a traditional sinusoidal model. The result of a listening test revealed that the percussive sounds modeled with eaQHM were considered perceptually closer to the original sounds than their traditional-sinusoidal-modeled counterparts. Most listeners reported that they used the attack as a cue.

Evaluating How Well Filtered White Noise Models the Residual from Sinusoidal Modeling of Musical Instrument Sounds

Marcelo Caetano, George P. Kafentzis, Gilles Degottex, Athanasios Mouchtaris, Yannis Stylianou
Conference Papers IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, N.Y., USA, October 20-23, 2013

Abstract

Nowadays, sinusoidal modeling commonly includes a residual obtained by the subtraction of the sinusoidal model from the original sound. This residual signal is often further modeled as filtered white noise. In this work, we evaluate how well filtered white noise models the residual from sinusoidal modeling of musical instrument sounds for several sinusoidal algorithms. We compare how well each sinusoidal model captures the oscillatory behavior of the partials by looking into how “noisy” their residuals are. We performed a listening test to evaluate the perceptual similarity between the original residual and the modeled counterpart. Then we further investigate whether the result of the listening test can be explained by the fine structure of the residual magnitude spectrum. The results presented here have the potential to subsidize improvements on residual modeling.
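A common objective proxy for how "noisy" a residual is (not necessarily the measure used in the paper) is spectral flatness, the ratio of the geometric to the arithmetic mean of the power spectrum, which approaches 1 for white noise and 0 for tonal signals:

```python
import numpy as np

def spectral_flatness(x, eps=1e-12):
    """Spectral flatness (Wiener entropy) of a signal: geometric mean of the
    power spectrum divided by its arithmetic mean."""
    p = np.abs(np.fft.rfft(x)) ** 2 + eps     # power spectrum, floored for the log
    return np.exp(np.mean(np.log(p))) / np.mean(p)
```

A residual that still contains partial energy (an under-fitting sinusoidal model) will score noticeably lower flatness than one that is well approximated by filtered white noise.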

Pitch modifications of speech based on an adaptive Harmonic Model

George P. Kafentzis, Gilles Degottex, Olivier Rosec, Yannis Stylianou
Conference Papers International Conference on Acoustics, Speech, and Signal Processing, Florence, Italy, May 4-9, 2014

Abstract

In this paper, a simple method for pitch-scale modifications of speech is presented, based on a recently proposed model for AM-FM decomposition of speech signals. This model is referred to as the adaptive Harmonic Model (aHM). The aHM models speech as a sum of harmonically related sinusoids that can adapt to the local characteristics of the signal. It was shown that this model provides high-quality reconstruction of speech, and thus it can also provide high-quality pitch-scale modifications. For the latter, the amplitude envelope is estimated using the Discrete All-Pole (DAP) method, and the phase envelope estimation is performed by utilizing the concept of relative phase. Formal listening tests on a database of several languages show that the synthetic pitch-scaled waveforms are natural and free of some common artefacts encountered in other state-of-the-art models, such as HNM and STRAIGHT.

Robust Adaptive Sinusoidal Analysis and Synthesis of Speech

George P. Kafentzis, Olivier Rosec, Yannis Stylianou
Conference Papers International Conference on Acoustics, Speech, and Signal Processing, Florence, Italy, May 4-9, 2014

Abstract

Recent advances in speech analysis have shown that voiced speech can be very well represented using quasi-harmonic frequency tracks and local parameter adaptivity to the underlying signal. In this paper, we revisit the quasi-harmonicity approach through the extended adaptive Quasi-Harmonic Model (eaQHM), and we show that applying a continuous f0 estimation method together with an adaptivity scheme can yield high-resolution quasi-harmonic analysis and perceptually indistinguishable resynthesized speech. The method assumes an initial harmonic model which successively converges to quasi-harmonicity. Formal listening tests showed that eaQHM is robust against f0 estimation artefacts and provides higher quality in resynthesizing speech, compared to a recently developed model, the adaptive Harmonic Model (aHM), and the classic Sinusoidal Model (SM).

Analysis of Emotional Speech using an Adaptive Sinusoidal Model

George P. Kafentzis, Theodora Yakoumaki, Athanasios Mouchtaris, Yannis Stylianou
Conference Papers European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, September 1-5, 2014

Abstract

Processing of emotional (or expressive) speech has gained attention over recent years in the speech community due to its numerous applications. In this paper, an adaptive sinusoidal model (aSM), dubbed the extended adaptive Quasi-Harmonic Model (eaQHM), is employed to analyze emotional speech into accurate, robust, continuous, time-varying parameters (amplitude, frequency, and phase). It is shown that these parameters can adequately and accurately represent emotional speech content. Using a well-known database of narrowband expressive speech (SUSAS), we show that very high Signal-to-Reconstruction-Error Ratio (SRER) values can be obtained, compared to the standard sinusoidal model (SM). Formal listening tests on a smaller wideband speech database show that eaQHM outperforms the SM in terms of perceptual resynthesis quality. Finally, preliminary emotion classification tests show that the parameters obtained from the adaptive model lead to a higher classification score, compared to the standard SM parameters.

Emotional Speech Classification using Adaptive Sinusoidal Modelling

George P. Kafentzis, Theodora Yakoumaki, Yannis Stylianou
Conference Papers Interspeech, Singapore, 2014

Abstract

Automatic classification of emotional speech is a challenging task with applications in synthesis and recognition. In this paper, an adaptive sinusoidal model (aSM), called the extended adaptive Quasi-Harmonic Model (eaQHM), is applied to emotional speech analysis for classification purposes. The parameters of the model (amplitude and frequency) are used as features for the classification. Using a well-known database of narrowband expressive speech (SUSAS), we develop two separate Vector Quantizers (VQs) for the classification, one for the amplitude and one for the frequency features. It is shown that the eaQHM can outperform the standard Sinusoidal Model in classification scores. However, classification based on a single feature is insufficient for achieving higher classification rates. Thus, we suggest a combined amplitude-frequency classification scheme, where the classification scores of each VQ are weighted and ranked, and the decision is made based on the minimum value of this ranking. Experiments show that the proposed scheme achieves higher performance when the features are obtained from eaQHM. Future work can be directed to different classifiers, such as HMMs or GMMs, and ultimately to emotional speech transformations and synthesis.
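The combined decision rule described above can be sketched roughly as follows. This is a hypothetical simplification: per-emotion distances from the two VQs are rank-weighted and the minimum combined score wins; the paper's exact weighting and ranking may differ.

```python
import numpy as np

def combined_vq_decision(d_amp, d_freq, w=0.5):
    """Combine per-emotion amplitude-VQ distances (d_amp) and frequency-VQ
    distances (d_freq): convert each to ranks (0 = closest codebook), take a
    weighted sum, and return the index of the winning emotion."""
    r_amp = np.argsort(np.argsort(d_amp))     # rank of each emotion under the amplitude VQ
    r_freq = np.argsort(np.argsort(d_freq))   # rank under the frequency VQ
    score = w * r_amp + (1.0 - w) * r_freq
    return int(np.argmin(score))
```

Rank combination makes the two feature streams commensurable even when their raw distance scales differ, which is one plausible motivation for ranking before weighting.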

Adaptive Modeling of Nonstationary Sinusoids

Marcelo Caetano, George P. Kafentzis, Athanasios Mouchtaris
Conference Papers The 18th International Conference on Digital Audio Effects, Trondheim, Norway, 2015

Abstract

Nonstationary oscillations are ubiquitous in music and speech, ranging from the fast transients in the attack of musical instruments and consonants to amplitude and frequency modulations in expressive variations present in vibrato and prosodic contours. Modeling nonstationary oscillations with sinusoids remains one of the most challenging problems in signal processing because the fit also depends on the nature of the underlying sinusoidal model. For example, frequency modulated sinusoids are more appropriate to model vibrato than fast transitions. In this paper, we propose to model nonstationary oscillations with adaptive sinusoids from the extended adaptive quasi-harmonic model (eaQHM). We generated synthetic nonstationary sinusoids with different amplitude and frequency modulations and compared the modeling performance of adaptive sinusoids estimated with eaQHM, exponentially damped sinusoids estimated with ESPRIT, and log-linear-amplitude quadratic-phase sinusoids estimated with frequency reassignment. The adaptive sinusoids from eaQHM outperformed frequency reassignment for all nonstationary sinusoids tested and presented performance comparable to exponentially damped sinusoids.

High-Resolution Sinusoidal Modeling of Unvoiced Speech

George P. Kafentzis, Yannis Stylianou
Conference Papers International Conference on Acoustics, Speech, and Signal Processing, Shanghai, China, 2016.

Abstract

In this paper, a recently proposed high-resolution sinusoidal model, dubbed the extended adaptive Quasi-Harmonic Model (eaQHM), is applied to the modeling of unvoiced speech sounds. Unvoiced speech sounds are parts of speech that are highly non-stationary in the time-frequency plane. Standard sinusoidal models fail to model them accurately and efficiently, thus introducing artefacts, while the reconstructed signals do not attain the quality and naturalness of the originals. Motivated by recently proposed non-stationary transforms, such as the Fan-Chirp Transform (FChT), eaQHM is tested against these effects, and it is shown that highly accurate, artefact-free representations of unvoiced sounds are possible using the non-stationary properties of the model. Experiments on databases of unvoiced sounds show that, on average, eaQHM improves the Signal to Reconstruction Error Ratio (SRER) obtained by the standard Sinusoidal Model (SM) by 93%. Moreover, modeling superiority is also supported via informal listening tests against two other models, namely the SM and the well-known STRAIGHT method.

Assessing voice features of Greek speakers with hearing loss

Anna Sfakianaki, George P. Kafentzis
Conference Papers 1st Conference on Interdisciplinary Approaches to Linguistic Theory, Rethymnon, Crete, Greece, 2017.

Abstract

N/A

An Acoustic Study of Greek Voiceless Stops

Katerina Nicolaidis, Anna Sfakianaki, George Vlahavas, George P. Kafentzis
Conference Papers International Congress of Phonetic Sciences, Melbourne, Australia, 2019.

Abstract

The paper investigates acoustic properties of the Greek voiceless plosives /p, t, k/, including the palatal allophone [c], by examining absolute and relative VOT and closure duration, relative burst intensity and spectral moments. Variability due to place of articulation, vowel context, gender and age is examined. The speech material comprised C1VC2V real words (C1=/p, t, k/, V=/i, a/, C2=dental/alveolar). Data from 12 adult speakers and 12 children (6 male and 6 female in each group) were analysed. Results showed that relative closure duration decreased and relative VOT duration increased in the order /p/, /t/, /k/ showing the anticipated inverse relationship reported in the literature. VOT was longer in the high vowel context for /t, k/. All spectral moments were significantly affected by place of articulation. Relative burst intensity was greater for the velar. Effects of gender and age were variable. Results are discussed in relation to theory and crosslinguistic evidence.
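The spectral moments reported above are typically computed by treating the normalized power spectrum (here of the stop burst) as a probability distribution over frequency; a minimal sketch, where kurtosis is reported as excess kurtosis:

```python
import numpy as np

def spectral_moments(x, fs):
    """First four spectral moments of a signal frame: centroid (Hz), variance
    (Hz^2), skewness, and excess kurtosis, computed from the normalized power
    spectrum treated as a probability distribution over frequency."""
    p = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    p = p / p.sum()                            # normalize to a distribution
    centroid = np.sum(f * p)
    var = np.sum(((f - centroid) ** 2) * p)
    sd = np.sqrt(var)
    skew = np.sum(((f - centroid) ** 3) * p) / sd ** 3
    kurt = np.sum(((f - centroid) ** 4) * p) / sd ** 4 - 3.0
    return centroid, var, skew, kurt
```

In practice the burst is windowed and possibly pre-emphasized before this computation; those steps are omitted here.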

Investigating voice function characteristics of Greek speakers with hearing loss using automatic glottal source feature extraction

Anna Sfakianaki, George P. Kafentzis
Conference Papers Interspeech 2021, Brno, Czech Republic.

Abstract

The current study investigates voice quality characteristics of Greek adults with normal hearing and hearing loss, automatically obtained from glottal inverse filtering analysis using the Aalto Aparat toolkit. Aalto Aparat has been employed in glottal flow analysis of disordered speech, but to the best of the authors' knowledge, not as yet in hearing impaired voice analysis and assessment. Five speakers, three women and two men, with normal hearing (NH) and five speakers with prelingual profound hearing impairment (HI), matched for age and sex, produced symmetrical /'pVpV/ disyllables, where V=/i, a, u/. A state-of-the-art method named quasi-closed phase analysis (QCP) is offered in Aparat and it is used to estimate the glottal source signal. Glottal source features were obtained using time- and frequency-domain parametrization methods and analysed statistically. The interpretation of the results attempts to shed light on potential differences between HI and NH phonation strategies, while advantages and limitations of inverse filtering methods in HI voice assessment are discussed.
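As an illustration of the kind of time-domain glottal feature such toolkits parametrize, here is a minimal sketch of the Normalized Amplitude Quotient (NAQ). Aalto Aparat's own implementation segments glottal cycles more carefully; the function shown is ours.

```python
import numpy as np

def naq(glottal_flow, f0, fs):
    """Normalized Amplitude Quotient: the AC amplitude of the glottal flow
    divided by the product of the negative peak of the flow derivative and
    the fundamental period. Lower NAQ indicates a more pressed phonation."""
    dg = np.diff(glottal_flow) * fs        # first-difference estimate of the flow derivative
    f_ac = glottal_flow.max() - glottal_flow.min()
    d_peak = -dg.min()                     # magnitude of the negative derivative peak
    T = 1.0 / f0                           # fundamental period in seconds
    return f_ac / (d_peak * T)
```

Because both the numerator and denominator scale with signal amplitude, NAQ is amplitude-invariant, which is what makes it attractive for comparing phonation across speakers and recording conditions.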

Full-Band Quasi-Harmonic Analysis and Synthesis of Musical Instrument Sounds with Adaptive Sinusoids

Marcelo Caetano, George P. Kafentzis, Athanasios Mouchtaris, Yannis Stylianou
Journal Paper Applied Sciences, Special Issue on Audio Signal Processing, vol. 6, 127, 2016.

Abstract

Sinusoids are widely used to represent the oscillatory modes of musical instrument sounds in both analysis and synthesis. However, musical instrument sounds feature transients and instrumental noise that are poorly modeled with quasi-stationary sinusoids, requiring spectral decomposition and further dedicated modeling. In this work, we propose a full-band representation that fits sinusoids across the entire spectrum. We use the extended adaptive Quasi-Harmonic Model (eaQHM) to iteratively estimate amplitude- and frequency-modulated (AM–FM) sinusoids able to capture challenging features such as sharp attacks, transients, and instrumental noise. We use the signal-to-reconstruction-error ratio (SRER) as the objective measure for the analysis and synthesis of 89 musical instrument sounds from different instrumental families. We compare against quasi-stationary sinusoids and exponentially damped sinusoids. First, we show that the SRER increases with adaptation in eaQHM. Then, we show that full-band modeling with eaQHM captures partials at the higher frequency end of the spectrum that are neglected by spectral decomposition. Finally, we demonstrate that a frame size equal to three periods of the fundamental frequency results in the highest SRER with AM–FM sinusoids from eaQHM. A listening test confirmed that the musical instrument sounds resynthesized from full-band analysis with eaQHM are virtually perceptually indistinguishable from the original recordings.

A Fast Method for High-Resolution Voiced/Unvoiced Detection and Glottal Closure/Opening Instant Estimation of Speech

Andreas Koutrouvelis, George P. Kafentzis, Nikolay Gaubitch, Richard Heusdens
Journal Paper IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, iss. 2, pp. 316-328, 2016.

Abstract

We propose a fast speech analysis method which simultaneously performs high-resolution voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal opening instants (GCIs and GOIs, respectively). The proposed algorithm exploits the structure of the glottal flow derivative in order to estimate GCIs and GOIs only in voiced speech using simple time-domain criteria. We compare our method with well-known GCI/GOI methods, namely, the dynamic programming projected phase-slope algorithm (DYPSA), the yet another GCI/GOI algorithm (YAGA) and the speech event detection using the residual excitation and a mean-based signal (SEDREAMS). Furthermore, we examine the performance of the aforementioned methods when combined with state-of-the-art VUD algorithms, namely, the robust algorithm for pitch tracking (RAPT) and the summation of residual harmonics (SRH). Experiments conducted on the APLAWD and SAM databases show that the proposed algorithm outperforms the state-of-the-art combinations of VUD and GCI/GOI algorithms with respect to almost all evaluation criteria for clean speech. Experiments on speech contaminated with several noise types (white Gaussian, babble, and car-interior) are also presented and discussed. The proposed algorithm outperforms the state-of-the-art combinations in most evaluation criteria for signal-to-noise ratio greater than 10 dB.

Source code is available here

Cough acoustic analysis using artificial intelligence for COVID-19 detection: a comparative study of patient cohorts from Lima, Peru and Montreal, Canada

Alexandra J Zimmer, Vijay Ravi, Patricia Espinoza-Lopez, George P. Kafentzis, Mirco Ravanelli, Samira Abbasgholizadeh Rahimi, Madhukar Pai, César Ugarte-Gil, Simon Grandjean Lapierre
Journal Paper Annals of Epidemiology, vol. 118, art. 110076, 2026.

Abstract

Digital cough screening for COVID-19 detection shows promise, but population differences in cough acoustics and screening accuracy require investigation. This study examined cough characteristics and COVID-19 screening performance in Lima, Peru and Montreal, Canada. Cough recordings and clinical data were prospectively collected from 605 adults. COVID-19 and other respiratory pathogens were diagnosed via NAAT. Acoustic features were extracted and compared. COVID-19 classification used eXtreme Gradient Boosting (XGBoost) and a deep learning neural network, assessed via internal and external validations for audio-only, clinical-only, and combined models. A sub-analysis explored XGBoost prediction scores by underlying disease status. Significant heterogeneity in cough acoustic features existed between Lima and Montreal cohorts. XGBoost audio-based models trained and tested in Lima showed superior performance (area under the curve [AUC]: 0.71 ± 0.08) compared to Montreal (AUC: 0.53 ± 0.04). Both models demonstrated poor external validation performance when tested on the alternate dataset. Neural network models showed similar trends. Additionally, individuals with other respiratory diseases had differing COVID-19 prediction scores between sites, suggesting epidemiological context influences model performance. Cough acoustics are population-specific, impacting cough-based classification algorithm utility across different epidemiological settings. COVID-19 cough screening models demonstrated limited transferability, highlighting challenges in developing globally applicable tools without representative training data.
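The headline AUC metric used in these comparisons can be reproduced with a few lines of NumPy via the Mann-Whitney rank-sum identity. This sketch assumes no tied scores and is not the study's code.

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity: the probability
    that a randomly chosen positive is scored above a randomly chosen
    negative (assumes no tied scores)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # rank 1 = lowest score
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The external-validation finding above corresponds to computing this quantity with a model trained on one site's recordings and scores produced on the other site's recordings.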

Tuberculosis Screening from Cough Audio: Baseline Models, Clinical Variables, and Uncertainty Quantification

George P. Kafentzis, Efstratios Selisios
Journal Paper Sensors, vol. 26 (4), pp. 1223, 2026.

Abstract

In this paper, we propose a standardized framework for automatic tuberculosis (TB) detection from cough audio and routinely collected clinical data using machine learning. While TB screening from audio has attracted growing interest, progress is difficult to measure because existing studies vary substantially in datasets, cohort definitions, feature representations, model families, validation protocols, and reported metrics. Consequently, reported gains are often not directly comparable, and it remains unclear whether improvements stem from modeling advances or from differences in data and evaluation. We address this gap by establishing a strong, well-documented baseline for TB prediction using cough recordings and accompanying clinical metadata from a recently compiled dataset from several countries. Our pipeline is reproducible end-to-end, covering feature extraction, multimodal fusion, cougher-independent evaluation, and uncertainty quantification, and it reports a consistent suite of clinically relevant metrics to enable fair comparison. We further quantify performance for cough audio-only and fused (audio + clinical metadata) models, and release the full experimental protocol to facilitate benchmarking. This baseline is intended to serve as a common reference point and to reduce methodological variance that currently holds back progress in the field.
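Cougher-independent evaluation means that all recordings from one person land in the same fold, so no cougher appears in both training and test data. A minimal sketch of such a split (a hypothetical helper, not the paper's protocol):

```python
import numpy as np

def cougher_independent_splits(cougher_ids, n_folds=5):
    """Assign recording indices to folds so that every recording from a given
    cougher falls in the same fold, preventing speaker leakage between
    training and test sets."""
    ids = np.asarray(cougher_ids)
    folds = [[] for _ in range(n_folds)]
    for k, c in enumerate(np.unique(ids)):          # round-robin over coughers
        folds[k % n_folds].extend(np.flatnonzero(ids == c).tolist())
    return folds
```

A production protocol would typically also balance fold sizes and class proportions; this sketch shows only the grouping constraint itself.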

Adaptive Sinusoidal Models for Speech with Applications in Speech Modifications and Audio Analysis

George P. Kafentzis
Thesis University of Crete, Greece - University of Rennes 1, France

Abstract

Sinusoidal Modeling is one of the most widely used parametric methods for speech and audio signal processing. The accurate estimation of sinusoidal parameters (amplitudes, frequencies, and phases) is a critical task for a close representation of the analyzed signal. In this thesis, based on recent advances in sinusoidal analysis, we propose high-resolution adaptive sinusoidal models for analysis, synthesis, and modification systems for speech. Our goal is to provide systems that represent speech in a highly accurate and compact way. Inspired by the recently introduced adaptive Quasi-Harmonic Model (aQHM) and adaptive Harmonic Model (aHM), we review the theory of adaptive Sinusoidal Modeling and propose a model named the extended adaptive Quasi-Harmonic Model (eaQHM), a non-parametric model able to adjust the instantaneous amplitudes and phases of its basis functions to the underlying time-varying characteristics of the speech signal, thus significantly alleviating the so-called local stationarity hypothesis. The eaQHM is shown to outperform aQHM in the analysis and resynthesis of voiced speech. Based on the eaQHM, a hybrid analysis/synthesis system of speech is presented (eaQHNM), along with a hybrid version of the aHM (aHNM). Moreover, we present motivation for a full-band representation of speech using the eaQHM, that is, representing all parts of speech as high-resolution AM-FM sinusoids. Experiments show that adaptation and quasi-harmonicity are sufficient to provide transparent quality in unvoiced speech resynthesis. The full-band eaQHM analysis and synthesis system is presented next, which outperforms state-of-the-art systems, hybrid or full-band, in speech reconstruction, providing transparent quality confirmed by objective and subjective evaluations. Regarding applications, the eaQHM and the aHM are applied to speech modifications (time and pitch scaling). The resulting modifications are of high quality and follow very simple rules, compared to other state-of-the-art modification systems. Results show that harmonicity is preferred over quasi-harmonicity in speech modifications due to the embedded simplicity of representation. Moreover, the full-band eaQHM is applied to the problem of modeling audio signals, and specifically musical instrument sounds. The eaQHM is evaluated and compared to state-of-the-art systems and is shown to outperform them in terms of resynthesis quality, successfully representing the attack, transient, and stationary parts of a musical instrument sound. Finally, another application is suggested, namely the analysis and classification of emotional speech. The eaQHM is applied to the analysis of emotional speech, providing its instantaneous parameters as features that can be used in recognition and Vector-Quantization-based classification of the emotional content of speech. Although sinusoidal models are not commonly used in such tasks, results are promising.

Validation and accuracy of the Hyfe cough monitoring system: a multicenter clinical study

Carlos Chaccour, Isabel Sánchez-Olivieri, Sarah Siegel, Gina Megson, Kevin L. Winthrop, Juan Berto Botella, Juan P. de-Torres, Lola Jover, Joe Brew, George P. Kafentzis, Mindaugas Galvosas, Matthew Rudd & Peter Small
Journal Paper Scientific Reports, vol. 15, 880, 2025.

Abstract

Background: The ability to passively and continuously monitor coughing for prolonged periods of time would significantly improve cough management and research. To date there is no automated, clinically validated cough monitor that can be routinely used in clinical care and research. Here we describe the validation of such an automated cough monitor. Methods: This multicenter observational study compared the results of the Hyfe CoughMonitor wrist-worn device with manually counted coughs in subjects with a variety of etiologies as they went about their usual daily activities. We collected 24 h of continuous sounds from subjects while they simultaneously wore a CoughMonitor and an audio recorder. Coughs were labelled by multiple trained annotators who listened to the continuous audio recordings using validated methodology. The time stamps of these human-detected coughs were compared to those of the CoughMonitor to determine the system’s overall performance using event-to-event and hourly rate correlation analyses. Results: Over the 546 h monitored, 4,454 cough events were recorded. The overall sensitivity was 90.4% (95% CI of 88.3–92.2%). The overall false positive rate was 1.03 false positives per hour (95% CI of 0.84 to 1.24). The overall correlation between manual and CoughMonitor-measured hourly coughing was high (Pearson correlation coefficient of 0.99). Two case studies of long-term monitoring of patients with chronic cough are presented. Conclusion: The present analysis demonstrated that the Hyfe CoughMonitor detects cough events accurately, with high sensitivity and a low false positive rate. Future studies should focus on its potential role in the management of patients with cough in clinical practice.
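Event-to-event analysis of this kind matches detector timestamps to annotated coughs within a tolerance window; a simplified greedy sketch of sensitivity and false positives per hour (the study's actual matching protocol may differ):

```python
def event_metrics(true_times, detected_times, tol=0.5, hours=1.0):
    """Greedily match detected timestamps to annotated cough times within
    +/- tol seconds; return (sensitivity, false positives per hour)."""
    detected = sorted(detected_times)
    used = [False] * len(detected)
    hits = 0
    for t in sorted(true_times):
        for j, d in enumerate(detected):
            if not used[j] and abs(d - t) <= tol:   # first unused detection in tolerance
                used[j] = True
                hits += 1
                break
    sensitivity = hits / len(true_times)
    fp_per_hour = (len(detected) - hits) / hours
    return sensitivity, fp_per_hour
```

The hourly-rate Pearson correlation reported above is then computed between per-hour counts of annotated and detected events.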

Speech emotion recognition via graph-based representations

Anastasia Pentari, George P. Kafentzis & Manolis Tsiknakis
Journal Paper Scientific Reports, vol. 14, 4484, 2024.

Abstract

Speech emotion recognition (SER) has gained increasing interest during the last decades as part of enriched affective computing. As a consequence, a variety of engineering approaches have been developed addressing the challenge of the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose the application of graph theory for classifying emotionally-colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series. We propose to use this information as a novel feature set. Furthermore, we suggest setting a unique feature-based identity for each emotion belonging to each speaker. The emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features as well as deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), reveal that our proposed method outperforms the comparative methods on these datasets. Specifically, we observe an average UAR increase of almost 18%, 8%, and 13%, respectively.
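One standard way to turn a time series into a graph, from which statistical and structural features can then be read off, is the natural visibility graph (shown here as an illustration; the paper's exact construction may differ):

```python
import numpy as np

def visibility_graph_degrees(x):
    """Natural visibility graph of a time series: samples i and j are
    connected if the straight line between (i, x[i]) and (j, x[j]) passes
    strictly above every intermediate sample. Returns the degree sequence,
    one simple structural feature. O(n^2) reference implementation."""
    n = len(x)
    deg = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                x[k] < x[i] + (x[j] - x[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                deg[i] += 1
                deg[j] += 1
    return deg
```

Degree statistics (mean, entropy, distribution shape) are typical graph-derived features that could feed a Random Forest classifier of the kind used above.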

Temporal, spectral and amplitude characteristics of the Greek fricative /s/ in hearing-impaired and normal-hearing speech

Anna Sfakianaki, Katerina Nicolaidis, & George P. Kafentzis
Journal Paper Clinical Linguistics and Phonetics, 38(8), 720-746, 2024.

Abstract

Fricatives, and especially sibilants, are very frequently misarticulated by speakers with hearing loss. Misarticulations can result in phonemic contrast weakening or loss, compromising intelligibility. The present study focuses on the examination of acoustic characteristics of the Greek alveolar fricative /s/, an articulatorily demanding sound, produced by young adult speakers with profound hearing impairment and with normal hearing. An array of variables was examined using mixed-effects and random forest models aiming to assess the effectiveness of various measures in differentiating hearing-impaired and normal-hearing /s/ production. Significant differences were found in spectral and amplitude measures, but not in temporal measures. In hearing-impaired speech, spectral slope and RMS amplitude had significantly lower values, indicating a more distributed spectrum, suggestive of decreased flow velocity through the fricative constriction. Also, a trend for concentration of energy at lower frequencies was observed suggesting more posterior fricative articulation than normal. Moreover, measures capturing the variation of frequency and amplitude over time revealed different patterns of sibilance development across time than normal, denoting the production of a less well-formed or less sibilant /s/ by speakers with hearing impairment. The investigation of contextual effects on /s/ in hearing-impaired speech showed increased spectral variance, negative skewness and lower kurtosis in the labial (rounded) context /u/ in relation to the nonlabial contexts /i/ and /a/, indicating a more diffuse, less compact spectrum with concentration at high frequencies. Findings are discussed in relation to previous literature on fricative production by speakers with hearing impairment and normal hearing in Greek and other languages.

Methods for automatic cough detection and uses thereof

Iulian-Alexandru Circo, Joseph Russell Brew, Paul Simon Rieger, Peter McMichael Small, George P. Kafentzis
Patents US12004851B1

Abstract

WHAT IS CLAIMED IS:

1. A method of automatically detecting cough events comprising the following steps: a. continuously recording ambient sound, b. upon detecting a change in ambient sound exceeding a predefined first threshold, identifying an onset of a possible cough event, c. recording an audio snippet including the onset of a possible cough event and continuing for a predefined audio snippet duration exceeding 100 msec thereafter, d. classifying the audio snippet recorded in step (c) as cough or non-cough based on analyzing acoustic energy distribution of the audio snippet in a frequency-time domain at a frequency above 100 Hz, e. discarding all identified non-cough events, f. repeating steps (b) through (e) if further possible cough events are identified after the audio snippet recorded in step (c), and g. compiling a record of all cough events detected during the duration of continuously recording ambient sound in step (a).
2. The method of automatically detecting cough events, as in claim 1, wherein in step (g) the record of all cough events comprises none or at least one time stamp associated with the cough event classified in step (d).
3. The method of automatically detecting cough events, as in claim 1, wherein in step (d) classifying the audio snippet recorded in step (c) as cough or non-cough further comprises a step of comparing the acoustic energy distribution of the audio snippet against a predefined second threshold.
4. The method of automatically detecting cough events, as in claim 1, wherein in step (d) classifying the audio snippet recorded in step (c) as cough or non-cough comprises a step of using a statistical classifier.
5. The method of automatically detecting cough events, as in claim 4, wherein the statistical classifier is a neural network classifier.
6. The method of automatically detecting cough events, as in claim 5, wherein the neural network classifier is a convolutional neural network classifier.
7. The method of automatically detecting cough events, as in claim 6, wherein the convolutional neural network is pre-trained on a plurality of prior recordings of known human cough and non-cough events.
8. The method of automatically detecting cough events, as in claim 7, wherein the plurality of prior recordings of known human cough and non-cough events are processed to define a probability measure of the event to be a cough event or a non-cough event.
9. The method of automatically detecting cough events, as in claim 8, wherein the plurality of prior recordings of known human cough and non-cough events are processed by analyzing acoustic energy distributions of each recording in the frequency-time domain.
10. The method of automatically detecting cough events, as in claim 1, wherein in step (c) the predefined audio snippet duration does not exceed 1 sec after the onset of the possible cough event.
11. The method of automatically detecting cough events, as in claim 1, wherein in step (c) recording the audio snippet starts prior to the onset of the possible cough event.
12. The method of automatically detecting cough events, as in claim 11, wherein in step (c) recording the audio snippet starts at least 20 msec prior to the onset of the possible cough event.
13. The method of automatically detecting cough events, as in claim 1, wherein in step (b) detecting the change in acoustic energy comprises a step of detecting individual changes in acoustic energy in a plurality of bins, wherein each bin corresponds to a predefined range of acoustic frequencies.
14. The method of automatically detecting cough events, as in claim 13, wherein the predefined second threshold corresponds to a number of bins from the plurality of bins in which a detected change in acoustic energy exceeds a predefined acoustic energy limit.
15. The method of automatically detecting cough events, as in claim 1, wherein step (a) further comprises a step of removing silence sounds identified using a predefined silence threshold.
16. The method of automatically detecting cough events, as in claim 1, wherein in step (d), the step of analyzing the audio snippet comprises a step of dividing the audio snippet into a plurality of time frames.
17. The method of automatically detecting cough events, as in claim 16, wherein in step (d), all of the time frames of the plurality of time frames overlap with adjacent time frames.
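The claimed pipeline (monitor ambient sound, flag an onset when the energy change exceeds a first threshold, capture a snippet longer than 100 msec, classify it from its spectral energy above 100 Hz, and compile time-stamped cough events) can be sketched as follows. The frame size, thresholds, and energy-ratio decision rule below are hypothetical placeholders, not values from the patent, and the toy classifier merely stands in for the claimed (convolutional) neural network:

```python
import numpy as np

FRAME = 256          # samples per analysis frame (hypothetical)
ONSET_DB = 20.0      # first threshold: energy jump marking a possible onset (hypothetical)
SNIPPET_FRAMES = 16  # snippet length in frames; 16*256/16000 = 256 ms > 100 ms

def frame_energy_db(frame):
    """Energy of one frame in dB, floored to avoid log of zero."""
    return 10.0 * np.log10(np.sum(frame ** 2) + 1e-12)

def classify_snippet(snippet, fs=16000):
    """Placeholder for step (d): inspect acoustic energy above 100 Hz in the
    frequency domain. A real system would use a pre-trained classifier."""
    spec = np.abs(np.fft.rfft(snippet))
    freqs = np.fft.rfftfreq(len(snippet), 1.0 / fs)
    return spec[freqs > 100].sum() > 0.5 * spec.sum()  # hypothetical decision rule

def detect_coughs(samples, fs=16000):
    """Scan a buffer of continuously recorded audio; return onset time stamps
    (seconds) of snippets classified as cough, per steps (b)-(g)."""
    events = []
    baseline = frame_energy_db(samples[:FRAME])  # running ambient-level estimate
    i = FRAME
    while i + SNIPPET_FRAMES * FRAME <= len(samples):
        e = frame_energy_db(samples[i:i + FRAME])
        if e - baseline > ONSET_DB:                          # step (b): onset
            snippet = samples[i:i + SNIPPET_FRAMES * FRAME]  # step (c): capture
            if classify_snippet(snippet, fs):                # step (d): classify
                events.append(i / fs)                        # step (g): time stamp
            i += SNIPPET_FRAMES * FRAME                      # step (f): keep scanning
        else:
            baseline = 0.9 * baseline + 0.1 * e              # track ambient level
            i += FRAME
    return events  # non-cough snippets were simply not appended (step (e))
```

A loud broadband burst injected into quiet noise is detected as a single time-stamped event, while the surrounding quiet audio produces none.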

Continuous and Discrete Time Signal Processing (in Greek)

George P. Kafentzis
Book Gutenberg | July 2019 | ISBN-13: 978-960-01-2042-4

This book presents the fundamental principles of continuous and discrete time signal processing and system analysis in a simple and comprehensible way. Emphasis is placed on intuitive analysis and interpretation of its subjects, accompanied by mathematical rigor where necessary. Fourier, Laplace, and Z Transforms play a major role, along with LTI system analysis and its applications. Each chapter of the book emerges from the unsolved problems and requirements of previous chapters, providing a flow that helps the reader develop the thinking skills of an engineer. The book contains a variety of images and figures and includes many carefully selected solved examples. Moreover, the reader can find a number of exercises at the end of each chapter. Finally, the theory is supported by selected implementations in MATLAB, and the source code of each example is provided on a dedicated webpage, along with supplementary files.