HY578-Projects

0-th Project - Part 1: Time domain of speech

Deadline: 20/10/2023

During this lab, you will have a first contact with speech signals and their short-time processing.
You will see the time domain structure of the most basic speech elements, such as vowels, plosives, and consonants. You will learn about time domain properties of speech signals and you will calculate basic metrics on speech, such as energy and zero-crossings.
The final goal of this lab is to implement a simple, adaptive, fully automated voiced/unvoiced/silence (VUS) discriminator. An algorithm of this kind is of practical use in real-time systems such as mobile communication systems. A typical Voice Activity Detector (VAD), which performs a subset of the tasks of a VUS discriminator, is used in the Global System for Mobile Communications (GSM), the European system for cellular communications. Such an application must be fast and efficient in real time, so the algorithm is based on energy and zero-crossing measures of the speech signal. For this lab, we will consider a non-real-time approach, which means that the whole signal is available for our measurements. The full description is available in the accompanying PDF.
Send your project to: kafentz AT csd dot uoc dot gr.
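
As a rough illustration of the energy and zero-crossing measures mentioned above, the following Python sketch computes both per frame and applies a fixed-threshold decision rule. It is not the lab's reference solution: the function names, the thresholds `energy_thr` and `zcr_thr`, and the decision rule itself are assumptions made for the example, and the thresholds are fixed here rather than adaptive.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a list of samples into frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_thr, zcr_thr):
    """Toy rule (an assumption, not the lab's algorithm):
    low energy -> silence 'S'; otherwise high ZCR -> unvoiced 'U'; else voiced 'V'."""
    if short_time_energy(frame) < energy_thr:
        return "S"
    return "U" if zero_crossing_rate(frame) > zcr_thr else "V"
```

A full discriminator would call `classify_frame` on every frame returned by `frame_signal` and would derive the thresholds from the signal itself, for example from statistics of the quietest frames.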

More Helpful files: .m file, .wav file

0-th Project - Part 2: Frequency domain of speech

Deadline: 25/10/2023

During this lab, you will have a first contact with frequency domain analysis of speech signals. You will see the frequency domain structure of the most basic speech elements, such as vowels, plosives, and consonants, using the Fast Fourier Transform (FFT). You will learn about the time-frequency representation of speech signals with the help of Short-Time Fourier Analysis (the spectrogram). The spectrogram can be produced using wideband or narrowband analysis: wideband analysis uses a short analysis window, whereas narrowband analysis uses a long analysis window. Finally, you will learn how to estimate basic components of speech, such as pitch. The goal of this lab is to implement a simple system for detecting speaker gender (male or female) and age (adult or child). The full description is available in the accompanying PDF.
Send your project to: kafentz AT csd dot uoc dot gr.
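
To give an idea of what pitch estimation can look like, here is a minimal Python sketch that estimates the pitch of a single voiced frame from the peak of its autocorrelation. This works in the time domain and is only one of several possible methods; the lab handout may ask for a different one. The function name and the search range `f_min` to `f_max` are assumptions chosen for the example.

```python
import math

def autocorr_pitch(frame, fs, f_min=60.0, f_max=500.0):
    """Estimate pitch in Hz as fs / lag, where lag maximizes the
    (unnormalized) autocorrelation inside the assumed range [f_min, f_max]."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag
```

A gender or age decision could then be based on the typical pitch over many voiced frames, since adult male, adult female, and child voices tend to occupy different pitch ranges. The actual decision thresholds are left to the lab.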

More Helpful files: .wav file, Other files to test

1st Project : Linear Prediction

Deadline: 10/11/2023

During this project you will explore Linear Prediction theory and a MATLAB implementation of a Linear Prediction based analysis and synthesis system for speech. The MATLAB code contains some incomplete command lines for you to fill in. Once you have done this, you can experiment with the code to apply various modifications to an input speech signal. In this project you will use the code in the MATLAB file lpc_as_toyou.m, and you will work with a speech signal in WAV format (speechsample.wav). Submit your project using the TURNIN software.
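
For orientation, the core of the analysis stage in Linear Prediction can be sketched with the autocorrelation method and the Levinson-Durbin recursion. The Python code below is a hedged illustration and is not taken from lpc_as_toyou.m; the function names and the sign convention (prediction error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p) are assumptions of this example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.
    Returns (a, err): a[0] == 1 and a[1..order] are the coefficients of the
    prediction error filter; err is the final prediction error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

Filtering the frame with `a` gives the residual (excitation) signal, and filtering an excitation with 1/A(z) resynthesizes speech, which is the point where modifications can be introduced.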

More Helpful files: .m file, .wav file, More files (Right Click -> Save as ...)

2nd Project : Sinusoidal Modeling

Deadline: 24/11/2023

During this project you will explore the Sinusoidal Representation of speech signals and work with a MATLAB implementation of the Sinusoidal Model (SM) proposed by McAulay and Quatieri. The provided MATLAB code contains some incomplete command lines for you to fill in. Once you have done this, you can experiment with the code to perform speech analysis and synthesis based on the SM.
In this project you will use the code in the MATLAB file SinM_test_hy578.m, and you will work with a speech signal in WAV format named arctic_bdl1_snd_norm.wav. Submit your project using the TURNIN software.
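
The two basic building blocks of a sinusoidal model, peak picking on a magnitude spectrum and synthesis as a sum of sinusoids, can be sketched in Python as follows. This is a simplified illustration and is not the code in SinM_test_hy578.m: the function names are made up for the example, the sinusoids are kept stationary within the frame, and the frame-to-frame peak matching and parameter interpolation of the full McAulay-Quatieri model are left out.

```python
import math

def pick_peaks(mag, max_peaks):
    """Return the bin indices of the largest local maxima of a magnitude
    spectrum, in increasing frequency order."""
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k - 1] < mag[k] >= mag[k + 1]]
    peaks.sort(key=lambda k: mag[k], reverse=True)
    return sorted(peaks[:max_peaks])

def synthesize_frame(params, fs, n_samples):
    """Sum of stationary sinusoids.
    params is a list of (amplitude, frequency in Hz, phase in radians)."""
    return [sum(a * math.cos(2 * math.pi * f * n / fs + p) for a, f, p in params)
            for n in range(n_samples)]
```

In an analysis and synthesis loop, the amplitude, frequency, and phase at each picked peak would be read from the short-time spectrum of the frame and then passed to the synthesis step.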

More Helpful files: .m file, .wav file, .PDF file.

3rd Project : Vector Quantization & LPC coding

Deadline: 22/12/2023

During this project you will explore the quantization process. More specifically, you will develop a uniform scalar quantizer and a vector quantizer, and you will apply them to the Linear Prediction algorithm studied during the 1st project. Submit your project using the TURNIN software.
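
The two quantizers mentioned above can be illustrated with the short Python sketch below. It is only an outline under stated assumptions: the function names are hypothetical, the scalar quantizer clips out-of-range inputs and reconstructs at cell midpoints, and the vector quantizer assumes that a codebook has already been trained (codebook design is not shown).

```python
import math

def uniform_quantize(x, x_min, x_max, n_bits):
    """Uniform scalar quantizer over [x_min, x_max] with 2**n_bits cells.
    Returns (cell index, reconstructed value at the cell midpoint)."""
    levels = 2 ** n_bits
    step = (x_max - x_min) / levels
    idx = math.floor((x - x_min) / step)
    idx = max(0, min(levels - 1, idx))      # clip out-of-range inputs
    return idx, x_min + (idx + 0.5) * step

def vq_encode(vec, codebook):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))
```

For example, `uniform_quantize(0.3, -1.0, 1.0, 2)` places 0.3 in the cell [0.0, 0.5) and reconstructs it as 0.25, while `vq_encode` transmits only the index of the nearest codeword.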

More Helpful files: Dataset (Right Click -> Save as ...)

4th Project : Speech Enhancement

Deadline: 22/12/2023

This is by far the shortest project announcement in this course. During this project you will develop a speech enhancement tool in MATLAB, using the techniques of Spectral Subtraction and Wiener Filtering. Submit your project using the TURNIN software.
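
Both techniques can be viewed as a gain applied to each frequency bin of the noisy short-time spectrum. The Python sketch below shows one common textbook form of each gain; it is an illustration, not the required implementation. The function names, the spectral floor parameter `floor`, and the assumption that a noise power estimate is already available (for example from non-speech frames) are all choices made for this example.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power, floor=0.01):
    """Gain for one bin using power spectral subtraction.
    The clean power estimate is floored at floor * noisy_power so that
    it never becomes negative."""
    if noisy_power <= 0:
        return 0.0
    clean_est = max(noisy_power - noise_power, floor * noisy_power)
    return math.sqrt(clean_est / noisy_power)

def wiener_gain(noisy_power, noise_power):
    """Wiener gain S / (S + N) for one bin, with the clean speech power S
    estimated as max(noisy_power - noise_power, 0)."""
    clean_est = max(noisy_power - noise_power, 0.0)
    total = clean_est + noise_power
    return clean_est / total if total > 0 else 0.0
```

The enhanced spectrum is obtained by multiplying each noisy bin by its gain while keeping the noisy phase, followed by an inverse transform and overlap-add.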

More Helpful files: .m file, Dataset (Right Click -> Save as ...)

5th Project : Speaker Identification

Deadline: 22/1/2024

During this project you will develop an automatic speaker identification system. More specifically, the identification system is split into two modules, which you will develop: the feature extraction module and the classification (machine learning) module. You will use MFCCs as features and GMMs as the classification module. Submit your project using the TURNIN software.
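
The classification step can be sketched in Python as follows: each speaker is represented by a GMM, and the identified speaker is the one whose model gives the highest total log-likelihood over the feature frames of the test utterance. This is a minimal sketch under several assumptions: the feature vectors (MFCCs in this project) are already extracted, the GMMs are already trained and have diagonal covariances, and the function names and the model layout (a dict from speaker name to weights, means, variances) are hypothetical.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log p(x) under a diagonal-covariance GMM, computed with log-sum-exp."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def identify_speaker(frames, models):
    """Pick the speaker whose GMM maximizes the summed frame log-likelihood.
    models maps a speaker name to a tuple (weights, means, variances)."""
    return max(models,
               key=lambda name: sum(gmm_log_likelihood(f, *models[name])
                                    for f in frames))
```

Summing log-likelihoods over frames treats the frames as independent, which is the usual simplification in GMM-based speaker identification.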

More Helpful files: Dataset, Tutorial

Digital Speech Signal Processing

Projects

Project 0 should be submitted via e-mail (check info below).

Projects 1 to 5 should be submitted via the TURNIN software.

You can find information for TURNIN here.
