Projects

Projects 1 to 5 should be submitted via the TURNIN software.

You can find information here.



0-th Project - Part 1: Time domain of speech

During this lab, you will be introduced to speech signals and their short-time processing.
You will see the time-domain structure of the most basic speech elements, such as vowels, plosives, and other consonants. You will learn about time-domain properties of speech signals and you will calculate basic measures on speech, such as short-time energy and zero-crossing rate.
The final goal of this lab is to implement a simple, adaptive, fully automated voiced/unvoiced/silence (VUS) discriminator. An algorithm of this kind is of practical use in real-time systems such as mobile communication systems. A typical Voice Activity Detector (VAD), which solves a subset of the VUS problem (speech versus non-speech), is used in the Global System for Mobile Communications (GSM), the European system for cellular communications. Such an application must be fast and efficient enough to run in real time, which is why the algorithm is based on inexpensive measures of the speech signal: energy and zero-crossings. For this lab, we will consider a non-real-time approach, meaning that the whole signal is available for our measurements. Send your project to: kafentz AT csd dot uoc dot gr.
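
As a rough illustration of the two measures named above, the sketch below computes short-time energy and zero-crossing rate per frame. The course material is in MATLAB; plain Python is used here only to keep the example self-contained. The 8 kHz sampling rate, the frame and hop lengths, and the synthetic test signals are assumptions made for the example, not values taken from the lab handout.

```python
import math

def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (an incomplete tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Synthetic stand-ins for real speech (assumption): a 100 Hz tone plays the
# role of a voiced segment, a weak sample-by-sample alternating signal plays
# the role of an unvoiced one.
fs = 8000
voiced_like = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
unvoiced_like = [0.05 * (-1) ** n for n in range(800)]

for name, segment in (("voiced-like", voiced_like), ("unvoiced-like", unvoiced_like)):
    first = frames(segment, frame_len=240, hop=80)[0]
    print(name, short_time_energy(first), zero_crossing_rate(first))
```

The voiced-like frame shows high energy and a low zero-crossing rate, and the unvoiced-like frame shows the opposite pattern. A VUS discriminator would compare such per-frame values against thresholds, and choosing those thresholds adaptively is the actual task of the lab.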

More Helpful files: .m file, .wav file

0-th Project - Part 2 : Frequency domain of speech

During this lab, you will be introduced to frequency-domain analysis of speech signals. You will see the frequency-domain structure of the most basic speech elements, such as vowels, plosives, and other consonants, using the Fast Fourier Transform (FFT). You will learn about time-frequency representations of speech signals with the help of Short-Time Fourier Analysis (the spectrogram). The spectrogram can be produced using wideband or narrowband analysis. Wideband analysis uses a short analysis window, which gives good time resolution, whereas narrowband analysis uses a long analysis window, which gives good frequency resolution. Finally, you will learn how to estimate basic speech parameters, such as pitch.
The final goal of this lab is to implement a simple system that detects a speaker's gender (male or female) and age group (adult or child). Send your project to: kafentz AT csd dot uoc dot gr.
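
Pitch estimation, mentioned above, can be approached in several ways. The sketch below uses one common time-domain method, picking the peak of the short-time autocorrelation. It is not necessarily the method expected in the lab, which focuses on the frequency domain. The 50 to 400 Hz search range, the 8 kHz sampling rate, and the synthetic two-harmonic test frame are assumptions made for the example.

```python
import math

def autocorr_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate pitch in Hz as fs / lag, where lag maximizes the frame's
    autocorrelation within the lag range implied by [f_min, f_max]."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)

    def r(lag):
        # Autocorrelation at the given lag (no normalization).
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=r)
    return fs / best_lag

# Synthetic "voiced" frame (assumption): fundamental at 200 Hz plus a weaker
# second harmonic, 50 ms long at 8 kHz.
fs = 8000
f0 = 200.0
frame = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
         for n in range(400)]

pitch = autocorr_pitch(frame, fs)
print(pitch)
```

On real speech the estimate is noisier than on this clean signal, and a gender or age decision would typically rely on pitch statistics over many voiced frames rather than on a single frame.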

More Helpful files: .wav file, Other files to test

0-th Project - Part 3 : Phonetics Assignment

This part includes three acoustic-phonetic exercises that will train you to locate phonetic segments based on their waveform and spectrogram properties. The transcription of all sentences is provided for you in brackets. Note that you must use phonetic symbols, not orthographic symbols, for labeling. In exercise 1 you are asked to segment and label the waveforms of two English sentences. In exercise 2 you are asked to segment and label the spectrograms of two English sentences. In exercise 3 you are given the waveform and corresponding spectrogram of a mystery seven-digit telephone number in English. On the basis of the acoustic characteristics displayed in the waveform and spectrogram, you need to decide which digits are shown and fill them in on the top tier, and then segment and label each digit on the bottom tier. Send your project to: asfakianaki AT csd dot uoc dot gr.

More

1st Project : Linear Prediction

During this project you will explore Linear Prediction theory and a MATLAB implementation of a Linear Prediction based analysis and synthesis system for speech. The MATLAB code contains some incomplete lines for you to fill in. Once you have done this, you can experiment with the code to apply various modifications to an input speech signal. In this project you will use the code in the MATLAB file lpc_as_toyou.m and a speech signal in WAV format (speechsample.wav). Submit your project using the TURNIN software.
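
For orientation, the analysis step of Linear Prediction can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a generic textbook formulation in plain Python, not the contents of lpc_as_toyou.m. The function names and the synthetic test signal (the impulse response of a known second-order all-pole filter) are assumptions made for the example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sample list x."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for the predictor
    x_hat[n] = a[0]*x[n-1] + ... + a[order-1]*x[n-order].
    Returns (coefficients, final prediction error energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused padding so indices match lags
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# Test signal (assumption): impulse response of the all-pole filter
# x[n] = 1.3*x[n-1] - 0.4*x[n-2], whose poles lie at 0.8 and 0.5.
x = [1.0, 1.3]
for _ in range(398):
    x.append(1.3 * x[-1] - 0.4 * x[-2])

coeffs, err = levinson_durbin(autocorrelation(x, 2), 2)
print(coeffs, err)
```

Because the test signal was generated by a second-order all-pole filter, a second-order analysis recovers that filter's coefficients almost exactly. Real speech frames are only approximately all-pole, so a higher order and a window are normally used.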

More Helpful files: .m file, .wav file, More files

2nd Project : Sinusoidal Modeling

During this project you will explore the sinusoidal representation of speech signals and work with a MATLAB implementation of the Sinusoidal Model (SM) proposed by McAulay and Quatieri. The provided MATLAB code contains some empty lines for you to fill in. Once you have done this, you can experiment with the code to perform speech analysis and synthesis based on the SM.
In this project you will use the code in the MATLAB file SinM_test_hy578.m and a speech signal in WAV format named arctic_bdl1_snd_norm.wav. Submit your project using the TURNIN software.

More Helpful files: .m file, .wav file, .PDF file.

3rd Project : Vector Quantization & LPC coding

During this project you will explore the quantization process. More specifically, you will develop a uniform scalar quantizer and a vector quantizer, and you will apply them to the Linear Prediction algorithm studied in the 1st project. Submit your project using the TURNIN software.
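
The two quantizer types named above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the scalar quantizer is a mid-rise design over an assumed range of [-1, 1], and the vector quantizer shows only the nearest-neighbor encoding step against a hand-written codebook. Codebook training is not shown, and all function names are hypothetical.

```python
def uniform_quantize(x, n_bits, x_min=-1.0, x_max=1.0):
    """Mid-rise uniform scalar quantizer.
    Returns (cell index, reconstructed value at the cell center)."""
    levels = 2 ** n_bits
    step = (x_max - x_min) / levels
    clipped = min(max(x, x_min), x_max)          # saturate out-of-range input
    index = min(int((clipped - x_min) / step), levels - 1)
    return index, x_min + (index + 0.5) * step

def vq_encode(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    def dist(i):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=dist)

# Scalar example: 3 bits give 8 cells of width 0.25 over [-1, 1].
index, value = uniform_quantize(0.3, n_bits=3)
print(index, value)

# Vector example with a small illustrative codebook (assumption).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
print(vq_encode([0.9, 1.1], codebook))
```

For in-range inputs the scalar quantization error never exceeds half a step, which is the usual starting point for relating the number of bits to the signal-to-quantization-noise ratio.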

More Helpful files: Dataset

4th Project : Speaker Identification

During this project you will develop an automatic speaker identification system. The system is split into two modules, both of which you will develop: a feature extraction module and a classification (machine learning) module. You will use mel-frequency cepstral coefficients (MFCCs) as features and Gaussian mixture models (GMMs) as the classifier. Submit your project using the TURNIN software.

More Helpful files: Dataset, Tutorial

5th Project : Speech Enhancement

This is by far the shortest project announcement in this course. During this project you will develop a speech enhancement tool in MATLAB, using the techniques of Spectral Subtraction and Wiener Filtering. Submit your project using the TURNIN software.
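
Both techniques can be viewed as per-frequency-bin gains applied to the noisy spectrum. The sketch below shows one basic variant of each in plain Python, operating on already computed power spectra. The spectral floor value, the way the SNR is estimated from the noisy and noise powers, and the function names are assumptions made for the example. Many other variants exist, and the project handout may specify a different one.

```python
def spectral_subtraction_gain(noisy_power, noise_power, floor=0.01):
    """Amplitude gain per bin for power spectral subtraction.
    The estimated clean power is never allowed below floor * noisy power."""
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        if py <= 0:
            gains.append(0.0)
            continue
        clean = max(py - pn, floor * py)
        gains.append((clean / py) ** 0.5)
    return gains

def wiener_gain(noisy_power, noise_power):
    """Wiener gain per bin, Ps / (Ps + Pn), with the clean speech power Ps
    estimated as max(noisy power - noise power, 0)."""
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        ps = max(py - pn, 0.0)
        gains.append(ps / (ps + pn) if ps + pn > 0 else 0.0)
    return gains

# Three illustrative bins (assumption): high, moderate, and zero SNR.
noisy = [10.0, 2.0, 1.0]
noise = [1.0, 1.0, 1.0]
print(spectral_subtraction_gain(noisy, noise))
print(wiener_gain(noisy, noise))
```

In a complete tool, each gain would multiply the corresponding bin of the noisy short-time spectrum, the noisy phase would be kept, and the frames would be resynthesized by overlap-add. The noise power itself would have to be estimated, for example from frames that contain no speech.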

More Helpful files: .m file, Dataset