ENRICH: EU Project 675324 Marie Curie (MSCA) Innovative Training Network
2016-2020



Greek Harvard Sentences

A Harvard-based corpus for speech technology and audiology



The current material consists of 720 sentences of variable syntactic structure, designed according to the following criteria. Each sentence comprises exactly 5 keywords, which are almost always content words, and optionally 1 to 4 non-keywords, which are pronouns and other function words; total sentence length therefore ranges from 5 to 9 words. All words contain one, two or at most three syllables, and have been selected so that the sentences are meaningful and resemble everyday language. Keywords have been combined so that the sentences are semi-predictable. Although a number of the original Harvard sentences have been translated into Greek, the majority of the sentences in the present corpus are authentic.
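The design criteria above can be expressed as a small consistency check. The following is only a minimal sketch, not the procedure used to build the corpus: the `Token` type and the `violations` function are hypothetical names, keyword status and syllable counts are assumed to be annotated by hand rather than computed, and the example is an English Harvard sentence with an assumed keyword marking, not an item from the Greek material.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    is_keyword: bool  # assumption: keyword status is annotated by hand
    syllables: int    # assumption: syllable count is supplied, not computed


def violations(tokens: List[Token]) -> List[str]:
    """Return the design criteria a sentence breaks (empty list if none)."""
    problems = []
    n_keywords = sum(1 for t in tokens if t.is_keyword)
    n_other = len(tokens) - n_keywords
    # Exactly 5 keywords per sentence.
    if n_keywords != 5:
        problems.append(f"expected exactly 5 keywords, found {n_keywords}")
    # At most 4 non-keywords, so total length stays between 5 and 9 words.
    if n_other > 4:
        problems.append(f"expected at most 4 non-keywords, found {n_other}")
    # Every word has one, two or three syllables.
    for t in tokens:
        if not 1 <= t.syllables <= 3:
            problems.append(f"'{t.text}' has {t.syllables} syllables")
    return problems


# Illustration only: an English Harvard sentence, keyword marking assumed.
example = [
    Token("the", False, 1),
    Token("birch", True, 1),
    Token("canoe", True, 2),
    Token("slid", True, 1),
    Token("on", False, 1),
    Token("the", False, 1),
    Token("smooth", True, 1),
    Token("planks", True, 1),
]

print(violations(example))  # prints []
```

Taking syllable counts as input keeps the sketch independent of any particular Greek syllabification rule, which the description above does not specify.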

This is still work in progress, so please check for updates and additional recordings.

You can find below:


To cite this work:

Sfakianaki, A. "Designing a Modern Greek sentence corpus for audiological and speech technology research". To appear in Proc. of the 14th International Conference on Greek Linguistics (ICGL14), September 5-8, 2019, University of Patras, Greece. [Unpublished PDF]

Conference website: https://icgl14.events.upatras.gr/

Material: