Anna Sfakianaki

GrHarvard Corpus

The GrHarvard Corpus is a Harvard/IEEE-type sentence corpus for Modern Greek. The sentences were designed according to the following objective criteria: (1) each sentence includes five keywords; (2) the total number of words in each sentence ranges from five to nine; (3) no word has more than three syllables; (4) sentences are either statements or commands. Keywords are mostly content words, although function words can sometimes serve as keywords, depending on sentence structure and meaning. Although the material was inspired in part by the original American English Harvard/IEEE sentences, most sentences have been modified or bear little resemblance to the original material.
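The four design criteria can be expressed as a small checker. What follows is a minimal sketch in Python, not the author's tooling: the `Word` class, the `check_sentence` function, and the placeholder words are hypothetical, and syllable counts and keyword flags are supplied by hand rather than derived from Greek orthography.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str            # orthographic form (placeholder strings in this sketch)
    syllables: int       # syllable count, assumed to be annotated by hand
    is_keyword: bool = False


def check_sentence(words: List[Word], sentence_type: str) -> List[str]:
    """Return the list of violated criteria; an empty list means compliant."""
    problems = []

    # (1) Each sentence includes five keywords.
    n_keywords = sum(1 for w in words if w.is_keyword)
    if n_keywords != 5:
        problems.append(f"expected 5 keywords, found {n_keywords}")

    # (2) The total number of words ranges from five to nine.
    if not 5 <= len(words) <= 9:
        problems.append(f"expected 5 to 9 words, found {len(words)}")

    # (3) No word has more than three syllables.
    too_long = [w.text for w in words if w.syllables > 3]
    if too_long:
        problems.append(f"words over 3 syllables: {too_long}")

    # (4) Sentences are either statements or commands.
    if sentence_type not in {"statement", "command"}:
        problems.append(f"unsupported sentence type: {sentence_type}")

    return problems


# Hypothetical sentence: five keywords plus one function word.
sentence = [
    Word("w1", 2, True),
    Word("w2", 1),
    Word("w3", 3, True),
    Word("w4", 2, True),
    Word("w5", 3, True),
    Word("w6", 2, True),
]
violations = check_sentence(sentence, "statement")
```

Here `violations` is empty because the placeholder sentence satisfies all four criteria. Criterion (4) is treated as a label check only, since deciding whether a sentence is a statement or a command requires human judgment.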

Click on the link below to download an Excel file containing the Greek orthography and SAMPA transcription of each sentence, along with metadata: the number of words, syllables and phonemes per sentence, the keywords, and the number of syllables in the keywords.

GrHarvard corpus

To cite this work:

Sfakianaki, A. (2021). Designing a Modern Greek sentence corpus for audiological and speech technology research. In T. Markopoulos, C. Vlachos, A. Archakis, D. Papazachariou, G. J. Xydopoulos and A. Roussou (Eds.), Proceedings of the 14th International Conference on Greek Linguistics, pp. 1119-1129. University of Patras (ISBN: 978-618-5496-03-6).

Available at: https://pasithee.library.upatras.gr/icgl/article/view/3745/3787
