2020 Speech Processing Courses in Crete
Conversational Speech Synthesis: from design to evaluation

27-31 July 2020    University of Crete, Heraklion, Crete, Greece

LECTURERS (confirmed so far)


Paris Smaragdis is a faculty member in the Computer Science and Electrical and Computer Engineering departments of the University of Illinois at Urbana-Champaign. He completed his master's, Ph.D., and postdoctoral studies at MIT, performing research on computational audition. In 2006 he was selected by MIT's Technology Review as one of the year's top young technology innovators for his work on machine listening, in 2015 he was elevated to IEEE Fellow for contributions to audio source separation and audio processing, and during 2016-2017 he was an IEEE Signal Processing Society Distinguished Lecturer. He has authored more than 150 papers on various aspects of audio signal processing, holds more than 50 patents worldwide, and his research has been productized by multiple companies. He has previously chaired the LVA/ICA community and the IEEE Machine Learning for Signal Processing Technical Committee. He is currently the chair of the IEEE Audio and Acoustic Signal Processing Technical Committee, a senior area editor of the IEEE Open Journal of Signal Processing, and a member of the IEEE Signal Processing Society's Board of Governors.

Vassilis Tsiaras received his degree in Mathematics from the University of Thessaloniki in 1990, his M.Sc. in Mathematics from QMW College, University of London, in 1992, and his Ph.D. in Computer Science from the University of Crete in 2009. His research interests include graph algorithms, biomedical signal processing, statistical speech synthesis, and machine learning.

Junichi Yamagishi is a Professor at NII, Japan. His research topics include speech processing, machine learning, signal processing, biometrics, digital media cloning, and media forensics. He previously served as a co-organizer of the biennial ASVspoof special sessions at INTERSPEECH from 2013 to 2019, the biennial Voice Conversion Challenge at INTERSPEECH 2016 and Odyssey 2018, an organizing committee member for the 10th ISCA Speech Synthesis Workshop 2019, and a technical program committee member for IEEE ASRU 2019. He also served as a member of the IEEE Speech and Language Processing Technical Committee, an Associate Editor of the IEEE/ACM TASLP, and a Lead Guest Editor for the IEEE JSTSP Special Issue on Spoofing and Countermeasures for Automatic Speaker Verification. He is currently a PI of the JST-CREST and ANR supported VoicePersona project. He also serves as chairperson of the ISCA SynSIG and as a Senior Area Editor of the IEEE/ACM TASLP.

Jan Skoglund received his Ph.D. degree from Chalmers University of Technology, Sweden. From 1999 to 2000, he worked on low-bit-rate speech coding at AT&T Labs-Research, Florham Park, NJ. From 2000 to 2011 he was with Global IP Solutions (GIPS), San Francisco, CA, working on speech and audio processing tailored for packet-switched networks. GIPS' audio and video technology was deployed by companies such as IBM, Google, Yahoo, WebEx, Skype, and Samsung. Since the acquisition of GIPS in 2011, he has been part of Chrome at Google, Inc. He leads a team in San Francisco, CA, developing speech and audio signal processing components for capture, real-time communication, storage, and rendering.

Xin Wang is a project researcher at the National Institute of Informatics, Japan. He received his Ph.D. degree from SOKENDAI, Japan, in 2018 for his work on neural F0 modeling for text-to-speech synthesis. Before that, he received M.S. and B.E. degrees from the University of Science and Technology of China and the University of Electronic Science and Technology of China in 2015 and 2012, respectively. He is one of the organizers of the ASVspoof 2019 challenge.

Sivanand Achanta is a research engineer on the Apple Siri Text-to-Speech team, where he is responsible for core research in neural text-to-speech synthesis. He received his Ph.D. in text-to-speech synthesis from the International Institute of Information Technology, Hyderabad (IIIT-H), India. His thesis focused on using recurrent neural networks for statistical speech synthesis and extending them to multi-lingual speech synthesis using a single recurrent model. His research interests include neural vocoders, sequence-to-sequence modeling, and novel neural network architectures for speech synthesis.