Semester
Fall
Date of Graduation
2005
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Powsiri Klinkhachorn.
Abstract
For over a thousand years, inventors, scientists and researchers have tried to reproduce human speech. Today, the quality of synthesized speech is not equivalent to the quality of real speech. Most research on speech synthesis focuses on improving the quality of the speech produced by Text-to-Speech (TTS) systems. The best TTS systems use unit selection-based concatenation to synthesize speech. However, this method is very timely and the speech database is very large. Diphone concatenated synthesized speech requires less memory, but sounds robotic. This thesis explores the use of fuzzy logic to make diphone concatenated speech sound more natural. A TTS is built using both neural networks and fuzzy logic. Text is converted into phonemes using neural networks. Fuzzy logic is used to control the fundamental frequency for three types of sentences. In conclusion, the fuzzy system produces f0 contours that make the diphone concatenated speech sound more natural.
Recommended Citation
Williams, Jonathan Brent, "Prosody in text-to-speech synthesis using fuzzy logic" (2005). Graduate Theses, Dissertations, and Problem Reports. 1690.
https://researchrepository.wvu.edu/etd/1690