Synthesis of Emotional Speech by Prosody Modification of Vowel Segments of Neutral Speech

6 Pages Posted: 11 Apr 2019

Md Shah Fahad

National Institute of Technology (NIT), Patna

Shreya Singh

National Institute of Technology (NIT), Patna

Shruti Gupta

National Institute of Technology (NIT), Patna

Akshay Deepak

National Institute of Technology (NIT), Patna - Department of Computer Science and Engineering

Abhinav

National Institute of Technology (NIT), Patna

Date Written: February 8, 2019

Abstract

Speech can be viewed as a combination of voiced and unvoiced regions. Voiced speech is produced by vibration of the vocal cords, and the vibration pattern differs across emotions. During the production of some consonant sounds, the vocal cords do not vibrate, so consonants contribute less to conveying emotion in the speech signal. In this paper, we therefore consider only vowel regions for emotion synthesis, using three prosody parameters: duration, intensity, and pitch pattern. Vowel-like regions (VLRs) are identified using vowel onset and offset points, which mark the start and end of each region. We observe that, when emotional speech is synthesized from neutral speech, it is mainly the vowel regions of the utterance that are modified significantly. Our experimental results show that emotion synthesis using prosody modification of VLRs alone performs better than prosody modification at the syllable level and is also more time-efficient. The average mean opinion scores for vowel-level prosody modification are 3.85, 3.60, and 4.03 for angry, happy, and fear emotional speech, respectively, compared with 3.56, 3.17, and 3.92 for syllable-level prosody modification.

Keywords: Duration, Emotional Speech, Intensity, Pitch, PSOLA, Prosody Modification, Vowel Onset-Offset Points
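
The idea of restricting prosody modification to vowel-like regions can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the intensity parameter (pitch and duration changes, which the keywords associate with PSOLA, are not shown), and it assumes the onset and offset points are already available as sample indices. The function name `scale_vowel_intensity`, the `regions` format, and the gain value are hypothetical choices made for this illustration.

```python
from typing import List, Tuple


def scale_vowel_intensity(
    samples: List[float],
    regions: List[Tuple[int, int]],
    gain: float,
) -> List[float]:
    """Scale amplitude only inside vowel-like regions (VLRs).

    samples: mono audio samples (hypothetical input format).
    regions: (onset, offset) sample indices per VLR; offset is exclusive.
    gain:    amplitude multiplier; the value is an assumption, not from the paper.
    """
    out = list(samples)  # copy, so samples outside the VLRs stay untouched
    for onset, offset in regions:
        if not 0 <= onset <= offset <= len(samples):
            raise ValueError(f"invalid region: ({onset}, {offset})")
        for i in range(onset, offset):
            out[i] = samples[i] * gain
    return out


if __name__ == "__main__":
    neutral = [1.0, 2.0, 3.0, 4.0, 5.0]
    # One vowel-like region covering samples 1 and 2.
    print(scale_vowel_intensity(neutral, [(1, 3)], gain=2.0))
```

Only the samples between each onset and offset change, so consonant and silence regions pass through unmodified, which is also why a vowel-level approach would process less of the signal than a syllable-level one.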

Suggested Citation

Fahad, Md Shah and Singh, Shreya and Gupta, Shruti and Deepak, Akshay and Abhinav, Abhinav, Synthesis of Emotional Speech by Prosody Modification of Vowel Segments of Neutral Speech (February 8, 2019). Proceedings of 2nd International Conference on Advanced Computing and Software Engineering (ICACSE) 2019. Available at SSRN: https://ssrn.com/abstract=3349023 or http://dx.doi.org/10.2139/ssrn.3349023

Md Shah Fahad (Contact Author)

National Institute of Technology (NIT), Patna

Ashok Rajpath, Mahendru
Patna, Bihar 800005
India
