A Review of Voicing Decision in Whispered Speech: From Rules to Machine Learning

83 Pages. Posted: 9 Feb 2025

João Pinto da Silva

INESC TEC - Institute for Systems and Computer Engineering, Technology and Science

Gonçalo Duarte Nunes

INESC TEC - Institute for Systems and Computer Engineering, Technology and Science

Aníbal Ferreira

affiliation not provided to SSRN

Abstract

Speech is a fundamental medium of human communication and encompasses diverse modes, such as normal and whispered speech, each with distinct acoustic properties. Normal speech relies on vocal fold vibration, which produces a rich harmonic structure that enhances intelligibility and vocal projection. Whispered speech lacks this vibration and yields a noisier signal with diminished clarity. Individuals with impaired phonation, resulting from conditions such as vocal fold paralysis or laryngeal trauma, often produce whispered speech involuntarily, which creates significant communication challenges. In response, whispered-to-normal speech conversion systems have been developed to reconstruct the missing voicing components of whispered speech and thereby improve speech quality. Central to the effectiveness of these systems is the voicing decision process, which classifies speech segments into candidates and non-candidates for voicing so that harmonic structure is restored only where appropriate. This review examines the voicing decision process within whispered-to-normal speech conversion systems. By analyzing current methodologies and identifying research gaps, it highlights the need for advances in the voicing decision process to improve communication, and ultimately quality of life, for individuals with phonation disorders. Recent trends show a shift from rule-based methods to machine learning approaches, reflecting the increasing effectiveness and potential of the latter.

Keywords: Voicing decision, candidate for voicing, whispered speech

Suggested Citation

Pinto da Silva, João and Duarte Nunes, Gonçalo and Ferreira, Aníbal, A Review of Voicing Decision in Whispered Speech: From Rules to Machine Learning. Available at SSRN: https://ssrn.com/abstract=5123047 or http://dx.doi.org/10.2139/ssrn.5123047

João Pinto da Silva (Contact Author)

INESC TEC - Institute for Systems and Computer Engineering, Technology and Science

Campus da FEUP Rua Dr. Roberto Frias
Porto, 4200-465
Portugal

Gonçalo Duarte Nunes

INESC TEC - Institute for Systems and Computer Engineering, Technology and Science

Campus da FEUP Rua Dr. Roberto Frias
Porto, 4200-465
Portugal

Aníbal Ferreira

affiliation not provided to SSRN
