A Review of Voicing Decision in Whispered Speech: From Rules to Machine Learning
83 pages. Posted: 9 Feb 2025
Abstract
Speech is a fundamental medium of human communication and can be produced in different modes, such as normal and whispered speech, each with distinct acoustic properties. Normal speech relies on vocal fold vibration, which produces a rich harmonic structure that supports intelligibility and vocal projection. Whispered speech is produced without such vibration and yields a noise-like signal with reduced clarity. Individuals with impaired phonation, resulting from conditions such as vocal fold paralysis or laryngeal trauma, are often forced to communicate in whispered speech involuntarily, which creates significant communication challenges. To address these challenges, whispered-to-normal speech conversion systems have been developed to reconstruct the missing voicing components of whispered speech and thereby improve speech quality. Central to the effectiveness of these systems is the voicing decision process, which classifies speech segments as candidates or non-candidates for voicing so that harmonic structure is restored only where appropriate. This review provides a comprehensive examination of the voicing decision process within whispered-to-normal speech conversion systems. By analyzing current methodologies and identifying research gaps, it underscores the need for advances in voicing decision to improve communication, and ultimately quality of life, for individuals with phonation disorders. Recent work shows a shift from rule-based methods toward machine learning approaches, reflecting the growing effectiveness and potential of the latter.
Keywords: Voicing decision, candidate for voicing, whispered speech
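The abstract describes voicing decision as a per-segment classification into candidates and non-candidates for voicing, but it does not say which cues rule-based methods use. As a minimal sketch, assuming short-time energy and zero-crossing rate (ZCR) as the cues, the snippet below marks a frame as a voicing candidate when it is comparatively strong and has a low ZCR. The underlying assumption is that whispered vowels, which would be voiced in normal speech, keep this profile, whereas silence and fricative-like noise do not. All function names, thresholds, and the synthetic test signal are hypothetical illustrations, not the reviewed methods.

```python
import math
from typing import List, Sequence


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split x into (possibly overlapping) frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (0.0 to 1.0)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def voicing_candidates(
    x: Sequence[float],
    frame_len: int = 256,
    hop: int = 128,
    energy_thresh: float = 0.01,  # hypothetical threshold, not from the review
    zcr_thresh: float = 0.3,      # hypothetical threshold, not from the review
) -> List[bool]:
    """Rule-based sketch: True marks a frame as a candidate for voicing."""
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop >= 1")
    labels = []
    for frame in frame_signal(x, frame_len, hop):
        # Assumed rule: strong, low-ZCR frames (e.g. whispered vowels) are candidates;
        # silence and high-ZCR, fricative-like frames are non-candidates.
        labels.append(short_time_energy(frame) > energy_thresh
                      and zero_crossing_rate(frame) < zcr_thresh)
    return labels


def synthetic_demo_signal(fs: int = 8000) -> List[float]:
    """Hypothetical stand-in input: silence, a 500 Hz tone, then an alternating +/-0.5 sequence."""
    silence = [0.0] * 512
    tone = [0.5 * math.sin(2 * math.pi * 500 * n / fs) for n in range(512)]
    alternating = [0.5 if n % 2 == 0 else -0.5 for n in range(512)]
    return silence + tone + alternating


if __name__ == "__main__":
    print(voicing_candidates(synthetic_demo_signal(), frame_len=256, hop=256))
```

With non-overlapping 256-sample frames, only the two tone frames pass both rules; the silent frames fail the energy test and the alternating frames fail the ZCR test. The machine learning approaches the review points to would, in this framing, replace the two hand-set thresholds with a classifier trained on labeled frames.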