Machine Learning Mitigants for Speech Based Cyber Risk

73 Pages. Posted: 31 Jul 2020. Last revised: 25 Aug 2021.

Marta Campi

Institut Pasteur - Hearing Institute

Gareth Peters

University of California, Santa Barbara

Nourddine Azzaoui

Mathematics Department, Université Blaise Pascal

Tomoko Matsui

The Institute of Statistical Mathematics

Date Written: August 23, 2021

Abstract

Statistical analysis of speech is an emerging area of machine learning. In this paper, we tackle a biometric challenge in Automatic Speaker Verification (ASV): differentiating between samples drawn from two distinct populations of utterances, those produced by an authentic human voice and those produced by a synthetic one. Addressing this problem from a statistical perspective requires defining a decision rule function and a learning procedure to identify the optimal classifier. Classical state-of-the-art countermeasures rely on strong assumptions, such as stationarity or local stationarity of speech, that rarely hold in practice. We therefore explore a robust non-linear and non-stationary signal decomposition method, the Empirical Mode Decomposition (EMD), combined in a novel fashion with Mel-Frequency Cepstral Coefficients (MFCC), together with a refined classifier technique known as the multi-kernel Support Vector Machine (SVM). We undertake substantial real-data case studies covering multiple ASV systems and datasets, including the ASVspoof 2019 challenge database. The results strongly demonstrate the advantage of our feature extraction and classification approach over conventional methods in reducing the threat of cyber-attacks that use synthetic voice replication to seek unauthorised access.
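The multi-kernel SVM named in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. It builds one RBF kernel per feature block and combines them as a convex sum, which is again a valid (positive semi-definite) kernel, then trains an SVM on the precomputed Gram matrix. This is a simplified stand-in, not the authors' method: the kernel weights here are fixed by hand rather than learned (as full multiple kernel learning would do), the two feature blocks are synthetic placeholders for MFCC-type and EMD-type features, and the helper names `combined_kernel`, `fit_multi_kernel_svm` and `predict_multi_kernel_svm` are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def combined_kernel(blocks_x, blocks_y, weights, gammas):
    """Convex combination of RBF kernels, one kernel per feature block.

    blocks_x / blocks_y: lists of arrays of shape (n_x, d_k) / (n_y, d_k),
    with the feature blocks in the same order. Weights are FIXED here
    (an assumption); true multi-kernel learning would optimise them.
    """
    if min(weights) < 0 or not np.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    K = np.zeros((blocks_x[0].shape[0], blocks_y[0].shape[0]))
    for Xk, Yk, w, g in zip(blocks_x, blocks_y, weights, gammas):
        K += w * rbf_kernel(Xk, Yk, gamma=g)  # weighted per-block RBF kernel
    return K


def fit_multi_kernel_svm(train_blocks, y, weights, gammas, C=1.0):
    """Train an SVM on the precomputed train-vs-train combined kernel."""
    K_train = combined_kernel(train_blocks, train_blocks, weights, gammas)
    clf = SVC(kernel="precomputed", C=C)
    clf.fit(K_train, y)
    return clf


def predict_multi_kernel_svm(clf, test_blocks, train_blocks, weights, gammas):
    """Predict using the test-vs-train kernel, shape (n_test, n_train)."""
    K_test = combined_kernel(test_blocks, train_blocks, weights, gammas)
    return clf.predict(K_test)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    y = np.repeat([0, 1], n // 2)  # toy labels: 0 = genuine, 1 = synthetic
    # Hypothetical feature blocks standing in for MFCC-type and EMD-type features.
    block_a = rng.normal(size=(n, 13)) + y[:, None] * 1.5
    block_b = rng.normal(size=(n, 5)) + y[:, None] * 1.0
    idx = rng.permutation(n)
    tr, te = idx[:150], idx[150:]
    train_blocks = [block_a[tr], block_b[tr]]
    test_blocks = [block_a[te], block_b[te]]
    weights, gammas = [0.6, 0.4], [1.0 / 13, 1.0 / 5]
    clf = fit_multi_kernel_svm(train_blocks, y[tr], weights, gammas)
    pred = predict_multi_kernel_svm(clf, test_blocks, train_blocks, weights, gammas)
    print("test accuracy:", np.mean(pred == y[te]))
```

Keeping each feature family in its own kernel lets the weights express how much each representation contributes to the decision rule; in practice the weights and per-kernel `gamma` values would be tuned or learned rather than set by hand as above.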

Keywords: Speech Biometric Cyber Security, Automatic Speaker Verification, Support Vector Machines, Non-Stationary Feature Extraction, Empirical Mode Decomposition, Cyber Risk Mitigation

Suggested Citation

Campi, Marta and Peters, Gareth and Azzaoui, Nourddine and Matsui, Tomoko, Machine Learning Mitigants for Speech Based Cyber Risk (August 23, 2021). Available at SSRN: https://ssrn.com/abstract=3643826 or http://dx.doi.org/10.2139/ssrn.3643826

Marta Campi

Institut Pasteur - Hearing Institute

France

Gareth Peters (Contact Author)

University of California, Santa Barbara

Santa Barbara, CA 93106
United States

Nourddine Azzaoui

Mathematics Department, Université Blaise Pascal

24 Avenue des Landais
63117 Aubière Cedex
France

Tomoko Matsui

The Institute of Statistical Mathematics

10-3 Midori-cho
Tachikawa-shi
Tokyo 190-8562
Japan
