affiliation not provided to SSRN
Lip-to-speech, End-to-end training, Differentiable digital signal processing, Speech reconstruction