affiliation not provided to SSRN
One-shot voice conversion, U$^{2}$-Net structure, Time-frequency multi-scale features, Hierarchical vector quantization
Lip-to-speech, End-to-end training, Differentiable digital signal processing, Speech reconstruction