Target Speaker Extraction, End-to-end, Complex spectral mapping, Time-frequency domain, Adaptive speaker embedding fusion
Speaker diarization, End-to-end, Adaptive attractor estimation, Iterative refinement, Unified training
Monaural speech enhancement, time-frequency domain optimization, magnitude-phase estimation, trade-off coefficients, supervised deep learning
Multi-channel speech enhancement, Taylor series expansion, neural networks, multi-source information fusion
Bone conduction transducer, lumped-parameter model, electrical input impedance, mastoid impedance
One-shot voice conversion, U$^{2}$-Net structure, Time-frequency multi-scale features, Hierarchical vector quantization
Lip-to-speech, End-to-end training, Differentiable digital signal processing, Speech reconstruction