Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection R Tao, Z Pan, RK Das, X Qian, MZ Shou, H Li Proceedings of the 29th ACM International Conference on Multimedia, 3927-3935, 2021 | 101 | 2021 |
Multi-modal Attention for Speech Emotion Recognition Z Pan, Z Luo, J Yang, H Li Proc. Interspeech 2020, 364-368, 2020 | 56 | 2020 |
Muse: Multi-modal target speaker extraction with visual cues Z Pan, R Tao, C Xu, H Li ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 24 | 2021 |
Multi-target DoA estimation with an audio-visual fusion mechanism X Qian, M Madhavi, Z Pan, J Wang, H Li ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 23 | 2021 |
Selective listening by synchronizing speech with lips Z Pan, R Tao, C Xu, H Li IEEE/ACM Transactions on Audio, Speech and Language Processing 30, 1650-1664, 2022 | 22 | 2022 |
USEV: Universal speaker extraction with visual cue Z Pan, M Ge, H Li IEEE/ACM Transactions on Audio, Speech and Language Processing 30, 3032-3045, 2022 | 17 | 2022 |
Speaker Extraction with Co-Speech Gestures Cue Z Pan, X Qian, H Li IEEE Signal Processing Letters 29, 1467-1471, 2022 | 11 | 2022 |
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction Z Pan, M Ge, H Li Proc. Interspeech 2022, 2022 | 7 | 2022 |
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network J Li, M Ge, Z Pan, L Wang, J Dang Proc. Interspeech 2022, 906-910, 2022 | 6 | 2022 |
Target Active Speaker Detection with Audio-visual Cues Y Jiang, R Tao, Z Pan, H Li arXiv preprint arXiv:2305.12831, 2023 | 3 | 2023 |
Time-domain speech separation networks with graph encoding auxiliary T Wang, Z Pan, M Ge, Z Yang, H Li IEEE Signal Processing Letters 30, 110-114, 2023 | 2 | 2023 |
Rethinking the visual cues in audio-visual speaker extraction J Li, M Ge, Z Pan, R Cao, L Wang, J Dang, S Zhang arXiv preprint arXiv:2306.02625, 2023 | 1 | 2023 |
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech J Li, R Tao, Z Pan, M Ge, S Wang, H Li arXiv preprint arXiv:2309.08408, 2023 | | 2023 |
NeuroHeed: Neuro-Steered Speaker Extraction using EEG Signals Z Pan, M Borsdorf, S Cai, T Schultz, H Li arXiv preprint arXiv:2307.14303, 2023 | | 2023 |
Towards End-to-end Speaker Diarization in the Wild Z Pan, G Wichern, FG Germain, A Subramanian, JL Roux arXiv preprint arXiv:2211.01299, 2022 | | 2022 |
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting Z Pan, W Wang, M Borsdorf, H Li ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2022 | | 2022 |
Speaker Extraction with Detection of Presence and Absence of Target Speakers K Zhang, M Borsdorf, Z Pan, H Li, Y Wei, Y Wang | | |