About Me
Xize Cheng (成曦泽) is a second-year Master’s student (expected to graduate in 2024.03) in the College of Computer Science and Software at Zhejiang University, supervised by Prof. Zhou Zhao.
I am actively looking for academic collaborations; feel free to drop me an email.
🔥 News
- 2023.10: 🎉🎉 I was awarded the National Scholarship (2023, Graduate student). Top 0.1% in Zhejiang University.
- 2023.09: 🎉🎉 1 paper is accepted by EMNLP 2023!
- 2023.09: 🎉🎉 1 paper is accepted by NeurIPS 2023!
- 2023.07: 🎉🎉 1 paper is accepted by ACMMM 2023!
- 2023.05: 🎉🎉 3 papers are accepted by ICCV 2023!
- 2023.06: AV-TranSpeech comes out! Media coverage: PaperWeekly and ByteDance.
- 2023.05: OpenSR will be presented as an oral at ACL 2023!
- 2023.05: 🎉🎉 7 papers are accepted by ACL 2023!
- 2023.03: We created the first audio-visual multilingual speech translation dataset, AVMuST-TED! It will be open-sourced soon!
- 2022.12: OpenSR was well received by the reviewers in the October 2022 ACL-ARR cycle.
- 2022.10: I was awarded the titles of Outstanding Graduate Student and Triple Excellence Graduate Student of Zhejiang University!
- 2021.03: I started my internship at Taobao as an algorithm intern, conducting multi-modality research.
📝 Publications
- OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment. Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao. ACL2023(Oral)
- MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition. Xize Cheng, Tao Jin, Rongjie Huang, Linjun Li, Wang Lin, Zehan Wang, Huadai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao. ICCV2023
- AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation. Rongjie Huang*, Xize Cheng*, Huadai Liu*, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao. ACL2023
Full Publication List
Audio-Visual Speech
- TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation. Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao. Submitted to ICLR2024
- Rethinking Missing Modality Learning from a Decoding Perspective. Tao Jin, Xize Cheng, Linjun Li, Wang Lin, Ye Wang, Zhou Zhao. ACMMM2023
- MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition. Xize Cheng, Tao Jin, Rongjie Huang, Linjun Li, Wang Lin, Zehan Wang, Huadai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao. ICCV2023
- OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment. Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao. ACL2023(Oral)
- AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation. Rongjie Huang*, Xize Cheng*, Huadai Liu*, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao. ACL2023
- Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation. Linjun Li*, Tao Jin*, Xize Cheng*, Ye Wang, Wang Lin, Rongjie Huang and Zhou Zhao. ACL2023 Findings
Multi-modality Interpretation
- Connecting Multi-modal Contrastive Representations. Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao. NeurIPS2023
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding. Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao. EMNLP2023
- Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding. Zehan Wang, Haifeng Huang, Yang Zhao, Xize Cheng, Linjun Li, Yichen Zhu and Zhou Zhao. ICCV2023
- Exploring Group Video Captioning with Efficient Relational Approximation. Wang Lin, Tao Jin, Ye Wang, Wenwen Pan, Linjun Li, Xize Cheng, Zhou Zhao. ICCV2023
- Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning. Ye Wang, Wang Lin, Shengyu Zhang, Tao Jin, Linjun Li, Xize Cheng and Zhou Zhao. ACL2023(Oral)
- TAVT: Towards Transferable Audio-Visual Text Generation. Wang Lin, Tao Jin, Wenwen Pan, Linjun Li, Xize Cheng, Ye Wang and Zhou Zhao. ACL2023
- Semantic-conditioned Dual Adaptation for Cross-domain Query-based Visual Segmentation. Ye Wang, Tao Jin, Wang Lin, Xize Cheng, Linjun Li and Zhou Zhao. ACL2023 Findings
📖 Education
- 2021.09 - 2024.03, Master, Zhejiang University, Hangzhou.
- 2017.09 - 2021.06, Undergraduate, Shandong University, Jinan.
🎖 Honors and Awards
- National Scholarship (2023, Graduate student). Top 0.1% in Zhejiang University.
- Excellent Graduate, Shandong Province (2021), Top 1%.
- Outstanding Student Cadres (2017-2021 in Shandong University and 2021-2023 in Zhejiang University), Top 1%.
- Academic Scholarship (2017-2021 in Shandong University and 2021-2023 in Zhejiang University), Top 3%.
- Outstanding Graduate Student & Triple Excellence Graduate Student (2022) in Zhejiang University.
- First Prize (Meritorious Winner) in American Mathematical Modeling Competition (2019), Top 7% worldwide.
- First Prize of National Mathematical Modeling Competition in Shandong Province (2018).
💬 Professional Services
- Conference Reviewer: ARR 2023, ICCV 2023, ACL 2023
- Assistant Reviewer: KDD 2022, TNNLS 2022, TMM 2022, TMM 2023
💻 Internships & Projects
- 2023.06 - Present: Research Intern: Huawei Cloud at Shenzhen, China.
  Research on Multi-modality-driven Talking Head Generation.
- 2023.02 - 2023.04: Project Staff: Huawei Noah’s Ark Lab.
  Research on Simultaneous Speech Translation.
- 2021.02 - 2021.08: Algorithm Engineer Intern: Taobao (China) Software.
  Research on Multi-modality Interaction.