Resume

Last Updated: 2024-3-2

Education

Carnegie Mellon University (CMU)

M.S. in Computer Vision

Beijing Normal University (BNU)

B.S. in Computer Science and Technology | Overall GPA: 90.55/100 | Rank: 3/55 | 2020.09 – 2024.06

Minor in International Economics and Trade | Overall GPA: 86.93/100 | 2020.09 – 2024.06


Publications

FreeDance: Towards Harmonic Free-Number Group Dance Generation via a Unified Framework.

Yiwen Zhao, Yang Wang, Liting Wen, Hengyuan Zhang, Xingqun Qi. (in submission)

ESPnet-SpeechLM: An Open Speech Language Model Toolkit.

Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han Huck Yang, Graham Neubig, Shinji Watanabe. (NAACL 2025 | arxiv | code)

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music.

Jiatong Shi, Hyejin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, Shinji Watanabe. (NAACL 2025 | arxiv | code)

Fashion Chatroom: An Automated Pipeline for Fashion Dataset Construction.

Yiwen Zhao, Huizhu Jia, Shanghang Zhang. (AAAI 2024 workshop | paper)

Exploring Locomotion Methods with Upright Redirected Views for VR Users in Reclining & Lying Positions.

Tianren Luo, Chenyang Cai, Yiwen Zhao, Yachun Fan, Zhigeng Pan, Teng Han, Feng Tian. (UIST 2023 | paper)


Research Experience

Carnegie Mellon University | WAVLab

Pittsburgh, PA, U.S. | Research Intern | 2024.06 – Present

- Contribute to the ESPnet repository by creating a new discrete text-to-speech (TTS) recipe on the AISHELL-3 corpus. Add new features to the speech toolkit.

- Perform singing voice synthesis under the SpeechLM paradigm.

Beijing Normal University & Watrix.AI | IVC Lab

Beijing, China | Research Intern | 2023.03 – 2024.06

- Implemented 3D human body reconstruction to assist clothes-changing gait recognition. Disentangled dynamic and static gait features, enhancing zero-shot recognition precision in challenging settings. Completed graduation thesis.

AWS & Peking University

Shanghai (Remote), China | Research Intern | 2023.07 – 2023.11

- Explored extended applications of diffusion text-to-image (T2I) models. To address the lack of high-quality, diverse full-body fashion data, integrated LLM, VLM, and diffusion models to generate multimodal fashion data, reducing manual effort during data collection.

- Explored inpainting with a diffusion model conditioned jointly on fabric material images and style text descriptions, achieving satisfactory visual results.

Institute of Software, Chinese Academy of Sciences | HCI Lab

Beijing, China | Research Intern | 2022.06 – 2022.09

- Participated in the research and development of locomotion methods, focusing on upright redirected views for HMD users. Designed, engineered, and optimized VR interactive postures in leaning and reclining positions using Unity and C#. Conducted user experiments.