Shaolei Zhang (张绍磊) is a fifth-year Ph.D. candidate (2020-2025) at the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所), advised by Yang Feng (冯洋). He received his bachelor's degree in computer science and technology from Beijing University of Posts and Telecommunications in 2020 (北京邮电大学计算机科学与技术实验班).
His research interests include natural language processing, simultaneous translation of text and speech, and large language models. He has published over 20 papers at top international AI/NLP conferences such as ACL, NeurIPS, ICLR, and AAAI. He won first place in the streaming transcription track of AutoSimTrans 2021.
I am happy to discuss and share my research, and I am interested in opportunities in industry and academia, including postdoc positions. If you would like to connect, please feel free to reach out via email (zhangshaolei20z@ict.ac.cn) or WeChat (zhangshaolei0331).
🔥 News
- 2024.05: 🎉 6 papers are accepted by ACL 2024!
- 2023.12: 🎉 1 paper is accepted by ICASSP 2024!
- 2023.10: 🎉 2 papers are accepted by EMNLP 2023!
- 2023.09: 👏 Invited to serve as Area Chair for ACL/EACL/NAACL ARR 2023!
- 2023.09: 🎉 1 paper is accepted by NeurIPS 2023!
- 2023.06: 🎉 Our cross-lingual aligned LLM BayLing is released.
- 2023.05: 🎉 2 papers are accepted by ACL 2023.
- 2023.01: 🎉 1 paper is accepted by ICLR 2023 (Spotlight)!
- 2022.10: 🎉 3 papers are accepted by EMNLP 2022!
- 2022.02: 🎉 3 papers are accepted by ACL 2022!
📝 Publications
Demo
BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models
Shaolei Zhang, Qingkai Fang, Zhuocheng Zhang, Zhengrui Ma, Yan Zhou, Langlin Huang, Mengyu Bu, Shangtong Gui, Yunji Chen, Xilin Chen, Yang Feng
- BayLing (百聆) is an LLM equipped with advanced language alignment.
- BayLing is the first work to use language alignment to enhance the multilingual capabilities of LLMs.
- BayLing was selected for the Top 100 Open-Source Achievements of 2022-2023: Open100 (2022-2023), launched by the International Open Benchmark Council (BenchCouncil).
Demo
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
Shaolei Zhang, Tian Yu, Yang Feng
- TruthX is an inference-time method that activates the truthfulness of LLMs by editing their internal representations, thereby mitigating hallucinations.
- TruthX can steer LLMs to generate truthful or hallucinatory responses by editing only a single vector in the truthful space.
- On the TruthfulQA benchmark, TruthX yields an average improvement of 20% in truthfulness across 13 LLMs (ranked 2nd, behind only GPT-4).
Demo
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng
- StreamSpeech is an "All in One" seamless model covering more than eight offline and simultaneous tasks across speech recognition, speech translation, and speech synthesis.
- StreamSpeech can present intermediate results (i.e., ASR or translation results) during simultaneous translation, offering a more comprehensive low-latency communication experience.
- Received over 300 reposts and 100K views on Twitter!
Awesome Simultaneous Translation
Shaolei Zhang
- A repository that collects toolkits, common datasets, and a paper list for research on simultaneous translation, including text-to-text machine translation and speech-to-text translation.
2024
- TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space. ACL 2024 (CCF-A).
  Shaolei Zhang, Tian Yu, Yang Feng
- StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning. ACL 2024 (CCF-A).
  Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng
- Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts. ACL 2024 Findings (CCF-A).
  Tian Yu, Shaolei Zhang, Yang Feng
- Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data? ACL 2024 (CCF-A).
  Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng
- Decoder-only Streaming Transformer for Simultaneous Translation. ACL 2024 (CCF-A).
  Shoutao Guo, Shaolei Zhang, Yang Feng
- A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation. ACL 2024 (CCF-A).
  Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Min Zhang, Yang Feng
- Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models. Preprint 2024.
  Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng
- Glancing Future for Simultaneous Machine Translation. ICASSP 2024 Oral (CCF-B).
  Shoutao Guo, Shaolei Zhang, Yang Feng
2023
- BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models. Preprint 2023.
  Shaolei Zhang, Qingkai Fang, Zhuocheng Zhang, Zhengrui Ma, Yan Zhou, Langlin Huang, Mengyu Bu, Shangtong Gui, Yunji Chen, Xilin Chen, Yang Feng
- Unified Segment-to-Segment Framework for Simultaneous Sequence Generation. NeurIPS 2023 (CCF-A).
  Shaolei Zhang, Yang Feng
- Non-autoregressive Streaming Transformer for Simultaneous Translation. EMNLP 2023 Oral (CCF-B).
  Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng
- Simultaneous Machine Translation with Tailored Reference. EMNLP 2023 Findings (CCF-B).
  Shoutao Guo, Shaolei Zhang, Yang Feng
- End-to-End Simultaneous Speech Translation with Differentiable Segmentation. ACL 2023 Findings (CCF-A).
  Shaolei Zhang, Yang Feng
- Learning Optimal Policy for Simultaneous Machine Translation via Binary Search. ACL 2023 (CCF-A).
  Shoutao Guo, Shaolei Zhang, Yang Feng
- Hidden Markov Transformer for Simultaneous Machine Translation. ICLR 2023 Spotlight.
  Shaolei Zhang, Yang Feng
2022
- Information-Transport-based Policy for Simultaneous Translation. EMNLP 2022 Oral (CCF-B).
  Shaolei Zhang, Yang Feng
- Wait-info Policy: Balancing Source and Target at Information Level for Simultaneous Machine Translation. EMNLP 2022 Findings (CCF-B).
  Shaolei Zhang, Shoutao Guo, Yang Feng
- Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous Machine Translation. EMNLP 2022 Findings (CCF-B).
  Shoutao Guo, Shaolei Zhang, Yang Feng
- Modeling Dual Read/Write Paths for Simultaneous Machine Translation. ACL 2022 (CCF-A).
  Shaolei Zhang, Yang Feng
- Reducing Position Bias in Simultaneous Machine Translation with Length-Aware Framework. ACL 2022 (CCF-A).
  Shaolei Zhang, Yang Feng
- Gaussian Multi-head Attention for Simultaneous Machine Translation. ACL 2022 Findings (CCF-A).
  Shaolei Zhang, Yang Feng
2021
- Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy. EMNLP 2021 Oral (CCF-B).
  Shaolei Zhang, Yang Feng
- Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model. EMNLP 2021 Findings (CCF-B).
  Shaolei Zhang, Yang Feng
- Future-Guided Incremental Transformer for Simultaneous Translation. AAAI 2021 Oral (CCF-A).
  Shaolei Zhang, Yang Feng, Liangyou Li
- ICT's System for AutoSimTrans 2021: Robust Char-Level Simultaneous Translation. AutoSimTrans@NAACL 2021 Oral (CCF-B).
  Shaolei Zhang, Yang Feng
🏆 Honors and Awards
- [2022] ICT's Special Scholarship (Xia Peisu Award) (计算所 所长特别奖(夏培肃奖)) [highest award at ICT, CAS; Top 2]
- [2022] National Scholarship (国家奖学金)
- [2021] First place in the streaming track of AutoSimTrans 2021 (organized by Baidu/Huawei/Google)
- [2020] Beijing Outstanding Graduates Award (北京市优秀毕业生)
- [2018] Beijing Merit Student (北京市三好学生)
- [2017] National Scholarship (国家奖学金)
👏 Services
- Area Chair of ACL/EACL/NAACL ARR 2023
- Reviewer for ACL/EMNLP/COLING/NAACL/EACL/NeurIPS and ACM Computing Surveys
- Session Chair of Student Seminar in CCL 2024
- Session Chair of Student Seminar in YSSNLP 2024
- Director of the Student Executive Committee, Youth Working Committee of the Chinese Information Processing Society of China (中国中文信息学会青年工作委员会 学生执委会主任), 2020-2024
- Program Chair of CSSNLP 2020/2021/2023
📖 Education
- 2020.06 - 2025.06: Ph.D. candidate in natural language processing, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences.
- 2016.09 - 2020.06: Bachelor's degree in computer science and technology, Beijing University of Posts and Telecommunications.
💬 Invited Talks
- "Research Topic Selection and Practice in the Era of Large Models" (大模型时代的科研选题和实践分享) at MLNLP Academic Seminar [Slides]
- "Enhancing Large Models with Cross-lingual Alignment: BayLing" (跨语言对齐增强大模型——百聆) at the AI TIME Large Model Carnival (大模型嘉年华) [Slides] [Video]
- "How to Find a Research Entry Point in the Era of Large Models?" (如何在大模型时代找到科研切入点?) at CCMT 2023 [Slides] [Video]
- "From Machine Translation to Simultaneous Interpretation: Challenges and Progress" (从机器翻译到同声传译:挑战与进展) at MLNLP Academic Seminar [Slides] [Video]
- AI Time Youth Talk for ICLR 2023 [Video]
- Invited talks at ByteDance, Huawei, Tencent, and Li Auto
💻 Internships
- 2019.12 - 2021.12: Research intern, Huawei Noah's Ark Lab (industry-academia-research collaboration project), China.