About Me
I’m a final-year PhD student at the Department of Computer Science & Engineering, Hong Kong University of Science and Technology, co-supervised by Prof. Heung-Yeung Shum and Prof. Lionel M. Ni. Previously, I obtained my bachelor’s degree in Computer Science and Technology from South China University of Science and Technology.
I have interned at the International Digital Economy Academy (Shenzhen), Microsoft Research (Redmond), Meta AI / FAIR (Menlo Park), and ByteDance Seed (Shenzhen).
I am looking for interns and students to work on unified understanding and generation models and world models. Feel free to contact me if you are interested!
📌My research focuses on multi-modal learning and fine-grained visual understanding. My previous work can be categorized into three main areas:
- Improving multi-modal LLMs for visual understanding and generation, such as the LLaVA-Next series (i.e., LLaVA-Interleave and LLaVA-OneVision) and BAGEL.
- Enabling more promptable detection and grounding systems, including:
  - Visual in-context prompts, including DINOv and T-Rex/T-Rex2.
  - Visual geometry prompts, including SEEM and Semantic-SAM.
  - Text prompts, including OpenSeed and Grounding DINO.
- Pushing closed-set detection and segmentation performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR.
✉️ Feel free to contact me for any discussion or collaboration!
🔥 News
- [2025/5]: BAGEL is out! An open-source unified model for visual understanding and generation, trained on large‑scale interleaved multimodal data.
- [2024/9]: Grounding DINO is selected as one of the most influential ECCV 2024 papers.
- [2024/7]: LLaVA-Interleave is out! We use an image-text interleaved format to unify multi-image, video, and 3D tasks in one LLM. Check out the blog, training data, LLaVA-Interleave Bench, and code to see the new capabilities and improved performance!
- [2024/3]: Check out our recent works on Visual Prompting for detection and segmentation! A series of works, including DINOv and T-Rex/T-Rex2, has been released.
- [2023/9]: Mask DINO is selected as one of the most influential CVPR 2023 papers.
- [2023/7]: Two works that focus on Interactive Segmentation have been released, including SEEM and Semantic-SAM. Check them out!
- [2023/4]: DINO and DAB-DETR are selected as among the most influential ICLR 2023 and ICLR 2022 papers, respectively.
- [2023/3]: Two works on Open-set Detection & Segmentation have been released, including Grounding DINO and OpenSeed. Check them out!
- [2023/3]: DINO and DN-DETR are selected among the top 100 most cited AI papers of 2022, ranking 38th and 53rd, respectively.
- [2022/6]: A series of works that push Transformer-based Closed-set Detection & Segmentation models to SOTA performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR, has been released.
📝 Selected Works
Refer to my Google Scholar for the full list.
BAGEL: Emerging Properties in Unified Multimodal Pretraining.
Chaorui Deng*, Deyao Zhu*, Kunchang Li*, Chenhui Gou*, Feng Li*, Zeyu Wang, Shu Zhong, Weihao Yu, Xiaonan Nie, Ziang Song, Guang Shi, Haoqi Fan*.
arXiv, 2025.
[Paper][Website][Code]
LLaVA-OneVision: Easy Visual Task Transfer.
Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li.
TMLR, 2025.
[Paper][Website][Code]
LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
Feng Li*, Renrui Zhang*, Hao Zhang*, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li.
ICLR 2025.
[Paper][Blog][Code]
Visual In-Context Prompting.
Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao.
CVPR 2024.
[Paper][Code]
SoM: Set-of-Mark Visual Prompting for GPT-4V.
Jianwei Yang*, Hao Zhang*, Feng Li*, Xueyan Zou*, Chunyuan Li, Jianfeng Gao.
arXiv, 2023.
[Paper][Code]
Semantic-SAM: Segment and Recognize Anything at Any Granularity.
Feng Li*, Hao Zhang*, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
ECCV 2024.
[Paper][Code]
OpenSeeD: A Simple Framework for Open-Vocabulary Segmentation and Detection.
Hao Zhang*, Feng Li*, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang.
ICCV 2023.
[Paper][Code]
SEEM: Segment Everything Everywhere All at Once.
Xueyan Zou*, Jianwei Yang*, Hao Zhang*, Feng Li*, Linjie Li, Jianfeng Gao, Yong Jae Lee.
NeurIPS 2023.
[Paper][Code]
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
ECCV 2024.
[Paper][Code]
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum.
CVPR 2023. Rank 9th on CVPR 2023 Most Influential Papers.
[Paper][Code]
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
ICLR 2023. Rank 2nd on ICLR 2023 Most Influential Papers.
[Paper][Code]
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
CVPR 2022 | TPAMI 2023. Oral presentation.
[Paper][Code]
(* denotes equal contribution or core contributor.)
🎖 Selected Awards
- Hong Kong Postgraduate Scholarship, 2021.
- Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM), National First Prize, 2019.