About Me
I’m a final-year PhD student at the Department of Computer Science & Engineering, Hong Kong University of Science and Technology, co-supervised by Prof. Heung-Yeung Shum and Prof. Lionel M. Ni. Previously, I obtained my bachelor’s degree in Computer Science and Technology from South China University of Science and Technology.
I have interned at the International Digital Economy Academy (Shenzhen), Microsoft Research (Redmond), Meta AI / FAIR (Menlo Park), and ByteDance Seed (Shenzhen).
I am looking for interns and students to work on unified understanding and generation models and world models. Feel free to contact me if you are interested!
📌My research focuses on multi-modal learning and fine-grained visual understanding. My previous work can be categorized into three main areas:
- Improving multi-modal LLMs for visual understanding and generation, such as the LLaVA-Next series (i.e., LLaVA-Interleave and LLaVA-OneVision) and BAGEL.
- Enabling more promptable detection and grounding systems, including:
  - Visual in-context prompts, including DINOv and T-Rex/T-Rex2.
  - Visual geometry prompts, including SEEM and Semantic-SAM.
  - Text prompts, including OpenSeed and Grounding DINO.
- Pushing closed-set detection and segmentation performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR.
✉️ Feel free to contact me for any discussion or collaboration!
🔥 News
- [2025/5]: BAGEL is out! An open-source unified model for visual understanding and generation, trained on large‑scale interleaved multimodal data.
- [2024/9]: Grounding DINO is selected as one of the most influential ECCV 2024 papers.
- [2024/7]: LLaVA-Interleave is out! We use an image-text interleaved format to unify multi-image, video, and 3D tasks in one LLM. Check out the blog, training data, LLaVA-Interleave Bench, and code to see the new capabilities and improved performance!
- [2024/3]: Check out our recent works on Visual Prompting for detection and segmentation! A series of works, including DINOv and T-Rex/T-Rex2, has been released.
- [2023/9]: Mask DINO is selected as one of the most influential CVPR 2023 papers.
- [2023/7]: Two works that focus on Interactive Segmentation have been released, including SEEM and Semantic-SAM. Check them out!
- [2023/4]: DINO and DAB-DETR are selected as among the most influential ICLR 2023 and ICLR 2022 papers, respectively.
- [2023/3]: Two works on Open-set Detection & Segmentation have been released, including Grounding DINO and OpenSeed. Check them out!
- [2023/3]: DINO and DN-DETR are selected among the top 100 most cited AI papers of 2022, ranking 38th and 53rd, respectively.
- [2022/6]: A series of works that push Transformer-based Closed-set Detection & Segmentation models to SOTA performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR, has been released.
📝 Selected Works
Refer to my Google Scholar for the full list.
BAGEL: Emerging Properties in Unified Multimodal Pretraining.
Chaorui Deng*, Deyao Zhu*, Kunchang Li*, Chenhui Gou*, Feng Li*, Zeyu Wang, Shu Zhong, Weihao Yu, Xiaonan Nie, Ziang Song, Guang Shi, Haoqi Fan*.
arXiv, 2025.
[Paper][Website][Code]
LLaVA-OneVision: Easy Visual Task Transfer.
Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li.
TMLR, 2025.
[Paper][Website][Code]
LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
Feng Li*, Renrui Zhang*, Hao Zhang*, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li.
ICLR 2025.
[Paper][Blog][Code]
Visual In-Context Prompting.
Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao.
CVPR 2024.
[Paper][Code]
SoM: Set-of-Mark Visual Prompting for GPT-4V.
Jianwei Yang*, Hao Zhang*, Feng Li*, Xueyan Zou*, Chunyuan Li, Jianfeng Gao.
arXiv, 2023.
[Paper][Code]
Semantic-SAM: Segment and Recognize Anything at Any Granularity.
Feng Li*, Hao Zhang*, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
ECCV 2024.
[Paper][Code]
OpenSeeD: A Simple Framework for Open-Vocabulary Segmentation and Detection.
Hao Zhang*, Feng Li*, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang.
ICCV 2023.
[Paper][Code]
SEEM: Segment Everything Everywhere All at Once.
Xueyan Zou*, Jianwei Yang*, Hao Zhang*, Feng Li*, Linjie Li, Jianfeng Gao, Yong Jae Lee.
NeurIPS 2023.
[Paper][Code]
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
ECCV 2024.
[Paper][Code]
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum.
CVPR 2023. Rank 9th on CVPR 2023 Most Influential Papers.
[Paper][Code]
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
ICLR 2023. Rank 2nd on ICLR 2023 Most Influential Papers.
[Paper][Code]
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
CVPR 2022 | TPAMI 2023. Oral presentation.
[Paper][Code]
(* denotes equal contribution or core contributor.)
🎖 Selected Awards
- Hong Kong Postgraduate Scholarship, 2021.
- Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM), National First Prize, 2019.