About Me
I’m a final-year PhD student in the Department of Computer Science & Engineering, Hong Kong University of Science and Technology, co-supervised by Prof. Heung-Yeung Shum and Prof. Lionel M. Ni. Previously, I obtained my bachelor’s degree in Computer Science and Technology from the South China University of Science and Technology.
I have interned at International Digital Economy Academy, Shenzhen (advised by Prof. Lei Zhang), Microsoft Research, Redmond (advised by Dr. Jianwei Yang and Dr. Chunyuan Li), and FAIR MPK (Facebook AI Research, Menlo Park).
📌My research focuses on fine-grained visual understanding and multi-modal learning. My previous work falls into three main areas:
- Improving multi-modal LLMs, such as the LLaVA-NeXT series, including LLaVA-NeXT (stronger), LLaVA-NeXT (ablations), LLaVA-NeXT-Interleave, and LLaVA-OneVision.
- Enabling more promptable detection and grounding systems, including:
  - Visual in-context prompting, including DINOv and T-Rex/T-Rex2.
  - Visual geometry prompting, including SEEM and Semantic-SAM.
  - Text prompting, including OpenSeeD and Grounding DINO.
- Pushing closed-set detection and segmentation performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR.
I anticipate graduating in 2025 and am open to both academic and industrial research positions in North America and Asia. If you are interested, please feel free to contact me.
✉️ Feel free to contact me for any discussion or collaboration!
🔥 News
- [2024/9]: Grounding DINO is selected as one of the most influential ECCV 2024 papers.
- [2024/7]: LLaVA-NeXT-Interleave is out! We utilize an image-text interleaved format to unify multi-image, video, and 3D tasks in one LLM. Check out the blog, train data, LLaVA-Interleave Bench, and code to see new capabilities and improved performance!
- [2024/3]: Check out our recent works on Visual Prompting for detection and segmentation! A series of works including DINOv and T-Rex/T-Rex2 have been released.
- [2023/9]: Mask DINO is selected as one of the most influential CVPR 2023 papers.
- [2023/7]: Two works that focus on Interactive Segmentation have been released, including SEEM and Semantic-SAM. Check them out!
- [2023/4]: DINO and DAB-DETR are selected as among the most influential ICLR 2023 and ICLR 2022 papers, respectively.
- [2023/3]: Two works on Open-set Detection & Segmentation have been released, including Grounding DINO and OpenSeed. Check them out!
- [2023/3]: DINO and DN-DETR are selected among the top 100 most cited AI papers of 2022, ranking 38th and 53rd, respectively.
- [2022/6]: A series of works that push Transformer-based Closed-set Detection & Segmentation models to SOTA performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR, have been released.
📝 Selected Works
Refer to my Google Scholar for the full list.
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
Feng Li*, Renrui Zhang*, Hao Zhang*, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li
arXiv 2024.
[Paper][blog][Code]

Visual In-Context Prompting.
Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao.
CVPR 2024.
[Paper][Code]

SoM: Set-of-Mark Visual Prompting for GPT-4V.
Jianwei Yang*, Hao Zhang*, Feng Li*, Xueyan Zou*, Chunyuan Li, Jianfeng Gao.
arXiv 2023.
[Paper][Code]

Semantic-SAM: Segment and Recognize Anything at Any Granularity.
Feng Li*, Hao Zhang*, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
ECCV 2024.
[Paper][Code]

OpenSeeD: A Simple Framework for Open-Vocabulary Segmentation and Detection.
Hao Zhang*, Feng Li*, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang.
ICCV 2023.
[Paper][Code]

SEEM: Segment Everything Everywhere All at Once.
Xueyan Zou*, Jianwei Yang*, Hao Zhang*, Feng Li*, Linjie Li, Jianfeng Gao, Yong Jae Lee.
NeurIPS 2023.
[Paper][Code]

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
ECCV 2024.
[Paper][Code]

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum.
CVPR 2023. Ranked 9th among CVPR 2023 Most Influential Papers.
[Paper][Code]

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
ICLR 2023. Ranked 2nd among ICLR 2023 Most Influential Papers.
[Paper][Code]

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
CVPR 2022 (oral presentation) | TPAMI 2023.
[Paper][Code]
(* denotes equal contribution or core contributor.)
🎖 Selected Awards
- Hong Kong Postgraduate Scholarship, 2021.
- Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM), National First Prize, 2019.