About Me
I’m a final-year PhD student in the Department of Computer Science & Engineering at the Hong Kong University of Science and Technology, co-supervised by Prof. Heung-Yeung Shum and Prof. Lionel M. Ni. Previously, I obtained my bachelor’s degree in Computer Science and Technology from South China University of Science and Technology.
I am currently an intern at FAIR MPK (Facebook AI Research, Menlo Park). Previously, I interned at the International Digital Economy Academy, Shenzhen (advised by Prof. Lei Zhang) and at Microsoft Research, Redmond (advised by Dr. Jianwei Yang and Dr. Chunyuan Li).
📌 My research focuses on fine-grained visual understanding and multi-modal learning. My previous work falls into three main areas.
- Improving multi-modal LLMs, such as the LLaVA-Next series, including LLaVA-Next (stronger), LLaVA-Next (ablations), LLaVA-Next-Interleave, and LLaVA-OneVision.
- Enabling more promptable detection and grounding systems, including:
  - Visual in-context prompting, including DINOv and T-Rex/T-Rex2.
  - Visual geometry prompting, including SEEM and Semantic-SAM.
  - Text prompting, including OpenSeed and Grounding DINO.
- Pushing closed-set detection and segmentation performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR.
I anticipate graduating in 2025 and am open to both academic and industrial research positions in North America and Asia. If you are interested, please feel free to contact me.
✉️ Feel free to contact me for any discussion or collaboration!
🔥 News
- [2024/7]: LLaVA-Next-Interleave is out! We utilize an image-text interleaved format to unify multi-image, video, and 3D tasks in one LLM. Check out the blog, training data, LLaVA-Interleave Bench, and code to see the new capabilities and improved performance!
- [2024/3]: Check out our recent works on Visual Prompting for detection and segmentation! A series of works, including DINOv and T-Rex/T-Rex2, have been released.
- [2023/9]: Mask DINO is selected as one of the most influential CVPR 2023 papers.
- [2023/7]: Two works that focus on Interactive Segmentation have been released, including SEEM and Semantic-SAM. Check them out!
- [2023/4]: DINO and DAB-DETR are selected among the most influential ICLR 2023 and ICLR 2022 papers, respectively.
- [2023/3]: Two works on Open-set Detection & Segmentation have been released, including Grounding DINO and OpenSeed. Check them out!
- [2023/3]: DINO and DN-DETR are selected among the top 100 most cited AI papers of 2022, ranked 38th and 53rd, respectively.
- [2022/6]: A series of works pushing Transformer-based Closed-set Detection & Segmentation models to SOTA performance, including Mask DINO, DINO, DN-DETR, and DAB-DETR, have been released.
📝 Selected Works
Refer to my Google Scholar for the full list.
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
Feng Li*, Renrui Zhang*, Hao Zhang*, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li
arXiv 2024.
[Paper][blog][Code]
Visual In-Context Prompting.
Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao.
CVPR 2024.
[Paper][Code]
SoM: Set-of-Mark Visual Prompting for GPT-4V.
Jianwei Yang*, Hao Zhang*, Feng Li*, Xueyan Zou*, Chunyuan Li, Jianfeng Gao.
arXiv 2023.
[Paper][Code]
Semantic-SAM: Segment and Recognize Anything at Any Granularity.
Feng Li*, Hao Zhang*, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
ECCV 2024.
[Paper][Code]
OpenSeeD: A Simple Framework for Open-Vocabulary Segmentation and Detection.
Hao Zhang*, Feng Li*, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang.
ICCV 2023.
[Paper][Code]
SEEM: Segment Everything Everywhere All at Once.
Xueyan Zou*, Jianwei Yang*, Hao Zhang*, Feng Li*, Linjie Li, Jianfeng Gao, Yong Jae Lee.
NeurIPS 2023.
[Paper][Code]
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
ECCV 2024.
[Paper][Code]
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum.
CVPR 2023. Ranked 9th among the CVPR 2023 Most Influential Papers.
[Paper][Code]
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
ICLR 2023. Ranked 2nd among the ICLR 2023 Most Influential Papers.
[Paper][Code]
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
CVPR 2022 (oral presentation) | TPAMI 2023.
[Paper][Code]
(* denotes equal contribution or core contributor.)
🎖 Selected Awards
- Hong Kong Postgraduate Scholarship, 2021.
- Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM), National First Prize, 2019.