About Me

I’m a final-year PhD student in the Department of Computer Science & Engineering at the Hong Kong University of Science and Technology, co-supervised by Prof. Heung-Yeung Shum and Prof. Lionel M. Ni. Previously, I obtained my bachelor’s degree in Computer Science and Technology from South China University of Science and Technology.

I have interned at the International Digital Economy Academy (Shenzhen), Microsoft Research (Redmond), Meta AI FAIR (Menlo Park), and ByteDance Seed (Shenzhen).

I am looking for interns and students to work on unified understanding and generation models and world models. Feel free to contact me if you are interested!

📌 My research focuses on multi-modal learning and fine-grained visual understanding. My previous work spans unified multimodal models, large multimodal models, and open-set detection and segmentation.

✉️ You are welcome to contact me for discussion and collaboration!

🔥 News

📝 Selected Works

Refer to my Google Scholar profile for the full list.

  • BAGEL: Emerging Properties in Unified Multimodal Pretraining.
    Chaorui Deng*, Deyao Zhu*, Kunchang Li*, Chenhui Gou*, Feng Li*, Zeyu Wang, Shu Zhong, Weihao Yu, Xiaonan Nie, Ziang Song, Guang Shi, Haoqi Fan*.
    arXiv, 2025.
    [Paper][Website][Code]

  • LLaVA-OneVision: Easy Visual Task Transfer.
    Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li.
    TMLR, 2025.
    [Paper][Website][Code]

  • LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
    Feng Li*, Renrui Zhang*, Hao Zhang*, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li.
    ICLR 2025.
    [Paper][Blog][Code]

  • Visual In-Context Prompting.
    Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao.
    CVPR 2024.
    [Paper][Code]

  • SoM: Set-of-Mark Visual Prompting for GPT-4V.
    Jianwei Yang*, Hao Zhang*, Feng Li*, Xueyan Zou*, Chunyuan Li, Jianfeng Gao.
    arXiv, 2023.
    [Paper][Code]

  • Semantic-SAM: Segment and Recognize Anything at Any Granularity.
    Feng Li*, Hao Zhang*, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
    ECCV 2024.
    [Paper][Code]

  • OpenSeeD: A Simple Framework for Open-Vocabulary Segmentation and Detection.
    Hao Zhang*, Feng Li*, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang.
    ICCV 2023.
    [Paper][Code]

  • SEEM: Segment Everything Everywhere All at Once.
    Xueyan Zou*, Jianwei Yang*, Hao Zhang*, Feng Li*, Linjie Li, Jianfeng Gao, Yong Jae Lee.
    NeurIPS 2023.
    [Paper][Code]

  • Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
    ECCV 2024.
    [Paper][Code]

  • Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
    Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum.
    CVPR 2023. Ranked 9th among CVPR 2023 Most Influential Papers.
    [Paper][Code]

  • DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
    Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
    ICLR 2023. Ranked 2nd among ICLR 2023 Most Influential Papers.
    [Paper][Code]

  • DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
    Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
    CVPR 2022 (Oral) | TPAMI 2023.
    [Paper][Code]

(* denotes equal contribution or core contribution.)

🎖 Selected Awards

  • Hong Kong Postgraduate Scholarship, 2021.
  • Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM), National First Prize, 2019.
