About Me

I’m a final-year PhD student in the Department of Computer Science & Engineering at the Hong Kong University of Science and Technology, co-supervised by Prof. Heung-Yeung Shum and Prof. Lionel M. Ni. Previously, I obtained my bachelor’s degree in Computer Science and Technology from South China University of Science and Technology.

I have interned at the International Digital Economy Academy (IDEA), Shenzhen (advised by Prof. Lei Zhang), Microsoft Research, Redmond (advised by Dr. Jianwei Yang and Dr. Chunyuan Li), and FAIR MPK (Facebook AI Research, Menlo Park).

📌 My research focuses on fine-grained visual understanding and multi-modal learning. My previous work can be categorized into three main areas.

I anticipate graduating in 2025 and am open to both academic and industrial research positions in North America and Asia. If you are interested, please feel free to contact me.

✉️ You are welcome to contact me for discussion and collaboration!

🔥 News

📝 Selected Works

Refer to my Google Scholar profile for the full list.

  • LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
    Feng Li*, Renrui Zhang*, Hao Zhang*, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li
    arXiv 2024.
    [Paper][Blog][Code]

  • Visual In-Context Prompting.
    Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao.
    CVPR 2024.
    [Paper][Code]

  • SoM: Set-of-Mark Visual Prompting for GPT-4V.
    Jianwei Yang*, Hao Zhang*, Feng Li*, Xueyan Zou*, Chunyuan Li, Jianfeng Gao.
    arXiv 2023.
    [Paper][Code]

  • Semantic-SAM: Segment and Recognize Anything at Any Granularity.
    Feng Li*, Hao Zhang*, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
    ECCV 2024.
    [Paper][Code]

  • OpenSeeD: A Simple Framework for Open-Vocabulary Segmentation and Detection.
    Hao Zhang*, Feng Li*, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang. ICCV 2023.
    [Paper][Code]

  • SEEM: Segment Everything Everywhere All at Once.
    Xueyan Zou*, Jianwei Yang*, Hao Zhang*, Feng Li*, Linjie Li, Jianfeng Gao, Yong Jae Lee.
    NeurIPS 2023.
    [Paper][Code]

  • Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
    ECCV 2024.
    [Paper][Code]

  • Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
    Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum.
    CVPR 2023. Ranked 9th among the CVPR 2023 Most Influential Papers.
    [Paper][Code]

  • DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
    Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
    ICLR 2023. Ranked 2nd among the ICLR 2023 Most Influential Papers.
    [Paper][Code]

  • DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
    Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
    CVPR 2022 | TPAMI 2023. Oral presentation.
    [Paper][Code]

(* denotes equal contribution or core contributor.)

🎖 Selected Awards

  • Hong Kong Postgraduate Scholarship, 2021.
  • Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM), National First Prize, 2019.
