I am currently a master’s student in the National Key Microsoft Key Laboratory of Multimedia Computing and Communication at the University of Science and Technology of China, mainly interested in 2D diffusion generation and multimedia retrieval.

I graduated from the College of Information Science and Engineering, Hohai University (河海大学信息科学与工程学院) with a bachelor’s degree. I am now pursuing my master’s degree in the School of Information Science and Technology, University of Science and Technology of China (中国科学技术大学信息科学技术学院), supervised by Wengang Zhou (周文罡) and Houqiang Li (李厚强).

My research interests include computer vision, multimodal retrieval, and 2D generation and editing.

📖 Education

💻 Internships

📃 Papers

ACM MM 2024

SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval

Longtao Jiang, Min Wang, Zecheng Li, Yao Fang, Wengang Zhou, Houqiang Li

abstract

  • We propose a novel representation framework called Semantically Enhanced Dual-Stream Encoder (SEDS), which aggregates Pose and RGB modalities to represent local and global information of sign language videos.
  • The aggregated clip-level video features are then fed into the CLIP vision encoder for interaction, and matched with the word-level text features embedded using the CLIP text encoder.

🔍 Projects

Pre-Training of Web Page Understanding Based on Document Multimodal Large Language Model

2023.01 - 2023.06

In recent years, document pre-training models represented by Microsoft LayoutLM have achieved significant results in document understanding tasks. The Quark Vertical Innovation Team is responsible for multimodal understanding of library-type documents and web-based documents. Understanding web-based documents involves layout classification, page experience, semantic segmentation, node extraction, and so on; understanding library-type documents involves layout classification, layout experience, domain categories, title extraction, and so on. The two are very similar in multimodal model structure and in their downstream analysis and understanding tasks. Drawing on the massive web page data of its own search engine, Quark explored multimodal document pre-training techniques suited to web-based data and achieved significant improvements on multiple downstream web page analysis and understanding tasks.

Intelligent Delivery Robot Based on Machine Vision

2022.08 - 2022.12

In view of the shortage of human resources during the COVID-19 epidemic, the high risk to medical personnel, and the demand for automating the medical distribution service system, a medical intelligent delivery robot system based on embedded technology was designed. Staff enter the destination room number at the pharmacy, which is the robot’s starting location. After automatically detecting that the drugs have been loaded, the robot completes a series of delivery tasks, including identifying the door number of the destination ward, path planning, parking, and unloading. The system uses a high-performance STM32 ARM chip as the control core and is equipped with the K210 embedded platform, whose KPU neural network accelerator handles door number recognition. It uses OPENMV4Plus open-source hardware for machine vision tracking and can accurately complete its delivery tasks in complex industrial environments.

📚 Patents

📝 Academic Services

  • ACM Multimedia (ACMMM) 2024, Reviewer
  • Conference on Neural Information Processing Systems (NeurIPS) 2024, Reviewer

🏅 Honors

  • 2023.12, First Class Master’s Scholarship from USTC
  • 2023.09, Outstanding Graduate of Hohai University (Top 1%)
  • 2023.02, Xiaomi Scholarship Special Prize (Top 1%)
  • 2022.10, First Prize of Yan Kai Scholarship (Top 1%)
  • 2020.09, National Scholarship for Undergraduate Students (Top 1%)
  • 2020.12, Excellent Student Model at Hohai University
  • 2020.09, First Prize Scholarship at Hohai University

🏆 Competitions

  • 2022.12, National Second Prize in the National College Student Electronic Design Competition.
  • 2022.09, National Third Prize in the Blue Bridge Cup Software Design Competition.
  • 2021.09, Third Prize in the National College Student Intelligent Car Competition.

💼 Societies

  • 2023.09 - 2024.07 (now), Leader of the Multimodal Generation Group of the Microsoft Multimedia Computing and Communication State Key Laboratory.
  • 2020.09 - 2021.06, Director of the Competition Service Department of the Internet of Things Institute.
  • 2019.09 - 2020.06, Member of the Software Technology Department of the Association for Science and Technology of the Internet of Things.