Haoyu Li is an undergraduate student in Computer Science at Wuhan University. His research interests include 3D computer vision, multimodal learning, generative AI, and embodied intelligence. He has been a research assistant at the MARS Lab, Wuhan University, under Prof. Mang Ye, and a remote research intern at the Intelligent Interface Center, Harbin Institute of Technology, under Prof. Tiejun Zhao.

Contact: haoyuli404@outlook.com · +86-158-2700-2669 · CV (PDF)

Vision

My ambition is to work at the intersection of 3D computer vision, multimodal learning, generative AI, and embodied intelligence. I hope to study unified pipelines that move from scene geometry and multimodal representations to generative world models and language-driven robot execution, so that AI can not only see and synthesize plausible visual worlds, but also deliver verifiable, deployable closed-loop capabilities on real data and physical platforms. This is the research path I intend to pursue long term.

News

2026.06: FedMental was accepted to ECCV 2026.
2026.06: Released SkyArm-VLA, an embodied vision-language-action project for robotic arm manipulation.
2026.05: Released Diffusion Models from Zero to Hero as an open-source course; contributions are welcome.
2026.04: Project approved, National College Students’ Innovation and Entrepreneurship Training Program.
2026.04: S3Mamba-Pan was accepted to IEEE TGRS (DOI: 10.1109/TGRS.2026.3686021).
2025.10: Received Meritorious Student Leader, Wuhan University Merit Student, and related honors.
2025.08: National Third Prize, National Computer System Capability Competition.
2025.05: Central-South Regional First Prize, National Undergraduate Computer Design Competition.
2025.05: Honorable Mention, Mathematical Contest in Modeling.
2024.11: Hubei Province Second Prize, National College Student Mathematics Competition.

Publications

ECCV 2026

FedMental: Topology-Aware Federated Prototype Learning for Polymorphic Multimodal Psychiatry

Haoyu Li; He Li; Wenke Huang; Yujing Rao; Xiaofen Zong; Mang Ye

European Conference on Computer Vision (ECCV), 2026.

IEEE TGRS

S3Mamba-Pan: Spectral-Spatial-Scale Mamba With Frequency-Decoupled Dual-Stream for Pansharpening

Zishun Song; Yao Zhang; Haoyu Li; Yanlin He; Jiawei Zhao; Yi Yang; Wei Zhang; Dezhen Wang

IEEE Transactions on Geoscience and Remote Sensing, 2026. DOI: 10.1109/TGRS.2026.3686021

Honors and Awards

2025 Meritorious Student Leader, Wuhan University
2025 Wuhan University Merit Student
2025 National Third Prize, National Computer System Capability Competition
2025 Central-South Regional First Prize, National Undergraduate Computer Design Competition
2025 Hubei Province Second Prize, National College Student Mathematics Competition
2025 Honorable Mention, Mathematical Contest in Modeling
2026 Project Approved, National College Students’ Innovation and Entrepreneurship Training Program

Education

Sept. 2023 - June 2027 (expected), Bachelor of Engineering in Computer Science, Wuhan University, Wuhan, China
- GPA: 91.75 / 100
- Selected coursework: Data Structures (97), Algorithm Design and Analysis (92), Computer Graphics (92), Fundamentals of Software Construction (94), Advanced Mathematics (96), Probability Theory and Mathematical Statistics (92)

Selected Coursework & Self-Study

Beyond my formal curriculum at Wuhan University, I completed a focused self-study track in robot learning, vision-language-action (VLA), generative modeling, and 3D vision to prepare for research in embodied intelligence, world models, and geometry-aware generation.

Robot learning & embodied AI

Vision-language-action & LeRobot

Generative modeling & diffusion

3D vision & inverse graphics

Hugging Face ML for 3D
MIT: Machine Learning for Inverse Graphics (Scene Representation Group)
Stanford CS348N: Neural Models for 3D Geometry

Research Experience

MARS Lab, Wuhan University - Research Assistant · Apr. 2025 - Present · Wuhan, China · Advisor: Prof. Mang Ye
- Studied federated prototype learning for multi-center, multimodal psychiatric diagnosis, focusing on non-IID data, privacy constraints, and cross-domain generalization; designed topology-aware prototype modeling to reduce prototype shift and semantic mixing across heterogeneous clinical centers.
- Designed a multimodal agentic diagnosis framework that organizes clinical text, structured scales, and multimodal evidence into a hierarchical reasoning chain.
- Co-developed a multidimensional psychiatric benchmark, focusing on interpretability, robustness, and failure-mode diagnosis in complex settings; the dataset is released on Harvard Dataverse.
Intelligent Interface Center, Harbin Institute of Technology - Remote Research Intern · Dec. 2025 - Present · Remote · Advisor: Prof. Tiejun Zhao
- Contributed to S3Mamba-Pan, an efficient visual modeling project, studying the trade-offs among spatial detail, spectral consistency, and inference efficiency through frequency-decoupled dual-stream Mamba, Haar-wavelet decomposition, spectral anchors, and adaptive distribution recalibration.
- Time-series generation: Developed ChronoRect, which uses rectified flow to model continuous distribution transport in clinical time-series data, and proposed EHR-TriDiT to improve synthetic fidelity, downstream utility, and empirical privacy. This work gave me hands-on experience with flow matching, generative sampling, and conditional modeling.
- Controllable video generation: Studied region and trajectory conditioning in object-centric text-to-video diffusion, aiming to improve object motion control, identity preservation, attribute binding, and temporal consistency in generated videos.

Projects

SkyArm-VLA - Apr. 2026 - June 2026 · Vision-Language-Action, robotic manipulation, LeRobot, embodied AI

Built a robotic-arm VLA project around LingBot-VLA, adapting a pretrained vision-language-action model to a custom desktop manipulation setting and connecting language-conditioned perception with executable robot actions.
Completed the post-training workflow from LeRobot data collection, camera/robot-state alignment, action-space mapping, and action-expert training to open-loop evaluation and WebSocket-based remote inference deployment.

Diffusion Models from Zero to Hero - Oct. 2025 - Present · Diffusers, Stable Diffusion, LoRA, ControlNet, video generation

Organized runnable notebooks and documentation around Diffusers, scheduler design, UNet training loops, fine-tuning, classifier-free guidance, class conditioning, Stable Diffusion components (VAE, CLIP text encoder, UNet, cross-attention), img2img, inpainting, and depth-to-image.
Covered DDIM inversion, DreamBooth personalization, audio spectrogram diffusion, toy video diffusion, troubleshooting notes, GPU-memory guidance, and a modern roadmap spanning LoRA, ControlNet, SDXL, DiT, Flow Matching, and video models.

Happy-LLM - Aug. 2025 - Apr. 2026 · Decoder-only, Pretraining -> SFT, LLM engineering

Co-developed Happy-LLM (30k+ stars), organizing teaching and practice materials around building LLMs from scratch, covering tokenizer, embedding, Transformer decoder, attention, MLP, normalization, RoPE, and KV Cache.
Structured the end-to-end engineering workflow across pretraining, supervised fine-tuning, PEFT, and inference optimization, connecting data preparation, training objectives, memory management, inference caching, evaluation, and application extensions to bridge model understanding and runnable LLM practice.

CyberMars: Intelligent CyberDog Robotics Control - Apr. 2025 - Aug. 2025 · ROS, reinforcement learning

Built an embodied control system for Xiaomi CyberDog, using ROS to coordinate task scheduling, real-time visual perception, lane following, QR / marker recognition, and obstacle-aware navigation, and converting fused sensor observations into executable motion decisions.
Mapped visual recognition outputs into task states and action constraints, designed low-level motion interfaces and an LCMT trajectory-tracking pipeline, and combined reinforcement-learning-guided policy tuning with real-time feedback correction to improve closed-loop stability under sensor noise, action delay, and scene disturbance.

Leadership & Service

Class monitor, Wuhan University · Sept. 2023 - Present - responsible for a class of 31 students; coordinated academic, administrative, and collective affairs; organized 16 themed activities across class, college, and inter-college settings.
Committee member, External Liaison Department · Sept. 2023 - Present - facilitated 2 campus-enterprise collaborations; planned events reaching 5,000+ participants cumulatively.

Skills

Languages: Mandarin (native), English (fluent)
Programming: Python, Java, C#, C++, SQL, Go
Tools & frameworks: PyTorch, Linux, Git, ROS, Docker
