Research
Our research centers on Computer Vision, Machine Learning, and Robot Vision, with the goal of equipping machines to understand complex real-world environments. It spans 3D Reconstruction for precise geometric scene understanding, Social Kinematics for modeling human-agent dynamics, and LLMs & Creative AI for generating physics-based content. Our group's research achievements are listed below:
Autonomous Driving
- Trajectory Prediction & Reasoning
- Motion & Behavior Generation
- Robotics & Physical Simulation
Related Works:
AAAI'21, CVPR'22, ECCV'22, AAAI'23 (Oral), ICCV'23, CVPR'24(1), CVPR'24(2), CVPR'25, IEEE TPAMI'26
Generative AI
- Video & Image Content Creation
- Sign Language Generation
- TEM / Medical Image Analysis
Related Works:
CVPR'24(2), ECCV'24, ECCV'26 (Under Review)
3D Reconstruction
- 2D, 3D, 4D Scene Reconstruction
- Depth Estimation & Completion
Related Works:
ICML'23, ICLR'24, NeurIPS'24, ICCV'25 (Highlight), ECCV'26 (Under Review)
Large Language Models
- Vision-Language Alignment
- Reasoning via VLMs
- Vision-Language-Action
Related Works:
CVPR'24(2), IEEE TPAMI'26, ECCV'26 (Under Review)
Machine Learning for CV
- Machine Learning Theory
- Feature Encoding & Latent Mapping
- Mathematical Formulation
Related Works:
CVPR'22, ICML'23, ICCV'23
- Generative AI and Diffusion Model-based Dynamic Video Generation and Editing
- 3D Scene Understanding and Geometry-based Multi-view Video Synthesis (Gaussian Splatting)
- Multi-agent Dynamic Behavior and Interaction Reasoning based on LLMs and VLA Models
- Physics-based Generative Models and Embodied AI