Publications
* Equal contribution, †Equal advising
Eureka: Human-Level Reward Design via Coding Large Language Models
Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi "Jim" Fan†, Anima Anandkumar†
International Conference on Learning Representations (ICLR), 2024
[paper]  
[project page]  
[code]  
We present Eureka, an open-ended LLM-powered agent that designs reward functions for robot dexterity at a super-human level.
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang, Yuqi Xie, Yunfan Jiang*, Ajay Mandlekar*, Chaowei Xiao, Yuke Zhu, Linxi "Jim" Fan†, Anima Anandkumar†
Transactions on Machine Learning Research (TMLR), 2024
[paper]  
[project page]  
[code]  
We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention.
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang, Agrim Gupta*, Zichen "Charles" Zhang*, Guanzhi Wang*, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu†, Linxi "Jim" Fan†
International Conference on Machine Learning (ICML), 2023
[paper]  
[project page]  
[code]  
We introduce a novel multimodal prompting formulation that converts diverse robot manipulation tasks into a uniform sequence modeling problem.
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi "Jim" Fan, Guanzhi Wang*, Yunfan Jiang*, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu†, Anima Anandkumar†
Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022
✨ Outstanding Paper Award ✨
[paper]  
[project page]  
[code]  
We introduce MineDojo, a new framework based on the popular Minecraft game for building generally capable, open-ended embodied agents.
SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
Linxi "Jim" Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
International Conference on Machine Learning (ICML), 2021
[paper]  
[project page]  
[code]  
We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes
Bokui Shen*, Fei Xia*, Chengshu Li*, Roberto Martín-Martín*, Linxi "Jim" Fan, Guanzhi Wang, Claudia D'Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese
International Conference on Intelligent Robots and Systems (IROS), 2021
[paper]  
[project page]  
[code]  
We present iGibson, a novel simulation environment for developing interactive robotic agents in large-scale realistic scenes.
Deep Video Matting via Spatio-Temporal Alignment and Aggregation
Yanan Sun, Guanzhi Wang*, Qiao Gu*, Chi-Keung Tang, Yu-Wing Tai
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[paper]  
[code]  
[dataset]
We propose a deep learning-based video matting framework that employs a novel and effective spatio-temporal feature aggregation module.
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
Linxi "Jim" Fan*, Shyamal Buch*, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei
European Conference on Computer Vision (ECCV), 2020
[paper]  
[project page]  
[video]  
[supplementary]  
[code]  
We propose RubiksNet, a new efficient architecture for video action recognition based on a proposed learnable 3D spatiotemporal shift operation (RubiksShift).
LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
Qiao Gu*, Guanzhi Wang*, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang
International Conference on Computer Vision (ICCV), 2019
[paper]  
[project page]  
[code]  
[dataset]
We propose a local adversarial disentangling network for facial makeup and de-makeup, using multiple and overlapping local discriminators in a content-style disentangling network.
Service
Conference Reviewer: ICML 2024, ICLR 2024, NeurIPS 2023, NeurIPS 2022, ICLR 2022, ICCV 2021, CVPR 2021, ECCV 2020
Awards
Honorable Mention for the NVIDIA Graduate Fellowship (2024)
NeurIPS Outstanding Paper Award (2022)
Kortschak Scholar (2021)
Stanford Human-Centered AI Google Cloud Credits Grant (2021)
Stanford Human-Centered AI AWS Cloud Credits Award (2020)
HKUST Academic Achievement Medal (2019)
Talent Development Scholarship (2019)
Reaching Out Award (2018)
High Fashion Charitable Foundation Exchange Scholarship (2018)
Overseas Learning Experience Scholarship (2018)
Dean’s List (2015-2019)
University Recruitment Scholarship (2015-2019)