Publications
(* indicates equal contribution)
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan, Guanzhi Wang*, Yunfan Jiang*, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, Yuke Zhu, Animashree Anandkumar
Conference on Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022
[paper] [project page] [code]
We introduce MineDojo, a new framework based on the popular Minecraft game for building generally capable, open-ended embodied agents.
SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Animashree Anandkumar
International Conference on Machine Learning (ICML), 2021
[paper] [project page] [code]
We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes
Bokui Shen*, Fei Xia*, Chengshu Li*, Roberto Martín-Martín*, Linxi Fan, Guanzhi Wang, Claudia D'Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese
International Conference on Intelligent Robots and Systems (IROS), 2021
[paper] [project page] [code]
We present iGibson, a novel simulation environment for developing interactive robotic agents in large-scale realistic scenes.
Deep Video Matting via Spatio-Temporal Alignment and Aggregation
Yanan Sun, Guanzhi Wang*, Qiao Gu*, Chi-Keung Tang, Yu-Wing Tai
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[paper] [code] [dataset]
We propose a deep learning-based video matting framework that employs a novel and effective spatio-temporal feature aggregation module.
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
Linxi Fan*, Shyamal Buch*, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei
European Conference on Computer Vision (ECCV), 2020
[paper] [project page] [video] [supplementary] [code]
We propose RubiksNet, a new efficient architecture for video action recognition built on a learnable 3D spatiotemporal shift operation (RubiksShift).
LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
Qiao Gu*, Guanzhi Wang*, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang
International Conference on Computer Vision (ICCV), 2019
[paper] [project page] [code] [dataset]
We propose a local adversarial disentangling network for facial makeup and de-makeup, using multiple overlapping local discriminators in a content-style disentangling network.
Service
Conference Reviewer: ICLR 2022, ICCV 2021, CVPR 2021, ECCV 2020
Awards
Kortschak Scholar (2021)
Stanford Human-Centered AI Google Cloud Credits Grant (2021)
Stanford Human-Centered AI AWS Cloud Credits Award (2020)
HKUST Academic Achievement Medal (2019)
Talent Development Scholarship (2019)
Reaching Out Award (2018)
High Fashion Charitable Foundation Exchange Scholarship (2018)
Overseas Learning Experience Scholarship (2018)
Dean’s List (2015-2019)
University Recruitment Scholarship (2015-2019)