Guanzhi Wang

I am a second-year Ph.D. student at Caltech, advised by Prof. Anima Anandkumar.

I obtained my M.S. degree from Stanford University, where I was fortunate to work with Prof. Fei-Fei Li, Prof. Yuke Zhu, Dr. Jim Fan, and Dr. Shyamal Buch. Previously, I was a research intern at NVIDIA AI and Tencent YouTu.

My research interests lie in foundation models, robotics, and policy learning. I am passionate about building embodied foundation models that are generally capable of discovering and pursuing complex, open-ended objectives, performing a large number of tasks, and understanding how the world works through massive pre-trained knowledge.

Email  /  Google Scholar  /  Twitter  /  GitHub  /  LinkedIn


(* indicates equal contribution)

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan, Guanzhi Wang*, Yunfan Jiang*, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar
Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022
Outstanding Paper Award
[paper]   [project page]   [code]  

We introduced MineDojo, a new framework based on the popular Minecraft game for building generally capable, open-ended embodied agents.

VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang, Agrim Gupta*, Zichen "Charles" Zhang*, Guanzhi Wang*, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi "Jim" Fan
Neural Information Processing Systems (NeurIPS) Foundation Models for Decision Making Workshop, 2022 (Oral Presentation)
[paper]   [project page]   [code]  

We introduced a novel multimodal prompting formulation that converts diverse robot manipulation tasks into a uniform sequence modeling problem.

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
International Conference on Machine Learning (ICML), 2021
[paper]   [project page]   [code]  

We proposed SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.

iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes
Bokui Shen*, Fei Xia*, Chengshu Li*, Roberto Martín-Martín*, Linxi Fan, Guanzhi Wang, Claudia D’Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese
International Conference on Intelligent Robots and Systems (IROS), 2021
[paper]   [project page]   [code]  

We presented iGibson, a novel simulation environment for developing interactive robotic agents in large-scale realistic scenes.

Deep Video Matting via Spatio-Temporal Alignment and Aggregation
Yanan Sun, Guanzhi Wang*, Qiao Gu*, Chi-Keung Tang, Yu-Wing Tai
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[paper]   [code]   [dataset]

We proposed a deep learning-based video matting framework that employs a novel and effective spatio-temporal feature aggregation module.

RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
Linxi Fan*, Shyamal Buch*, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei
European Conference on Computer Vision (ECCV), 2020
[paper]   [project page]   [video]   [supplementary]   [code]  

We proposed RubiksNet, a new efficient architecture for video action recognition based on a learnable 3D spatiotemporal shift operation (RubiksShift).

LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
Qiao Gu*, Guanzhi Wang*, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang
International Conference on Computer Vision (ICCV), 2019
[paper]   [project page]   [code]   [dataset]

We proposed a local adversarial disentangling network for facial makeup and de-makeup, using multiple overlapping local discriminators in a content-style disentangling network.


CS231n: Convolutional Neural Networks for Visual Recognition (Spring 2021)

Teaching Assistant


CS129: Applied Machine Learning (Fall 2020)

CS229: Machine Learning (Spring 2020)

Teaching Assistant

Academic Services
Conference Reviewer: NeurIPS 2022, ICLR 2022, ICCV 2021, CVPR 2021, ECCV 2020

  • NeurIPS Outstanding Paper Award (2022)
  • Kortschak Scholar (2021)
  • Stanford Human-Centered AI Google Cloud Credits Grant (2021)
  • Stanford Human-Centered AI AWS Cloud Credits Award (2020)
  • HKUST Academic Achievement Medal (2019)
  • Talent Development Scholarship (2019)
  • Reaching Out Award (2018)
  • High Fashion Charitable Foundation Exchange Scholarship (2018)
  • Overseas Learning Experience Scholarship (2018)
  • Dean’s List (2015-2019)
  • University Recruitment Scholarship (2015-2019)
