Guanzhi Wang

I am a Research Scientist at NVIDIA. My research interests lie in foundation models, robotics, and embodied agents.

Email / Google Scholar / Twitter / GitHub / LinkedIn

News

[2023.10] Eureka released!
[2023.05] Voyager released!
[2023.04] VIMA accepted to ICML 2023.
[2022.11] MineDojo won the 🎉 Outstanding Paper Award 🎉 at NeurIPS 2022!
[2021.07] I have been selected as a Kortschak Scholar at Caltech.
[2021.06] iGibson 1.0 accepted to IROS 2021.
[2021.05] SECANT accepted to ICML 2021.
[2021.03] Deep Video Matting accepted to CVPR 2021.
[2020.07] RubiksNet accepted to ECCV 2020.
[2019.07] LADN accepted to ICCV 2019.

Publications

* Equal contribution, † Equal advising
	Eureka: Human-Level Reward Design via Coding Large Language Models Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi "Jim" Fan†, Anima Anandkumar† International Conference on Learning Representations (ICLR), 2024 [paper] [project page] [code] We present Eureka, an open-ended LLM-powered agent that designs reward functions for robot dexterity at a super-human level.
	Voyager: An Open-Ended Embodied Agent with Large Language Models Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi "Jim" Fan†, Anima Anandkumar† Transactions on Machine Learning Research (TMLR), 2024 [paper] [project page] [code] We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention.
	VIMA: General Robot Manipulation with Multimodal Prompts Yunfan Jiang, Agrim Gupta, Zichen "Charles" Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu†, Linxi "Jim" Fan† International Conference on Machine Learning (ICML), 2023 [paper] [project page] [code] We introduce a novel multimodal prompting formulation that converts diverse robot manipulation tasks into a uniform sequence modeling problem.
	MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge Linxi "Jim" Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu†, Anima Anandkumar† Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022 ✨ Outstanding Paper Award ✨ [paper] [project page] [code] We introduce MineDojo, a new framework based on the popular Minecraft game for building generally capable, open-ended embodied agents.
	SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies Linxi "Jim" Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar International Conference on Machine Learning (ICML), 2021 [paper] [project page] [code] We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
	iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi "Jim" Fan, Guanzhi Wang, Claudia D’Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese International Conference on Intelligent Robots and Systems (IROS), 2021 [paper] [project page] [code] We present iGibson, a novel simulation environment for developing interactive robotic agents in large-scale realistic scenes.
	Deep Video Matting via Spatio-Temporal Alignment and Aggregation Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [paper] [code] [dataset] We propose a deep learning-based video matting framework which employs a novel and effective spatio-temporal feature aggregation module.
	RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition Linxi "Jim" Fan, Shyamal Buch, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei European Conference on Computer Vision (ECCV), 2020 [paper] [project page] [video] [supplementary] [code] We propose RubiksNet, a new efficient architecture for video action recognition based on a proposed learnable 3D spatiotemporal shift operation (RubiksShift).
	LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang International Conference on Computer Vision (ICCV), 2019 [paper] [project page] [code] [dataset] We propose a local adversarial disentangling network for facial makeup and de-makeup, using multiple and overlapping local discriminators in a content-style disentangling network.

Teaching
	Caltech CS148: Large Language and Vision Models (Spring 2024) Teaching Assistant
	Caltech CS101: 3D Deep Learning (Winter 2024) Teaching Assistant
	Stanford CS231n: ConvNet for Visual Recognition (Spring 2021) Teaching Assistant
	Caltech CS165: Foundations of Machine Learning and Statistical Inference (Winter 2023) Teaching Assistant
	Stanford CS129: Applied Machine Learning (Fall 2020) Teaching Assistant
	Stanford CS229: Machine Learning (Spring 2020) Teaching Assistant

Academic Services

Conference Reviewer: ICML 2024, ICLR 2024, NeurIPS 2023, NeurIPS 2022, ICLR 2022, ICCV 2021, CVPR 2021, ECCV 2020

Awards

Honorable Mention for the NVIDIA Graduate Fellowship (2024)
NeurIPS Outstanding Paper Award (2022)
Kortschak Scholar (2021)
Stanford Human-Centered AI Google Cloud Credits Grant (2021)
Stanford Human-Centered AI AWS Cloud Credits Award (2020)
HKUST Academic Achievement Medal (2019)
Talent Development Scholarship (2019)
Reaching Out Award (2018)
High Fashion Charitable Foundation Exchange Scholarship (2018)
Overseas Learning Experience Scholarship (2018)
Dean’s List (2015-2019)
University Recruitment Scholarship (2015-2019)
