I am a postdoctoral scholar in Prof. Song-Chun Zhu’s VCLA lab at UCLA.

My research builds interactive AI by integrating high-level common sense (functionality, affordance, physics, causality) with raw sensory inputs (pixels and haptic signals) to enable richer representations and abstract reasoning about objects, scenes, shapes, numbers, and agents.

My work is currently supported by ONR MURI on Scene Understanding, DARPA XAI, and ONR Cognitive Systems for Human-Machine Teaming. I am a co-organizer of the Vision Meets Cognition (FPIC) workshops, the 3D Scene Understanding for Vision, Graphics, and Robotics workshops, and the Virtual Reality Meets Physical Reality workshops.

We are looking for highly motivated students with exceptional programming skills and a solid math background to work on 3D computer vision, abstract reasoning, physics-based simulation, and robotics. If you are a UCLA student interested in working with me, please read some papers from the reading list before sending me an email. For a partial list of ongoing projects in the lab, see the VCLA Project Bulletin.


  • Computer Vision
  • Artificial Intelligence
  • Human–Robot Interaction


  • PhD in Statistics, 2018


  • MS in Computer Science, 2013


  • BEng in Software Engineering, 2012

    Xi'an Jiaotong University


[ECCV20] LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial …

[IROS20] Human-Robot Interaction in a Shared Augmented Reality Workspace

We design and develop a new shared Augmented Reality (AR) workspace for Human-Robot Interaction (HRI), which establishes a …

[IROS20] Graph-based Hierarchical Knowledge Representation for Robot Task Transfer from Virtual to Physical World

We study the hierarchical knowledge transfer problem using a cloth-folding task, wherein the agent is first given a set of human …

[Engineering20] Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

Recent progress in deep learning is essentially based on a “big data for small tasks” paradigm, under which massive amounts …

[SIGGRAPH20] A Massively Parallel and Scalable Multi-GPU Material Point Method

Harnessing the power of modern multi-GPU architectures, we present a massively parallel simulation system based on the Material Point …

[SIGGRAPH20] IQ-MPM: An Interface Quadrature Material Point Method for Non-sticky Strongly Two-Way Coupled Nonlinear Solids and Fluids

We propose a novel scheme for simulating two-way coupled interactions between nonlinear elastic solids and incompressible fluids. The …

[arXiv20] LETO: Hybrid Lagrangian-Eulerian Method for Topology Optimization

We propose LETO, a new hybrid Lagrangian-Eulerian method for topology optimization. At the heart of LETO lies a hybrid particle-grid …

[ICRA20] Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

Aiming to understand how human (false-)belief—a core socio-cognitive ability—would affect human interactions with robots, …

[ICRA20] Congestion-aware Evacuation Routing using Augmented Reality Devices

We present a congestion-aware routing solution for indoor evacuation, which produces real-time individual-customized evacuation routes …

[ScienceRobotics19] A tale of two explanations: Enhancing human trust by explaining robot behavior

The ability to provide comprehensive explanations of chosen actions is a hallmark of intelligence. Lack of this ability impedes the …

[AAAI20] Theory-based Causal Transfer: Integrating Instance-level Induction and Abstract-level Structure Learning

Learning transferable knowledge across similar but different settings is a fundamental component of generalized intelligence. In this …

[AAAI20] Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning

As a comprehensive indicator of mathematical thinking and intelligence, the number sense (Dehaene 2011) bridges the induction of …

[NeurIPS19] Learning Perceptual Inference by Contrasting

‘Thinking in pictures,’ [1] i.e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a …

[NeurIPS19] PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate …

[ICCV19] Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic …



Students Mentored

  • Wenhe Zhang, Master, Computer Science, UCLA, 2020 Fall
  • Xiaojian Ma, Master, Computer Science, UCLA, 2019 Fall
  • Xiaolin Fang, Ph.D., CSAIL, MIT, 2019 Fall
  • Shu Wang, Ph.D., Statistics, UCLA, 2018 Fall
  • Wenwen Si, Master, Computer Vision, CMU, 2018 Fall
  • Hangxin Liu, Ph.D., Computer Science, UCLA, 2018 Spring
  • Jenny Lin, Ph.D., Computer Science, CMU, 2017 Fall
  • Mark Edmonds, Ph.D., Computer Science, UCLA, 2017 Fall
  • Tian Ye, Master, Robotics, CMU, 2017 Fall
  • Feng Gao, Master, Statistics, UCLA, 2017 Fall
  • Xu Xie, Master, Statistics, UCLA, 2017 Fall
  • Xingwen Guo, Master, Computer Science, Yale, 2017 Fall
  • Chi Zhang, Master, Computer Science, UCLA, 2017 Fall
  • Jingyu Shao, Master, Statistics, UCLA, 2016 Winter
  • Yutong Zhang, Master, Computer Science, UCLA, 2015 Fall