Publications | Yixin Zhu

[NatureMachineIntelligence25] Embedding high-resolution touch across robotic hands enables adaptive human-like grasping

Developing robotic hands that adapt to real-world dynamics remains a fundamental challenge in robotics and machine intelligence. …

Zihang Zhao, Wanlin Li, Yuyang Li, Tengyu Liu, Boren Li, Meng Wang, Kai Du, Hangxin Liu, Yixin Zhu, Qining Wang, Kaspar Althoefer, Song-Chun Zhu

[CogSci25] Probing and Inducing Combinational Creativity in Vision-Language Models

The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence. Recent advances in …

Yongqian Peng, Yuxi Ma, Mengmeng Wang, Yuxuan Wang, Yizhou Wang, Chi Zhang, Yixin Zhu, Zilong Zheng

[CogSci25] Word Embeddings Track Social Group Changes Across 70 Years in China

Language encodes societal beliefs about social groups through word patterns. While computational methods like word embeddings enable …

Yuxi Ma, Yongqian Peng, Yixin Zhu

[CogSci25] A simulation-heuristics dual-process model for intuitive physics

The role of mental simulation in human physical reasoning is widely acknowledged, but whether it is employed across scenarios with …

Shiqian Li, Yuxi Ma, Jiajun Yan, Bo Dai, Yujia Peng, Chi Zhang, Yixin Zhu

[CVPR25] GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill

Learning open-vocabulary physical skills for simulated agents presents a significant challenge in artificial intelligence. Current …

Jieming Cui, Tengyu Liu, Ziyu Meng, Jiale Yu, Ran Song, Wei Zhang, Yixin Zhu, Siyuan Huang

[CVPR25] Dynamic Motion Blending for Versatile Motion Editing

Text-guided motion editing enables high-level semantic control and iterative modifications beyond traditional keyframe animation. …

Nan Jiang, Hongjie Li, Ziye Yuan, Zimo He, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang

[T-RO24] Tac-Man: Tactile-Informed Prior-Free Manipulation of Articulated Objects

Integrating robotics into human-centric environments, such as homes, necessitates advanced manipulation skills as robotic devices will …

Zihang Zhao, Yuyang Li, Wanlin Li, Zhenghao Qi, Lecheng Ruan, Yixin Zhu, Kaspar Althoefer

[RA-L24] MiniTac: An Ultra-Compact 8 mm Vision-Based Tactile Sensor for Enhanced Palpation in Robot-Assisted Minimally Invasive Surgery

Robot-assisted minimally invasive surgery (RAMIS) provides substantial benefits over traditional open and laparoscopic methods. …

Wanlin Li, Zihang Zhao, Leiyao Cui, Weiyi Zhang, Hangxin Liu, Li-an Li, Yixin Zhu

[NeurIPS24] PhyRecon: Physically Plausible Neural Scene Reconstruction

Neural implicit representations have gained popularity in multi-view 3D reconstruction. However, most previous work struggles to yield …

Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

[SIGGRAPHAsia24] Autonomous Character-Scene Interaction Synthesis from Text Instruction

Synthesizing human motions in 3D environments, particularly those with complex activities such as locomotion, hand-reaching, and …

Nan Jiang, Zimo He, Zi Wang, Hongjie Li, Yixin Chen, Siyuan Huang, Yixin Zhu

[IROS24] Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to …

Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

[IROS24] PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

Robotic manipulation with two-finger grippers is challenged by objects lacking distinct graspable features. Traditional pre-grasping …

Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-Ang Gao, Siqi Li, Yixin Zhu, Guyue Zhou, Hao Dong, Hao Zhao

[ECCV24] Zero-Shot Image Feature Consensus with Deep Functional Maps

Correspondences emerge from large-scale vision models trained for generative and discriminative tasks. This has been revealed and …

Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, Leonidas Guibas

[CogSci24] Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities

Facing the current debate on whether Large Language Models (LLMs) attain near-human intelligence levels (Mitchell & Krakauer, 2023; …

Junqi Wang, Chunhui Zhang, Jiapeng Li, Yuxi Ma, Lixing Niu, Jiaheng Han, Yujia Peng, Yixin Zhu, Lifeng Fan

[ScienceAdvances24] Human-level few-shot concept induction through minimax entropy learning

Humans learn concepts both from labeled supervision and by unsupervised observation of patterns, a process machines are being taught to …

Chi Zhang, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

[CVPR24] Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses …

Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang

[CVPR24] AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Traditional approaches in physics-based motion generation, centered around imitation learning and reward shaping, often struggle to …

Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang

[CVPR24] Scaling Up Dynamic Human-Scene Interaction Modeling

Confronting the challenges of data scarcity and advanced motion synthesis in human-scene interaction modeling, we introduce the TRUMANS …

Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang

[RA-L24] Grasp Multiple Objects with One Hand

The intricate kinematics of the human hand enable simultaneous grasping and manipulation of multiple objects, essential for tasks such …

Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang

[3DV24] Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture

Reconstructing detailed 3D scenes from single-view images remains a challenging task due to limitations in existing approaches, which …

Yixin Chen, Junfeng Ni, Nan Jiang, Yaowei Zhang, Yixin Zhu, Siyuan Huang

[ICLR24] I-PHYRE: Interactive Physical Reasoning

Current evaluation protocols predominantly assess physical reasoning in stationary scenes, creating a gap in evaluating agents’ …

Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

[ICLR24] SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation

Humans demonstrate remarkable skill in transferring manipulation abilities across objects of varying shapes, poses, and appearances, a …

Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas

[ICLR24] Neural-Symbolic Recursive Machine for Systematic Generalization

Current learning models often struggle with human-like systematic generalization, particularly in learning compositional rules from …

Qing Li, Yixin Zhu, Yitao Liang, Ying Nian Wu, Song-Chun Zhu, Siyuan Huang

[NeurIPS23] ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

The challenge of replicating research results has posed a significant impediment to the field of molecular biology. The advent of …

Jieming Cui, Ziren Gong, Baoxiong Jia, Siyuan Huang, Zilong Zheng, Jianzhu Ma, Yixin Zhu

[NeurIPS23] Active Reasoning in an Open-World Environment

Recent advances in vision-language learning have achieved notable success on complete-information question-answering datasets through …

Manjie Xu, Guangyuan Jiang, Wei Liang, Chi Zhang, Yixin Zhu

[NeurIPS23] ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors

Understanding the behavior of non-human primates is crucial for improving animal welfare, modeling social behavior, and gaining …

Xiaoxuan Ma, Stephan Kaufhold, Jiajun Su, Wentao Zhu, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang

[NeurIPS23] Evaluating and Inducing Personality in Pre-trained Language Models

Standardized and quantified evaluation of machine behaviors is a crux of understanding LLMs. In this study, we draw inspiration from …

Guangyuan Jiang, Manjie Xu, Song-Chun Zhu, Wenjuan Han, Chi Zhang, Yixin Zhu

[NeurIPS23] Interactive Visual Reasoning under Uncertainty

One of the fundamental cognitive abilities of humans is to quickly resolve uncertainty by generating hypotheses and testing them via …

Manjie Xu, Guangyuan Jiang, Wei Liang, Chi Zhang, Yixin Zhu

[ICCV23] Full-Body Articulated Human-Object Interaction

Fine-grained capture of 3D Human-Object Interactions (HOIs) enhances human activity comprehension and supports various downstream …

Nan Jiang, Tengyu Liu, Zhexuan Cao, Jieming Cui, Yixin Chen, He Wang, Yixin Zhu, Siyuan Huang

[ICCV23] X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in …

Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu

[IROS23] Learning a Causal Transition Model for Object Cutting

Cutting objects into desired fragments is challenging for robots due to the spatially unstructured nature of fragments and the complex …

Zeyu Zhang, Muzhi Han, Baoxiong Jia, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

[IROS23] Part-level Scene Reconstruction Affords Robot Interaction

Existing methods for reconstructing interactive scenes primarily focus on replacing reconstructed objects with CAD models retrieved …

Zeyu Zhang, Lexing Zhang, Zaijin Wang, Ziyuan Jiao, Muzhi Han, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

[IROS23] Sequential Manipulation Planning for Over-actuated Unmanned Aerial Manipulators

We investigate the sequential manipulation planning problem for unmanned aerial manipulators (UAMs). Unlike prior work that primarily …

Yao Su, Jiarui Li, Ziyuan Jiao, Meng Wang, Chi Chu, Hang Li, Yixin Zhu, Hangxin Liu

[ACL23Demo] PersLEARN: Research Training through the Lens of Perspective Cultivation

Scientific research is inherently shaped by its authors’ perspectives, influenced by various factors such as their personality, …

Yu-Zhe Shi, Shiqian Li, Xinyi Niu, Qiao Xu, Jiawen Liu, Yifan Xu, Shiyu Gu, Bingru He, Xinyang Li, Xinyu Zhao, Zijian Zhao, Yidong Lyu, Zhen Li, Sijia Liu, Lin Qiu, Jinhao Ji, Lecheng Ruan, Yuxi Ma, Wenjuan Han, Yixin Zhu

[ICML23] MEWL: Few-shot multimodal word learning with referential uncertainty

Without explicit feedback, humans can rapidly learn the meaning of words. Children can acquire a new word after just a few passive …

Guangyuan Jiang, Manjie Xu, Shiji Xin, Wei Liang, Yujia Peng, Chi Zhang, Yixin Zhu

[ICML23] On the Complexity of Bayesian Generalization

We examine concept generalization at a large scale in the natural visual spectrum. Established computational modes (i.e., rule-based or …

Yu-Zhe Shi, Manjie Xu, John E. Hopcroft, Kun He, Joshua B. Tenenbaum, Song-Chun Zhu, Ying Nian Wu, Wenjuan Han, Yixin Zhu

[AIR22] Artificial Social Intelligence: A Comparative and Holistic View

In addition to a physical comprehension of the world, humans possess a high social intelligence–the intelligence that senses …

Lifeng Fan, Manjie Xu, Zhihao Cao, Yixin Zhu, Song-Chun Zhu

[CVPR23] Diffusion-based Generation, Optimization, and Planning in 3D Scenes

We introduce SceneDiffuser, a conditional generative model for 3D scene understanding. SceneDiffuser provides a unified model for …

Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, Song-Chun Zhu

[ICLR23] Understanding Embodied Reference with Touch-Line Transformer

We study embodied reference understanding, the task of locating referents using embodied gestural signals and language references. …

Yang Li, Xiaoxue Chen, Hao Zhao, Jiangtao Gong, Guyue Zhou, Federico Rossano, Yixin Zhu

[ICLR23] A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics

Inspired by humans’ exceptional ability to master arithmetic and generalize to new problems, we present a new dataset, …

Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

[ICRA23] GenDexGrasp: Generalizable Dexterous Grasping

Generating dexterous grasping has been a long-standing and challenging robotic task. Despite recent progress, existing methods …

Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang

[ICRA23] Rearrange Indoor Scenes for Human-Robot Co-Activity

We present an optimization-based framework for rearranging indoor furniture to accommodate human-robot co-activities better. The …

Weiqi Wang, Zihang Zhao, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

[Engineering23] A Reconfigurable Data Glove for Reconstructing Physical and Virtual Grasps

In this work, we present a reconfigurable data glove design to capture different modes of human hand-object interactions, which are …

Hangxin Liu, Zeyu Zhang, Ziyuan Jiao, Zhenliang Zhang, Minchen Li, Chenfanfu Jiang, Yixin Zhu, Song-Chun Zhu

[NeurIPS22] On the Learning Mechanisms in Physical Reasoning

Is dynamics prediction indispensable for physical reasoning? If so, what kind of roles do the dynamics prediction modules play during …

Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

[NeurIPS22] Emergent Graphical Conventions in a Visual Communication Game

Humans communicate with graphical sketches apart from symbolic languages (Fay et al., 2014). Primarily focusing on the latter, recent …

Shuwen Qiu, Sirui Xie, Lifeng Fan, Tao Gao, Jungseock Joo, Song-Chun Zhu, Yixin Zhu

[NeurIPS22] HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes

Learning to generate diverse scene-aware and goal-oriented human motions in 3D scenes remains challenging due to the mediocre …

Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang

[IJCV22] Scene Reconstruction with Functional Objects for Robot Autonomy

In this paper, we rethink the problem of scene reconstruction from an embodied agent’s perspective: While the classic view focuses on …

Muzhi Han, Zeyu Zhang, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

[ScienceRobotics22] In situ bidirectional human-robot value alignment

A prerequisite for social coordination is bidirectional communication between teammates, each playing two roles simultaneously: as …

Luyao Yuan, Xiaofeng Gao, Zilong Zheng, Mark Edmonds, Ying Nian Wu, Federico Rossano, Hongjing Lu, Yixin Zhu, Song-Chun Zhu

[ECCV22] Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

Is intelligence realized by connectionist or classicist? While connectionist approaches have achieved superhuman performance, there has …

Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu

[ECCV22Workshop] PartAfford: Part-level Affordance Discovery from 3D Objects

Understanding what objects could furnish for humans, namely, learning object affordance, is the crux to bridge perception and action. In …

Chao Xu, Yixin Chen, He Wang, Song-Chun Zhu, Yixin Zhu, Siyuan Huang

[IROS22] Sequential Manipulation Planning on Scene Graph

We devise a 3D scene graph representation, contact graph+ (cg+), for efficient sequential manipulation planning. Augmented with …

Ziyuan Jiao, Yida Niu, Zeyu Zhang, Song-Chun Zhu, Yixin Zhu, Hangxin Liu

[IROS22] Downwash-aware Control Allocation for Over-actuated UAV Platforms

Tracking position and orientation independently affords more agile maneuver for over-actuated multirotor Unmanned Aerial Vehicles …

Yao Su, Chi Chu, Meng Wang, Jiarui Li, Liu Yang, Yixin Zhu, Hangxin Liu

[RA-L/IROS22] Understanding Physical Effects for Effective Tool-use

We present a robot learning and planning framework that produces an effective tool-use strategy with the least joint efforts, capable …

Zeyu Zhang, Ziyuan Jiao, Weiqi Wang, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

[ICML22] Latent Diffusion Energy-Based Model for Interpretable Text Modeling

Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling. Fueled …

Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

[JCB2022] Sharing Rewards Undermines Coordinated Hunting

Coordinated hunting is widely observed in animals, and sharing rewards is often considered a major incentive for its success. While …

Minglu Zhao, Ning Tang, Annya Dahmani, Yixin Zhu, Federico Rossano, Tao Gao

[CogSci22] What Is the Point? A Theory of Mind Model of Relevance

Although pointing is sparse, overloaded, and indirect, it allows humans to effectively decode shared information, (ex)change their …

Kaiwen Jiang, Annya Dahmani, Stephanie Stacy, Boxuan Jiang, Federico Rossano, Yixin Zhu, Tao Gao

[RA-L/ICRA22] Object Gathering with a Tethered Robot Duo

We devise a cooperative planning framework to generate optimal trajectories for a tethered robot duo, who is tasked to gather scattered …

Yao Su, Yuhong Jiang, Yixin Zhu, Hangxin Liu

[AAIL21] Patching interpretable And-Or-Graph knowledge representation using augmented reality

We present a novel augmented reality (AR) interface to provide effective means to diagnose a robot’s erroneous behaviors, endow …

Hangxin Liu, Yixin Zhu, Song-Chun Zhu

[NeurIPS21] Unsupervised Foreground Extraction via Deep Region Competition

We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised …

Peiyu Yu, Sirui Xie, Xiaojian Ma, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

[RA-L21] Synthesizing Diverse and Physically Stable Grasps with Arbitrary Hand Structures using Differentiable Force Closure Estimator

Existing grasp synthesis methods are either analytical or data-driven. The former one is oftentimes limited to specific application …

Tengyu Liu, Zeyu Liu, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu

[ICCV21] YouRefIt: Embodied Reference Understanding with Language and Gesture

We study the machine’s understanding of embodied reference: One agent uses both language and gesture to refer to an object to …

Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang

[ICCV21] Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate …

Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu

[IROS21] Consolidating Kinematic Models to Promote Coordinated Mobile Manipulations

We construct a Virtual Kinematic Chain (VKC) that readily consolidates the kinematics of the mobile base, the arm, and the object to be …

Ziyuan Jiao, Zeyu Zhang, Xin Jiang, David Han, Song-Chun Zhu, Yixin Zhu, Hangxin Liu

[IROS21] Efficient Task Planning for Mobile Manipulation: a Virtual Kinematic Chain Perspective

We present a Virtual Kinematic Chain (VKC) perspective, a simple yet effective method, to improve task planning efficacy for mobile …

Ziyuan Jiao, Zeyu Zhang, Weiqi Wang, David Han, Song-Chun Zhu, Yixin Zhu, Hangxin Liu

[IROS21] Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

Human-robot collaboration is an essential research topic in artificial intelligence (AI), enabling researchers to devise cognitive AI …

Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo

[CogSci21] Individual vs. Joint Perception: a Pragmatic Model of Pointing as Communicative Smithian Helping

The simple gesture of pointing can greatly augment one’s ability to comprehend states of the world based on observations. It …

Kaiwen Jiang, Stephanie Stacy, Chuyu Wei, Adelpha Chan, Federico Rossano, Yixin Zhu, Tao Gao

[ICLR21Workshop] HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving

Humans learn compositional and causal abstraction, i.e., knowledge, in response to the structure of naturalistic tasks. When presented …

Sirui Xie, Xiaojian Ma, Peiyu Yu, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

[ACL-Findings21] GRICE: A Grammar-based Dataset for Recovering Implicature and Conversational rEasoning

Understanding what we genuinely mean instead of what we literally say in conversations is challenging for both humans and machines; …

Zilong Zheng, Shuwen Qiu, Lifeng Fan, Yixin Zhu, Song-Chun Zhu

[CVPR21] Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

Humans possess a unique social cognition capability; nonverbal communication can convey rich social information among agents. In …

Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

[CVPR21] ACRE: Abstract Causal Reasoning Beyond Covariation

Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal …

Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu

[CVPR21] Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

Spatial-temporal reasoning is a challenging task in Artificial Intelligence (AI) due to its demanding but unique nature: a theoretic …

Chi Zhang, Baoxiong Jia, Song-Chun Zhu, Yixin Zhu

[ICRA21] Reconstructing Interactive 3D Scene by Panoptic Mapping and CAD Model Alignments

In this paper, we rethink the problem of scene reconstruction from an embodied agent’s perspective: While the classic view …

Muzhi Han, Zeyu Zhang, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

[ICRA21] Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance

Predicting agents’ future trajectories plays a crucial role in modern AI systems, yet it is challenging due to intricate …

Xu Xie, Chi Zhang, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

[IJNME21] Lagrangian‐Eulerian Multi‐Density Topology Optimization with the Material Point Method

In this paper, a hybrid Lagrangian‐Eulerian topology optimization (LETO) method is proposed to solve the elastic force equilibrium with …

Yue Li, Xuan Li, Minchen Li, Yixin Zhu, Bo Zhu, Chenfanfu Jiang

[ECCV20] LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial …

Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

[IROS20] Human-Robot Interaction in a Shared Augmented Reality Workspace

We design and develop a new shared Augmented Reality (AR) workspace for Human-Robot Interaction (HRI), which establishes a …

Shuwen Qiu, Hangxin Liu, Zeyu Zhang, Yixin Zhu, Song-Chun Zhu

[IROS20] Graph-based Hierarchical Knowledge Representation for Robot Task Transfer from Virtual to Physical World

We study the hierarchical knowledge transfer problem using a cloth-folding task, wherein the agent is first given a set of human …

Zhenliang Zhang, Yixin Zhu, Song-Chun Zhu

[Engineering20] Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

Recent progress in deep learning is essentially based on a “big data for small tasks” paradigm, under which massive amounts …

Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Josh Tenenbaum, Song-Chun Zhu

[SIGGRAPH20] A Massively Parallel and Scalable Multi-GPU Material Point Method

Harnessing the power of modern multi-GPU architectures, we present a massively parallel simulation system based on the Material Point …

Xinlei Wang, Yuxing Qiu, Stuart Slattery, Yu Fang, Minchen Li, Song-Chun Zhu, Yixin Zhu, Min Tang, Dinesh Manocha, Chenfanfu Jiang

[SIGGRAPH20] IQ-MPM: An Interface Quadrature Material Point Method for Non-sticky Strongly Two-Way Coupled Nonlinear Solids and Fluids

We propose a novel scheme for simulating two-way coupled interactions between nonlinear elastic solids and incompressible fluids. The …

Yu Fang, Ziyin Qu, Minchen Li, Xinxin Zhang, Yixin Zhu, Mridul Aanjaneya, Chenfanfu Jiang

[ICRA20] Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

Aiming to understand how human (false-)belief—a core socio-cognitive ability—would affect human interactions with robots, …

Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu

[ICRA20] Congestion-aware Evacuation Routing using Augmented Reality Devices

We present a congestion-aware routing solution for indoor evacuation, which produces real-time individual-customized evacuation routes …

Zeyu Zhang, Hangxin Liu, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu

[ScienceRobotics19] A tale of two explanations: Enhancing human trust by explaining robot behavior

The ability to provide comprehensive explanations of chosen actions is a hallmark of intelligence. Lack of this ability impedes the …

Mark Edmonds, Feng Gao, Hangxin Liu, Xu Xie, Siyuan Qi, Brandon Rothrock, Yixin Zhu, Ying Nian Wu, Hongjing Lu, Song-Chun Zhu

[AAAI20] Theory-based Causal Transfer: Integrating Instance-level Induction and Abstract-level Structure Learning

Learning transferable knowledge across similar but different settings is a fundamental component of generalized intelligence. In this …

Mark Edmonds, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

[AAAI20] Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning

As a comprehensive indicator of mathematical thinking and intelligence, the number sense (Dehaene 2011) bridges the induction of …

Wenhe Zhang, Chi Zhang, Yixin Zhu, Song-Chun Zhu

[NeurIPS19] PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate …

Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

[NeurIPS19] Learning Perceptual Inference by Contrasting

‘Thinking in pictures,’ [1] i.e., spatial-temporal reasoning, effortless and instantaneous for humans, is believed to be a …

Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

[ICCV19] Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic …

Yixin Chen, Siyuan Huang, Tao Yuan, Yixin Zhu, Siyuan Qi, Song-Chun Zhu

[IROS19] Learning Virtual Grasp with Failed Demonstrations via Bayesian Inverse Reinforcement Learning

We propose Bayesian Inverse Reinforcement Learning with Failure (BIRLF), which makes use of failed demonstrations that were often …

Xu Xie, Changyang Li, Chi Zhang, Yixin Zhu, Song-Chun Zhu

[CogSci19] Decomposing Human Causal Learning: Bottom-up Associative Learning and Top-down Schema Reasoning

Transfer learning is fundamental for intelligence; agents expected to operate in novel and unfamiliar environments must be able to …

Mark Edmonds, Siyuan Qi, Yixin Zhu, James Kubricht, Song-Chun Zhu, Hongjing Lu

[TURC19] VRGym: A Virtual Testbed for Physical and Interactive AI

We propose VRGym, a virtual reality testbed for realistic human-robot interaction. Different from existing toolkits and virtual reality …

Xu Xie, Hangxin Liu, Zhenliang Zhang, Yuxing Qiu, Feng Gao, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

[CVPR19] RAVEN: A Dataset for Relational and Analogical Visual Reasoning

Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and …

Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu

[ICRA19] Self-Supervised Incremental Learning for Sound Source Localization in Complex Indoor Environment

This paper presents an incremental learning framework for mobile robots localizing the human sound source using a microphone array in a …

Hangxin Liu, Zeyu Zhang, Yixin Zhu, Song-Chun Zhu

[ICRA19] High-Fidelity Grasping in Virtual Reality using a Glove-based System

This paper presents a design that jointly provides hand pose sensing, hand localization, and haptic feedback to facilitate real-time …

Hangxin Liu, Zhenliang Zhang, Xu Xie, Yixin Zhu, Yue Liu, Yongtian Wang, Song-Chun Zhu

[AAAI19] Mirroring without Overimitation: Learning Functionally Equivalent Manipulation Actions

This paper presents a mirroring approach, inspired by the neuroscience discovery of the mirror neurons, to transfer demonstrated …

Hangxin Liu, Chi Zhang, Yixin Zhu, Chenfanfu Jiang, Song-Chun Zhu

[AAAI19] MetaStyle: Three-Way Trade-Off Among Speed, Flexibility and Quality in Neural Style Transfer

An unprecedented booming has been witnessed in the research area of artistic style transfer ever since Gatys et al. introduced the …

Chi Zhang, Yixin Zhu, Song-Chun Zhu

[NeurIPS18] Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera …

Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

[ECCV18] Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set …

Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu

[IJCV18] Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of …

Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lapfai Yu, Demetri Terzopoulos, Song-Chun Zhu

[CogSci18] Human Causal Transfer: Challenges for Deep Reinforcement Learning

Discovery and application of causal knowledge in novel problem contexts is a prime example of human intelligence. As new information is …

Mark Edmonds, James Kubricht, Colin Summers, Yixin Zhu, Brandon Rothrock, Song-Chun Zhu, Hongjing Lu

[SIGGRAPH18] A Moving Least Squares Material Point Method with Displacement Discontinuity and Two-Way Rigid Body Coupling

In this paper, we introduce the Moving Least Squares Material Point Method (MLS-MPM). MLS-MPM naturally leads to the formulation of …

Yuanming Hu, Yu Fang, Ziheng Ge, Ziyin Qu, Yixin Zhu, Andre Pradhana, Chenfanfu Jiang

[CVPR18] Human-centric Indoor Scene Synthesis Using Stochastic Grammar

We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, for the purpose of obtaining …

Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song-Chun Zhu

[ICRA18] Unsupervised Learning using Hierarchical Models for Hand-Object Interactions

Contact forces of the hand are visually unobservable, but play a crucial role in understanding hand-object interactions. In this paper, …

Xu Xie, Hangxin Liu, Mark Edmonds, Feng Gao, Siyuan Qi, Yixin Zhu, Brandon Rothrock, Song-Chun Zhu

[ICRA18] Interactive Robot Knowledge Patching using Augmented Reality

We present a novel Augmented Reality (AR) approach, through Microsoft HoloLens, to address the challenging problems of diagnosing, …

Hangxin Liu, Yaofang Zhang, Wenwen Si, Xu Xie, Yixin Zhu, Song-Chun Zhu

[AAAI18] Tracking Occluded Objects and Recovering Incomplete Trajectories by Reasoning about Containment Relations and Human Actions

This paper studies a challenging problem of tracking severely occluded objects in long video sequences. The proposed method reasons …

Wei Liang, Yixin Zhu, Song-Chun Zhu

[IROS17] Feeling the Force: Integrating Force and Pose for Fluent Discovery through Imitation Learning to Open Medicine Bottles

Learning complex robot manipulation policies for real-world objects is challenging, often requiring significant tuning within …

Mark Edmonds, Feng Gao, Xu Xie, Hangxin Liu, Siyuan Qi, Yixin Zhu, Brandon Rothrock, Song-Chun Zhu

[IROS17] A Glove-based System for Studying Hand-Object Manipulation via Joint Pose and Force Sensing

We present a design of an easy-to-replicate glove-based system that can reliably perform simultaneous hand pose and force sensing in …

Hangxin Liu, Xu Xie, Mark Edmonds, Feng Gao, Yixin Zhu, Veronica Santos, Brandon Rothrock, Song-Chun Zhu

[CogSci17] Consistent Probabilistic Simulation Underlying Human Judgment in Substance Dynamics

A growing body of evidence supports the hypothesis that humans infer future states of perceived physical situations by propagating …

James Kubricht, Yixin Zhu, Chenfanfu Jiang, Demetri Terzopoulos, Song-Chun Zhu, Hongjing Lu

[TVCG16] The Martian: Examining Human Physical Judgments Across Virtual Gravity Fields

This paper examines how humans adapt to novel physical situations with unknown gravitational acceleration in immersive virtual …

Tian Ye, Siyuan Qi, James Kubricht, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

[SIGGRAPHAsia16Workshop] A Virtual Reality Platform for Dynamic Human-Scene Interaction

Both synthetic static and simulated dynamic 3D scene data is highly useful in the fields of computer vision and robot task planning. …

Jenny Lin, Xingwen Guo, Jingyu Shao, Chenfanfu Jiang, Yixin Zhu, Song-Chun Zhu

[IJCAI16] What is Where: Inferring Containment Relations from Videos

In this paper, we present a probabilistic approach to explicitly infer containment relations between objects in 3D scenes. Given an …

Wei Liang, Yibiao Zhao, Yixin Zhu, Song-Chun Zhu

[CVPR16] Inferring Forces and Learning Human Utilities From Videos

We propose a notion of affordance that takes into account physical quantities generated when the human body interacts with real-world …

Yixin Zhu, Chenfanfu Jiang, Yibiao Zhao, Demetri Terzopoulos, Song-Chun Zhu

[CogSci16] Probabilistic Simulation Predicts Human Performance on Viscous Fluid-Pouring Problem

The physical behavior of moving fluids is highly complex, yet people are able to interact with them in their everyday lives with …

James Kubricht, Chenfanfu Jiang, Yixin Zhu, Song-Chun Zhu, Demetri Terzopoulos, Hongjing Lu

[CVPR15] Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition

In this paper, we present a new framework for task-oriented object modeling, learning and recognition. The framework includes: i) …

Yixin Zhu, Yibiao Zhao, Song-Chun Zhu

[CogSci15] Evaluating Human Cognition of Containing Relations with Physical Simulation

Containers are ubiquitous in daily life. By container, we consider any physical object that can contain other objects, such as bowls, …

Wei Liang, Yibiao Zhao, Yixin Zhu, Song-Chun Zhu

Dr. Android and Mr. Hide: Fine-grained security policies on unmodified Android

Google’s Android platform includes a permission model that protects access to sensitive capabilities, such as Internet access, …

Jinseong Jeon, Kristopher Micinski, Jeff Vaughan, Nikhilesh Reddy, Yixin Zhu, Jeffrey Foster, Todd Millstein
