Scene Parsing

[IJCV26] AlphaChimp: Tracking and Behavior Recognition of Chimpanzees

Understanding non-human primate behavior is essential for advancing animal welfare and uncovering the roots of human sociality. …

Xiaoxuan Ma, Yutang Lin, Yuan Xu, Stephan Kaufhold, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang

[NeurIPS23] ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

The challenge of replicating research results has posed a significant impediment to the field of molecular biology. The advent of …

Jieming Cui, Ziren Gong, Baoxiong Jia, Siyuan Huang, Zilong Zheng, Jianzhu Ma, Yixin Zhu

[NeurIPS23] ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors

Understanding the behavior of non-human primates is crucial for improving animal welfare, modeling social behavior, and gaining …

Xiaoxuan Ma, Stephan Kaufhold, Jiajun Su, Wentao Zhu, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang

[ICLR23] Understanding Embodied Reference with Touch-Line Transformer

We study embodied reference understanding, the task of locating referents using embodied gestural signals and language references. …

Yang Li, Xiaoxue Chen, Hao Zhao, Jiangtao Gong, Guyue Zhou, Federico Rossano, Yixin Zhu

[ICCV21] YouRefIt: Embodied Reference Understanding with Language and Gesture

We study the machine’s understanding of embodied reference: One agent uses both language and gesture to refer to an object to …

Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang

[CVPR21] Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

Humans possess a unique social cognition capability; nonverbal communication can convey rich social information among agents. In …

Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

[ECCV20] LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial …

Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

[ICRA20] Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

Aiming to understand how human (false-)belief—a core socio-cognitive ability—would affect human interactions with robots, …

Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu

[NeurIPS19] PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate …

Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

[ICCV19] Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic …

Yixin Chen, Siyuan Huang, Tao Yuan, Yixin Zhu, Siyuan Qi, Song-Chun Zhu
