[ICCV21] YouRefIt: Embodied Reference Understanding with Language and Gesture

We study the machine's understanding of embodied reference: One agent uses both language and gesture to refer to an object to another agent in a shared physical environment. Of note, this new visual task requires understanding multimodal cues with …

[ICCV21] Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate nature of 3D scene understanding tasks and their immense variations introduced by camera views, lighting, …

[IROS21] Consolidating Kinematic Models to Promote Coordinated Mobile Manipulations

We construct a Virtual Kinematic Chain (VKC) that readily consolidates the kinematics of the mobile base, the arm, and the object to be manipulated in mobile manipulations. Accordingly, a mobile manipulation task is represented by altering the state …

[IROS21] Efficient Task Planning for Mobile Manipulation: a Virtual Kinematic Chain Perspective

We present a Virtual Kinematic Chain (VKC) perspective, a simple yet effective method, to improve task planning efficacy for mobile manipulation. By consolidating the kinematics of the mobile base, the arm, and the object being manipulated …

[IROS21] Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

Human-robot collaboration is an essential research topic in artificial intelligence (AI), enabling researchers to devise cognitive AI systems and affording an intuitive means for users to interact with the robot. Of note, communication plays a central …

[CogSci21] Individual vs. Joint Perception: a Pragmatic Model of Pointing as Communicative Smithian Helping

The simple gesture of pointing can greatly augment one's ability to comprehend states of the world based on observations. It triggers additional inferences relevant to one’s task at hand. We model an agent's update to its belief of the world based on …

[ACL-Findings21] GRICE: A Grammar-based Dataset for Recovering Implicature and Conversational rEasoning

Understanding what we genuinely mean instead of what we literally say in conversations is challenging for both humans and machines; yet, this direction is mostly left untouched in modern open-ended dialogue systems. To fill in this gap, we present a …

[CVPR21] Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

Humans possess a unique social cognition capability; nonverbal communication can convey rich social information among agents. In contrast, such crucial social characteristics are mostly missing in the existing scene understanding literature. In this …

[CVPR21] ACRE: Abstract Causal Reasoning Beyond Covariation

Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data. Humans, even young …

[CVPR21] Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

Spatial-temporal reasoning is a challenging task in Artificial Intelligence (AI) due to its demanding but unique nature: a theoretic requirement on representing and reasoning based on spatial-temporal knowledge in mind, and an applied requirement on …