[CogSci18] Human Causal Transfer: Challenges for Deep Reinforcement Learning

(a) Starting configuration of a 3-lever trial. All levers begin pulled towards the robot arm, whose base is anchored to the center of the display. The arm interacts with levers by either pushing outward or pulling inward. This is achieved by clicking either the outer or inner regions of the levers’ radial tracks, respectively. Only push actions are needed to unlock the door in each lock situation. Light gray levers are always locked, which is unknown to both human subjects and the RL agent at the beginning of training. Once the door is unlocked, the green button can be clicked to command the arm to push the door open. The black circle located opposite the door’s red hinge represents the door lock indicator: present if locked, absent if unlocked. (b) Push to open a lever. (c) Open the door by clicking the green button.


Discovery and application of causal knowledge in novel problem contexts is a prime example of human intelligence. As new information is obtained from the environment during interactions, people develop and refine causal schemas to establish a parsimonious explanation of underlying problem constraints. The aim of the current study is to systematically examine human ability to discover causal schemas by exploring the environment and transferring knowledge to new situations with greater or different structural complexity. We developed a novel OpenLock task, in which participants explored a virtual “escape room” environment by moving levers that served as “locks” to open a door. In each situation, the sequential movements of the levers that opened the door formed a branching causal sequence that began with either a common-cause (CC) or a common-effect (CE) structure. Participants in a baseline condition completed five trials with high structural complexity (i.e., four active levers). Those in the transfer conditions completed six training trials with low structural complexity (i.e., three active levers) before completing a high-complexity transfer trial. The causal schema acquired in the training trials was either congruent or incongruent with that in the transfer trial. Baseline performance under the CC schema was superior to performance under the CE schema, and schema congruency facilitated transfer performance when the congruent schema was the less difficult CC schema. We compared between-subjects human performance to a deep reinforcement learning model and found that a standard deep reinforcement learning model (DDQN) is unable to capture the causal abstraction presented between trials with the same causal schema and trials with a transfer of causal schema.
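As a rough illustration of the two schema types named above, the sketch below encodes a three-lever common-cause and common-effect structure as small directed graphs and enumerates the push sequences that reach the door. The lever names (`L0`, `L1`, `L2`), the exact edges, and the `solutions` helper are assumptions made for illustration only; they are not taken from the OpenLock task definition or its code.

```python
from typing import Dict, List

# Hypothetical 3-lever schemas. An edge A -> B means "pushing A unlocks B".
# Common cause: one lever unlocks two others, each of which unlocks the door.
COMMON_CAUSE: Dict[str, List[str]] = {
    "L0": ["L1", "L2"],
    "L1": ["door"],
    "L2": ["door"],
}
# Common effect: two levers each unlock a third, which unlocks the door.
COMMON_EFFECT: Dict[str, List[str]] = {
    "L0": ["L2"],
    "L1": ["L2"],
    "L2": ["door"],
}


def solutions(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every action sequence from an initially unlocked lever to the door."""
    unlocked_by_others = {t for targets in graph.values() for t in targets}
    # Root levers are those that no other lever needs to unlock first.
    roots = sorted(n for n in graph if n not in unlocked_by_others)
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == "door":
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            walk(nxt, path + [nxt])

    for root in roots:
        walk(root, [root])
    return paths


if __name__ == "__main__":
    for name, graph in [("CC", COMMON_CAUSE), ("CE", COMMON_EFFECT)]:
        print(name, solutions(graph))
```

Under these assumed graphs, both schemas admit two solution sequences, but they branch at different points: the CC sequences share their first action, while the CE sequences share their second.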

In Proceedings of the Annual Meeting of the Cognitive Science Society