Implementing a Human-like Tool-Use Learning Algorithm Based on the Virtual Tool Game | Yixin Zhu

Background

We have introduced the virtual tool game, which explores the mechanics of human physical problem-solving using the Sample, Simulate, Update (SSUP) model in a 2D virtual tool-use setting. The model suggests that human flexibility comes from simulating the effects of hypothetical actions, while efficiency arises from relying on rich action priors that are continuously updated with real-world observations.

Instructions

In this project, you will implement the SSUP algorithm to reproduce the main findings from the Virtual Tools paper on human tool use in problem-solving. The core task of the game is to use one of several tools to move a red object into a green goal area within a 2D physics environment. Your task is to complete the missing SSUP algorithm within the provided game framework and analyze its performance against human data. Through this, you will gain insights into how mental simulation and prior knowledge enable flexible and efficient tool use.

This process will help you explore how action priors and mental simulation contribute to problem-solving flexibility and efficiency. While the paper reports certain technical details (e.g., Gaussian policy, Policy Gradient algorithm), you may use other implementations or algorithms, provided you justify your choices. You are not required to reproduce results exactly, but a thoughtful and reasonable analysis is expected.

Task 1: Setup and Familiarization (20 pts)

Prepare for the project.

Start by reading the provided paper to understand the theoretical foundation of the SSUP model. Include a brief introduction to the Virtual Tool Game and a summary of the core of the algorithm. (15 pts)
Clone the provided codebase and set up the development environment according to the repository instructions. The game framework is available in this repo; note that it includes the game itself but does not contain the SSUP algorithm. Include screenshots of the successful installation. (5 pts)

Task 2: SSUP Implementation (50 pts)

Implement the SSUP algorithm and run it across different game levels. It is recommended to wrap the environment with a standard Gym-like API so that it is compatible with standard RL libraries such as Stable Baselines.
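As a rough illustration of what a Gym-like wrapper could look like, the sketch below follows the Gymnasium convention in which `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)`. Everything specific to the game is an assumption: `run_placement` is a hypothetical callable standing in for whatever the game framework actually exposes, the `(tool, x, y)` action format is assumed, and the normalized distance-reduction reward is only one possible reading of the reward described in this assignment.

```python
from typing import Callable, Dict, Tuple

# Assumed action format: (tool index, x position, y position).
Action = Tuple[int, float, float]


class ToolGameEnv:
    """Gym-style wrapper around a hypothetical `run_placement` callable.

    `run_placement(action)` is NOT part of the provided framework; it is a
    placeholder assumed to return (success, final_distance_to_goal).
    """

    def __init__(
        self,
        run_placement: Callable[[Action], Tuple[bool, float]],
        initial_distance: float,
        max_attempts: int = 10,
    ) -> None:
        if initial_distance <= 0:
            raise ValueError("initial_distance must be positive")
        self.run_placement = run_placement
        self.initial_distance = initial_distance
        self.max_attempts = max_attempts
        self.attempts = 0

    def reset(self, seed=None) -> Tuple[Dict, Dict]:
        # Each episode is one level; only the attempt counter is reset here.
        self.attempts = 0
        return {"attempts": 0}, {}

    def step(self, action: Action):
        success, distance = self.run_placement(action)
        self.attempts += 1
        # Assumed reward: fraction of the initial red-object-to-goal
        # distance that this placement removed (1.0 means goal reached).
        reward = 1.0 - distance / self.initial_distance
        terminated = bool(success)
        truncated = (not terminated) and self.attempts >= self.max_attempts
        obs = {"attempts": self.attempts}
        info = {"distance": distance}
        return obs, reward, terminated, truncated, info


# Toy usage with a fake level in which only tool 1 solves the task.
def fake_placement(action: Action) -> Tuple[bool, float]:
    return (True, 0.0) if action[0] == 1 else (False, 5.0)


env = ToolGameEnv(fake_placement, initial_distance=10.0, max_attempts=2)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step((0, 1.0, 1.0))
```

To plug into an actual RL library, the class would additionally need to subclass that library's environment base class and declare observation and action spaces, which depend on how you choose to encode the scene.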

Implement the SSUP algorithm based on your understanding of the paper. Key procedures include:
1. Sample: Propose new actions based on a structured, object-based prior. (10 pts)
2. Simulate: “Mentally” simulate the outcomes of proposed actions using a noisy physics engine. The reward is based on how much the action reduces the distance between the red object and the green goal. (10 pts)
3. Update: Adjust beliefs about high-value actions based on both simulated and real-world outcomes. The paper uses a Gaussian mixture model policy for this step, but you may adopt alternatives if you can justify your choice. (20 pts)
4. Environment setup: Implement the logic for environment initialization, stepping, reset, the computation of key metrics, and logging. (10 pts)
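The interplay of the Sample, Simulate, and Update steps can be sketched on a toy problem. The example below is a deliberately simplified illustration, not the paper's implementation: the "world" is a hypothetical one-dimensional placement task (`true_reward`), the belief is a single Gaussian mean per tool rather than a Gaussian mixture, and the update is a simple reward-weighted nudge rather than a policy gradient. All names, thresholds, and parameter values are assumptions chosen for illustration.

```python
import random


def true_reward(tool: int, x: float) -> float:
    # Toy stand-in for the real game: tool 1 placed near x = 0.7 is best.
    target = {0: 0.2, 1: 0.7}[tool]
    best = 1.0 if tool == 1 else 0.5
    return max(0.0, best - abs(x - target))


def simulate(tool: int, x: float, rng: random.Random, noise: float = 0.05) -> float:
    # "Mental" simulation: the same dynamics with a perturbed placement.
    return true_reward(tool, x + rng.gauss(0.0, noise))


def ssup(rng, n_init=4, n_sims=3, threshold=0.8, eps=0.2,
         max_iters=200, lr=0.3, sigma=0.2, success=0.95):
    means = [0.5, 0.5]    # believed best placement for each tool
    values = [0.0, 0.0]   # running reward estimate for each tool

    def sample_prior():
        # Assumed prior: uniform over tools and over positions in [0, 1).
        return rng.randrange(2), rng.random()

    def mean_sim(tool, x):
        return sum(simulate(tool, x, rng) for _ in range(n_sims)) / n_sims

    def update(tool, x, reward):
        # Move the mean toward placements that beat the current estimate.
        if reward > values[tool]:
            means[tool] += lr * (x - means[tool])
        values[tool] += lr * (reward - values[tool])

    # Initialize beliefs from a few simulated samples of the prior.
    for _ in range(n_init):
        tool, x = sample_prior()
        update(tool, x, mean_sim(tool, x))

    attempts = 0
    for _ in range(max_iters):
        # Sample: explore from the prior or exploit the current belief.
        if rng.random() < eps:
            tool, x = sample_prior()
        else:
            tool = max(range(2), key=lambda t: values[t])
            x = rng.gauss(means[tool], sigma)
        # Simulate: estimate the reward with several noisy rollouts.
        r_sim = mean_sim(tool, x)
        if r_sim >= threshold:
            # Promising enough: act in the "real" world and observe.
            attempts += 1
            r_real = true_reward(tool, x)
            if r_real >= success:
                return attempts, (tool, x)
            update(tool, x, r_real)   # Update from the real outcome
        else:
            update(tool, x, r_sim)    # Update from the simulated outcome
    return attempts, None


attempts, action = ssup(random.Random(0))
```

The key design point the sketch tries to capture is that real attempts are only spent on actions whose simulated reward clears a threshold, while both simulated and real outcomes feed the same belief update. In an actual submission, the number of real attempts per level is the kind of quantity you would log for the later comparison with human data.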

Task 3: Result Analysis (30 pts)

Following Fig. 4: Visualize the results (e.g., selected actions, model beliefs, performance) and analyze them. Your analysis could include, but is not limited to, the questions below. Include the necessary visualizations, analysis, and discussion in the report. (20 pts)
1. Does the algorithm learn effectively through the SSUP process?
2. How quickly does your model converge?
3. How does it perform across different levels? What are the primary failure modes and their potential causes?
Following Fig. 5: Compare your implementation’s results with the human data (see this page). Use linear regression to assess how closely the performances align. Include the necessary visualizations, analysis, and discussion in the report. (10 pts)
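For the model-versus-human comparison, a minimal sketch of an ordinary least-squares fit is shown below. It assumes you have one model value and one human value per level (for example, a mean number of attempts), which is an assumption about how you aggregate your results. The helper name `fit_line` is hypothetical, and the numbers in the usage lines are synthetic values chosen only to check the arithmetic, not real model or human data.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; also returns R^2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary across levels")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic check: points lying exactly on y = 2x + 1.
model_vals = [1.0, 2.0, 3.0, 4.0]
human_vals = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r_squared = fit_line(model_vals, human_vals)
```

A slope near 1, an intercept near 0, and a high R^2 would indicate that the model tracks human performance level by level; systematic deviations in any of the three are worth discussing in the report.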

Deliverable

A report covering the details mentioned above.
Codebase of your project, which includes:
1. The source code of your project, with necessary comments on the code.
2. A README.md file that describes the repo structure and provides a startup script to replicate the results in your report.
3. If you build upon the Virtual Tool Game repo, you have to either open source the repo on GitHub with your commits preserved, or provide a git diff log file that illustrates the modifications you have made in the zip file you submit. See this page for how to generate the git diff log, which you should dump into a text file.

References

Allen, K. R., Smith, K. A., & Tenenbaum, J. B. (2020). Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences, 117(47), 29302-29310.