Lab 5: Behavior Cloning for Robotic Manipulation | Yixin Zhu

Initial Codebase for Gradescope Submission

Background

This assignment focuses on Behavior Cloning (BC), a classical imitation learning technique where an agent learns to mimic expert demonstrations via supervised learning. BC is widely used in robotics, manipulation, and reinforcement learning as a simple yet powerful baseline.

You will work with the ManiSkill platform and PyTorch to complete a full imitation learning pipeline, including dataset handling, network design, training, and evaluation.

Learning Objectives

By completing this assignment, you will:

Understand the principles and workflow of Behavior Cloning.
Gain hands-on experience with data loading, training, and evaluation in imitation learning.
Implement and design a neural network policy using PyTorch.
Train and evaluate a behavior cloning policy on manipulation tasks.

Environment Setup

Install Dependencies

Recommended: Linux, Conda, CUDA GPU (optional)

conda create -n bc python=3.10
conda activate bc
pip install -r requirements.txt

Download Datasets and Generate Demonstration Trajectories

Download the demonstration data (saved to ~/.maniskill/demos/ by default):

python -m mani_skill.utils.download_demo "PushCube-v1"

Next, generate the data for imitation learning (IL) training by replaying the demonstrations and saving the trajectories under end-effector control (pd_ee_delta_pos):

python -m mani_skill.trajectory.replay_trajectory \
  --traj-path ~/.maniskill/demos/PushCube-v1/motionplanning/trajectory.h5 \
  --use-first-env-state -c pd_ee_delta_pos -o state \
  --save-traj --num-envs 1 -b physx_cpu --vis

This should open a window that plays back the demonstrations while saving them. Let it run until all 1000 trajectories have been generated. To save fewer trajectories, pass --count (e.g., --count 500).

The data will be saved in the same directory as the trajectory.h5 file, e.g., ~/.maniskill/demos/PushCube-v1/motionplanning/trajectory.state.pd_ee_delta_pos.physx_cpu.h5.

Complete the Assignment & Run Tests

Fill in all required functions in bc_utils.py (# TODO sections only). Do not change function signatures or import statements.

Subtask 1: Implement Behavior Cloning Loss Function (30 pts)

Implement the MSE loss function for BC. Given the ground truth actions $\mathbf{a}_i$ and the predictions $\hat{\mathbf{a}}_i$, for samples $i = 1, 2, \dots, n$, the loss function is:

$$ \mathcal{L} = \frac{1}{n} \sum_{i=1}^n \Vert\hat{\mathbf{a}}_i - \mathbf{a}_i \Vert^2 $$

Complete the compute_bc_loss(...) function according to the comments after it.
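As an illustration of the formula above, here is a minimal sketch of a per-sample squared-error loss in PyTorch. The function name bc_mse_loss_sketch and the assumed tensor shape (n, action_dim) are hypothetical; the actual signature and expected reduction of compute_bc_loss(...) are defined by the starter code and its comments, which take precedence.

```python
import torch


def bc_mse_loss_sketch(pred_actions: torch.Tensor, expert_actions: torch.Tensor) -> torch.Tensor:
    """Mean over samples of the squared L2 distance between actions.

    Hypothetical helper: both tensors are assumed to have shape (n, action_dim).
    """
    # Squared L2 norm per sample: sum the squared errors over the action dimension.
    per_sample = ((pred_actions - expert_actions) ** 2).sum(dim=-1)
    # Average over the n samples in the batch.
    return per_sample.mean()


pred = torch.tensor([[1.0, 2.0], [0.0, 0.0]])
target = torch.zeros(2, 2)
loss = bc_mse_loss_sketch(pred, target)  # (1 + 4 + 0) / 2 = 2.5
print(loss.item())
```

Note that torch.nn.functional.mse_loss with its default reduction averages over every element rather than over samples, so its value differs from the formula above by a constant factor equal to the action dimension.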

Subtask 2: Implement Observation Normalization (30 pts)

Observation normalization is very helpful for efficiently learning the policy. While we cannot know the true mean and standard deviation of the observation space, we can estimate them using the statistics of the given batch (assuming the batch is large enough).

Given a batch of observations (each as a vector), $\mathbf{o}_i, i = 1, 2, \dots, B$, with $B$ being the batch size, we estimate the mean and standard deviation for each dimension $j$:

$$ \tilde{\mu}_j = \frac{1}{B} \sum_{i=1}^B \mathbf{o}_{ij}, \quad \tilde{\sigma}_j = \sqrt{\frac{1}{B} \sum_{i=1}^B (\mathbf{o}_{ij} - \tilde{\mu}_j)^2} $$

Then, normalization is computed for each dimension $j$:

$$ \mathbf{o}_{ij}^{\text{norm}} = \frac{\mathbf{o}_{ij}- \tilde{\mu}_j}{\tilde{\sigma}_j} $$

Implement the normalize_observations(...) function according to the comments after it. Make sure to guard against division by zero, which occurs when a dimension has zero variance within the batch.
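The batch statistics above can be sketched as follows. This is one possible approach, not the required one: the name normalize_batch_sketch, the eps parameter, and the choice to add eps to the denominator are assumptions made for illustration, and the comments in the starter code may call for a different safeguard.

```python
import torch


def normalize_batch_sketch(obs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize each observation dimension with batch statistics.

    Hypothetical helper: obs is assumed to have shape (B, obs_dim).
    """
    # Per-dimension mean over the batch.
    mean = obs.mean(dim=0, keepdim=True)
    # unbiased=False divides by B (not B - 1), matching the formula in the text.
    std = obs.std(dim=0, unbiased=False, keepdim=True)
    # Adding eps keeps the division finite when a dimension is constant.
    return (obs - mean) / (std + eps)


# The second dimension is constant, so its standard deviation is zero.
obs = torch.tensor([[1.0, 5.0], [3.0, 5.0]])
normed = normalize_batch_sketch(obs)
print(normed)
```

In this example the constant dimension maps to zeros instead of NaN, because its numerator is zero and the denominator is eps rather than zero.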

Subtask 3: Implement MLP Policy (20 pts)

Implement the function to create an MLP (multi-layer perceptron) as a subclass of torch.nn.Module, according to the dimensions of the state and action spaces. The network should be properly designed in size and depth to ensure sufficient expressiveness without overfitting.

Implement the create_actor_network(...) function according to the comments after it.
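For reference, a small MLP policy could look like the sketch below. The class name MLPPolicySketch, the two hidden layers of width 256, and the ReLU activations are illustrative assumptions, not the required architecture, and the dimensions used in the example are placeholders rather than the real PushCube-v1 sizes.

```python
import torch
from torch import nn


class MLPPolicySketch(nn.Module):
    """Hypothetical MLP mapping a state vector to an action vector."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            # No activation on the output layer: actions are regressed directly.
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


# Placeholder dimensions; read the real ones from the environment's spaces.
policy = MLPPolicySketch(state_dim=10, action_dim=3)
actions = policy(torch.zeros(4, 10))
print(actions.shape)
```

The output has one action vector per input state, so a batch of 4 states yields a tensor of shape (4, 3) here.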

Subtask 4: Train the Policy (20 pts)

After filling in the required functions, you should be able to train your BC policy on the provided tasks:

# PushCube task
python run.py --env-id "PushCube-v1" \
  --demo-path ~/.maniskill/demos/PushCube-v1/motionplanning/trajectory.state.pd_ee_delta_pos.physx_cpu.h5 \
  --control-mode "pd_ee_delta_pos" --sim-backend "cpu" \
  --max-episode-steps 100 --total-iters 10000 --batch-size 64

The --demo-path specified here should match the data generated above. During training, the results (including checkpoints and videos) will be saved under the runs/ directory. For submission, copy the checkpoint to results/checkpoint.pt (under the homework directory); the autograder will check it against your implemented MLP network. Also, pick two videos that show successful manipulation (pushing the cube into the target area), and name them results/success_1.mp4 and results/success_2.mp4.

Note that the policy's evaluation results (manipulation success rate) will not affect your score on this lab. However, make sure that successful episodes can be cherry-picked from the training results. If there are none, consider checking your implementation, training for more iterations, or re-running with a different seed.

Submission Checklist

Please submit the following files in a single zip file:

Your completed Python code, with all required functions implemented and necessary comments.
Your trained model checkpoint and successful manipulation videos under results/.
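Before zipping, a quick check along the following lines can confirm that the result files named earlier are in place. The helper missing_submission_files is hypothetical and not part of the provided codebase; it only checks the three paths under results/ that this document mentions.

```python
from pathlib import Path

# Paths taken from the instructions above, relative to the homework directory.
REQUIRED_RESULTS = (
    "results/checkpoint.pt",
    "results/success_1.mp4",
    "results/success_2.mp4",
)


def missing_submission_files(root) -> list:
    """Return the required result files that do not exist under root."""
    root = Path(root)
    return [rel for rel in REQUIRED_RESULTS if not (root / rel).is_file()]


# Run from the homework directory; an empty list means nothing is missing.
print(missing_submission_files("."))
```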

Autograder Testing

Your submission will be tested against multiple test cases.
Partial credit will be awarded for partially correct implementations.
Timeout limit: 10 minutes in total.