Module 2: Computer Vision | Yixin Zhu

Overview

This week focuses on Computer Vision, a key area in AI that deals with how computers can gain high-level understanding from digital images or videos. We’ll explore essential concepts, techniques, and frameworks used in modern computer vision tasks.

Instructor

Siyuan Huang, BIGAI

Topics Covered

Introduction to Computer Vision
A brief overview of torchvision (PyTorch’s vision library)
Applying mainstream architectures (e.g., Vision Transformer, ViT) to vision tasks
Single-view depth estimation
Embodied Vision tasks

Assignments

Practice Assignment:

Complete the following two tasks:

Task 1: Use the pretrained models provided by Omnidata to estimate depth and surface normals.
Task 2: Given multi-view RGB-D data (color images and depth maps), complete the code in hw.ipynb to reconstruct the object’s mesh.
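
For Task 2, a common first step in multi-view RGB-D reconstruction is to back-project each depth map into 3D points and move them into a shared world frame. The sketch below illustrates that step only. It is not the code from hw.ipynb, and it makes several assumptions: a pinhole camera model, depth stored as distance along the camera z-axis, a depth of zero meaning "no measurement", and camera-to-world poses given as a rotation matrix and translation vector. The function names and the intrinsics values are hypothetical, and the loops are written in plain Python for readability; in practice the same arithmetic would likely be vectorized with NumPy or PyTorch.

```python
# Minimal sketch (assumed pinhole model; names and values are illustrative only).

def backproject_depth(depth, fx, fy, cx, cy):
    """Convert a depth map (list of rows) into camera-frame 3D points.

    Pixel (u, v) with depth z maps to:
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # assumed convention: 0 means missing depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def transform_points(points, rotation, translation):
    """Apply a camera-to-world pose: p_world = R * p_cam + t."""
    world = []
    for p in points:
        world.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return world


# Toy 2x2 depth map with one invalid pixel and made-up intrinsics.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
cam_points = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)

# Identity rotation, camera shifted 1 unit along the world z-axis.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world_points = transform_points(cam_points, identity, [0.0, 0.0, 1.0])
```

Merging the world-frame points from every view yields a fused point cloud. Turning that cloud into a mesh needs a separate surface reconstruction step, which is not shown here.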

Written Assignment:

Complete the following task:

Task 1: Use the pretrained models provided by Omnidata to estimate depth and surface normals.

Assignment: Computer Vision

Additional Resources

Notes

This module builds on the PyTorch skills you’ve developed in previous weeks. Make sure you’re comfortable with basic PyTorch operations before diving into vision-specific tasks.
For the practice assignment, consider how your model interacts with its environment in an embodied AI context.
As always, document your code thoroughly and use version control (Git) for your project.
Submit your assignments on GitHub Classroom.