Module 2: Computer Vision

Overview

This week focuses on Computer Vision, a key area in AI that deals with how computers can gain high-level understanding from digital images or videos. We’ll explore essential concepts, techniques, and frameworks used in modern computer vision tasks.

Instructor

Siyuan Huang, BIGAI

Topics Covered

  • Introduction to Computer Vision
  • Lightweight overview of torchvision, PyTorch’s computer vision library
  • Applying mainstream architectures (e.g., the Vision Transformer, ViT) to vision tasks
  • Single-view depth estimation
  • Embodied Vision tasks
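To make the ViT topic concrete, here is its first step. ViT cuts the image into fixed-size, non-overlapping patches, flattens each patch into a vector, projects it linearly, and adds position embeddings. The NumPy sketch below is a minimal illustration under stated assumptions: a channels-last image, side lengths divisible by the patch size, projection and position weights supplied by the caller, and no class token. The function names patchify and embed_patches are hypothetical. This is not torchvision’s code. Libraries such as torchvision typically perform the same patch projection with a convolution whose kernel size and stride both equal the patch size.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C):
    one row per token, ordered row by row across the image.
    Assumes H and W are divisible by patch_size.
    """
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C)
    patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    # Flatten each p x p x C patch into one vector.
    return patches.reshape(-1, p * p * c)

def embed_patches(patches, w_proj, pos_emb):
    """Hypothetical helper: linear patch embedding plus position embeddings.

    patches: (N, P), w_proj: (P, D), pos_emb: (N, D). Class token omitted.
    """
    return patches @ w_proj + pos_emb

# Usage: a 224 x 224 RGB image with 16 x 16 patches gives 14 * 14 = 196 tokens.
image = np.random.rand(224, 224, 3).astype(np.float32)
tokens = patchify(image, 16)
print(tokens.shape)  # (196, 768)
```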

Assignments

Practice Assignment:

Complete the following two tasks:

  • Task 1: Use the pretrained models provided by Omnidata to estimate depth and surface normals.
  • Task 2: Given multi-view RGB-D data (color images and depth maps), complete the code in hw.ipynb to reconstruct the object’s mesh.
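Task 2 depends on lifting every depth map into one shared 3D frame before a surface can be extracted. The NumPy sketch below shows that step under assumptions the assignment does not state: a pinhole camera with a 3x3 intrinsic matrix K, depth stored as z-depth in the same units for every view, 0 marking missing depth, and 4x4 camera-to-world poses in the OpenCV convention (x right, y down, z forward). The function names are hypothetical and are not taken from hw.ipynb.

```python
import numpy as np

def backproject_depth(depth, K):
    """Lift an (H, W) z-depth map to camera-frame 3D points (pinhole model).

    Returns an (N, 3) array, one point per pixel with depth > 0,
    in row-major pixel order.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0  # assumption: 0 marks missing depth
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

def to_world(points_cam, cam_to_world):
    """Apply a 4x4 camera-to-world pose to (N, 3) camera-frame points."""
    R = cam_to_world[:3, :3]
    t = cam_to_world[:3, 3]
    return points_cam @ R.T + t

def fuse_views(depths, intrinsics, poses):
    """Concatenate world-frame points from several RGB-D views."""
    clouds = [
        to_world(backproject_depth(d, K), T)
        for d, K, T in zip(depths, intrinsics, poses)
    ]
    return np.concatenate(clouds, axis=0)

# Usage: two toy 2 x 2 views of a plane at depth 4, the second shifted 1 unit in x.
K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]])
depth = np.full((2, 2), 4.0)
pose_b = np.eye(4)
pose_b[:3, 3] = [1.0, 0.0, 0.0]
cloud = fuse_views([depth, depth], [K, K], [np.eye(4), pose_b])
print(cloud.shape)  # (8, 3)
```

This sketch stops at a fused point cloud. Turning that cloud into a mesh is a separate step; common choices are TSDF fusion followed by marching cubes, or Poisson surface reconstruction.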

Written Assignment:

Complete the following task:

  • Task 1: Use the pretrained models provided by Omnidata to estimate depth and surface normals.

Assignment: Computer Vision

Additional Resources

Notes

  • This module builds on the PyTorch skills you’ve developed in previous weeks. Make sure you’re comfortable with basic PyTorch operations before diving into vision-specific tasks.
  • For the practice assignment, consider how your model interacts with its environment in an embodied AI context.
  • As always, document your code thoroughly and use version control (Git) for your project.
  • Submit your assignments on GitHub Classroom.