Overview
This week focuses on computer vision, the area of AI concerned with how computers extract high-level understanding from digital images and videos. We’ll cover its core concepts and techniques, along with the frameworks used in modern computer vision tasks.
Instructor
Siyuan Huang, BIGAI
Topics Covered
- Introduction to Computer Vision
- A lightweight overview of TorchVision (PyTorch’s computer vision library)
- Applying mainstream architectures (e.g., the Vision Transformer, ViT) to vision tasks
- Single-view depth estimation
- Embodied vision tasks
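As a concrete taste of how a ViT consumes an image, here is a minimal NumPy sketch of its first step: cutting the image into non-overlapping P×P patches, flattening each one, and projecting it linearly into a token. The helper names `patchify` and `embed_patches` are hypothetical, and the weight matrix is a stand-in for parameters a real model would learn.

```python
import numpy as np

# Hypothetical helpers for illustration only; not TorchVision or course code.

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C), with patches
    ordered row by row starting at the top-left corner.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, P, gw, P, C) -> (gh, gw, P, P, C) -> (gh * gw, P * P * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(image, patch_size, weight, bias):
    """Linear patch embedding: (N, P*P*C) @ (P*P*C, D) + (D,) -> (N, D).

    `weight` and `bias` stand in for learned parameters (assumption).
    """
    return patchify(image, patch_size) @ weight + bias


if __name__ == "__main__":
    # A 224x224 RGB image with 16x16 patches gives 14 * 14 = 196 patches,
    # each flattened to 16 * 16 * 3 = 768 values.
    img = np.random.default_rng(0).random((224, 224, 3))
    print(patchify(img, 16).shape)  # (196, 768)
```

The original ViT also prepends a learnable class token and adds position embeddings before the Transformer encoder; those steps are left out here to keep the focus on patch tokenization.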
Assignments
Practice Assignment:
Complete the following two tasks:
- Task 1: Use the pretrained models provided by Omnidata to estimate depth and surface normals.
- Task 2: Given multi-view RGB-D data (color images and depth maps), complete the code in hw.ipynb to reconstruct the object’s mesh.
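A common first step toward Task 2 is to lift every depth map into 3D and bring all views into a shared world frame. The sketch below does that with NumPy. It is not the hw.ipynb code, and it rests on several assumptions you should check against the notebook: depth is z-depth (not distance along the ray), each view has a pinhole intrinsic matrix `K`, the poses are 4×4 camera-to-world matrices, the camera axes follow the OpenCV convention (x right, y down, z forward), and a depth of 0 marks a missing pixel. The names `backproject_depth` and `fuse_views` are hypothetical.

```python
import numpy as np

# Hypothetical helpers; conventions (z-depth, camera-to-world poses,
# OpenCV axes, 0 = missing depth) are assumptions, not hw.ipynb's API.

def backproject_depth(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift an (H, W) z-depth map to camera-frame points of shape (H, W, 3)."""
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # u holds column indices, v holds row indices; both have shape (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def fuse_views(depths, colors, Ks, cam_to_world):
    """Merge several RGB-D views into one world-frame colored point cloud.

    Returns (points, colors) with shapes (N, 3) and (N, 3).
    """
    all_pts, all_rgb = [], []
    for depth, rgb, K, T in zip(depths, colors, Ks, cam_to_world):
        pts = backproject_depth(depth, K)
        valid = depth > 0                      # drop pixels with missing depth
        pts_cam = pts[valid]                   # (n, 3), row-major pixel order
        # Apply rotation then translation: p_world = R @ p_cam + t
        pts_world = pts_cam @ T[:3, :3].T + T[:3, 3]
        all_pts.append(pts_world)
        all_rgb.append(rgb[valid])
    return np.concatenate(all_pts), np.concatenate(all_rgb)
```

The fused point cloud is only an intermediate result. A typical route to a mesh then integrates the views into a truncated signed distance function (TSDF) volume and extracts the surface with marching cubes; whether hw.ipynb follows that route is not stated here.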
Written Assignment:
Complete the following task:
- Task 1: Use the pretrained models provided by Omnidata to estimate depth and surface normals.
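Omnidata’s networks predict depth and normals directly, but the two outputs are geometrically linked, and checking that link is a useful sanity test for Task 1. The sketch below derives normals from a depth map: it back-projects pixels to 3D, takes finite differences along image rows and columns to get two tangent vectors, and crosses them. It assumes a pinhole camera with intrinsics `K`, OpenCV camera axes, and z-depth; `normals_from_depth` is a hypothetical helper, not part of Omnidata. Omnidata’s normal maps may use a different axis convention, so match conventions before comparing.

```python
import numpy as np

# Hypothetical helper for illustration; not an Omnidata API.

def normals_from_depth(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Estimate unit surface normals (H, W, 3) from an (H, W) z-depth map.

    Assumes a pinhole camera with OpenCV axes (x right, y down, z forward).
    Normals are expressed in the camera frame and oriented toward the camera.
    """
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([(u - cx) * depth / fx, (v - cy) * depth / fy, depth], axis=-1)
    # Tangent vectors via central differences (one-sided at the borders):
    # axis 0 runs along image rows (v), axis 1 along columns (u)
    d_dv, d_du = np.gradient(pts, axis=(0, 1))
    n = np.cross(d_du, d_dv)
    # The camera centre is the origin, so flip any normal with n . p > 0
    flip = np.sum(n * pts, axis=-1, keepdims=True) > 0
    n = np.where(flip, -n, n)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, 1e-12)   # guard against zero-length normals
```

Because scaling every depth value by the same factor scales all points uniformly, these normals do not change under a global rescale of the depth map. A constant shift in depth does change them, however. This matters if the predicted depth is only relative.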
Additional Resources
Notes
- This module builds on the PyTorch skills you’ve developed in previous weeks. Make sure you’re comfortable with basic PyTorch operations before diving into vision-specific tasks.
- For the practice assignment, consider how your model interacts with its environment in an embodied AI context.
- As always, document your code thoroughly and use version control (Git) for your project.
- Submit your assignments on GitHub Classroom.