Contributors:
- Chao Xu (Affordance)
- Zeyu Zhang (Functionality)
- Tengyu Liu (HOI/HSI)
- Yuyang Li (HOI/HSI)
Reading list
Legend: survey/review/perspective, paper, book, GitHub
Required - Affordance
- Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense (Section 5), Engineering 2020
- From 3D Scene Geometry to Human Workspace, CVPR 2011
- Inferring Forces and Learning Human Utilities From Videos, CVPR 2016
- Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery, ICCV 2021
Required - HOI/HSI
- Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities, CVPR 2010
- Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense, ICCV 2019
Required - Functionality
- Scene Parsing by Integrating Function, Geometry and Appearance Models, CVPR 2013
- Make it Home: Automatic Optimization of Furniture Arrangement, SIGGRAPH 2011
Optional - Affordance in General
- Visual Affordance and Function Understanding: A Survey, ACM Computing Surveys 2021
- The Ecological Approach to Visual Perception, Boston: Houghton Mifflin (1979)
- Understanding Context: Environment, Language, and Information Architecture (Chapter 4), O’Reilly Media, Inc. (2014)
Optional - Scene Affordance
- Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image, ECCV 2018
- A Multi-Scale CNN for Affordance Segmentation in RGB Images, ECCV 2016
- EGO-TOPO: Environment Affordances from Egocentric Video, CVPR 2020
- People Watching: Human Actions as a Cue for Single View Geometry, ECCV 2012
- Binge Watching: Scaling Affordance Learning from Sitcoms, CVPR 2017
- Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments, CVPR 2019
Optional - Object Affordance
- Reasoning about Object Affordances in a Knowledge Base Representation, ECCV 2014
- O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning, CoRL 2021
- Hallucinated humans: Learning latent factors to model 3D environments, Diss. Cornell University, 2015
- Long-Horizon Manipulation of Unknown Objects via Task and Motion Planning with Estimated Affordances, arXiv preprint arXiv:2108.04145
- 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding, CVPR 2021
- Deep Affordance Foresight: Planning Through What Can Be Done in the Future, ICRA 2021
Optional - HOI/HSI
- Reconstructing Hand-Object Interactions in the Wild, ICCV 2021
- Compositional Learning for Human Object Interaction, ECCV 2018
- Exploiting Relationship for Complex-scene Image Generation, AAAI 2021
- Hand-Object Contact Consistency Reasoning for Human Grasps Generation, ICCV 2021
- Synthesizing Diverse and Physically Stable Grasps With Arbitrary Hand Structures Using Differentiable Force Closure Estimator, RA-L 2021
- GenDexGrasp: Generalizable Dexterous Grasping, ICRA 2023
- Modeling 4D Human-Object Interactions for Event and Object Recognition, ICCV 2013
- Full-Body Articulated Human-Object Interaction, ICCV 2023
- Detecting and Recognizing Human-Object Interactions, CVPR 2018
- Learning Human-Object Interactions by Graph Parsing Neural Networks, ECCV 2018
- HAKE: Human Activity Knowledge Engine, arXiv preprint arXiv:1904.06539
- Detailed 2D-3D Joint Representation for Human-Object Interaction, CVPR 2020
- Pose2Room: Understanding 3D Scenes from Human Activities, ECCV 2022
- Jointly Recognizing Object Fluents and Tasks in Egocentric Videos, ICCV 2017
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes, NeurIPS 2022
- Diffusion-based Generation, Optimization, and Planning in 3D Scenes, CVPR 2023
- HOI Learning List
Optional - Functionality
- Human-centric Indoor Scene Synthesis Using Stochastic Grammar, CVPR 2018
- Recognition of natural scenes from global properties: Seeing the forest without representing the trees, Cognitive Psychology 2009
- Shape2Pose: Human-Centric Shape Analysis, SIGGRAPH 2014
- What Can I Do Around Here? Deep Functional Scene Understanding for Cognitive Robots, ICRA 2017
- Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, IJCV 2001
- Understanding Bayesian rooms using composite 3D object models, CVPR 2013
- Action Genome: Actions as Composition of Spatio-temporal Scene Graphs, CVPR 2020
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, AAAI 2017
- Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars, IJCV 2018
Essay Option 1: A Deep Dive into the State of the Field
“The picture above is funny. But for me it is also one of those examples that make me sad about the outlook for AI and for Computer Vision. What would it take for a computer to understand this image as you or I do? I challenge you to think explicitly of all the pieces of knowledge that have to fall in place for it to make sense. … I hate to say it but the state of CV and AI is pathetic when we consider the task ahead, and when we think about how we can ever go from here to there. The road ahead is long, uncertain and unclear. … In any case, we are very, very far and this depresses me. What is the way forward?”
The above image was taken in 2010, and the above comment was made in 2012. Since then, AI technology has advanced significantly, and I’m wondering if the above comments still hold true today.
Background
In a blog post, Andrej Karpathy explores the complexities of computer vision through the lens of a humorous image featuring President Obama and a man standing on a scale. Karpathy outlines the many layers of understanding that a human applies almost instantaneously when viewing the image, from recognizing the 3D structure of the scene to understanding the implications of Obama’s foot on the scale. This effortless understanding stands in stark contrast to the state of computer vision at the time, which struggled with such multi-layered interpretations.
Assignment
Write an essay that delves into the complexities of computer vision as outlined by Karpathy. Discuss the various tasks that an algorithm must solve to “get the joke” in the image, and how far current technology is from achieving this level of understanding.
Guidelines
Introduction: Introduce the topic of computer vision and its significance in the field of AI. Reference Karpathy’s blog post as a starting point for the discussion.
List of Tasks for Understanding the Image: Enumerate and elaborate on the tasks that, according to Karpathy, an algorithm must solve to interpret the image as a human does. These include, but are not limited to:
- Recognizing 3D structure
- Understanding visual elements like mirrors
- Identifying people and their roles
- Understanding physics and how objects interact
- Reasoning about the state of mind of people in the image
Current State of Computer Vision: Discuss the current state-of-the-art techniques in computer vision. How do they compare to the list of tasks needed for full understanding?
Challenges in Data and Training: Address the issue of data collection and training algorithms. How can we gather data that supports complex inferences? Is “more data” the solution?
The Role of Embodiment: Explore Karpathy’s notion that embodiment—experiencing the world as humans do—might be necessary for algorithms to understand complex scenes.
Future Directions: What are the potential paths forward in this field? Is the road ahead “long, uncertain, and unclear,” as Karpathy suggests, or are there promising avenues for research?
Conclusion: Summarize the complexities involved in achieving a computer vision system that can understand the world as humans do and offer your own insights into the way forward.
References: Cite any sources, articles, or studies you use to support your arguments.
Evaluation Criteria
- Clarity and organization of thoughts
- Depth of analysis
- Use of case studies and examples
- Quality of writing, including grammar and syntax
- Proper citation of sources
Additional Resources
Good luck, and may your essay contribute to the ongoing dialogue in this fascinating field!
Essay Option 2: AI for Autonomous Driving
Background
Recent incidents involving Tesla’s Autopilot and Full Self-Driving (FSD) technologies have raised questions about the challenges of building an AI system capable of driving a car autonomously. In one case, a Tesla Model 3’s Autopilot system mistook a truck hauling deactivated traffic lights for an endless trail of actual traffic lights on the road. In another instance, Tesla’s FSD technology confused the moon for a yellow traffic light, causing the car to apply the brakes unnecessarily. These incidents highlight the difficulties in training AI systems to understand the complexities of the physical world they operate in.
Assignment
Write an essay that explores the challenges and considerations in building an AI system for autonomous driving. Specifically, focus on the aspects of the physical world that an AI should understand to operate safely and efficiently. Use the recent Tesla incidents as case studies to illustrate your points.
Guidelines
Introduction: Introduce the topic and the importance of building reliable AI systems for autonomous driving. Mention the recent Tesla incidents as examples of the challenges involved.
Understanding the Physical World: Discuss the various aspects of the physical world that an AI system should understand, such as:
- Traffic signals and signs
- Road conditions and infrastructure
- Weather conditions
- Other vehicles and pedestrians
- Unusual scenarios (e.g., a truck hauling traffic lights or the moon appearing as a traffic light)
Limitations of Current Technologies: Examine the limitations of current AI technologies in understanding the physical world. Use the Tesla incidents to demonstrate these limitations.
The Role of Data and Training: Discuss the importance of data and training in building a robust AI system. Address the argument that simply collecting ‘more data’ may not be sufficient for achieving full driving autonomy.
Ethical and Safety Considerations: Explore the ethical implications and safety concerns that arise when AI systems fail to understand the physical world correctly.
Conclusion: Sum up the challenges and considerations in building an AI system for autonomous driving and suggest possible solutions or future directions for research and development.
References: Cite any sources, studies, or news articles you’ve used to support your arguments.
Evaluation Criteria
- Clarity and organization of thoughts
- Depth of analysis
- Use of case studies and examples
- Quality of writing, including grammar and syntax
- Proper citation of sources
Additional Resources
- Tesla Autopilot Glitch of Truck Hauling Traffic Lights | Futurism
- Tesla’s Full Self-Driving tech keeps getting fooled by the moon, billboards, and Burger King signs | Business Insider
Good luck, and happy writing!