Facebook’s AI division has launched Ego4D, a long-term project that aims to solve research challenges around egocentric perception: the ability of AI to understand and interact with the world as we do, from a first-person perspective.
Facebook notes that AI typically learns from photos and videos captured in third-person, but next-generation AI will need to learn from videos that show the world from the centre of the action. AI that understands the world from this point of view could unlock a new era of immersive experiences, as devices like augmented reality (AR) glasses and virtual reality (VR) headsets become as useful in everyday life as smartphones.
Facebook has brought together a consortium of 13 universities and labs across nine countries for the project. Together, they have collected more than 2,200 hours of first-person video, featuring over 700 participants going about their daily lives.
The company has also developed five benchmark challenges for developing smarter, more useful AI assistants:

- Episodic memory: what happened when? (e.g., “Where did I leave my keys?”)
- Forecasting: what am I likely to do next? (e.g., “Wait, you’ve already added salt to this recipe”)
- Hand and object manipulation: what am I doing? (e.g., “Teach me how to play the drums”)
- Audio-visual diarization: who said what when? (e.g., “What was the main topic during class?”)
- Social interaction: who is interacting with whom? (e.g., “Help me better hear the person talking to me at this noisy restaurant”)