PhD student, Stanford University
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
Abstract
Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. We study this problem in two complex settings: first in unconstrained internet videos, and second in AI-assisted hospitals.
In unconstrained internet videos, we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over internet videos. Modeling multiple, dense labels benefits from capturing temporal relations within and across action classes. We define a novel variant of long short-term memory (LSTM) deep networks that models these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks, ranging from structured retrieval to action prediction.
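To make the idea concrete, the sketch below illustrates the general flavor of such a model: an LSTM cell that reads a small temporal window of frame features (multiple input connections) and emits independent per-class sigmoid scores at every frame (dense multi-label output). This is a minimal illustration under assumed dimensions, not the actual MultiLSTM architecture; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    # Standard LSTM cell: x is a concatenation of features from a
    # small temporal window, giving the cell multiple input connections.
    z = W["x"] @ x + W["h"] @ h + W["b"]
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Hypothetical sizes: frames, feature dim, hidden dim, classes, input window.
T, D, H, K, WIN = 8, 16, 32, 5, 3
frames = rng.standard_normal((T, D))
W = {
    "x": rng.standard_normal((4 * H, WIN * D)) * 0.1,
    "h": rng.standard_normal((4 * H, H)) * 0.1,
    "b": np.zeros(4 * H),
}
W_out = rng.standard_normal((K, H)) * 0.1

h, c = np.zeros(H), np.zeros(H)
scores = np.zeros((T, K))
for t in range(T):
    # Multiple input connections: the cell sees the current frame
    # plus the previous WIN-1 frames (clamped at the sequence start).
    idx = [max(t - d, 0) for d in range(WIN)]
    x = frames[idx].reshape(-1)
    h, c = lstm_step(x, h, c, W)
    # Dense multi-label output: an independent sigmoid score per
    # action class at every frame, so co-occurring actions can all fire.
    scores[t] = sigmoid(W_out @ h)
```

Because each frame gets a vector of independent class probabilities rather than a single softmax label, multiple simultaneous actions can be marked at every time step, which is exactly what dense multi-label annotation requires.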
We also present human activity understanding in the context of AI-assisted hospitals, where such algorithms can continuously sense what is happening in hospitals and be used to improve patient care. We have equipped pilot hospital units at Lucile Packard Children’s Hospital and Intermountain Healthcare with ceiling-mounted depth sensors that record privacy-safe depth video. We show that action recognition algorithms can be used to detect activities ranging from hand hygiene compliance for infection control, to clinical care activities in the ICU that enable automated nursing documentation and correlation with outcomes.
Bio
Serena Yeung is a PhD student in the Stanford Vision Lab, advised by Prof. Fei-Fei Li. Her research interests are in computer vision, machine learning, and deep learning. She is particularly interested in the areas of video understanding, human action recognition, and healthcare applications.
Serena is a member of the Stanford Program in AI-Assisted Care (PAC), a collaboration between the Stanford AI Lab and Stanford Clinical Excellence Research Center that aims to use computer vision and machine learning to create AI-assisted smart healthcare spaces. She interned at Facebook AI Research in 2016, and Google Cloud AI in 2017. She was also a co-instructor for Stanford’s CS231N course on Convolutional Neural Networks for Visual Recognition in 2017.