PhD Candidate, Boston University
Excitation Backprop Through Time for Spatio-temporal Localization of Actions in Videos.
Recent work aims to give more insight into deep model predictions. Such approaches can identify the importance of class-specific image regions by means of saliency maps. In this work, we visualize the internal representations learned by deep models for the task of video understanding.
Our approach extends the work of Zhang et al. on Excitation Backprop to the task of spatio-temporal action localization in videos. This approach models the top-down attention mechanism of deep models to produce interpretable and useful task-relevant attention maps.
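At its core, Excitation Backprop propagates a "winning probability" from an output class back through the network, at each layer distributing a neuron's probability over its inputs in proportion to their excitatory (positive-weight) contributions. The sketch below is a minimal illustration of this rule for a single fully connected layer, assuming post-ReLU (non-negative) activations; the function name and interface are my own, not from the talk or the Zhang et al. paper.

```python
import numpy as np

def excitation_backprop_linear(a, W, p_out, eps=1e-12):
    """One Excitation Backprop step through a fully connected layer.

    a     : (n_in,)         non-negative input activations (e.g. post-ReLU)
    W     : (n_out, n_in)   layer weights
    p_out : (n_out,)        winning probabilities of the output neurons
    returns (n_in,)         winning probabilities of the input neurons
    """
    W_pos = np.clip(W, 0.0, None)   # keep only excitatory connections
    Z = W_pos @ a + eps             # per-output normalizer: sum_k w+_{jk} a_k
    # p_i = a_i * sum_j (w+_{ji} / Z_j) * p_j  -- probability mass is
    # redistributed over inputs in proportion to their excitatory contribution
    return a * (W_pos.T @ (p_out / Z))
```

Applied layer by layer from the class output down to an early convolutional layer, this redistribution yields the task-relevant attention maps described above; note that the total probability mass is conserved at each step.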
We locate actions in space and time simultaneously, in a single pass, using top-down saliency. Although we do not directly optimize for localization (no training is performed on spatial bounding-box annotations or temporal annotations), we can exploit the internal representation of the model to perform the localization.
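One simple way to turn per-frame saliency maps into a spatio-temporal localization is to threshold each frame's map to get a bounding box and threshold per-frame saliency energy to decide which frames contain the action. The following sketch illustrates this idea under those assumptions; the function, thresholds, and interface are hypothetical and not the specific procedure presented in the talk.

```python
import numpy as np

def localize_from_saliency(saliency, spatial_thresh=0.5, temporal_thresh=0.2):
    """Hypothetical post-processing: per-frame saliency maps -> action tube.

    saliency: (T, H, W) non-negative top-down attention maps, one per frame
    returns : list of (t, (x0, y0, x1, y1)) boxes for frames deemed active
    """
    sal = saliency / (saliency.max() + 1e-12)         # normalize globally
    frame_energy = sal.reshape(len(sal), -1).mean(axis=1)
    # temporal extent: keep frames whose saliency energy is high enough
    active = frame_energy > temporal_thresh * frame_energy.max()
    tube = []
    for t in np.flatnonzero(active):
        # spatial extent: tight box around the strongly salient pixels
        mask = sal[t] > spatial_thresh * sal[t].max()
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            continue
        tube.append((int(t), (int(xs.min()), int(ys.min()),
                              int(xs.max()), int(ys.max()))))
    return tube
```

Because the saliency comes from top-down excitation rather than a trained detector, no bounding-box or temporal annotations are needed at any point in this pipeline.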
Sarah is a 5th-year PhD candidate in the Image and Video Computing Group of the Computer Science Department at Boston University. Sarah is also an IBM PhD fellow and a Hariri PhD fellow. Her research interests lie at the intersection of Computer Vision and Machine Learning. Sarah is advised by Prof. Stan Sclaroff; she is particularly interested in developing deep learning formulations for the analysis of human motion and activities in video.