Online Action Detection and Forecast via Multi-Task Deep Recurrent Neural Networks


Teaser

Fig.1 Architecture of the proposed multi-task RNN framework for online detection and forecast. It contains three major components: three stacked LSTM layers, one classification subnetwork and a regression subnetwork.

Abstract

Online human action detection and forecasting on untrimmed 3D skeleton sequences is a novel task that builds on traditional action recognition and has not been fully studied. Its aim is to localize and recognize an action in a long sequence while performing the forecasting task at the same time. In this paper, we propose an online detection algorithm built on a multi-task recurrent neural network to solve this problem. First, a deep Long Short-Term Memory (LSTM) network is designed for feature extraction and temporal dynamic modeling. We then use a classification subnetwork to classify an action and predict its status at the same time. To forecast the occurrence of actions and estimate when they will occur, we incorporate a regression subnetwork into our model. Finally, we split the action classes into three stages and train the model by optimizing a joint classification-regression objective function. Experimental results show that the proposed model achieves satisfactory results on online action detection and forecasting.
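The joint classification-regression objective mentioned in the abstract can be illustrated with a minimal per-frame sketch. The abstract does not give the exact loss terms, so everything here is an assumption made for illustration: cross-entropy for the classification subnetwork, squared error for the regression subnetwork, and a hypothetical weight `reg_weight` that balances the two. The function names are also hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(class_logits, true_class, predicted_countdown, true_countdown,
               reg_weight=1.0):
    """Per-frame joint loss (illustrative assumption, not the paper's formula).

    class_logits        -- scores from the classification subnetwork
    true_class          -- index of the ground-truth class for this frame
    predicted_countdown -- scalar output of the regression subnetwork
    true_countdown      -- ground-truth regression target for this frame
    reg_weight          -- hypothetical weight on the regression term
    """
    probs = softmax(class_logits)
    classification_loss = -math.log(probs[true_class])  # cross-entropy (assumed)
    regression_loss = (predicted_countdown - true_countdown) ** 2  # squared error (assumed)
    return classification_loss + reg_weight * regression_loss
```

In a training loop, a loss of this shape would be summed over the frames of a sequence so that both subnetworks and the shared LSTM layers receive gradients from a single objective.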

Experimental results

Fig.2 A comparison between the proposed countdown and the ground truth.

Fig.3 Comparison between the proposed model and four standard curves: 5-delayed, 5-advanced, 3-delayed and 3-advanced. K-delayed curves forecast the action with a latency of K frames, and K-advanced curves forecast the action K frames before it occurs.
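One way to picture the K-delayed and K-advanced curves is as time-shifted copies of a per-frame ground-truth curve. The sketch below is only an illustration under that assumption: the helper name `shift_curve`, the sign convention for `k`, and the choice to repeat edge values as padding are all hypothetical and not taken from the paper.

```python
def shift_curve(curve, k):
    """Shift a per-frame curve by k frames.

    k > 0 gives a k-delayed curve (each value appears k frames late);
    k < 0 gives a |k|-advanced curve (each value appears |k| frames early).
    Frames shifted in from outside the sequence repeat the nearest edge
    value (an assumption made for this sketch).
    """
    n = len(curve)
    return [curve[min(max(t - k, 0), n - 1)] for t in range(n)]


# Toy ground truth: the action occupies frames 3 and 4.
ground_truth = [0, 0, 0, 1, 1, 0]
delayed_2 = shift_curve(ground_truth, 2)    # [0, 0, 0, 0, 0, 1]
advanced_2 = shift_curve(ground_truth, -2)  # [0, 1, 1, 0, 0, 0]
```

A model curve that lies between the advanced and delayed references would respond neither much earlier nor much later than the ground truth by the corresponding number of frames.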

Citation

@inproceedings{Liu2017Online,
  title={Online Action Detection and Forecast via Multi-Task Deep Recurrent Neural Networks},
  author={Liu, Chunhui and Li, Yanghao and Hu, Yueyu and Liu, Jiaying},
  booktitle={IEEE International Conference on Acoustics, Speech, and Signal Processing},
  year={2017}
}