TY - GEN
T1 - Learning to detect phone-related pedestrian distracted behaviors with synthetic data
AU - Hatay, Emre
AU - Ma, Jin
AU - Sun, Huiming
AU - Fang, Jianwu
AU - Gao, Zhiqiang
AU - Yu, Hongkai
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - Due to the popularity and mobility of smart phones, phone-related pedestrian distracted behaviors, e.g., Texting, Game Playing, and Phone Calls, have caused many traffic accidents and fatalities. As part of an advanced driver-assistance or autonomous-driving system, computer vision could be used to automatically detect such distractions from vehicle-mounted cameras for useful safety intervention. The state-of-the-art method models this problem as standard supervised learning with a two-branch Convolutional Neural Network (CNN) followed by voting over all image frames. In contrast, this paper proposes a new synthetic dataset named SYN-PPDB (448 synchronized video pairs comprising 53,760 computer-game images) for this research problem and models it as a transfer learning problem from synthetic data to real data. A new deep learning model embedded with spatial-temporal feature learning and pose-aware transfer learning is proposed. Experimental results show that we could improve the state-of-the-art overall recognition accuracy from 84.27% to 96.67%.
UR - https://www.scopus.com/pages/publications/85116041953
U2 - 10.1109/CVPRW53098.2021.00333
DO - 10.1109/CVPRW53098.2021.00333
M3 - Conference contribution
AN - SCOPUS:85116041953
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 2975
EP - 2983
BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021
PB - IEEE Computer Society
T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021
Y2 - 19 June 2021 through 25 June 2021
ER -