**Human Segmentation in Surveillance Video with Deep Learning**
Monica Gruosso^1, Nicola Capece^2, Ugo Erra^1
^1 Department of Mathematics, Computer Science and Economics, ^2 School of Engineering
University of Basilicata, Potenza, Italy 85100
monica.gruosso@unibas.it, nicola.capece@unibas.it, ugo.erra@unibas.it

![](teaser.png width="100%")
_The figure shows examples of how our approach works and the results it produces. The larger RGB sub-images, captured with an RGB camera, are passed through our encoder-decoder network, which performs human segmentation. The resulting mask and label overlay are shown in the bottom-right and top-right corners, respectively._

Abstract
===============================================================================

Advanced intelligent surveillance systems can automatically analyze surveillance video without human intervention. These systems enable accurate human activity recognition and, in turn, high-level activity evaluation. To provide these features, an intelligent surveillance system requires a background subtraction scheme for human segmentation, which separates the moving humans in a sequence of images from a reference background image. This paper proposes an alternative approach to human segmentation in videos based on a deep convolutional neural network. We created two specific datasets to train our network by placing the shapes of $35$ different moving actors onto background images of the area where the camera is located, allowing the network to take advantage of the entire site chosen for video surveillance. To assess the proposed approach, we compare our results with Select Subject, an Adobe Photoshop tool; Pix2Pix, a conditional generative adversarial network; and Yolact, a fully convolutional model for real-time instance segmentation.
The results show that the main benefit of our method is the ability to automatically recognize and segment people in videos without constraints on camera or people movements in the scene.

Overview
===============================================================================

![](Segnet2.png width="100%")

The figure shows our encoder-decoder network, based on the [SegNet](https://arxiv.org/pdf/1511.00561.pdf) architecture. The network structure is represented by coloured blocks: orange blocks represent convolution, batch normalization and ReLU operations; blue blocks represent pooling operations; yellow blocks represent up-sampling operations; and the final grey block represents the pixel classification layer based on the softmax operation. The images on the left of the figure are the network inputs, while those on the right are the corresponding outputs.

Video
===============================================================================

![](humanSegmentation_video.mp4)

BibTeX
===============================================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@article{gruosso2020human,
  title     = "Human segmentation in surveillance video with deep learning",
  author    = "Gruosso, Monica and Capece, Nicola and Erra, Ugo",
  journal   = "Multimedia Tools and Applications",
  pages     = "1--25",
  year      = "2020",
  publisher = "Springer",
  doi       = "10.1007/s11042-020-09425-0",
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Resources
===============================================================================

| Download | Description                                                    |
|:--------:|:--------------------------------------------------------------:|
|          | Code used to train, test and query the network.                |
| email    | Dataset used to train, test and validate the network.          |
|          | Official publication: Multimedia Tools and Applications, 2020  |

Acknowledgments
===============================================================================

The authors thank NVIDIA's Academic Research Team for providing the GTX 1080 Ti and Titan Xp cards under the Hardware Donation Program, and all the people who helped create the dataset.
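
Architecture Sketch
===============================================================================

Besides the convolution, batch normalization and ReLU stages, the encoder-decoder described in the Overview relies on three operations: pooling (blue blocks), up-sampling (yellow blocks) and a softmax pixel classifier (grey block). The NumPy sketch below illustrates them on a single-channel feature map. It assumes, as in the original SegNet, that each up-sampling step reuses the max-pooling indices recorded by the matching encoder stage; the function names are hypothetical and this is not the released training code.

```python
import numpy as np


def max_pool_with_indices(x, k=2):
    """Non-overlapping k x k max pooling on an (H, W) map.

    Returns the pooled map and, for each pooled cell, the flat index of the
    winning pixel in the input (the indices SegNet passes to its decoder).
    """
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "sketch assumes sizes divisible by k"
    # Group the map into (H/k, W/k) blocks of k*k values each.
    blocks = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    blocks = blocks.reshape(h // k, w // k, k * k)
    arg = blocks.argmax(axis=-1)      # position of the max inside each block
    pooled = blocks.max(axis=-1)
    # Turn (block row, block col, in-block offset) into a flat input index.
    rows = np.arange(h // k)[:, None] * k + arg // k
    cols = np.arange(w // k)[None, :] * k + arg % k
    return pooled, rows * w + cols


def max_unpool(pooled, indices, out_shape):
    """Sparse up-sampling: write each value back where its maximum came from."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out


def softmax_pixel_classes(logits):
    """Per-pixel softmax over the class axis of (C, H, W) logits, then argmax."""
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))  # numerically stable
    probs = exp / exp.sum(axis=0, keepdims=True)
    return probs, probs.argmax(axis=0)


if __name__ == "__main__":
    x = np.array([[1, 5, 2, 0],
                  [3, 4, 8, 1],
                  [0, 2, 1, 1],
                  [7, 6, 3, 9]], dtype=float)
    pooled, idx = max_pool_with_indices(x)   # pooled == [[5, 8], [7, 9]]
    up = max_unpool(pooled, idx, x.shape)    # maxima restored, zeros elsewhere
    # Two hypothetical classes (0 = background, 1 = person) on a 2x2 map.
    logits = np.stack([np.zeros((2, 2)), np.array([[1.0, -1.0], [-1.0, 1.0]])])
    _, labels = softmax_pixel_classes(logits)  # labels == [[1, 0], [0, 1]]
    print(pooled, up, labels, sep="\n")
```

Storing only the pooling indices, rather than whole encoder feature maps, is what keeps a SegNet-style decoder memory-light; the sparse maps produced by unpooling are then made dense again by the convolutions that follow each up-sampling step.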