**Egocentric Upper Limb Segmentation in Unconstrained Real-Life Scenarios**

Monica Gruosso, Nicola Capece, Ugo Erra

Department of Mathematics, Computer Science and Economics, University of Basilicata, Potenza, Italy 85100

monica.gruosso@unibas.it, nicola.capece@unibas.it, ugo.erra@unibas.it

Abstract
===============================================================================

The segmentation of bare and clothed upper limbs in unconstrained real-life environments has received comparatively little attention. It is a challenging task that we tackled by training a deep neural network based on the [DeepLabv3+](https://arxiv.org/pdf/1802.02611.pdf) architecture. We collected about $46$ thousand carefully labeled real-life egocentric RGB images with a wide variety of skin tones, clothes, occlusions, and lighting conditions. We then extensively evaluated the proposed approach and compared it with state-of-the-art methods for hand and arm segmentation, e.g., [Ego2Hands](https://arxiv.org/pdf/2011.07252.pdf), [EgoArm](https://arxiv.org/pdf/2003.12352.pdf), and [HGRNet](https://arxiv.org/pdf/1806.05653.pdf). We used our test set and a subset of the EgoGesture dataset (EgoGestureSeg) to assess the model's generalization to challenging scenarios. Moreover, we tested our network on hand-only segmentation, since it is a closely related task. We performed a quantitative analysis using standard image segmentation metrics and a qualitative evaluation by visually comparing the obtained predictions. Our approach outperforms all compared models in both tasks and proves robust to hand-to-hand and hand-to-object occlusions, dynamic user/camera movements, different lighting conditions, skin colors, clothes, and limb/hand poses.

Overview
===============================================================================

![](upperLimbSeg_network.png width="100%")

Our network model is based on the [DeepLabv3+](https://arxiv.org/pdf/1802.02611.pdf) encoder-decoder architecture. We chose the [Xception-65](https://arxiv.org/pdf/1610.02357.pdf) model as the backbone network, which extracts the low-level features passed to the decoder. The input is an RGB image showing the human upper limb, while the output is the binary segmentation mask; an illustrative inference sketch is given at the end of this section.

To train our network, we collected a large, comprehensive upper limb segmentation dataset to overcome the limitations of existing real-world datasets. It consists of $46,021$ ($43,837$ for training and $2,184$ for testing) well-annotated RGB images captured in unconstrained real-world scenarios and showing a wide range of situations. All collected data are captured from an egocentric perspective and come from three different datasets:

- [EDSH](https://openaccess.thecvf.com/content_cvpr_2013/papers/Li_Pixel-Level_Hand_Detection_2013_CVPR_paper.pdf), which includes indoor and outdoor video frames showing different lighting conditions and a user's bare limbs (hands and forearms) during real-life actions, such as preparing tea, climbing stairs, and opening doors;
- [TEgO](https://dl.acm.org/doi/10.1145/3290605.3300566), a large dataset of high-resolution indoor images showing two subjects' hands and forearms with different skin tones, lighting, and object occlusions;
- our manually labeled EgoCam dataset, showing four male and female subjects in simple and cluttered environments, indoor and outdoor real-life scenes, inter-hand occlusions, different lighting conditions, and skin tones.

We only considered a subset of the first two datasets, since we found and discarded data whose labels contained errors. More details can be found in the paper.
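The following is a minimal Python sketch of how such a model can be applied to a single egocentric frame, assuming the network outputs two classes (background and upper limb). It is not the authors' released code: the paper's model is DeepLabv3+ with an Xception-65 backbone (official TensorFlow DeepLab implementation), whereas this sketch substitutes torchvision's DeepLabv3 with a ResNet-101 backbone, and the file names, checkpoint path, and $513 \times 513$ input size are illustrative assumptions.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python
# Illustrative sketch only -- not the authors' released code.
# The paper uses DeepLabv3+ with an Xception-65 backbone (official TensorFlow
# DeepLab implementation); torchvision's DeepLabv3 with a ResNet-101 backbone
# stands in here, configured for two classes (background / upper limb).
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet101
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Two output channels: background vs. bare or clothed hand + forearm.
model = deeplabv3_resnet101(weights=None, weights_backbone=None, num_classes=2)
model = model.to(device).eval()
# model.load_state_dict(torch.load("upper_limb_seg.pth"))  # hypothetical checkpoint

preprocess = transforms.Compose([
    transforms.Resize((513, 513)),              # DeepLab-style input size (assumption)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("frame.jpg").convert("RGB")  # hypothetical egocentric RGB frame
x = preprocess(image).unsqueeze(0).to(device)   # shape: (1, 3, 513, 513)

with torch.no_grad():
    logits = model(x)["out"]                    # shape: (1, 2, 513, 513)

# The per-pixel argmax over the two class channels gives the binary mask.
mask = logits.argmax(dim=1).squeeze(0).byte().cpu()  # 0 = background, 1 = upper limb
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Taking the per-pixel argmax over two channels is equivalent to thresholding a single foreground score; either way, the result is the binary upper limb mask illustrated in the figure above.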
Video
===============================================================================

![A video](UpperLimbSeg_demo_journal.mp4)

BibTeX
===============================================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@article{gruosso2022egocentric,
  title     = "Egocentric Upper Limb Segmentation in Unconstrained Real-Life Scenarios",
  author    = "Gruosso, Monica and Capece, Nicola and Erra, Ugo",
  journal   = "Virtual Reality",
  year      = "2022",
  publisher = "Springer",
  doi       = "10.1007/s10055-022-00725-4",
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Resources
===============================================================================

| Download | Description                                 |
|:--------:|:-------------------------------------------:|
|          | Code                                        |
|  email   | Our Upper Limb Segmentation Dataset         |
|          | Official publication: Virtual Reality, 2022 |

For more information about the original EDSH and TEgO data, please visit the following pages:

- EDSH web page: http://www.cs.cmu.edu/~kkitani/datasets/
- TEgO web page: https://iamlabumd.github.io/tego/

Acknowledgments
===============================================================================

The authors thank NVIDIA's Academic Research Team for providing the Titan Xp cards under the Hardware Donation Program.