**A Validation Approach for Deep Reinforcement Learning of a Robotic Arm in a 3D Simulated Environment**

Monica Gruosso^1, Nicola Capece^2, Ugo Erra^1, Flavio Biancospino^1

^1 Department of Mathematics, Computer Science and Economics, ^2 School of Engineering
University of Basilicata, Potenza, Italy 85100

monica.gruosso@unibas.it, nicola.capece@unibas.it, ugo.erra@unibas.it, flavio.biancospino@studenti.unibas.it

Abstract
===============================================================================

In recent years, deep reinforcement learning has increasingly contributed to the development of robotic applications and boosted research in robotics. Deep learning and model-free, off-policy, value-based reinforcement learning algorithms enable agents to successfully learn complex robotic skills from visual inputs through a trial-and-error process. We propose an approach for training a robot in a simulated environment by designing a Deep Q-Network (DQN) that processes images acquired by an RGB vision sensor and outputs a value for each action the robotic arm can execute in the current state. In particular, the robot has to push a ball into a soccer net without any knowledge of the environment or its own location, and it improves by receiving rewards based on its actions. A further goal was to validate the agent during training and assess its level of generalization, which remains a challenge despite the many advances in reinforcement learning. We therefore devised a validation strategy similar to the method applied in supervised learning and tested the agent on both known and unknown experiences, achieving interesting and promising results.

Overview
===============================================================================

![](net_architecture.png width="100%")
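The value-based, off-policy learning described above revolves around the one-step Q-learning target. As a minimal sketch in plain NumPy (the discount factor and variable names are illustrative, not taken from the paper):

```python
import numpy as np

GAMMA = 0.99  # illustrative discount factor, not a value from the paper

def td_target(reward, q_next, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    `q_next` holds the network's Q-values for the next state; terminal
    states bootstrap no future value.
    """
    return reward + (0.0 if done else GAMMA * float(np.max(q_next)))

def greedy_action(q_values):
    """Pick the action with the highest predicted Q-value."""
    return int(np.argmax(q_values))
```

During training, the DQN is regressed toward `td_target`, while `greedy_action` (usually softened by an epsilon-greedy exploration scheme) selects which of the arm's discrete actions to execute.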

The DQN accepts as input a tensor of size $128 \times 128 \times 4$ consisting of four grayscale images. The input is processed by one convolutional layer, a max-pooling layer, and two further convolutional layers. At the end of the network, there are two fully connected layers. A ReLU activation function is used for the convolutional layers and the first fully connected layer. The network outputs a vector of $5$ elements, the predicted Q-values for every possible action the manipulator can take.

![](frankaEmikaPanda_joints.jpeg width="40%")

The robotic manipulator used in the simulation environment is based on the Franka Emika Panda model, a collaborative lightweight robot with $7$ DoF. The joints of the simulation model are numbered from $J_1$ to $J_7$, starting from the base. Only the last three joints are involved in the actions of the task.

![](task.png width="55%")

The vision sensor is positioned on the floor at the base of the robot. It captures the ball, the soccer net, and the last two joints of the robotic arm, as shown in the panel at the top left. In particular, the first two actions move joint $J_5$, the third and fourth actions move joint $J_7$, and the shot corresponds to the movement of joint $J_6$.
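The layer sequence above can be sketched in PyTorch. Only the layer types, the $128 \times 128 \times 4$ input, and the $5$ outputs come from the description; the filter counts, kernel sizes, and strides below are assumptions for illustration, not the paper's hyperparameters:

```python
import torch
import torch.nn as nn

class DQNSketch(nn.Module):
    """Sketch of the described DQN; kernel sizes and channel counts are assumed."""

    def __init__(self, n_actions=5):
        super().__init__()
        self.features = nn.Sequential(
            # one convolutional layer on the 4-channel grayscale stack
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            # max-pooling layer
            nn.MaxPool2d(kernel_size=2),
            # two further convolutional layers
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            # first fully connected layer, with ReLU
            nn.Linear(64 * 4 * 4, 512), nn.ReLU(),
            # second fully connected layer: one Q-value per action
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.head(self.features(x))
```

A forward pass on a `(1, 4, 128, 128)` batch yields a `(1, 5)` tensor of Q-values, one per discrete arm action.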
Video
===============================================================================

![A video](video_demo.mp4)

BibTeX
===============================================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@inproceedings{gruosso2021valid,
  title     = "A Validation Approach for Deep Reinforcement Learning of a Robotic Arm in a 3D Simulated Environment",
  author    = "Gruosso, Monica and Capece, Nicola and Erra, Ugo and Biancospino, Flavio",
  booktitle = "2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)",
  pages     = "000043-000048",
  year      = "2021",
  doi       = "10.1109/SAMI50585.2021.9378684",
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Oral Presentation
===============================================================================

Our work was presented at the _IEEE_ $19^{th}$ _World Symposium on Applied Machine Intelligence and Informatics_ ([SAMI 2021](http://conf.uni-obuda.hu/sami2021/)).

![A video](http://conf.uni-obuda.hu/sami2021/7_SAMI.mp4)

Acknowledgment
===============================================================================

The authors thank NVIDIA's Academic Research Team for providing the Titan Xp cards under the Hardware Donation Program.