A Workflow for Training Robotic End-to-End Visuomotor Policies in Simulation

Date

2021-05

Publisher

The Ohio State University

Abstract

The explicit programming methods that control most industrial robotic manipulators are a great option for precisely defined environments like factories and warehouses. These spaces are intentionally designed so robots can follow commands and complete a task with limited or no awareness of their surroundings. But the real world does not adhere to such strict rules; it is noisy, dynamic, and interactive. For robots to work alongside humans in the real world, a new approach that can adapt to this randomness is needed. Research has turned to machine learning, specifically neural networks (NNs), for this. Instead of programming exactly what the robot should do in every possible scenario, these methods let an NN control the robot. The NN is trained to control the robot and learns a general approach that it can adapt to whatever conditions it encounters. I focus specifically on end-to-end methods, which take an observation of the environment and map it directly to a decision. These NNs are trained on a specific task and run continuously. By using proprioceptive information about the robot's state and depth images from a camera in front of the robot as inputs, these NNs learn a visuomotor policy, akin to hand-eye coordination in humans. I share a workflow for creating these NNs through behavior cloning and compare the performance of different network structures and training parameters. The workflow I present includes tools for generating demonstrations of a task, training the network, and evaluating the network. This process is designed to be adapted for different robots, tasks, or training methodologies. I show how recursive neural network structures and training on domain-randomized data both improve the performance of the NNs. I also describe issues where the NNs do not learn the intended task and identify changes that may correct the learning process.
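
The abstract describes the general recipe: a depth image and proprioceptive state go in, a motor command comes out, and the network is trained by behavior cloning on recorded demonstrations. The sketch below is a minimal illustration of such a policy in PyTorch; the layer sizes, 64x64 depth images, seven-dimensional proprioception and action vectors, LSTM core, and single gradient step are assumptions made for this example, not the architecture or training setup used in the thesis.

import torch
import torch.nn as nn

class VisuomotorPolicy(nn.Module):
    """End-to-end policy sketch: depth image + proprioception -> joint command."""

    def __init__(self, proprio_dim=7, action_dim=7, hidden_dim=128):
        super().__init__()
        # Small CNN encoder for a single-channel 64x64 depth image.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, hidden_dim), nn.ReLU(),  # 13x13 feature map for 64x64 input
        )
        # A recurrent core gives the policy temporal context across a rollout.
        self.rnn = nn.LSTM(hidden_dim + proprio_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, depth_seq, proprio_seq, state=None):
        # depth_seq: (B, T, 1, 64, 64); proprio_seq: (B, T, proprio_dim)
        B, T = depth_seq.shape[:2]
        feats = self.encoder(depth_seq.flatten(0, 1)).view(B, T, -1)
        out, state = self.rnn(torch.cat([feats, proprio_seq], dim=-1), state)
        return self.head(out), state

# Behavior cloning step: regress predicted actions onto demonstrated actions.
policy = VisuomotorPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

depth = torch.randn(4, 10, 1, 64, 64)   # placeholder batch of demonstration sequences
proprio = torch.randn(4, 10, 7)
demo_actions = torch.randn(4, 10, 7)

pred_actions, _ = policy(depth, proprio)
loss = nn.functional.mse_loss(pred_actions, demo_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In a full workflow of the kind the abstract describes, the placeholder tensors would be replaced by batches drawn from simulated (optionally domain-randomized) demonstrations, and the trained policy would then be evaluated by rolling it out in simulation.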

Keywords

Robotics, Neural Networks, Imitation Learning, Collaborative Robotics, Machine Learning
