Implementation and Evaluation of Sample Efficient Reinforcement Learning on Autonomous Manufacturing Test Bed
The Ohio State University
Reinforcement learning (RL) is a class of machine learning algorithms in which an agent receives feedback by interacting with its environment and uses that information to make sequential, intelligent decisions. Building an effective policy commonly requires an extremely large amount of data, sometimes on the order of 10^4 to 10^7 samples. Today, such algorithms are used in advertising, robot manipulation, mastering strategy games such as chess, and predicting protein folding sequences, to give a few examples; in all of these settings, data can be sampled in large quantities at negligible cost. For manufacturing systems, by contrast, data is expensive due to material cost and manufacturing time, which makes standard RL infeasible in these environments. To make RL feasible in manufacturing, specialized algorithms have been developed that seek to reduce the number of samples required to train the model to the order of 10^1, and experimental validation of these algorithms is desired. This paper seeks to experimentally validate sample-efficient RL algorithms in a manufacturing environment. To achieve this, an existing manufacturing system that prints phononic crystals, a type of acoustic metamaterial that acts as an ultrasonic bandpass filter, was modified to interface with a general cognition element that can accommodate and test the sample-efficient algorithms. The system's printing, data processing, and storage functions were fully automated, and a modular framework was developed in the machine's software so that the general cognition element may be swapped out with other established optimization algorithms for testing. To provide a benchmark, standard machine learning algorithms such as Q-learning and Bayesian optimization were implemented within the established software framework, and multiple trials were run to benchmark their performance against the sample-efficient RL algorithms.
For future research purposes, a machine vision camera system was also integrated, enabling manufacturing error checking. The results showed that the custom RL algorithms consistently yielded higher rewards during training and higher product performance during policy execution than the standard machine learning approaches. This work is the first implementation of an RL algorithm integrated into a manufacturing process, serving as experimental validation of sample-efficient RL and yielding a modular test bed compatible with a variety of algorithms for future testing. Such an autonomous manufacturing system eliminates resource-intensive design iteration for processes that are sensitive to manufacturing noise and serves as experimental validation of an optimized sequential learning algorithm that uses only a small number of samples.
Machine Learning, Reinforcement Learning, Bayesian Optimization, Experimental Validation