Abstract
Use of a sparse, binary reward function remains one of the most challenging problems in reinforcement learning. In particular, when the environments in which robotic agents learn are sufficiently vast, tasks become much harder to learn because the probability of reaching the goal is minimal. The Hindsight Experience Replay algorithm was proposed to overcome these difficulties; however, learning still slows or stalls when an agent cannot receive proper rewards at the beginning of training. In this paper, we present a simple method called Converging Goal Space and Binary Reward Function, which helps agents learn tasks easily and efficiently in large environments while still providing a binary reward. At an early stage of training, a larger goal-space margin relaxes the reward function, enabling more rapid policy learning. As the number of successes increases, the goal space is gradually reduced to the size used in the test. We apply this reward function to two task experiments, sliding and throwing, both of which require exploration over a wider range than the reach of the robotic arm, and compare the learning efficiency with that of experiments that employ only a sparse, binary reward function. We show that the proposed reward function performs better in large environments in physics simulation, and we demonstrate that the function is applicable to real-world robotic arms.
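The core idea described above can be sketched as follows: a binary goal-reaching reward whose success margin starts wide and converges toward the test-time tolerance as successes accumulate. All class, method, and parameter names (and the specific margin values and shrink schedule) are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np


class ConvergingGoalSpace:
    """Sketch of a converging goal space with a binary reward.

    The success margin begins large (easy rewards early in training) and
    shrinks multiplicatively after a fixed number of successes, never
    going below the margin used at test time. All defaults here are
    hypothetical values chosen for illustration.
    """

    def __init__(self, initial_margin=0.30, final_margin=0.05,
                 decay=0.9, successes_per_shrink=50):
        self.margin = initial_margin
        self.final_margin = final_margin
        self.decay = decay
        self.successes_per_shrink = successes_per_shrink
        self.success_count = 0

    def reward(self, achieved_goal, desired_goal):
        """Binary reward: 0 if the achieved goal lies within the current
        margin of the desired goal, -1 otherwise."""
        dist = np.linalg.norm(np.asarray(achieved_goal, dtype=float)
                              - np.asarray(desired_goal, dtype=float))
        success = dist <= self.margin
        if success:
            self.success_count += 1
            if self.success_count % self.successes_per_shrink == 0:
                # Gradually converge the goal space toward the test margin.
                self.margin = max(self.final_margin, self.margin * self.decay)
        return 0.0 if success else -1.0
```

Used inside a goal-conditioned training loop, such a reward would be evaluated on each transition's achieved and desired goals; the reward stays binary throughout, so it remains compatible with relabeling schemes like Hindsight Experience Replay.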
Original language | English |
---|---|
Article number | 09249227 |
Pages (from-to) | 921-927 |
Number of pages | 7 |
Journal | IEEE International Conference on Automation Science and Engineering |
Volume | 2020-January |
DOIs | |
Publication status | Published - 2020 |
Event | 16th IEEE International Conference on Automation Science and Engineering, CASE 2020 - Hong Kong, Hong Kong Duration: 2020 Aug 20 → 2020 Aug 21 |
Bibliographical note
Funding Information: *This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07049267) and funded by the Korea government (MSIT) (2018R1A4A1025986). †Corresponding author. 1Department of Mechanical Engineering, Yonsei University, Seodaemun-gu, Seoul 03722, Korea. {wsro0224, wsjeonno, hamidbamshad, hsyang}@yonsei.ac.kr
Publisher Copyright:
© 2020 IEEE.
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Electrical and Electronic Engineering