TY - JOUR
T1 - Adaptive dynamic programming for discrete-time linear quadratic regulation based on multirate generalised policy iteration
AU - Chun, Tae Yoon
AU - Lee, Jae Young
AU - Park, Jin Bae
AU - Choi, Yoon Ho
N1 - Publisher Copyright:
© 2017 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2018/6/3
Y1 - 2018/6/3
N2 - In this paper, we propose two multirate generalised policy iteration (GPI) algorithms for discrete-time linear quadratic regulation problems. The proposed algorithms extend the existing GPI algorithm, which consists of approximate policy evaluation and policy improvement steps. The two proposed schemes, heuristic dynamic programming (HDP) and dual HDP (DHP) based on multirate GPI, use multi-step estimation (the M-step Bellman equation) at the approximate policy evaluation step to estimate the value function and its gradient, called the costate, respectively. We then show that these two methods with the same update horizon can be considered equivalent in the iteration domain. Furthermore, monotonically increasing and decreasing convergence, the so-called value iteration (VI)-mode and policy iteration (PI)-mode convergence, are proved to hold for the proposed multirate GPIs. General convergence properties in terms of eigenvalues are also studied. Data-driven online implementation methods for the proposed HDP and DHP are demonstrated, and finally we present the results of numerical simulations performed to verify the effectiveness of the proposed methods.
AB - In this paper, we propose two multirate generalised policy iteration (GPI) algorithms for discrete-time linear quadratic regulation problems. The proposed algorithms extend the existing GPI algorithm, which consists of approximate policy evaluation and policy improvement steps. The two proposed schemes, heuristic dynamic programming (HDP) and dual HDP (DHP) based on multirate GPI, use multi-step estimation (the M-step Bellman equation) at the approximate policy evaluation step to estimate the value function and its gradient, called the costate, respectively. We then show that these two methods with the same update horizon can be considered equivalent in the iteration domain. Furthermore, monotonically increasing and decreasing convergence, the so-called value iteration (VI)-mode and policy iteration (PI)-mode convergence, are proved to hold for the proposed multirate GPIs. General convergence properties in terms of eigenvalues are also studied. Data-driven online implementation methods for the proposed HDP and DHP are demonstrated, and finally we present the results of numerical simulations performed to verify the effectiveness of the proposed methods.
KW - Multirate generalised policy iteration
KW - adaptive dynamic programming
KW - dual heuristic dynamic programming
KW - heuristic dynamic programming
KW - linear quadratic regulation
KW - mixed-mode convergence
UR - http://www.scopus.com/inward/record.url?scp=85018497818&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018497818&partnerID=8YFLogxK
U2 - 10.1080/00207179.2017.1312669
DO - 10.1080/00207179.2017.1312669
M3 - Article
AN - SCOPUS:85018497818
SN - 0020-7179
VL - 91
SP - 1223
EP - 1240
JO - International Journal of Control
JF - International Journal of Control
IS - 6
ER -