Integral temporal difference learning for continuous-time linear quadratic regulations

Authors: Tae Yoon Chun; Jae Young Lee; Jin Bae Park…
Keywords: Adaptive optimal control; linear quadratic regulation; reinforcement learning; temporal difference; value iteration
Journal: International Journal of Control, Automation and Systems
Year: 2017
Publication date: February 2017
Volume: 15
Issue: 1
Pages: 226-238
Journal category: Engineering
Journal subjects: Control, Robotics, Mechatronics
Publisher: Institute of Control, Robotics and Systems and The Korean Institute of Electrical Engineers
ISSN: 2005-4092

Abstract

In this paper, we propose a temporal difference (TD) learning method, called integral TD learning, that efficiently finds solutions to continuous-time (CT) linear quadratic regulation (LQR) problems in an online fashion when the system matrix A is unknown. The idea originates from a computational reinforcement learning method known as TD(0), which is the simplest TD method for finite Markov decision processes. For the proposed integral TD method, we mathematically analyze the positive definiteness of the updated value functions, the conditions for monotone convergence, and the stability properties concerning the locations of the closed-loop poles in terms of the learning rate and the discount factor. The proposed method includes the existing value iteration method for CT LQR problems as a special case. Finally, numerical simulations are carried out to verify the effectiveness of the proposed method and to further investigate the aforementioned mathematical properties.
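
For readers unfamiliar with TD(0), the finite-MDP method the abstract cites as the starting point, the following is a minimal sketch of tabular TD(0) policy evaluation. It is not the paper's integral TD method for CT LQR. The three-state cyclic chain, the function name `td0_evaluate`, and the values chosen for the learning rate `alpha` and discount factor `gamma` are all assumptions made for illustration.

```python
def td0_evaluate(rewards, next_state, gamma=0.9, alpha=0.5, sweeps=500):
    """Tabular TD(0) evaluation on a deterministic chain (illustrative only).

    rewards[s]    : reward received when leaving state s (hypothetical data)
    next_state[s] : successor of state s under the fixed policy
    gamma         : discount factor
    alpha         : learning rate (step size)
    """
    values = [0.0] * len(rewards)
    state = 0
    for _ in range(sweeps * len(rewards)):
        successor = next_state[state]
        # TD error: one-step bootstrapped target minus the current estimate.
        td_error = rewards[state] + gamma * values[successor] - values[state]
        # Move the estimate a fraction alpha of the way toward the target.
        values[state] += alpha * td_error
        state = successor
    return values


# Hypothetical three-state cycle 0 -> 1 -> 2 -> 0.
rewards = [1.0, 0.0, 2.0]
next_state = [1, 2, 0]
values = td0_evaluate(rewards, next_state)
print(values)
```

Because the transitions here are deterministic, a constant step size suffices for the estimates to settle at the solution of the Bellman equation V(s) = r(s) + gamma * V(s'). With `alpha = 1` each update overwrites the estimate with the bootstrapped target, a value-iteration-style step for this fixed policy, which loosely parallels the abstract's remark that value iteration arises as a special case.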
