CEMAB: A Cross-Entropy-based Method for Large-Scale Multi-Armed Bandits

Keywords: Cross-Entropy method; Sequential decision making; Multi-armed bandit
Series: Lecture Notes in Computer Science
Year: 2017
Volume: 10142
Issue: 1
Pages: 353-365
Volume title: Artificial Life and Computational Intelligence
ISBN: 978-3-319-51691-2

Abstract

The multi-armed bandit (MAB) problem is an important model for studying the exploration-exploitation tradeoff in sequential decision making. In this problem, a gambler repeatedly chooses among a number of slot-machine arms to maximize the total payout over a fixed number of plays. Although many methods have been proposed to solve the MAB problem, most were designed for problems with a small number of arms. To ensure convergence to the optimal arm, many of these methods, including state-of-the-art methods such as UCB [2], require sweeping over the entire set of arms. As a result, they perform poorly on problems with a large number of arms. This paper proposes a new method for solving such large-scale MAB problems. The method, called Cross-Entropy-based Multi-Armed Bandit (CEMAB), uses the Cross-Entropy method as a noisy optimizer to find the optimal arm at as low a cost as possible. Experimental results indicate that CEMAB outperforms state-of-the-art methods on MABs with a large number of arms.
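The abstract describes CEMAB only at a high level, so the following is a generic cross-entropy (CE) search over arm indices, offered as a sketch under stated assumptions and not as the paper's algorithm. It assumes a categorical sampling distribution over arms, a fixed batch of pulls per iteration, a top-fraction elite set and a smoothing factor; the names `cross_entropy_bandit`, `pull`, `batch_size`, `elite_frac` and `smoothing` are hypothetical. The contrast with UCB1, which begins by pulling every arm once, is that here the number of pulls per iteration is `batch_size` no matter how many arms exist.

```python
import random
from typing import Callable


def cross_entropy_bandit(
    pull: Callable[[int], float],
    n_arms: int,
    budget: int,
    batch_size: int = 50,
    elite_frac: float = 0.2,
    smoothing: float = 0.7,
    seed: int = 0,
) -> int:
    """Hypothetical CE-style arm search (a generic sketch, not CEMAB itself).

    Returns the arm with the highest probability under the final sampling
    distribution. Never spends more than `budget` pulls.
    """
    rng = random.Random(seed)
    probs = [1.0 / n_arms] * n_arms  # start from a uniform distribution over arms
    n_elite = max(1, int(batch_size * elite_frac))
    pulls_used = 0

    while pulls_used + batch_size <= budget:
        # Sample a batch of arms (with replacement) from the current distribution.
        arms = rng.choices(range(n_arms), weights=probs, k=batch_size)
        # Pull each sampled arm once; an arm drawn twice is pulled twice.
        rewards = [pull(a) for a in arms]
        pulls_used += batch_size

        # Elite set: the sampled arms whose observed rewards were highest.
        ranked = sorted(zip(rewards, arms), reverse=True)
        elite = [a for _, a in ranked[:n_elite]]

        # Empirical elite distribution, blended with the old one for stability.
        counts = [0] * n_arms
        for a in elite:
            counts[a] += 1
        probs = [
            smoothing * c / n_elite + (1.0 - smoothing) * p
            for c, p in zip(counts, probs)
        ]

    # Recommend the most probable arm under the final distribution.
    return max(range(n_arms), key=lambda a: probs[a])


if __name__ == "__main__":
    # Illustrative problem (made up): 1,000 Bernoulli arms whose success
    # rate rises linearly with the arm index.
    gen = random.Random(1)
    means = [0.1 + 0.8 * a / 999 for a in range(1000)]

    def bernoulli_pull(a: int) -> float:
        return 1.0 if gen.random() < means[a] else 0.0

    best = cross_entropy_bandit(bernoulli_pull, n_arms=1000, budget=20_000)
    print("recommended arm:", best, "true mean:", round(means[best], 3))
```

A large `smoothing` value concentrates the distribution quickly, which saves pulls but can lock onto a good but suboptimal arm when rewards are noisy; a smaller value explores longer at a higher cost.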
