RL Q Learning: Solving a Maze with Q Learning, Training an Agent to Reach the Treasure Position in a (Complex) Maze
Implementation code
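The line that updates `Q[state, action]` in the script below is the standard tabular Q-learning rule. Here it is written with the script's constants (α = `ALPHA` = 0.1, γ = `GAMMA` = 0.9). In this notation, s' is the state reached after taking action a in state s, and r is the reward returned by `e.interact(action)`:

```latex
\begin{aligned}
Q(s,a) &\leftarrow (1-\alpha)\,Q(s,a) + \alpha\bigl[r + \gamma \max_{a'} Q(s',a')\bigr] \\
       &= Q(s,a) + \alpha\bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr]
\end{aligned}
```

As a worked example, assume that reaching the treasure yields r = 100 on that single step. This is an assumption based on the "Total Reward:100" lines in the log. Assume also that the treasure state's Q row is still all zeros. Then a first successful step from a state with Q(s,a) = 0 gives 0.9 · 0 + 0.1 · (100 + 0.9 · 0) = 10.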
```python
from __future__ import print_function
import time

import numpy as np
from reprint import output  # third-party package for redrawing console lines in place

from env import Env  # the author's maze environment module (env.py), not included in this post

EPSILON = 0.1   # exploration probability for epsilon-greedy
ALPHA = 0.1     # learning rate
GAMMA = 0.9     # discount factor
MAX_STEP = 30   # step limit per episode

np.random.seed(0)


def epsilon_greedy(Q, state):
    # Explore with probability EPSILON, or whenever this state has no learned values yet.
    if (np.random.uniform() > 1 - EPSILON) or ((Q[state, :] == 0).all()):
        action = np.random.randint(0, 4)  # random action in 0..3
    else:
        action = Q[state, :].argmax()  # greedy action
    return action


e = Env()
Q = np.zeros((e.state_num, 4))  # Q table: one row per state, one column per action

with output(output_type="list", initial_len=len(e.map), interval=0) as output_list:
    for i in range(100):  # 100 training episodes
        e = Env()  # start a fresh episode in a new maze instance
        while (e.is_end is False) and (e.step < MAX_STEP):
            action = epsilon_greedy(Q, e.present_state)
            state = e.present_state
            reward = e.interact(action)
            new_state = e.present_state
            # Q-learning update
            Q[state, action] = (1 - ALPHA) * Q[state, action] + \
                ALPHA * (reward + GAMMA * Q[new_state, :].max())
            e.print_map_with_reprint(output_list)  # redraw the map in place
            time.sleep(0.1)
        # After each episode, print a one-line summary and clear the remaining map lines.
        for line_num in range(len(e.map)):
            if line_num == 0:
                output_list[0] = 'Episode:{} Total Step:{}, Total Reward:{}'.format(i, e.step, e.total_reward)
            else:
                output_list[line_num] = ''
        time.sleep(2)
```
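The script imports `Env` from the author's `env.py`, which the post does not include, and it needs the `reprint` package for display. The sketch below is a self-contained approximation for experimenting with the same training loop. It is not the author's environment: `ToyMazeEnv`, the maze layout, the action order, and the reward scheme (+100 on the treasure, 0 otherwise) are all hypothetical stand-ins. The sketch keeps the same ε-greedy rule and Q-learning update. It uses NumPy's `Generator` instead of the global `np.random` functions. It also raises `MAX_STEP` and the episode count so the toy run converges.

```python
import numpy as np

# Same constants as the script above; MAX_STEP and EPISODES are raised here (assumption).
EPSILON = 0.1
ALPHA = 0.1
GAMMA = 0.9
MAX_STEP = 50
EPISODES = 300

# Hypothetical maze: S = start, T = treasure, # = wall, . = free cell.
MAZE = [
    "S..#",
    ".#..",
    "...#",
    "#..T",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed order: 0=up, 1=down, 2=left, 3=right


class ToyMazeEnv:
    """Hypothetical stand-in for the author's env.Env, mimicking its attribute names."""

    def __init__(self, maze=MAZE):
        self.map = maze
        self.rows, self.cols = len(maze), len(maze[0])
        self.state_num = self.rows * self.cols
        self.pos = next((r, c) for r, row in enumerate(maze)
                        for c, ch in enumerate(row) if ch == "S")
        self.step = 0
        self.total_reward = 0
        self.is_end = False

    @property
    def present_state(self):
        # Flatten (row, col) into a single integer index into the Q table.
        return self.pos[0] * self.cols + self.pos[1]

    def interact(self, action):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # Moves off the grid or into a wall leave the agent where it is.
        if 0 <= r < self.rows and 0 <= c < self.cols and self.map[r][c] != "#":
            self.pos = (r, c)
        self.step += 1
        # Assumed reward scheme: +100 on the treasure, 0 for every other step.
        reward = 100 if self.map[self.pos[0]][self.pos[1]] == "T" else 0
        self.is_end = reward > 0
        self.total_reward += reward
        return reward


def epsilon_greedy(Q, state, rng):
    # Explore with probability EPSILON, or whenever this state's Q row is still all zeros.
    if rng.uniform() < EPSILON or (Q[state, :] == 0).all():
        return int(rng.integers(0, 4))
    return int(Q[state, :].argmax())


def train(seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((ToyMazeEnv().state_num, 4))
    steps_per_episode = []
    for _ in range(EPISODES):
        env = ToyMazeEnv()
        while not env.is_end and env.step < MAX_STEP:
            state = env.present_state
            action = epsilon_greedy(Q, state, rng)
            reward = env.interact(action)
            new_state = env.present_state
            # Tabular Q-learning update, same form as the original line.
            Q[state, action] = (1 - ALPHA) * Q[state, action] + \
                ALPHA * (reward + GAMMA * Q[new_state, :].max())
        steps_per_episode.append(env.step)
    return Q, steps_per_episode


def greedy_rollout(Q, max_steps=MAX_STEP):
    """Follow argmax Q from the start with no exploration; return (reached, steps)."""
    env = ToyMazeEnv()
    while not env.is_end and env.step < max_steps:
        env.interact(int(Q[env.present_state, :].argmax()))
    return env.is_end, env.step


if __name__ == "__main__":
    Q, steps = train()
    print("steps in last 5 episodes:", steps[-5:])
    print("greedy rollout (reached, steps):", greedy_rollout(Q))
```

After training, `greedy_rollout` follows the learned Q table without exploration. On this toy maze it should reach the treasure in no fewer than 6 steps, which is the Manhattan distance from `S` to `T`.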
Full test log
Start

[Console animation: the maze map, with the agent shown as A, redrawn after every step; the frames were flattened during text extraction and are omitted here]

```text
Episode:0 Total Step:17, Total Reward:100
……
Episode:2 Total Step:30, Total Reward:-5
……
Episode:98 Total Step:8, Total Reward:100
Episode:99 Total Step:11, Total Reward:100
```
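Even late in training, episodes 98 and 99 take different numbers of steps (8 and 11). One plausible contributor is that `epsilon_greedy` never stops exploring. With probability `EPSILON` it picks one of the four actions uniformly, and that pick can still be the greedy action. So for a state whose Q row has a unique maximum, the greedy action is chosen with probability 0.9 + 0.1/4 = 0.925. The sketch below checks that figure empirically. The Q row is hypothetical, and the function mirrors the original rule but draws from an explicit NumPy `Generator`.

```python
import numpy as np

EPSILON = 0.1


def epsilon_greedy(Q, state, rng):
    # Same decision rule as the original function, with an explicit random generator.
    if (rng.uniform() > 1 - EPSILON) or ((Q[state, :] == 0).all()):
        return int(rng.integers(0, 4))
    return int(Q[state, :].argmax())


def greedy_fraction(trials=100_000, seed=0):
    """Estimate how often the rule returns the greedy action for a row with a unique max."""
    rng = np.random.default_rng(seed)
    Q = np.array([[0.0, 5.0, 1.0, 2.0]])  # hypothetical Q row; the greedy action is 1
    hits = sum(epsilon_greedy(Q, 0, rng) == 1 for _ in range(trials))
    return hits / trials


expected = (1 - EPSILON) + EPSILON / 4  # 0.925 in exact arithmetic
print(expected, greedy_fraction())
```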
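Source: yunyaniu.blog.csdn.net. Author: 一个处女座的程序猿. Copyright belongs to the original author; please contact the author before reprinting.
Original link: yunyaniu.blog.csdn.net/article/details/83245633
[Copyright notice] This article was reposted by a Huawei Cloud community user. If you find content in this community suspected of plagiarism, please report it by email with supporting evidence; once verified, the community will promptly remove the infringing content. Report email: cloudbbs@huaweicloud.com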