RL Q Learning: Solving a Maze with Q Learning, Training an Agent to Reach the Treasure in a (Complex) Maze

一个处女座的程序猿 posted on 2021/04/02 03:07:35
Contents

Output

Design Approach

Implementation Code

Full Test Log


 

 

 

Output

 

Design Approach
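The Q-table update and the action-selection rule used in the implementation code can be written compactly as below. The symbols map onto the script's constants: $\alpha$ = ALPHA = 0.1, $\gamma$ = GAMMA = 0.9, $\varepsilon$ = EPSILON = 0.1; a state $s$ indexes a row of `Q` and the four actions index its columns.

$$
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right]
$$

$$
a =
\begin{cases}
\text{uniform over } \{0,1,2,3\}, & \text{with probability } \varepsilon \text{, or if } Q(s,\cdot) \equiv 0,\\
\arg\max_{a} Q(s,a), & \text{otherwise.}
\end{cases}
$$

As a worked example, assuming the step into the treasure cell pays $r = 100$ (an assumption consistent with the logged `Total Reward:100`) while the table is still all zeros, the first such update gives $Q(s,a) = 0.9 \cdot 0 + 0.1\,(100 + 0.9 \cdot 0) = 10$. Later updates propagate a discounted share of that value back to the cells leading to the treasure.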

 

 

 

Implementation Code


  
from __future__ import print_function
import time

import numpy as np
from env import Env          # the author's maze environment (not included in the post)
from reprint import output   # third-party package for in-place terminal refresh

EPSILON = 0.1    # exploration rate
ALPHA = 0.1      # learning rate
GAMMA = 0.9      # discount factor
MAX_STEP = 30    # step limit per episode

np.random.seed(0)


def epsilon_greedy(Q, state):
    """Pick a random action with probability EPSILON, or whenever this
    state's row of Q is still all zeros; otherwise pick the greedy action."""
    if (np.random.uniform() > 1 - EPSILON) or ((Q[state, :] == 0).all()):
        action = np.random.randint(0, 4)  # one of the 4 actions, 0~3
    else:
        action = Q[state, :].argmax()
    return action


e = Env()
Q = np.zeros((e.state_num, 4))  # one row per state, one column per action

with output(output_type="list", initial_len=len(e.map), interval=0) as output_list:
    for i in range(100):  # 100 training episodes
        e = Env()  # fresh environment at the start of every episode
        while (e.is_end is False) and (e.step < MAX_STEP):
            action = epsilon_greedy(Q, e.present_state)
            state = e.present_state
            reward = e.interact(action)
            new_state = e.present_state
            # Q-learning update: move Q(s, a) toward r + GAMMA * max_a' Q(s', a')
            Q[state, action] = (1 - ALPHA) * Q[state, action] + \
                ALPHA * (reward + GAMMA * Q[new_state, :].max())
            e.print_map_with_reprint(output_list)  # redraw the maze
            time.sleep(0.1)
        # Episode over: show a summary on the first line and clear the rest
        for line_num in range(len(e.map)):
            if line_num == 0:
                output_list[0] = 'Episode:{} Total Step:{}, Total Reward:{}'.format(
                    i, e.step, e.total_reward)
            else:
                output_list[line_num] = ''
        time.sleep(2)
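The script imports `Env` from a local module `env` that the post does not include, so it cannot run as published. Below is a minimal hypothetical stand-in, a sketch only, exposing just the names the script touches (`state_num`, `map`, `present_state`, `is_end`, `step`, `total_reward`, `interact`, `print_map_with_reprint`). The maze layout, the start cell, the action encoding (0 = up, 1 = down, 2 = left, 3 = right), the meaning of the map symbols and the reward values (+100 for the treasure, -1 for bumping into a wall or edge, 0 otherwise) are all assumptions, not the author's design.

```python
class Env:
    """Hypothetical stand-in for the author's unpublished env.Env."""

    # Assumed layout: '.' free cell, 'x' wall, 'o' treasure.
    MAP = [
        ".....",
        ".x.x.",
        ".x.xo",
        "...x.",
    ]
    # Assumed action encoding: 0=up, 1=down, 2=left, 3=right.
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self):
        self.map = list(self.MAP)
        self.rows, self.cols = len(self.map), len(self.map[0])
        self.state_num = self.rows * self.cols
        self.pos = (0, 0)  # assumed start cell: top-left corner
        self.step = 0
        self.total_reward = 0
        self.is_end = False

    @property
    def present_state(self):
        # Flatten (row, col) into a single row index of the Q table.
        r, c = self.pos
        return r * self.cols + c

    def interact(self, action):
        """Apply one action, update the bookkeeping and return the reward."""
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        self.step += 1
        if not (0 <= r < self.rows and 0 <= c < self.cols) or self.map[r][c] == "x":
            reward = -1  # assumed penalty: blocked move, agent stays put
        else:
            self.pos = (r, c)
            if self.map[r][c] == "o":
                reward = 100  # assumed treasure reward; episode ends
                self.is_end = True
            else:
                reward = 0
        self.total_reward += reward
        return reward

    def print_map_with_reprint(self, output_list):
        # Write one string per map row, drawing the agent as 'A'.
        for i, row in enumerate(self.map):
            if i == self.pos[0]:
                row = row[:self.pos[1]] + "A" + row[self.pos[1] + 1:]
            output_list[i] = row
```

Saved as `env.py` next to the training script, a stub like this lets the loop run end to end. `print_map_with_reprint` only assigns lines by index, the same way the script itself writes the episode summary into `output_list`.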

 

 

 

Full Test Log


  
Start
[maze animation frames] Episode:0 Total Step:17, Total Reward:100
……
[maze animation frames] Episode:2 Total Step:30, Total Reward:-5
……
[maze animation frames] Episode:98 Total Step:8, Total Reward:100
[maze animation frames] Episode:99 Total Step:11, Total Reward:100
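The log shows the episode length falling from 17 steps in episode 0 to 8 and 11 steps in episodes 98 and 99 as the Q table starts steering the agent toward the treasure. Because the author's `Env` is not published, the sketch below reproduces the same tabular Q-learning loop (same EPSILON, ALPHA, GAMMA and MAX_STEP) on a hypothetical 6-cell corridor with the treasure at the right end, without the terminal animation, and then reads the greedy path out of the learned table. The corridor, its two actions and its reward of 100 are assumptions chosen for illustration; it also swaps the global `np.random` state for NumPy's `default_rng`.

```python
import numpy as np

# Same hyperparameters as the training script.
EPSILON, ALPHA, GAMMA, MAX_STEP = 0.1, 0.1, 0.9, 30

# Hypothetical corridor: states 0..5, treasure at state 5.
# Actions: 0 = left, 1 = right. Reward 100 on reaching the treasure, else 0.
N_STATES, GOAL = 6, 5


def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 100 if nxt == GOAL else 0
    return nxt, reward, nxt == GOAL


def epsilon_greedy(Q, state, rng):
    # Explore with probability EPSILON, or while this row is still all zeros.
    if rng.uniform() < EPSILON or (Q[state] == 0).all():
        return int(rng.integers(0, 2))
    return int(Q[state].argmax())


def train(episodes=100, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))
    steps_per_episode = []
    for _ in range(episodes):
        state, done, n = 0, False, 0
        while not done and n < MAX_STEP:
            action = epsilon_greedy(Q, state, rng)
            nxt, reward, done = step(state, action)
            # Same update rule as the script: blend old value with the TD target.
            Q[state, action] = (1 - ALPHA) * Q[state, action] + \
                ALPHA * (reward + GAMMA * Q[nxt].max())
            state, n = nxt, n + 1
        steps_per_episode.append(n)
    return Q, steps_per_episode


def greedy_path(Q, start=0):
    # Follow argmax Q from the start state with no exploration.
    path, state = [start], start
    while state != GOAL and len(path) <= MAX_STEP:
        state, _, _ = step(state, int(Q[state].argmax()))
        path.append(state)
    return path
```

Calling `Q, steps = train()` and then `greedy_path(Q)` yields the straight path `[0, 1, 2, 3, 4, 5]` once every corridor cell has received a positive update; `steps` plays the role of the `Total Step` figures in the log. The treasure row of `Q` stays at zero because no update is ever made from the terminal state, which is also why the script's bootstrap term `Q[new_state, :].max()` needs no special case at the end of an episode.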

 

 

 

 

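Source: yunyaniu.blog.csdn.net. Author: 一个处女座的程序猿. Copyright belongs to the original author; please contact the author before reposting.

Original link: yunyaniu.blog.csdn.net/article/details/83245633

[Copyright notice] This article was reposted by a Huawei Cloud community user. If you find suspected plagiarism in this community, please report it by email to cloudbbs@huaweicloud.com with supporting evidence; once verified, the community will promptly remove the infringing content.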