Code learning sites:
教程:https://stable-baselines.readthedocs.io/en/master/guide/examples.html
Using gym
When doing RL, how to use gym to animate the environment so that each training step can be visualized:
Example:
import gym
from stable_baselines import DQN
from stable_baselines.common.evaluation import evaluate_policy
# Create environment
env = gym.make('LunarLander-v2')
# Instantiate the agent
model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
# Train the agent
model.learn(total_timesteps=int(2e5))
# Save the agent
model.save("dqn_lunar")
del model # delete trained model to demonstrate loading
# Load the trained agent
model = DQN.load("dqn_lunar")
# Evaluate the agent
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
# Enjoy trained agent
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
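The loop above is the standard gym interaction pattern: reset once, then repeatedly predict an action, step the environment, and render. A minimal sketch with a hypothetical ToyEnv stand-in (no gym or stable-baselines dependency; the class and its episode length are made up for illustration) shows the same reset()/step()/render() contract:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for a gym environment: exposes the same
    reset()/step()/render() contract used in the loop above."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t                      # initial observation

    def step(self, action):
        self.t += 1
        obs = self.t
        reward = 1.0 if action == 0 else 0.0
        done = self.t >= self.horizon      # episode ends after `horizon` steps
        return obs, reward, done, {}

    def render(self):
        print(f"t={self.t}")

env = ToyEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = random.choice([0, 1])         # model.predict(obs) in the real code
    obs, reward, done, info = env.step(action)
    total_reward += reward
    env.render()
```

In the real script, `model.predict(obs)` replaces the random action and `done` signals when to call `env.reset()` again.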
Atari example:
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.vec_env import VecFrameStack
from stable_baselines import ACER

# There already exists an environment generator
# that will make and wrap atari environments correctly.
# Here we are also multiprocessing training (num_env=4 => 4 processes)
env = make_atari_env('PongNoFrameskip-v4', num_env=4, seed=0)
# Frame-stacking with 4 frames
env = VecFrameStack(env, n_stack=4)

model = ACER('CnnPolicy', env, verbose=1)
model.learn(total_timesteps=25000)

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
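VecFrameStack keeps the last n_stack observations together so the CNN policy can see motion across frames. A rough sketch of the idea for a single environment, using a plain deque (a simplification for illustration, not the actual stable-baselines implementation):

```python
from collections import deque

class FrameStackSketch:
    """Sketch of the frame-stacking idea: keep the last n_stack frames
    and return them together as one stacked observation."""
    def __init__(self, n_stack):
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # On reset, fill the stack with copies of the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def observe(self, frame):
        # Each new frame pushes out the oldest one.
        self.frames.append(frame)
        return list(self.frames)

stack = FrameStackSketch(n_stack=4)
obs = stack.reset(0)       # [0, 0, 0, 0]
obs = stack.observe(1)     # [0, 0, 0, 1]
```

The real VecFrameStack does the same thing per vectorized environment, stacking image frames along the channel axis.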
Bug fix:
When running:
import gym
env = gym.make('ALE/Pong-v5')
env.reset()
for i in range(1000):
    env.step(env.action_space.sample())
    env.render()
env.close()
the frame cannot be rendered, and the output is:
ImportError: cannot import name 'rendering' from 'gym.envs.classic_control'
Solution:
Opening the gym.envs.classic_control package shows there is no rendering.py file. On GitHub, the main branch indeed no longer contains it; this is a version issue, as the latest release has removed the file. Other branches still have it, so download rendering.py and place it in the corresponding location inside the installed package.
In addition, add the following line to the code to import rendering.py:
from gym.envs.classic_control import rendering
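Since rendering.py is gone in newer gym versions, a guarded import keeps the script runnable either way (a sketch, assuming you want to degrade gracefully rather than crash when the module is missing):

```python
# Guarded import: newer gym releases have removed
# gym/envs/classic_control/rendering.py, so fall back to None
# instead of letting the ImportError kill the script.
try:
    from gym.envs.classic_control import rendering
    HAVE_RENDERING = True
except ImportError:
    rendering = None
    HAVE_RENDERING = False
```

Code that draws with `rendering.Viewer` can then check `HAVE_RENDERING` first and skip visualization when the backport file is not installed.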