Python: binning data for OpenAI Gym

Question

I am trying to create a custom environment for reinforcement learning with OpenAI Gym. I need to represent every value the environment can observe in a variable called observation_space. The agent has 3 possible actions, held in a variable called action_space.

More specifically, the observation_space is a temperature sensor that reads values from 50 to 150 degrees, and I think I can represent all of this with:

EDIT: I had the action_space numpy array wrong.

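import numpy as np
# Three discrete actions, encoded as 0, 1 and 2
action_space = np.array([0, 1, 2])
# One state per whole degree; np.arange excludes the stop value, so this covers 50..149
observation_space = np.arange(50, 150, 1)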

Is there a better way to define the observation_space so that the data is binned? For example, 20 bins: 50-55, 55-60, 60-65, and so on.

I think what I have will work, but it seems cumbersome, and I am sure there is a better practice, since I don't have much experience with this subject. This prints out a Q-table:

# One Q-table row per state and one column per action
action_size = action_space.shape[0]
state_size = observation_space.shape[0]

qtable = np.zeros((state_size, action_size))
print(qtable)

Recommended answer

This is not really a programming question, so you may get better answers on stats.stackexchange. Anyway, it just depends on how much accuracy you want. I guess you want to change the temperature (increase, decrease, don't change) according to the sensor readings. Is there much difference, in terms of the optimal action, between 50 and 51? If not, you can discretize the state space every 2 degrees, and so on for coarser bins.
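The binning described above can be sketched with NumPy. This is only an illustration, not the asker's or answerer's code: the 5-degree bin width comes from the question, while the names LOW, HIGH, N_BINS, edges and to_state are assumptions made for the example.

```python
import numpy as np

# Assumed layout: 20 bins of 5 degrees covering 50..150, as in the question.
LOW, HIGH, N_BINS = 50.0, 150.0, 20
# Interior edges only (55, 60, ..., 145): 19 edges separate 20 bins.
edges = np.linspace(LOW, HIGH, N_BINS + 1)[1:-1]

def to_state(temperature):
    """Map a temperature reading to a bin index in 0..N_BINS-1."""
    # Readings outside the sensor range are clipped into the first or last bin.
    clipped = np.clip(temperature, LOW, HIGH)
    return int(np.digitize(clipped, edges))

n_actions = 3
# The Q-table now needs one row per bin instead of one row per degree.
qtable = np.zeros((N_BINS, n_actions))
```

With this layout a reading of 57.3 falls into bin 1 (55 up to, but not including, 60), and the Q-table shrinks to 20 rows. Changing N_BINS trades accuracy for table size, which is the choice the answer describes.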

More generally, by doing so you are using what are called "features" in RL. Discretizing the state space into intervals is the idea behind tile coding, and it usually works well.
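For comparison, here is a rough sketch of tile coding in its usual textbook form, where several overlapping grids (tilings) are shifted against each other and each contributes one active feature. A single set of bins is the special case with one tiling. The number of tilings, the tile width, and the names active_tiles and q_values are all assumptions chosen for illustration.

```python
import numpy as np

LOW, HIGH = 50.0, 150.0
N_TILINGS = 4        # assumed number of overlapping tilings
TILE_WIDTH = 10.0    # assumed width of each tile, in degrees
# One extra tile per tiling so that the shifted tilings still cover HIGH.
TILES_PER_TILING = int(np.ceil((HIGH - LOW) / TILE_WIDTH)) + 1

def active_tiles(temperature):
    """Return one active tile index per tiling for a temperature reading."""
    x = float(np.clip(temperature, LOW, HIGH)) - LOW
    tiles = []
    for t in range(N_TILINGS):
        offset = t * TILE_WIDTH / N_TILINGS  # shift each tiling by a fraction of a tile
        index = int((x + offset) // TILE_WIDTH)
        tiles.append(t * TILES_PER_TILING + index)
    return tiles

n_actions = 3
# One weight vector per tile; these replace the rows of a plain Q-table.
weights = np.zeros((N_TILINGS * TILES_PER_TILING, n_actions))

def q_values(temperature):
    """Approximate Q-values as the sum of the active tiles' weights."""
    return weights[active_tiles(temperature)].sum(axis=0)
```

Nearby temperatures share most of their active tiles, so an update at one reading also generalizes to its neighbours. That is the main difference from a plain one-row-per-bin table.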

If you are new to RL, I really advise reading this book, or at least Chapters 1, 3 and 4, which are related to what you are doing.

