- 机器学习的训练过程中会产生各类数据,包括 “标量scalar”、“图像image”、“统计图diagram”、“视频video”、“音频audio”、“文本text”、“嵌入Embedding” 等等。为了更好地追踪和分析这些数据,许多可视化工具应运而生,比如之前介绍的 wandb
- 本文介绍另一种更加常用的数据追踪工具 TensorBoard,参考见 Pytorch 官方文档
1. Tensorboard 简介
- TensorBoard 是Google开发的一个机器学习可视化工具,它原本是TensorFlow中的模块,不过现在已经集成到了Pytorch中。它的功能主要包括
- 跟踪和可视化损失及准确率等指标
- 可视化模型图(操作和层)
- 查看权重、偏差或其他张量随时间变化的直方图
- 将嵌入投射到较低的维度空间
- 显示图片、文字和音频数据
- 剖析 TensorFlow 程序
- TensorBoard 的工作原理和 Wandb 基本相同,本质也是一个网页服务,分成前端和后台两部分,两部分间是异步I/O的
- 后台程序将数据写入到本地文件中
- 前端程序读取本地文件中的数据来进行显示
- 由于 TensorBoard 已经集成到 Pytorch,无需再单独安装,直接
torch.utils.tensorboard
即可找到
2. 快速入门
2.1 运行方法
- 可以把 Tensorboard 的运行分成两步
- 记录数据:使用
SummaryWriter
类实例数据要追踪的数据。每次运行时,该类对象首先会在给定目录log_dir中创建 “事件文件”(本次运行的数据仓库),然后在训练过程中我们可以利用其提供的一系列高级 API 向事件文件中异步地添加数据,从而实时地追踪数据变化。该类的定义如下torch.utils.tensorboard.writer.SummaryWriter(log_dir=None, comment='', purge_step=None, max_queue=10, flush_secs=120, filename_suffix='')
- 启动网页服务显示数据:使用
tensorboard --logdir 数据文件夹
命令运行网页服务,其中 “数据文件夹” 应设置为之前实验时设置的 log_dir。若看到了如下输出TensorBoard 2.8.0 at http://localhost:6006/ (Press CTRL+C to quit)
则说明启动成功,在浏览器打开相应 url 就能进入TensorBoard界面看到数据显示了。注意默认端口是 6006,如果想进一步指定网页服务端口,可以用tensorboard --logdir=数据文件夹 --port=端口
命令
- 记录数据:使用
2.2 常用 API
-
我们使用
SummaryWriter
类提供的一系列 API 记录数据,比较常用的包括- add_hparams:以表格形式添加一组要比较的超参数
- add_scalar:将标量数据添加到摘要中,用来画一条折线
- add_scalars:将一组标量数据添加到摘要中,可以在同一张图内画多条折线
- add_histogram:绘制直方图
- add_image:将一张图片添加到摘要,需要
pillow
包 - add_images:将一组图片添加到摘要,需要
pillow
包 - add_figure:将matplotlib图形渲染到图像中,并将其添加到摘要中,需要
matplotlib
包 - add_video:将视频数据添加到摘要中,需要
moviepy
包 - add_audio:将音频数据添加到摘要中
- add_text:将文本数据添加到摘要中
- add_graph:将图表数据添加到摘要中,这个常用来显示模型结构
- add_embedding:将词嵌入向量数据添加到摘要中,这个可以交互式显示一组词向量在三维空间的投影
- add_pr_curve:添加精确召回曲线。绘制精确召回曲线可让您了解模型在不同阈值设置下的性能。此函数可以为每个目标提供真实标签(T/F)和预测置信度(通常是模型的输出)。利用 TensorBoard UI 可以交互式地选择阈值
- add_custom_scalars:通过在 “scalar” 中收集的图表
tag
来创建特殊图表。注意对于每个SummaryWriter
对象该函数只能调用一次,因为它只向tensorboard提供元数据,所以可以在训练循环之前或之后调用该函数 - add_mesh:向TensorBoard添加网格或3D点云。可视化基于Three.js,因此它允许用户与呈现的对象进行交互。除了顶点、面等基本定义外,用户还可以提供相机参数、光照条件等
-
# 引入SummaryWriter import numpy as np import torch from torch.utils.tensorboard import SummaryWriter import torchvision from PIL import Image import matplotlib import matplotlib.pyplot as plt matplotlib.use('Agg') ##### 1、add_scalar实例 def add_scalar_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_scalar") # 绘制 y = 2x 实例 x = range(100) for i in x: writer.add_scalar('add_scalar实例:y=2x', i * 2, i) # 关闭 writer.close() add_scalar_demo() ##### 2、add_scalars 实例 def add_scalars_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_scalars") r = 5 for i in range(100): writer.add_scalars('add_scalars实例', {'xsinx':i*np.sin(i/r), 'xcosx':i*np.cos(i/r), 'tanx': np.tan(i/r)}, i) # 关闭 writer.close() add_scalars_demo() ##### 3、add_text 实例 def add_text_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_text") writer.add_text('lstm', 'This is an lstm', 0) writer.add_text('rnn', 'This is an rnn', 10) # 关闭 writer.close() add_text_demo() ##### 4、add_graph 实例 def add_graph_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_graph") img = torch.rand([1, 3, 64, 64], dtype=torch.float32) model = torchvision.models.AlexNet(num_classes=10) # print(model) writer.add_graph(model, input_to_model=img) # 关闭 writer.close() add_graph_demo() ##### 5、add_image 实例 def add_image_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_image") img1 = np.random.randn(1, 100, 100) writer.add_image('add_image 实例:/imag1', img1) img2 = np.random.randn(100, 100, 3) writer.add_image('add_image 实例:/imag2', img2, dataformats='HWC') img = Image.open('../imgs/1.png') img_array = np.array(img) writer.add_image('add_image 实例:/cartoon', img_array, dataformats='HWC') # 关闭 writer.close() add_image_demo() ##### 6、add_images 实例 def add_images_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_images") imgs1 = np.random.randn(8, 100, 100, 1) writer.add_images('add_images 实例/imgs1', imgs1, dataformats='NHWC') imgs2 = np.zeros((16, 3, 100, 100)) for i in range(16): imgs2[i, 0] = np.arange(0, 10000).reshape(100, 100) / 10000 / 16 * i imgs2[i, 1] = (1 - np.arange(0, 10000).reshape(100, 100) / 10000) / 16 * i writer.add_images('add_images 实例/imgs2', imgs2) # Default is :math:`(N, 3, H, W)` img = Image.open('../imgs/1.jpg') img3 = np.array(img) imgs4= np.zeros((5, img3.shape[0], img3.shape[1], img3.shape[2])) for i in range(5): imgs4[i] = img3//(i+1) writer.add_images('add_images 实例/imgs4', imgs4, dataformats='NHWC') # Default is :math:`(N, 3, H, W)` # 关闭 writer.close() add_images_demo() ##### 7、add_figure 实例 def add_figure_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_figure") # First create some toy data: x = np.linspace(0, 2 * np.pi, 400) y = np.sin(x ** 2) # Create just a figure and only one subplot fig, ax = plt.subplots() ax.plot(x, y) ax.set_title('Simple plot') writer.add_figure("add_figure 实例:figure", fig) # 关闭 writer.close() add_figure_demo() ##### 8、add_pr_curve 实例 def add_pr_curve_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_pr_curve") labels = np.random.randint(2, size=100) # binary label predictions = np.random.rand(100) writer.add_pr_curve('add_pr_curve 实例:pr_curve', labels, predictions, 0) # 关闭 writer.close() add_pr_curve_demo() ##### 9、add_embedding 实例 def add_embedding_demo(): # 将信息写入logs文件夹,可以供TensorBoard消费,来可视化 writer = SummaryWriter("logs/add_embedding") import tensorflow as tf import tensorboard as tb tf.io.gfile = tb.compat.tensorflow_stub.io.gfile import keyword import torch meta = [] while len(meta) < 100: meta = meta + keyword.kwlist # get some strings meta = meta[:100] for i, v in enumerate(meta): meta[i] = v + str(i) label_img = torch.rand(100, 3, 10, 32) for i in range(100): label_img[i] *= i / 100.0 writer.add_embedding(torch.randn(100, 5), metadata=meta, label_img=label_img) # 关闭 writer.close() add_embedding_demo()
3. 使用 TensorBoard 记录 PPO 运行情况
- 将 TensorBoard 相关代码添加到前文 RL 实践(7)—— CartPole【TPRO & PPO】 的 PPO 代码中,观察 RL 的收敛过程
这里使用了两个 API,import gym import torch import random import torch.nn.functional as F import numpy as np import matplotlib.pyplot as plt from tqdm import tqdm from gym.utils.env_checker import check_env from gym.wrappers import TimeLimit from datetime import datetime from torch.utils.tensorboard import SummaryWriter class PolicyNet(torch.nn.Module): ''' 策略网络是一个两层 MLP ''' def __init__(self, input_dim, hidden_dim, output_dim): super(PolicyNet, self).__init__() self.fc1 = torch.nn.Linear(input_dim, hidden_dim) self.fc2 = torch.nn.Linear(hidden_dim, output_dim) def forward(self, x): x = F.relu(self.fc1(x)) # (1, hidden_dim) x = F.softmax(self.fc2(x), dim=1) # (1, output_dim) return x class VNet(torch.nn.Module): ''' 价值网络是一个两层 MLP ''' def __init__(self, input_dim, hidden_dim): super(VNet, self).__init__() self.fc1 = torch.nn.Linear(input_dim, hidden_dim) self.fc2 = torch.nn.Linear(hidden_dim, 1) def forward(self, x): x = F.relu(self.fc1(x)) x = self.fc2(x) return x class PPO(torch.nn.Module): def __init__(self, state_dim, hidden_dim, action_range, actor_lr, critic_lr, lmbda, epochs, eps, gamma, device): super().__init__() self.actor = PolicyNet(state_dim, hidden_dim, action_range).to(device) self.critic = VNet(state_dim, hidden_dim).to(device) self.actor_optimizer = torch.optim.Adam(self.actor.parameters(), lr=actor_lr) self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), lr=critic_lr) self.device = device self.gamma = gamma self.lmbda = lmbda # GAE 参数 self.epochs = epochs # 一条轨迹数据用来训练的轮数 self.eps = eps # PPO 中截断范围的参数 self.device = device def take_action(self, state): state = torch.tensor(state, dtype=torch.float).to(self.device) state = state.unsqueeze(0) probs = self.actor(state) action_dist = torch.distributions.Categorical(probs) action = action_dist.sample() return action.item() def compute_advantage(self, gamma, lmbda, td_delta): ''' 广义优势估计 GAE ''' td_delta = td_delta.detach().numpy() advantage_list = [] advantage = 0.0 for delta in td_delta[::-1]: advantage = gamma * lmbda * advantage + delta advantage_list.append(advantage) advantage_list.reverse() return torch.tensor(np.array(advantage_list), dtype=torch.float) def update(self, transition_dict): states = torch.tensor(np.array(transition_dict['states']), dtype=torch.float).to(self.device) actions = torch.tensor(transition_dict['actions']).view(-1, 1).to(self.device) rewards = torch.tensor(transition_dict['rewards'], dtype=torch.float).view(-1, 1).to(self.device) next_states = torch.tensor(np.array(transition_dict['next_states']), dtype=torch.float).to(self.device) dones = torch.tensor(transition_dict['dones'], dtype=torch.float).view(-1, 1).to(self.device) td_target = rewards + self.gamma * self.critic(next_states) * (1-dones) td_delta = td_target - self.critic(states) advantage = self.compute_advantage(self.gamma, self.lmbda, td_delta.cpu()).to(self.device) old_log_probs = torch.log(self.actor(states).gather(1, actions)).detach() # 用刚采集的一条轨迹数据训练 epochs 轮 for _ in range(self.epochs): log_probs = torch.log(self.actor(states).gather(1, actions)) ratio = torch.exp(log_probs - old_log_probs) surr1 = ratio * advantage surr2 = torch.clamp(ratio, 1 - self.eps, 1 + self.eps) * advantage # 截断 actor_loss = torch.mean(-torch.min(surr1, surr2)) # PPO损失函数 critic_loss = torch.mean(F.mse_loss(self.critic(states), td_target.detach())) # 更新网络参数 self.actor_optimizer.zero_grad() self.critic_optimizer.zero_grad() actor_loss.backward() critic_loss.backward() self.actor_optimizer.step() self.critic_optimizer.step() if __name__ == "__main__": def moving_average(a, window_size): ''' 生成序列 a 的滑动平均序列 ''' cumulative_sum = np.cumsum(np.insert(a, 0, 0)) middle = (cumulative_sum[window_size:] - cumulative_sum[:-window_size]) / window_size r = np.arange(1, window_size-1, 2) begin = np.cumsum(a[:window_size-1])[::2] / r end = (np.cumsum(a[:-window_size:-1])[::2] / r)[::-1] return np.concatenate((begin, middle, end)) def set_seed(env, seed=42): ''' 设置随机种子 ''' env.action_space.seed(seed) env.reset(seed=seed) random.seed(seed) np.random.seed(seed) torch.manual_seed(seed) state_dim = 4 # 环境观测维度 action_range = 2 # 环境动作空间大小 actor_lr = 1e-3 critic_lr = 1e-2 num_episodes = 200 hidden_dim = 64 gamma = 0.98 lmbda = 0.95 epochs = 10 eps = 0.2 device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") # build environment env_name = 'CartPole-v0' env = gym.make(env_name, render_mode='rgb_array') check_env(env.unwrapped) # 检查环境是否符合 gym 规范 set_seed(env, 0) # build agent agent = PPO(state_dim, hidden_dim, action_range, actor_lr, critic_lr, lmbda, epochs, eps, gamma, device) # TensorBoard writer TIMESTAMP = "{0:%Y-%m-%dT%H-%M-%S/}".format(datetime.now()) writer = SummaryWriter(f"logs/PPO") #writer = SummaryWriter(f"logs/PPO/{TIMESTAMP}") # start training return_list = [] for i in range(10): with tqdm(total=int(num_episodes / 10), desc='Iteration %d' % i) as pbar: for i_episode in range(int(num_episodes / 10)): episode_return = 0 transition_dict = { 'states': [], 'actions': [], 'next_states': [], 'next_actions': [], 'rewards': [], 'dones': [] } state, _ = env.reset() # 以当前策略交互得到一条轨迹 while True: action = agent.take_action(state) next_state, reward, terminated, truncated, _ = env.step(action) next_action = agent.take_action(next_state) transition_dict['states'].append(state) transition_dict['actions'].append(action) transition_dict['next_states'].append(next_state) transition_dict['next_actions'].append(next_action) transition_dict['rewards'].append(reward) transition_dict['dones'].append(terminated or truncated) state = next_state episode_return += reward if terminated or truncated: break #env.render() # 用当前策略收集的数据进行 on-policy 更新 agent.update(transition_dict) # 更新进度条 return_list.append(episode_return) pbar.set_postfix({ 'episode': '%d' % (num_episodes / 10 * i + i_episode + 1), 'return': '%.3f' % episode_return, 'ave return': '%.3f' % np.mean(return_list[-10:]) }) pbar.update(1) writer.add_scalars( main_tag='return', tag_scalar_dict={f'hidden{hidden_dim}':episode_return}, global_step=i*int(num_episodes / 10) + i_episode ) writer.add_hparams( hparam_dict={ 'actor_lr': actor_lr, 'critic_lr': critic_lr, 'hidden_dim': hidden_dim, 'gamma': gamma, 'lmbda': lmbda, 'eps': eps, 'num_episodes': num_episodes }, metric_dict={ 'hparam/ave return': np.mean(return_list), } ) writer.close() # show policy performence mv_return_list = moving_average(return_list, 29) episodes_list = list(range(len(return_list))) plt.figure(figsize=(12,8)) plt.plot(episodes_list, return_list, label='raw', alpha=0.5) plt.plot(episodes_list, mv_return_list, label='moving ave') plt.xlabel('Episodes') plt.ylabel('Returns') plt.title(f'{agent._get_name()} on CartPole-V0') plt.legend() plt.savefig(f'./result/{agent._get_name()}.png') plt.show()
add_hparams
用来记录实验的超参数和结果,add_scalars
用来记录收敛过程(用这个是为了方便把多条曲线绘制到一张图中),结果如下
这里测试了两种隐藏层尺寸,发现 hidden_size=64 时收敛比 128 快一点
4. 其他
- 关于 TensorBoard UI 的说明可以参考:TensorBoard最全使用教程:看这篇就够了
- 关于多次实验数据混合显示互相干扰的问题可以参考:tensorboard多个events文件显示紊乱的解决办法
- 总之感觉不如 Wandb 好用,就简单记录一下