Problem description
I've come across research publications and Q&As discussing a need for inspecting RNN weights; some related answers are in the right direction, suggesting get_weights() - but how do I actually visualize the weights meaningfully? Namely, LSTMs and GRUs have gates, and all RNNs have channels that serve as independent feature extractors - so how do I (1) fetch per-gate weights, and (2) plot them in an informative manner?
Recommended answer
Keras/TF builds RNN weights in a well-defined order, which can be inspected from the source code or directly via layer.__dict__; that order can then be used to fetch per-kernel and per-gate weights, and per-channel treatment can be applied given a tensor's shape. The code and explanations below cover every possible case of a Keras/TF RNN, and should be easily expandable to any future API changes.
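To make the ordering concrete, below is a minimal sketch (not the repository code) that fetches an LSTM layer's weights with get_weights() and slices them per gate; it assumes the standard Keras gate order i, f, c, o, a kernel of shape (input_dim, 4*units), a recurrent kernel of shape (units, 4*units), and a bias of shape (4*units,). GRUs follow the same pattern with three gates (z, r, h) and kernels of width 3*units.

```python
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

# Toy model; layer name and sizes are illustrative
ipt = Input(batch_shape=(16, 100, 20))
out = LSTM(256, return_sequences=True, name='lstm')(ipt)
model = Model(ipt, out)

lstm = model.get_layer('lstm')
kernel, recurrent_kernel, bias = lstm.get_weights()
units = lstm.units

# Standard Keras LSTM gate order: input, forget, cell (candidate), output
gate_names = ['input', 'forget', 'cell', 'output']
for g, name in enumerate(gate_names):
    k  = kernel[:, g * units:(g + 1) * units]            # input-to-hidden, (input_dim, units)
    rk = recurrent_kernel[:, g * units:(g + 1) * units]  # hidden-to-hidden, (units, units)
    b  = bias[g * units:(g + 1) * units]                 # (units,)
    print(f"{name:>6} gate: kernel {k.shape}, recurrent {rk.shape}, bias {b.shape}")
```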
Also see visualizing RNN gradients, and an application to RNN regularization; unlike in the former post, I won't include a simplified variant here, as it would still be rather large and complex given the nature of weight extraction and organization; instead, simply view the relevant source code in the repository (see next section).
Code source: See RNN (this post is included there with bigger images), my repository; included are the following (a minimal setup sketch follows the list):
- Activations visualization
- Weights visualization
- Activations gradients visualization
- Weights gradients visualization
- Docstrings explaining all functionality
- Support for Eager, Graph, TF1, TF2, and from keras & from tf.keras
- Greater visual customizability than shown in examples
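For reference, a minimal setup sketch for reproducing the calls in the examples below; the model, data, and training loop are placeholders, and the from see_rnn import path is an assumption based on the repository name - check the repo's README for the exact API.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from see_rnn import rnn_histogram, rnn_heatmap  # assumed import path

# Toy uni-LSTM model matching EX 1: batch_shape = (16, 100, 20) (input)
ipt = Input(batch_shape=(16, 100, 20))
seq = LSTM(256, name='lstm')(ipt)
out = Dense(1, activation='sigmoid')(seq)
model = Model(ipt, out)
model.compile('adam', 'binary_crossentropy')

# Random placeholder data; real data will of course produce more meaningful weights
x = np.random.randn(16, 100, 20)
y = np.random.randint(0, 2, (16, 1))
for _ in range(10):
    model.train_on_batch(x, y)

rnn_histogram(model, 'lstm', equate_axes=False, show_bias=False)
rnn_heatmap(model, 'lstm')
```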
Visualization methods:
- 2D heatmap: plot weight distributions per gate, per kernel, per direction; clearly shows kernel-to-hidden relations
- histogram: plot weight distributions per gate, per kernel, per direction; loses context info (a bare matplotlib sketch of both methods follows)
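For readers who prefer not to use the repository, here is a bare-bones matplotlib sketch of both methods, reusing model from the setup sketch above and the gate order assumed earlier; styling is purely illustrative.

```python
import matplotlib.pyplot as plt

# Assumes `model` with a layer named 'lstm' (see the setup sketch above)
kernel = kernel = model.get_layer('lstm').get_weights()[0]  # (input_dim, 4 * units)
units = kernel.shape[1] // 4
gate_names = ['input', 'forget', 'cell', 'output']          # assumed Keras gate order

fig, axes = plt.subplots(2, 4, figsize=(12, 5))
for g, name in enumerate(gate_names):
    w = kernel[:, g * units:(g + 1) * units]
    # 2D heatmap: rows = input channels, columns = hidden units (context preserved)
    axes[0, g].imshow(w, aspect='auto', cmap='bwr', vmin=-.5, vmax=.5)
    axes[0, g].set_title(f"{name} (heatmap)")
    # Histogram: value distribution only (context lost)
    axes[1, g].hist(w.ravel(), bins=100)
    axes[1, g].set_title(f"{name} (histogram)")
plt.tight_layout()
plt.show()
```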
EX 1: uni-LSTM, 256 units, weights -- batch_shape = (16, 100, 20) (input)

rnn_histogram(model, 'lstm', equate_axes=False, show_bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, show_bias=False)
rnn_heatmap(model, 'lstm')
- Top plot is a histogram subplot grid, showing weight distributions per kernel, and within each kernel, per gate
- Second plot sets equate_axes=True for an even comparison across kernels and gates, improving quality of comparison, but potentially degrading visual appeal
- Last plot is a heatmap of the same weights, with gate separations marked by vertical lines, and bias weights also included
- Unlike histograms, the heatmap preserves channel/context information: input-to-hidden and hidden-to-hidden transforming matrices can be clearly distinguished
- Note the large concentration of maximal values at the Forget gate; as trivia, in Keras (and usually), bias gates are all initialized to zeros, except the Forget bias, which is initialized to ones (see the sketch below)
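To confirm the forget-bias trivia directly, one can slice a freshly initialized LSTM's bias vector; a minimal sketch, assuming the default unit_forget_bias=True and the i, f, c, o gate order:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

ipt = Input(batch_shape=(16, 100, 20))
out = LSTM(256, name='lstm')(ipt)                # unit_forget_bias=True by default
model = Model(ipt, out)

bias = model.get_layer('lstm').get_weights()[2]  # shape (4 * units,)
i_b, f_b, c_b, o_b = np.split(bias, 4)           # gate order: i, f, c, o
print(np.allclose(i_b, 0), np.allclose(f_b, 1),
      np.allclose(c_b, 0), np.allclose(o_b, 0))  # -> True True True True
```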
EX 2: bi-CuDNNLSTM, 256 units, weights -- batch_shape = (16, 100, 16) (input)

rnn_histogram(model, 'bidir', equate_axes=2)
rnn_heatmap(model, 'bidir', norm=(-.8, .8))
- Bidirectional is supported by both; biases are included in this example for histograms
- Note again the bias heatmaps; they no longer appear to reside in the same locality as in EX 1. Indeed, CuDNNLSTM (and CuDNNGRU) biases are defined and initialized differently - something that can't be inferred from histograms (one way to verify this is sketched below)
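One way to verify how a given layer defines its weights, without reading the source, is to print the names and shapes of layer.weights; a minimal sketch below using a standard Bidirectional LSTM (for CuDNNLSTM, available in TF1 / tf.compat.v1 with a GPU, the same loop would expose its different bias layout).

```python
from tensorflow.keras.layers import Input, LSTM, Bidirectional
from tensorflow.keras.models import Model

ipt = Input(batch_shape=(16, 100, 16))
out = Bidirectional(LSTM(256, return_sequences=True), name='bidir')(ipt)
model = Model(ipt, out)

# Prints forward/backward kernel, recurrent_kernel, and bias names with shapes;
# a standard LSTM bias is (4 * units,) - CuDNN variants lay biases out differently
for w in model.get_layer('bidir').weights:
    print(w.name, tuple(w.shape))
```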
EX 3: uni-CuDNNGRU, 64 units, weight gradients -- batch_shape = (16, 100, 16) (input)

rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)
- We may wish to visualize gradient intensity, which can be done via absolute_value=True and a greyscale colormap (a sketch of fetching such gradients manually follows this list)
- Gate separations are apparent even without explicit separating lines in this example:
  - New is the most active kernel gate (input-to-hidden), suggesting more error correction on permitting information flow
  - Reset is the least active recurrent gate (hidden-to-hidden), suggesting least error correction on memory-keeping
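For context on where mode='grads' values come from, here is a minimal sketch of fetching weight gradients manually with a TF2 GradientTape; the loss, model, x, and y are placeholders matching the toy setup sketch earlier.

```python
import tensorflow as tf

def get_rnn_weight_grads(model, layer_name, x, y):
    """Return d(loss)/d(weights) for the named RNN layer."""
    layer = model.get_layer(layer_name)
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, preds))
    return tape.gradient(loss, layer.trainable_weights)

# grads[0]: kernel grads, grads[1]: recurrent kernel grads, grads[2]: bias grads
grads = get_rnn_weight_grads(model, 'lstm', x, y)
print([tuple(g.shape) for g in grads])
```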
BONUS EX: LSTM NaN detection, 512 units, weights -- batch_shape = (16, 100, 16) (input)

- Both the heatmap and the histogram come with built-in NaN detection - kernel-wise, gate-wise, and direction-wise
- The heatmap will print NaNs to the console, whereas the histogram will mark them directly on the plot
- Both will set NaN values to zero before plotting; in the example below, all related non-NaN weights were already zero (a manual NaN check is sketched after this list)
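A manual equivalent of the built-in NaN check, as a minimal sketch: scan each fetched weight array with NumPy and report per-gate NaN counts (gate-axis assumptions as in the earlier sketches; the helper name is hypothetical).

```python
import numpy as np

def count_nans_per_gate(model, layer_name, n_gates=4):
    """Print NaN counts per weight matrix and per gate for an RNN layer."""
    names = ['kernel', 'recurrent_kernel', 'bias']
    for name, w in zip(names, model.get_layer(layer_name).get_weights()):
        gates = np.split(w, n_gates, axis=-1)          # slice along the gate axis
        counts = [int(np.isnan(g).sum()) for g in gates]
        print(f"{name}: NaNs per gate = {counts}")

count_nans_per_gate(model, 'lstm')   # use n_gates=3 for a GRU
```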