Problem description
I am using Tensorflow for building and training several neural networks. These networks are doing supervised learning on related tasks (natural language processing).
The common thing between all my neural networks is that they share some of the early layers (some share two, others more).
I would like to be able to share the trained weights of the common layers from one architecture to initialize another architecture.
The way I am doing things at the moment is that I write a separate (ad-hoc) piece of code every time I want to transfer the weights. This clutters my project and is time-consuming.
Is anyone aware of a method that would allow me to automate the process of weight transfer? Say, for example, to automatically detect the common layers and then initialize the corresponding weights.
You can create a tf.train.Saver specifically for the set of variables of interest, and you will be able to restore them in another graph as long as they have the same names. You can use a collection to store those variables and then create the saver for that collection:
import tensorflow as tf

TRANSFERABLE_VARIABLES = "transferable_variable"
# ...
my_var = tf.get_variable(...)
tf.add_to_collection(TRANSFERABLE_VARIABLES, my_var)
# ...
# Saver restricted to the variables stored in the custom collection
saver = tf.train.Saver(tf.get_collection(TRANSFERABLE_VARIABLES), ...)
This should allow you to call save in one graph and restore in the other to transfer the weights.
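For instance, here is a minimal sketch of that round trip, assuming model1_graph and model2_graph are two tf.Graph objects built with the same shared-layer construction code (they also appear in the next snippet), and with a made-up checkpoint path:
# Save only the transferable variables from the first graph
with model1_graph.as_default():
    saver1 = tf.train.Saver(tf.get_collection(TRANSFERABLE_VARIABLES))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... train model 1 ...
        saver1.save(sess, "/tmp/shared_layers.ckpt")

# Restore them by name into the second graph
with model2_graph.as_default():
    saver2 = tf.train.Saver(tf.get_collection(TRANSFERABLE_VARIABLES))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver2.restore(sess, "/tmp/shared_layers.ckpt")
        # ... continue with training model 2 ...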
If you want to avoid writing anything to disk, then I don't think there is any alternative to manually copying the values. However, this can also be automated to a fair extent by using a collection and the exact same construction process:
model1_graph = create_model1()
model2_graph = create_model2()

with model1_graph.as_default(), tf.Session() as sess:
    # Train...
    # Retrieve learned weights
    transferable_weights = sess.run(tf.get_collection(TRANSFERABLE_VARIABLES))

with model2_graph.as_default(), tf.Session() as sess:
    # Load weights from the other model
    for var, weight in zip(tf.get_collection(TRANSFERABLE_VARIABLES),
                           transferable_weights):
        var.load(weight, sess)
    # Continue training...
Again, this will only work if the construction of the common layers is the same, because the order of the variables in the collection should be the same for both graphs.
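One way to satisfy that constraint is to build the common layers through a single helper used by both models. This is only a rough sketch of what create_model1 / create_model2 might look like; the shared_layers helper, the layer sizes, and the input shape are hypothetical and not part of the original answer:
import tensorflow as tf

TRANSFERABLE_VARIABLES = "transferable_variable"

def shared_layers(inputs):
    # Common early layers; both models call this, so the transferable
    # variables are created with the same names and in the same order.
    with tf.variable_scope("shared"):
        hidden = tf.layers.dense(inputs, 128, activation=tf.nn.relu, name="dense1")
        hidden = tf.layers.dense(hidden, 64, activation=tf.nn.relu, name="dense2")
    for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="shared"):
        tf.add_to_collection(TRANSFERABLE_VARIABLES, v)
    return hidden

def create_model1():
    graph = tf.Graph()
    with graph.as_default():
        inputs = tf.placeholder(tf.float32, [None, 300], name="inputs")
        shared = shared_layers(inputs)
        # ... task-specific head and loss for model 1 ...
    return graph

def create_model2():
    graph = tf.Graph()
    with graph.as_default():
        inputs = tf.placeholder(tf.float32, [None, 300], name="inputs")
        shared = shared_layers(inputs)
        # ... task-specific head and loss for model 2 ...
    return graph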
Update:
If you want to make sure that the restored variables are not used for training, you have a few possibilities, although they may all require more changes in your code. A trainable variable is just a variable that is included in the collection tf.GraphKeys.TRAINABLE_VARIABLES, so you can simply pass trainable=False when you create the transferred variables in the second graph, and the restoration process will work the same (a short sketch of this appears after the snippets below). If you want to be more dynamic and do it automatically, it is more or less possible, but keep this in mind: the list of variables used for training must be known before creating the optimizer, and cannot be changed afterwards (without creating a new optimizer). Knowing this, I don't think there is any solution that doesn't involve passing a list with the names of the transferable variables from the first graph. For example:
with model1_graph.as_default():
    transferable_names = [v.name for v in tf.get_collection(TRANSFERABLE_VARIABLES)]
Then, in the construction process of the second graph, after the model is defined and just before creating the optimizer you can do something like this:
train_vars = [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
              if v.name not in transferable_names]
# Assuming that `model2_graph` is the current default graph
tf.get_default_graph().clear_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
for v in train_vars:
    tf.add_to_collection(tf.GraphKeys.TRAINABLE_VARIABLES, v)
# Create the optimizer...
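For reference, the simpler trainable=False route mentioned above might look like this minimal sketch (the variable name and shape here are made up for illustration):
# In the second graph, create the shared variables as non-trainable so the
# optimizer never sees them; a tf.train.Saver can still restore into them.
with model2_graph.as_default():
    with tf.variable_scope("shared"):
        # Same name and shape as in the first graph, but frozen here
        embedding = tf.get_variable("embedding", shape=[10000, 128],
                                    trainable=False)
        tf.add_to_collection(TRANSFERABLE_VARIABLES, embedding)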
Another option is not to modify the collection tf.GraphKeys.TRAINABLE_VARIABLES and instead pass the list of variables you want to be optimized (train_vars in the example) as the var_list parameter of the optimizer's minimize method. In principle I personally like this less, because I think the contents of the collections should match their semantic purpose (after all, other parts of the code may use the same collection for other purposes), but it depends on the case, I guess.
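A rough sketch of that var_list variant, assuming a plain gradient-descent optimizer and an existing loss tensor (both are placeholders here, not part of the original answer):
# Leave tf.GraphKeys.TRAINABLE_VARIABLES untouched and restrict the update
# to the non-transferred variables explicitly.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss, var_list=train_vars)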