Question
I am trying to implement WARP loss (a type of pairwise ranking function) with the Keras API. I am somewhat stuck on how this can be achieved.
The definition of WARP loss is taken from the lightFM doc:
The WARP function is used, for example, in semantic embeddings of #hashtags, a paper published by Facebook AI Research. In that paper they try to predict the most representative hashtags for short texts, where the 'user' is the short text, the 'positive item' is the hashtag of the short text, and the 'negative items' are random hashtags uniformly sampled from the 'hashtag lookup'.
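To make that sampling setup concrete, a small illustrative sketch (not from the paper; num_hashtags, num_negatives and positive_id are assumed values) of drawing uniform negatives from the hashtag lookup could look like this:

import numpy as np

num_hashtags = 50000    # assumed size of the 'hashtag lookup'
num_negatives = 1000    # negatives drawn per data point, matching the question below
positive_id = 42        # assumed hashtag id of the current short text

negatives = np.random.randint(0, num_hashtags, size=num_negatives)
negatives = negatives[negatives != positive_id]   # drop accidental draws of the positive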
I am following the implementation of another triplet loss to create the WARP one: github
My understanding is that for each data point I will have 3 inputs. Example with embeddings ('semi' pseudocode):
sequence_input = Input(shape=(100, ), dtype='int32') # 100 features per data point
positive_example = Input(shape=(1, ), dtype='int32', name="positive") # the one positive example
negative_examples = Input(shape=(1000,), dtype='int32', name="random_negative_examples") # 1000 random negative examples.
#map data points to already created embeddings
embedded_seq_input = embedded_layer(sequence_input)
embedded_positive = embedded_layer(positive_example)
embedded_negatives = embedded_layer(negative_examples)
conv1 = Convolution1D(...)(embedded_seq_input)
.
.
.
z = Dense(vector_size_of_embedding,activation="linear")(convN)
loss = merge([z, embedded_positive, embedded_negatives],mode=warp_loss)
.
.
.
where warp_loss is (I am assuming we take 1000 random negatives instead of all of them, and the scores come from the cosine similarity):
def warp_loss(X):
    # pseudocode
    z, positive, negatives = X
    positive_score = cosine_similarity(z, positive)
    counts = 1
    loss = 0
    for negative in negatives:
        score = cosine_similarity(z, negative)
        if score > positive_score:
            loss = ((number_of_labels - 1) / counts) * (score + 1 - positive_score)
        else:
            counts += 1
    return loss
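For reference, here is a minimal, hedged sketch (my own illustration, not an official implementation) of how the same idea could be expressed with Keras backend ops so it runs on whole batches. It assumes z, positive and negatives are already L2-normalised embeddings (so dot products are cosine similarities), that num_labels is supplied by the caller, and it approximates the rank from the fraction of sampled negatives that violate the margin instead of looping:

from keras import backend as K

def warp_loss_batched(X, num_labels=10000, margin=1.0):
    # X = [z, positive, negatives] with shapes (batch, d), (batch, d), (batch, n_neg, d)
    z, positive, negatives = X

    pos_score = K.sum(z * positive, axis=-1, keepdims=True)            # (batch, 1)
    neg_scores = K.sum(K.expand_dims(z, 1) * negatives, axis=-1)       # (batch, n_neg)

    # which sampled negatives violate the margin against the positive
    violations = K.cast(neg_scores > pos_score - margin, K.floatx())   # (batch, n_neg)
    n_sampled = K.cast(K.shape(neg_scores)[1], K.floatx())

    # estimated rank of the positive, scaled up from the sampled negatives
    est_rank = (num_labels - 1.0) * K.sum(violations, axis=-1) / n_sampled

    # rank-weighted hinge loss, averaged over the sampled negatives
    hinge = K.maximum(0.0, margin + neg_scores - pos_score)
    return K.log(1.0 + est_rank) * K.mean(hinge, axis=-1)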
How to compute WARP is described nicely here: post
I am not sure if this is the correct way of doing it, but I could not find a way to implement the warp_loss pseudo-function. I can compute a cosine with merge([x, u], mode='cos'), but this assumes the same dimensions. So I am not sure how to use merge mode 'cos' for the multiple negative examples, which is why I am trying to create my own warp_loss.
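On the merge(..., mode='cos') dimension issue: one hedged workaround (names are illustrative) is to L2-normalise the tensors yourself and rely on broadcasting, so a single (batch, d) vector is scored against (batch, n_neg, d) negatives in one op:

from keras import backend as K

def cosine_many(z, negatives):
    # z: (batch, d), negatives: (batch, n_neg, d)
    z = K.l2_normalize(z, axis=-1)
    negatives = K.l2_normalize(negatives, axis=-1)
    # broadcast z across the negatives axis and reduce over the embedding axis
    return K.sum(K.expand_dims(z, 1) * negatives, axis=-1)   # (batch, n_neg)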
Any insights, similar implemented examples, or comments would be useful.
Answer
First of all, I would argue that it is not possible to implement WARP in the batch training paradigm, and therefore you can't implement it in Keras. This is because WARP is intrinsically sequential, so it can't handle data broken into batches, à la Keras. I suppose if you did fully stochastic (single-example) batches, you could pull it off.
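To illustrate why it is sequential, here is a minimal numpy sketch (with a hypothetical score function, not the lightFM code): for each positive you keep drawing negatives until one violates the margin, and the number of draws needed sets the weight of the update, so the work per example is data-dependent and hard to batch:

import numpy as np

def warp_single_update(user, positive, all_items, score, margin=1.0, max_trials=100):
    pos_score = score(user, positive)
    for trials in range(1, max_trials + 1):
        negative = np.random.choice(all_items)
        neg_score = score(user, negative)
        if neg_score > pos_score - margin:
            # the more draws it took to find a violator, the lower the estimated rank
            est_rank = (len(all_items) - 1) // trials
            return np.log(est_rank + 1) * (margin + neg_score - pos_score)
    return 0.0   # no violating negative found within the budget: skip this example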
Typically for WARP you include a margin of 1, but as in the paper you can treat it as a hyperparameter:
if neg_score > pos_score - 1:        # margin of 1
    loss = log(num_items / counts)   # loss weighted by sample count
    loss = max(1, loss)              # this looks like the same thing you were doing, in a different way
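For intuition, with illustrative numbers (num_items = 10000 is assumed): the fewer draws it takes to find a violating negative, the larger the weight, so mistakes near the top of the ranking are penalised hardest:

import math

num_items = 10000                      # assumed catalogue size
for counts in (1, 10, 100, 1000):      # draws needed to find a violating negative
    print(counts, round(math.log(num_items / counts), 2))
# 1 -> 9.21, 10 -> 6.91, 100 -> 4.61, 1000 -> 2.3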
This is superior to its predecessor BPR in that it optimizes for top-k precision instead of average precision.