Core Idea

In reading-comprehension QA, the passages may be mixed with adversarial examples. When looking for the answer, the method therefore first selects the sentences that are likely to contain the answer, and only then performs further inference on them.

Method

[Figure: overview of the proposed model]

Part 1

Given: a paragraph C and a query Q.

Split the paragraph into sentences: C = {S_1, S_2, ..., S_n}.

Pair each sentence with Q: (S_i, Q) for i = 1, ..., n.

Use dependency parsing to obtain the tree representations T_{S_i} and T_Q.

Based on T_{S_i} and T_Q, a Tree-LSTM is built for each.

The leaf-node inputs of both Tree-LSTMs are GloVe word vectors.

The output hidden vectors are h_{S_i} and h_Q, respectively.

h_{S_i} and h_Q are concatenated and passed to a feed-forward neural network that computes the likelihood that S_i contains the answer to Q.

The loss and the feed-forward network follow the semantic relatedness network.

For supervised training, the label is 1 if S_i contains the answer and 0 otherwise.
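The scoring step above can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's implementation: the real model scores Tree-LSTM hidden vectors with the semantic relatedness network's head, while `feed_forward_score` here is a hypothetical single linear layer with a sigmoid, and the vectors are random toy data.

```python
import math
import random

random.seed(0)

def feed_forward_score(h_s, h_q, w, b):
    """Hypothetical scoring head: concatenate the two hidden vectors and
    apply one linear layer + sigmoid (a simplification of the paper's
    feed-forward network)."""
    x = h_s + h_q                      # vector concatenation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # likelihood that S_i answers Q

def bce_loss(p, label):
    """Binary cross-entropy with labels 1 (answer sentence) / 0 (other)."""
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

# toy hidden vectors of dimension 4 (the paper uses Tree-LSTM outputs)
h_s = [random.uniform(-1, 1) for _ in range(4)]
h_q = [random.uniform(-1, 1) for _ in range(4)]
w = [random.uniform(-0.1, 0.1) for _ in range(8)]
p = feed_forward_score(h_s, h_q, w, 0.0)
loss = bce_loss(p, 1)  # label 1: S_i contains the answer
```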

Part 2

Compute the most likely answer sentence:

S* = argmax_{S_i} L(S_i, Q)

L denotes the likelihood predicted by the QA Likelihood neural network.

The sentence pair (S*, Q) is then passed to a pre-trained single BiDAF (Seo et al., 2016) to generate the answer â to Q.
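Putting Part 1 and Part 2 together, the selection step amounts to an argmax over sentence likelihoods. In the sketch below the sentences and scores are toy placeholders; a real system would take the scores from the QA Likelihood network and then run BiDAF on the selected pair.

```python
def select_answer_sentence(sentences, scores):
    """Part 2 of the pipeline: S* = argmax_i L(S_i, Q).
    The selected sentence is then handed to a pre-trained reader
    (single BiDAF in the paper) instead of the full paragraph."""
    best_i = max(range(len(sentences)), key=lambda i: scores[i])
    return sentences[best_i]

sentences = ["Tesla was born in 1856.",
             "The adversarial sentence mentions 1989.",
             "He moved to the US in 1884."]
scores = [0.91, 0.34, 0.62]   # toy likelihoods, not real model output
s_star = select_answer_sentence(sentences, scores)
# (s_star, Q) would then be fed to BiDAF to extract the answer span
```

Because only S* reaches the reader, an adversarial sentence with a low likelihood score never gets the chance to distract BiDAF.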

Experiments

Dataset: sampled from the training set of SQuAD v1.1.

There are 87,599 queries over 18,896 paragraphs in the training set of SQuAD v1.1. While each query refers to exactly one paragraph, a paragraph may be referred to by multiple queries.

For each query Q_k, the sentences of its paragraph are paired with Q_k to form a set D_k, and D = D_1 ∪ D_2 ∪ ... ∪ D_d.

d = 87,599 is the number of queries. The set D contains 440,135 sentence pairs, of which 87,306 are positive instances and 352,829 are negative instances.

Positive instance: a pair (S_i, Q) in which the sentence contains the answer to the query.

Two sampling methods are used: pair-level sampling and paragraph-level sampling.

1. In pair-level sampling, 45,000 positive instances and 45,000 negative instances are randomly selected from D as the training set.

2. In paragraph-level sampling, a query Q_k is randomly selected first, and then one positive instance and one negative instance are randomly sampled from D_k.

Each training set has 90,000 instances. Validation sets of 3,000 instances are sampled through these two methods as well.
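The two sampling schemes can be sketched on a toy pool, assuming each instance is a `(sentence, query, label)` triple; the function names and toy data are illustrative, not from the paper.

```python
import random

def pair_level_sampling(D, n_pos, n_neg, rng):
    """Pair-level sampling: draw positives and negatives independently
    from the flat pool D of (sentence, query, label) triples."""
    pos = [x for x in D if x[2] == 1]
    neg = [x for x in D if x[2] == 0]
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)

def paragraph_level_sampling(D_by_query, n_queries, rng):
    """Paragraph-level sampling: first pick a query Q_k at random, then
    sample one positive and one negative instance from its own set D_k."""
    out = []
    for qk in rng.sample(list(D_by_query), n_queries):
        pos = [x for x in D_by_query[qk] if x[2] == 1]
        neg = [x for x in D_by_query[qk] if x[2] == 0]
        out.append(rng.choice(pos))
        out.append(rng.choice(neg))
    return out

# toy pool: two queries, each with one answer sentence and two distractors
D_by_query = {
    "Q1": [("s1", "Q1", 1), ("s2", "Q1", 0), ("s3", "Q1", 0)],
    "Q2": [("s4", "Q2", 1), ("s5", "Q2", 0), ("s6", "Q2", 0)],
}
D = [x for pairs in D_by_query.values() for x in pairs]

rng = random.Random(0)
train_pair = pair_level_sampling(D, 2, 2, rng)
train_para = paragraph_level_sampling(D_by_query, 2, rng)
```

Both schemes produce a balanced set, but paragraph-level sampling additionally guarantees that every sampled query contributes exactly one positive and one negative from its own paragraph.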

Test set: the ADDANY adversarial dataset, with 1,000 paragraphs, each referring to only one query. Splitting and pairing yields 6,154 sentence pairs.

Experimental settings: the dimension of the GloVe word vectors (Pennington et al., 2014) is set to 300. The sentence scoring neural network is trained by Adagrad (Duchi et al., 2011) with a learning rate of 0.01 and a batch size of 25. Model parameters are regularized with per-minibatch L2 regularization at a strength of 10.

Results

Evaluation metric: macro-averaged F1 score (Rajpurkar et al., 2016; Jia and Liang, 2017).
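Macro-averaged F1 here is the standard SQuAD metric: a token-overlap F1 computed per example and then averaged over examples. A minimal sketch, omitting the official answer normalization (lowercasing, punctuation and article stripping):

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string,
    as in the SQuAD evaluation (Rajpurkar et al., 2016)."""
    p, g = prediction.split(), gold.split()
    common = Counter(p) & Counter(g)   # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def macro_f1(predictions, golds):
    """Macro-average: mean of the per-example F1 scores."""
    return sum(token_f1(p, g) for p, g in zip(predictions, golds)) / len(golds)
```

For example, predicting "in 1856" against gold "1856" gives precision 1/2 and recall 1, i.e. F1 = 2/3.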

[Table 2: F1 results]

Table 2 can be understood as a binary classification problem.


The analysis considers three types of sentences: adversarial sentences, answer sentences, and the sentences that contain the answers returned by the single BiDAF system.

[Figure: number of sentences of each type at each ranked position]

The x-axis denotes the rank of each sentence according to its likelihood score, while the y-axis is the number of sentences of each type ranked at that position.

It shows that among the 1,000 (C, Q) pairs, 647 and 657 answer sentences are selected by the QA Likelihood neural network trained with pair-level sampling and paragraph-level sampling respectively, but only 136 and 141 adversarial sentences are selected.

Conclusion

Experiments on the ADDSENT adversarial dataset were not conducted.
