Awesome Knowledge-Distillation
2019-11-26 19:02:16
Source: https://github.com/FLHonker/Awesome-Knowledge-Distillation
Different forms of knowledge
Knowledge from logits
- Distilling the knowledge in a neural network. Hinton et al. arXiv:1503.02531
- Learning from Noisy Labels with Distillation. Li, Yuncheng et al. ICCV 2017
- Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
- Learning Metrics from Teachers: Compact Networks for Image Embedding. Yu, Lu et al. CVPR 2019
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. Huang, Zehao and Wang, Naiyan. 2017
- On Knowledge Distillation from Complex Networks for Response Prediction. Arora, Siddhartha et al. NAACL 2019
- On the Efficacy of Knowledge Distillation. Cho, Jang Hyun and Hariharan, Bharath. arXiv:1910.01348
- [novel] Revisit Knowledge Distillation: a Teacher-free Framework. Yuan, Li et al. arXiv:1909.11723
- Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher. Mirzadeh et al. arXiv:1902.03393
- Ensemble Distribution Distillation. ICLR 2020
- Noisy Collaboration in Knowledge Distillation. ICLR 2020
- On Compressing U-net Using Knowledge Distillation. arXiv:1812.00249
- Distillation-Based Training for Multi-Exit Architectures. Phuong, Mary and Lampert, Christoph H. ICCV 2019
- Self-training with Noisy Student improves ImageNet classification. Xie, Qizhe et al. (Google) arXiv:1911.04252
- Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework. arXiv:1910.12061
- Preparing Lessons: Improve Knowledge Distillation with Better Supervision. arXiv:1911.07471
- Adaptive Regularization of Labels. arXiv:1908.05474
- Positive-Unlabeled Compression on the Cloud. Xu, Yixing (Huawei) et al. NIPS 2019
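For readers new to the area, the soft-target objective from Hinton et al. can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any listed paper. The temperature `T = 4` and weight `alpha = 0.9` are assumed example values, and `kd_loss` is a hypothetical helper. The soft term below is the KL divergence between the temperature-softened teacher and student distributions. It differs from the paper's soft cross-entropy only by a constant that does not depend on the student, and it is scaled by T² as the paper recommends so its gradients stay comparable to the hard-label term.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of a list of logits, softened by dividing by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Soft-target KD loss for one example (hypothetical helper, assumed T and alpha)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(p_t || p_s) between the temperature-softened distributions
    soft = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Scale by T^2 so the soft term's gradients keep a comparable size as T changes
    soft *= temperature ** 2
    # Ordinary cross-entropy with the true label at temperature 1
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


teacher = [8.0, 2.0, 1.0]
student = [5.0, 3.0, 0.5]
print(kd_loss(student, teacher, label=0))
```

Setting `alpha` closer to 1 leans more heavily on the teacher. Hinton et al. report that a considerably lower weight on the hard-label term generally gave the best results.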
Knowledge from intermediate layers
- FitNets: Hints for thin deep nets. Romero, Adriana et al. arXiv:1412.6550
- Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. Zagoruyko et al. ICLR 2017
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks. Zhang, Zhi et al. arXiv:1710.09505
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. Yim, Junho et al. CVPR 2017
- Paraphrasing complex network: Network compression via factor transfer. Kim, Jangho et al. NIPS 2018
- Knowledge transfer with Jacobian matching. ICML 2018
- Self-supervised knowledge distillation using singular value decomposition. Lee, Seung Hyun et al. ECCV 2018
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Knowledge Distillation via Route Constrained Optimization. Jin, Xiao et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
- A Comprehensive Overhaul of Feature Distillation. Heo, Byeongho et al. ICCV 2019
- Feature-map-level Online Adversarial Knowledge Distillation. ICLR 2020
- Distilling Object Detectors with Fine-grained Feature Imitation. ICLR 2020
- Knowledge Squeezed Adversarial Network Compression. Shu, Changyong et al. AAAI 2020
- Stagewise Knowledge Distillation. Kulkarni, Akshay et al. arXiv:1911.06786
- Knowledge Distillation from Internal Representations. arXiv:1910.03723
- Knowledge Flow: Improve Upon Your Teachers. ICLR 2019
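As a concrete example of knowledge taken from intermediate layers, the activation-based attention transfer loss of Zagoruyko et al. can be sketched in plain Python. Each feature map is collapsed over channels into a spatial attention map Σ_c |A_c|^p (p = 2 here). Both maps are then L2-normalized, and the loss is the L2 distance between them. This is a minimal sketch under several assumptions: feature maps are lists of channels, each a flattened H×W list, and the helper names are hypothetical. The paper also adds this term to the task loss with a weight β for each chosen layer pair, and that step is omitted here.

```python
import math


def attention_map(feature_map, p=2):
    """Collapse a C x (H*W) activation map into one spatial map: sum over c of |A_c|^p."""
    spatial = len(feature_map[0])
    return [sum(abs(channel[i]) ** p for channel in feature_map) for i in range(spatial)]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against an all-zero map)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def attention_transfer_loss(student_fm, teacher_fm, p=2):
    """L2 distance between normalized attention maps of one student/teacher layer pair.

    Channel counts may differ; only the spatial size H*W must match.
    """
    q_s = l2_normalize(attention_map(student_fm, p))
    q_t = l2_normalize(attention_map(teacher_fm, p))
    if len(q_s) != len(q_t):
        raise ValueError("student and teacher maps need the same spatial size")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q_s, q_t)))
```

The channel dimension is summed away, so a thin student can be matched against a wide teacher without an extra projection layer. FitNets-style hints, by contrast, need a learned regressor to bridge the width mismatch.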
Graph-based
- Graph-based Knowledge Distillation by Multi-head Attention Network. Lee, Seunghyun and Song, Byung Cheol. arXiv:1907.02226
- Graph Representation Learning via Multi-task Knowledge Distillation. arXiv:1911.05700
- Deep geometric knowledge distillation with graphs. arXiv:1911.03080
Mutual Information
- Correlation Congruence for Knowledge Distillation. Peng, Baoyun et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. arXiv:1910.10699
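The similarity-preserving idea of Tung and Mori, listed above, can be sketched as follows. The sketch assumes each mini-batch is given as b flattened activation vectors per network, and the helper names are hypothetical. For each network, the b × b matrix of pairwise inner products is row-normalized. The loss is the squared Frobenius distance between the student's and teacher's matrices, divided by b². Only these b × b matrices are compared, so the student and teacher feature dimensions do not need to match.

```python
import math


def normalized_similarity(batch):
    """b flattened activation vectors -> b x b inner-product matrix with L2-normalized rows."""
    gram = [[sum(x * y for x, y in zip(u, v)) for v in batch] for u in batch]
    result = []
    for row in gram:
        norm = math.sqrt(sum(g * g for g in row))
        result.append([g / norm if norm > 0 else 0.0 for g in row])
    return result


def sp_loss(student_batch, teacher_batch):
    """Squared Frobenius distance between the two normalized similarity matrices, over b^2."""
    b = len(student_batch)
    g_s = normalized_similarity(student_batch)
    g_t = normalized_similarity(teacher_batch)
    return sum((t - s) ** 2
               for row_t, row_s in zip(g_t, g_s)
               for t, s in zip(row_t, row_s)) / b ** 2
```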
Self-KD
- Moonshine: Distilling with Cheap Convolutions. Crowley, Elliot J. et al. NIPS 2018
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. Zhang, Linfeng et al. ICCV 2019
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Clark, Kevin et al. ACL 2019 (short paper)
- Self-Knowledge Distillation in Natural Language Processing. Hahn, Sangchul and Choi, Heeyoul. arXiv:1908.01851
- Rethinking Data Augmentation: Self-Supervision and Self-Distillation. Lee, Hankook et al. ICLR 2020
- Regularizing Predictions via Class-wise Self-knowledge Distillation. ICLR 2020
- MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks. arXiv:1911.09418
Structured Knowledge
- Paraphrasing Complex Network: Network Compression via Factor Transfer. Kim, Jangho et al. NIPS 2018
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. arXiv:1910.10699
- Teaching To Teach By Structured Dark Knowledge. ICLR 2020
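Relational Knowledge Distillation (Park et al.), listed above, transfers the structure among examples rather than individual outputs. Its distance-wise term can be sketched as follows. Pairwise Euclidean distances within a batch are divided by their mean, and a Huber loss penalizes differences between the student's and teacher's normalized distances, summed over pairs. This sketch rests on several assumptions:

- embeddings are plain lists of floats;
- pairs are taken as unordered (i < j);
- δ = 1 for the Huber loss;
- the angle-wise term of RKD is omitted;
- the function names are hypothetical.

`math.dist` requires Python 3.8 or later.

```python
import itertools
import math


def normalized_distances(embeddings):
    """Pairwise Euclidean distances divided by their batch mean (needs >= 2 distinct points)."""
    pairs = itertools.combinations(range(len(embeddings)), 2)
    dists = [math.dist(embeddings[i], embeddings[j]) for i, j in pairs]
    mu = sum(dists) / len(dists)  # mean pairwise distance in this batch
    return [d / mu for d in dists]


def huber(x, delta=1.0):
    """Quadratic near zero, linear in the tails."""
    return 0.5 * x * x if abs(x) <= delta else delta * (abs(x) - 0.5 * delta)


def rkd_distance_loss(student_emb, teacher_emb):
    """Sum of Huber penalties on the gaps between normalized pairwise distances."""
    psi_s = normalized_distances(student_emb)
    psi_t = normalized_distances(teacher_emb)
    return sum(huber(t - s) for t, s in zip(psi_t, psi_s))
```

Dividing by the mean distance makes the term insensitive to the overall scale of each embedding space. A student whose embeddings are a uniformly scaled copy of the teacher's therefore incurs zero loss.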
Privileged Information
- Learning using privileged information: similarity control and knowledge transfer. Vapnik, Vladimir and Izmailov, Rauf. JMLR 2015
- Unifying distillation and privileged information. Lopez-Paz, David et al. ICLR 2016
- Model compression via distillation and quantization. Polino, Antonio et al. ICLR 2018
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
- [novel] Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Retaining privileged information for multi-task learning. Tang, Fengyi et al. KDD 2019
- A Generalized Meta-loss function for regression and classification using privileged information. Asif, Amina et al. arXiv:1811.06885
KD + GAN
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks. Xu, Zheng et al. arXiv:1709.00513
- KTAN: Knowledge Transfer Adversarial Network. Liu, Peiye et al. arXiv:1810.08126
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
- Adversarial Learning of Portable Student Networks. Wang, Yunhe et al. AAAI 2018
- Adversarial Network Compression. Belagiannis, Vasileios et al. ECCV 2018
- Cross-Modality Distillation: A case for Conditional Generative Adversarial Networks. ICASSP 2018
- Adversarial Distillation for Efficient Recommendation with External Knowledge. TOIS 2018
- Training student networks for acceleration with conditional adversarial networks. Xu, Zheng et al. BMVC 2018
- [novel] DAFL: Data-Free Learning of Student Networks. Chen, Hanting et al. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary. Heo, Byeongho et al. AAAI 2019
- Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection. Liu, Jian et al. AAAI 2019
- Adversarially Robust Distillation. Goldblum, Micah et al. arXiv:1905.09747
- GAN-Knowledge Distillation for one-stage Object Detection. Hong, Wei et al. arXiv:1906.08467
- Lifelong GAN: Continual Learning for Conditional Image Generation. Kundu et al. arXiv:1908.03884
- Compressing GANs using Knowledge Distillation. Aguinaldo, Angeline et al. arXiv:1902.00159
- Feature-map-level Online Adversarial Knowledge Distillation. ICLR 2020
KD + Meta-learning
- Few Sample Knowledge Distillation for Efficient Network Compression. Li, Tianhong et al. arXiv:1812.01839
- Learning What and Where to Transfer. Jang, Yunhun et al. ICML 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
- Few-Shot Image Recognition with Knowledge Transfer. Peng, Zhimao et al. ICCV 2019
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation. arXiv:1911.05329v1
- Progressive Knowledge Distillation For Generative Modeling. ICLR 2020
- Few Shot Network Compression via Cross Distillation. AAAI 2020
Data-free KD
- Data-Free Knowledge Distillation for Deep Neural Networks. NIPS 2017
- Zero-Shot Knowledge Distillation in Deep Networks. ICML 2019
- DAFL: Data-Free Learning of Student Networks. ICCV 2019
- Zero-shot Knowledge Transfer via Adversarial Belief Matching. Micaelli, Paul and Storkey, Amos. NIPS 2019
- Other data-free model compression:
  - Data-free Parameter Pruning for Deep Neural Networks. Srinivas, Suraj et al. arXiv:1507.06149
  - Data-Free Quantization Through Weight Equalization and Bias Correction. Nagel, Markus et al. ICCV 2019
KD + AutoML
- Improving Neural Architecture Search Image Classifiers via Ensemble Learning. Macko, Vladimir et al. 2019
KD + RL
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Knowledge Flow: Improve Upon Your Teachers. Liu, Iou-jen et al. ICLR 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
Multi-teacher KD
- Learning from Multiple Teacher Networks. You, Shan et al. KDD 2017
- Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data. ICLR 2017
- Knowledge Adaptation: Teaching to Adapt. arXiv:1702.02052
- Deep Model Compression: Distilling Knowledge from Noisy Teachers. Sau, Bharat Bhusan et al. arXiv:1610.09650v2
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Tarvainen, Antti and Valpola, Harri. NIPS 2017
- Born-Again Neural Networks. Furlanello, Tommaso et al. ICML 2018
- Deep Mutual Learning. Zhang, Ying et al. CVPR 2018
- Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
- Multilingual Neural Machine Translation with Knowledge Distillation. ICLR 2019
- Unifying Heterogeneous Classifiers with Distillation. Vongkulbhisal et al. CVPR 2019
- Distilled Person Re-Identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Yang, Ze et al. WSDM 2020
- FEED: Feature-level Ensemble for Knowledge Distillation. Park, SeongUk and Kwak, Nojun. arXiv:1909.10754 (AAAI 2020 preprint)
- Stochasticity and Skip Connection Improve Knowledge Transfer. Lee, Kwangjin et al. ICLR 2020
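A common baseline for combining several teachers is to average their temperature-softened output distributions and train the student against that average. The multi-teacher methods listed above differ mainly in how they weight, select, or update teachers. The sketch below uses uniform weights unless `weights` is given, in which case the weights should sum to 1. The temperature of 4 is an assumed example value, and all names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax of a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def averaged_teacher_targets(teacher_logits_list, temperature=4.0, weights=None):
    """Weighted average of the teachers' softened distributions (uniform by default)."""
    n = len(teacher_logits_list)
    weights = weights or [1.0 / n] * n  # caller-supplied weights should sum to 1
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    num_classes = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(num_classes)]


def soft_cross_entropy(target, student_logits, temperature=4.0):
    """Cross-entropy of the student's softened prediction against a soft target."""
    p_s = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(target, p_s))


teachers = [[5.0, 1.0, 0.0], [3.0, 2.5, 0.0]]
target = averaged_teacher_targets(teachers)
print(soft_cross_entropy(target, [2.0, 1.0, 0.5]))
```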
Cross-modal KD
- Cross Modal Distillation for Supervision Transfer. Gupta, Saurabh et al. CVPR 2016
- Emotion recognition in speech using cross-modal transfer in the wild. Albanie, Samuel et al. ACM MM 2018
- Through-Wall Human Pose Estimation Using Radio Signals. Zhao, Mingmin et al. CVPR 2018
- Compact Trilinear Interaction for Visual Question Answering. Do, Tuong et al. ICCV 2019
- Cross-Modal Knowledge Distillation for Action Recognition. Thoker, Fida Mohammad and Gall, Juergen. ICIP 2019
- Learning to Map Nearly Anything. Salem, Tawfiq et al. arXiv:1909.06928
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation. Kundu et al. ICCV 2019
- CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency. Chen, Yun-Chun et al. CVPR 2019
- XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings. ICLR 2020
- Effective Domain Knowledge Transfer with Soft Fine-tuning. Zhao, Zhichen et al. arXiv:1909.02236
Application of KD
- Face model compression by distilling knowledge from neurons. Luo, Ping et al. AAAI 2016
- Learning efficient object detection models with knowledge distillation. Chen, Guobin et al. NIPS 2017
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. Mishra, Asit et al. NIPS 2018
- Distilled Person Re-identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- [novel] Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Fast Human Pose Estimation. Zhang, Feng et al. CVPR 2019
- Distilling knowledge from a deep pose regressor network. Saputra et al. arXiv:1908.00858 (2019)
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- Structured Knowledge Distillation for Semantic Segmentation. Liu, Yifan et al. CVPR 2019
- Relation Distillation Networks for Video Object Detection. Deng, Jiajun et al. ICCV 2019
- Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection. Dong, Xuanyi and Yang, Yi. ICCV 2019
- Progressive Teacher-student Learning for Early Action Prediction. Wang, Xionghui et al. CVPR 2019
- Lightweight Image Super-Resolution with Information Multi-distillation Network. Hui, Zheng et al. ICCVW 2019
- AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation. Tavakolian, Mohammad et al. ICCV 2019
- Dynamic Kernel Distillation for Efficient Pose Estimation in Videos. Nie, Xuecheng et al. ICCV 2019
- Teacher Guided Architecture Search. Bashivan, Pouya and Tensen, Mark. ICCV 2019
- Online Model Distillation for Efficient Video Inference. Mullapudi et al. ICCV 2019
- Distilling Object Detectors with Fine-grained Feature Imitation. Wang, Tao et al. CVPR 2019
- Knowledge Distillation for Incremental Learning in Semantic Segmentation. arXiv:1911.03462
- MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization. arXiv:1910.12295
- Teacher-Students Knowledge Distillation for Siamese Trackers. arXiv:1907.10586
Application of KD for NLP
- Patient Knowledge Distillation for BERT Model Compression. Sun, Siqi et al. arXiv:1908.09355
- TinyBERT: Distilling BERT for Natural Language Understanding. Jiao, Xiaoqi et al. arXiv:1909.10351
- Learning to Specialize with Knowledge Distillation for Visual Question Answering. NIPS 2018
- Knowledge Distillation for Bilingual Dictionary Induction. EMNLP 2017
- A Teacher-Student Framework for Maintainable Dialog Manager. EMNLP 2018
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Sanh, Victor et al. arXiv:1910.01108
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Turc, Iulia et al. arXiv:1908.08962
- On Knowledge Distillation from Complex Networks for Response Prediction. Arora, Siddhartha et al. NAACL 2019
- Distilling the Knowledge of BERT for Text Generation. arXiv:1911.03829v1
- Understanding Knowledge Distillation in Non-autoregressive Machine Translation. arXiv:1911.02727
- MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer. ICLR 2020
Model Pruning
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Slimmable Neural Networks. Yu, Jiahui et al. ICLR 2019
- Co-Evolutionary Compression for Unpaired Image Translation. Shu, Han et al. ICCV 2019
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning. Liu, Zechun et al. ICCV 2019
- LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning. ICLR 2020
- Pruning with hints: an efficient framework for model acceleration. ICLR 2020
- Training convolutional neural networks with cheap convolutions and online distillation. arXiv:1909.13063
- Cooperative Pruning in Cross-Domain Deep Neural Network Compression. Chen, Shangyu et al. IJCAI 2019
Beyond
- Do deep nets really need to be deep? Ba, Jimmy and Caruana, Rich. NIPS 2014
- When Does Label Smoothing Help? Müller, Rafael, Kornblith, Simon and Hinton, Geoffrey. NIPS 2019
- Towards Understanding Knowledge Distillation. Phuong, Mary and Lampert, Christoph. ICML 2019
- Harnessing deep neural networks with logic rules. ACL 2016
- Adaptive Regularization of Labels. Ding, Qianggang et al. arXiv:1908.05474
- Knowledge Isomorphism between Neural Networks. Liang, Ruofan et al. arXiv:1908.01581
- Role-Wise Data Augmentation for Knowledge Distillation. ICLR 2020
- Neural Network Distiller: A Python Package For DNN Compression Research. arXiv:1910.12232
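Label smoothing, the subject of Müller et al. above, replaces the one-hot target y with (1 − ε)·y + ε/K for K classes. Read as distillation, this amounts to training against a fixed mixture of the true label and a uniform "teacher". The teacher-free framework of Yuan et al. (listed under knowledge from logits) draws on this connection. The sketch below assumes ε = 0.1 and uses hypothetical helper names.

```python
import math


def smooth_labels(label, num_classes, epsilon=0.1):
    """Mix the one-hot target for `label` with a uniform distribution over classes."""
    return [(1.0 - epsilon) * (1.0 if k == label else 0.0) + epsilon / num_classes
            for k in range(num_classes)]


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits), via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))


target = smooth_labels(label=0, num_classes=4)
print(cross_entropy(target, [3.0, 0.5, 0.2, -1.0]))
```

Müller et al. also show that a teacher trained with label smoothing makes distillation into a student much less effective. This is one reason the two techniques are usually discussed together.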
Note: PDFs of all the papers above can be found and downloaded via Bing or Google.
Contact: Yuang Liu, AIDA, ECNU.