Model Compression

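Network Pruning

Prune away the parameters in the network that are not useful.

An interesting figure: the number of connections first increases and then decreases.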

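Train a large model.
Evaluate importance:
1. Parameter importance (the parameter is the pruning unit)
  1. e.g. by the absolute value of the weight
2. Neuron importance (the neuron is the pruning unit)
  1. e.g. by whether the neuron's output is zero
Prune the unimportant ones.
Fine-tune the smaller model, then repeat.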
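
The evaluate-and-prune step above can be sketched for the simplest case, where a parameter's importance is its absolute value. This is a minimal illustration in plain Python, not a training pipeline: `magnitude_prune` and `prune_ratio` are hypothetical names introduced here, and a real workflow would fine-tune the model after each pruning round.

```python
def magnitude_prune(weights, prune_ratio):
    """Zero out the fraction `prune_ratio` of weights with the smallest |w|.

    Returns (pruned_weights, mask); mask[i] is 1 for a kept weight, 0 for a pruned one.
    """
    n_prune = int(len(weights) * prune_ratio)
    # Indices sorted by importance (absolute value), least important first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


# Hypothetical weights of one small layer, flattened into a list.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned, mask = magnitude_prune(weights, prune_ratio=0.5)
print(pruned)
print(mask)
```

The list keeps its original length: pruned entries are only set to zero, so this form of pruning does not by itself make the stored network smaller.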

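Weight pruning

The network's shape becomes irregular, which makes the model hard to implement and hard to accelerate on a GPU. The pruned weights can be filled with zeros to keep the shape regular, but then the network does not actually get smaller.

Neuron pruning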

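Why take the roundabout route instead of training a small model directly?

Because small networks are hard to train. Why is that?

One explanation is the Lottery Ticket Hypothesis.
This can be understood as increasing the number of trials (the sample size): in a large enough pool of candidates, some will turn out well. A large model contains many small sub-networks, and each one is a candidate.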

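Knowledge Distillation

The Student Net is trained to fit the output of the Teacher Net.

Temperature softmax

This uses the idea of smoothing: the logits are divided by a temperature T > 1 before the softmax, so the output distribution becomes less peaked.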
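
A minimal sketch of a softmax with temperature, in plain Python. The function name `softmax_with_temperature` and the example logits are illustrative assumptions, not taken from a specific library; the point is only to show how a larger temperature flattens the distribution that the Student Net would be trained to fit.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; a larger temperature gives a flatter output."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical Teacher Net logits for three classes.
teacher_logits = [100.0, 10.0, 1.0]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
smooth = softmax_with_temperature(teacher_logits, temperature=100.0)
print(sharp)
print(smooth)
```

With temperature 1 almost all probability mass sits on the first class, so the output is close to a one-hot label. With temperature 100 the ordering of the classes is preserved but the differences between them are much smaller.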

Parameter Quantization

Mixed precision
Weight clustering
Represent frequently occurring parameters with fewer bits
- e.g. Huffman encoding
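
To illustrate the last item, the sketch below computes Huffman code lengths for a list of cluster indices and compares the total size with a fixed-length code. It assumes the weights have already been clustered; the `cluster_ids` list is made-up example data and `huffman_code_lengths` is a hypothetical helper written for this note.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Made-up cluster indices after weight clustering: cluster 0 is the most common.
cluster_ids = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(cluster_ids)
huffman_bits = sum(lengths[s] for s in cluster_ids)
fixed_bits = 2 * len(cluster_ids)  # 4 clusters need 2 bits each with a fixed-length code
print(lengths, huffman_bits, fixed_bits)
```

The most frequent cluster gets the shortest code, so the 16 indices take 28 bits instead of 32 in this example. The saving grows as the distribution of cluster indices becomes more skewed.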

Architecture Design

1 Depthwise Convolution

Filter number = Input channel number
Each filter only considers one channel.
The filters are \(k \times k\) matrices.
There is no interaction between channels.

2 Pointwise Convolution

Used specifically to combine information across channels.

The filters must be \(1 \times 1\).

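Change in the number of parameters (depthwise plus pointwise, relative to a standard convolution):

\[\frac{k \times k \times I + I \times O}{k \times k \times I \times O} = \frac{1}{O} + \frac{1}{k \times k}\]

I: number of input channels

O: number of output channels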
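
As a quick numerical check of the ratio above, the sketch below counts weights for one example layer. The values k = 3, I = 64, O = 128 are arbitrary choices for illustration, biases are ignored, and the function names are hypothetical.

```python
def standard_conv_params(k, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * in_channels * out_channels


def depthwise_separable_params(k, in_channels, out_channels):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) convolution."""
    depthwise = k * k * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


k, I, O = 3, 64, 128
standard = standard_conv_params(k, I, O)
separable = depthwise_separable_params(k, I, O)
ratio = separable / standard
print(standard, separable, ratio)
```

For these values the standard convolution has 73728 weights and the depthwise plus pointwise pair has 8768, a ratio of about 0.119. Since O is usually large, the ratio is dominated by the 1/(k × k) term.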

Principle (why it works)

Low-rank approximation
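
A short sketch of the general idea, using symbols M, N and K that are introduced here for illustration and do not appear in the notes above. Take a linear layer with N inputs and M outputs, and insert a linear layer with K units in between:

\[W \approx UV, \qquad W \in \mathbb{R}^{M \times N},\quad U \in \mathbb{R}^{M \times K},\quad V \in \mathbb{R}^{K \times N}\]

The product UV has rank at most K, and the number of parameters changes as

\[M \times N \;\longrightarrow\; K \times (M + N)\]

which is a reduction whenever \(K < \frac{M \times N}{M + N}\). The price is that the factorized layer can only represent matrices of rank at most K. Depthwise followed by pointwise convolution can be read in the same spirit: one large layer is replaced by two smaller ones applied in sequence.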

Dynamic Computation

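Allocate computation according to the resources available.

Methods:

Dynamic depth: attach an output to each layer of the model and train them together, then choose which layer's output to use at inference time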
Multi-Scale Dense Network (MSDNet)
Dynamic width
Computation based on Sample Difficulty
- SkipNet: Learning Dynamic Routing in Convolutional Networks
- Runtime Neural Pruning
- BlockDrop: Dynamic Inference Paths in Residual Networks
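
One simple way to combine dynamic depth with sample difficulty is to stop at the first exit that is confident enough. The sketch below is an illustration under that assumption and is not the method of any paper listed above; `early_exit_predict`, `toy_block`, `toy_exit_head` and the 0.9 threshold are all hypothetical, and the toy functions stand in for real network layers.

```python
import math


def early_exit_predict(x, blocks, exit_heads, confidence_threshold=0.9):
    """Run blocks in order and stop at the first sufficiently confident exit.

    Returns (index of the exit used, predicted class).
    """
    if not blocks or len(blocks) != len(exit_heads):
        raise ValueError("need one exit head per block, and at least one block")
    h = x
    for depth, (block, exit_head) in enumerate(zip(blocks, exit_heads)):
        h = block(h)
        probs = exit_head(h)
        best = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best] >= confidence_threshold:
            break  # confident enough: the deeper blocks are never computed
    # If no exit was confident, this is the result of the deepest exit.
    return depth, best


def toy_block(h):
    # Stand-in for a network block: doubles a scalar feature.
    return 2.0 * h


def toy_exit_head(h):
    # Stand-in for a classifier head: two class probabilities from a sigmoid.
    p = 1.0 / (1.0 + math.exp(-h))
    return [p, 1.0 - p]


blocks = [toy_block] * 3
exit_heads = [toy_exit_head] * 3
easy = early_exit_predict(2.0, blocks, exit_heads)
hard = early_exit_predict(0.3, blocks, exit_heads)
print(easy, hard)
```

In this toy setup the easy input leaves at the first exit, while the hard input needs all three blocks before any exit is confident. The same loop could use a depth limit instead of a confidence threshold if the constraint is the available compute rather than the difficulty of the sample.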
