PyTorch Basics

PyTorch Lecture 05: Linear Regression in the PyTorch Way
Logistic Regression - Binary Classification
Lecture 07: How to make a neural network wide and deep?
Lecture 08: Pytorch DataLoader
Lecture 09: softmax Classifier
- part one
- part two : real problem - MNIST input
Lecture 10 : basic CNN
Lecture 11 Advanced CNN
Lecture 12: RNN

Course video: https://www.youtube.com/watch?v=ogZi5oIo4fI
Youdao Cloud notes: http://note.youdao.com/noteshare?id=d86bd8fc60cb4fe87005a2d2e2d5b70d&sub=6911732F9FA44C68AD53A09072155ED3

PyTorch Lecture 05: Linear Regression in the PyTorch Way

Part 1: build your model as a class; you need to write the forward function

import torch

from torch.autograd import Variable

import matplotlib.pyplot as plt

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))

y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))

class Model(torch.nn.Module):

    def __init__(self):

        """

        In the constructor we instantiate one nn.Linear module

        """

        super(Model, self).__init__()

        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):

        """

        In the forward function we accept a Variable of input data and we must return

        a Variable of output data. We can use Modules defined in the constructor as

        well as arbitrary operators on Variables.

        """

        y_pred = self.linear(x)

        return y_pred

# our model

model = Model()

Part 2: construct the loss and the optimizer used to compute the parameters



# Construct our loss function and an Optimizer. The call to model.parameters()

# in the SGD constructor will contain the learnable parameters of the

# nn.Linear module which is a member of the model.

# criterion: the criterion used to compute the loss

criterion = torch.nn.MSELoss(reduction='sum')  # replaces the deprecated size_average=False

# Optimizer

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Part 3: train: forward -> backward -> update parameters

# Training loop

for epoch in range(1000):

    # Forward pass: Compute predicted y by passing x to the model

    y_pred = model(x_data)

    # Compute and print loss

    loss = criterion(y_pred, y_data)

    print(epoch, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.

    # initialize the gradients

    optimizer.zero_grad()

    # Backward pass

    loss.backward()

    # Update the weights held by the optimizer, i.e. model.parameters()

    optimizer.step()

Part 4: test

# After training

hour_var = Variable(torch.Tensor([[4.0]]))

y_pred = model(hour_var)

print("predict (after training)", 4, y_pred.item())

To summarize, the basic training framework:

Build your model by writing a class
Construct the loss and the optimizer
Train: Forward -> compute loss -> backward -> update

Forward: y_pred = model(x_data)
Compute loss: loss = criterion(y_pred, y_data)
Backward: optimizer.zero_grad(), then loss.backward()
Update: optimizer.step()
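
The four steps above can be sketched without PyTorch to make visible what each call does. This is a minimal illustration, not the lecture's code: the gradients are derived by hand where loss.backward() would use autograd, and the bias b, the helper forward and the variable names are assumptions made for the example. The data, the learning rate and the summed squared-error loss mirror the notes.

```python
# Plain-Python sketch of the loop: forward -> compute loss -> backward -> update.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w, b = 0.0, 0.0   # the parameters an optimizer would receive from model.parameters()
lr = 0.01         # learning rate, same value as in the notes


def forward(x):
    """Linear model y = x * w + b."""
    return x * w + b


for epoch in range(1000):
    # Forward: predictions for every sample
    y_pred = [forward(x) for x in x_data]
    # Compute loss: summed squared error (MSELoss with reduction='sum')
    loss = sum((p - y) ** 2 for p, y in zip(y_pred, y_data))
    # Backward: gradients of the loss with respect to w and b, derived by hand
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(y_pred, y_data, x_data))
    grad_b = sum(2 * (p - y) for p, y in zip(y_pred, y_data))
    # Update: one plain gradient-descent step
    w -= lr * grad_w
    b -= lr * grad_b

print("w =", round(w, 3), "b =", round(b, 3), "prediction for 4:", round(forward(4.0), 3))
```

Because the gradients are recomputed from scratch in every iteration, there is nothing to reset here; in PyTorch gradients accumulate across backward calls, which is why optimizer.zero_grad() is needed.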

Homework: try other optimizers:

torch.optim.Adagrad
torch.optim.Adam
torch.optim.Adamax
torch.optim.ASGD
torch.optim.LBFGS
torch.optim.RMSprop
torch.optim.Rprop
torch.optim.SGD

Logistic Regression - Binary Classification

Previously:

graph LR

x-->Linear

Linear-->y

\hat{y} = x * w + b

loss = \frac{1}{N}\sum_{n=1}^{N}(\hat{y_n}-y_n)^2

Activation function:

Using the sigmoid function:

graph LR

x --> Linear

Linear --> Sigmoid

Sigmoid --> y

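The output y lies between 0 and 1. Squashing the value into this range lets it be read as a probability and keeps the computation simple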

\sigma(z) = \frac{1}{1+e^{-z}}

\hat{y} = \sigma(x*w+b)

loss = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n\log\hat{y_n} + (1-y_n)\log(1-\hat{y_n})\right]
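
The two formulas above can be checked numerically with a short standard-library sketch. The parameter values w = 1.0 and b = -2.5 are hypothetical, chosen only so that the decision boundary falls between x = 2 and x = 3; the data are the same five points used in the code of this section.

```python
import math


def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(y_hat, y):
    """Mean binary cross-entropy over N samples."""
    n = len(y)
    total = sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(y_hat, y))
    return -total / n


w, b = 1.0, -2.5  # hypothetical parameters, not trained values
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [0.0, 0.0, 1.0, 1.0, 1.0]

y_hat = [sigmoid(x * w + b) for x in x_data]   # y_hat = sigma(x * w + b)
loss = bce_loss(y_hat, y_data)
predicted = [p > 0.5 for p in y_hat]           # threshold at 0.5 for the class

print(y_hat, loss, predicted)
```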

Code:

import torch

from torch.autograd import Variable

import torch.nn.functional as F

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0], [4.0],[5.0]]))

y_data = Variable(torch.Tensor([[0.], [0.], [1.], [1.],[1.]]))

class Model(torch.nn.Module):

    def __init__(self):

        """

        In the constructor we instantiate nn.Linear module

        """

        super(Model, self).__init__()

        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):

        """

        In the forward function we accept a Variable of input data and we must return

        a Variable of output data.

        """

        y_pred = torch.sigmoid(self.linear(x))  # F.sigmoid is deprecated

        return y_pred

# our model

model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()

# in the SGD constructor will contain the learnable parameters of the

# nn.Linear module which is a member of the model.

criterion = torch.nn.BCELoss(reduction='mean')  # replaces the deprecated size_average=True

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop

for epoch in range(400):

    # Forward pass: Compute predicted y by passing x to the model

    y_pred = model(x_data)

    # Compute and print loss

    loss = criterion(y_pred, y_data)

    print(epoch, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

# After training

hour_var = Variable(torch.Tensor([[0.0]]))

print("predict 0 hours", 0.0, model(hour_var).item() > 0.5)

hour_var = Variable(torch.Tensor([[7.0]]))

print("predict 7 hours", 7.0, model(hour_var).item() > 0.5)

What changes when the activation function is added:

Design your model using class

y_pred = torch.sigmoid(self.linear(x))

Construct loss and optimizer

change loss into:

criterion = torch.nn.BCELoss(reduction='mean')

Training cycle (forward, backward, update)

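Homework: try other activation functions:

ReLU

ReLU is short for Rectified Linear Unit. It has been widely used in deep learning in recent years and alleviates the vanishing-gradient problem, because its derivative is either 1 or 0. Compared with sigmoid and tanh, its gradient is trivial to compute and the function itself is cheap, which can greatly speed up the convergence of stochastic gradient descent (ReLU is piecewise linear and does not saturate for positive inputs, whereas sigmoid and tanh saturate at both ends). The drawback is that ReLU units are fragile: neurons can die during training. For example, after a very large gradient flows through a ReLU unit, the weights may be updated so that no data point can ever activate it again. From that point on the gradient flowing through the neuron is always 0, so the neuron has died irreversibly.

ReLU6
ELU

For positive inputs ELU returns x itself, which alleviates the vanishing-gradient problem (the derivative is 1 everywhere for x > 0), much like ReLU and Leaky ReLU. For negative inputs ELU saturates softly as the input becomes more negative, which improves robustness to noise.

SELU
PReLU
LeakyReLU

Leaky ReLU is mainly meant to avoid vanishing gradients: when the neuron is inactive it still allows a small non-zero gradient, so the gradient does not vanish and convergence is fast. Its other advantages and drawbacks are similar to those of ReLU.

Threshold
Hardtanh
Sigmoid

This is probably the best-known activation function in neural networks. It squashes a real number into the range 0 to 1: a very large input gives a result close to 1, and a very large negative input gives a result close to 0. It was used heavily in early neural networks because it models well whether a neuron fires and passes its signal on after being stimulated (0: hardly activated, 1: fully activated). In recent deep learning work it appears less often, because sigmoid is prone to vanishing or saturating gradients. When a network has many layers and every layer uses sigmoid, the gradient vanishes: backpropagation multiplies by the derivative of sigmoid at every layer, and that derivative is at most 0.25, so the gradient keeps shrinking. For inputs with a large magnitude (for example 100, where the output is close to 1 and the gradient is close to 0) the unit saturates and the neuron behaves as if it were dead.

Tanh

The tanh function squashes its input into the range -1 to 1. Like sigmoid, it suffers from vanishing or saturating gradients.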
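
The claims about gradients above can be made concrete with a few lines of standard-library Python. This is an illustrative sketch: the function names and the default values slope=0.01 and alpha=1.0 are choices made for the example, and the "chain" below multiplies only the activation derivatives of a 10-layer stack while ignoring the weights, which is a simplification.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    # keeps a small non-zero slope for negative inputs
    return z if z > 0 else slope * z


def elu(z, alpha=1.0):
    # identity for positive inputs, soft saturation towards -alpha otherwise
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)      # at most 0.25, reached at z = 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


depth = 10
# Best case for sigmoid: every layer sits at z = 0, where the derivative peaks.
sigmoid_chain = sigmoid_grad(0.0) ** depth
# An active ReLU passes the gradient through unchanged.
relu_chain = relu_grad(1.0) ** depth

print("sigmoid chain:", sigmoid_chain, "relu chain:", relu_chain)
print("saturated sigmoid gradient at z = 100:", sigmoid_grad(100.0))
print("dead ReLU gradient at z = -3:", relu_grad(-3.0))
```

Even in the best case the product of ten sigmoid derivatives is 0.25 to the tenth power, which is the shrinking effect described for deep sigmoid networks.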

Lecture 07: How to make a neural network wide and deep?

graph LR

a-->Linear

b-->Linear

Linear-->Sigmoid

Sigmoid-->y

A network with multi-dimensional input and more layers; the changes are mainly made in the "Design your model using class" step

import torch

from torch.autograd import Variable

import numpy as np

xy = np.loadtxt('./data/diabetes.csv.gz', delimiter=',', dtype=np.float32)

x_data = Variable(torch.from_numpy(xy[:, 0:-1]))

y_data = Variable(torch.from_numpy(xy[:, [-1]]))

print(x_data.data.shape)

print(y_data.data.shape)

class Model(torch.nn.Module):

    def __init__(self):

        """

        In the constructor we instantiate three nn.Linear modules

        """

        super(Model, self).__init__()

        self.l1 = torch.nn.Linear(8, 6)

        self.l2 = torch.nn.Linear(6, 4)

        self.l3 = torch.nn.Linear(4, 1)

        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):

        """

        In the forward function we accept a Variable of input data and we must return

        a Variable of output data. We can use Modules defined in the constructor as

        well as arbitrary operators on Variables.

        """

        out1 = self.sigmoid(self.l1(x))

        out2 = self.sigmoid(self.l2(out1))

        y_pred = self.sigmoid(self.l3(out2))

        return y_pred

# our model

model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()

# in the SGD constructor will contain the learnable parameters of the three

# nn.Linear modules which are members of the model.

criterion = torch.nn.BCELoss(reduction='mean')  # 'elementwise_mean' was the old name of 'mean'

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop

for epoch in range(1200000):

    # Forward pass: Compute predicted y by passing x to the model

    y_pred = model(x_data)

    # Compute and print loss

    loss = criterion(y_pred, y_data)

    print(epoch, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

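Homework:

Train a deeper network with more than 10 layers
Observation: going deeper did not by itself improve the results
Change the activation function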

Lecture 08: Pytorch DataLoader

Constructing a Dataset involves three parts:

Inherit from Dataset

download, read data, etc.
return one item for the index
return the data length

Instantiate the dataset and use it in a DataLoader:

train_loader = DataLoader(dataset=dataset,

                          batch_size=1,

                          shuffle=True,

                          num_workers=1)
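
To show what the three parts are for, here is a standard-library sketch of the same protocol. It is not torch.utils.data: ToyDataset and iterate_batches are hypothetical names, the data are generated in memory, and the batching helper is only a simplified stand-in for what a DataLoader does with a dataset's __len__ and __getitem__.

```python
import random


class ToyDataset:
    """Hypothetical in-memory dataset following the three-part protocol."""

    def __init__(self):
        # 1. download / read the data (generated in memory for this sketch)
        self.x_data = [[float(i), float(i) * 2.0] for i in range(10)]
        self.y_data = [float(i % 2) for i in range(10)]

    def __getitem__(self, index):
        # 2. return one item for the index
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        # 3. return the data length
        return len(self.x_data)


def iterate_batches(dataset, batch_size, shuffle, seed=0):
    """Simplified stand-in for a DataLoader: yields lists of (x, y) items."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


dataset = ToyDataset()
batches = list(iterate_batches(dataset, batch_size=4, shuffle=True))
for i, batch in enumerate(batches):
    print(i, "batch size", len(batch))
```

The loader only ever calls len(dataset) and dataset[i], which is why those two methods plus the constructor are all a custom dataset needs.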

Code:

# References

# https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/pytorch_basics/main.py

# http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class

import torch

import numpy as np

from torch.autograd import Variable

from torch.utils.data import Dataset, DataLoader

class DiabetesDataset(Dataset):

    """ Diabetes dataset."""

    # Initialize your data, download, etc.

    def __init__(self):

        xy = np.loadtxt('./data/diabetes.csv.gz',

                        delimiter=',', dtype=np.float32)

        self.len = xy.shape[0]

        self.x_data = torch.from_numpy(xy[:, 0:-1])

        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):

        return self.x_data[index], self.y_data[index]

    def __len__(self):

        return self.len

dataset = DiabetesDataset()

train_loader = DataLoader(dataset=dataset,

                          batch_size=1,

                          shuffle=True,

                          num_workers=1)

for epoch in range(2):

    for i, data in enumerate(train_loader, 0):

        # get the inputs

        inputs, labels = data

        # wrap them in Variable

        inputs, labels = Variable(inputs), Variable(labels)

        # Run your training process

        print(epoch, i, "inputs", inputs.data, "labels", labels.data)

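Homework:
Use another dataset, MNIST, with reference to the code on the official site:

To summarize the training workflow:

Build your own dataset class that inherits from Dataset

[ ] Read the dataset, np.loadtxt("datas.csv"), and build trainset and testset
[ ] Build the DataLoaders: trainLoader, testLoader
[ ] Get data from the DataLoader: dataiter = iter(trainloader); images, labels = next(dataiter)
[ ] Train
[ ] Test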

Lecture 09: softmax Classifier

part one

MNIST softmax

before:

graph LR

x{x} --> Linear

Linear --> Activation

Activation --> ...

... --> Linear2

Linear2-->Activation2

Activation2-->h{y}

now:

graph LR

x{x} --> Linear

Linear --> Activation

Activation --> ...

... --> Linear2

Linear2-->Activation2

Activation2-->P_y=0

Activation2-->P_y=1

Activation2-->....

Activation2-->P_y=9

what is softmax?



\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}} \quad \text{for } j=1,2,\dots,K

using softmax to get probabilities.

what is cross entropy?

loss = \frac{1}{N}\sum_i D(Softmax(wx_i+b),Y_i)

D(\hat{Y},Y) = -Ylog\hat{Y}
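
The softmax and cross-entropy formulas above can be evaluated by hand with the standard library. The sketch below is an illustration under stated assumptions: the helper names are made up for the example, subtracting the maximum logit is a common numerical-stability trick that does not change the result, and the logits are the single-sample values used in the code of this section with class 0 as the target.

```python
import math


def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k)."""
    m = max(z)                              # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_hat, y_one_hot):
    """D(Y_hat, Y) = -sum_j Y_j * log(Y_hat_j) for a one-hot target Y."""
    return -sum(t * math.log(p) for p, t in zip(y_hat, y_one_hot))


target = [1, 0, 0]                          # class 0, one-hot encoded

probs_good = softmax([2.0, 1.0, 0.1])       # largest logit on the correct class
probs_bad = softmax([0.5, 2.0, 0.3])        # largest logit on a wrong class

loss_good = cross_entropy(probs_good, target)
loss_bad = cross_entropy(probs_bad, target)

print("probabilities:", probs_good, "sum:", sum(probs_good))
print("loss_good:", loss_good, "loss_bad:", loss_bad)
```

The prediction that puts most probability on the correct class gets the smaller loss, which is the behaviour the loss is designed to have.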

The whole pipeline:

graph LR

x--LinearModel-->Z

Z--Softmax-->y'

y'--Cross_Entropy-->Y

Implementation in PyTorch:

loss = torch.nn.CrossEntropyLoss()
This includes both the Softmax and the Cross_Entropy step

graph LR

X--Softmax-->y'

y'--Cross_Entropy-->Y

Code:

import torch

import torch.nn as nn

import torch.nn.functional as F

import torch.optim as optim

from torchvision import datasets, transforms

from torch.autograd import Variable

# Cross entropy example

import numpy as np

# One hot

# 0: 1 0 0

# 1: 0 1 0

# 2: 0 0 1

Y = np.array([1, 0, 0])

Y_pred1 = np.array([0.7, 0.2, 0.1])

Y_pred2 = np.array([0.1, 0.3, 0.6])

print("loss1 = ", np.sum(-Y * np.log(Y_pred1)))

print("loss2 = ", np.sum(-Y * np.log(Y_pred2)))

################################################################################

# Softmax + CrossEntropy (logSoftmax + NLLLoss)

loss = nn.CrossEntropyLoss()

# target is of size nBatch

# each element in target has to have 0 <= value < nClasses (0-2)

# Input is class, not one-hot

Y = Variable(torch.LongTensor([0]), requires_grad=False)

# input is of size nBatch x nClasses = 1 x 3

# Y_pred are logits (not softmax)

Y_pred1 = Variable(torch.Tensor([[2.0, 1.0, 0.1]]))

Y_pred2 = Variable(torch.Tensor([[0.5, 2.0, 0.3]]))

l1 = loss(Y_pred1, Y)

l2 = loss(Y_pred2, Y)

print("PyTorch Loss1 = ", l1.data, "\nPyTorch Loss2=", l2.data)

print("Y_pred1=", torch.max(Y_pred1.data, 1)[1])

print("Y_pred2=", torch.max(Y_pred2.data, 1)[1])

################################################################################

"""Batch loss"""

# target is of size nBatch

# each element in target has to have 0 <= value < nClasses (0-2)

# Input is class, not one-hot

Y = Variable(torch.LongTensor([2, 0, 1]), requires_grad=False)

# input is of size nBatch x nClasses = 3 x 3

# Y_pred are logits (not softmax)

Y_pred1 = Variable(torch.Tensor([[0.1, 0.2, 0.9],

                                 [1.1, 0.1, 0.2],

                                 [0.2, 2.1, 0.1]]))

Y_pred2 = Variable(torch.Tensor([[0.8, 0.2, 0.3],

                                 [0.2, 0.3, 0.5],

                                 [0.2, 0.2, 0.5]]))

l1 = loss(Y_pred1, Y)

l2 = loss(Y_pred2, Y)

print("Batch Loss1 = ", l1.data, "\nBatch Loss2=", l2.data)

Homework: CrossEntropyLoss vs. NLLLoss?

part two : real problem - MNIST input

MNIST Network

graph LR

inputLayer -.-> HiddenLayer

HiddenLayer -.-> OutputLayer

Code:

# https://github.com/pytorch/examples/blob/master/mnist/main.py

from __future__ import print_function

import torch

import torch.nn as nn

import torch.nn.functional as F

import torch.optim as optim

from torchvision import datasets, transforms

from torch.autograd import Variable

# Training settings

batch_size = 16

# MNIST Dataset

train_dataset = datasets.MNIST(root='./mnist_data/',

                               train=True,

                               transform=transforms.ToTensor(),

                               download=True)

test_dataset = datasets.MNIST(root='./mnist_data/',

                              train=False,

                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,

                                           batch_size=batch_size,

                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,

                                          batch_size=batch_size,

                                          shuffle=False)

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        self.l1 = nn.Linear(784, 520)

        self.l2 = nn.Linear(520, 320)

        self.l3 = nn.Linear(320, 240)

        self.l4 = nn.Linear(240, 120)

        self.l5 = nn.Linear(120, 10)

    def forward(self, x):

        x = x.view(-1, 784)  # Flatten the data (n, 1, 28, 28)-> (n, 784)

        x = F.relu(self.l1(x))

        x = F.relu(self.l2(x))

        x = F.relu(self.l3(x))

        x = F.relu(self.l4(x))

        return self.l5(x)

model = Net()

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

def train(epoch):

    model.train()

    for batch_idx, (data, target) in enumerate(train_loader):

        data, target = Variable(data), Variable(target)

        optimizer.zero_grad()

        output = model(data)

        loss = criterion(output, target)

        loss.backward()

        optimizer.step()

        if batch_idx % 10 == 0:

            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(

                epoch, batch_idx * len(data), len(train_loader.dataset),

                100. * batch_idx / len(train_loader), loss.item()))

def test():

    model.eval()

    test_loss = 0

    correct = 0

    with torch.no_grad():  # replaces the removed volatile=True flag

        for data, target in test_loader:

            output = model(data)

            # sum up the per-batch mean losses

            test_loss += criterion(output, target).item()

            # get the index of the max

            pred = output.max(1, keepdim=True)[1]

            correct += pred.eq(target.view_as(pred)).sum().item()

    # criterion returns the mean over a batch, so average over the number of batches

    test_loss /= len(test_loader)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(

        test_loss, correct, len(test_loader.dataset),

        100. * correct / len(test_loader.dataset)))

for epoch in range(1, 10):

    train(epoch)

    test()

Homework:

Use DataLoader

Lecture 10 : basic CNN

Simple convolution layer

for Example:

graph LR

3*3*1_image-->2*2*1_filter_W

3*3*1_image-->1*1_Stride

3*3*1_image-->NoPadding

NoPadding-->2*2_featureMap

2*2*1_filter_W-->2*2_featureMap

1*1_Stride-->2*2_featureMap

How to compute multi-dimension pictures ?

32 * 32 * 3 image
5 * 5 * 3 filter W

w^T x + b

Get: 28 * 28 * 1 feature map * N (how many filters you used)

Formula:



OutputSize = \frac{(InputSize+PaddingSize*2-FilterSize)}{Stride} + 1
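
The formula can be wrapped in a small helper to check the examples from this lecture. The function name conv_output_size and its default arguments are choices made for this sketch; integer division is used on the assumption that positions where the filter does not fully fit are dropped.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """OutputSize = (InputSize + 2 * PaddingSize - FilterSize) / Stride + 1."""
    return (input_size + 2 * padding - filter_size) // stride + 1


# 3x3 image, 2x2 filter, stride 1, no padding -> 2x2 feature map
small = conv_output_size(3, 2)

# 32x32 image, 5x5 filter, stride 1, no padding -> 28x28 feature map
large = conv_output_size(32, 5)

# padding of 1 with a 3x3 filter keeps the size unchanged
same = conv_output_size(28, 3, padding=1)

# stride 2 roughly halves the size
strided = conv_output_size(32, 5, stride=2)

print(small, large, same, strided)
```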

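A few components that need explaining:

CONV

Convolutional layer; it is used together with an activation function
The output size follows from the filter size, padding and stride via the formula above

torch.nn.Conv2d(in_channels,out_channels,kernel_size)

Activation function

activation functions

Max Pooling

Takes the largest value inside each n*m window as the pooling result
Avg pooling works the same way with the average

nn.MaxPool2d(kernel_size)

Fully connected layer

self.fc = nn.Linear(320,10)

Difference between a CNN and a fully connected network

In a CNN a neuron is not connected to every pixel

In a fully connected network every neuron is connected to every pixel.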

Implementation of a simple CNN

graph TB

ConvolutionalLayer1 --> PoolingLayer1

PoolingLayer1 --> ConvolutionalLayer2

ConvolutionalLayer2 --> PoolingLayer2

PoolingLayer2 --> Fully-ConnectedLayer

Model:

class Net(nn.Module):

    def __init__(self):

        super(Net,self).__init__()

        self.conv1 = nn.Conv2d(1,10,kernel_size=5)

        self.conv2 = nn.Conv2d(10,20,kernel_size=5)

        self.mp = nn.MaxPool2d(2)

        self.fc = nn.Linear(???,10)

    def forward(self,x):

        in_size = x.size(0)

        x = F.relu(self.mp(self.conv1(x)))

        x = F.relu(self.mp(self.conv2(x)))

        x = x.view(in_size,-1) # flatten the tensor

        x = self.fc(x)

        return F.log_softmax(x, dim=1)

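How to fill in ???

You can first put an arbitrary number in place of ??? and then correct it from the error message
You can also print(x.size()) in the forward function to get the tensor's dimensions

Homework:

Try a deeper network and a deeper stack of fully connected layers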
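
A third option is to trace the shapes by hand with the output-size formula from earlier in this lecture. The sketch below assumes 28x28 single-channel MNIST images as input and a pooling stride equal to the kernel size; the helper names are made up for the example. The layer settings are those of the model above.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial size after max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


size = 28                    # assumed input: 28x28 MNIST image
size = conv_out(size, 5)     # conv1, kernel_size=5 -> 24
size = pool_out(size, 2)     # max pool 2          -> 12
size = conv_out(size, 5)     # conv2, kernel_size=5 -> 8
size = pool_out(size, 2)     # max pool 2          -> 4

channels = 20                # out_channels of conv2
flat_features = channels * size * size

print("spatial size:", size, "flattened features:", flat_features)
```

Under these assumptions the flattened size agrees with the nn.Linear(320,10) shown in the fully connected layer example earlier in this lecture.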

Lecture 11 Advanced CNN

Why 1*1 convolution?

Using 32 1*1 filters turns a 64-channel input into a 32-channel one.

Placing 1*1 filters in front of larger filters can significantly reduce the amount of computation.
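
A rough multiplication count shows where the saving comes from. All numbers in this sketch are hypothetical (a 28x28 feature map with 64 channels, a 5x5 convolution producing 32 channels, and a 16-channel 1*1 bottleneck); it counts only multiplications for stride-1 convolutions that keep the spatial size, and ignores biases and additions.

```python
def conv_mults(height, width, in_channels, out_channels, kernel_size):
    """Multiplications for a stride-1 convolution that keeps the spatial size."""
    return height * width * out_channels * kernel_size * kernel_size * in_channels


H, W = 28, 28  # hypothetical feature-map size

# Direct route: 5x5 convolution straight from 64 to 32 channels
direct = conv_mults(H, W, in_channels=64, out_channels=32, kernel_size=5)

# Bottleneck route: 1x1 convolution down to 16 channels, then the 5x5 convolution
reduce_step = conv_mults(H, W, in_channels=64, out_channels=16, kernel_size=1)
conv_step = conv_mults(H, W, in_channels=16, out_channels=32, kernel_size=5)
bottleneck = reduce_step + conv_step

print("direct:", direct)
print("with 1x1 bottleneck:", bottleneck)
print("ratio:", round(direct / bottleneck, 2))
```

The expensive 5x5 filters see 16 input channels instead of 64, and the extra 1*1 layer costs comparatively little.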

Inception Module

graph LR

Filter_concat_in --> 1*1Conv0_16

Filter_concat_in --> 1*1Conv1_16

Filter_concat_in --> 1*1Conv2_16

Filter_concat_in --> AvgPooling

AvgPooling --> 1*1Conv3_24

1*1Conv0_16 --> 3*3Conv0_24

3*3Conv0_24 --> 3*3Conv1_24

3*3Conv1_24 --> Filter_Concat_out

1*1Conv1_16 --> 5*5Conv_24

5*5Conv_24 --> Filter_Concat_out

1*1Conv3_24 --> Filter_Concat_out

1*1Conv2_16 --> Filter_Concat_out

Implement

Bottom branch (the fourth)

self.branch1x1 = nn.Conv2d(in_channels,16,kernel_size=1)

branch1x1 = self.branch1x1(x)

Second branch from the bottom

self.branch_pool = nn.Conv2d(in_channels,24,kernel_size=1)

branch_pool = F.avg_pool2d(x,kernel_size=3,stride=1,padding=1)

branch_pool = self.branch_pool(branch_pool)

Second branch from the top

self.branch5x5_1 = nn.Conv2d(in_channels,16,kernel_size=1)

self.branch5x5_2 = nn.Conv2d(16,24,kernel_size=5,padding=2)

branch5x5 = self.branch5x5_1(x)

branch5x5 = self.branch5x5_2(branch5x5)

First branch

self.branch3x3_1=nn.Conv2d(in_channels,16,kernel_size=1)

self.branch3x3_2=nn.Conv2d(16,24,kernel_size=3,padding=1)

self.branch3x3_3=nn.Conv2d(24,24,kernel_size=3,padding=1)

branch3x3 = self.branch3x3_1(x)

branch3x3 = self.branch3x3_2(branch3x3)

branch3x3 = self.branch3x3_3(branch3x3)

output

outputs = [branch1x1,branch_pool,branch5x5,branch3x3]
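
Since the four branch outputs are concatenated along the channel dimension, the output channel count of the module is the sum of the branch channel counts. The sketch below does that bookkeeping in plain Python; it assumes 28x28 MNIST input and uses the layer settings of the full code listing in this lecture, and the helper and dictionary names are made up for the example.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Output channels of each branch (the last convolution in the branch)
branch_channels = {
    "branch1x1": 16,
    "branch5x5": 24,
    "branch3x3dbl": 24,
    "branch_pool": 24,
}
inception_out = sum(branch_channels.values())  # channels after concatenation

# Every branch keeps the spatial size, otherwise concatenation would fail:
keeps_size = (
    conv_out(12, 1) == 12                 # 1x1 convolution
    and conv_out(12, 5, padding=2) == 12  # 5x5 convolution with padding 2
    and conv_out(12, 3, padding=1) == 12  # 3x3 convolution with padding 1
)

size = 28                        # assumed MNIST input
size = conv_out(size, 5) // 2    # conv1 (kernel 5) + max pool 2 -> 12
size = conv_out(size, 5) // 2    # conv2 (kernel 5) + max pool 2 -> 4
flat_features = inception_out * size * size

print("inception output channels:", inception_out)
print("features entering the fully connected layer:", flat_features)
```

Under these assumptions the channel sum explains the in_channels of the second convolution and the flattened size explains the input size of the final linear layer in the full listing.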

ALL CODE:

# https://github.com/pytorch/examples/blob/master/mnist/main.py

from __future__ import print_function

import argparse

import torch

import torch.nn as nn

import torch.nn.functional as F

import torch.optim as optim

from torchvision import datasets, transforms

from torch.autograd import Variable

# Training settings

batch_size = 64

# MNIST Dataset

train_dataset = datasets.MNIST(root='./data/',

                               train=True,

                               transform=transforms.ToTensor(),

                               download=True)

test_dataset = datasets.MNIST(root='./data/',

                              train=False,

                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,

                                           batch_size=batch_size,

                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,

                                          batch_size=batch_size,

                                          shuffle=False)

class InceptionA(nn.Module):

    def __init__(self, in_channels):

        super(InceptionA, self).__init__()

        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)

        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

        self.branch3x3dbl_1 = nn.Conv2d(in_channels, 16, kernel_size=1)

        self.branch3x3dbl_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)

        self.branch3x3dbl_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):

        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)

        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3dbl = self.branch3x3dbl_1(x)

        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)

        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)

        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]

        return torch.cat(outputs, 1)

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)

        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)

        self.incept1 = InceptionA(in_channels=10)

        self.incept2 = InceptionA(in_channels=20)

        self.mp = nn.MaxPool2d(2)

        self.fc = nn.Linear(1408, 10)

    def forward(self, x):

        in_size = x.size(0)

        x = F.relu(self.mp(self.conv1(x)))

        x = self.incept1(x)

        x = F.relu(self.mp(self.conv2(x)))

        x = self.incept2(x)

        x = x.view(in_size, -1)  # flatten the tensor

        x = self.fc(x)

        return F.log_softmax(x, dim=1)

model = Net()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

def train(epoch):

    model.train()

    for batch_idx, (data, target) in enumerate(train_loader):

        data, target = Variable(data), Variable(target)

        optimizer.zero_grad()

        output = model(data)

        loss = F.nll_loss(output, target)

        loss.backward()

        optimizer.step()

        if batch_idx % 10 == 0:

            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(

                epoch, batch_idx * len(data), len(train_loader.dataset),

                100. * batch_idx / len(train_loader), loss.item()))

def test():

    model.eval()

    test_loss = 0

    correct = 0

    with torch.no_grad():  # replaces the removed volatile=True flag

        for data, target in test_loader:

            output = model(data)

            # sum up batch loss

            test_loss += F.nll_loss(output, target, reduction='sum').item()

            # get the index of the max log-probability

            pred = output.max(1, keepdim=True)[1]

            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(

        test_loss, correct, len(test_loader.dataset),

        100. * correct / len(test_loader.dataset)))

for epoch in range(1, 10):

    train(epoch)

    test()

Lecture 12: RNN

Recurrent NN

graph LR

X1 --> A1

A1 --> h1

X2 --> A2

A2 --> h2

X3 --> A3

A3 --> h3

X4 --> A4

A4 --> h4

A1 --> A2

A2 --> A3

A3 --> A4

PyTorch provides RNN modules that can be used directly

Different RNN implementations

cell = nn.RNN(input_size=4,hidden_size=2,batch_first=True)

cell = nn.GRU(input_size=4,hidden_size=2,batch_first=True)

cell = nn.LSTM(input_size=4,hidden_size=2,batch_first=True)

How to use RNN?

cell = nn.RNN(input_size=4,hidden_size=2,batch_first=True)

inputs = ... # batch_size, seq_len,inputSize

hidden = (...) # numLayers,batch_size, hidden_size

out, hidden = cell(inputs,hidden)

There are two outputs: out, which holds the hidden state at every time step, and hidden, which is the final hidden state
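
To make the relation between the two return values concrete, here is a standard-library sketch of a single-layer tanh RNN run over one sequence. It is not nn.RNN: the weights are fixed hypothetical numbers, the two bias vectors of nn.RNN are merged into one, and there is no batch dimension. The sizes follow the example above (input_size=4, hidden_size=2).

```python
import math

input_size, hidden_size, seq_len = 4, 2, 3

# Hypothetical fixed weights; a real RNN layer would initialise these randomly.
w_ih = [[0.1, -0.2, 0.3, 0.0],
        [0.0, 0.4, -0.1, 0.2]]    # hidden_size x input_size
w_hh = [[0.5, -0.3],
        [0.2, 0.1]]               # hidden_size x hidden_size
bias = [0.0, 0.1]                 # merged bias, one value per hidden unit


def rnn_forward(inputs, h0):
    """h_t = tanh(W_ih x_t + W_hh h_(t-1) + b) for every time step t."""
    h = h0
    out = []
    for x in inputs:
        h = [math.tanh(sum(w_ih[j][i] * x[i] for i in range(input_size))
                       + sum(w_hh[j][k] * h[k] for k in range(hidden_size))
                       + bias[j])
             for j in range(hidden_size)]
        out.append(h)             # out collects the hidden state of every step
    return out, h                 # hidden is the state after the last step


inputs = [[1, 0, 0, 0],           # one-hot vectors, one per time step
          [0, 1, 0, 0],
          [0, 0, 1, 0]]
h0 = [0.0] * hidden_size

out, hidden = rnn_forward(inputs, h0)
print("out has", len(out), "steps of size", len(out[0]))
print("last step equals hidden:", out[-1] == hidden)
```

For a single-layer, unidirectional RNN the last entry of out and hidden carry the same values, which is why the lab code below can feed one character at a time and pass hidden along.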

# Lab 12 RNN

import sys

import torch

import torch.nn as nn

from torch.autograd import Variable

torch.manual_seed(777)  # reproducibility

#            0    1    2    3    4

idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hihell -> ihello

x_data = [0, 1, 0, 2, 3, 3]   # hihell

one_hot_lookup = [[1, 0, 0, 0, 0],  # 0

                  [0, 1, 0, 0, 0],  # 1

                  [0, 0, 1, 0, 0],  # 2

                  [0, 0, 0, 1, 0],  # 3

                  [0, 0, 0, 0, 1]]  # 4

y_data = [1, 0, 2, 3, 3, 4]    # ihello

x_one_hot = [one_hot_lookup[x] for x in x_data]

# As we have one batch of samples, we will change them to variables only once

inputs = Variable(torch.Tensor(x_one_hot))

labels = Variable(torch.LongTensor(y_data))

num_classes = 5

input_size = 5  # one-hot size

hidden_size = 5  # output from the RNN. 5 to directly predict one-hot

batch_size = 1   # one sentence

sequence_length = 1  # One by one

num_layers = 1  # one-layer rnn

class Model(nn.Module):

    def __init__(self):

        super(Model, self).__init__()

        self.rnn = nn.RNN(input_size=input_size,

                          hidden_size=hidden_size, batch_first=True)

    def forward(self, hidden, x):

        # Reshape input (batch first)

        x = x.view(batch_size, sequence_length, input_size)

        # Propagate input through RNN

        # Input: (batch, seq_len, input_size)

        # hidden: (num_layers * num_directions, batch, hidden_size)

        out, hidden = self.rnn(x, hidden)

        return hidden, out.view(-1, num_classes)

    def init_hidden(self):

        # Initialize the hidden state (a plain RNN has no cell state)

        # (num_layers * num_directions, batch, hidden_size)

        return Variable(torch.zeros(num_layers, batch_size, hidden_size))

# Instantiate RNN model

model = Model()

print(model)

# Set loss and optimizer function

# CrossEntropyLoss = LogSoftmax + NLLLoss

criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Train the model

for epoch in range(100):

    optimizer.zero_grad()

    loss = 0

    hidden = model.init_hidden()

    sys.stdout.write("predicted string: ")

    for input, label in zip(inputs, labels):

        # print(input.size(), label.size())

        hidden, output = model(hidden, input)

        val, idx = output.max(1)

        sys.stdout.write(idx2char[idx.item()])

        # label is a 0-dim tensor here; CrossEntropyLoss expects a target of shape (batch,)

        loss += criterion(output, label.view(1))

    print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.item()))

    loss.backward()

    optimizer.step()

print("Learning finished!")