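Preface

This series walks through the full workflow of a neural network model, from training to deployment. For working engineers, it is a quick way to learn how to apply deep learning to project requirements. For students, it is a chance to run the algorithms hands-on and get an early sense of achievement, which helps with later theoretical study.
Important: this series does not cover deep learning theory in detail. For theory, the videos by Andrew Ng (吴恩达) and Mu Li (李沐) are recommended.
You should already have the following background:

  • Basic deep learning theory (if not, search for Andrew Ng's courses on Bilibili)
  • Working knowledge of the PyTorch framework (if not, search for Mu Li's courses on Bilibili)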

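The full model training workflow

We use an image classification task as the example.

1. Data preparation

Data is the foundation of everything in deep learning. A major reason small companies lose out to large ones is that they cannot compete on data. This article uses a flower dataset with 5 classes: daisy, dandelion, roses, sunflowers, and tulips. Download link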
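After downloading the data, it still needs to be processed:

  • Create the labels
  • Split the data into a training set (used to train the model) and a validation set (used to check how well the trained model performs)

Running the following script produces the training and validation sets: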

import os
import random

# Adjust to the location of your own data
root = "./flower_photos/"

# Class folder names; the list index is used as the class label
file_name = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

# Open both lists in "w" mode so that re-running the script does not append duplicate entries
with open("./train.txt", "w") as f_train, open("./val.txt", "w") as f_val:
    for i, class_name in enumerate(file_name):
        file_path = os.path.join(root, class_name)
        # Sort so the listing order does not depend on the file system
        img_name_list = sorted(os.listdir(file_path))

        num = len(img_name_list)
        train_num = int(num * 0.8)

        # Randomly pick 80% of the images for training; the rest go to validation
        train_id = set(random.sample(range(num), train_num))

        for ID in range(num):
            img_path = os.path.join(file_path, img_name_list[ID])
            # One sample per line: "<image path> <class label>"
            data = img_path + " " + str(i) + "\n"
            if ID in train_id:
                f_train.write(data)
            else:
                f_val.write(data)

        print(class_name, "train:", train_num, "val:", num - train_num)

The result is two txt files containing each image path and its label (a trailing 0 on a line means class 0, daisy). Data preparation is now complete.
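
As a quick sanity check on the generated lists, the "path label" lines can be parsed and counted per class. The following is a minimal sketch, not part of the original script: the helper names parse_list_line and summarize and the sample lines are hypothetical, and it uses only the standard library.

```python
from collections import Counter

# Class names in label order, matching the split script above
CLASS_NAMES = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

def parse_list_line(line):
    """Split one 'path label' line into (path, int label).

    rsplit with maxsplit=1 keeps paths that contain spaces intact.
    """
    path, label = line.strip().rsplit(" ", 1)
    return path, int(label)

def summarize(lines):
    """Count how many samples each class has in an iterable of lines."""
    counts = Counter(parse_list_line(l)[1] for l in lines if l.strip())
    return {CLASS_NAMES[k]: v for k, v in sorted(counts.items())}

# Hypothetical sample lines, for illustration only
sample_lines = [
    "./flower_photos/daisy/img_001.jpg 0\n",
    "./flower_photos/daisy/img_002.jpg 0\n",
    "./flower_photos/tulips/my photo.jpg 4\n",
]
print(summarize(sample_lines))
```

With the real files, something like summarize(open("train.txt")) should give the per-class counts, which makes it easy to spot an empty or badly unbalanced class.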

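2. Data loading

This part is honestly hard to explain, because PyTorch already provides a complete data-loading API and we only need to follow a fixed set of steps. Below are a few key parts.
PS: I do not recommend spending a lot of time studying the internals of open-source APIs like this. It is more useful to learn how to use them (look at a few examples and understand what goes in and what comes out). Study the implementation in detail only if you need to write something similar yourself.

  • Image preprocessing: resize and normalize the loaded image and convert it to a Tensor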
from torchvision import transforms

# Resize: convert input images of any size to a fixed resolution
# ToTensor: convert to a PyTorch tensor (C x H x W, values scaled to [0, 1])
# Normalize: per-channel standardization; these are the widely used ImageNet mean and std values
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
  • __getitem__: defines how each sample is actually loaded
# loader_func (opens an image and converts it to RGB) is defined in the complete code in section 7
class FlowersClsDataset(torch.utils.data.Dataset):

    def __init__(self, list_path, img_transform=None):
        super(FlowersClsDataset, self).__init__()
        self.img_transform = img_transform
        # Each line of the list file is "<image path> <class label>"
        with open(list_path, 'r') as f:
            self.list = f.readlines()

    def __getitem__(self, index):
        # Image path: first field of the line
        name = self.list[index].split()[0]
        img_path = name
        # Load the image
        img = loader_func(img_path)

        if self.img_transform is not None:
            img = self.img_transform(img)

        # Label: last field of the line
        label = int(self.list[index].split()[-1])

        return img, label

    def __len__(self):
        return len(self.list)

Use the debugger to check that the data is loaded correctly. The image data has been converted to a tensor, so this step is done.
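
The ToTensor and Normalize steps above amount to simple per-channel arithmetic. The sketch below reproduces that arithmetic for a single 8-bit RGB pixel in plain Python, so the value range of the resulting tensor is easier to reason about. The function name normalize_pixel is hypothetical; the mean and std values are the ones passed to transforms.Normalize in this article.

```python
# Channel statistics, the same values passed to transforms.Normalize above
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply the ToTensor scaling and the Normalize arithmetic to one 8-bit RGB pixel."""
    out = []
    for value, mean, std in zip(rgb, MEAN, STD):
        scaled = value / 255.0             # ToTensor: [0, 255] -> [0.0, 1.0]
        out.append((scaled - mean) / std)  # Normalize: (x - mean) / std
    return out

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

After normalization the values are no longer confined to [0, 1]: a black pixel maps to negative values and a white pixel to positive ones, which is what you should expect to see when inspecting the tensor in the debugger.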

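3. Building the neural network

This is the most important part of deep learning. Network architectures have relatively low interpretability and are usually arrived at by well-known labs through extensive experimentation. Here is a site that collects network models; choose one according to your project requirements and hardware. The site also links each model's paper and code implementation, which is very helpful.
Network model link
PS: The site collects almost all network architectures and also the latest task-specific algorithms, such as object detection, semantic segmentation, and image classification. Key point: nearly every entry links to concrete code.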
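This article only demonstrates the training workflow, so the network is put together arbitrarily and does not follow any specific published architecture. It is simply a stack of convolution + BN + ReLU layers. The code is as follows: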

import torch
import torch.nn as nn
import torch.nn.functional as F

# Convolution, BN, and ReLU combined into one block
class conv_bn_relu(torch.nn.Module):
    def __init__(self,in_channels, out_channels, kernel_size, stride=1, padding=0):
        super(conv_bn_relu,self).__init__()
        self.conv = torch.nn.Conv2d(in_channels,out_channels, kernel_size, stride = stride, padding = padding)
        self.bn = torch.nn.BatchNorm2d(out_channels)
        self.relu = torch.nn.ReLU()

    def forward(self,x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

# Define the network model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = conv_bn_relu(3, 8, 3, 1, 1)
        self.layer2 = conv_bn_relu(8, 16, 3, 1, 1)
        self.layer3 = conv_bn_relu(16, 32, 3, 1, 1)
        self.layer4 = conv_bn_relu(32, 64, 3, 1, 1)
        self.layer5 = conv_bn_relu(64, 96, 3, 1, 1)

        self.fc1 = nn.Linear(7 * 7 * 96, 1024)
        self.fc2 = nn.Linear(1024, 128)
        self.fc3 = nn.Linear(128, 5)

        self.maxpool = nn.MaxPool2d(2, 2)
        self.softmax = nn.Softmax(dim=-1)
 
    def forward(self, x):
        x = self.layer1(x)
        x = self.maxpool(x)

        x = self.layer2(x)
        x = self.maxpool(x)

        x = self.layer3(x)
        x = self.maxpool(x)

        x = self.layer4(x)
        x = self.maxpool(x)

        x = self.layer5(x)
        x = self.maxpool(x).view(-1, 7*7*96)

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        x = self.softmax(x)
 
        return x
 
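# Instantiate the model and move it to the GPU if one is available, otherwise the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net = Net()
net.to(device)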

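Once the model is built, the most important step is to check that its input and output dimensions match expectations. In this article the input has shape (N, 3, 224, 224) and the output has shape (N, 5).
Here the model input is (10, 3, 224, 224); the 10 comes from batch_size.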
The model output is (10, 5), which matches the required shape.
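
The 7 * 7 * 96 in the first fully connected layer follows from the input resolution, and it can be checked without running the model. The sketch below traces the spatial size through the five conv + max-pool stages using the standard output-size formula floor((n + 2p - k) / s) + 1. The helper names conv_out and trace_net_shapes are hypothetical, and the sketch assumes the same kernel, stride, and padding settings as the Net class above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_net_shapes(size=224):
    """Follow the (channels, height, width) shape through the five stages of Net."""
    channels = [8, 16, 32, 64, 96]
    shapes = []
    for c in channels:
        size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
        size = conv_out(size, kernel=2, stride=2)             # 2x2 max-pool halves it
        shapes.append((c, size, size))
    return shapes

shapes = trace_net_shapes(224)
for s in shapes:
    print(s)
c, h, w = shapes[-1]
print("flattened features:", c * h * w)
```

If the Resize resolution is changed, the final spatial size changes too, and both nn.Linear(7 * 7 * 96, 1024) and the view(-1, 7*7*96) call would need to be updated to match.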

4. Setting up the loss function and optimizer

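# Classification loss: the model outputs softmax probabilities,
# so take the log and apply the negative log-likelihood loss
class ClsLoss(nn.Module):
    def __init__(self):
        super(ClsLoss, self).__init__()
        self.nll = nn.NLLLoss()

    def forward(self, pre, labels):
        # Clamp before the log so a probability of exactly 0 does not produce -inf
        pre = torch.log(torch.clamp(pre, min=1e-12))
        loss = self.nll(pre, labels)
        return loss

# Instantiate the loss function
loss_func = ClsLoss()

# Instantiate the network model
net = Net()
# Move the model to the target device (the GPU here)
net.to(device)

# Collect the trainable parameters
training_params = filter(lambda p: p.requires_grad, net.parameters())
# Define the optimizer
optimizer = torch.optim.Adam(training_params, lr=0.0003, weight_decay=0.0001)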
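
For a single sample, taking the log of the softmax output and applying the negative log-likelihood is the same quantity as the cross-entropy computed directly from the raw scores. The sketch below checks this numerically in plain Python. The function names and the example scores are hypothetical and serve only as an illustration of the arithmetic, not as PyTorch's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll_of_probs(probs, label):
    """Per-sample value of log followed by NLL: -log(p[label])."""
    return -math.log(probs[label])

def cross_entropy(logits, label):
    """Cross-entropy computed directly from the raw scores via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

# Hypothetical raw scores for the 5 flower classes; the true class is 2
logits = [1.0, 0.5, 3.0, -1.0, 0.0]
label = 2
loss_a = nll_of_probs(softmax(logits), label)
loss_b = cross_entropy(logits, label)
print(round(loss_a, 6), round(loss_b, 6))
```

Because the network in this article applies softmax inside forward, log + NLLLoss is the matching formulation. A common alternative is to have the model return raw scores and use nn.CrossEntropyLoss, which combines the log-softmax and NLL steps.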

5. Training the network

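import os

# Make sure the checkpoint directory exists before saving into it
os.makedirs("./model", exist_ok=True)

# Training loop
for epoch in range(31):
    # Training mode: BatchNorm uses batch statistics and updates its running averages
    net.train()
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # Get the image data and labels
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        # Compute the loss
        loss = loss_func(outputs, labels)
        # Backpropagate and update the model parameters
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # Print the average loss every 200 iterations
        if i % 200 == 199:
            print('[%d %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 200))
            running_loss = 0.0
    # Save the model weights after every epoch
    torch.save(net.state_dict(), "./model/" + str(epoch) + ".pth")
print('finished training!')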
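
One detail of the loop above is easy to miss: the loss is printed only when i % 200 == 199, so an epoch with fewer than 200 batches prints nothing. The sketch below works out which iterations would log for a given dataset size. The helper name logging_iterations and the sample counts are hypothetical; it assumes batch_size=10 and that the last partial batch is kept, which is the DataLoader default.

```python
import math

def logging_iterations(num_samples, batch_size=10, every=200):
    """Return the 1-based iteration numbers at which the training loop would print."""
    # The last partial batch is kept (drop_last defaults to False in DataLoader)
    num_batches = math.ceil(num_samples / batch_size)
    return [i + 1 for i in range(num_batches) if i % every == every - 1]

# Hypothetical training-set sizes, for illustration only
print(logging_iterations(3000))
print(logging_iterations(1500))
```

If no loss lines appear during training, lowering the logging interval or printing once at the end of each epoch is a simple fix.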

6. Testing the model

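# Testing
# Evaluation mode: BatchNorm uses its running statistics instead of batch statistics
net.eval()
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        # The predicted class is the index of the largest output
        _, predicted = torch.max(outputs, dim=1)
        total += labels.size(0)
        # .item() converts the 0-dim tensor to a plain Python int
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the test images: %d %%' % (100 * correct / total))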
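
The accuracy computed by the test loop above is just "argmax of the output equals the label", averaged over all samples. The following plain-Python sketch shows the same calculation on a tiny hand-made batch. The helper names and the probability values are hypothetical.

```python
def argmax(values):
    """Index of the largest value, like the index returned by torch.max(..., dim=1)."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(batch_probs, labels):
    """Percentage of samples whose highest-probability class equals the label."""
    predicted = [argmax(p) for p in batch_probs]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return 100.0 * correct / len(labels)

# Hypothetical softmax outputs for 4 images over the 5 flower classes
batch_probs = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.20, 0.20, 0.50, 0.05, 0.05],
    [0.30, 0.10, 0.10, 0.40, 0.10],
]
labels = [0, 1, 2, 4]
print(accuracy(batch_probs, labels))
```

Three of the four predictions match their labels here, so the sketch reports 75.0.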

7. Complete code

import os

import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image

# Use the first GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Image preprocessing
# Resize: convert input images of any size to a fixed resolution
# ToTensor: convert to a PyTorch tensor (C x H x W, values scaled to [0, 1])
# Normalize: per-channel standardization; these are the widely used ImageNet mean and std values
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

def loader_func(path):
    # Open the image and force 3 channels
    return Image.open(path).convert('RGB')

# Dataset: __getitem__ is the key method; it reads the image and its label
class FlowersClsDataset(torch.utils.data.Dataset):

    def __init__(self, list_path, img_transform=None):
        super(FlowersClsDataset, self).__init__()
        self.img_transform = img_transform
        # Each line of the list file is "<image path> <class label>"
        with open(list_path, 'r') as f:
            self.list = f.readlines()

    def __getitem__(self, index):
        # Image path: first field of the line
        name = self.list[index].split()[0]
        img_path = name
        # Load the image
        img = loader_func(img_path)

        if self.img_transform is not None:
            img = self.img_transform(img)

        # Label: last field of the line
        label = int(self.list[index].split()[-1])

        return img, label

    def __len__(self):
        return len(self.list)

# Instantiate the datasets
train_dataset = FlowersClsDataset('train.txt', img_transform=data_transforms['train'])
test_dataset = FlowersClsDataset('val.txt', img_transform=data_transforms['val'])

# Build the DataLoaders and set batch_size
train_loader = DataLoader(train_dataset, batch_size=10, shuffle=True)
# Shuffling is not needed for evaluation
test_loader = DataLoader(test_dataset, batch_size=10, shuffle=False)

# Classification loss: the model outputs softmax probabilities,
# so take the log and apply the negative log-likelihood loss
class ClsLoss(nn.Module):
    def __init__(self):
        super(ClsLoss, self).__init__()
        self.nll = nn.NLLLoss()

    def forward(self, pre, labels):
        # Clamp before the log so a probability of exactly 0 does not produce -inf
        pre = torch.log(torch.clamp(pre, min=1e-12))
        loss = self.nll(pre, labels)
        return loss

# Instantiate the loss function
loss_func = ClsLoss()

# Convolution, BN, and ReLU combined into one block
class conv_bn_relu(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super(conv_bn_relu, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride, padding=padding)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

# Define the network model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = conv_bn_relu(3, 8, 3, 1, 1)
        self.layer2 = conv_bn_relu(8, 16, 3, 1, 1)
        self.layer3 = conv_bn_relu(16, 32, 3, 1, 1)
        self.layer4 = conv_bn_relu(32, 64, 3, 1, 1)
        self.layer5 = conv_bn_relu(64, 96, 3, 1, 1)

        # Five 2x2 max-pools reduce 224 x 224 to 7 x 7
        self.fc1 = nn.Linear(7 * 7 * 96, 1024)
        self.fc2 = nn.Linear(1024, 128)
        self.fc3 = nn.Linear(128, 5)

        self.maxpool = nn.MaxPool2d(2, 2)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.maxpool(x)

        x = self.layer2(x)
        x = self.maxpool(x)

        x = self.layer3(x)
        x = self.maxpool(x)

        x = self.layer4(x)
        x = self.maxpool(x)

        x = self.layer5(x)
        # Flatten to (N, 7*7*96) for the fully connected layers
        x = self.maxpool(x).view(-1, 7 * 7 * 96)

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        # Output class probabilities
        x = self.softmax(x)

        return x

# Instantiate the network model
net = Net()
# Move the model to the target device
net.to(device)

# Collect the trainable parameters
training_params = filter(lambda p: p.requires_grad, net.parameters())
# Define the optimizer
optimizer = torch.optim.Adam(training_params, lr=0.0003, weight_decay=0.0001)

# Make sure the checkpoint directory exists before saving into it
os.makedirs("./model", exist_ok=True)

# Training loop
for epoch in range(31):
    # Training mode: BatchNorm uses batch statistics and updates its running averages
    net.train()
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # Get the image data and labels
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        # Compute the loss
        loss = loss_func(outputs, labels)
        # Backpropagate and update the model parameters
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # Print the average loss every 200 iterations
        if i % 200 == 199:
            print('[%d %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 200))
            running_loss = 0.0
    # Save the model weights after every epoch
    torch.save(net.state_dict(), "./model/" + str(epoch) + ".pth")
print('finished training!')

# Testing
# Evaluation mode: BatchNorm uses its running statistics instead of batch statistics
net.eval()
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        # The predicted class is the index of the largest output
        _, predicted = torch.max(outputs, dim=1)
        total += labels.size(0)
        # .item() converts the 0-dim tensor to a plain Python int
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the test images: %d %%' % (100 * correct / total))

8. Training results

After only 10 epochs of training, the model reaches an accuracy of 67%.

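Summary

This article covered the full workflow of training a deep learning model. The most important practical skill is fluency with the PyTorch framework, which comes from using it repeatedly and reading the official documentation. Even more important is the theory: it takes reading many papers and running many experiments (provided you have an NVIDIA GPU, can pay the electricity bill, and have data) before you see substantial improvement.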

11-03 11:40