Preface

  • The yolov5 network structure is fairly complex; this post is only a first look at the overall architecture, plus a hands-on build of the C3 module.
  • This is exam week: from Monday to Thursday I was preparing for and sitting exams, and since yesterday I have had a high fever again, so updates are slower than usual.
  • Feel free to bookmark and follow; I will keep updating.

1. Overview of the network structure

Overview

In the yolov5 source code, the parameters that define the network structure are stored in .yaml files, as shown below:

[Figure: the model configuration .yaml files in the yolov5 source]

Each .yaml file has the following parts.

Taking yolov5l.yaml as an example:

The file describes the model parameters, the backbone and the head, and how they are connected to each other.

🎫 **Tip:** the different .yaml files differ only in depth_multiple and width_multiple.

# Ultralytics YOLOv5 🚀, AGPL-3.0 license

# Parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [10, 13, 16, 30, 33, 23] # P3/8
  - [30, 61, 62, 45, 59, 119] # P4/16
  - [116, 90, 156, 198, 373, 326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [
    [-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
    [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
    [-1, 6, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
    [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
    [-1, 3, C3, [1024]],
    [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head: [
    [-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 6], 1, Concat, [1]], # cat backbone P4
    [-1, 3, C3, [512, False]], # 13

    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 4], 1, Concat, [1]], # cat backbone P3
    [-1, 3, C3, [256, False]], # 17 (P3/8-small)

    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]], # cat head P4
    [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]], # cat head P5
    [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

    [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]

Parameters

  • nc: 80: the model is trained to detect 80 object classes, which is the number of classes in the COCO dataset.
  • depth_multiple and width_multiple: these two parameters scale the model's depth (number of layers) and width (number of channels). In yolov5l.yaml both are 1.0; smaller variants such as yolov5s use 0.33 and 0.50 (see the sketch right after this list).
  • anchors: defines the prior boxes (anchor boxes) at three scales, corresponding to the feature maps P3/8, P4/16 and P5/32. Each scale has three anchors, used to predict objects of different sizes and aspect ratios.
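
To make the scaling concrete, here is a minimal sketch (my own illustration, not the actual Ultralytics parse_model code) of how the two multiples are typically applied: repeat counts are rounded, and channel counts are rounded up to a multiple of 8:

import math

def scale_depth(n, depth_multiple):
    """Scale a module's repeat count (the 'number' column in the .yaml)."""
    return max(round(n * depth_multiple), 1) if n > 1 else n

def scale_width(c, width_multiple, divisor=8):
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(c * width_multiple / divisor) * divisor

# example: applying yolov5s-style multiples (0.33 / 0.50) to the entry [-1, 9, C3, [512]]
print(scale_depth(9, 0.33))    # 3 repeats instead of 9
print(scale_width(512, 0.50))  # 256 output channels instead of 512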

Backbone

The backbone is the main part of the model and is responsible for extracting features from the input image. It is mainly built from three kinds of modules: Conv (a convolution block), C3 (a custom residual block) and SPPF (a fast variant of spatial pyramid pooling).

Details:

 # [from, number, module, args]
[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2   the leading 0 means this is layer 0
  • -1 (from): the input is the previous layer's output;
  • 1 (number): how many times the module is repeated; the final count is round(1 * depth_multiple), with a minimum of 1;
  • Conv: module name;
  • args: arguments
    • 64: output channels (channel)
    • 6: kernel size (kernel_size)
    • 2: stride (stride)
    • 2: padding (padding); a quick output-size check for this layer follows right after this list
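
A quick sanity check of the "0-P1/2" comment: with kernel 6, stride 2 and padding 2, the spatial resolution is halved. Using the standard convolution output-size formula (assuming the default 640x640 yolov5 input size):

def conv_out(size, k, s, p):
    """Standard conv output-size formula: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1

print(conv_out(640, k=6, s=2, p=2))  # 320 -> P1/2, half of 640
print(conv_out(320, k=3, s=2, p=1))  # 160 -> P2/4 (padding 1 comes from autopad: 3 // 2)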

However, some Conv entries look like this:

[-1, 1, Conv, [256, 3, 2]], # 3-P3/8

The first three columns are the same; only the args [256, 3, 2] differ:

  • 256: still the output channels
  • 3: kernel size
  • 2: stride (padding is not given, so it is computed automatically as k // 2)
[-1, 3, C3, [128]]
  • -1 (from): output of the previous layer;
  • 3 (number): number of module repeats; the final count is round(3 * depth_multiple), with a minimum of 1;
  • C3: module name;
  • args: arguments
    • 128: output channels
[-1, 1, SPPF, [1024, 5]]
  • SPPF mainly fuses feature maps pooled at different scales (receptive fields); a minimal implementation sketch follows this list
  • -1: the input is the previous layer's output
  • **1:** the module is repeated once
  • SPPF: the layer's module name is SPPF
  • [1024, 5]:
    • **1024:** channel = 1024
    • **5:** kernel_size = 5
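
For reference, here is a minimal SPPF sketch following the published YOLOv5 design (one 1x1 conv, a single 5x5 max-pool applied three times in a row, concatenation, then another 1x1 conv). The conv + BN + SiLU blocks are written out inline so the snippet is self-contained:

import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast: reusing one k=5 pool three times covers the
    same receptive fields as parallel 5/9/13 pools, but with less computation."""
    def __init__(self, c1, c2, k=5):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c_, 1, 1, bias=False),
                                 nn.BatchNorm2d(c_), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c_ * 4, c2, 1, 1, bias=False),
                                 nn.BatchNorm2d(c2), nn.SiLU())
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.m(x)
        y2 = self.m(y1)
        return self.cv2(torch.cat((x, y1, y2, self.m(y2)), dim=1))

# quick shape check: spatial size is preserved, channels go c1 -> c2
print(SPPF(1024, 1024)(torch.randn(1, 1024, 20, 20)).shape)  # torch.Size([1, 1024, 20, 20])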

Head

The head sits after the backbone. It mainly consists of upsampling and concatenation operations that fuse features from different backbone stages, and it ends with the Detect layer that produces the final predictions. It can be read as two parts: a top-down (upsampling) path followed by a bottom-up (downsampling) path.

Network structure at a glance

A macro view, taking yolov5l.yaml as an example:

[Figures: macro-level diagrams of the yolov5l network structure]

A first look at the C3 module

The C3 module is a building block introduced in YOLOv5 and is used extensively in both the backbone and the head. In the YOLOv5 source it is described as a "CSP Bottleneck with 3 convolutions"; its design is essentially an improved residual block, intended to strengthen feature extraction and improve training efficiency.

[Figure: structure diagram of the C3 module]

Structure of the C3 module

A C3 module is typically made up of the following parts:

  1. Bottleneck / CSP Bottleneck:
    • In YOLOv5 the C3 module contains several Bottleneck layers (also called CSP, Cross Stage Partial, Bottlenecks), which helps alleviate the vanishing-gradient problem and speeds up training.
    • Vanishing gradient: during back-propagation the error gradient shrinks towards zero as the network gets deeper, so the layers close to the input learn extremely slowly or almost stop updating their weights, which severely hurts training and final performance.
  2. Channel split and cross-stage partial connection:
    • A distinctive feature of C3 is that the input feature map is split into two parts along the channel dimension (in practice, two parallel 1x1 convolutions): one part goes through the Bottleneck stack while the other bypasses it, and the two parts are merged at the end. This is the so-called cross-stage partial (CSP) connection; it reduces computation while keeping good accuracy.
  3. Convolution layers:
    • At the start and end of the C3 module there are convolution layers that adjust the channel count and fuse the information coming from the two paths.
  4. Activation function:
    • Each convolution is followed by an activation function. YOLOv5 uses SiLU (Sigmoid Linear Unit, also known as Swish) by default to help the network learn non-linear features.

What the C3 module contributes

  • Stronger feature representation:
    by applying the Bottleneck structure several times, the C3 module enriches the extracted features and improves the model's ability to recognise objects.
  • Efficient use of computation:
    the CSP technique lets the network grow deeper without a significant increase in computational cost, so the model stays both efficient and powerful.
  • Less overfitting:
    the partial connections and residual paths help the network learn more effective mappings and lower the risk of overfitting.

In short: C3 combines residual connections with the channel-split (CSP) idea, which improves feature extraction, makes better use of computation, and reduces the risk of overfitting.

2. Building the C3 network

Goal: build a small network around the C3 module and use it to classify weather images.

1. Data preparation

1. Import libraries

import torch 
import torch.nn as nn 
import torchvision 
import torchvision.transforms as transforms
from torchvision import transforms, datasets

device = ('cuda' if torch.cuda.is_available() else 'cpu')

print(device)
print(torch.__version__)
print(torchvision.__version__)
cuda
2.5.1+cu124
0.20.1+cu124

2. Inspect the data directory

import os, pathlib 

data_dir = './data/'
data_dir = pathlib.Path(data_dir)

classnames = [path for path in os.listdir(data_dir)]
classnames
['cloudy', 'rain', 'shine', 'sunrise']

3. Load and preprocess the data

train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.ToTensor(),
    transforms.Normalize(              # standardise the data so the model converges more easily
        mean=[0.485, 0.456, 0.406],  # per-channel (RGB) mean
        std=[0.229, 0.224, 0.225]    # per-channel (RGB) std; these are the widely used ImageNet statistics
    )
])

total_data = datasets.ImageFolder(data_dir, transform=train_transforms)
total_data
Dataset ImageFolder
    Number of datapoints: 1125
    Root location: data
    StandardTransform
Transform: Compose(
               Resize(size=[224, 224], interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )

4. Split the dataset

train_size = int(len(total_data) * 0.8)
test_size = len(total_data) - train_size
# split the data into train / test
train_datasets, test_datasets = torch.utils.data.random_split(total_data, [train_size, test_size])
len(train_datasets), len(test_datasets)
(900, 225)
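
random_split shuffles differently on each run, so the 900/225 split above will change between runs. If you want a reproducible split, you can optionally pass a seeded generator (the seed value 42 here is arbitrary):

# optional: make the train/test split reproducible across runs
generator = torch.Generator().manual_seed(42)
train_datasets, test_datasets = torch.utils.data.random_split(
    total_data, [train_size, test_size], generator=generator
)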

5. Wrap the datasets in DataLoaders

batch_size = 32 

train_dl = torch.utils.data.DataLoader(train_datasets,
                                       batch_size=batch_size,
                                       shuffle=True)

test_dl = torch.utils.data.DataLoader(test_datasets,
                                      batch_size=batch_size,
                                      shuffle=True)

6. Show a random batch of images

import matplotlib.pyplot as plt

images, labels = next(iter(train_dl))

plt.figure(figsize=(20, 10))
for i in range(20):
    plt.subplot(5, 10, i + 1)
    image = images[i].cpu().numpy().transpose(1, 2, 0)  # CHW -> HWC for matplotlib
    plt.imshow(image)
    plt.title(classnames[labels[i]])
    
    plt.axis('off')
    
plt.show()
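
Because the images were normalised with the ImageNet mean/std, plotting them directly makes the colours look off and matplotlib clips values outside [0, 1]. Here is a small helper (a sketch; the mean/std are the same values used in train_transforms) that undoes the normalisation before plt.imshow:

import numpy as np

def denormalize(img_tensor,
                mean=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225)):
    """Undo transforms.Normalize and return an HWC array in [0, 1] for imshow."""
    img = img_tensor.cpu().numpy().transpose(1, 2, 0)  # CHW -> HWC
    img = img * np.array(std) + np.array(mean)         # reverse (x - mean) / std
    return np.clip(img, 0, 1)

# e.g. inside the plotting loop above: plt.imshow(denormalize(images[i]))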

[Figure: a sample batch of weather images with their class labels]

2. Building the C3 module

import torch.nn.functional as F

# if padding is not specified, compute it automatically (SAME-style padding of k // 2)
'''
# a single integer kernel size of 3
print(autopad(3))  # output: 1

# a (3, 5) kernel-size tuple
print(autopad((3, 5)))  # output: [1, 2]
'''
def autopad(k, p=None):
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p

# basic building block: one convolution + batch norm + activation
'''
# use the default SiLU activation
conv = Conv(3, 16, act=True)

# use a custom / different activation, e.g. ReLU
conv = Conv(3, 16, act=nn.ReLU())

# disable the activation (falls back to nn.Identity)
conv = Conv(3, 16, act=False)  # or conv = Conv(3, 16, act=None)
'''
class Conv(nn.Module):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class Bottleneck(nn.Module):
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1) # k 1, s 1
        self.cv2 = Conv(c_, c2, 3, 1, g=g) # k 3, s 1
        self.add = shortcut and c1 == c2   # residual connection only when input and output channels match
        
    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))   # if add is True, apply the residual connection
    
class C3(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        # three 1x1 convolutions (cv1, cv2, cv3)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)   # fuses the two branches (Conv uses SiLU by default, not ReLU)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))  # n Bottleneck layers
        
    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))  # concat the two branches, then fuse with cv3
    
class model_k(nn.Module):
    def __init__(self):
        super(model_k, self).__init__()
        
        self.Conv = Conv(3, 32, 3, 2)
        
        self.C3_1 = C3(32, 64, 3, 2)   # n=3 Bottlenecks; note the 4th positional arg lands on `shortcut` and is simply truthy
        
        # fully connected classifier
        self.classifier = nn.Sequential(
            nn.Linear(in_features=802816, out_features=100),
            nn.ReLU(),
            nn.Linear(in_features=100, out_features=4)
        )
        
    def forward(self, x):
        x = self.Conv(x)
        x = self.C3_1(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        
        return x
    
model = model_k().to(device)
model
model_k(
  (Conv): Conv(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act): SiLU()
  )
  (C3_1): C3(
    (cv1): Conv(
      (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (act): SiLU()
        )
      )
      (2): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (act): SiLU()
        )
      )
    )
  )
  (classifier): Sequential(
    (0): Linear(in_features=802816, out_features=100, bias=True)
    (1): ReLU()
    (2): Linear(in_features=100, out_features=4, bias=True)
  )
)
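
Before training, it is worth confirming that the classifier's in_features (802816 = 64 channels x 112 x 112 after the stride-2 Conv) matches what the flatten actually produces; a dummy forward pass is a quick check:

# sanity check: run one fake 224x224 batch through the model
dummy = torch.randn(2, 3, 224, 224).to(device)
with torch.no_grad():
    out = model(dummy)
print(out.shape)  # expected: torch.Size([2, 4]), one logit per weather class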

3. Training the model

1. Set the hyperparameters

loss_fn = nn.CrossEntropyLoss()  # loss function
learn_rate = 1e-4  # learning rate
opt = torch.optim.Adam(model.parameters(), lr=learn_rate)  # optimizer over the model's parameters with the chosen learning rate

2. Training function

# training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # size of the training set
    num_batches = len(dataloader)   # number of batches (size / batch_size, rounded up)

    train_loss, train_acc = 0, 0  # initialise training loss and accuracy
    
    for X, y in dataloader:  # fetch a batch of images and labels
        X, y = X.to(device), y.to(device)
        
        # compute the prediction error
        pred = model(X)          # network output
        loss = loss_fn(pred, y)  # loss between the network output and the ground-truth labels
        
        # back-propagation
        optimizer.zero_grad()  # reset the gradients
        loss.backward()        # back-propagate
        optimizer.step()       # update the parameters
        
        # accumulate accuracy and loss
        train_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
            
    train_acc  /= size
    train_loss /= num_batches

    return train_acc, train_loss

3. Test function

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)  
    num_batches = len(dataloader)   
    
    test_acc, test_loss = 0, 0
    
    # no back-propagation / gradient tracking during evaluation
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            
            # compute the loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)   # note: the argument order matters (predictions first, then targets)
            
            test_acc += (target_pred.argmax(1) == target).type(torch.float64).sum().item()
            test_loss += loss.item()
            
    
    test_acc /= size  # accuracy over the whole test set
    test_loss /= num_batches  # average loss per batch
    
    return test_acc, test_loss

4. Train

import copy

optimizer  = torch.optim.Adam(model.parameters(), lr= 1e-4)
loss_fn    = nn.CrossEntropyLoss() # loss function

epochs     = 20

train_loss = []
train_acc  = []
test_loss  = []
test_acc   = []

best_acc = 0    # track the best test accuracy; used to select the best model

for epoch in range(epochs):
    
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    # keep a copy of the best model in best_model
    if epoch_test_acc > best_acc:
        best_acc   = epoch_test_acc
        best_model = copy.deepcopy(model)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    # read the current learning rate
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
    print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, 
                          epoch_test_acc*100, epoch_test_loss, lr))
    
# save the best model's weights to a file
PATH = './best_model.pth'  # file name for the saved weights
torch.save(best_model.state_dict(), PATH)  # save the best model (not the weights from the last epoch)

print('Done')

Epoch: 1, Train_acc:81.3%, Train_loss:1.231, Test_acc:52.9%, Test_loss:2.798, Lr:1.00E-04
Epoch: 2, Train_acc:93.7%, Train_loss:0.220, Test_acc:87.1%, Test_loss:0.512, Lr:1.00E-04
Epoch: 3, Train_acc:97.6%, Train_loss:0.076, Test_acc:87.1%, Test_loss:0.428, Lr:1.00E-04
Epoch: 4, Train_acc:98.4%, Train_loss:0.071, Test_acc:87.6%, Test_loss:0.399, Lr:1.00E-04
Epoch: 5, Train_acc:98.2%, Train_loss:0.237, Test_acc:89.8%, Test_loss:0.508, Lr:1.00E-04
Epoch: 6, Train_acc:98.4%, Train_loss:0.361, Test_acc:87.6%, Test_loss:0.575, Lr:1.00E-04
Epoch: 7, Train_acc:98.3%, Train_loss:0.044, Test_acc:86.7%, Test_loss:0.514, Lr:1.00E-04
Epoch: 8, Train_acc:99.0%, Train_loss:0.043, Test_acc:87.1%, Test_loss:0.818, Lr:1.00E-04
Epoch: 9, Train_acc:99.2%, Train_loss:0.027, Test_acc:87.1%, Test_loss:0.642, Lr:1.00E-04
Epoch:10, Train_acc:98.1%, Train_loss:0.108, Test_acc:83.6%, Test_loss:1.082, Lr:1.00E-04
Epoch:11, Train_acc:98.0%, Train_loss:0.115, Test_acc:85.8%, Test_loss:0.668, Lr:1.00E-04
Epoch:12, Train_acc:99.4%, Train_loss:0.037, Test_acc:84.0%, Test_loss:1.188, Lr:1.00E-04
Epoch:13, Train_acc:99.9%, Train_loss:0.004, Test_acc:88.4%, Test_loss:0.610, Lr:1.00E-04
Epoch:14, Train_acc:99.4%, Train_loss:0.014, Test_acc:86.7%, Test_loss:0.561, Lr:1.00E-04
Epoch:15, Train_acc:99.9%, Train_loss:0.040, Test_acc:86.7%, Test_loss:0.612, Lr:1.00E-04
Epoch:16, Train_acc:98.1%, Train_loss:0.067, Test_acc:88.9%, Test_loss:0.649, Lr:1.00E-04
Epoch:17, Train_acc:99.4%, Train_loss:0.038, Test_acc:84.9%, Test_loss:0.815, Lr:1.00E-04
Epoch:18, Train_acc:99.6%, Train_loss:0.043, Test_acc:88.9%, Test_loss:0.650, Lr:1.00E-04
Epoch:19, Train_acc:100.0%, Train_loss:0.001, Test_acc:90.7%, Test_loss:0.555, Lr:1.00E-04
Epoch:20, Train_acc:100.0%, Train_loss:0.000, Test_acc:90.7%, Test_loss:0.614, Lr:1.00E-04
Done
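
Once training is done, the saved weights can be reloaded for inference roughly like this (a sketch; it assumes best_model.pth was written by the cell above):

# reload the saved weights and run the model in evaluation mode
inference_model = model_k().to(device)
inference_model.load_state_dict(torch.load('./best_model.pth', map_location=device))
inference_model.eval()

with torch.no_grad():
    imgs, labels = next(iter(test_dl))
    preds = inference_model(imgs.to(device)).argmax(1)
print([classnames[p.item()] for p in preds[:5]])  # predicted class names for the first few images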

4. Plotting the results

import matplotlib.pyplot as plt 

# suppress warnings and configure matplotlib
import warnings
warnings.filterwarnings("ignore")               # ignore warning messages
plt.rcParams['font.sans-serif']    = ['SimHei'] # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False      # render the minus sign correctly
plt.rcParams['figure.dpi']         = 100        # figure resolution

x = range(epochs)
# create the figure
plt.figure(figsize=(12, 3))
# first subplot: accuracy
plt.subplot(1, 2, 1)
plt.plot(x, train_acc, label='Train Accuracy')
plt.plot(x, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title("Train and test Accuracy")
# second subplot: loss
plt.subplot(1, 2, 2)
plt.plot(x, train_loss, label='Train loss')
plt.plot(x, test_loss, label='Test loss')
plt.legend(loc='upper right')
plt.title("Train and test Loss")

plt.show()

[Figure: training and test accuracy / loss curves]
