系列文章目录

Task1
Task2 (loading…)
Task3 (loading…)




前言

本次夏令营以2024大运河杯数据开发应用创新大赛城市治理赛道为实践基础,通过代码讲解与技术分享,学习如何开发一套智能识别系统,在自动检测和分类摄像头捕获的视频中识别和分类城市管理中的违规行。该任务通过自动准确识别违规行为,并及时向管理部门发出告警,来提高城市管理的效率,具有一定社会价值。

本篇文章主要是记录和分享自己在夏令营中的学习过程和遇到的困难


一、跑通Baseline

step 1: 环境搭建

这里可以参考Datawhale官方文件:https://linklearner.com/activity/16/16/49
文件可能需要登陆后学习,里面有详细的使用指导,关于如何租用GPU,如何选择环境等等。

step 2: 安装所需依赖库和工具,准备数据集

!/opt/miniconda/bin/pip install opencv-python pandas matplotlib ultralytics
import os, sys
import cv2, glob, json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
!apt install zip unzip -y
!apt install unar -y
!wget "https://comp-public-prod.obs.cn-east-3.myhuaweicloud.com/dataset/2024/%E8%AE%AD%E7%BB%83%E9%9B%86%28%E6%9C%89%E6%A0%87%E6%B3%A8%E7%AC%AC%E4%B8%80%E6%89%B9%29.zip?AccessKeyId=583AINLNMLDRFK7CC1YM&Expires=1739168844&Signature=9iONBSJORCS8UNr2m/VZnc7yYno%3D" -O 训练集\(有标注第一批\).zip
!unar -q 训练集\(有标注第一批\).zip

!wget "https://comp-public-prod.obs.cn-east-3.myhuaweicloud.com/dataset/2024/%E6%B5%8B%E8%AF%95%E9%9B%86.zip?AccessKeyId=583AINLNMLDRFK7CC1YM&Expires=1739168909&Signature=CRsB54VqOtrzIdUHC3ay0l2ZGNw%3D" -O 测试集.zip
!unar -q 测试集.zip

step 3: 数据读取与处理

train_anno = json.load(open('训练集(有标注第一批)/标注/45.json', encoding='utf-8'))
train_anno[0], len(train_anno)
pd.read_json('训练集(有标注第一批)/标注/45.json')
video_path = '训练集(有标注第一批)/视频/45.mp4'
cap = cv2.VideoCapture(video_path)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    break    
frame.shape
int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
bbox = [746, 494, 988, 786]
pt1 = (bbox[0], bbox[1])
pt2 = (bbox[2], bbox[3])
color = (0, 255, 0) 
thickness = 2 
cv2.rectangle(frame, pt1, pt2, color, thickness)
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
plt.imshow(frame)
if not os.path.exists('yolo-dataset/'):
    os.mkdir('yolo-dataset/')
if not os.path.exists('yolo-dataset/train'):
    os.mkdir('yolo-dataset/train')
if not os.path.exists('yolo-dataset/val'):
    os.mkdir('yolo-dataset/val')

dir_path = os.path.abspath('./') + '/'

with open('yolo-dataset/yolo.yaml', 'w', encoding='utf-8') as up:
    up.write(f'''
path: {dir_path}/yolo-dataset/
train: train/
val: val/
names:
    0: 非机动车违停
    1: 机动车违停
    2: 垃圾桶满溢
    3: 违法经营
''')
train_annos = glob.glob('训练集(有标注第一批)/标注/*.json')
train_videos = glob.glob('训练集(有标注第一批)/视频/*.mp4')
train_annos.sort()
train_videos.sort()
category_labels = ["非机动车违停", "机动车违停", "垃圾桶满溢", "违法经营"]
for anno_path, video_path in zip(train_annos[:5], train_videos[:5]):
    print(video_path)
    anno_df = pd.read_json(anno_path)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0 
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        img_height, img_width = frame.shape[:2]
        frame_anno = anno_df[anno_df['frame_id'] == frame_idx]
        cv2.imwrite('./yolo-dataset/train/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.jpg', frame)

        if len(frame_anno) != 0:
            with open('./yolo-dataset/train/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.txt', 'w') as up:
                for category, bbox in zip(frame_anno['category'].values, frame_anno['bbox'].values):
                    category_idx = category_labels.index(category)
                    x_min, y_min, x_max, y_max = bbox
                    x_center = (x_min + x_max) / 2 / img_width
                    y_center = (y_min + y_max) / 2 / img_height
                    width = (x_max - x_min) / img_width
                    height = (y_max - y_min) / img_height
                    if x_center > 1:
                        print(bbox)
                    up.write(f'{category_idx} {x_center} {y_center} {width} {height}\n')
        
        frame_idx += 1
for anno_path, video_path in zip(train_annos[-3:], train_videos[-3:]):
    print(video_path)
    anno_df = pd.read_json(anno_path)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0 
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        img_height, img_width = frame.shape[:2]        
        frame_anno = anno_df[anno_df['frame_id'] == frame_idx]
        cv2.imwrite('./yolo-dataset/val/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.jpg', frame)

        if len(frame_anno) != 0:
            with open('./yolo-dataset/val/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.txt', 'w') as up:
                for category, bbox in zip(frame_anno['category'].values, frame_anno['bbox'].values):
                    category_idx = category_labels.index(category)                   
                    x_min, y_min, x_max, y_max = bbox
                    x_center = (x_min + x_max) / 2 / img_width
                    y_center = (y_min + y_max) / 2 / img_height
                    width = (x_max - x_min) / img_width
                    height = (y_max - y_min) / img_height
                    up.write(f'{category_idx} {x_center} {y_center} {width} {height}\n')
        
        frame_idx += 1

step 4: 训练模型并测试效果

!wget http://mirror.coggle.club/yolo/yolov8n-v8.2.0.pt -O yolov8n.pt
!mkdir -p ~/.config/Ultralytics/
!wget http://mirror.coggle.club/yolo/Arial.ttf -O ~/.config/Ultralytics/Arial.ttf
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
results = model.train(data="yolo-dataset/yolo.yaml", epochs=2, imgsz=1080, batch=16)
category_labels = ["非机动车违停", "机动车违停", "垃圾桶满溢", "违法经营"]
if not os.path.exists('result/'):
    os.mkdir('result')
from ultralytics import YOLO
model = YOLO("runs/detect/train/weights/best.pt")
import glob

for path in glob.glob('测试集/*.mp4'):
    submit_json = []
    results = model(path, conf=0.05, imgsz=1080,  verbose=False)
    for idx, result in enumerate(results):
        boxes = result.boxes  # Boxes object for bounding box outputs
        masks = result.masks  # Masks object for segmentation masks outputs
        keypoints = result.keypoints  # Keypoints object for pose outputs
        probs = result.probs  # Probs object for classification outputs
        obb = result.obb  # Oriented boxes object for OBB outputs
        if len(boxes.cls) == 0:
            continue
        xywh = boxes.xyxy.data.cpu().numpy().round()
        cls = boxes.cls.data.cpu().numpy().round()
        conf = boxes.conf.data.cpu().numpy()
        for i, (ci, xy, confi) in enumerate(zip(cls, xywh, conf)):
            submit_json.append(
                {
                    'frame_id': idx,
                    'event_id': i+1,
                    'category': category_labels[int(ci)],
                    'bbox': list([int(x) for x in xy]),
                    "confidence": float(confi)
                }
            )
    with open('./result/' + path.split('/')[-1][:-4] + '.json', 'w', encoding='utf-8') as up:
        json.dump(submit_json, up, indent=4, ensure_ascii=False)
!\rm result/.ipynb_checkpoints/ -rf
!\rm result.zip
!zip -r result.zip result/

二、学习与相关问题

1. 重难点学习

 
#Datawhale #AI夏令营 #针对城市管理中违规行为的智能识别系统——YOLO解决方案-LMLPHP

2. 问题与初步解答


问题1:为什么代码中出现了两次定义类别标签列表?


问题2:为什么下载依赖库时出现如下报错:

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

问题3:为什么训练模型时出现如下报错:

WARNING: ⚠️ imgsz=[1080] must be multiple of max stride 32, updating to [1088]

问题4:为什么测试模型时出现如下报错:

WARNING: ⚠️ inference results will accumulate in RAM unless `stream=True` is passed, causing potential out-of-memory
errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

总结

只是本篇针对Task1的总结,系列总结会在第三篇

以上就是对本赛题baseline的详细讲解,也进行了一定的学习与问题分享,后续会继续进阶学习,优化模型性能,持续分享。

08-30 01:02