PyTorch object detection with GPU on Ubuntu 18.04 - RuntimeError: CUDA out of memory. Tried to allocate xx.xx MiB


Question

I'm attempting to get this PyTorch person detection example:

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

running locally with a GPU, either in a Jupyter Notebook or a regular Python file. I get the error in the title either way.

I'm using Ubuntu 18.04. Here is a summary of the steps I've performed:

1) Stock Ubuntu 18.04 install on a Lenovo ThinkPad X1 Extreme Gen 2 with a GTX 1650 GPU.

2) Performed a standard CUDA 10.0 / cuDNN 7.4 install. I'd rather not restate all the steps, as this post is going to be more than long enough already. This is a standard procedure; I followed what pretty much any link found via Googling describes.

3) Installed torch and torchvision:

pip3 install torch torchvision
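A quick way to confirm what this install actually provides is to ask PyTorch itself. The following is a minimal sketch, not part of the original post: the function name `describe_torch_env` is made up for illustration, and it only reports values rather than diagnosing anything. Note that pip wheels typically bundle their own CUDA runtime, so the version reported here may differ from the system-wide CUDA toolkit installed in step 2.

```python
def describe_torch_env():
    """Return a small dict describing the installed PyTorch build and GPU.

    Hypothetical helper for illustration; it degrades gracefully when
    torch is not installed instead of raising.
    """
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        # total_memory is reported in bytes; convert to MiB
        info["gpu_total_mib"] = props.total_memory // (1024 ** 2)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```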

4) From this link on the PyTorch site:

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

I've both saved the linked notebook:

https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb

and also tried the link at the bottom that has the regular Python file:

https://pytorch.org/tutorials/_static/tv-training-code.py

5) Before running either the notebook or the regular Python file, I did the following (found at the top of the above linked notebook):

Install the COCO API into Python:

cd ~
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI

Open the Makefile in gedit, change the two instances of "python" to "python3", then:

python3 setup.py build_ext --inplace
sudo python3 setup.py install
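The manual Makefile edit described above can also be scripted. This is a minimal sketch, assuming the Makefile invokes the interpreter as the bare word "python"; the helper names `use_python3` and `patch_makefile` are hypothetical and not part of the COCO API.

```python
import re
from pathlib import Path


def use_python3(makefile_text):
    """Replace the standalone word 'python' with 'python3'.

    The word boundaries leave 'pycocotools', 'Python' and an already
    patched 'python3' untouched, so the function is safe to apply twice.
    """
    return re.sub(r"\bpython\b", "python3", makefile_text)


def patch_makefile(path):
    """Rewrite the Makefile at `path` in place (hypothetical helper)."""
    makefile = Path(path)
    makefile.write_text(use_python3(makefile.read_text()))


# Example call, run from ~/cocoapi/PythonAPI:
#     patch_makefile("Makefile")
```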

Get the files that the above linked files need in order to run:

cd ~
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.5.0

From ~/vision/references/detection, copy coco_eval.py, coco_utils.py, engine.py, transforms.py, and utils.py to whichever directory the above linked notebook or tv-training-code.py file is being run from.
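The copy step above could be sketched in Python as follows. The five file names come from the post; the function name `copy_detection_helpers` and the choice to fail loudly on a missing file are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# The helper modules named in the post.
HELPER_FILES = [
    "coco_eval.py",
    "coco_utils.py",
    "engine.py",
    "transforms.py",
    "utils.py",
]


def copy_detection_helpers(src_dir, dst_dir):
    """Copy the detection helper modules from src_dir into dst_dir.

    Raises FileNotFoundError if any expected file is missing, so a wrong
    checkout or path is noticed before training starts.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in HELPER_FILES:
        source = src_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"missing helper file: {source}")
        shutil.copy2(source, dst_dir / name)
        copied.append(name)
    return copied


# Example call, assuming the clone location used above:
#     copy_detection_helpers(Path.home() / "vision/references/detection", ".")
```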

6) Download the Penn Fudan Pedestrian dataset from the link on the above page:

https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip

then unzip it and put it in the same directory as the notebook or tv-training-code.py.

In case the above link ever breaks, or just for easier reference, here is tv-training-code.py as I have downloaded it at this time:

# Sample code from the TorchVision 0.3 Object Detection Finetuning Tutorial
# http://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

import os
import numpy as np
import torch
from PIL import Image

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

from engine import train_one_epoch, evaluate
import utils
import transforms as T


class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)

        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)

def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model


def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)


def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset in train and test set
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)

    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    print("That's it!")

if __name__ == "__main__":
    main()
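As a sanity check on the data split in main(), the number of training iterations per epoch can be worked out by hand. This sketch assumes the Penn Fudan dataset contains 170 images, which is consistent with the `[ 0/60]` counter in the training output; the function name is made up for illustration.

```python
import math


def iterations_per_epoch(num_images, num_test, batch_size):
    """Batches per epoch when the last num_test images are held out.

    Uses ceil because DataLoader keeps a final partial batch by default.
    """
    num_train = num_images - num_test
    return math.ceil(num_train / batch_size)


NUM_IMAGES = 170  # assumption: number of files in PennFudanPed/PNGImages

# batch_size=2 as in the script, then batch_size=1 as also tried in the post
print(iterations_per_epoch(NUM_IMAGES, num_test=50, batch_size=2))
print(iterations_per_epoch(NUM_IMAGES, num_test=50, batch_size=1))
```

Halving the batch size doubles the number of iterations but does not change the size of each individual image, which may be why it did not help here.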

Here is the output of a run of tv-training-code.py:

$ python3 tv-training-code.py
Epoch: [0]  [ 0/60]  eta: 0:01:17  lr: 0.000090  loss: 4.1717 (4.1717)  loss_classifier: 0.8903 (0.8903)  loss_box_reg: 0.1379 (0.1379)  loss_mask: 3.0632 (3.0632)  loss_objectness: 0.0700 (0.0700)  loss_rpn_box_reg: 0.0104 (0.0104)  time: 1.2864  data: 0.1173  max mem: 1865
Traceback (most recent call last):
  File "tv-training-code.py", line 165, in <module>
    main()
  File "tv-training-code.py", line 156, in main
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
  File "/xxx/PennFudanExample/engine.py", line 46, in train_one_epoch
    losses.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 189, in wrapper
    outputs = fn(ctx, *args)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/ops/roi_align.py", line 38, in backward
    output_size[0], output_size[1], bs, ch, h, w, sampling_ratio)
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fdfb6c9b813 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1ce68 (0x7fdfb6edce68 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1de6e (0x7fdfb6edde6e in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #3: at::native::empty_cuda(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x279 (0x7fdf59472789 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[many more frame lines omitted]

Clearly, this line:

RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)

is the critical error.
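To make the numbers in that message easier to compare, they can be pulled out and converted to a common unit. This is a minimal sketch: the regular expressions are written against the exact wording of the message quoted above and may not match messages from other PyTorch versions, and `parse_oom` is a hypothetical helper name.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; "
    "3.81 GiB total capacity; 2.36 GiB already allocated; "
    "132.69 MiB free; 310.59 MiB cached)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# One pattern per field, keyed by a short name.
FIELD_PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"([\d.]+) (MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (MiB|GiB) already allocated",
    "free": r"([\d.]+) (MiB|GiB) free",
    "cached": r"([\d.]+) (MiB|GiB) cached",
}


def parse_oom(message):
    """Return the memory figures from an OOM message, all in MiB."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            result[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return result


if __name__ == "__main__":
    for name, mib in parse_oom(OOM_MESSAGE).items():
        print(f"{name:>9}: {mib:8.2f} MiB")
```

In this message the request (132.00 MiB) is only slightly smaller than the reported free memory (132.69 MiB), so there is essentially no headroom left at the point of failure.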

If I run nvidia-smi before a run:

$ nvidia-smi
Tue Dec 24 14:32:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   47C    P8     5W /  N/A |    296MiB /  3903MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1190      G   /usr/lib/xorg/Xorg                           142MiB |
|    0      1830      G   /usr/bin/gnome-shell                          72MiB |
|    0      3711      G   ...uest-channel-token=14371934934688572948    78MiB |
+-----------------------------------------------------------------------------+

It seems pretty clear there is plenty of GPU memory available (this GPU has 4 GB).
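The headroom before the run can be computed from the Memory-Usage field in the table above. This sketch parses the row text copied from that output rather than calling nvidia-smi, and `parse_memory_usage` is a hypothetical helper name.

```python
import re

# Row copied from the nvidia-smi output above.
ROW = "| N/A   47C    P8     5W /  N/A |    296MiB /  3903MiB |      3%      Default |"


def parse_memory_usage(smi_row):
    """Return (used, total, free) in MiB from an nvidia-smi table row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", smi_row)
    if match is None:
        raise ValueError("no 'used / total' memory field found")
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, total - used


used, total, free = parse_memory_usage(ROW)
print(f"used {used} MiB, total {total} MiB, free {free} MiB")
# Convert the total to GiB to compare with the figure in the error message.
print(f"total = {total / 1024:.2f} GiB")
```

The desktop processes (Xorg, gnome-shell and a browser) already hold 296 MiB, so the training process starts with roughly 3.5 GiB rather than the full 4 GB.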

Moreover, I'm confident my CUDA/cuDNN install and GPU hardware are good because I frequently train and run inference with the TensorFlow Object Detection API on this computer, and as long as I use the allow_growth option I never have GPU-related errors.

From Googling this error, it seems to be relatively common. The most common solutions are:

1) Try a smaller batch size (not really applicable in this case, since the training and testing batch sizes are 2 and 1 respectively, and I tried 1 and 1 and still got the same error).

2) Update to the latest version of PyTorch (but I'm already on the latest version).

Some other suggestions involve reworking the training script. I'm very familiar with TensorFlow but I'm new to PyTorch, so I'm not sure how to go about that. Also, most of the rework suggestions I can find for this error do not pertain to object detection, so I'm not able to relate them to this training script specifically.
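One detection-specific rework that might be worth exploring is the input resolution. As far as I understand torchvision's detection models, each image is rescaled before the forward pass so that its shorter side matches a `min_size` parameter (800 by default) unless the longer side would then exceed `max_size` (1333 by default), and both can be passed as keyword arguments when constructing the model. The sketch below only reproduces that scaling rule in plain Python to show how the pixel count changes; the 500x400 image and the smaller sizes are hypothetical, and this is not a confirmed fix for the error above.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Approximate the (height, width) an image is rescaled to.

    Assumption: shorter side is scaled to min_size, capped so that the
    longer side does not exceed max_size.
    """
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return int(height * scale), int(width * scale)


def pixel_count(shape):
    return shape[0] * shape[1]


default_shape = resized_shape(500, 400)
smaller_shape = resized_shape(500, 400, min_size=400, max_size=667)
ratio = pixel_count(default_shape) / pixel_count(smaller_shape)
print(default_shape, smaller_shape, f"{ratio:.1f}x fewer pixels")
```

Since activation memory in the backbone grows roughly with the number of input pixels, a smaller `min_size` would be expected to lower peak memory, at some cost in detection accuracy.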

Has anybody else gotten this script to run locally with an NVIDIA GPU? Do you suspect an OS/CUDA/PyTorch configuration problem, or is there some way the script can be reworked to prevent this error? Any assistance would be greatly appreciated.


Recommended answer