Problem description
I'm trying to get this PyTorch person detection example:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
running locally with a GPU, either in a Jupyter Notebook or as a regular Python file. I get the error in the title either way.
I'm using Ubuntu 18.04. Here is a summary of the steps I've performed:
1) Stock Ubuntu 18.04 install on a Lenovo ThinkPad X1 Extreme Gen 2 with a GTX 1650 GPU.
2) Performed a standard CUDA 10.0 / cuDNN 7.4 install. I'd rather not restate all the steps, as this post is going to be more than long enough already. This is a standard procedure; I followed the same steps that pretty much any link found via Googling describes.
3) Installed torch and torchvision:
pip3 install torch torchvision
4) From this link on the PyTorch site:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
I've both saved the linked notebook and tried the link at the bottom of that page to the regular Python file:
https://pytorch.org/tutorials/_static/tv-training-code.py
5) Before running either the notebook or the regular Python file, I did the following (found at the top of the above linked notebook):
Install the COCO API into Python:
cd ~
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
Open Makefile in gedit, change the two instances of "python" to "python3", then:
python3 setup.py build_ext --inplace
sudo python3 setup.py install
Get the files that the above linked files need in order to run:
cd ~
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.5.0
From ~/vision/references/detection, copy coco_eval.py, coco_utils.py, engine.py, transforms.py, and utils.py to whichever directory the above linked notebook or tv-training-code.py file is being run from.
6) Downloaded the Penn-Fudan pedestrian dataset from the link on the above page:
https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip
then unzipped it and put it in the same directory as the notebook or tv-training-code.py.
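Before launching a run, the setup steps above can be sanity-checked with a small script. This is only a sketch: the function names `find_missing_paths` and `cuda_status` are made up for illustration, and the expected layout (the five helper files plus a `PennFudanPed` folder containing `PNGImages` and `PedMasks`) is taken from the steps and the training script quoted in this post.

```python
from pathlib import Path

# Helper files copied from ~/vision/references/detection in step 5.
HELPER_FILES = ("coco_eval.py", "coco_utils.py", "engine.py", "transforms.py", "utils.py")
# Sub-folders that the training script reads from the unzipped dataset.
DATASET_SUBDIRS = ("PNGImages", "PedMasks")


def find_missing_paths(work_dir="."):
    """Return the expected files and folders that are absent from work_dir."""
    work_dir = Path(work_dir)
    expected = [work_dir / name for name in HELPER_FILES]
    expected += [work_dir / "PennFudanPed" / sub for sub in DATASET_SUBDIRS]
    return [str(path) for path in expected if not path.exists()]


def cuda_status():
    """Describe whether the installed torch build can see a CUDA device."""
    try:
        import torch  # imported lazily so the file check works without torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "torch {} sees no CUDA device".format(torch.__version__)
    return "torch {} sees {}".format(torch.__version__, torch.cuda.get_device_name(0))


if __name__ == "__main__":
    missing = find_missing_paths()
    print("missing:", missing if missing else "nothing")
    print(cuda_status())
```

Running it from the directory that holds the notebook or tv-training-code.py should report nothing missing if steps 5 and 6 were completed.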
In case the above link ever breaks, or just for easier reference, here is tv-training-code.py as I downloaded it at this time:
# Sample code from the TorchVision 0.3 Object Detection Finetuning Tutorial
# http://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

import os
import numpy as np
import torch
from PIL import Image

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

from engine import train_one_epoch, evaluate
import utils
import transforms as T


class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images ad masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)

        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)


def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model


def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)


def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset in train and test set
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)

    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    print("That's it!")


if __name__ == "__main__":
    main()
Here is the output of running tv-training-code.py:
$ python3 tv-training-code.py
Epoch: [0] [ 0/60] eta: 0:01:17 lr: 0.000090 loss: 4.1717 (4.1717) loss_classifier: 0.8903 (0.8903) loss_box_reg: 0.1379 (0.1379) loss_mask: 3.0632 (3.0632) loss_objectness: 0.0700 (0.0700) loss_rpn_box_reg: 0.0104 (0.0104) time: 1.2864 data: 0.1173 max mem: 1865
Traceback (most recent call last):
File "tv-training-code.py", line 165, in <module>
main()
File "tv-training-code.py", line 156, in main
train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
File "/xxx/PennFudanExample/engine.py", line 46, in train_one_epoch
losses.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 77, in apply
return self._forward_cls.backward(self, *args)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 189, in wrapper
outputs = fn(ctx, *args)
File "/usr/local/lib/python3.6/dist-packages/torchvision/ops/roi_align.py", line 38, in backward
output_size[0], output_size[1], bs, ch, h, w, sampling_ratio)
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fdfb6c9b813 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1ce68 (0x7fdfb6edce68 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1de6e (0x7fdfb6edde6e in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #3: at::native::empty_cuda(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x279 (0x7fdf59472789 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[many more frame lines omitted]
Clearly, this line:
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
is the critical error.
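To make the figures in that message easier to compare, here is a minimal sketch that pulls them out and converts everything to MiB. The function name `parse_oom_message` is hypothetical, and the regular expression is tied to the wording of this particular message; other PyTorch versions may phrase it differently.

```python
import re

# One "<number> <unit>" size field, e.g. "132.00 MiB" or "3.81 GiB".
_SIZE = r"([\d.]+) ([KMG]iB)"

# Pattern written against the exact wording of the error quoted above.
OOM_PATTERN = re.compile(
    r"Tried to allocate " + _SIZE
    + r" \(GPU (\d+); " + _SIZE + r" total capacity; "
    + _SIZE + r" already allocated; "
    + _SIZE + r" free; "
    + _SIZE + r" cached\)"
)

_TO_MIB = {"KiB": 1 / 1024, "MiB": 1, "GiB": 1024}


def _mib(value, unit):
    """Convert a size such as ('3.81', 'GiB') to a float number of MiB."""
    return float(value) * _TO_MIB[unit]


def parse_oom_message(message):
    """Return the sizes reported in a CUDA out-of-memory message, in MiB."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected OOM wording")
    g = match.groups()
    return {
        "gpu": int(g[2]),
        "requested": _mib(g[0], g[1]),
        "total": _mib(g[3], g[4]),
        "allocated": _mib(g[5], g[6]),
        "free": _mib(g[7], g[8]),
        "cached": _mib(g[9], g[10]),
    }


if __name__ == "__main__":
    message = (
        "RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB "
        "(GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; "
        "132.69 MiB free; 310.59 MiB cached)"
    )
    for name, value in parse_oom_message(message).items():
        print(name, value)
```

Read this way, the message says that roughly 2.36 GiB of the 3.81 GiB card was already allocated at the moment of failure, which leaves little headroom for the 132 MiB request.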
If I run nvidia-smi before a run:
$ nvidia-smi
Tue Dec 24 14:32:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 Off | 00000000:01:00.0 On | N/A |
| N/A 47C P8 5W / N/A | 296MiB / 3903MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1190 G /usr/lib/xorg/Xorg 142MiB |
| 0 1830 G /usr/bin/gnome-shell 72MiB |
| 0 3711 G ...uest-channel-token=14371934934688572948 78MiB |
+-----------------------------------------------------------------------------+
It seems pretty clear that there is plenty of GPU memory available (this GPU has 4 GB).
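The idle headroom can be read off the nvidia-smi table with a few lines of Python. This is a rough sketch with a hypothetical helper name, `parse_memory_usage`, that looks only at the first `used / total` memory field in the text it is given.

```python
import re


def parse_memory_usage(nvidia_smi_text):
    """Return (used_mib, total_mib) from the first 'NNNMiB / NNNMiB' field."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", nvidia_smi_text)
    if match is None:
        raise ValueError("no 'used / total' memory field found")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # The memory row from the nvidia-smi output above.
    row = "| N/A   47C    P8     5W /  N/A |    296MiB /  3903MiB |      3%      Default |"
    used, total = parse_memory_usage(row)
    print("free before the run (MiB):", total - used)  # 3903 - 296 = 3607
```

Note that this figure only describes the GPU while idle. The error message quoted earlier reports 2.36 GiB already allocated at the point of failure, so a snapshot taken before the run does not show how much memory training itself needs.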
Moreover, I'm confident my CUDA/cuDNN install and GPU hardware are good because I frequently run training and inference with the TensorFlow Object Detection API on this computer, and as long as I use the allow_growth option I never have GPU-related errors.
From Googling this error, it seems to be relatively common. The most common solutions are:
1) Try a smaller batch size (not really applicable in this case, since the training and testing batch sizes are 2 and 1 respectively, and I tried 1 and 1 and still got the same error).
2) Update to the latest version of PyTorch (but I'm already at the latest version).
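For reference, the batch-size suggestion could be applied to this script roughly as follows, together with one further memory-saving knob: a smaller input resolution. This is a sketch under assumptions, not something tested in this post. The names `LOW_MEMORY_SETTINGS`, `build_low_memory_model`, and `build_low_memory_loaders` are hypothetical, and the 512/853 sizes are arbitrary illustrative values. `min_size` and `max_size` are keyword arguments of torchvision's Mask R-CNN (defaults 800 and 1333); shrinking them lowers activation memory but may cost accuracy.

```python
# Illustrative values only; the sizes are assumptions, not tuned settings.
LOW_MEMORY_SETTINGS = {
    "train_batch_size": 1,  # the script uses 2
    "test_batch_size": 1,   # unchanged from the script
    "num_workers": 4,       # unchanged; workers use host RAM, not GPU memory
    "min_size": 512,        # torchvision default is 800
    "max_size": 853,        # torchvision default is 1333
}


def build_low_memory_model(settings=LOW_MEMORY_SETTINGS):
    """Create the pre-trained Mask R-CNN with a smaller input resolution."""
    import torchvision  # imported lazily so the settings can be inspected without it

    return torchvision.models.detection.maskrcnn_resnet50_fpn(
        pretrained=True,
        min_size=settings["min_size"],
        max_size=settings["max_size"],
    )


def build_low_memory_loaders(dataset, dataset_test, collate_fn,
                             settings=LOW_MEMORY_SETTINGS):
    """Create the train and test loaders with the reduced batch sizes."""
    import torch

    train_loader = torch.utils.data.DataLoader(
        dataset, batch_size=settings["train_batch_size"], shuffle=True,
        num_workers=settings["num_workers"], collate_fn=collate_fn)
    test_loader = torch.utils.data.DataLoader(
        dataset_test, batch_size=settings["test_batch_size"], shuffle=False,
        num_workers=settings["num_workers"], collate_fn=collate_fn)
    return train_loader, test_loader
```

In the script above, `build_low_memory_model` would stand in for the `maskrcnn_resnet50_fpn(pretrained=True)` call inside `get_model_instance_segmentation`, and `utils.collate_fn` would be passed as `collate_fn`.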
Some other suggestions involve reworking the training script. I'm very familiar with TensorFlow, but I'm new to PyTorch, so I'm not sure how to go about that. Also, most of the rework suggestions I can find for this error do not pertain to object detection, so I'm not able to relate them to this training script specifically.
Has anybody else gotten this script to run locally with an NVIDIA GPU? Do you suspect an OS/CUDA/PyTorch configuration problem, or is there some way the script can be reworked to prevent this error? Any assistance would be greatly appreciated.
Recommended answer
Very strange: after changing both the training and testing batch sizes to 1, it no longer crashes with a GPU error. This is odd, since I'm certain I tried this before.
Perhaps it had something to do with changing the batch size to 1 for both training and testing and then rebooting, or somehow refreshing something else? I'm not really sure.
Now the evaluate function call is crashing with the error:
object of type <class 'numpy.float64'> cannot be safely interpreted as an integer.
But this seems to be completely unrelated, so I'll make a separate post for it.