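I get the following error when trying to train this on this dataset.
Since this is the configuration published in the paper, I assume I am doing something very wrong.
The error shows up on a different image every time I try to run training.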
C:/w/1/s/windows/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\pydevd.py", line 1741, in <module>
main()
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Noam/Code/vision_course/hopenet/deep-head-pose/code/original_code_augmented/train_hopenet_with_validation_holdout.py", line 187, in <module>
loss_reg_yaw = reg_criterion(yaw_predicted, label_yaw_cont)
File "C:\Noam\Code\vision_course\hopenet\venv\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "C:\Noam\Code\vision_course\hopenet\venv\lib\site-packages\torch\nn\modules\loss.py", line 431, in forward
return F.mse_loss(input, target, reduction=self.reduction)
File "C:\Noam\Code\vision_course\hopenet\venv\lib\site-packages\torch\nn\functional.py", line 2204, in mse_loss
ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
RuntimeError: reduce failed to synchronize: cudaErrorAssert: device-side assert triggered
Any ideas?
Best answer
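This kind of error generally occurs when you use NLLLoss or CrossEntropyLoss and your dataset contains negative labels, or labels greater than or equal to the number of classes. That is exactly the error you are getting: the assertion t >= 0 && t < n_classes failed.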
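The condition in that assertion can be checked on the CPU before any batch reaches the GPU. The following is a minimal sketch, not code from the repository in the question: `find_invalid_labels` is a hypothetical helper, and the label list and class count are made up for illustration.

```python
from typing import Iterable, List, Tuple


def find_invalid_labels(labels: Iterable[int], n_classes: int) -> List[Tuple[int, int]]:
    """Return (index, label) pairs that violate 0 <= label < n_classes."""
    return [(i, t) for i, t in enumerate(labels) if not (0 <= t < n_classes)]


# Hypothetical labels for a 4-class problem; -1 and 4 are out of range.
labels = [0, 3, -1, 2, 4, 1]
bad = find_invalid_labels(labels, n_classes=4)
print(bad)  # [(2, -1), (4, 4)]
```

With a PyTorch tensor of class labels, something like `labels.view(-1).tolist()` could be passed in. Running this once over the whole dataset should show which samples produce the offending labels.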
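This does not happen with MSELoss, but OP mentions that there is a CrossEntropyLoss somewhere in the code, and that is where the error comes from. CUDA reports the failure asynchronously, so the program crashes on some other line, here the MSE loss call. The solution is to clean the dataset and make sure that t >= 0 && t < n_classes holds, where t is the label.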
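Because the failure is reported asynchronously, the traceback can point at an unrelated line. One common way to get a traceback at the call that actually failed is to force synchronous kernel launches through the `CUDA_LAUNCH_BLOCKING` environment variable. The snippet below is a sketch that assumes it sits at the very top of the training script:

```python
import os

# Make CUDA kernel launches synchronous so a device-side assert is raised
# at the Python line that triggered it. This must run before CUDA is
# initialised, so it is placed before `import torch`.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

This slows training down, so it is only meant for debugging. Running a few batches on the CPU instead usually gives a more readable error as well.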
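Also, check that the network output matches what the loss expects. If you use BCELoss, the output must be in the range 0 to 1, which requires a sigmoid activation. If you use NLLLoss, the output must be log-probabilities, which requires a log_softmax activation rather than a plain softmax. Neither is needed for CrossEntropyLoss or BCEWithLogitsLoss, because they apply the activation inside the loss function. (Thanks to @PouyaB for pointing this out.)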
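To make the relationship between these losses concrete, here is a plain-Python sketch of the math for a single sample. It is an illustration only, not PyTorch's implementation (the real losses work on batches and use numerically stable formulas), and all function names are hypothetical. Note that the input to `nll` is a log-probability, which is what PyTorch's NLLLoss is documented to expect.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Log-probabilities from raw scores; every value is <= 0."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def nll(log_probs: List[float], target: int) -> float:
    """Negative log-likelihood: picks the entry at index `target`."""
    if not 0 <= target < len(log_probs):
        # The same condition as the failing assertion: 0 <= t < n_classes.
        raise ValueError(f"target {target} is outside [0, {len(log_probs)})")
    return -log_probs[target]


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross entropy on raw logits = log_softmax followed by NLL."""
    return nll(log_softmax(logits), target)


def sigmoid(x: float) -> float:
    """Squash a raw score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob: float, target: float) -> float:
    """Binary cross entropy; `prob` must lie strictly between 0 and 1."""
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross entropy on a raw logit = sigmoid followed by BCE."""
    return bce(sigmoid(logit), target)


logits = [2.0, 1.0, 0.1]
print(cross_entropy(logits, 0))   # loss for class 0
print(bce_with_logits(0.0, 1.0))  # equals log(2)
```

The target is used as an index into the class scores, which is why a negative label or one that is at least the number of classes cannot work.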