I am working on a cluster (similar to Slurm, but running Condor) and I want to run my code with VS Code (specifically for its debugger), using its remote sync extension.

I tried running it with the debugger in VS Code, but it did not work as expected.

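First, I logged in to the cluster with VS Code and remote sync as usual, and that worked fine. Then I requested an interactive job with the following command: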

condor_submit -i request_cpus=4 request_gpus=1

After that, I had a node/GPU available to use.

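With that in place, I tried to run the debugger, but it somehow logged me out of the remote session (judging by the print statements, it went to the head node). That is not what I want. I want to run my job in the interactive session, on the node/GPU I was allocated. Why is VS Code running it in the wrong place? How do I run it in the right place?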

Some output from the integrated terminal:
source /home/miranda9/miniconda3/envs/automl-meta-learning/bin/activate
/home/miranda9/miniconda3/envs/automl-meta-learning/bin/python /home/miranda9/.vscode-server/extensions/ms-python.python-2020.2.60897-dev/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /home/miranda9/automl-meta-learning/automl/automl/meta_optimizers/differentiable_SGD.py
conda activate base
(automl-meta-learning) miranda9~/automl-meta-learning $ source /home/miranda9/miniconda3/envs/automl-meta-learning/bin/activate
(automl-meta-learning) miranda9~/automl-meta-learning $ /home/miranda9/miniconda3/envs/automl-meta-learning/bin/python /home/miranda9/.vscode-server/extensions/ms-python.python-2020.2.60897-dev/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /home/miranda9/automl-meta-learning/automl/automl/meta_optimizers/differentiable_SGD.py
--> main in differentiable SGD
hello world torch_utils!
vision-sched.cs.illinois.edu
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
-> initialization of DiMO done!

---> i = 0, iteration/it 1 about to start
lp_norms(mdl) = 18.43514633178711
lp_norms(meta_optimized mdl) = 18.43514633178711
[e=0,it=1], train_loss: 2.304989814758301, train error: -1, test loss: -1, test error: -1

---> i = 1, iteration/it 2 about to start
lp_norms(mdl) = 18.470401763916016
lp_norms(meta_optimized mdl) = 18.470401763916016
[e=0,it=2], train_loss: 2.3068909645080566, train error: -1, test loss: -1, test error: -1

---> i = 2, iteration/it 3 about to start
lp_norms(mdl) = 18.548133850097656
lp_norms(meta_optimized mdl) = 18.548133850097656
[e=0,it=3], train_loss: 2.3019633293151855, train error: -1, test loss: -1, test error: -1

---> i = 0, iteration/it 1 about to start
lp_norms(mdl) = 18.65604019165039
lp_norms(meta_optimized mdl) = 18.65604019165039
[e=1,it=1], train_loss: 2.308889150619507, train error: -1, test loss: -1, test error: -1

---> i = 1, iteration/it 2 about to start
lp_norms(mdl) = 18.441967010498047
lp_norms(meta_optimized mdl) = 18.441967010498047
[e=1,it=2], train_loss: 2.300947666168213, train error: -1, test loss: -1, test error: -1

---> i = 2, iteration/it 3 about to start
lp_norms(mdl) = 18.545459747314453
lp_norms(meta_optimized mdl) = 18.545459747314453
[e=1,it=3], train_loss: 2.30662202835083, train error: -1, test loss: -1, test error: -1
-> DiMO done training!
--> Done with Main
(automl-meta-learning) miranda9~/automl-meta-learning $ conda activate base
(automl-meta-learning) miranda9~/automl-meta-learning $ hostname
vision-sched.cs.illinois.edu
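
The hostname shown in the output above is vision-sched.cs.illinois.edu, not a separately allocated compute node. A minimal sketch for checking this from inside the script itself, using only the Python standard library. The function name describe_execution_host is hypothetical, and whether CUDA_VISIBLE_DEVICES is set depends on how the scheduler configures the job:

```python
import os
import socket


def describe_execution_host():
    """Return the host this process runs on and the GPUs exposed to it."""
    return {
        "hostname": socket.gethostname(),
        # May be set by the scheduler for a GPU job; None if it is unset.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    info = describe_execution_host()
    print(f"running on host: {info['hostname']}")
    print(f"CUDA_VISIBLE_DEVICES: {info['cuda_visible_devices']}")
```

Calling this at the top of the main script makes it easy to compare the host reported under the debugger with the host reported when the script is started by hand inside the interactive session.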

Cannot even run without debug mode

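The problem is worse than I thought. Not only can I not run the debugger in the interactive session, I cannot even use "Run Without Debugging" without VS Code switching to the Python Debug Console on its own. That means I have to run things manually with python main.py, but then I cannot use the variables pane... which is a big loss!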

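What I am doing is switching my terminal to the condor_ssh_to_job session and then clicking the Run Without Debugging button (or ^F5, i.e. Control + fn + F5). Even though I make sure the integrated terminal at the bottom of the window is in the interactive session, VS Code switches by itself to the Python Debugger window/pane, which is not connected to the interactive session I requested from the cluster...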

Related:
  • gitissue:https://github.com/microsoft/vscode-remote-release/issues/1722
  • quora:https://qr.ae/TqCiu8
  • reddit:https://www.reddit.com/r/vscode/comments/f1giwi/how_to_run_code_in_a_debugging_session_from_vs/
Best answer

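    You could try reversing the order of operations. First submit the job, get the name of the compute node allocated to you, and then tell VSCode to connect to the compute node rather than the login node.
    So first run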

    condor_submit -i request_cpus=4 request_gpus=1
    
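    and note the name of the compute node. In the following, assume it is node001.
    Then, on your laptop, open VSCode, click the Remote Development extension icon, and choose "Remote-SSH: Connect to Host...". Choose "+ Add New SSH Host...". In the "Enter SSH Command" box, enter the following: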
    ssh -J vision-sched.cs.illinois.edu miranda9@node001
    
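    VSCode will ask which SSH configuration file it should update. Make sure to review that configuration: specify the SSH key, username, etc. if needed. Also make sure vision-sched.cs.illinois.edu is correctly configured in that file.
    You can then select the host to connect to. VSCode will run on the compute node and will be disconnected when the allocation ends.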
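
As a rough illustration of what the resulting SSH configuration entry might look like, the sketch below builds one as text. It assumes the standard OpenSSH keywords Host, HostName, User and ProxyJump (ProxyJump is the configuration-file counterpart of ssh -J). The helper name ssh_config_entry and the alias condor-job are hypothetical, node001 is the assumed node name from above, and the entry VSCode actually writes may differ:

```python
def ssh_config_entry(alias, compute_node, jump_host, user):
    """Build an OpenSSH config entry that reaches compute_node via jump_host."""
    lines = [
        f"Host {alias}",
        f"    HostName {compute_node}",
        f"    User {user}",
        # ProxyJump routes the connection through the login node, like ssh -J.
        f"    ProxyJump {jump_host}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The alias is hypothetical; node001 is the assumed compute node name.
    print(ssh_config_entry(
        alias="condor-job",
        compute_node="node001",
        jump_host="vision-sched.cs.illinois.edu",
        user="miranda9",
    ))
```

Because the compute node can change with every allocation, the HostName line would need to be updated each time a new interactive job is submitted.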

    Regarding "visual-studio - How to run code in a debugging session from remote VS Code using an interactive session?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60141905/
