Question

I'm writing a program to run svn up in parallel and it is causing the machine to freeze. The server is not experiencing any load issues when this happens.

The commands are dispatched with ThreadPool.map() to a function that calls subprocess.Popen():

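import multiprocessing.pool
import shlex
import subprocess
import sys

def cmd2args(cmd):
    # Split string commands into an argument list, except on Windows,
    # where the command string is passed through unchanged.
    if isinstance(cmd, basestring):
        return cmd if sys.platform == 'win32' else shlex.split(cmd)
    return cmd

def logrun(cmd):
    popen = subprocess.Popen(cmd2args(cmd),
                             stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,
                             cwd=curdir,  # curdir is set elsewhere (elided in the question)
                             shell=sys.platform == 'win32')
    # Echo the child's combined stdout/stderr line by line.
    for line in iter(popen.stdout.readline, ""):
        sys.stdout.write(line)
        sys.stdout.flush()
    popen.stdout.close()
    return popen.wait()  # reap the child process and report its exit code

...
pool = multiprocessing.pool.ThreadPool(argv.jobcount)
pool.map(logrun, _commands)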

argv.jobcount is the lesser of multiprocessing.cpu_count() and the number of jobs to run (in this case it is 4). _commands is a list of strings with the commands listed below. shell is set to True on Windows so the shell can find the executables since Windows doesn't have a which command and finding an executable is a bit more complex on Windows (the commands used to be of the form cd directory&&svn up .. which also requires shell=True but that is now done with the cwd parameter instead).
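On the executable-lookup point: if Python 3.3 or later is an option, shutil.which() searches PATH and, on Windows, also tries the extensions listed in PATHEXT, so "svn" can resolve to svn.exe without shell=True. The following is a minimal sketch assuming Python 3; job_count, resolve_executable and svn_up_args are hypothetical helper names, and job_count mirrors how argv.jobcount is described above.

```python
import multiprocessing
import shutil


def job_count(commands):
    """Lesser of the CPU count and the number of jobs (never below 1)."""
    return max(1, min(multiprocessing.cpu_count(), len(commands)))


def resolve_executable(name):
    """Full path to `name` on PATH, or None if it cannot be found.

    On Windows, shutil.which also tries the extensions in PATHEXT,
    so "svn" can match svn.exe.
    """
    return shutil.which(name)


def svn_up_args(working_copy):
    """Hypothetical helper: (argument list, cwd) for `svn up` in one working copy."""
    svn = resolve_executable("svn")
    if svn is None:
        raise FileNotFoundError("svn not found on PATH")
    # An argument list plus cwd= means no shell is needed on any platform.
    return [svn, "up"], working_copy
```

The resulting list and directory could then be passed as Popen(args, cwd=cwd) with shell left at its default of False.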

The commands being run are

  svn up w:/srv/lib/dktabular
  svn up w:/srv/lib/dkmath
  svn up w:/srv/lib/dkforms
  svn up w:/srv/lib/dkorm

where each folder is a separate project/repository, all hosted on the same Subversion server. The svn executable is the one packaged with TortoiseSVN 1.8.8 (build 25755 - 64 Bit). The code is up-to-date (i.e. svn up is a no-op).

When the client freezes, the memory bar in Task Manager first goes blank, and sometimes everything goes dark.

If I wait for a while (several minutes) the machine eventually comes back.

Q1: Is it copacetic to invoke svn in parallel?

Q2: Are there any issues with how I'm using ThreadPool.map() and subprocess.Popen()?

Q3: Are there any tools/strategies for debugging these kinds of issues?

Recommended answer

I will do the best that I can to answer all three questions thoroughly, and I welcome corrections to my statements.

Q1: Is it copacetic to invoke svn in parallel?

Whether it is copacetic is up for debate, but I would say it is neither recommended nor discouraged. That said, source control tools have specific functionality that requires locking at the process level and (my best guess) at the block level. Checksumming, file transfers, and file reads/writes all require locking to work correctly; without it you risk both duplicated effort and file contention, which will lead to process failures.
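If the concern is two svn commands touching the same working copy at the same time, one way to express that kind of locking on the Python side is a lock per working-copy path: commands on different working copies (like the four in the question) still run in parallel, while commands on the same path are serialised. This is a minimal sketch assuming Python 3; lock_for and locked_call are hypothetical names, and it makes no attempt to model how Subversion locks things internally.

```python
import os
import threading

# Hypothetical registry: one threading.Lock per normalised working-copy path.
_registry_guard = threading.Lock()
_wc_locks = {}


def lock_for(path):
    """Return the lock for `path`, creating it on first use."""
    key = os.path.normcase(os.path.abspath(path))
    with _registry_guard:  # protect the dict itself from concurrent inserts
        if key not in _wc_locks:
            _wc_locks[key] = threading.Lock()
        return _wc_locks[key]


def locked_call(path, func, *args):
    """Call func(*args) while holding the lock for `path`."""
    with lock_for(path):
        return func(*args)
```

Each pool task would wrap its work as locked_call(working_copy, logrun, cmd), so the lock only ever blocks tasks that target the same directory.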

Q2: Are there any issues with how I'm using ThreadPool.map() and subprocess.Popen()?

While I don't know the exact details of subprocess.Popen(), as I last used it in Python 2.6, I can speak a bit about the programming side. What your code does is create a pool of worker threads, each of which launches an svn subprocess through logrun, rather than calling the processes directly. My understanding of ThreadPool() is that it does not perform any locking by default. This may cause issues with subprocess.Popen(); I'm not sure. As mentioned in my answer above, locking is something that will need to be implemented.

I would recommend looking at https://stackoverflow.com/a/3044626/2666240 for a better understanding of the differences between threading and pooling, as I would recommend using threading instead of multiprocessing. Because source control applications require locking, if you parallelise operations while handling locking, you will also need to synchronise the threads so that work is not duplicated. I ran a test a few months back on Linux with multiprocessing, and I noticed that grep was repeating the global search. I'll see if I can find the code I wrote and paste it.

With thread synchronisation, I would hope that Python could pass the svn thread status between threads in a way that svn understands, so that process duplication does not occur. That being said, I don't know how svn works under the hood in this respect, so I am only speculating. As svn is likely using a fairly complicated locking method (I would guess block-level locking rather than inode locking), it would likely make sense to use semaphore locking instead of threading.Lock() or threading.RLock(). Either way, you will have to test various locking and synchronisation methods to figure out which works best for svn. This is a good resource on thread synchronisation: http://effbot.org/zone/thread-synchronization.htm
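To make the semaphore suggestion concrete: a threading.BoundedSemaphore can cap how many svn processes run at once, and a separate threading.Lock can keep each command's output from interleaving with the others. This is a sketch assuming Python 3 and plain threading.Thread workers; MAX_PARALLEL, run_limited and run_all are hypothetical names, and the limit of 2 is arbitrary.

```python
import subprocess
import sys
import threading

MAX_PARALLEL = 2  # hypothetical cap on concurrently running processes
_slots = threading.BoundedSemaphore(MAX_PARALLEL)
_output_lock = threading.Lock()


def run_limited(args, cwd=None):
    """Run one command while holding a semaphore slot; return (args, rc, output)."""
    with _slots:  # blocks while MAX_PARALLEL commands are already running
        proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, cwd=cwd,
                                universal_newlines=True)
        output, _ = proc.communicate()  # read all output and wait for exit
    with _output_lock:  # print each command's output as one block
        sys.stdout.write(output)
        sys.stdout.flush()
    return args, proc.returncode, output


def run_all(jobs):
    """Run (args, cwd) jobs, one thread each; results keep input order."""
    results = [None] * len(jobs)

    def worker(index, args, cwd):
        results[index] = run_limited(args, cwd)

    threads = [threading.Thread(target=worker, args=(i, args, cwd))
               for i, (args, cwd) in enumerate(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, run_all([(["svn", "up"], "w:/srv/lib/dkmath"), (["svn", "up"], "w:/srv/lib/dkorm")]) would update both working copies with at most MAX_PARALLEL svn processes alive at any moment. Buffering each command's output until it finishes trades live progress for readable logs.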

Q3: Are there any tools/strategies for debugging these kinds of issues?

Sure. Python's logging module is thread-safe, and multiprocessing additionally provides multiprocessing.get_logger() and multiprocessing.log_to_stderr(), so you can log from both threads and processes. I would log to a file so that you have something to refer back to instead of just console output. In theory, you could log the result of the map call with logging.debug(pool.map(logrun, _commands)), but that only records whatever logrun returns, so logrun would need to return something useful, such as the command and its exit code. That being said, I'm not an expert on logging with threading or multiprocessing, so someone else can likely answer this better than I can.
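A minimal sketch of file-based logging for this setup, assuming Python 3: the %(threadName)s field shows which pool thread ran which command, and this logrun (a modified, hypothetical version of the question's function) returns the command, exit code and elapsed time so that the list pool.map() returns is worth logging. The logger name and log-file path are placeholders.

```python
import logging
import subprocess
import time
from multiprocessing.pool import ThreadPool

log = logging.getLogger("svn_parallel")  # placeholder logger name


def setup_logging(path):
    """Send DEBUG and above to `path`, tagging each line with the thread name."""
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(threadName)s %(levelname)s %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)
    return handler


def logrun(args):
    """Run one command, logging start/end; return (args, returncode, seconds)."""
    start = time.monotonic()
    log.debug("start %r", args)
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    output, _ = proc.communicate()
    elapsed = time.monotonic() - start
    log.debug("end %r rc=%d in %.2fs", args, proc.returncode, elapsed)
    for line in output.splitlines():
        log.debug("  | %s", line)  # keep the command's own output in the log
    return args, proc.returncode, elapsed


def run(commands, jobcount):
    """Map logrun over commands on a ThreadPool and log the collected results."""
    pool = ThreadPool(jobcount)
    try:
        results = pool.map(logrun, commands)
    finally:
        pool.close()
        pool.join()
    log.debug("results: %r", results)
    return results
```

Because the start and end lines carry timestamps and thread names, a freeze would show up in the log as a start entry with no matching end entry.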

Are you using Python 2.x or 3.x?

08-04 17:58