两个线程之间的模糊锁定似乎与全局解释器锁定或其他“幕后锁定”有关,我不知该如何继续进行故障排除。任何消除锁定的技巧将不胜感激。

此问题在较大的代码集中再现(错误且有些随机)。该代码严格是python。 Python版本是2.6.5(在Linux上)。数小时的故障排除将锁定发生的问题减少到以下情况:

  • 该程序只有两个运行线程
  • 线程并发调用受单个线程保护的方法。
  • 线程1已通过Acquire()获得了锁[加上一些其他锁]
  • 线程2调用了generate()并确认正在等待锁
  • 线程1能够使用print()进行打印到控制台,但是它被一个简单的非阻塞库调用
  • 锁定了

    #5中的令人反感的调用是unicode.encode函数,该函数应该是非阻塞的。线程1中线程锁定位置的以下代码(按预期方式)将打印“A”和“B”:
    print('A')
    print('B')
    

    但是,以下代码将仅打印“A”并阻塞线程:
    print('A')
    u'hello'.encode('utf8') # This dummy (non-blocking) call locks up Thread 1
    print('B')
    

    这对我完全没有意义。两个线程之间不存在逻辑死锁条件。线程1被非阻塞库调用阻塞,该调用不会以任何方式干扰线程2,而线程2只是默默地等待获取RLock。我能想到的导致线程1被阻塞的唯一原因是它正在等待GIL。

    是否有任何想法如何进一步解决此问题,或者有任何机制以某种方式控制或操纵GIL操作作为解决方法?

    编辑:一些其他信息,以应对samplebias(并感谢您的答复)。我遇到了跟​​踪问题,因为该问题似乎对任何可能干扰两个线程之间的时间安排的事情都非常敏感。但是,仅使用-f选项运行strace,经过几次迭代后,我得以获得跟踪。

    线程1包含以下三个调试语句,这些语句应在控制台上打印两行“CHECK_IN”和“CHECK_TEST”:
    print('CHECK IN')#DEBUG
    u'hello'.encode('utf8')
    print('CHECK TEST')#DEBUG
    

    这是strace的最后一页:
    8605  mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb753d000
    8605  mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb6d3c000
    8605  mprotect(0xb6d3c000, 4096, PROT_NONE) = 0
    8605  clone(child_stack=0xb753c494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb753cbd8, {entry_number:6, base_addr:0xb753cb70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb753cbd8) = 8606
    8606  set_robust_list(0xb753cbe0, 0xc <unfinished ...>
    8605  futex(0xa239138, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    8606  <... set_robust_list resumed> )   = 0
    8606  futex(0xa239138, FUTEX_WAKE_PRIVATE, 1) = 1
    8605  <... futex resumed> )             = 0
    8606  gettimeofday( <unfinished ...>
    8605  futex(0xa272398, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    8606  <... gettimeofday resumed> {1301528807, 326496}, NULL) = 0
    8606  futex(0xa272398, FUTEX_WAKE_PRIVATE, 1) = 1
    8605  <... futex resumed> )             = 0
    8606  futex(0xa272398, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    8605  gettimeofday( <unfinished ...>
    8606  <... futex resumed> )             = 0
    8605  <... gettimeofday resumed> {1301528807, 326821}, NULL) = 0
    8606  futex(0xa272398, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    8605  futex(0xa272398, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    8606  <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
    8605  <... futex resumed> )             = 0
    8606  gettimeofday( <unfinished ...>
    8605  futex(0xa272398, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    8606  <... gettimeofday resumed> {1301528807, 326908}, NULL) = 0
    8606  futex(0xa272398, FUTEX_WAKE_PRIVATE, 1) = 1
    8605  <... futex resumed> )             = 0
    8606  futex(0xa272398, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    8605  futex(0xa1b0d70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    8606  <... futex resumed> )             = 0
    8606  stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2225, ...}) = 0
    8606  fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
    8606  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6d3b000
    8606  write(1, "CHECK IN\n", 9)         = 9
    8606  futex(0xa115270, FUTEX_WAIT_PRIVATE, 0, NULL
    

    在程序锁定之前,三行代码的输出如下:
    CHECK IN
    

    因此,该跟踪显示了线程1(#8606)如何写入“CHECK_IN”字符串,并且在到达unicode.encode调用时进入了等待状态,该状态永远不会返回。

    顺便说一下,我将在所有模块中进行几个将来的导入,以保持一些新的python约定...
    from __future__ import print_function, unicode_literals
    

    ...但是我看不到它们应该有什么区别-特别是u'hello'字符串作为unicode字符串被显式调用。

    最佳答案

    我在Python源代码中找不到任何会导致unicode.encode()阻塞的内容,而我编写的试图重现该虚拟程序的运行符合预期。您提到线程1已获取了1个以上的锁-您是否已消除这些锁作为锁定源?

    下面的测试用例在您的环境中是否显示相同的锁定?

    import time
    import threading
    
    def worker(tid):
        _lock.acquire()
        if not tid:
            # wait for rest of threads to enter acquire
            time.sleep(0.5)
        print('%d: A' % tid)
        u'hello'.encode('utf-8')
        print('%d: B' % tid)
        _lock.release()
    
    def start(tid):
        th = threading.Thread(target=worker, args=(tid,))
        th.start()
        return th
    
    _num = 2
    _lock = threading.RLock()
    workers = [start(n) for n in range(_num)]
    while all(w.isAlive() for w in workers):
        time.sleep(1)
    

    输出:
    0: A
    0: B
    1: A
    1: B
    

    您也可以在程序上运行strace来确定进程被阻止的位置。例如,上面的脚本:
    % strace -fTr -o trace.out python lockup.py
    
    -o trace.out标志告诉strace将输出写入文件。您可以忽略它,并且strace将打印到stderr。
    trace.out的内容应向您显示该程序进行的所有系统调用,每行均以线程ID和系统调用之间的相对时间为前缀。该行的末尾将花费该系统调用中的时间。我已经用相应的Python代码注释了最后几个系统调用:
    # thread 0 time.sleep(0.5) completes
    24778      0.500124 <... select resumed> ) = 0 (Timeout) <0.500599>
    # preparing to print()
    24778      0.000071 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000017>
    24778      0.000058 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fe90a6000 <0.000018>
    # print("0: A\n")..
    24778      0.000079 write(1, "0: A\n", 5) = 5 <0.000023>
    24778      0.000106 write(1, "0: B\n", 5) = 5 <0.000056>
    # thread 0 _lock.release()
    24778      0.000114 futex(0xe0f3c0, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000024>
    24778      0.000108 madvise(0x7f8fe7266000, 8368128, MADV_DONTNEED) = 0 <0.000030>
    # thread 0 exit
    24778      0.000072 _exit(0)            = ?
    # thread 1 _lock.acquire()
    24779      0.000050 <... futex resumed> ) = 0 <0.500774>
    # thread 1 print("1: A\n") and so on..
    24779      0.000052 write(1, "1: A\n", 5) = 5 <0.000026>
    24779      0.000086 write(1, "1: B\n", 5) = 5 <0.000026>
    24779      0.000099 madvise(0x7f8fe6a65000, 8368128, MADV_DONTNEED) = 0 <0.000024>
    24779      0.000064 _exit(0)            = ?
    24777      0.499956 <... select resumed> ) = 0 (Timeout) <1.001138>
    24777      0.000132 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f8fe8c7c8f0}, {0x4d9a90, [], SA_RESTORER, 0x7f8fe8c7c8f0}, 8) = 0 <0.000025>
    # main thread process exit
    24777      0.002349 exit_group(0)       = ?
    

    关于python - 如何解决/绕过与GIL相关的锁定问题,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/5488257/

    10-14 10:35
    查看更多