Non-blocking method for parsing (streaming) XML in Python

Problem description

I have an XML document coming in over a socket that I need to parse and react to on the fly (i.e., parsing a partial tree). What I'd like is a non-blocking way of doing so, so that I can do other things while waiting for more data to come in (without threading).

Something like iterparse would be ideal if it finished iterating when the read buffer was empty, e.g.:

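import time
from xml.etree.ElementTree import iterparse

# imaginary_socket_file_wrapper is hypothetical: a file-like wrapper around the socket
context = iterparse(imaginary_socket_file_wrapper)
while True:
    for event, elem in context:
        process_elem(elem)
    # desired behaviour: iteration of context finishes when the socket has no more data
    do_other_stuff()
    time.sleep(0.1)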
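
For what it's worth, Python 3.4 and later ship `xml.etree.ElementTree.XMLPullParser`, which gives roughly the loop shape asked for above: `feed()` whatever bytes have arrived, then `read_events()` yields only the events parsed so far and returns instead of waiting. A minimal sketch, assuming the socket reads are simulated by a list of byte chunks; `process_chunks` is a hypothetical name, not a library function:

```python
import xml.etree.ElementTree as ET


def process_chunks(chunks):
    """Feed XML chunks one at a time and collect the tag of every closed element.

    Each chunk stands in for whatever bytes a non-blocking socket read returned.
    """
    parser = ET.XMLPullParser(events=("end",))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() only yields events that are already parsed; it never waits
        # for more input, so this inner loop ends once the buffered data is used up.
        for _event, elem in parser.read_events():
            seen.append(elem.tag)
        # ... do_other_stuff() would go here ...
    parser.close()
    # Events produced while closing can still be read after close().
    seen.extend(elem.tag for _event, elem in parser.read_events())
    return seen


chunks = [b"<root><item>1</it", b"em><item>2</item>", b"</root>"]
print(process_chunks(chunks))  # ['item', 'item', 'root']
```

Note that the end tag of the first `<item>` is split across two chunks; the parser simply holds the partial token until the rest arrives.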

I guess SAX would also be an option, but iterparse just seems simpler for my needs. Any ideas?
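
The SAX route also accepts incremental input: the reader returned by `xml.sax.make_parser()` (the expat-based `ExpatParser`) implements `xml.sax.xmlreader.IncrementalParser`, so it can be fed chunks as they arrive and fires callbacks for whatever markup each chunk completes. A minimal sketch, assuming hypothetical names `EndTagCollector` and `parse_incrementally` and simulated chunks:

```python
import xml.sax


class EndTagCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records each element name when its end tag is parsed."""

    def __init__(self):
        super().__init__()
        self.closed = []

    def endElement(self, name):
        self.closed.append(name)


def parse_incrementally(chunks):
    handler = EndTagCollector()
    # make_parser() returns an expat-based reader that supports feed()/close().
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # callbacks run for any end tags this chunk completes
    parser.close()  # signals end of input and finishes parsing
    return handler.closed


print(parse_incrementally([b"<root><item>1</it", b"em><item>2</item>", b"</root>"]))
# ['item', 'item', 'root']
```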

Update:

Using threads is fine, but it introduces a level of complexity that I was hoping to sidestep. I thought that non-blocking calls would be a good way to do so, but I'm finding that they increase the complexity of parsing the XML.

Recommended answer

Diving into the iterparse source provided the solution for me. Here's a simple example of building an XML tree on the fly and processing elements after their close tags:

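import xml.etree.ElementTree as etree

# Public-API equivalent of the original hook: instead of overriding the private
# expat handler (parser._parser.EndElementHandler) and calling parser._end(),
# pass XMLParser a TreeBuilder subclass whose end() runs on every close tag.
class EndTagBuilder(etree.TreeBuilder):
    def end(self, tag):
        node = super().end(tag)  # TreeBuilder.end() returns the element it just closed
        print(node)
        return node

parser = etree.XMLParser(target=EndTagBuilder())

def data_received(data):
    # Feed each chunk as it arrives; end() fires for every end tag completed so far.
    parser.feed(data)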

In my case I ended up feeding it data from Twisted, but it should work with a non-blocking socket as well.
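
A minimal sketch of the non-blocking socket case, assuming Python 3.4+ and using `XMLPullParser` (a public API swapped in here in place of the answer's private parser hooks). `drain_socket_into_parser` and `demo` are hypothetical helpers, and `socket.socketpair()` stands in for a real network connection:

```python
import select
import socket
import xml.etree.ElementTree as ET


def drain_socket_into_parser(sock, parser, bufsize=4096):
    """Feed everything currently readable on a non-blocking socket to parser.

    Returns True if the connection is still open, False once the peer has closed it.
    """
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            return True  # no more data right now; the caller is free to do other work
        if not data:
            return False  # empty read: the peer closed the connection
        parser.feed(data)


def demo():
    # A local socket pair stands in for a real network connection.
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    parser = ET.XMLPullParser(events=("end",))
    closed_tags = []

    writer.sendall(b"<root><item>1</it")
    writer.sendall(b"em><item>2</item></root>")
    writer.close()

    still_open = True
    while still_open:
        # Wait up to 1 s for readability; a real program would select() on all its sockets.
        select.select([reader], [], [], 1.0)
        still_open = drain_socket_into_parser(reader, parser)
        closed_tags.extend(elem.tag for _event, elem in parser.read_events())
        # do_other_stuff() could run here between reads.

    parser.close()
    closed_tags.extend(elem.tag for _event, elem in parser.read_events())
    reader.close()
    return closed_tags


print(demo())  # ['item', 'item', 'root']
```

The design choice here is that the socket is only read until it would block, so control always returns to the caller's loop; with Twisted, the reactor plays the role of that loop and `dataReceived` would call `parser.feed()` directly.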

09-01 19:28