Problem description
I have an XML document coming in over a socket that I need to parse and react to on the fly (i.e. parsing a partial tree). What I'd like is a non-blocking way of doing so, so that I can do other things while waiting for more data to come in (without threading).
Something like iterparse would be ideal if its iteration simply ended whenever the read buffer was empty, e.g.:
context = iterparse(imaginary_socket_file_wrapper)
while True:
    for event, elem in context:
        process_elem(elem)
    # iteration of context finishes when socket has no more data
    do_other_stuff()
    time.sleep(0.1)
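On Python 3.4 and later, xml.etree.ElementTree.XMLPullParser provides essentially the loop sketched above: feed() accepts arbitrary chunks, and read_events() yields only the events parsed so far and then returns instead of waiting for more input. A minimal sketch; process_stream() and drain() are illustrative names, and the list of byte chunks stands in for data read from the socket.

```python
import xml.etree.ElementTree as ET


def drain(parser, finished):
    """Collect whatever events are ready; returns without waiting for more."""
    for _event, elem in parser.read_events():
        finished.append((elem.tag, elem.text))


def process_stream(chunks):
    """Feed chunks to a pull parser, handling each element once it closes."""
    parser = ET.XMLPullParser(events=("end",))
    finished = []
    for chunk in chunks:
        parser.feed(chunk)
        drain(parser, finished)  # only events parsed so far; never blocks
        # do_other_stuff() would run here while waiting for the next chunk
    parser.close()  # end of input: flushes the parser and checks well-formedness
    drain(parser, finished)  # events still pending after close() remain readable
    return finished


# The second <item> is split mid-tag across chunk boundaries.
print(process_stream([b"<root><item>1</item><it", b"em>2</ite", b"m></root>"]))
# [('item', '1'), ('item', '2'), ('root', None)]
```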
I guess SAX would also be an option, but iterparse just seems simpler for my needs. Any ideas?
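For comparison, the SAX route is also incremental in the standard library: the reader returned by xml.sax.make_parser() (expat-based by default) has a feed() method that accepts partial input and fires handler callbacks for whatever is complete so far. A minimal sketch; EndTagHandler and parse_in_chunks() are illustrative names, and the byte chunks stand in for socket reads.

```python
import xml.sax


class EndTagHandler(xml.sax.ContentHandler):
    """Records element names in the order their closing tags are parsed."""

    def __init__(self):
        super().__init__()
        self.closed = []

    def endElement(self, name):
        self.closed.append(name)


def parse_in_chunks(chunks):
    parser = xml.sax.make_parser()  # expat-backed reader with feed()/close()
    handler = EndTagHandler()
    parser.setContentHandler(handler)
    for chunk in chunks:
        # feed() returns once the chunk is consumed; callbacks fire for
        # any elements it completed.
        parser.feed(chunk)
    parser.close()  # signals end of input and checks well-formedness
    return handler.closed


print(parse_in_chunks([b"<root><item>1</it", b"em><item>2</item>", b"</root>"]))
# ['item', 'item', 'root']
```

The trade-off is that SAX hands over names and attributes only, so any tree has to be built by the handler itself.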
Update:
Using threads is fine, but introduces a level of complexity that I was hoping to sidestep. I thought that non-blocking calls would be a good way to do so, but I'm finding that they increase the complexity of parsing the XML.
Recommended answer
Digging into the iterparse source gave me the solution. Here's a simple example that builds an XML tree on the fly and processes each element as soon as its closing tag is parsed:
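import xml.etree.ElementTree as etree

# Python 2 code: relies on private internals (_parser, _end) of the
# pure-Python parser; Python 3's default C-accelerated XMLParser lacks them.
parser = etree.XMLTreeBuilder()

def end_tag_event(tag):
    # Let the tree builder close the element, then handle the finished node
    node = parser._end(tag)
    print(node)

# Hook the underlying expat parser's end-element callback
parser._parser.EndElementHandler = end_tag_event

def data_received(data):
    # Feed each chunk as it arrives; the callback fires per completed close tag
    parser.feed(data)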
In my case I ended up feeding it data from Twisted, but it should also work with a non-blocking socket.
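The snippet above depends on private attributes of Python 2's pure-Python parser. On Python 3, a similar effect can be had through documented APIs only: XMLParser accepts a target object whose end() method is called for every closing tag, so a thin wrapper around TreeBuilder can build the tree and hand each finished element to a callback. Below is a minimal sketch that also feeds the parser from a non-blocking socket via selectors. StreamingTarget, data_received() and run_demo() are illustrative names, socket.socketpair() stands in for a real connection, and Twisted is not used.

```python
import selectors
import socket
import xml.etree.ElementTree as ET


class StreamingTarget:
    """Parser target that builds the tree and reports each element as it closes."""

    def __init__(self, on_element):
        self._builder = ET.TreeBuilder()
        self._on_element = on_element

    def start(self, tag, attrib):
        return self._builder.start(tag, attrib)

    def data(self, text):
        self._builder.data(text)

    def end(self, tag):
        elem = self._builder.end(tag)  # TreeBuilder.end() returns the closed element
        self._on_element(elem)
        return elem

    def close(self):
        return self._builder.close()


def data_received(sock, parser):
    """Feed everything currently readable; return False once the peer has closed."""
    while True:
        try:
            chunk = sock.recv(4096)
        except BlockingIOError:
            return True  # no more data right now; caller is free to do other work
        if not chunk:
            return False  # EOF
        parser.feed(chunk)


def run_demo(chunks):
    closed = []
    parser = ET.XMLParser(target=StreamingTarget(lambda elem: closed.append(elem.tag)))
    reader, writer = socket.socketpair()  # stands in for a real connection
    reader.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(reader, selectors.EVENT_READ)
    for chunk in chunks:
        writer.sendall(chunk)
        if sel.select(timeout=1.0):  # readable: drain without blocking
            data_received(reader, parser)
        # do_other_stuff() would go here
    writer.close()
    while data_received(reader, parser):
        sel.select(timeout=1.0)  # wait for the EOF to become readable
    root = parser.close()  # returns the target's close() result: the root element
    sel.close()
    reader.close()
    return closed, root


closed, root = run_demo([b"<root><item>1</it", b"em><item>2</item>", b"</root>"])
print(closed)  # ['item', 'item', 'root']
```

Wrapping TreeBuilder rather than subclassing it keeps the target a plain Python object, which the C-accelerated XMLParser drives through its start()/data()/end()/close() methods.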