Problem description
I interact with a server that I use to tag sentences. This server is launched locally on port 2020.
For example, if I send "Je mange des pâtes ." on port 2020 through the client used below, the server answers "Je_CL mange_V des_P pâtes_N ._.". The result is always exactly one line, provided my input is not empty.
I currently have to tag 9 568 files through this server. The first 9 483 files are tagged as expected. After that, the input stream seems to be closed / full / something else, because I get an IOError, specifically a Broken Pipe error, when I try to write to stdin.
When I skip the first 9 483 files, the remaining ones are tagged without any issue, including the one that caused the first error.
My server doesn't produce any error log indicating that something fishy happened... Am I handling something incorrectly? Is it normal for the pipe to fail after some time?
import codecs
from subprocess import Popen, PIPE

# JAR (the tagger's classpath) and SUMMARY (the list of files to tag) are defined elsewhere
log = codecs.open('stanford-tagger.log', 'w', 'utf-8')
p1 = Popen(["java",
            "-cp", JAR,
            "edu.stanford.nlp.tagger.maxent.MaxentTaggerServer",
            "-client",
            "-port", "2020"],
           stdin=PIPE,
           stdout=PIPE,
           stderr=log)
fhi = codecs.open(SUMMARY, 'r', 'utf-8')  # a descriptor of the files to tag
for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ...  # a list of words, can be quite long
    try:
        p1.stdin.write(' '.join(tokens).encode('utf-8') + '\n')
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    # Here I do something with result...
fhi.close()
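For context, a Broken Pipe error (errno EPIPE, which Python 2 reports as an IOError) means the process wrote to a pipe that nobody reads any more; for a child's stdin, that usually means the child process has exited or closed its input. The minimal Python 3 sketch below reproduces the error on a POSIX system. The child is a hypothetical stand-in (sys.executable -c "pass", not the tagger client) that exits immediately, so this only illustrates what the error means, not why the tagger client stops reading.

```python
import subprocess
import sys


def write_after_child_exit():
    """Write to the stdin of a child process that has already exited.

    The child is a hypothetical stand-in (a Python process that exits at
    once); it plays the role of a client process that has died.
    """
    # bufsize=0 makes child.stdin unbuffered, so write() hits the pipe directly
    child = subprocess.Popen([sys.executable, "-c", "pass"],
                             stdin=subprocess.PIPE, bufsize=0)
    child.wait()  # the only reader of the pipe is gone now
    try:
        child.stdin.write("Je mange des pâtes .\n".encode("utf-8"))
    except BrokenPipeError:
        # Python ignores SIGPIPE, so EPIPE surfaces as an exception
        return "broken pipe (child exit code %d)" % child.returncode
    finally:
        child.stdin.close()
    return "write succeeded"
```

Calling write_after_child_exit() returns the "broken pipe" message, because the write happens after the child has exited.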
Recommended answer
In addition to my comments, I might suggest a few other changes...
for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ...  # a list of words, can be quite long
    try:
        s = ' '.join(tokens).encode('utf-8') + '\n'
        assert s.find('\n') == len(s) - 1  # Make sure there's only one newline in s
        p1.stdin.write(s)
        p1.stdin.flush()  # Push any buffered data into the pipe now
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    assert result  # Make sure we got something back
    assert result.find('\n') == len(result) - 1  # Make sure there's only one newline in result
    # Here I do something with result...
fhi.close()
...but given that there's also a client/server we know nothing about, there are a lot of places where it could be going wrong.
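One way to narrow this down from the Python side is to check the child's exit status whenever a write fails or a reply is missing, so the error says whether the client process died and on which sentence. Below is a minimal Python 3 sketch of that idea. tag_lines, _exit_code and FAKE_TAGGER are hypothetical names, and FAKE_TAGGER is a stand-in child that answers each line with its tokens suffixed by _X, used here in place of the real java ... -client command.

```python
import subprocess
import sys

# Hypothetical stand-in for the tagger client: answers each input line with
# one output line (every token suffixed with "_X") and flushes after each one.
FAKE_TAGGER = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    tags = [t + b'_X' for t in line.split()]\n"
    "    sys.stdout.buffer.write(b' '.join(tags) + b'\\n')\n"
    "    sys.stdout.buffer.flush()\n"
)


def _exit_code(child, timeout=5.0):
    """Return the child's exit code, or None if it is still running."""
    try:
        return child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


def tag_lines(cmd, sentences):
    """Send one sentence per line and read exactly one reply line for each."""
    child = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding="utf-8"
    )
    results = []
    try:
        for i, tokens in enumerate(sentences, 1):
            request = " ".join(tokens)
            if "\n" in request:
                # An embedded newline would break the one-line-in, one-line-out protocol
                raise ValueError("sentence %d contains a newline" % i)
            if not request.strip():
                # The question only guarantees a reply for non-empty input
                results.append("")
                continue
            try:
                child.stdin.write(request + "\n")
                child.stdin.flush()  # push the request into the pipe now
            except BrokenPipeError:
                raise RuntimeError("pipe broke on sentence %d; child exit code: %r"
                                   % (i, _exit_code(child)))
            reply = child.stdout.readline()
            if not reply:  # EOF: the child closed its stdout, probably by exiting
                raise RuntimeError("no reply to sentence %d; child exit code: %r"
                                   % (i, _exit_code(child)))
            results.append(reply.rstrip("\n"))
    finally:
        try:
            child.stdin.close()  # EOF tells the child to finish
        except BrokenPipeError:
            pass  # the child is already gone
        child.wait()
        child.stdout.close()
    return results
```

For example, tag_lines([sys.executable, "-c", FAKE_TAGGER], [["Je", "mange", "des", "pâtes", "."]]) returns ["Je_X mange_X des_X pâtes_X ._X"]. With the real client, if the exception reports an exit code, the client process died, and stanford-tagger.log (which receives its stderr) is the place to look.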
Does it work if you dump all the queries into a single file and then run it from the command line with something like...
java .... < input > output
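The same test can also be driven from Python by redirecting files instead of pipes, which takes the line-by-line pipe handling out of the picture. The Python 3 sketch below is a minimal illustration under assumptions: run_batch and FAKE_TAGGER are hypothetical names, FAKE_TAGGER stands in for the tagger, and cmd stands for the real java .... command. It writes one query per line to a file named input, runs the command with stdin and stdout redirected to files, and returns the exit code and the reply lines, so the number of replies can be compared with the number of queries.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the tagger: one output line per input line,
# with every token suffixed by "_X".
FAKE_TAGGER = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    tags = [t + b'_X' for t in line.split()]\n"
    "    sys.stdout.buffer.write(b' '.join(tags) + b'\\n')\n"
)


def run_batch(cmd, sentences, workdir):
    """Equivalent of `cmd < input > output`, with one query per input line."""
    workdir = Path(workdir)
    input_path = workdir / "input"
    output_path = workdir / "output"
    # Write bytes so no newline translation happens on any platform
    queries = "".join(" ".join(tokens) + "\n" for tokens in sentences)
    input_path.write_bytes(queries.encode("utf-8"))
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        completed = subprocess.run(cmd, stdin=fin, stdout=fout)
    replies = output_path.read_bytes().decode("utf-8").splitlines()
    return completed.returncode, replies


if __name__ == "__main__":
    sentences = [["Je", "mange", "des", "pâtes", "."], ["Bonjour", "."]]
    with tempfile.TemporaryDirectory() as tmp:
        code, replies = run_batch([sys.executable, "-c", FAKE_TAGGER], sentences, tmp)
    # A shortfall here shows how far the command got before it stopped answering
    print(code, len(replies), "replies for", len(sentences), "queries")
```

If the batch run also stops producing replies around the same query, the problem most likely lies in the client or server rather than in the Python loop.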