Hello all,

I'm trying to detect the line endings used in text files. I *might* be decoding the files into unicode first (they may be encoded using multi-byte encodings), which is why I'm not letting Python handle the line endings.

Is the following safe and sane:

    text = open('test.txt', 'rb').read()
    if encoding:
        text = text.decode(encoding)
    ending = '\n'  # default
    if '\r\n' in text:
        text = text.replace('\r\n', '\n')
        ending = '\r\n'
    elif '\n' in text:
        ending = '\n'
    elif '\r' in text:
        text = text.replace('\r', '\n')
        ending = '\r'

My worry is that if '\n' *doesn't* signify a line break on the Mac, then it may exist in the body of the text and trigger ``ending = '\n'`` prematurely?

All the best,
Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Solution

I'd count the number of occurrences of '\r\n', of '\n' without a preceding '\r', and of '\r' without a following '\n', and let the majority decide.

Sybren
--
The problem with the world is stupidity. Not saying there should be a capital punishment for stupidity, but why don't we just take the safety labels off of everything and let the problem solve itself?
Frank Zappa

Sounds reasonable, edge cases for small files be damned. :-)

Fuzzyman

This is what I came up with.
As you can see from the docstring, it attempts to do sensible(-ish) things in the event of a tie, or when there are no line endings at all.

Comments/corrections welcomed. I know the tests aren't very useful (because they make no *assertions* they won't tell you if it breaks), but you can see what's going on:

    import re
    import os

    rn = re.compile('\r\n')
    r = re.compile('\r(?!\n)')
    n = re.compile('(?<!\r)\n')

    # Sequence of (regex, literal, priority) for each line ending
    line_ending = [(n, '\n', 3), (rn, '\r\n', 2), (r, '\r', 1)]

    def find_ending(text, default=os.linesep):
        r"""
        Given a piece of text, use a simple heuristic to determine the line
        ending in use.

        Returns the value assigned to default if no line endings are found.
        This defaults to ``os.linesep``, the native line ending for the
        machine.

        If there is a tie between two endings, the priority chain is
        ``'\n', '\r\n', '\r'``.
        """
        # One (count, priority, literal) tuple per ending; sorting puts the
        # most frequent ending last, with priority breaking ties.
        results = [(len(exp.findall(text)), priority, literal)
                   for exp, literal, priority in line_ending]
        results.sort()
        print(results)
        if not sum([m[0] for m in results]):
            return default
        else:
            return results[-1][-1]

    if __name__ == '__main__':
        tests = [
            'hello\ngoodbye\nmy fish\n',
            'hello\r\ngoodbye\r\nmy fish\r\n',
            'hello\rgoodbye\rmy fish\r',
            'hello\rgoodbye\n',
            '',
            '\r\r\r \n\n',
            '\n\n \r\n\r\n',
            '\n\n\r \r\r\n',
            '\n\r \n\r \n\r',
        ]
        for entry in tests:
            print(repr(entry))
            print(repr(find_ending(entry)))
            print('')

All the best,
Fuzzyman
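Since the post notes that the tests make no assertions, here is a minimal sketch of how the same inputs could be turned into checks. It restates `find_ending` in compact form (without the debug print) so that it runs on its own. The names `LINE_ENDINGS`, `CASES` and `run_checks`, and the `'<none>'` sentinel passed as the default, are my own illustrative choices and not part of the original code. The expected values are my reading of the counting and tie-break rules described in the docstring.

```python
import os
import re

# (regex, literal, priority) per line ending, mirroring the table in the post.
LINE_ENDINGS = [
    (re.compile(r'(?<!\r)\n'), '\n', 3),   # LF not preceded by CR
    (re.compile(r'\r\n'), '\r\n', 2),      # CRLF pair
    (re.compile(r'\r(?!\n)'), '\r', 1),    # CR not followed by LF
]


def find_ending(text, default=os.linesep):
    """Return the most frequent line ending in text, or default if none."""
    # Sorting (count, priority, literal) tuples puts the winner last;
    # priority breaks ties between equal counts.
    results = sorted(
        (len(exp.findall(text)), priority, literal)
        for exp, literal, priority in LINE_ENDINGS
    )
    count, _, literal = results[-1]
    return literal if count else default


# (input, expected) pairs built from the inputs in the post.
# '<none>' is a hypothetical sentinel used to make the "no endings" case visible.
CASES = [
    ('hello\ngoodbye\nmy fish\n', '\n'),
    ('hello\r\ngoodbye\r\nmy fish\r\n', '\r\n'),
    ('hello\rgoodbye\rmy fish\r', '\r'),
    ('hello\rgoodbye\n', '\n'),        # 1-1 tie, '\n' has priority
    ('', '<none>'),                    # no endings, default returned
    ('\r\r\r \n\n', '\r'),             # three '\r' beat two '\n'
    ('\n\n \r\n\r\n', '\n'),           # 2-2 tie between '\n' and '\r\n'
    ('\n\n\r \r\r\n', '\n'),           # 2-2 tie between '\n' and '\r'
    ('\n\r \n\r \n\r', '\n'),          # 3-3 tie between '\n' and '\r'
]


def run_checks():
    """Assert that every case yields its expected ending."""
    for text, expected in CASES:
        actual = find_ending(text, default='<none>')
        assert actual == expected, (text, actual, expected)


if __name__ == '__main__':
    run_checks()
    print('checked %d cases' % len(CASES))
```

Passing an explicit sentinel as `default` keeps the empty-string case independent of the platform's `os.linesep`, so the checks should behave the same on any machine.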