Here's a start for you. It parses the data into a ParseResults data structure, which you can then walk to create a parser for the defined doctype:

    from pyparsing import *

    # punctuation is matched but kept out of the results
    LT,GT,EXCLAM,LBRACK,RBRACK,LPAR,RPAR = map(Suppress,"<>![]()")

    DOCTYPE = Keyword("DOCTYPE").suppress()
    ELEMENT = Keyword("ELEMENT").suppress()
    ident = Word(alphas, alphanums+"_")

    # an element name with an optional '*' or '+' repetition marker
    elementRef = Group(ident("name") + Optional(oneOf("* +")("rep")))

    # ',' and '|' are binary operators; ',' is listed first, so it binds tighter
    elementExpr = infixNotation(elementRef,
        [
        (',', 2, opAssoc.LEFT),
        ('|', 2, opAssoc.LEFT),
        ])

    # matches the text \#PCDATA, backslash included, as it appears in the output below
    PCDATA = Literal(r"\#PCDATA")

    elementDefn = Group(LT+EXCLAM + ELEMENT + ident("name") +
                        LPAR + (elementExpr | PCDATA("PCDATA"))("contents") + RPAR + GT)

    doctypeDefn = LT+EXCLAM + DOCTYPE + ident("name") + LBRACK + ZeroOrMore(elementDefn)("elements") + RBRACK + GT

I had started to just use a delimitedList for the list of elements in each ELEMENT definition, but then I noticed that ',' and '|' are actually operators, not just delimiters, and could even be mixed, as in "A,B,C|D,E".
So I used pyparsing's infixNotation helper to allow these kinds of definitions.

With your input sample, I can parse and display the results with:

    doctype = doctypeDefn.parseString(sample)
    print(doctype.dump())
    for elem in doctype.elements:
        print(elem.dump())

Giving:

    ['PcSpecs', ['PCS', ['PC', '*']], ['PC', [['MODEL'], ...
    - elements: [['PCS', ['PC', '*']], ['PC', [['MODEL'], ...
    - name: PcSpecs
    ['PCS', ['PC', '*']]
    - contents: ['PC', '*']
      - name: PC
      - rep: *
    - name: PCS
    ['PC', [['MODEL'], ',', ['PRICE'], ',', ['PROCESSOR'], ',', ['RAM'], ',', ['DISK', '+']]]
    - contents: [['MODEL'], ',', ['PRICE'], ',', ['PROCESSOR'], ',', ['RAM'], ',', ['DISK', '+']]
    - name: PC
    ['MODEL', '\\#PCDATA']
    - PCDATA: \#PCDATA
    - contents: \#PCDATA
    - name: MODEL
    ['PRICE', '\\#PCDATA']
    - PCDATA: \#PCDATA
    - contents: \#PCDATA
    - name: PRICE
    ['PROCESSOR', [['MANF'], ',', ['MODEL'], ',', ['SPEED']]]
    - contents: [['MANF'], ',', ['MODEL'], ',', ['SPEED']]
    - name: PROCESSOR
    ['MANF', '\\#PCDATA']
    - PCDATA: \#PCDATA
    - contents: \#PCDATA
    - name: MANF
    ['MODEL', '\\#PCDATA']
    - PCDATA: \#PCDATA
    - contents: \#PCDATA
    - name: MODEL
    ['SPEED', '\\#PCDATA']
    - PCDATA: \#PCDATA
    - contents: \#PCDATA
    - name: SPEED
    ['RAM', '\\#PCDATA']
    - PCDATA: \#PCDATA
    - contents: \#PCDATA
    - name: RAM
    ['DISK', [['HARDDISK'], '|', ['CD'], '|', ['DVD']]]
    - contents: [['HARDDISK'], '|', ['CD'], '|', ['DVD']]
    - name: DISK
    ['HARDDISK', [['MANF'], ',', ['MODEL'], ',', ['SIZE']]]
    - contents: [['MANF'], ',', ['MODEL'], ',', ['SIZE']]
    - name: HARDDISK
    ['SIZE', '\\#PCDATA']
    - PCDATA: \#PCDATA
    - contents: \#PCDATA
    - name: SIZE
    ['CD', ['SPEED']]
    - contents: ['SPEED']
      - name: SPEED
    - name: CD
    ['DVD', ['SPEED']]
    - contents: ['SPEED']
      - name: SPEED
    - name: DVD
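As one possible way to "walk" these results, here is a minimal sketch that turns each element's content model into a regular expression over its child element names. It is only an illustration, not part of the answer's grammar: it works on plain nested lists shaped like the dump output above (what `ParseResults.asList()` would give you), a few of which are copied in by hand, and the names `content_to_regex` and `children_ok` are hypothetical helpers.

```python
import re

def content_to_regex(node):
    """Translate one parsed content model (nested lists) into a regex
    that matches a string of space-terminated child element names."""
    if isinstance(node, str):
        # the PCDATA marker: text only, so no child elements are allowed
        return ""
    if all(isinstance(item, str) for item in node):
        # element reference: ['NAME'] or ['NAME', '*'] / ['NAME', '+']
        name = node[0]
        rep = node[1] if len(node) > 1 else ""
        return "(?:%s )%s" % (re.escape(name), rep)
    # operator group: [operand, op, operand, op, operand, ...]
    operands = [content_to_regex(item) for item in node[0::2]]
    joiner = "|" if node[1] == "|" else ""   # ',' means plain sequence
    return "(?:%s)" % joiner.join(operands)

# [name, contents] pairs copied from the dump output (assumed asList() shape)
elements = [
    ["PCS", ["PC", "*"]],
    ["PC", [["MODEL"], ",", ["PRICE"], ",", ["PROCESSOR"], ",", ["RAM"], ",", ["DISK", "+"]]],
    ["DISK", [["HARDDISK"], "|", ["CD"], "|", ["DVD"]]],
    ["MODEL", "\\#PCDATA"],
]
validators = {name: re.compile(content_to_regex(contents))
              for name, contents in elements}

def children_ok(parent, children):
    """Check a list of child tag names against the parent's content model."""
    text = "".join(child + " " for child in children)
    return validators[parent].fullmatch(text) is not None

print(children_ok("PC", ["MODEL", "PRICE", "PROCESSOR", "RAM", "DISK", "DISK"]))
print(children_ok("DISK", ["CD", "DVD"]))
```

The same recursion could just as well build pyparsing expressions instead of regex strings; a regex is used here only to keep the sketch free of extra dependencies.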
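To make the "operators, not just delimiters" point concrete, the sketch below contrasts a flat comma split with a tiny hand-written precedence parser. This is a stand-in written for illustration, not pyparsing's infixNotation implementation; `tokenize` and `parse_expr` are hypothetical names, and parentheses are not handled. It assumes the same operator order as the grammar above, with ',' binding tighter than '|'.

```python
import re

def tokenize(text):
    # element names, the two operators, and the repetition markers
    return re.findall(r"[A-Za-z_]\w*|[,|*+]", text)

def parse_expr(tokens, ops=("|", ",")):
    """Parse tokens into nested lists; ops are ordered loosest to tightest,
    so '|' is split first and ',' groups inside each alternative."""
    if not ops:
        # operand: NAME with an optional '*' or '+'
        ref = [tokens.pop(0)]
        if tokens and tokens[0] in ("*", "+"):
            ref.append(tokens.pop(0))
        return ref
    op, tighter = ops[0], ops[1:]
    group = [parse_expr(tokens, tighter)]
    while tokens and tokens[0] == op:
        group.append(tokens.pop(0))              # keep the operator token
        group.append(parse_expr(tokens, tighter))
    # a lone operand is returned as-is rather than wrapped in a group
    return group[0] if len(group) == 1 else group

flat = "A,B,C|D,E".split(",")            # treats ',' as a mere delimiter
tree = parse_expr(tokenize("A,B,C|D,E"))
print(flat)   # ['A', 'B', 'C|D', 'E']
print(tree)   # [[['A'], ',', ['B'], ',', ['C']], '|', [['D'], ',', ['E']]]
```

The flat split leaves "C|D" stuck together as one item, while the precedence-aware version keeps "A,B,C" and "D,E" as the two alternatives of the '|'.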