python - Parsing CSS selectors with a regular expression

I want to parse this CSS selector (and others of a similar form):
div.class1#myid.class2[key=value]

and have it match ".class1" and ".class2", but I can't work out what regular expression to use.

Example: http://www.rubular.com/r/3dxpzyJLeK

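In an ideal world I would also like to extract the following:

type (i.e. div)
classes (i.e. the list of classes)
id (i.e. myid)
key (i.e. key)
operator (i.e. =)
value (i.e. value)

But I can't even get the basics working!

Any help would be much appreciated :)

Thanks!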

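Best answer

Thank you very much for the suggestions and help. I have tied them all together into the following two regular expression patterns:

This one parses a CSS selector string (e.g. div#myid.myclass[attr=1,fred=3]): http://www.rubular.com/r/2L0N5iWPEJ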

import re

cssSelector = re.compile(r'^(?P<type>[\*|\w|\-]+)?(?P<id>#[\w|\-]+)?(?P<classes>\.[\w|\-|\.]+)*(?P<data>\[.+\])*$')

>>> cssSelector.match("table#john.test.test2[hello]").groups()
('table', '#john', '.test.test2', '[hello]')
>>> cssSelector.match("table").groups()
('table', None, None, None)
>>> cssSelector.match("table#john").groups()
('table', '#john', None, None)
>>> cssSelector.match("table.test.test2[hello]").groups()
('table', None, '.test.test2', '[hello]')
>>> cssSelector.match("table#john.test.test2").groups()
('table', '#john', '.test.test2', None)
>>> cssSelector.match("*#john.test.test2[hello]").groups()
('*', '#john', '.test.test2', '[hello]')
>>> cssSelector.match("*").groups()
('*', None, None, None)
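
The groups above still carry their leading '#' and '.', and the classes come back as a single string. A minimal sketch of turning a match into the fields asked for in the question (type, id, list of classes) might look like this. The helper name parse_simple_selector is hypothetical and not part of the original answer; the pattern is the one from the answer, unchanged.

```python
import re

# Pattern copied from the answer, unchanged.
cssSelector = re.compile(r'^(?P<type>[\*|\w|\-]+)?(?P<id>#[\w|\-]+)?(?P<classes>\.[\w|\-|\.]+)*(?P<data>\[.+\])*$')


def parse_simple_selector(selector):
    """Hypothetical helper: split one selector token into its parts."""
    match = cssSelector.match(selector)
    if match is None:
        return None
    parts = match.groupdict()
    classes = parts["classes"]
    return {
        "type": parts["type"],
        # Drop the leading '#'.
        "id": parts["id"][1:] if parts["id"] else None,
        # '.test.test2' -> ['test', 'test2']
        "classes": [c for c in classes.split(".") if c] if classes else [],
        # Left unparsed here, brackets included.
        "data": parts["data"],
    }


print(parse_simple_selector("table#john.test.test2[hello]"))
```

The "data" entry is deliberately left as the raw bracketed string, since the attributes are handled by a separate pattern.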

And this one handles the attributes (e.g. [link, key~=value]) http://www.rubular.com/r/2L0N5iWPEJ:

attribSelector = re.compile(r'(?P<word>\w+)\s*(?P<operator>[^\w\,]{0,2})\s*(?P<value>\w+)?\s*[\,|\]]')

>>> a = attribSelector.findall("[link, ds9 != test, bsdfsdf]")
>>> for x in a: print(x)
('link', '', '')
('ds9', '!=', 'test')
('bsdfsdf', '', '')
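
To get the key, operator and value by name rather than by tuple position, the same pattern can be used with finditer. This is only a sketch: parse_attributes is a hypothetical helper, and mapping an empty operator to None is an assumption made here for convenience.

```python
import re

# Pattern copied from the answer, unchanged.
attribSelector = re.compile(r'(?P<word>\w+)\s*(?P<operator>[^\w\,]{0,2})\s*(?P<value>\w+)?\s*[\,|\]]')


def parse_attributes(data):
    """Hypothetical helper: '[a, b != c]' -> list of dicts."""
    attributes = []
    for match in attribSelector.finditer(data):
        attributes.append({
            "key": match.group("word"),
            # An empty operator means the attribute is only tested for presence.
            "operator": match.group("operator") or None,
            # None when the optional value group did not take part in the match.
            "value": match.group("value"),
        })
    return attributes


for attribute in parse_attributes("[link, ds9 != test, bsdfsdf]"):
    print(attribute)
```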

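A few things to note:
1) This parses the attributes as a comma-separated list (because I am not using strict CSS).
2) This requires the selector to be in the following order: tag, ID, classes, attributes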
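
The ordering requirement is easy to check. In the short sketch below (using the pattern from the answer, unchanged), the selector from the question puts a class before the id, so the pattern does not match it until it is reordered.

```python
import re

# Pattern copied from the answer, unchanged.
cssSelector = re.compile(r'^(?P<type>[\*|\w|\-]+)?(?P<id>#[\w|\-]+)?(?P<classes>\.[\w|\-|\.]+)*(?P<data>\[.+\])*$')

# Class before id, as written in the question: no match.
as_asked = cssSelector.match("div.class1#myid.class2[key=value]")
print(as_asked)  # prints None

# Same selector reordered as tag, id, classes, attributes: matches.
reordered = cssSelector.match("div#myid.class1.class2[key=value]")
print(reordered.groups())
```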

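The first regex works on individual tokens, that is, the parts of a selector string separated by whitespace and '>'. This is because I wanted to use it to check against my own object graph :)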
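
The answer does not show how the selector string is split into those tokens. One possible sketch, assuming a plain split on whitespace and '>', is below; tokenize is a hypothetical helper. Note that this split throws away the difference between the descendant and child combinators, and it would also break up an attribute list that contains spaces.

```python
import re

# Pattern copied from the answer, unchanged.
cssSelector = re.compile(r'^(?P<type>[\*|\w|\-]+)?(?P<id>#[\w|\-]+)?(?P<classes>\.[\w|\-|\.]+)*(?P<data>\[.+\])*$')


def tokenize(selector):
    """Hypothetical helper: split a selector string on whitespace and '>'."""
    return [token for token in re.split(r'[\s>]+', selector) if token]


for token in tokenize("div#main > ul.nav li.item"):
    match = cssSelector.match(token)
    print(token, match.groups() if match else None)
```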

Thanks again!

Regarding python - parsing CSS selectors with a regular expression, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11172600/