Problem description
Can anyone recommend a Java library that lets me run an XPath query over an HTML page?
I tried using JAXP, but it keeps giving me a strange error that I cannot seem to fix (thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd).
Thank you very much.
EDIT
I found this:
// Create a new SAX parser factory
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
// Turn on validation
saxFactory.setValidating(true);
// Create a validating SAX parser instance
SAXParser parser = saxFactory.newSAXParser();
// Create a new DOM document builder factory
// (named differently from the SAX factory so both can share one scope)
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
// Turn on validation
domFactory.setValidating(true);
// Create a validating DOM parser
DocumentBuilder builder = domFactory.newDocumentBuilder();
(from http://www.ibm.com/developerworks/xml/library/x-jaxpval.html). However, changing the argument to false did not change anything.
Setting the parser to "non-validating" only turns off validation; it does not stop the parser from fetching DTDs. As far as I recall, the DTD is needed not just for validation but also for entity expansion.
If you want to suppress fetching of DTDs, you need to register a suitable EntityResolver on the DocumentBuilder (via setEntityResolver; the DocumentBuilderFactory itself has no such setter). Implement the EntityResolver's resolveEntity method so that it always returns an InputSource wrapping an empty string.
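A minimal sketch of that approach, assuming the page is well-formed XHTML that the JAXP DOM parser can read. The class name NoDtdXPath, the helper method evaluate, and the sample document are made up for illustration. Note that resolveEntity returns an InputSource, so the "empty string" is wrapped in a StringReader.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class NoDtdXPath {

    // Parses the given XHTML text without fetching its DTD,
    // then evaluates the XPath expression and returns the result as a string.
    static String evaluate(String xhtml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // turns off validation only; the DTD would still be fetched

        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external entity request (including the DTD) with empty
        // content, so the parser never opens a connection to www.w3.org.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Sample document (an assumption for this sketch) that references the
        // same DTD as in the error message above.
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
              + "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>";
        System.out.println(evaluate(xhtml, "/html/head/title"));
    }
}
```

One trade-off to keep in mind: with the DTD replaced by empty content, named entities that the XHTML DTD declares (such as &nbsp;) are no longer defined, so pages that use them may not parse this way.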