Problem description
I'm very new to Solr. I succeeded in indexing data from my SQL database via DIH. Now I want to import XML files and index them via DIH as well, but it just won't work!
My data-config.xml looks like this:
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="dir"
            processor="FileListEntityProcessor"
            baseDir="/bla/test2"
            fileName=".*xml"
            stream="true"
            recursive="false"
            rootEntity="false">
      <entity name="PubmedArticle"
              processor="XPathEntityProcessor"
              transformer="RegexTransformer"
              stream="true"
              forEach="/PubmedArticle"
              url="${dir.fileAbsolutePath}">
        <field column="journal" xpath="//Name[.='journal']/following-sibling::Value/text()" />
        <field column="authors" xpath="//Name[.='authors']/following-sibling::Value/text()" />
        ..etc
And I have the following fields in schema.xml:
<field name="journal" type="text" indexed="true" stored="true" required="true" />
<field name="authors" type="text" indexed="true" stored="true" required="true" />
When I run Solr, I get no errors, but no documents are indexed:
<str name="Total Rows Fetched">2000</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-02-01 14:59:17</str>
<str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
Can anyone tell me what I did wrong? I have even double-checked the path syntax...
Recommended answer
I recently ran into the same problem when trying the same thing, i.e., using FileListEntityProcessor (to read multiple local .xml files) together with XPathEntityProcessor (to grab certain XML elements).
Root cause: the problem is in this line:
<field column="journal" xpath="//Name[.='journal']/following-sibling::Value/text()" />
Explanation: the argument for the xpath attribute ("//Name..."), while valid XPath syntax, is NOT supported by Solr. The "Apache Solr 4.4 Reference Guide" simply says:
"The XPath expression which will extract the content from the record for this field. Only a subset of Xpath syntax is supported."
Solution: change the argument for xpath to the full path from the document root:
<field column="journal" xpath="/full/path/from/root/of/document/Name[.='journal']/following-sibling::Value/text()" />
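For illustration only, here is a minimal sketch of an inner entity whose field mappings use absolute paths from the record root. The element names below /PubmedArticle (Journal, Title, Field) and the name attribute are hypothetical, because the question does not show the structure of the XML files. The sketch restricts itself to plain element paths and an attribute predicate, which are forms the XPathEntityProcessor documentation lists as supported.

```xml
<!-- Sketch only: element names below /PubmedArticle are hypothetical -->
<entity name="PubmedArticle"
        processor="XPathEntityProcessor"
        stream="true"
        forEach="/PubmedArticle"
        url="${dir.fileAbsolutePath}">
  <!-- Plain absolute element path from the record root -->
  <field column="journal" xpath="/PubmedArticle/Journal/Title" />
  <!-- Absolute path with an attribute predicate (assumes a name attribute exists) -->
  <field column="authors" xpath="/PubmedArticle/Field[@name='authors']" />
</entity>
```

After adjusting the paths to the real document layout and re-running the full import, the "Added/Updated" count in the status response should show whether the fields are now being populated.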