Problem description
I'm trying to load RSS data from WordPress into a MarkLogic database. The data looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.2/">
<item>
<wp:post_id>1</wp:post_id>
<wp:post_title>title 1</wp:post_title>
<dc:creator>bob</dc:creator>
</item>
<item>
<wp:post_id>2</title>
<wp:post_title>title 1</wp:post_title>
<dc:creator>john</dc:creator>
</item>
</rss>
However, when I run the mlcp command, I get the following warnings and the data is not inserted into the database:
WARN mapreduce.ContentWriter: XDMP-DOCNONSBIND: No namespace binding for prefix wp
WARN mapreduce.ContentWriter: XDMP-DOCNONSBIND: No namespace binding for prefix dc
The mlcp command I used is:
./mlcp.sh import -host localhost -port 8088 -username admin -password admin -input_file_path data.xml -mode local -input_file_type aggregates -aggregate_record_element item -aggregate_uri_id post_id -output_uri_prefix /resources/ -output_uri_suffix .xml
Any idea how I can fix this?
Thanks!
Recommended answer
Your test case has one malformed line: <wp:post_id>2</title>. When I fix that and use mlcp-Hadoop2-1.2-3 with 7.0-4, I see one warning per item element:
15/01/12 14:16:14 WARN mapreduce.ContentWriter: XDMP-DOCNONSBIND: No namespace binding for prefix wp at /resources/1.xml line 2
15/01/12 14:16:14 WARN mapreduce.ContentWriter: XDMP-DOCNONSBIND: No namespace binding for prefix wp at /resources/2.xml line 2
This looks like an mlcp bug to me. Your namespace declarations are above the level of the item element, and they aren't being sent up to the server.
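To see why a missing declaration matters, here is a small Python sketch of the same failure outside MarkLogic. It is only an analogy and makes no claim about how mlcp works internally: it uses Python's standard xml.etree parser, and the helper name parses_alone is made up for this illustration. An item cut out of the file without the xmlns attributes from the rss element cannot be parsed on its own, while the same item with the binding repeated on it can.

```python
import xml.etree.ElementTree as ET

# An item as it looks once separated from the <rss> root:
# the wp prefix is used but no longer declared anywhere.
FRAGMENT = "<item><wp:post_id>1</wp:post_id></item>"

# The same item with the namespace binding repeated on the item itself.
WITH_BINDING = (
    '<item xmlns:wp="http://wordpress.org/export/1.2/">'
    "<wp:post_id>1</wp:post_id></item>"
)


def parses_alone(fragment: str) -> bool:
    """Return True if the fragment is parseable as a standalone document.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Raised for an unbound prefix, among other well-formedness errors.
        return False


print(parses_alone(FRAGMENT))
print(parses_alone(WITH_BINDING))
```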
As a workaround, you could edit the XML. Or you could try RecordLoader (http://marklogic.github.io/recordloader/) with something like this:
$ recordloader.sh -DCONNECTION_STRING=xcc://admin:admin@localhost:8088 \
-DRECORD_NAME=item -DID_NAME="#AUTO" data.xml
See http://marklogic.github.io/recordloader/ for other options.
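For the "edit the XML" workaround, one possible approach is to rewrite the file so that every item element carries its own namespace declarations. The Python sketch below does that with the standard library only. Several things here are assumptions rather than part of the original answer: the function name push_namespaces_down is made up, the input must already be well-formed (so the </title> line has to be fixed first), and the output keeps only a bare rss wrapper around the items, dropping anything else at the top level. I have not verified that mlcp accepts the result without warnings.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def push_namespaces_down(xml_bytes: bytes, record_name: str = "item") -> str:
    """Return a document in which each record declares the namespaces it uses.

    Hypothetical helper; assumes well-formed input and un-namespaced records.
    """
    # Register the prefixes found in the source so that serialization reuses
    # wp:, dc:, ... instead of generated ns0:, ns1:, ... prefixes.
    # Note: register_namespace changes ElementTree's global prefix map.
    for _, (prefix, uri) in ET.iterparse(BytesIO(xml_bytes), events=("start-ns",)):
        ET.register_namespace(prefix, uri)

    root = ET.fromstring(xml_bytes)

    # Serializing each record separately makes ElementTree emit the xmlns
    # attributes on the record itself rather than on the document root.
    parts = ['<rss version="2.0">']
    for record in root.iter(record_name):
        parts.append(ET.tostring(record, encoding="unicode").strip())
    parts.append("</rss>")
    return "\n".join(parts)
```

To use it, read data.xml as bytes, pass the contents to push_namespaces_down, write the returned string to a new file, and point -input_file_path at that file instead.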