Problem description
I'm reading a simple XML file using MATLAB's built-in xmlread function:
<root>
  <ref>
    <requestor>John Doe</requestor>
    <project>X</project>
  </ref>
</root>
But when I call getChildNodes() on the ref element, it tells me that it has 5 children.
It works fine if I put all the XML on one line: MATLAB then tells me that the ref element has 2 children.
It seems to be tripping over the whitespace between elements.
Running Canonicalize in the oXygen XML editor doesn't help either: I still get the same result, because canonicalization leaves the whitespace in place.
MATLAB uses Java and Xerces for its XML handling.
What can I do to keep my XML file in a human-readable format (not all on one line) and still have MATLAB parse it correctly?
filename='example01.xml';
docNode = xmlread(filename);
rootNode = docNode.getDocumentElement;
entries = rootNode.getChildNodes;
nEnt = entries.getLength
Recommended answer
The XML parser behind the scenes creates #text nodes for all the whitespace between elements. Wherever there is a newline or indentation, it creates a #text node whose data holds the newline and the indentation spaces that follow it. So in the XML example you provided, parsing the child nodes of the "ref" element returns 5 nodes:
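- Node 1: #text with a newline and indentation spaces
- Node 2: the "requestor" element, which in turn has a #text child with "John Doe" in its data
- Node 3: #text with a newline and indentation spaces
- Node 4: the "project" element, which in turn has a #text child with "X" in its data
- Node 5: #text with a newline and indentation spaces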
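Because xmlread hands back an org.w3c.dom Document, the node list above can be reproduced outside MATLAB. The following is a minimal Java sketch, assuming the JDK's built-in DOM parser (itself derived from Xerces) behaves like the one MATLAB ships; the class name CountRefChildren and the helper refChildren are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class CountRefChildren {
    // The example document, once indented and once on a single line.
    static final String INDENTED =
            "<root>\n  <ref>\n    <requestor>John Doe</requestor>\n    <project>X</project>\n  </ref>\n</root>";
    static final String ONE_LINE =
            "<root><ref><requestor>John Doe</requestor><project>X</project></ref></root>";

    // Hypothetical helper: parse an XML string with the JDK's default DOM parser
    // and return the child nodes of the first <ref> element.
    static NodeList refChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("ref").item(0).getChildNodes();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        NodeList kids = refChildren(INDENTED);
        System.out.println("indented: " + kids.getLength() + " children");
        for (int i = 0; i < kids.getLength(); i++) {
            // getNodeName() is "#text" for text nodes and the tag name for elements
            System.out.println("  " + (i + 1) + ": " + kids.item(i).getNodeName());
        }
        System.out.println("one line: " + refChildren(ONE_LINE).getLength() + " children");
    }
}
```

Run as-is, it should report 5 children (#text, requestor, #text, project, #text) for the indented text and 2 for the one-line version, matching what MATLAB reports.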
The function below removes all of these whitespace-only #text nodes for you. Note that if an element intentionally contains nothing but whitespace, that whitespace text will be removed as well (the element itself stays, now empty); for the vast majority of XML data this is exactly what you want.
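function removeIndentNodes( childNodes )
% Recursively remove whitespace-only #text nodes (newlines + indentation)
% from the DOM node list returned by getChildNodes.
numNodes = childNodes.getLength;
% Walk backwards: the list is live, so removing item i-1 does not
% shift the indices of the items still to be visited.
for i = numNodes:-1:1
    theChild = childNodes.item(i-1);   % Java indexing is 0-based
    if theChild.hasChildNodes
        % Container node: clean up its children first
        removeIndentNodes(theChild.getChildNodes);
    elseif theChild.getNodeType == theChild.TEXT_NODE && ...
            ~isempty(char(theChild.getData())) && ...
            all(isspace(char(theChild.getData())))
        % Whitespace-only text node: detach it via its parent
        % (the DOM NodeList interface itself has no removeChild method)
        parent = theChild.getParentNode();
        parent.removeChild(theChild);
    end
end
end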
Call it like this:
tree = xmlread( xmlfile );
removeIndentNodes( tree.getChildNodes );
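For readers who want to check the cleanup outside MATLAB, here is a rough Java port of removeIndentNodes against the same org.w3c.dom API. It is a sketch, not the answer's own code: it assumes Java 11+ (for String.isBlank) and the JDK's default DOM parser, and the class name RemoveIndentNodes and the helper parse are hypothetical. Each node is removed through its parent with getParentNode().removeChild, which is part of the standard Node interface.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RemoveIndentNodes {
    // Recursively delete text nodes that are non-empty and contain only whitespace.
    static void removeIndentNodes(NodeList childNodes) {
        // Walk backwards: the NodeList is live, so removing item i
        // leaves items 0..i-1 (still to be visited) at the same indices.
        for (int i = childNodes.getLength() - 1; i >= 0; i--) {
            Node child = childNodes.item(i);
            if (child.hasChildNodes()) {
                removeIndentNodes(child.getChildNodes());
            } else if (child.getNodeType() == Node.TEXT_NODE
                    && !child.getNodeValue().isEmpty()
                    && child.getNodeValue().isBlank()) { // isBlank needs Java 11+
                child.getParentNode().removeChild(child);
            }
        }
    }

    // Hypothetical helper: parse an XML string with the JDK's DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<root>\n  <ref>\n    <requestor>John Doe</requestor>\n    <project>X</project>\n  </ref>\n</root>");
        removeIndentNodes(doc.getChildNodes()); // same call shape as the MATLAB version
        Node ref = doc.getElementsByTagName("ref").item(0);
        System.out.println(ref.getChildNodes().getLength()); // prints 2
    }
}
```

Starting from the document's own child list, as the MATLAB call does, means the recursion reaches every element, so root loses its two indentation nodes as well as ref.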