Problem description
I'm reading a simple XML file using MATLAB's built-in xmlread function.
<root>
  <ref>
    <requestor>John Doe</requestor>
    <project>X</project>
  </ref>
</root>
But when I call getChildNodes() on the ref element, it tells me that it has 5 children.
It works fine if I put all the XML on one line. MATLAB then tells me that the ref element has 2 children.
It doesn't seem to like the whitespace between elements.
Even if I run Canonicalize in the oXygen XML editor, I still get the same result, because Canonicalize still leaves the whitespace in place.
MATLAB uses Java and Xerces for its XML handling.
What can I do so that I can keep my XML file in a human-readable format (not all on one line) but still have MATLAB parse it correctly?
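filename = 'example01.xml';
docNode = xmlread(filename);              % parse the file into a DOM document
rootNode = docNode.getDocumentElement;    % the <root> element
entries = rootNode.getChildNodes;         % direct child nodes of <root>
nEnt = entries.getLength                  % no semicolon, so the count is displayed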
Recommended answer
The XML parser behind the scenes creates #text nodes for all whitespace between element nodes. Wherever there is a newline or indentation, it creates a #text node whose data holds the newline and the indentation spaces that follow it. So in the XML example you provided, parsing the child nodes of the "ref" element returns 5 nodes:
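- Node 1: #text with a newline and indentation spaces
- Node 2: the "requestor" node, which in turn has a #text child with "John Doe" in its data
- Node 3: #text with a newline and indentation spaces
- Node 4: the "project" node, which in turn has a #text child with "X" in its data
- Node 5: #text with a newline and indentation spaces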
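To see the breakdown above for yourself, a minimal sketch along these lines should work. It writes the example XML to a temporary file (the temporary path and the variable names are assumptions made for illustration, not part of the original question) and prints the name and type of each direct child of the first "ref" element.

```matlab
% Write the pretty-printed example to a temporary file (hypothetical path).
xmlText = sprintf(['<root>\n' ...
                   '  <ref>\n' ...
                   '    <requestor>John Doe</requestor>\n' ...
                   '    <project>X</project>\n' ...
                   '  </ref>\n' ...
                   '</root>\n']);
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', xmlText);
fclose(fid);

docNode = xmlread(tmpFile);
delete(tmpFile);

% Take the first <ref> element and walk its direct children.
refNode   = docNode.getElementsByTagName('ref').item(0);
children  = refNode.getChildNodes;
nChildren = children.getLength;
for k = 0:nChildren-1                     % Java NodeList indices are zero-based
    child = children.item(k);
    fprintf('%d: %s (type %d)\n', k+1, char(child.getNodeName), child.getNodeType);
end
```

With the indented file, the whitespace-only entries should show up under the node name #text, interleaved with the requestor and project elements.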
This function removes all of these useless #text nodes for you. Note that if you intentionally have an XML element whose content is nothing but whitespace, this function will remove that whitespace as well, but for 99.99% of XML cases this should work just fine.
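function removeIndentNodes( childNodes )
% Recursively remove whitespace-only #text nodes from a DOM node list.

numNodes = childNodes.getLength;
remList = [];
for i = numNodes:-1:1
    theChild = childNodes.item(i-1);   % java indexing is zero-based
    if (theChild.hasChildNodes)
        % Element with children: clean up its subtree first.
        removeIndentNodes(theChild.getChildNodes);
    else
        % Leaf node: mark it if it is non-empty text made only of whitespace.
        if ( theChild.getNodeType == theChild.TEXT_NODE && ...
             ~isempty(char(theChild.getData())) && ...
             all(isspace(char(theChild.getData()))))
            remList(end+1) = i-1; % java indexing
        end
    end
end
% remList is in descending order, so earlier removals do not shift
% the indices of the nodes still to be removed.
for i = 1:length(remList)
    childNodes.removeChild(childNodes.item(remList(i)));
end

end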
Call it like this:
tree = xmlread( xmlfile );
removeIndentNodes( tree.getChildNodes );
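If you would rather leave the parsed tree untouched, a different approach from the one above is to skip everything that is not an element while iterating. The following is only a sketch under that assumption; the temporary file and the variable names elementNames and nElements are hypothetical and chosen for illustration.

```matlab
% Write the pretty-printed example to a temporary file (hypothetical path).
xmlText = sprintf(['<root>\n' ...
                   '  <ref>\n' ...
                   '    <requestor>John Doe</requestor>\n' ...
                   '    <project>X</project>\n' ...
                   '  </ref>\n' ...
                   '</root>\n']);
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', xmlText);
fclose(fid);

docNode = xmlread(tmpFile);
delete(tmpFile);

% Collect only the element children of the first <ref>, ignoring #text nodes.
refNode  = docNode.getElementsByTagName('ref').item(0);
children = refNode.getChildNodes;
elementNames = {};
for k = 0:children.getLength-1            % Java NodeList indices are zero-based
    child = children.item(k);
    if child.getNodeType == child.ELEMENT_NODE
        elementNames{end+1} = char(child.getNodeName); %#ok<AGROW>
    end
end
nElements = numel(elementNames);
```

This keeps the whitespace nodes in the document, which matters if you later write the tree back out with xmlwrite and want the indentation preserved, at the cost of repeating the type check wherever you iterate.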