Question
I've been tasked with converting the results of a RESTful web service into an XML document with a new format.
An example of the XHTML to be converted:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>OvidWS Result Set Resource</title>
</head>
<body>
<table id="results">
<tr>
<td class="_index">
<a class="uri" href="REDACTED">1</a>
</td>
<td class="au">
<span>GILLESPIE JB</span>
<span>KUKES RE</span>
</td>
<td class="so">A.M.A. American Journal of Diseases of Children</td>
<td class="ti">Acetylsalicylic acid poisoning with recovery.</td>
<td class="ui">20267726</td>
<td class="yr">1947</td>
</tr>
<tr>
<td class="_index">
<a class="uri" href="REDACTED">2</a>
</td>
<td class="au">BASS MH</td>
<td class="so">Journal of the Mount Sinai Hospital, New York</td>
<td class="ti">Aspirin poisoning in infants.</td>
<td class="ui">20265054</td>
<td class="yr">1947</td>
</tr>
</table>
</body>
</html>
Ideally, all I want to do is take whatever value the class attribute holds and make it the element name; where an element has no class attribute, I just want to name it item.
This is the transformation I'm looking for:
<results>
<citation>
<_index>
<uri href="REDACTED">1</uri>
</_index>
<au>
<item>GILLESPIE JB</item>
<item>KUKES RE</item>
</au>
<so>A.M.A. American Journal of Diseases of Children</so>
<ti>Acetylsalicylic acid poisoning with recovery.</ti>
<ui>20267726</ui>
<yr>1947</yr>
</citation>
<citation>
<_index>
<uri href="REDACTED">2</uri>
</_index>
<au>BASS MH</au>
<so>Journal of the Mount Sinai Hospital, New York</so>
<ti>Aspirin poisoning in infants.</ti>
<ui>20265054</ui>
<yr>1947</yr>
</citation>
</results>
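The rule above (the class value becomes the element name, otherwise item) can also be applied by building a new document instead of renaming nodes in place, which avoids moving attributes between elements altogether. This is a minimal sketch, not the asker's or answerer's method. It assumes the table to results and tr to citation mappings read off the desired output, that id and class should be dropped, that any other attribute written in the source (such as href) should be copied, and that every class value is a valid XML name. The module and function names are hypothetical.

```vbnet
Imports System.Xml

' Hypothetical module: builds the <results> document from the XHTML table.
Public Module ClassToElement
    Public Function ConvertResults(ByVal source As XmlDocument) As XmlDocument
        Dim ns As New XmlNamespaceManager(source.NameTable)
        ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml")
        Dim table As XmlElement = DirectCast(source.SelectSingleNode("//xhtml:table", ns), XmlElement)

        Dim target As New XmlDocument()
        target.AppendChild(CopyElement(table, target))
        Return target
    End Function

    ' Assumed naming rule: fixed names for table/tr, else class, else "item".
    Private Function TargetName(ByVal el As XmlElement) As String
        Select Case el.LocalName
            Case "table" : Return "results"
            Case "tr" : Return "citation"
        End Select
        Dim cls As String = el.GetAttribute("class")
        Return If(cls.Length > 0, cls, "item")
    End Function

    Private Function CopyElement(ByVal el As XmlElement, ByVal target As XmlDocument) As XmlElement
        ' A single-name CreateElement puts the element in no namespace,
        ' so the XHTML xmlns is not carried over to the output.
        Dim result As XmlElement = target.CreateElement(TargetName(el))
        For Each attr As XmlAttribute In el.Attributes
            ' Skip DTD-defaulted attributes (Specified = False), namespace
            ' declarations, and the attributes consumed by the mapping.
            If attr.Specified AndAlso
               attr.NamespaceURI <> "http://www.w3.org/2000/xmlns/" AndAlso
               attr.Name <> "class" AndAlso attr.Name <> "id" Then
                result.SetAttribute(attr.LocalName, attr.Value)
            End If
        Next
        For Each child As XmlNode In el.ChildNodes
            Select Case child.NodeType
                Case XmlNodeType.Element
                    result.AppendChild(CopyElement(DirectCast(child, XmlElement), target))
                Case XmlNodeType.Text, XmlNodeType.CDATA
                    result.AppendChild(target.CreateTextNode(child.Value))
            End Select
        Next
        Return result
    End Function
End Module
```

Because nothing is removed from the source document, DTD default attributes cannot be re-inserted mid-loop, and classless span elements become item as the desired output shows.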
I found a small piece of code here that lets me rename a node:
Public Shared Function RenameNode(ByVal e As XmlNode, newName As String) As XmlNode
Dim doc As XmlDocument = e.OwnerDocument
Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
While (e.HasChildNodes)
newNode.AppendChild(e.FirstChild)
End While
Dim ac As XmlAttributeCollection = e.Attributes
While (ac.Count > 0)
newNode.Attributes.Append(ac(0))
End While
Dim parent As XmlNode = e.ParentNode
parent.ReplaceChild(newNode, e)
Return newNode
End Function
But a problem arises when iterating over the XmlAttributeCollection. For some reason, on one of the td nodes, two attributes that don't appear in the source magically appear: rowspan and colspan. These attributes seem to break the loop: when one is consumed, it does not disappear from the attribute list the way the class attribute does. Instead, only its value is consumed (changing from "1" to ""). Because ac.Count never reaches 0, this results in an infinite loop.
I noticed they are of type XmlUnspecifiedAttribute, but when I modify the loop to detect that:
While (ac.Count > 0) And Not TypeOf (ac(0)) Is System.Xml.XmlUnspecifiedAttribute
newNode.Attributes.Append(ac(0))
End While
I get the following error:
System.Xml.XmlUnspecifiedAttribute is not accessible in this context because it is 'friend'
Any ideas why this is happening, or how to work around it?
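The Friend type XmlUnspecifiedAttribute cannot be referenced from user code, but the public XmlAttribute.Specified property carries the same information: it is False when the attribute's value came from a DTD default rather than from the document itself. Below is a minimal sketch of a RenameNode variant that snapshots the attributes first and moves only the specified ones, so the loop no longer depends on Attributes.Count reaching zero. The module name is hypothetical, and the sketch assumes that dropping DTD-defaulted attributes such as rowspan and colspan is acceptable for the output.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

' Hypothetical module holding a Specified-aware RenameNode.
Public Module SpecifiedAttributeRename
    Public Function RenameNode(ByVal e As XmlNode, ByVal newName As String) As XmlNode
        Dim doc As XmlDocument = e.OwnerDocument
        Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)

        ' Appending a child detaches it from e, so this loop terminates.
        While e.HasChildNodes
            newNode.AppendChild(e.FirstChild)
        End While

        ' Snapshot first: appending an attribute to newNode removes it from
        ' e.Attributes, and a DTD default may be re-inserted in its place,
        ' which is why "While ac.Count > 0" can spin forever.
        Dim toMove As New List(Of XmlAttribute)()
        For Each attr As XmlAttribute In e.Attributes
            If attr.Specified Then toMove.Add(attr) ' skip DTD defaults
        Next
        For Each attr As XmlAttribute In toMove
            newNode.Attributes.Append(attr)
        Next

        e.ParentNode.ReplaceChild(newNode, e)
        Return newNode
    End Function
End Module
```

Used in place of the question's RenameNode, the class attribute still moves as before; any defaulted rowspan/colspan simply stays behind on the discarded td.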
Recommended answer
I think the problem you are having is indeed your DOCTYPE declaration.
Since you are translating the nodes into something else entirely, I would say you don't even need it and can safely ignore it.
I wasn't including it in my tests at first, and when I did include it the XmlResolver went haywire, so I am assuming you certainly don't need it here.
You can ignore it by setting the resolver to Nothing:
{xml document object}.XmlResolver = Nothing
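A related option, offered here as an alternative sketch rather than what the answer tested, is to tell the reader to skip the DOCTYPE entirely with XmlReaderSettings.DtdProcessing = DtdProcessing.Ignore. No DTD is processed, so no default attributes such as rowspan/colspan are attached to the loaded nodes. The module and function names are hypothetical, and the sketch assumes nothing in the document (for example, named entities declared by the DTD) actually depends on the DTD.

```vbnet
Imports System.IO
Imports System.Xml

' Hypothetical helper: load XML while ignoring any DOCTYPE declaration.
Public Module DtdFreeLoading
    Public Function LoadIgnoringDtd(ByVal xml As String) As XmlDocument
        Dim settings As New XmlReaderSettings()
        ' Ignore: the DOCTYPE is skipped, so no DTD defaults are applied.
        settings.DtdProcessing = DtdProcessing.Ignore

        Dim doc As New XmlDocument()
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            doc.Load(reader)
        End Using
        Return doc
    End Function
End Module
```

To read the answer's file instead, XmlReader.Create("RestServ.txt", settings) can replace the StringReader overload. Because no defaults exist after loading this way, the question's original While ac.Count > 0 loop would also run to completion.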
Then do your node selection and processing. I did this even with the DOCTYPE in the source file and still had no issues.
Here is the code I used to test:
Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
Dim USEDoc As New XmlDocument
Dim theNameManager As System.Xml.XmlNamespaceManager = New System.Xml.XmlNamespaceManager(USEDoc.NameTable)
theNameManager.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml")
USEDoc.XmlResolver = Nothing
USEDoc.Load("RestServ.txt")
renameNodes(USEDoc.SelectSingleNode("descendant::xhtml:table", theNameManager))
Dim SaveDoc As New XmlDocument
SaveDoc.AppendChild(SaveDoc.ImportNode(USEDoc.SelectSingleNode("//results", theNameManager), True))
SaveDoc.Save("RestServConv.xml")
End Sub
Public Function renameNodes(ByVal TopNode As XmlNode) As Boolean
Dim UseNode As XmlNode = Nothing ' stays Nothing for elements that are not renamed
If TopNode.Name <> "#text" Then
If TopNode.Name = "tr" Then
UseNode = RenameNode(TopNode, "citation")
ElseIf TopNode.Name = "table" Then
UseNode = RenameNode(TopNode, "results")
UseNode.Attributes.RemoveNamedItem("id")
ElseIf TopNode.Attributes.Count > 0 Then
For Each oAttribute As XmlAttribute In TopNode.Attributes
If oAttribute.Name = "class" Then
UseNode = RenameNode(TopNode, oAttribute.Value)
UseNode.Attributes.RemoveNamedItem("class")
Exit For
End If
Next oAttribute
End If
If UseNode IsNot Nothing Then
If UseNode.ChildNodes.Count > 0 Then
Dim x As Integer
For x = 0 To UseNode.ChildNodes.Count - 1
renameNodes(UseNode.ChildNodes(x))
Next x
End If
End If
End If
Return True
End Function
Public Shared Function RenameNode(ByVal e As XmlNode, ByVal newName As String) As XmlNode
Dim doc As XmlDocument = e.OwnerDocument
Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
While (e.HasChildNodes)
newNode.AppendChild(e.FirstChild)
End While
Dim ac As XmlAttributeCollection = e.Attributes
While (ac.Count > 0)
newNode.Attributes.Append(ac(0))
End While
Dim parent As XmlNode = e.ParentNode
parent.ReplaceChild(newNode, e)
Return newNode
End Function
I passed in your example document and the result I got was this:
<results>
<citation>
<_index>
<uri href="REDACTED">1</uri>
</_index>
<au>
<span xmlns="http://www.w3.org/1999/xhtml">GILLESPIE JB</span>
<span xmlns="http://www.w3.org/1999/xhtml">KUKES RE</span>
</au>
<so rowspan="1" colspan="1">A.M.A. American Journal of Diseases of Children</so>
<ti>Acetylsalicylic acid poisoning with recovery.</ti>
<ui>20267726</ui>
<yr>1947</yr>
</citation>
<citation>
<_index>
<uri href="REDACTED">2</uri>
</_index>
<au>BASS MH</au>
<so>Journal of the Mount Sinai Hospital, New York</so>
<ti>Aspirin poisoning in infants.</ti>
<ui>20265054</ui>
<yr>1947</yr>
</citation>
</results>