Why does OpenXML read rows twice?

Problem description

I count rows in two worksheets like this:

    foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
    {
        OpenXmlPartReader reader = new OpenXmlPartReader(worksheetPart);
        if (count == 0)
        {
            while (reader.Read())
            {
                if (reader.ElementType == typeof(Row))
                {
                    count_first++;
                }
            }
        }
        else if (count == 1)
        {
            while (reader.Read())
            {
                if (reader.ElementType == typeof(Row))
                {
                    count_second++;
                }
            }
        }
        count++;
    }

For both worksheets, count_first and count_second come out at twice the number of rows that contain data. Why is that, and what does it actually mean? Does it mean that OpenXML parses each worksheet twice?

EDIT

Well, I found a solution. To get it right away, I guess, you should keep this sacred knowledge in some secret place. So, here it is:

    while (reader.Read())
    {
        if (reader.ElementType == typeof(Row))
        {
            do
            {
                count_first++;
            } while (reader.ReadNextSibling());
        }
    }

Solution

The reason you are getting twice the count is the way the OpenXmlReader reads each element. The reader treats the opening and closing nodes of an element as independent items, which can be told apart by checking the IsStartElement and IsEndElement properties.

To demonstrate this you can run something like this:

    using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
    {
        while (reader.Read())
        {
            if (reader.ElementType == typeof(Row))
            {
                // Dump every node from the first Row onwards, with its start/end flags.
                do
                {
                    Console.WriteLine("{0} {1} {2}",
                        reader.ElementType,
                        reader.IsStartElement,
                        reader.IsEndElement);
                } while (reader.Read());
                Console.WriteLine("Finished");
            }
        }
    }

Which will produce output along the lines of the following* for a sheet with two rows and two columns:

    [output listing missing from the source]

There are 2 ways that you can solve this, depending on how you want to read the document. The first way (as you point out in your answer) is to move to the next sibling by calling ReadNextSibling; this essentially "jumps" over the end element (and any children of the Row).
Changing the above example to use ReadNextSibling in the do loop:

    do
    {
        Console.WriteLine("{0} {1} {2}",
            reader.ElementType,
            reader.IsStartElement,
            reader.IsEndElement);
    } while (reader.ReadNextSibling());

You'll get output* of:

    Row True False
    Row True False

The second way would be to just count the start elements (or indeed the end elements; just not both):

    while (reader.Read())
    {
        if (reader.ElementType == typeof(Row) && reader.IsStartElement)
        {
            count_first++;
        }
    }

Which one you choose depends on whether you wish to read the Cell values and how you'd like to read them (SAX or DOM).

* In reality each row is prefixed with the namespace "DocumentFormat.OpenXml.Spreadsheet.", which I've removed for readability.
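
For completeness, here is a minimal sketch of how the start-element approach could be wrapped into a per-worksheet row counter. It assumes the DocumentFormat.OpenXml NuGet package is referenced. The names RowCounter, CountRowsPerSheet and the path parameter are made up for illustration and do not come from the original answer. Returning one count per worksheet part also avoids the count == 0 / count == 1 branches from the question.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, not part of the Open XML SDK.
public static class RowCounter
{
    // Returns the number of Row elements in each worksheet part
    // of the workbook stored at `path` (an assumed input).
    public static List<int> CountRowsPerSheet(string path)
    {
        var counts = new List<int>();

        // Open the workbook read-only.
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))
        {
            var workbookPart = doc.WorkbookPart;
            if (workbookPart == null)
            {
                return counts; // no workbook part, nothing to count
            }

            foreach (WorksheetPart part in workbookPart.WorksheetParts)
            {
                int rows = 0;
                using (OpenXmlReader reader = OpenXmlReader.Create(part))
                {
                    while (reader.Read())
                    {
                        // Count a Row only when the reader is positioned on its
                        // start node, so the matching end node is not counted too.
                        if (reader.ElementType == typeof(Row) && reader.IsStartElement)
                        {
                            rows++;
                        }
                    }
                }
                counts.Add(rows);
            }
        }

        return counts;
    }
}
```

A call such as RowCounter.CountRowsPerSheet("book.xlsx") (the file name is a placeholder) would then give one entry per worksheet part. As far as I know, WorksheetParts is enumerated in package order, which may not match the tab order shown in Excel, so if the order matters it is probably safer to go through the workbook's Sheets and resolve each part with GetPartById.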