Problem description
I'm trying to open a Microsoft Word document (.docx) and read values from the tables it contains. I'm using the Open XML SDK 2.0 to open the .docx file. After looking for examples and ideas, this is what I have so far:
public static string TextFromWord(string filename)
{
    const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    StringBuilder textBuilder = new StringBuilder();
    using (WordprocessingDocument wDoc = WordprocessingDocument.Open(filename, false))
    {
        // Manage namespaces to perform XPath queries.
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", wordmlNamespace);
        // Get the document part from the package and
        // load its XML into an XmlDocument instance.
        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.Load(wDoc.MainDocumentPart.GetStream());
        XmlNodeList paragraphNodes = xdoc.SelectNodes("//w:p", nsManager);
        foreach (XmlNode paragraphNode in paragraphNodes)
        {
            XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t", nsManager);
            foreach (XmlNode textNode in textNodes)
            {
                textBuilder.Append(textNode.InnerText);
            }
            textBuilder.Append(Environment.NewLine);
        }
    }
    return textBuilder.ToString();
}
The code works when the .docx contains only plain text, but it fails when the text is inside tables. Is there a way to fix this so that it also works with tables in a .docx?
Recommended answer
Try the following simple rewrite of your method. It replaces your System.Xml calls and namespace handling with Open XML SDK elements and methods (Document, Body, Paragraph, Table, TableRow, TableCell, Descendants, etc.). Please install and use the Open XML SDK 2.5.
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static string TextFromWord(string filename)
{
    StringBuilder textBuilder = new StringBuilder();
    using (WordprocessingDocument wDoc = WordprocessingDocument.Open(filename, false))
    {
        // The body holds the top-level paragraphs and tables of the document.
        var body = wDoc.MainDocumentPart.Document.Body;
        if (body != null)
        {
            foreach (var node in body.ChildElements)
            {
                if (node is Paragraph)
                {
                    ProcessParagraph((Paragraph)node, textBuilder);
                    textBuilder.AppendLine("");
                }
                else if (node is Table)
                {
                    ProcessTable((Table)node, textBuilder);
                }
            }
        }
    }
    return textBuilder.ToString();
}

// Writes each table row as one line, with cells separated by " | ".
private static void ProcessTable(Table node, StringBuilder textBuilder)
{
    foreach (var row in node.Descendants<TableRow>())
    {
        textBuilder.Append("| ");
        foreach (var cell in row.Descendants<TableCell>())
        {
            foreach (var para in cell.Descendants<Paragraph>())
            {
                ProcessParagraph(para, textBuilder);
            }
            textBuilder.Append(" | ");
        }
        textBuilder.AppendLine("");
    }
}

// Appends the text of every run in the paragraph, without a trailing newline.
private static void ProcessParagraph(Paragraph node, StringBuilder textBuilder)
{
    foreach (var text in node.Descendants<Text>())
    {
        textBuilder.Append(text.InnerText);
    }
}
Note: this code only handles simple Word documents made up of paragraphs and tables. It has not been tested on complex Word documents.
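Since the question asks for values from tables rather than one flattened string, the same element-based approach can also return the cells as nested lists. The following is only a sketch under stated assumptions: the class name `WordTableReader` and the methods `ReadTables` and `CellText` are hypothetical names introduced here, not part of the SDK or of the original answer, and only top-level tables in the body are read (nested tables, and rows wrapped in content controls, are skipped).

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class WordTableReader
{
    // Returns tables -> rows -> cell strings for every top-level table in the body.
    public static List<List<List<string>>> ReadTables(string filename)
    {
        var tables = new List<List<List<string>>>();
        using (WordprocessingDocument doc = WordprocessingDocument.Open(filename, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            // Elements<T>() visits direct children only, so nested tables
            // are not reported a second time as separate tables.
            foreach (Table table in body.Elements<Table>())
            {
                var rows = new List<List<string>>();
                foreach (TableRow row in table.Elements<TableRow>())
                {
                    List<string> cells = row.Elements<TableCell>()
                        .Select(CellText)
                        .ToList();
                    rows.Add(cells);
                }
                tables.Add(rows);
            }
        }
        return tables;
    }

    // Joins the paragraphs that are direct children of a cell with '\n'.
    private static string CellText(TableCell cell)
    {
        IEnumerable<string> paragraphs = cell.Elements<Paragraph>()
            .Select(p => string.Concat(p.Descendants<Text>().Select(t => t.Text)));
        return string.Join("\n", paragraphs);
    }
}
```

With this shape, a single value can be read by index, for example `WordTableReader.ReadTables(path)[0][1][2]` for the third cell of the second row of the first table, assuming the document has a table of at least that size.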
The following document was processed with the above code in a console app:
[Image: sample Word document, not preserved]
The following is the text output:
[Output listing not preserved]