Reading content from a Word document in C# without using the Word DLL

Question

Hi,

I want to read the content of a Microsoft Word file without using the Microsoft.Office.Interop DLL.

I am using the following code, but it only reads text from .xml and .txt files, not from .doc files:

```csharp
using System.IO;

using (StreamReader streamReader = new StreamReader(filePath))
{
    string text = streamReader.ReadToEnd();
}
```

Solution

Office documents are more complex than simple XML or TXT files because they carry much more than the text itself (fonts, colors, positions, tables, images, and so on).

Starting with Office 2007, Microsoft saves Office files in the 'Office Open XML' format, and a .docx file is really a ZIP archive. To look inside one, rename its extension to .zip (e.g. untitled1.docx.zip) and extract its contents with any ZIP application or library.

You will get a few files and folders; navigate to the 'word' folder and read the file named 'document.xml'.
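From C# there is no need to rename or unpack anything by hand: a .docx can be opened directly as a ZIP archive with `System.IO.Compression`. The sketch below is a minimal, hypothetical helper (`DocxPackage.ReadDocumentXml`). It assumes .NET Core or .NET 5+ (on .NET Framework 4.5+ you would add a reference to System.IO.Compression.FileSystem), and it assumes the main part lives at word/document.xml, which is where Word puts it; strictly speaking, the package's _rels/.rels file is what names the main part.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper (a sketch, not an official API): reads the raw XML of
// word/document.xml straight out of a .docx, which is an ordinary ZIP archive.
public static class DocxPackage
{
    // Returns the XML text of the main document part, or null when the
    // archive has no word/document.xml entry (assumed standard location).
    public static string ReadDocumentXml(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                return null;
            }

            // document.xml is UTF-8; StreamReader defaults to UTF-8.
            using (StreamReader reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Usage would look like `string xml = DocxPackage.ReadDocumentXml("untitled1.docx");`. The result is still raw XML, not plain text.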

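This file holds the main body text of the document. It is XML, so parse it with an XML parser rather than treating it as plain text. Headers, footers, and footnotes are stored in separate parts in the same folder (e.g. 'header1.xml', 'footnotes.xml').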
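One way to parse it correctly is LINQ to XML. In WordprocessingML each paragraph is a `<w:p>` element and the visible characters sit in `<w:t>` elements, all in the namespace http://schemas.openxmlformats.org/wordprocessingml/2006/main. The sketch below is a minimal, hypothetical helper (`DocxText.ExtractText`) that takes the XML string of document.xml and returns one line per paragraph. It deliberately ignores tabs (`<w:tab/>`), line breaks (`<w:br/>`), fields, and nested content such as text boxes, so treat it as a starting point rather than a complete extractor.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (a sketch): turns the XML of word/document.xml into
// plain text, one line per <w:p> paragraph.
public static class DocxText
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ExtractText(string documentXml)
    {
        XDocument doc = XDocument.Parse(documentXml);

        // Concatenate every <w:t> inside a paragraph; runs (<w:r>) split
        // text arbitrarily, so joining them without separators is correct.
        // Note: nested paragraphs (e.g. text boxes) are not handled.
        IEnumerable<string> paragraphs = doc
            .Descendants(W + "p")
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

        return string.Join("\n", paragraphs);
    }
}
```

For a real file, pass it the string read from word/document.xml inside the .docx.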

If you want the text of a pre-2007 file (e.g. a .doc file), you will have to use the Microsoft Office Compatibility Pack, which converts such files to the new format (it can also be used programmatically; see its documentation).
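Since old and new files need different handling, it can help to decide which path a file needs from its first bytes instead of trusting its extension. A .docx is a ZIP archive and starts with `50 4B 03 04` ("PK"), while a Word 97-2003 .doc is an OLE compound file and starts with `D0 CF 11 E0 A1 B1 1A E1`. The sketch below uses hypothetical names (`WordFormatDetector`, `ContainerFormat`); note that it only identifies the container, so an .xlsx would also report `Zip` and an .xls would also report `OleCompound`.

```csharp
using System.IO;
using System.Linq;

// Hypothetical names (a sketch): which container a file uses.
public enum ContainerFormat { Unknown, Zip, OleCompound }

public static class WordFormatDetector
{
    // ZIP local file header signature (.docx and other Open XML files).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE compound file signature (.doc and other pre-2007 Office files).
    private static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Classifies a file from its leading bytes (8 are needed to spot OLE).
    public static ContainerFormat Detect(byte[] header)
    {
        if (StartsWith(header, ZipSignature)) return ContainerFormat.Zip;
        if (StartsWith(header, OleSignature)) return ContainerFormat.OleCompound;
        return ContainerFormat.Unknown;
    }

    public static ContainerFormat DetectFile(string path)
    {
        byte[] header = new byte[OleSignature.Length];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until full or EOF.
            int n;
            while (total < header.Length &&
                   (n = stream.Read(header, total, header.Length - total)) > 0)
            {
                total += n;
            }
        }
        return Detect(header.Take(total).ToArray());
    }

    private static bool StartsWith(byte[] data, byte[] prefix) =>
        data.Length >= prefix.Length && data.Take(prefix.Length).SequenceEqual(prefix);
}
```

A file reported as `Zip` can go down the document.xml route; an `OleCompound` file needs converting first.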