Problem description
Hi,
I want to get the content of a Microsoft Word file without using the Microsoft.Office.Interop DLL.
I tried the code below, but it only reads text from .xml and .txt files, not from .doc files.
using System.IO;
using (StreamReader streamReader = new StreamReader(filePath)) {
    string text = streamReader.ReadToEnd();
}
Office documents are more complex than plain XML or text files because they carry much more than the text itself (fonts, colors, positions, tables, images, etc.), so reading one as a plain text stream does not give you its content.
Starting with Office 2007, Microsoft uses the Office Open XML format to save Office files. To parse a .docx file, rename it with a .zip extension (e.g. untitled1.docx.zip) and extract its contents with any ZIP application or library.
You will get a few files and folders; navigate to the word folder and read the file named document.xml.
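In code, the renaming step isn't needed: a .docx is an ordinary ZIP archive, so it can be opened directly. Below is a minimal sketch, assuming .NET's built-in `System.IO.Compression` (`ZipFile.OpenRead`; on .NET Framework this also needs a reference to System.IO.Compression.FileSystem) and assuming the main part has its conventional name `word/document.xml` (strictly, that name is declared in `_rels/.rels`). `DocxPackage` and `ReadDocumentXml` are hypothetical names for illustration.

```csharp
using System.IO;
using System.IO.Compression;

public static class DocxPackage
{
    // Returns the raw XML of word/document.xml from a .docx file.
    // Assumption: the main document part uses its conventional name.
    public static string ReadDocumentXml(string docxPath)
    {
        // A .docx is a ZIP archive, so it can be opened without renaming it.
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            var entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException(
                    "word/document.xml not found; the file may not be a .docx package.");

            // Read the entry's decompressed bytes as text.
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

The result is still XML markup, not plain text; it has to be parsed before the words can be used.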
This file contains the text of the document body (headers, footers and footnotes are stored in separate files in the same folder). It is XML, so parse it with an XML parser rather than reading it as plain text.
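Inside document.xml, the visible text sits in `<w:t>` elements inside runs (`<w:r>`), and runs are grouped into paragraphs (`<w:p>`). The sketch below uses LINQ to XML (`XDocument`) to collect the text paragraph by paragraph from a document.xml string. `DocumentXmlText` and `ExtractParagraphs` are hypothetical names, and mapping `<w:tab/>` and `<w:br/>` to `\t` and `\n` is a simplifying assumption; fields, text boxes and other special content are not handled.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

public static class DocumentXmlText
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of each <w:p> paragraph in document order.
    public static List<string> ExtractParagraphs(string documentXml)
    {
        // PreserveWhitespace keeps runs that consist of a single space.
        XDocument doc = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);
        var paragraphs = new List<string>();

        foreach (XElement p in doc.Descendants(W + "p"))
        {
            var sb = new StringBuilder();
            // Only look inside runs, so tab-stop definitions in <w:pPr>
            // (which also use <w:tab>) are not mistaken for tab characters.
            // Nested paragraphs (e.g. in text boxes) are not handled specially.
            foreach (XElement run in p.Descendants(W + "r"))
            {
                foreach (XElement child in run.Elements())
                {
                    if (child.Name == W + "t") sb.Append(child.Value);
                    else if (child.Name == W + "tab") sb.Append('\t');
                    else if (child.Name == W + "br") sb.Append('\n');
                }
            }
            paragraphs.Add(sb.ToString());
        }
        return paragraphs;
    }
}
```

For a rough plain-text dump, the paragraphs can be joined, e.g. `string.Join(Environment.NewLine, DocumentXmlText.ExtractParagraphs(xml))`, where `xml` holds the contents of document.xml.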
If you want to extract the text of pre-2007 files (e.g. .doc files), you will have to use the Microsoft Office Compatibility Pack, which converts files to the new format (it can also be used programmatically; see its documentation).
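Because .docx and .doc files need different handling, it can help to check which container a file actually uses before processing it, since the extension alone can be wrong. This sketch (hypothetical names `WordFormatSniffer` and `WordFileKind`) looks only at the leading bytes: .docx files are ZIP archives starting with `PK\x03\x04`, and Word 97-2003 .doc files are OLE2 compound files starting with `D0 CF 11 E0 A1 B1 1A E1`.

```csharp
using System.IO;

public enum WordFileKind { Unknown, OpenXmlZip, LegacyCompoundFile }

public static class WordFormatSniffer
{
    // ZIP local file header signature "PK\x03\x04" (used by .docx).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE2 / Compound File Binary signature (used by Word 97-2003 .doc).
    private static readonly byte[] CfbSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Checks only the container type: an .xlsx also starts with "PK",
    // and a ".doc" that is really RTF or HTML is reported as Unknown.
    public static WordFileKind Detect(Stream stream)
    {
        var header = new byte[8];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }

        if (StartsWith(header, read, CfbSignature)) return WordFileKind.LegacyCompoundFile;
        if (StartsWith(header, read, ZipSignature)) return WordFileKind.OpenXmlZip;
        return WordFileKind.Unknown;
    }

    private static bool StartsWith(byte[] data, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
            if (data[i] != signature[i]) return false;
        return true;
    }
}
```

Typical use is `using (var fs = File.OpenRead(filePath)) { var kind = WordFormatSniffer.Detect(fs); }`, then branching on `kind`.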