Problem description
I have a web project where I must import text and images from a user-supplied document, and one of the possible formats is Microsoft Office 2007. There's also a need to generate documents in this format.
The server runs CentOS 5.2 and has PHP/Perl/Python installed. I can execute local binaries and shell scripts if I must. We use Apache 2.2 but will be switching over to Nginx once it goes live.
What are my options? Anyone had experience with this?
The Office 2007 file formats are open and well documented. Roughly speaking, all of the new file formats ending in "x" (.docx, .xlsx, .pptx) are zip-compressed packages of XML documents. For example:
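As a minimal sketch of what "zip-compressed XML" means in practice, the following uses only the Python 3 standard library to open a .docx, pull the paragraph text out of `word/document.xml`, and collect embedded images. It assumes images sit under `word/media/`, which is where Word typically stores them. The helper names (`extract_paragraphs`, `extract_images`, and so on) are hypothetical, and formatting, headers, footers and footnotes are not handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _open(source):
    """Accept either a filesystem path or the raw bytes of an upload."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    return zipfile.ZipFile(source)


def list_parts(source):
    """Return the names of all parts stored inside the zip container."""
    with _open(source) as zf:
        return zf.namelist()


def extract_paragraphs(source):
    """Return the plain text of each <w:p> paragraph in word/document.xml.

    Formatting is discarded; paragraphs inside table cells are returned in
    document order like any other paragraph. Headers, footers and footnotes
    live in separate parts and are not read here.
    """
    with _open(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


def extract_images(source):
    """Return {part name: bytes} for media files under word/media/ (assumed location)."""
    with _open(source) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if name.startswith("word/media/")
        }
```

In a web handler you could pass either the saved upload's path or its bytes, e.g. `extract_paragraphs(uploaded_bytes)`.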
The other file formats are roughly similar. I don't know of any open-source libraries for interacting with them yet, but depending on your exact requirements, reading and writing simple documents doesn't look too difficult. It should certainly be a lot easier than with the older formats.
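To illustrate the "writing simple documents" case, here is a minimal Python 3 sketch that builds a .docx in memory with just the three parts commonly cited as the minimum for a WordprocessingML package: `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`. It only emits plain, unstyled paragraphs (no styles, images or extra relationships), and the function and constant names are hypothetical.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Package-level relationship pointing at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_document_xml(paragraphs):
    """Return word/document.xml with one plain paragraph per input string."""
    body = "".join(
        # escape() turns &, < and > into entities so user text stays valid XML.
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )


def build_minimal_docx(paragraphs):
    """Return the bytes of a .docx, e.g. to send as an HTTP response body."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", build_document_xml(paragraphs))
    return buf.getvalue()
```

Returning bytes rather than writing a file keeps the sketch convenient for a web app; writing them to disk with a `.docx` extension works just as well.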
If you need to read the older formats, OpenOffice has an API and can read and write Office 2003 and older documents with varying degrees of success.
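Since the server can run local binaries, one option is to shell out to an office suite and convert an old .doc/.xls into the newer format before parsing it as XML. The sketch below assumes LibreOffice (the OpenOffice fork) is installed, because its `soffice` binary accepts `--headless --convert-to <ext> --outdir <dir>`; the original OpenOffice may not support these flags. It is written for Python 3, and the helper names are hypothetical.

```python
import os
import shutil
import subprocess


def build_convert_command(src_path, out_dir, target_ext="docx", binary="soffice"):
    """Return the argv list for a headless LibreOffice conversion."""
    return [binary, "--headless", "--convert-to", target_ext,
            "--outdir", out_dir, src_path]


def convert_legacy_document(src_path, out_dir, target_ext="docx", timeout=120):
    """Convert an older file (e.g. .doc) and return the converted file's path.

    Assumes LibreOffice's soffice is on PATH; raises RuntimeError otherwise.
    """
    binary = shutil.which("soffice")
    if binary is None:
        raise RuntimeError("soffice (LibreOffice) was not found on PATH")
    cmd = build_convert_command(src_path, out_dir, target_ext, binary)
    # soffice has been reported to exit with status 0 even when nothing was
    # written, so the expected output file is checked explicitly afterwards.
    subprocess.run(cmd, check=True, timeout=timeout,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    out_path = os.path.join(out_dir, stem + "." + target_ext)
    if not os.path.exists(out_path):
        raise RuntimeError("conversion produced no output: " + out_path)
    return out_path
```

Passing an argument list (not a shell string) avoids quoting problems with user-supplied filenames.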