Problem description
So I've installed Community 4.0.a and extended the mimetype list using mimetype-map.xml, as I did before in 3.4:
<alfresco-config area="mimetype-map">
<config evaluator="string-compare" condition="Mimetype Map">
<mimetypes>
<mimetype mimetype="application/dita+xml" text="true" display="DITA">
<extension default="true" display="DITA Topic">dita</extension>
<extension default="true" display="DITA Map">ditamap</extension>
<extension default="true" display="DITA Conditional Processing Profile">ditaval</extension>
</mimetype>
etc...
But each time I import a DITA file, it is recognised either as an XML file or as plain text. I've dug into it, and it looks like this is because Apache Tika analyses the beginning of the file to determine its mimetype.
How do I bypass Tika with my custom mimetype-map? (From the code, it looks like Tika is triggered first, and if it finds something then it's game over.)
Do I have to extend Tika by writing my own parser?
Recommended answer
The mimetype matching logic in 4.0 has changed slightly, now that the content is available for detection rather than just the filename. As part of this, if Tika is very sure about what a file is, its answer is preferred.
In most cases, this means that Tika can help correct mistakes for common but incorrectly named files. For non-standard files, Tika will decline to offer a strong suggestion, and the Alfresco name-based matching will be used as before. (Where Tika and Alfresco differ on the canonical form of a mimetype, the Alfresco version is preferred.)
There are a small number of cases where the file type is actually a specialisation of a common type, and Tika knows about the parent type but not the specific one. In such a case, Tika strongly suggests the parent type, and we have no way to tell that the new type added to Alfresco is based on it. (Tika has a hierarchy of mimetypes, while Alfresco just has a flat list.) DITA is one of these: it is a specialisation of XML, so Tika confidently reports XML. For this small number of cases, Tika needs guiding too.
The usual fix is to report a Tika bug and have the file type added upstream. (For very custom types, you also need to add a Tika custom-mimetypes.xml, which defines the hierarchy and the glob.)
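As a rough illustration of what such a file might contain, here is a minimal sketch for the DITA case. It assumes Tika's usual mime-info format and that the file is picked up from the classpath as org/apache/tika/mime/custom-mimetypes.xml; the parent type, comment text and glob patterns are assumptions derived from the mimetype-map.xml in the question, not a copy of the upstream definition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumed to be placed on the classpath as
     org/apache/tika/mime/custom-mimetypes.xml -->
<mime-info>
  <mime-type type="application/dita+xml">
    <!-- Hierarchy: tell Tika that DITA is a specialisation of XML -->
    <sub-class-of type="application/xml"/>
    <_comment>DITA</_comment>
    <!-- Globs: extensions taken from the mimetype-map.xml in the question -->
    <glob pattern="*.dita"/>
    <glob pattern="*.ditamap"/>
    <glob pattern="*.ditaval"/>
  </mime-type>
</mime-info>
```

With the parent type declared, Tika can treat its content-based XML detection and the filename-based DITA match as compatible, rather than letting the more general XML result win.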
In this DITA case, I've opened TIKA-784 and added a provisional fix. This has now gone into Alfresco too.