Problem description
I have an image containing IPTC data and use the following command to extract the IPTC data as text:
convert image.jpg IPTCTEXT:iptc.txt
The problem is that the output seems to use entities for "special characters":
2#120#Caption="Beschreibung f&#195;&#188;r den Import aus IPTC"
It should actually read "für" here. But instead of getting the single correct entity &#252; for the "ü" character, I get two entities (probably each byte of the UTF-8 encoded character was converted to an entity separately), and I cannot parse these two entities correctly.
Is there any way to get the correct entity, or to disable the entities completely so that plain UTF-8 characters are returned?
Edit:
I tried parsing the entities using StringEscapeUtils.unescapeXml in Java, but I get two characters ("Ã¼") instead of "ü", because the two entities are unescaped separately.
Edit2:
Sample image here:
Recommended answer
The most reliable metadata package is, IMHO, exiv2 (http://exiv2.org/; available in all Linux distros and for Windows; not sure about Mac binaries).
See http://paste.fedoraproject.org/232538/34459066/ for the results. ImageMagick is not that great for metadata purposes, I am afraid.
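If you have to stay with the ImageMagick output, one possible workaround is to undo the per-byte escaping yourself. The following is only a minimal sketch: it assumes (as guessed in the question, not confirmed) that each decimal entity in the IPTCTEXT output stands for one raw byte of the original UTF-8 string. The class name IptcEntityDecoder and the method decode are made up for illustration and belong to neither ImageMagick nor Commons Lang.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class IptcEntityDecoder {
    // Matches decimal numeric entities such as &#195;
    private static final Pattern ENTITY = Pattern.compile("&#(\\d{1,3});");

    // Assumption: every entity with a value of 0..255 is one raw byte of a UTF-8 string.
    static String decode(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Matcher m = ENTITY.matcher(input);
        int pos = 0;
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            if (value > 255) {
                continue; // not a single byte: keep the entity as literal text
            }
            // Copy the literal text that precedes the entity, then the byte itself.
            byte[] literal = input.substring(pos, m.start()).getBytes(StandardCharsets.UTF_8);
            out.write(literal, 0, literal.length);
            out.write(value);
            pos = m.end();
        }
        // Copy whatever follows the last entity.
        byte[] tail = input.substring(pos).getBytes(StandardCharsets.UTF_8);
        out.write(tail, 0, tail.length);
        // Interpret the reassembled bytes as UTF-8.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("Beschreibung f&#195;&#188;r den Import aus IPTC"));
    }
}
```

The point is that the bytes must be collected first and decoded as UTF-8 only at the end. Unescaping each entity on its own, as unescapeXml does, turns every byte into a separate character. If the embedded IPTC data is not UTF-8 in the first place, this sketch will produce wrong characters.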