Decoding a COMP-3 packed field in an ASCII file with Python?

Question

I have a file that was formerly an EBCDIC-encoded file, which was converted to ASCII using dd. However, some lines contain COMP-3 packed fields which I would like to read.

For example, the string representation of one of the lines I would like to decode is:

'15\x00\x00\x00\x04@\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x0c777093020141204NNNNNNNNYNNNN\n'

The field I would like to read is specified by PIC S9(09) COMP-3 POS. 3, that is, the field that starts at the third byte and is nine digits long when decoded (and therefore five bytes long when encoded, according to the COMP-3 spec).
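
For reference, here is a minimal Python sketch of COMP-3 decoding. It assumes you have the field's bytes exactly as they were on the EBCDIC side, before any conversion. The name unpack_comp3 is made up for illustration. The sign conventions used are the usual IBM ones: C, A, E and F are positive (F is typical for unsigned fields), and D and B are negative.

```python
from decimal import Decimal

POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}
NEGATIVE_SIGNS = {0xB, 0xD}


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COMP-3 (packed-decimal) field from its original EBCDIC-side bytes.

    Every nibble is one decimal digit, except the low-order nibble of the last
    byte, which holds the sign. `scale` is the number of implied decimal places
    (the V in the PICTURE clause); it is 0 for PIC S9(09).
    """
    if not raw:
        raise ValueError("empty COMP-3 field")
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"non-decimal digit nibble in {raw.hex()}")
    if sign in POSITIVE_SIGNS:
        negative = False
    elif sign in NEGATIVE_SIGNS:
        negative = True
    else:
        raise ValueError(f"invalid sign nibble {sign:X} in {raw.hex()}")
    magnitude = int("".join(str(d) for d in digits))
    return Decimal(-magnitude if negative else magnitude).scaleb(-scale)


# PIC S9(09) COMP-3 holding +315: nine digits plus a sign nibble = five bytes.
print(unpack_comp3(bytes.fromhex("000000315C")))  # 315
```

Returning a Decimal rather than a float keeps the value exact when the PICTURE has implied decimal places.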

I understand the COMP-3 spec and I also know that for this particular line the integer value of this field should be 315, but I can't figure out what to do in order to actually decode the field. I'm also not sure if the fact that the file was converted with dd to ASCII is a problem here or not.

Has anyone worked on a similar issue before, or is there something obvious I'm missing? Thank you!

Recommended answer

Yes, it is a problem that the file contains non-character data and has been converted from EBCDIC to ASCII at the file or record level. Which tool was used to do that is not the problem.
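
Because the question states that this field should hold 315, the damage can be checked directly against the sample line. This is a minimal Python sketch. It assumes that the repr shown in the question is exactly the bytes read from the file, and that the mainframe packed +315 with the preferred positive sign C (F is also possible).

```python
line = ('15\x00\x00\x00\x04@\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x0c'
        '777093020141204NNNNNNNNYNNNN\n')

# Each character in the repr stands for one byte; latin-1 maps U+0000..U+00FF
# one-to-one onto byte values, so this recovers the bytes that were read.
record = line.encode("latin-1")

# POS. 3 is 1-based, so the 5-byte PIC S9(09) COMP-3 field is record[2:7].
field = record[2:7]
# +315 as it would have been packed on the mainframe (assuming sign nibble C).
expected = bytes.fromhex("000000315C")

print("found:   ", field.hex(" "))     # found:    00 00 00 04 40
print("expected:", expected.hex(" "))  # expected: 00 00 00 31 5c

sign = field[-1] & 0x0F
print(f"sign nibble {sign:X}:", "valid" if sign >= 0xA else "not a packed-decimal sign")
```

The leading zero bytes survived the conversion, but the two bytes that carried the digits 3, 1, 5 and the sign did not. The last nibble is now 0, which no packed-decimal decoder will accept as a sign.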

By far the easiest thing for you is to request that the data be given to you as character-only. Where the data contains signed fields, the sign should be separate, and where there are implied decimal places, these should be actual or indicated by a scaling value (whichever is more convenient for you).
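
Character-only data in that shape is trivial to read. This is a minimal Python sketch, assuming each numeric field arrives as text with a separate leading sign character (for example, what a field defined with SIGN LEADING SEPARATE would look like) and either an actual decimal point or a known scaling value. The function name parse_display_field is hypothetical.

```python
from decimal import Decimal

DIGITS = set("0123456789")


def parse_display_field(text: str, scale: int = 0) -> Decimal:
    """Parse a character-only numeric field with a separate leading sign.

    Example input: '+000000315'. `scale` is the number of implied decimal
    places when the provider sends a scaling value instead of an actual
    decimal point; use 0 when the point is already in the text.
    """
    text = text.strip()
    sign, body = text[:1], text[1:]
    digits = body.replace(".", "", 1)  # allow at most one actual decimal point
    if sign not in ("+", "-") or not digits or not set(digits) <= DIGITS:
        raise ValueError(f"unexpected numeric field: {text!r}")
    return Decimal(text).scaleb(-scale)


print(parse_display_field("+000000315"))            # 315
print(parse_display_field("-000012345", scale=2))   # -123.45
```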

Then you need to convert nothing. I can never understand how people think they can just give you EBCDIC data containing "whatever" and expect you to sort it out.

If you click on the EBCDIC tag you will find some other solutions you may be able to apply if, for some idiotic reason, the character data cannot be made available from the EBCDIC source. Since they've given you crap already, they may be able to come up with some moronic reason. If so, document it (politely) to your boss.
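
If character data really cannot be provided, the usual alternative is to obtain the file unchanged (a binary transfer with no dd conversion) and decode each field according to its type: character fields through an EBCDIC code page, and packed fields nibble by nibble. This is a minimal Python sketch under several assumptions. It assumes fixed-length records with no record descriptor words, the cp037 code page, and a made-up layout of a PIC X(2) field followed by the PIC S9(09) COMP-3 field at POS. 3. RECORD_LENGTH, decode_record and read_records are all hypothetical names.

```python
import io

RECORD_LENGTH = 7   # hypothetical fixed record length (LRECL) for this sketch
CODEPAGE = "cp037"  # assumption: the EBCDIC code page used by the source system


def unpack_comp3(raw: bytes) -> int:
    """Packed decimal -> int: two digits per byte, sign in the last low nibble."""
    digits = "".join(f"{b:02X}" for b in raw)  # e.g. b'\x31\x5c' -> '315C'
    body, sign = digits[:-1], digits[-1]
    if not body.isdigit() or sign not in "ABCDEF":
        raise ValueError(f"not a valid COMP-3 field: {raw.hex()}")
    return -int(body) if sign in "BD" else int(body)


def decode_record(record: bytes) -> dict:
    """Hypothetical layout: bytes 1-2 PIC X(2), bytes 3-7 PIC S9(09) COMP-3."""
    return {
        "code": record[0:2].decode(CODEPAGE),
        "amount": unpack_comp3(record[2:7]),
    }


def read_records(stream, length: int = RECORD_LENGTH):
    """Yield fixed-length records from a file opened in binary mode."""
    while chunk := stream.read(length):
        if len(chunk) != length:
            raise ValueError("file ends with a partial record")
        yield chunk


# Build a two-record stand-in for the untouched EBCDIC file.
one_record = "15".encode(CODEPAGE) + bytes.fromhex("000000315C")
ebcdic_file = io.BytesIO(one_record * 2)
records = [decode_record(r) for r in read_records(ebcdic_file)]
print(records)  # [{'code': '15', 'amount': 315}, {'code': '15', 'amount': 315}]
```

With a real file, you would pass a handle opened with mode "rb" to read_records in place of the BytesIO stand-in. You should never open it in text mode, because that would apply a character conversion to the packed bytes.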

If you get character data, then you can dd or whatever to convert it (if you still get funny-looking stuff, check the code-pages).
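
In Python, the equivalent of that conversion for character-only data is a decode through an EBCDIC codec. This is a minimal sketch, assuming the source code page is cp037. Python also ships other EBCDIC codecs such as cp500, and picking the wrong one is a common cause of the odd characters mentioned above.

```python
def ebcdic_text_to_ascii(raw: bytes, codepage: str = "cp037") -> bytes:
    """Convert character-only EBCDIC data to ASCII.

    Raises UnicodeEncodeError if a character has no ASCII equivalent,
    rather than silently substituting it.
    """
    return raw.decode(codepage).encode("ascii")


print(ebcdic_text_to_ascii(b"\xf3\xf1\xf5"))  # b'315'

# The same byte means different characters in different EBCDIC code pages.
print(b"\x4a".decode("cp037"), b"\x4a".decode("cp500"))  # ¢ [
```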

The reason things get pickled if you convert non-character data is exemplified by this:

05  a-packed-decimal-positive-five COMP-3 PIC S9 VALUE +5.
05  a-character-asterisk PIC X VALUE "*".

Both of those, in EBCDIC, have the hexadecimal value 5C. Both will be converted to an ASCII asterisk, and the COMP-3 value of five is then lost. Note that, apart from the low-order sign, each byte of a COMP-3 field can hold any pair of decimal digits. You are in a pickle whenever one of those bytes happens to be a control character. The same applies to "binary" fields, and it is even worse there, because there are more possibilities of an accidental hit.
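
The X'5C' collision can be reproduced in a few lines of Python. This is a minimal sketch that uses Python's cp037 codec as a stand-in for the table dd applies. That is an assumption: dd's table is not identical to cp037, but any byte-for-byte character table causes the same kind of damage. The binary-field value 37 is an arbitrary illustration.

```python
def convert_as_text(raw: bytes, codepage: str = "cp037") -> bytes:
    """Mimic a whole-record text conversion, applied to every byte.

    cp037 decodes to U+0000..U+00FF only, so encoding to latin-1 keeps
    the result one byte per input byte.
    """
    return raw.decode(codepage).encode("latin-1")


packed_five = bytes.fromhex("5C")  # 05 ... COMP-3 PIC S9 VALUE +5
asterisk = "*".encode("cp037")     # 05 ... PIC X VALUE "*"
print(packed_five == asterisk)     # True: both are X'5C'

after = convert_as_text(packed_five)
print(after)                       # b'*'  (ASCII X'2A')
digit, sign = after[0] >> 4, after[0] & 0x0F
print(digit, f"{sign:X}")          # 2 A: read back as packed decimal, +5 became +2

# A binary (COMP) halfword fares no better: 37 is X'0025', and X'25' is the
# EBCDIC line feed, so it becomes ASCII X'0A' (and now splits the "line").
halfword = (37).to_bytes(2, "big")
converted = convert_as_text(halfword)
print(converted, int.from_bytes(converted, "big"))  # b'\x00\n' 10
```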

08-29 02:08