How to parse an ANSI string quickly


Question

I have a large ANSI text file. The file contains many entries (millions to billions). Each entry has 4 lines, like this:

@Instrument:6:73:941:1973#0/1
other stuff2
other stuff3
other stuff4

I am interested only in the first line. From that line I need to extract its contents (numbers and strings). I am using StringReplace to replace ':' and space with #13, and then I split the line into a record like this:

   TYPE
      RBlock= record                                     // @Instrument:6:73:941:1973#0/1
       Instrument: String;                               // Instrument
       Lane: Integer;                                    // 6
       TileNo: Integer;                                  // 73
       X: integer;                                       // 941
       Y: Integer;                                       // 1973
       Pair: Byte;                                       // could be 1 or 2
       MultiplexID: AnsiString;                          // #0  <----  I need it as AnsiString
      end;

Using StrToInt to convert the text to numbers may be slow because it first converts the AnsiString to string.

Any ideas on how I could read it faster would be appreciated.

Update: the line can also have an alternative format: @Instrument:136:FC6:2:2104:15343:197393 1:Y:18:TACA

Recommended answer

You need to examine your data and check what kinds of values can occur. Personally, I would probably do something like this (for the first example):

procedure ParseLine(const aLine: RawByteString; var aInstrument: string; var
    aLane, aTileNo, aX, aY: Integer; var aMultiplexID: Ansistring; var aPair:
    Byte);
var
  arrayIndex: Integer;
  index: Integer;
  lineLength: Integer;
  NumList: array[0..3] of Integer;
  I: Integer;
  multiEnd: Integer;
begin
  lineLength := Length(aLine);
  // Get the aInstrument
  index := Pos(':', aLine);
  SetLength(aInstrument, index - 2);
  for I := 2 to index - 1 do
    aInstrument[I-1] := Char(aLine[I]);
  // Get the integers
  arrayIndex := 0;
  // Zero the accumulators (FillChar lives in System, so no Windows unit is needed)
  FillChar(NumList, SizeOf(NumList), 0);
  while (index < lineLength) and (arrayIndex < 4) do
  begin
    Inc(index);
    if (aLine[index] = ':') or (aLine[index] = '#') then
      Inc(arrayIndex)
    else
      NumList[arrayIndex] := NumList[arrayIndex] * 10 + Ord(aLine[index]) - Ord('0');
  end;
  aLane := NumList[0];
  aTileNo := NumList[1];
  aX := NumList[2];
  aY := NumList[3];
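  // Get the Multiplex: the characters between '#' and '/' (the '#' itself is not copied).
  // Pos with a start offset needs a recent Delphi version; older ones offer PosEx for this.
  multiEnd := Pos('/', aLine, index);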
  SetLength(aMultiplexID, multiEnd - index - 1);
  Inc(index);
  for I := index to multiEnd - 1 do
    aMultiplexID[I-index+1] := aLine[I];
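  // Get the aPair: the digit right after '/'. Use <= so that a digit in the
  // very last position of the line is still read.
  if (multiEnd+1 <= lineLength) then
    aPair := Ord(aLine[multiEnd+1]) - Ord('0')
  else
    aPair := 0;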
end;
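
The routine above only covers the first format. For the alternative format from the update, one possible approach is to walk the line field by field without allocating any intermediate strings. The following is only a sketch under assumptions: the name NextField and its parameters are hypothetical and not part of the original answer, and it assumes that ':', space, '#' and '/' are the only delimiters in either format.

```pascal
// Hypothetical helper, not from the original answer.
// Starting at aPos, finds the next field in aLine. A field ends at one of
// the delimiters ':', ' ', '#', '/' or at the end of the line.
// Returns True and sets aStart/aStop (inclusive bounds) when a field was
// found, and moves aPos just past the delimiter. An empty field between two
// adjacent delimiters is reported with aStop = aStart - 1. An empty field
// after a trailing delimiter is not reported.
function NextField(const aLine: RawByteString; var aPos: Integer;
  out aStart, aStop: Integer): Boolean;
var
  Len: Integer;
begin
  Len := Length(aLine);
  Result := (aPos >= 1) and (aPos <= Len);
  if not Result then
  begin
    aStart := 0;
    aStop := -1;
    Exit;
  end;
  aStart := aPos;
  // Scan forward until a delimiter or the end of the line
  while (aPos <= Len) and not (aLine[aPos] in [':', ' ', '#', '/']) do
    Inc(aPos);
  aStop := aPos - 1;
  Inc(aPos); // skip the delimiter (or step past the end of the line)
end;
```

Calling NextField in a loop with aPos starting at 1 visits 7 fields for the first sample line and 11 fields for the alternative one; the caller then decides by position which fields are numbers and which are text.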

This could be optimized further, but that would start to hurt readability. The real question is whether your data is valid for this routine. It copes with a line that is too short, although it does not report an error in that case, but it does not handle invalid values in the text. Negative numbers would also be a problem. You need to look at your data: what it looks like, how likely corrupt or invalid data is, and how important speed is to you. It is a balancing act. You could remove all of the checks to make it faster, or add many more checks, which would slow it down.
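
To illustrate the "more checks" end of that trade-off, here is a minimal sketch of a checked number conversion that works directly on the bytes of the line, so no AnsiString-to-string conversion is involved. The name TryParseIntField and its parameters are hypothetical and not part of the original answer. It accepts an optional leading minus sign and reports failure for an empty range or any non-digit character; it does not guard against integer overflow.

```pascal
// Hypothetical helper, not from the original answer.
// Converts the characters aLine[aStart..aStop] to an Integer without
// calling StrToInt. Returns False (and aValue = 0) if the range is empty,
// lies outside the string, or contains anything other than an optional
// leading '-' followed by at least one digit. Overflow is not checked.
function TryParseIntField(const aLine: RawByteString; aStart, aStop: Integer;
  out aValue: Integer): Boolean;
var
  I: Integer;
  Negative: Boolean;
begin
  Result := False;
  aValue := 0;
  if (aStart < 1) or (aStop > Length(aLine)) or (aStart > aStop) then
    Exit;
  Negative := aLine[aStart] = '-';
  if Negative then
    Inc(aStart);
  if aStart > aStop then
    Exit; // a lone '-' is not a number
  for I := aStart to aStop do
  begin
    if not (aLine[I] in ['0'..'9']) then
    begin
      aValue := 0;
      Exit; // invalid character in the field
    end;
    aValue := aValue * 10 + (Ord(aLine[I]) - Ord('0'));
  end;
  if Negative then
    aValue := -aValue;
  Result := True;
end;
```

For the sample line @Instrument:6:73:941:1973#0/1, the Y value occupies characters 22 to 25, so TryParseIntField(Line, 22, 25, Y) returns True with Y = 1973. The extra comparison per character is the price of the check; whether it is worth paying depends on how much you trust the input.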
