I am trying to split a file that has the following format:

@some
@garbage
@lines
@target G0.S0
@type xy
 -0.108847E+02  0.489034E-04
 -0.108711E+02  0.491023E-04
 -0.108574E+02  0.493062E-04
 -0.108438E+02  0.495075E-04
 -0.108302E+02  0.497094E-04
 ....Unknown line numbers...
&
@target G0.S1
@type xy
 -0.108847E+02  0.315559E-04
 -0.108711E+02  0.316844E-04
 -0.108574E+02  0.318134E-04
 ....Unknown line numbers...
&
@target G1.S0
@type xy
 -0.108847E+02  0.350450E-04
 -0.108711E+02  0.351669E-04
 -0.108574E+02  0.352908E-04
&
@target G1.S1
@type xy
 -0.108847E+02  0.216396E-04
 -0.108711E+02  0.217122E-04
 -0.108574E+02  0.217843E-04
 -0.108438E+02  0.218622E-04
Each @target Gx.Sy combination is unique, and every data set is always terminated by &.

I have managed to split the file into chunks:
#!/usr/bin/env python3
import os
import sys
import itertools as it
import numpy as np
import matplotlib.pyplot as plt

try:
  filename = sys.argv[1]
  print(filename)
except IndexError:
  print("ERROR: Required filename not provided")

with open(filename, "r") as f:
  for line in f:
    if line.startswith("@target"):
      print(line.split()[-1].split("."))

x=[];y=[]
with open(filename, "r") as f:
  for key,group in it.groupby(f,lambda line: line.startswith('@target')):
    print(key)
    if not key:
        group = list(group)
        group.pop(0)
        # group.pop(-1)
        print(group)
        for i in range(len(group)):
          x.append(group[i].split()[0])
          y.append(group[i].split()[1])
        nx=np.array(x)
        ny=np.array(y)

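I have two problems:

1) The preamble lines before the actual data are grouped as well, so the script does not work if there is any preamble. There is no way to predict how many such lines there will be. However, I am trying to group only what comes after @target.

2) I want to name the arrays as G0[S0, S0] and G1[S1, S2], but I cannot manage to do that.

Please help.

UPDATE:
I am trying to store these data in nested NumPy arrays, G0[S0, S1, ...], G1[S0, S1, ...], so that I can use them in matplotlib.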

Best answer

The following functions do the job:

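import numpy as np
from collections import defaultdict

def read_without_preamble(filename):
    """Return the lines of the file, starting at the first '@target' line."""
    with open(filename, 'r') as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if line.startswith('@target'):
            return lines[i:]
    return []  # no '@target' line was found

def split_into_chunks(lines):
    """Return a nested dict such that chunks[G][S] is an (n, 2) array."""
    chunks = defaultdict(dict)
    for line in lines:
        if line.startswith('@target'):
            # '@target G0.S1' -> ['G0', 'S1'] -> G = 0, S = 1
            GS_str = line.strip().split()[-1].split('.')
            G, S = (int(x[1:]) for x in GS_str)
            chunks[G][S] = []
        elif line.startswith('@type xy'):
            pass
        elif line.startswith('&'):
            # end of the block: convert the list of rows to an array
            chunks[G][S] = np.asarray(chunks[G][S])
        elif line.strip():
            xy_str = line.strip().split()
            # list() is needed in Python 3, where map() returns an iterator
            chunks[G][S].append(list(map(float, xy_str)))
    return chunks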

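To split the file into chunks you only need to run the following code:
import sys

try:
  filename = sys.argv[1]
  print(filename)
except IndexError:
  # stop here, otherwise filename would be undefined below
  sys.exit("ERROR: Required filename not provided")

data = read_without_preamble(filename)
chunks = split_into_chunks(data)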
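
The same idea can also be sketched as a single pass over the lines, which makes it easy to try out on an in-memory sample. This is only an illustration, not part of the answer above: the function name `parse_targets` and the shortened `SAMPLE` text are made up for the example, `itertools.dropwhile` skips the preamble, and the sketch assumes that an unterminated last block (no closing `&`) should still be kept.

```python
import io
import itertools as it

import numpy as np


def parse_targets(lines):
    """Parse an iterable of lines into {G: {S: array of shape (n, 2)}}."""
    chunks = {}
    rows = None  # rows of the block currently being read, or None
    # dropwhile() discards the preamble, up to the first '@target' line.
    body = it.dropwhile(lambda s: not s.startswith("@target"), lines)
    for line in body:
        line = line.strip()
        if line.startswith("@target"):
            # '@target G0.S1' -> G = 0, S = 1
            g_str, s_str = line.split()[-1].split(".")
            rows = []
            chunks.setdefault(int(g_str[1:]), {})[int(s_str[1:])] = rows
        elif line.startswith("&"):
            rows = None  # block closed; ignore lines until the next '@target'
        elif line and not line.startswith("@") and rows is not None:
            rows.append([float(v) for v in line.split()])
    # Convert every list of rows to an array, including an unterminated block.
    return {g: {s: np.asarray(r, dtype=float) for s, r in by_s.items()}
            for g, by_s in chunks.items()}


# Hypothetical, shortened sample in the format described in the question.
SAMPLE = """\
@some
@garbage
@target G0.S0
@type xy
 -0.108847E+02  0.489034E-04
 -0.108711E+02  0.491023E-04
&
@target G1.S1
@type xy
 -0.108847E+02  0.216396E-04
"""

chunks = parse_targets(io.StringIO(SAMPLE))
print(sorted(chunks))      # [0, 1]
print(chunks[0][0].shape)  # (2, 2)
```

Because `parse_targets` accepts any iterable of lines, an open file object can be passed to it directly instead of the `io.StringIO` wrapper.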

Step-by-step demo
chunks is a dictionary whose keys are the values of G (0 and 1 in this example):
In [415]: type(chunks)
Out[415]: collections.defaultdict

In [416]: for k in chunks.keys(): print(k)
0
1

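Each value of the dictionary chunks is another dictionary whose keys are the values of S (0, 1 and 2 in this example) and whose values are NumPy arrays holding the numeric data of Gi.Sn. You can access a data chunk as chunks[i][n], where the indices i and n are the values of G and S, respectively.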
In [417]: type(chunks[0])
Out[417]: dict

In [418]: for k in chunks[0].keys(): print(k)
0
1
2

In [419]: type(chunks[1][2])
Out[419]: numpy.ndarray

In [420]: chunks[1][2]
Out[420]:
array([[ -1.08851000e+01,   2.53058000e-05],
       [ -1.08715000e+01,   2.55353000e-05],
       [ -1.08579000e+01,   2.57745000e-05],
       [ -1.08443000e+01,   2.60225000e-05],
       [ -1.08306000e+01,   2.62617000e-05],
       [ -1.08170000e+01,   2.65097000e-05],
       [ -1.08034000e+01,   2.67666000e-05]])
chunks[i][n].shape[1] is 2 for any i and n, but chunks[i][n].shape[0] can take any value, i.e. the number of rows of numeric data may vary from one chunk to another.
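
Since the number of rows can differ between chunks, the blocks cannot in general be stacked into one rectangular array, so keeping a dictionary of (n, 2) arrays is a reasonable layout for plotting. The sketch below shows one possible way to turn such a dictionary into labelled x/y series for matplotlib. The names `iter_series` and `plot_chunks`, the output file name `chunks.png` and the tiny hand-written `chunks` dictionary are assumptions made for the example.

```python
import numpy as np

# Hypothetical stand-in for the parsed result: chunks[G][S] is an (n, 2) array.
chunks = {
    0: {0: np.array([[-10.8851, 1.27435e-4], [-10.8715, 1.27829e-4]]),
        1: np.array([[-10.8851, 4.72694e-5]])},
    1: {0: np.array([[-10.8851, 1.08786e-4], [-10.8715, 1.09216e-4]])},
}


def iter_series(chunks):
    """Yield (label, x, y) for every block, ordered by G and then by S."""
    for g in sorted(chunks):
        for s in sorted(chunks[g]):
            block = chunks[g][s]
            # column 0 holds x, column 1 holds y
            yield f"G{g}.S{s}", block[:, 0], block[:, 1]


def plot_chunks(chunks, outfile="chunks.png"):
    """Draw one line per block; matplotlib is imported only when plotting."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend that writes to a file
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, x, y in iter_series(chunks):
        ax.plot(x, y, marker="o", label=label)
    ax.legend()
    fig.savefig(outfile)
    plt.close(fig)


series = list(iter_series(chunks))
labels = [label for label, _, _ in series]
row_counts = [len(x) for _, x, _ in series]
print(labels)      # ['G0.S0', 'G0.S1', 'G1.S0']
print(row_counts)  # [2, 1, 2]
```

Calling `plot_chunks(chunks)` would write the figure to the assumed file name. The series are generated block by block, so chunks of different lengths need no padding.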

formatted_file.txt

This is the file I used in the sample run. It consists of six chunks: G0.S0, G0.S1, G0.S2, G1.S0, G1.S1 and G1.S2.
@some
@garbage
@lines
@target G0.S0
@type xy
 -0.108851E+02  0.127435E-03
 -0.108715E+02  0.127829E-03
 -0.108579E+02  0.128191E-03
 -0.108443E+02  0.128502E-03
 -0.108306E+02  0.128726E-03
 -0.108170E+02  0.128838E-03
 -0.108034E+02  0.128751E-03
&
@target G0.S1
@type xy
 -0.108851E+02  0.472694E-04
 -0.108715E+02  0.474233E-04
 -0.108579E+02  0.475837E-04
 -0.108443E+02  0.477448E-04
 -0.108306E+02  0.479052E-04
 -0.108170E+02  0.480669E-04
 -0.108034E+02  0.482279E-04
&
@target G0.S2
@type xy
 -0.108851E+02  0.253654E-04
 -0.108715E+02  0.255956E-04
 -0.108579E+02  0.258346E-04
 -0.108443E+02  0.260825E-04
 -0.108306E+02  0.263303E-04
 -0.108170E+02  0.265781E-04
 -0.108034E+02  0.268349E-04
&
@target G1.S0
@type xy
 -0.108851E+02  0.108786E-03
 -0.108715E+02  0.109216E-03
 -0.108579E+02  0.109651E-03
 -0.108443E+02  0.110116E-03
 -0.108306E+02  0.110552E-03
 -0.108170E+02  0.111011E-03
 -0.108034E+02  0.111489E-03
&
@target G1.S1
@type xy
 -0.108851E+02  0.278045E-04
 -0.108715E+02  0.278711E-04
 -0.108579E+02  0.279384E-04
 -0.108443E+02  0.280050E-04
 -0.108306E+02  0.280723E-04
 -0.108170E+02  0.281395E-04
 -0.108034E+02  0.282074E-04
&
@target G1.S2
@type xy
 -0.108851E+02  0.253058E-04
 -0.108715E+02  0.255353E-04
 -0.108579E+02  0.257745E-04
 -0.108443E+02  0.260225E-04
 -0.108306E+02  0.262617E-04
 -0.108170E+02  0.265097E-04
 -0.108034E+02  0.267666E-04
&

Regarding python-3.x - splitting a file into chunks, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42400552/
