Creating a dictionary from a text file with Python and NumPy

Question

I have a text file that looks like this:

# Comments
PARAMETER  0  0
      1045        54
      1705         0                           time 1
         1        10       100   0.000e+00   9999   A
         2        20       200   0.2717072   9999   B
         3        30       300   0.0282928   9999   C
         1       174        92   2999.4514   9999   APEW-1
         2       174        92   54.952499   9999   ART-3A
         1       174        97   5352.1299   9999   APEW-2
         1       173       128   40.455467   9999   APEW-3
         2       173       128   1291.1320   9999   APEW-3
         3       173       128   86.562599   9999   ART-7B
...

I want to create a dictionary like the one below (basically skipping the header and certain columns and going straight to the data I need):

my_dict = {'A':(1,10,100),'B':(2,20,200), 'C':(3,30,300), 'APEW-1':(1,174,92), ...}

These data points are observation points, and their values are depth, y, x. One observation point can therefore have several entries, one per depth (the first column). I would like to avoid renaming the labels by adding a suffix to duplicates, and I wonder whether there is a way around that. What I want to do is call an observation point by name and extract its coordinates. I am not sure a dictionary is the right tool for this. It is a small dataset and doesn't need to be fast. I am using NumPy with Python 2.7.

Recommended answer

np.loadtxt can do this:

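>>> import numpy as np
>>> # fn is the path to the text file shown in the question
>>> # Build a compound dtype from a one-row template: three ints plus a byte string
>>> dtype = np.rec.fromrecords([[0, 0, 0, b'APEW-1']]).dtype
>>> # Skip the 4 header lines; keep columns 0, 1, 2 (depth, y, x) and 5 (label)
>>> x = np.loadtxt(fn, skiprows=4, usecols=(0,1,2,5), dtype=dtype)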
>>>
>>> result = {}
>>> for x0, x1, x2, key in x:
...     try:
...         result[key.decode()].append((x0,x1,x2))
...     except KeyError:
...         result[key.decode()] = [(x0,x1,x2)]
...
>>> result
{'A': [(1, 10, 100)], 'B': [(2, 20, 200)], 'C': [(3, 30, 300)], 'APEW-1': [(1, 174, 92)], 'ART-3A': [(2, 174, 92)], 'APEW-2': [(1, 174, 97)], 'APEW-3': [(1, 173, 128), (2, 173, 128)], 'ART-7B': [(3, 173, 128)]}

Notes:

  • we abuse rec.fromrecords to create a compound dtype describing the columns; make sure the template string is at least as long as the longest label you expect, because the string field has a fixed width
  • there is probably an official way of creating compound dtypes that doesn't involve creating a throw-away array, but this is easy and works

  • if there were no duplicate keys, a dict comprehension could translate the record array into a dict directly; f0-f3 are the auto-generated field names
  • to accommodate duplicates, the tuples are packed into lists as the dict values
  • most lists contain only one tuple, but some contain more
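Two of the notes above can be sketched together: building the compound dtype without a throw-away array, and the dict comprehension that would suffice if every label were unique. This is only a sketch under assumptions. The comma-separated dtype string is one way NumPy accepts a compound dtype, the width S6 is chosen to match the 'APEW-1' template, and the in-memory sample stands in for the real file (so skiprows is not needed here).

```python
import io
import numpy as np

# First three data rows from the question; their labels are unique.
# An in-memory file is used here only to keep the sketch self-contained.
sample = io.StringIO(
    "1  10  100  0.000e+00  9999  A\n"
    "2  20  200  0.2717072  9999  B\n"
    "3  30  300  0.0282928  9999  C\n"
)

# Comma-separated dtype string: three 64-bit ints and a 6-byte string.
# NumPy names the fields f0..f3 automatically.
dtype = np.dtype("i8,i8,i8,S6")

x = np.loadtxt(sample, usecols=(0, 1, 2, 5), dtype=dtype)

# With unique labels, a single dict comprehension is enough.
# int(...) converts NumPy integers to plain Python ints.
unique = {
    rec["f3"].decode(): (int(rec["f0"]), int(rec["f1"]), int(rec["f2"]))
    for rec in x
}
print(unique)
```

With repeated labels this comprehension would silently keep only the last row for each label, which is why the answer collects tuples in lists instead.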

Python 2 version: the main differences are that there is no need for byte strings or decode(), and that the dictionary does not preserve the order of items (so the key order in the printed result may differ).

>>> dtype = np.rec.fromrecords([[0, 0, 0, 'APEW-1']]).dtype
>>> x = np.loadtxt(fn, skiprows=4, usecols=(0,1,2,5), dtype=dtype)
>>>
>>> result = {}
>>> for x0, x1, x2, key in x:
...     try:
...         result[key].append((x0,x1,x2))
...     except KeyError:
...         result[key] = [(x0,x1,x2)]
...
>>> result
{'A': [(1, 10, 100)], 'B': [(2, 20, 200)], 'C': [(3, 30, 300)], 'APEW-1': [(1, 174, 92)], 'ART-3A': [(2, 174, 92)], 'APEW-2': [(1, 174, 97)], 'APEW-3': [(1, 173, 128), (2, 173, 128)], 'ART-7B': [(3, 173, 128)]}
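
For the lookup the question asks about (calling an observation point by name and extracting its coordinates), a small helper over the result dictionary might look like the sketch below. The function name coords_at is hypothetical, and the (depth, y, x) reading of each tuple follows the question's description of the columns.

```python
# Subset of the result dictionary shown above: label -> list of (depth, y, x).
result = {
    "A": [(1, 10, 100)],
    "APEW-3": [(1, 173, 128), (2, 173, 128)],
    "ART-7B": [(3, 173, 128)],
}


def coords_at(points, name, depth=None):
    """Return the (y, x) pairs stored for `name`, optionally for one depth only.

    Hypothetical helper: unknown names yield an empty list instead of raising.
    """
    return [
        (y, x)
        for d, y, x in points.get(name, [])
        if depth is None or d == depth
    ]


all_depths = coords_at(result, "APEW-3")        # every depth for this label
one_depth = coords_at(result, "APEW-3", depth=2)  # only the depth-2 entry
unknown = coords_at(result, "missing")           # label not in the dict
print(all_depths, one_depth, unknown)
```

Keeping a list per label avoids renaming duplicates with suffixes, and the depth filter picks out a single entry when one is needed.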
