Given input like the following:

"Date 3" "Location A" "some data"
"Date 3" "Location B" "some data"
"Date 3" "Location C" "some data"
"Date 2" "Location A" "some data"
"Date 2" "Location B" "some data"
"Date 1" "Location A" "some data"
"Date 1" "Location C" "some data"

I would like to arrange it into columns (eventually to put it into a spreadsheet), like this:
        Location A    Location B    Location C
Date 3  some data     some data     some data
Date 2  some data     some data     None
Date 1  some data     None          some data

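With the code below, it works when I split the date into "month" and "day" and treat the day as an integer, but once the month rolls over the same day integers are reused, so the earlier rows get overwritten.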
    log = [["Location A", "somedata", 3, "Month"],["Location B", "somedata", 3, "Month"],
       ["Location C", "somedata", 3, "Month"],["Location A", "somedata", 2, "Month"],
       ["Location B", "somedata", 2, "Month"],["Location A", "somedata", 1, "Month"],
       ["Location C","somedata",1,"Month"]]

    locations = ["Location A","Location B","Location C"]

    location = locations
    days = []

    for location, time, day, month in log:

        for i in range(len(days),day):
            days.append([i+1] + [None for x in locations])

        days[day - 1][1 + locations.index(location)] = time
        days[day - 1][0] =  month + " " + str(day) # I just hack the date together here

    days = [i for i in days if i.count(None) < len(locations)]

    locations.insert(0,"Date")
    days.insert(0,locations)

    days = list(zip(*days))

gives me (correctly):
['Date', 'Location A', 'Location B', 'Location C']
['Month 1', 'somedata', None, 'somedata']
['Month 2', 'somedata', 'somedata', None]
['Month 3', 'somedata', 'somedata', 'somedata']

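But instead of using the day as an integer, I would like to keep the date together as a single string and move on to the next column every time that string changes.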
locations = ["A","B","C"]

log = [ ["Date 2", "A", "Time"],["Date 2", "B", "Time"],["Date 2", "C", "Time"],
        ["Date 1", "A", "Time"],["Date 1", "B", "Time"],["Date 1", "C", "Time"] ]
out = []
j   = 0

for index, day in enumerate(log):

     date, location, time = day

     out.append([date] + [None for x in locations])

     if(log[index][0] != log[index-1][0] and index != 0):
          j += 1

     out[j][1 + locations.index(location)] = location

With this approach, I get:
['Date 2', 'A', None, 'C']
['Date 2', 'A', 'B', 'C']
['Date 1', None, None, None]
['Date 1', None, None, None]
['Date 1', None, None, None]

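But it fills too many columns with None, so the data no longer lines up with the dates.
Does anyone have any ideas? I am a beginner, using Python 3.3.
Thanks very much in advance.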

Best answer

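[Community wiki, because this is really a suggestion of a different approach.]
This kind of operation is usually called a "pivot". Libraries like pandas make it very simple, and pandas is very handy if you are writing code to do intermediate work for later spreadsheet processing.
Something like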

import pandas as pd
df = pd.read_csv("source.dat", sep=r"\s+", header=None)  # sep=r"\s+" replaces delim_whitespace=True, deprecated in pandas 2.2
pivoted = df.pivot(index=0, columns=1, values=2)
pivoted = pivoted.fillna("None")
pivoted.index.name = ""
pivoted.to_csv("final.csv")

produces
>>> !cat final.csv
,Location A,Location B,Location C
Date 1,some data,None,some data
Date 2,some data,some data,None
Date 3,some data,some data,some data

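[I should mention that many spreadsheet programs, including the world's most common one, can also do this natively.]
Step by step:
First, read the file into a DataFrame (something like a spreadsheet page):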
>>> df = pd.read_csv("source.dat", sep=r"\s+", header=None)
>>> df
        0           1          2
0  Date 3  Location A  some data
1  Date 3  Location B  some data
2  Date 3  Location C  some data
3  Date 2  Location A  some data
4  Date 2  Location B  some data
5  Date 1  Location A  some data
6  Date 1  Location C  some data

[7 rows x 3 columns]

Then use the pivot method to reshape it:
>>> pivoted = df.pivot(index=0, columns=1, values=2)
>>> pivoted
1      Location A Location B Location C
0
Date 1  some data        NaN  some data
Date 2  some data  some data        NaN
Date 3  some data  some data  some data

[3 rows x 3 columns]
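
One caveat worth sketching: pivot refuses to reshape when the same date and location pair occurs more than once. The sample data has no such duplicates, so the rows below are made up purely for illustration. Assuming you are happy to keep the first value seen for each pair, pivot_table with aggfunc="first" is one way around it:

```python
import pandas as pd

# Hypothetical rows (not from the question): Date 1 / Location A appears twice.
rows = [
    ("Date 1", "Location A", "first entry"),
    ("Date 1", "Location A", "second entry"),
    ("Date 1", "Location C", "some data"),
]
df = pd.DataFrame(rows)  # columns are labelled 0, 1, 2, as with header=None

# pivot raises ValueError because the (index, column) pairs are not unique.
try:
    df.pivot(index=0, columns=1, values=2)
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates; "first" keeps the first value per pair.
deduped = df.pivot_table(index=0, columns=1, values=2, aggfunc="first")
print(pivot_failed)
print(deduped)
```

Whether "first" is the right choice depends on what a repeated entry means in your log; it is only an assumption here.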

pandas uses NaN to represent missing values, but we can use "None" instead if you prefer:
>>> pivoted = pivoted.fillna("None")
>>> pivoted
1      Location A Location B Location C
0
Date 1  some data       None  some data
Date 2  some data  some data       None
Date 3  some data  some data  some data

[3 rows x 3 columns]

You don't seem to want a named index, so let's get rid of it:
>>> pivoted.index.name = ""
>>> pivoted
1      Location A Location B Location C

Date 1  some data       None  some data
Date 2  some data  some data       None
Date 3  some data  some data  some data

[3 rows x 3 columns]
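
Note that pivot sorts the dates, whereas the layout in the question lists Date 3 first. If the original order matters, one possible sketch is to reindex by order of first appearance. The DataFrame is built in memory here, rather than read from source.dat, only to keep the example self-contained; the name ordered is mine:

```python
import pandas as pd

# Same rows as the question's input, in the same order.
rows = [
    ("Date 3", "Location A", "some data"),
    ("Date 3", "Location B", "some data"),
    ("Date 3", "Location C", "some data"),
    ("Date 2", "Location A", "some data"),
    ("Date 2", "Location B", "some data"),
    ("Date 1", "Location A", "some data"),
    ("Date 1", "Location C", "some data"),
]
df = pd.DataFrame(rows)  # columns are labelled 0, 1, 2, as with header=None

pivoted = df.pivot(index=0, columns=1, values=2)

# df[0].unique() lists the dates in order of first appearance;
# reindex rearranges the pivoted rows to match that order.
ordered = pivoted.reindex(df[0].unique())
print(ordered)
```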

Then we can write it out with to_csv. (If needed, we could also write it directly to an Excel-format workbook.)
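
If pandas is not an option, the same idea can be sketched in plain Python by grouping on the date string itself instead of an integer day. This is only a rough sketch: pivot_rows is a name I made up, and OrderedDict is used so that the order of first appearance is kept on older interpreters such as Python 3.3.

```python
from collections import OrderedDict

# Rows in the same (date, location, data) shape as the question's input.
log = [
    ("Date 3", "Location A", "some data"),
    ("Date 3", "Location B", "some data"),
    ("Date 3", "Location C", "some data"),
    ("Date 2", "Location A", "some data"),
    ("Date 2", "Location B", "some data"),
    ("Date 1", "Location A", "some data"),
    ("Date 1", "Location C", "some data"),
]


def pivot_rows(rows):
    """Return a header row followed by one row per distinct date string."""
    locations = []           # column labels, in order of first appearance
    by_date = OrderedDict()  # date string -> {location: data}
    for date, location, data in rows:
        if location not in locations:
            locations.append(location)
        by_date.setdefault(date, {})[location] = data

    table = [["Date"] + locations]
    for date, cells in by_date.items():
        # dict.get returns None for a location with no entry on that date.
        table.append([date] + [cells.get(loc) for loc in locations])
    return table


table = pivot_rows(log)
for row in table:
    print(row)
```

Because rows are keyed by the whole date string, a repeated day number in a later month gets its own row as long as the string differs, and the resulting list of lists can be handed to csv.writer.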
