Given the dataset from another question:

                                       user                             item  \
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters

   rating
0       1
1       2
2       1
3       1
4       1

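Is there any way to load this data in the expected format without manually moving everything onto the same lines?

Best answer

One approach is to split the text on \n\n, build a separate DataFrame from each block, and then concatenate them column-wise, i.e.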

#Bit of code from https://stackoverflow.com/questions/45740537/copying-multiindex-dataframes-with-pd-read-clipboard

def read_clipboard_split(index_names_row=None, **kwargs):
    encoding = kwargs.pop('encoding', 'utf-8')

    # only utf-8 is valid for passed value because that's what clipboard
    # supports
    if encoding is not None and encoding.lower().replace('-', '') != 'utf8':
        raise NotImplementedError(
            'reading from clipboard only supports utf-8 encoding')

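    # Local imports keep the helper self-contained; StringIO comes from
    # the standard library rather than pandas.io.common.
    from io import StringIO

    import pandas as pd
    from pandas.io.clipboard import clipboard_get

    # Each wrapped group of columns is separated by a blank line.
    data = clipboard_get()
    items = data.split("\n\n")
    # Parse every non-empty block as fixed-width text.
    k = []
    for i in items:
        if i.strip():
            k.append(pd.read_fwf(StringIO(i), **kwargs))
    # Join the blocks side by side.
    df = pd.concat(k, axis=1)
    return df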

read_clipboard_split()
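
The same idea can be tried without the clipboard, which makes it easier to test. The following is only a minimal sketch under a few assumptions: `read_wrapped_frame` is a hypothetical name, the text is passed in as a string, and `index_col=0` is used so that the repeated row labels become the index instead of extra columns. That last choice is not part of the answer above. Columns whose values contain spaces (such as `item`) may be split incorrectly by the width inference in `read_fwf`, so they would likely need explicit `colspecs`.

```python
from io import StringIO

import pandas as pd


def read_wrapped_frame(text):
    """Rebuild a DataFrame from pandas output that was wrapped into blocks.

    Hypothetical helper: assumes blocks are separated by a blank line and
    that the first column of every block holds the row labels.
    """
    # Normalize Windows line endings so the blank-line split works.
    text = text.replace("\r\n", "\n")
    blocks = [block for block in text.split("\n\n") if block.strip()]

    frames = []
    for block in blocks:
        # Drop trailing whitespace and the "\" continuation marker that
        # pandas prints at the end of a wrapped header line.
        lines = [line.rstrip().rstrip("\\").rstrip() for line in block.splitlines()]
        frame = pd.read_fwf(StringIO("\n".join(lines)), index_col=0)
        frame.index.name = None
        frames.append(frame)

    # The blocks share the same row labels, so join them side by side.
    return pd.concat(frames, axis=1)
```

Calling `read_wrapped_frame(text)` on the `user` and `rating` blocks should give a single frame with those two columns and the original row labels as the index.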

Sample run:

                                       user  \
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e

   rating
0       1
1       2
2       1
3       1
4       1

Output:

   Unnamed: 0                                      user \n  Unnamed: 0  rating
0           0  b80344d063b5ccb3212f76538f3d9e43d87dca9e              0       1
1           1  b80344d063b5ccb3212f76538f3d9e43d87dca9e              1       2
2           2  b80344d063b5ccb3212f76538f3d9e43d87dca9e              2       1
3           3  b80344d063b5ccb3212f76538f3d9e43d87dca9e              3       1
4           4  b80344d063b5ccb3212f76538f3d9e43d87dca9e              4       1
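
The output above still carries the row labels of each block as extra "Unnamed: 0" columns. If they are not wanted, they can be dropped afterwards. This is a small sketch, not part of the original answer; `drop_unnamed_columns` is a hypothetical name, and it assumes that any column whose label starts with "Unnamed:" is a leftover index.

```python
import pandas as pd


def drop_unnamed_columns(df):
    """Return df without columns whose label starts with 'Unnamed:'."""
    # A boolean mask is used because the leftover labels are duplicated,
    # so selecting or dropping by name would be ambiguous.
    keep = [not str(col).startswith("Unnamed:") for col in df.columns]
    return df.loc[:, keep]
```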
