I am trying to read a tab-delimited text file into a DataFrame.

This is what the file looks like in Excel:

CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME
5/13/2016 0:00    13867666       6892372              S                 2026            CUSTOMER 1

Importing it into df:
df = p.read_table("E:/FileLoc/ThisIsAFile.txt", encoding = "iso-8859-1")

Now it does not treat the first 3 columns as part of the column index (df[0] = TRANSACTION_TYPE), and all the headers shift over to reflect this.
                                CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER
5/13/2016 0:00 13867666 6892372       S             2026          CUSTOMER 1

I am trying to manipulate the text file and then import it into a MySQL database as the end result.

Best answer

You can use read_csv with a separator of two or more whitespace characters:

import pandas as pd
import io

temp=u"""CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME
5/13/2016 0:00    13867666       6892372              S                 2026            CUSTOMER 1"""
# After testing, replace io.StringIO(temp) with the file name
df = pd.read_csv(io.StringIO(temp), sep=r'\s{2,}', engine='python', encoding="iso-8859-1")
print(df)
    CALENDAR_DATE  ORDER_NUMBER  INVOICE_NUMBER TRANSACTION_TYPE  \
0  5/13/2016 0:00      13867666         6892372                S

   CUSTOMER_NUMBER CUSTOMER_NAME
0             2026    CUSTOMER 1
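To see why a separator of two or more whitespace characters keeps values such as "5/13/2016 0:00" and "CUSTOMER 1" intact, here is a minimal sketch that applies the same regular expression with the standard re module. It only illustrates the splitting rule on the sample lines above; it is not how pandas implements its parser.

```python
import re

# Sample lines copied from the question.
header = "CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME"
row = "5/13/2016 0:00    13867666       6892372              S                 2026            CUSTOMER 1"

# Split only on runs of two or more whitespace characters,
# so a single space inside a value is not treated as a separator.
pattern = re.compile(r"\s{2,}")
header_fields = pattern.split(header)
row_fields = pattern.split(row)

print(header_fields)
print(row_fields)
```

Both lines split into six fields, and the single spaces inside the date and the customer name are preserved.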

If the separator is a tab, use sep='\t' instead.
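As a quick sketch of the tab-separated case, the snippet below reads a small in-memory sample. The sample string is hypothetical (a shortened version of the data in the question with real tab characters), and df_tab is just an illustrative name.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample; "\t" marks the field boundaries.
temp_tab = (
    "CALENDAR_DATE\tORDER_NUMBER\tCUSTOMER_NAME\n"
    "5/13/2016 0:00\t13867666\tCUSTOMER 1\n"
)

# With sep="\t", spaces inside a field are kept as part of the value.
df_tab = pd.read_csv(io.StringIO(temp_tab), sep="\t")
print(df_tab)
```

Once this works on the sample, io.StringIO(temp_tab) can be swapped for the path to the real file.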

Edit:

I tested it with your data and it works:
import pandas as pd

df = pd.read_csv('test/AnonymizedData.txt', sep='\t')
print (df)

   CUSTOMER_NUMBER CUSTOMER_NAME  CUSTOMER_BRANCH_CODE CUSTOMER_BRANCH_NAME  \
0             2026    CUSTOMER 1                    83       SALES BRANCH 1
1             2359    CUSTOMER 2                    76       SALES BRANCH 2
2           100662    CUSTOMER 3                    28       SALES BRANCH 3
3             3245    CUSTOMER 4                    84       SALES BRANCH 4
4             3179    CUSTOMER 5                    28       SALES BRANCH 5
5            39881    CUSTOMER 6                    67       SALES BRANCH 6
6            37020    CUSTOMER 7                    58       SALES BRANCH 7
7             1239    CUSTOMER 8                    50       SALES BRANCH 8
8             2379    CUSTOMER 9                    76       SALES BRANCH 9

  CUSTOMER_CITY CUSTOMER_STATE     ...      PRICING_PRODUCT_TYPE_CODE  \
0        TOWN 1             CO     ...                             11
1        TOWN 2             OH     ...                             11
2        TOWN 3             ME     ...                             11
3        TOWN 4             IL     ...                             11
4        TOWN 5             NH     ...                             11
5        TOWN 6             TX     ...                             11
6        TOWN 7             NC     ...                             11
7        TOWN 8             NY     ...                             11
8        TOWN 9             OH     ...                             11

  PRICING_PRODUCT_TYPE  ORGANIZATION_ID ORGANIZATION_NAME  PRODUCT_LINE_CODE  \
0          DISPOSABLES               83  ORGANIZATIONNAME                891
1          DISPOSABLES               83  ORGANIZATIONNAME                891
2          DISPOSABLES               83  ORGANIZATIONNAME                891
3          DISPOSABLES               83  ORGANIZATIONNAME                891
4          DISPOSABLES               83  ORGANIZATIONNAME                891
5          DISPOSABLES               83  ORGANIZATIONNAME                891
6          DISPOSABLES               83  ORGANIZATIONNAME                891
7          DISPOSABLES               83  ORGANIZATIONNAME                891
8          DISPOSABLES               83  ORGANIZATIONNAME                891

  PRODUCT_LINE  ROBOTIC_FLAG  Unnamed: 52  Unnamed: 53  Unnamed: 54
0  PRODUCTNAME             N            N          NaN            3
1  PRODUCTNAME             N            N          NaN            3
2  PRODUCTNAME             N            N          NaN            2
3  PRODUCTNAME             N            N          NaN            7
4  PRODUCTNAME             N            N          NaN            1
5  PRODUCTNAME             N            N          NaN            4
6  PRODUCTNAME             N            N          NaN            3
7  PRODUCTNAME             N            N          NaN            5
8  PRODUCTNAME             N            N          NaN            3

[9 rows x 55 columns]
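The shifted headers described in the question are consistent with data rows that carry more fields than the header row, although the question does not confirm that this was the cause. In that situation pandas uses the extra leading fields as the index. The sketch below reproduces the effect on hypothetical data (two trailing tabs per row) and shows index_col=False as one possible way to switch that behavior off.

```python
import io
import warnings

import pandas as pd

# Hypothetical data: the header names 3 columns, but every data row
# has 5 fields because it ends with two extra tabs.
raw = "A\tB\tC\n1\t2\t3\t\t\n4\t5\t6\t\t\n"

# Default behavior: the two surplus leading fields become the index,
# so the header "A" ends up over the third field of each row.
shifted = pd.read_csv(io.StringIO(raw), sep="\t")
print(shifted)

# index_col=False tells pandas not to use leading fields as the index.
# pandas may emit a ParserWarning about the length mismatch, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fixed = pd.read_csv(io.StringIO(raw), sep="\t", index_col=False)
print(fixed)
```

Note that with index_col=False the surplus trailing fields are dropped, so this is only appropriate when those fields are known to be empty.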

Source: "python - Pandas read_table error", a similar question found on Stack Overflow: https://stackoverflow.com/questions/37445855/
