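I am trying to use Pandas to create a DataFrame from a raw text file. The file contains 3 categories, and each category name is followed by the items that belong to it. I can create a Series from the categories, but I don't know how to associate each item with its category and build a DataFrame from that. Below are my initial code and the desired DataFrame output. Can you point me in the right direction?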

import pandas as pd

category = ['Fruits', 'Vegetables', 'Meats']

items='''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

Category = pd.Series(dtype=object)

i = 0
for item in items.splitlines():
    if item in category:
        # Series.set_value was removed in pandas 1.0; .loc assignment enlarges the Series
        Category.loc[i] = item
        i += 1
df = pd.DataFrame(Category)
print(df)

Desired DataFrame output:
Category    Item
Fruits      apple
            orange
            pear
Vegetables  broccoli
            squash
            carrot
Meats       chicken
            beef
            lamb

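Best answer

Consider iteratively appending to a dictionary of lists rather than a dictionary of Series, then convert the dict to a DataFrame. The key column below is used to produce the desired output, since such a grouping needs a numeric column to aggregate: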

from io import StringIO
import pandas as pd

txtobj = StringIO('''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb''')

items = {'Category':[], 'Item':[]}

for line in txtobj:
    curr_line = line.strip()
    if curr_line in ['Fruits', 'Vegetables', 'Meats']:
        # Header line: remember it as the category for the items that follow
        curr_category = curr_line
    else:
        # Item line: pair it with the most recent category
        items['Category'].append(curr_category)
        items['Item'].append(curr_line)

df = pd.DataFrame(items).assign(key=1)
print(df)
#      Category      Item  key
# 0      Fruits     apple    1
# 1      Fruits    orange    1
# 2      Fruits      pear    1
# 3  Vegetables  broccoli    1
# 4  Vegetables    squash    1
# 5  Vegetables    carrot    1
# 6       Meats   chicken    1
# 7       Meats      beef    1
# 8       Meats      lamb    1

# sort=False keeps the groups in file order instead of sorting them alphabetically
print(df['key'].groupby([df['Category'], df['Item']], sort=False).count())
# Category    Item
# Fruits      apple       1
#             orange      1
#             pear        1
# Vegetables  broccoli    1
#             squash      1
#             carrot      1
# Meats       chicken     1
#             beef        1
#             lamb        1
# Name: key, dtype: int64
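
As an alternative to the explicit loop, the same Category/Item pairing can probably be built with vectorized pandas operations. This is only a sketch, not part of the original answer: it assumes the category names are known in advance, and the names `raw`, `categories`, `lines` and `is_header` are made up for illustration. The idea is to blank out every non-header line, forward-fill the last seen header, and then drop the header rows themselves.

```python
import pandas as pd

# Same sample text as in the question (assumed to be already read into a string)
raw = '''Fruits
apple
orange
pear
Vegetables
broccoli
squash
carrot
Meats
chicken
beef
lamb'''

# Assumption: the category names are known up front
categories = ['Fruits', 'Vegetables', 'Meats']

lines = pd.Series(raw.splitlines())
is_header = lines.isin(categories)   # True on the category lines

df = pd.DataFrame({
    # Keep only header text, then carry each header down over its items
    'Category': lines.where(is_header).ffill(),
    'Item': lines,
})

# Drop the header rows, leaving one row per item
df = df[~is_header].reset_index(drop=True)
print(df)
```

The resulting `df` has the same two columns as the loop-based version, so it should work with the same `assign(key=1)` and `groupby` step shown above.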

Regarding "python - Creating a DataFrame from a text file with Python Pandas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45113619/
