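Python newbie here. I'm working with a large dataset in PythonAnywhere. For some reason my CSV brought in "Year" as text. I was able to use pd.to_numeric to make it a number, but now it is a float and I want an integer. I tried .dropna().apply(np.int64), but it is still coming through as a float. I need the dropna because there are apparently some missing values.
Code: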

import pandas as pd
import numpy as np

movies_df = pd.read_csv("movies_All.csv")

recentdf = movies_df.copy()

recentdf['Year'] = pd.to_numeric(recentdf['Year'], errors = 'coerce')

recentdf['Year'] = recentdf['Year'].dropna().apply(np.int64)

#recentdf = recentdf[recentdf['Year'] > 2000]

print(recentdf['Year'].head())


Output: Name: Year, dtype: float64

Best answer

I'm confused. With the input you've given, your code works for me:

import pandas as pd, numpy as np
from io import StringIO

input = """
movieId,title,Year
1,Toy Story (1995),1995.0
2,Jumanji (1995),1995.0
"""

df = pd.read_csv(StringIO(input))
df['Year'] = df['Year'].dropna().apply(np.int64)
print(df["Year"].head())


Output

0    1995
1    1995
Name: Year, dtype: int64
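
A likely reason the question's data behaves differently is that this sample has no missing years. The sketch below uses made-up rows (the "Unknown Movie" row and the extra column names are assumptions, not from the original data) to show that assigning a dropna() result back to the column re-aligns on the index, so the dropped rows become NaN again and the column is upcast to float64. It also shows two possible ways around that, assuming a pandas version that supports the nullable "Int64" dtype:

```python
import numpy as np
import pandas as pd
from io import StringIO

# Hypothetical sample data: the second row has no Year value.
csv_text = """movieId,title,Year
1,Toy Story (1995),1995
2,Unknown Movie,
3,Jumanji (1995),1995
"""

df = pd.read_csv(StringIO(csv_text))
df["Year"] = pd.to_numeric(df["Year"], errors="coerce")

# On its own, the dropna() result is a shorter Series of integers.
dropped = df["Year"].dropna().apply(np.int64)

# Assigning it back aligns on the index: row 1 has no match, becomes NaN,
# and the whole column is stored as float64 again.
df["Year_reassigned"] = dropped

# Option 1: the nullable integer dtype keeps missing values as <NA>.
df["Year_nullable"] = df["Year"].astype("Int64")

# Option 2: drop the incomplete rows from the whole frame, then cast.
complete = df.dropna(subset=["Year"]).copy()
complete["Year"] = complete["Year"].astype("int64")

print(df.dtypes)
print(complete["Year"].dtype)
```

Option 1 keeps every row, while Option 2 shrinks the frame, so which one fits depends on whether the rows without a year are still needed.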


Edit: following the discussion below.

import pandas as pd, numpy as np
from io import StringIO

input = """
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
10,GoldenEye (1995),Action|Adventure|Thriller
11,"American President, The (1995)",Comedy|Drama|Romance
12,Dracula: Dead and Loving It (1995),Comedy|Horror
13,Balto (1995),Adventure|Animation|Children
14,Nixon (1995),Drama
"""

df = pd.read_csv(StringIO(input))
df["Year"] = df["title"].apply(lambda title: title[-5:-1])
df['Year'] = df['Year'].dropna().apply(np.int64)
print(df["Year"].head())


Output

0    1995
1    1995
2    1995
3    1995
4    1995
...
Name: Year, dtype: int64
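
The slice title[-5:-1] relies on every title ending in "(YYYY)". If some titles might not, a regular expression is one possible alternative. This is only a sketch: the sample titles (including "Untitled Project") and the variable names are assumptions for illustration, and it again assumes the nullable "Int64" dtype is available:

```python
import pandas as pd

# Hypothetical titles; the last one has no trailing "(YYYY)".
titles = pd.Series(
    ["Toy Story (1995)", "American President, The (1995)", "Untitled Project"],
    name="title",
)

# Capture four digits inside parentheses at the very end of the title.
year_text = titles.str.extract(r"\((\d{4})\)\s*$", expand=False)

# Titles that do not match yield a missing value; converting to the
# nullable Int64 dtype keeps them as <NA> instead of forcing floats.
years = pd.to_numeric(year_text, errors="coerce").astype("Int64")

print(years)
```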
