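I am new to pandas. I am trying to build a dataset containing each ZIP code, the population of that ZIP code, and the number of counties in that ZIP code.
I am getting the data from the Census website: https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt
I am trying the following code, but it does not work. Could you help me find the correct code? My hunch is that the error is caused by the DataFrame or by something related to data types, but I have not been able to work out the correct code. Please let me know what you think. Thanks in advance!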
import pandas as pd
df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
zcta_pop = df.drop_duplicates(subset={'ZCTA5', 'ZPOP'}).drop(['STATE', 'COUNTY'], 1)
zcta_ct_county = df['ZCTA5'].value_counts()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]
Here is my error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 58, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 473, in __init__
    'type {0}'.format(type(right)))
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>
Solution
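import pandas as pd

# Read the code columns as strings so leading zeros (e.g. "00601") are preserved.
df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
# Keep one row per ZCTA together with its population.
zcta_pop = df.drop_duplicates(subset=['ZCTA5', 'ZPOP']).drop(['STATE', 'COUNTY'], axis=1)
# value_counts() returns a Series; reset_index() converts it to a two-column DataFrame.
zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]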
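As an alternative to the merge, the same table could probably be built in a single groupby step. This is only a sketch, not the method from the answer: it assumes a pandas version that supports named aggregation (0.25 or later), that ZPOP is constant within each ZCTA5, and that each row of the file is one ZCTA/county pair. The sample rows and the name summary are made up for illustration.

```python
import pandas as pd

# Made-up sample rows standing in for the census file (illustrative values only).
df = pd.DataFrame({
    "ZCTA5": ["00601", "00601", "00602", "00603"],
    "STATE": ["72", "72", "72", "72"],
    "COUNTY": ["001", "141", "003", "005"],
    "ZPOP": [18570, 18570, 41520, 54689],
})

# For each ZCTA, take its population and count its rows
# (one row per county under the assumption stated above).
summary = (
    df.groupby("ZCTA5")
      .agg(ZPOP=("ZPOP", "first"), CT_COUNTY=("COUNTY", "size"))
      .reset_index()
)
print(summary)
```

Counting rows with "size" mirrors what value_counts does in the code above, so the two approaches should agree on this file layout.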
Best answer
I think you need to add reset_index, because the output of value_counts is a Series, whereas the merge needs a DataFrame with 2 columns:
zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
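To see the difference on a small scale, here is a minimal sketch using a few made-up rows in place of the census file (the values are illustrative only). It shows that value_counts gives a Series, and that after reset_index and renaming the columns the merge goes through.

```python
import pandas as pd

# Made-up sample rows standing in for the census file (illustrative values only).
df = pd.DataFrame({
    "ZCTA5": ["00601", "00601", "00602", "00603"],
    "STATE": ["72", "72", "72", "72"],
    "COUNTY": ["001", "141", "003", "005"],
    "ZPOP": [18570, 18570, 41520, 54689],
})

counts = df["ZCTA5"].value_counts()
print(type(counts).__name__)  # Series

# reset_index() moves the ZIP codes out of the index into a regular column.
zcta_ct_county = counts.reset_index()
# The default column names differ between pandas versions, so set them explicitly.
zcta_ct_county.columns = ["ZCTA5", "CT_COUNTY"]

# One row per ZCTA with its population.
zcta_pop = df[["ZCTA5", "ZPOP"]].drop_duplicates()

merged = pd.merge(zcta_pop, zcta_ct_county, on="ZCTA5")
print(merged)
```

Assigning the column names explicitly, as the answer's code does, keeps the merge key predictable regardless of what reset_index names the columns.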
Regarding python - Pandas DataFrame merge, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44161194/