This article shows how to compute the Pearson correlation per group in Python with pandas.
Problem description
I have four columns: "Country, year, GDP, CO2 emissions".
I want to measure the Pearson correlation between GDP and CO2 emissions for each country.
The country column contains all the countries in the world, and the year column holds the values "1990, 1991, ..., 2018".
Recommended answer
Use groupby on the country column, with corr() as the aggregation function:
import pandas as pd

# Small sample data set: two countries, five years each
country = ['India','India','India','India','India','China','China','China','China','China']
Year = [2018,2017,2016,2015,2014,2018,2017,2016,2015,2014]
GDP = [100,98,94,64,66,200,189,165,134,130]
CO2 = [94,96,90,76,64,180,172,150,121,117]
df = pd.DataFrame({'country':country,'Year':Year,'GDP':GDP,'CO2':CO2})
# One 2x2 correlation matrix (GDP vs CO2) per country
print(df.groupby('country')[['GDP','CO2']].corr())
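corr() uses the Pearson method by default. As a sanity check, the coefficient for a single country can be recomputed directly from the definition (covariance divided by the product of the standard deviations). The following is a minimal sketch with NumPy, using the India values from the sample data; the helper name pearson_r is made up for illustration.

```python
import numpy as np

# India values copied from the sample data above
gdp = np.array([100, 98, 94, 64, 66], dtype=float)
co2 = np.array([94, 96, 90, 76, 64], dtype=float)

def pearson_r(x, y):
    """Pearson correlation from the definition (hypothetical helper)."""
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

r_india = pearson_r(gdp, co2)
print(round(r_india, 6))
```

The printed value should agree with the GDP/CO2 entry in the India block of the groupby result.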
The groupby result can then be reshaped into a tidier table with a single correlation value per country:
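# Select both columns with a list; tuple-style ['GDP','CO2'] raises an error in pandas 2.x
df_corr = (df.groupby('country')[['GDP','CO2']].corr()).drop(columns='GDP').drop('CO2',level=1).rename(columns={'CO2':'Correlation'})
# Drop the leftover inner index level so that only the country remains as the index
df_corr = df_corr.reset_index().drop(columns='level_1').set_index('country',drop=True)
print(df_corr)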
Output:
         Correlation
country
China       0.999581
India       0.932202
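If only the GDP/CO2 coefficient is needed, the full correlation matrix can be skipped. A possible alternative, sketched here under the assumption that the data has the same column names as the sample above, is to call Series.corr once per group; the function name gdp_co2_corr is made up for illustration.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'country': ['India'] * 5 + ['China'] * 5,
    'Year': [2018, 2017, 2016, 2015, 2014] * 2,
    'GDP': [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    'CO2': [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

def gdp_co2_corr(group):
    # Series.corr defaults to the Pearson method
    return group['GDP'].corr(group['CO2'])

# One scalar per country, returned as a Series indexed by country
corr_by_country = (
    df.groupby('country')[['GDP', 'CO2']]
      .apply(gdp_co2_corr)
      .rename('Correlation')
)
print(corr_by_country)
```

This yields a Series rather than a DataFrame; calling .to_frame() on it gives the same one-column table as above.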