Calculating Pearson correlation in Python


Problem description


I have a DataFrame with four columns: "Country", "Year", "GDP" and "CO2 emissions".


I want to measure the Pearson correlation between GDP and CO2 emissions for each country.


The "Country" column contains every country in the world, and the "Year" column takes the values 1990, 1991, ..., 2018.
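For reference, Pearson's r for two series is their covariance divided by the product of their standard deviations. The sketch below computes it by hand for a single country so the number pandas reports has a concrete meaning. The helper name `pearson_r` is hypothetical, and the input values are the toy India figures from the answer below, not real GDP or emissions data.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Toy India figures (illustrative only).
gdp = [100, 98, 94, 64, 66]
co2 = [94, 96, 90, 76, 64]
print(round(pearson_r(gdp, co2), 6))  # 0.932202
```

The 1/(n-1) factors in the covariance and the standard deviations cancel, which is why they do not appear in the code.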

Recommended answer


You should use groupby together with corr() as the aggregation function:

import pandas as pd

# Toy data: five years of GDP and CO2 figures for two countries
country = ['India','India','India','India','India','China','China','China','China','China']
Year = [2018,2017,2016,2015,2014,2018,2017,2016,2015,2014]
GDP = [100,98,94,64,66,200,189,165,134,130]
CO2 = [94,96,90,76,64,180,172,150,121,117]
df = pd.DataFrame({'country':country,'Year':Year,'GDP':GDP,'CO2':CO2})
print(df.groupby('country')[['GDP','CO2']].corr())
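The grouped corr() call returns one 2x2 correlation matrix per country, stacked under a (country, column) MultiIndex. Because each matrix is symmetric with 1.0 on the diagonal, only the off-diagonal GDP/CO2 entry carries information. The sketch below, on the same toy data, shows that structure and pulls out a single value with `.loc`; `corr_matrix` and `india_r` are illustrative names.

```python
import pandas as pd

# Same toy data as the answer (illustrative numbers only).
df = pd.DataFrame({
    'country': ['India'] * 5 + ['China'] * 5,
    'Year': [2018, 2017, 2016, 2015, 2014] * 2,
    'GDP': [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    'CO2': [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

# One stacked 2x2 matrix per country, indexed by (country, column name).
corr_matrix = df.groupby('country')[['GDP', 'CO2']].corr()
print(corr_matrix)

# The off-diagonal entry is the GDP-vs-CO2 correlation for that country.
india_r = corr_matrix.loc[('India', 'GDP'), 'CO2']
print(round(india_r, 6))  # 0.932202
```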


With a little reshaping, the grouped corr() output can be turned into a tidier table with one row per country:

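# Keep only the GDP-vs-CO2 entry of each country's 2x2 correlation matrix
df_corr = (df.groupby('country')[['GDP','CO2']].corr()).drop(columns='GDP').drop('CO2',level=1).rename(columns={'CO2':'Correlation'})
# Flatten the (country, variable) index back to one row per country
df_corr = df_corr.reset_index().drop(columns='level_1').set_index('country',drop=True)
print(df_corr)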

Output:

         Correlation
country
China       0.999581
India       0.932202
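An alternative route, not the one used in the answer above, is to compute each country's coefficient directly with `Series.corr` inside `groupby().apply()`, which skips the 2x2 matrix and the index clean-up. This is a sketch on the same toy data, assuming a reasonably recent pandas (1.x or 2.x); selecting only the GDP and CO2 columns keeps the `country` key out of each group.

```python
import pandas as pd

# Same toy data as the answer (illustrative numbers only).
df = pd.DataFrame({
    'country': ['India'] * 5 + ['China'] * 5,
    'Year': [2018, 2017, 2016, 2015, 2014] * 2,
    'GDP': [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    'CO2': [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

# For each country, correlate GDP with CO2 directly (Series.corr defaults to Pearson),
# then wrap the resulting per-country Series in a one-column DataFrame.
df_corr = (
    df.groupby('country')[['GDP', 'CO2']]
      .apply(lambda g: g['GDP'].corr(g['CO2']))
      .to_frame('Correlation')
)
print(df_corr)
```

The trade-off is that apply runs a Python function per group, which can be slower than the vectorised corr() on very many countries, but the intent reads more directly.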
