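This post gets familiar with the library mainly inside Jupyter Notebook. How to install and set up Jupyter Notebook is easy to look up, so it is not covered here.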

Pandas is an excellent data analysis tool. Official site: http://pandas.pydata.org/

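The related libraries are installed with pip. Downloading through the Douban mirror is faster than through the official index. The install commands: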

pip install -i https://pypi.douban.com/simple/ matplotlib

pip install -i https://pypi.douban.com/simple/ pandas

pip install -i https://pypi.douban.com/simple/ requests

pip install -i https://pypi.douban.com/simple/ scipy
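
Once the four commands have finished, it can be handy to confirm that the packages are visible to the Python interpreter the notebook uses. The following is only a minimal sketch using the standard library; `missing_packages` is a hypothetical helper name introduced here, not something pip or pandas provides.

```python
import importlib.util

# Packages installed by the pip commands above.
PACKAGES = ["matplotlib", "pandas", "requests", "scipy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(PACKAGES)
print("missing:", missing if missing else "none")
```

If a name is reported as missing, the notebook kernel is probably running a different Python than the one pip installed into.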

Each call below is followed by its output. What a method does is clear from the result, so the descriptions are kept short.

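import os
import pandas as pd
import requests

# Local working directory; change this to a folder that exists on your machine.
PATH = 'F:/Git/ML_Python/02iris/'

# Download the iris data set from the UCI repository and save it locally.
r = requests.get('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data')
with open(PATH + 'iris.data', 'w') as f:
    f.write(r.text)
os.chdir(PATH)

# The file has no header row, so supply the column names:
# sepal length, sepal width, petal length, petal width, class.
df = pd.read_csv(PATH + 'iris.data', names=['花萼长度','花萼宽度','花瓣长度','花瓣宽度','类别'])
df.head()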

   花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
0  5.1       3.5       1.4       0.2       Iris-setosa
1  4.9       3.0       1.4       0.2       Iris-setosa
2  4.7       3.2       1.3       0.2       Iris-setosa
3  4.6       3.1       1.5       0.2       Iris-setosa
4  5.0       3.6       1.4       0.2       Iris-setosa
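
The `names` argument matters because iris.data has no header row. A small offline sketch of that behaviour, using three rows copied from the output above instead of a download; the English column labels are illustrative stand-ins, not the names used in this post.

```python
import io

import pandas as pd

# Three rows copied from the head() output; the real file has 150.
SAMPLE = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Illustrative English labels (an assumption, not from the original post).
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# With names=..., every line of the file is treated as data.
sample_df = pd.read_csv(io.StringIO(SAMPLE), names=COLUMNS)

# Without names=..., the first data row is consumed as the header.
no_names_df = pd.read_csv(io.StringIO(SAMPLE))

print(sample_df.shape, no_names_df.shape)
```

Reading from `io.StringIO` keeps the example independent of the network and of the `PATH` folder.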
df.iloc[:3, :2]

   花萼长度  花萼宽度
0  5.1       3.5
1  4.9       3.0
2  4.7       3.2
# Keep the columns whose names contain 宽度 (width) or 长度 (length).
df.loc[:1, [x for x in df.columns if '宽度' in x or '长度' in x]]

   花萼长度  花萼宽度  花瓣长度  花瓣宽度
0  5.1       3.5       1.4       0.2
1  4.9       3.0       1.4       0.2
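
The two calls above return different row counts for a reason: `iloc` slices by position and excludes the stop, while `loc` slices by label and includes it. A minimal sketch on a made-up frame (the `demo` data is invented for illustration):

```python
import pandas as pd

# Small invented frame with the default integer index 0..3.
demo = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]})

by_position = demo.iloc[:2]  # positions 0 and 1; the stop is excluded
by_label = demo.loc[:2]      # labels 0, 1 and 2; the stop is included

print(len(by_position), len(by_label))
```

That is why `df.iloc[:3, :2]` shows three rows and `df.loc[:1, ...]` shows two.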
df['类别'].unique()
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
df.count()
花萼长度    150
花萼宽度    150
花瓣长度    150
花瓣宽度    150
类别        150
dtype: int64
df[df['类别']=='Iris-virginica'].count()
花萼长度    50
花萼宽度    50
花瓣长度    50
花瓣宽度    50
类别        50
dtype: int64
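
Filtering and then calling `count()` repeats the same number for every column. If only the number of rows per class is wanted, `value_counts()` or summing a boolean mask may be a shorter route. A sketch on an invented label column, not the real data set:

```python
import pandas as pd

# Invented labels for illustration; the real column has 50 rows per class.
labels = pd.Series(
    ["Iris-setosa", "Iris-virginica", "Iris-virginica", "Iris-setosa", "Iris-versicolor"],
    name="species",
)

# One count per distinct label.
counts = labels.value_counts()

# True counts as 1, so summing the mask gives the number of matching rows.
n_virginica = int((labels == "Iris-virginica").sum())

print(counts)
print(n_virginica)
```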
df[(df['类别'] == 'Iris-virginica') & (df['花瓣长度'] > 6)].reset_index(drop=True)

   花萼长度  花萼宽度  花瓣长度  花瓣宽度  类别
0  7.6       3.0       6.6       2.1       Iris-virginica
1  7.3       2.9       6.3       1.8       Iris-virginica
2  7.2       3.6       6.1       2.5       Iris-virginica
3  7.7       3.8       6.7       2.2       Iris-virginica
4  7.7       2.6       6.9       2.3       Iris-virginica
5  7.7       2.8       6.7       2.0       Iris-virginica
6  7.4       2.8       6.1       1.9       Iris-virginica
7  7.9       3.8       6.4       2.0       Iris-virginica
8  7.7       3.0       6.1       2.3       Iris-virginica
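
The filter above combines two boolean masks with `&`. The parentheses are required in the one-line form because `&` binds more tightly than `==` and `>` in Python. Naming the masks first, as in this sketch on invented rows, avoids the issue and also shows what `reset_index(drop=True)` changes:

```python
import pandas as pd

# Invented rows for illustration only.
flowers = pd.DataFrame({
    "species": ["Iris-virginica", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
    "petal_length": [6.6, 1.4, 5.1, 6.3],
})

is_virginica = flowers["species"] == "Iris-virginica"
is_long = flowers["petal_length"] > 6

# Rows where both conditions hold; the original row labels are kept.
selected = flowers[is_virginica & is_long]

# drop=True discards the old labels and renumbers from 0.
renumbered = selected.reset_index(drop=True)

print(selected.index.tolist())
print(renumbered.index.tolist())
```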
df.describe()

       花萼长度    花萼宽度    花瓣长度    花瓣宽度
count  150.000000  150.000000  150.000000  150.000000
mean   5.843333    3.054000    3.758667    1.198667
std    0.828066    0.433594    1.764420    0.763161
min    4.300000    2.000000    1.000000    0.100000
25%    5.100000    2.800000    1.600000    0.300000
50%    5.800000    3.000000    4.350000    1.300000
75%    6.400000    3.300000    5.100000    1.800000
max    7.900000    4.400000    6.900000    2.500000
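
Each row of the `describe()` output corresponds to an ordinary aggregation, and on a frame with mixed column types it reports only the numeric columns by default, which is why the class column is absent. A small sketch on invented numbers that checks a few rows against the individual methods:

```python
import pandas as pd

# Invented values; "kind" is a string column that describe() skips by default.
values = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 10.0],
    "kind": ["a", "b", "a", "b", "a"],
})

summary = values.describe()

# The same numbers computed one at a time.
median_x = values["x"].median()   # matches the 50% row
std_x = values["x"].std()         # sample standard deviation (ddof=1)
q25_x = values["x"].quantile(0.25)

print(summary)
```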
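# Only the last expression in a cell is displayed, so the table below is the Kendall result.
# On pandas 2.0 and later, pass numeric_only=True to corr(), since the '类别' column holds strings.
df.corr()
df.corr(method='kendall')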

          花萼长度   花萼宽度   花瓣长度   花瓣宽度
花萼长度  1.000000   -0.072112  0.717624   0.654960
花萼宽度  -0.072112  1.000000   -0.182391  -0.146988
花瓣长度  0.717624   -0.182391  1.000000   0.803014
花瓣宽度  0.654960   -0.146988  0.803014   1.000000
df.corr('spearman')

          花萼长度   花萼宽度   花瓣长度   花瓣宽度
花萼长度  1.000000   -0.159457  0.881386   0.834421
花萼宽度  -0.159457  1.000000   -0.303421  -0.277511
花瓣长度  0.881386   -0.303421  1.000000   0.936003
花瓣宽度  0.834421   -0.277511  0.936003   1.000000
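
The methods give different numbers because they measure different things: Pearson looks at linear association, while Spearman is the Pearson correlation of the ranks. A minimal sketch on invented, all-numeric data that checks this relationship:

```python
import pandas as pd

# Invented data: y grows with x but not linearly; z is a shuffled variant.
data = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [1.0, 4.0, 9.0, 16.0, 30.0],
    "z": [2.0, 1.0, 4.0, 3.0, 5.0],
})

pearson = data.corr()                    # default method is Pearson
spearman = data.corr(method="spearman")  # rank-based
pearson_of_ranks = data.rank().corr()    # Pearson applied to the ranks

# Largest elementwise difference between the two rank-based tables.
max_diff = (spearman - pearson_of_ranks).abs().max().max()

print(pearson.loc["x", "y"], spearman.loc["x", "y"], max_diff)
```

Because y is strictly increasing in x, the Spearman value for that pair is 1 even though the Pearson value is below 1.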
05-11 13:20