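I have created a dumbbell chart, but it shows too many minimum and maximum values for each category. I only want one sky-blue dot (the lowest price) and one green dot (the highest price) per area.
Here is the chart so far: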
My dumbbell chart
Here is my DataFrame:
The DataFrame
Here is a link to the full dataset:
https://drive.google.com/open?id=1PpI6PlO8ox2vKfM4aGmEUexCPPWa59S_
Here is my code so far:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
db = df[['minPrice','maxPrice', 'neighbourhood_hosts']]
ordered_db = db.sort_values(by='minPrice')
# take the labels from the sorted frame so each y value matches its x values
my_range=ordered_db['neighbourhood_hosts']
plt.figure(figsize=(8,6))
plt.hlines(y=my_range, xmin=ordered_db['minPrice'], xmax=ordered_db['maxPrice'], color='grey', alpha=0.4)
plt.scatter(ordered_db['minPrice'], my_range, color='skyblue', alpha=1, label='minimum price')
plt.scatter(ordered_db['maxPrice'], my_range, color='green', alpha=0.4 , label='maximum price')
plt.legend()
plt.title("Comparison of the minimum and maximum prices")
plt.xlabel('Value range')
plt.ylabel('Area')
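How can I change my code so that each area has only one minimum and one maximum value?
Best answer
Based on the discussion in the comments, here is the script: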
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('dumbbell data.csv')
db = df[['minPrice','maxPrice', 'neighbourhood_hosts']]
# the data has several rows per area, so reduce it to one maximum and one minimum price per area
max_price = db.groupby(['neighbourhood_hosts'])['maxPrice'].max().reset_index()
min_price = db.groupby(['neighbourhood_hosts'])['minPrice'].min().reset_index()
var_price = pd.DataFrame()
var_price['range'] = max_price.maxPrice-min_price.minPrice
var_price['neighbourhood_hosts'] = min_price['neighbourhood_hosts']
var_price = var_price.sort_values(by='range')
# reorder max and min price by price range (max - min), narrowest first
max_price = max_price.reindex(var_price.index)
min_price = min_price.reindex(var_price.index)
plt.figure(figsize=(8,6))
plt.hlines(y=min_price['neighbourhood_hosts'], xmin=min_price['minPrice'], xmax=max_price['maxPrice'], color='grey', alpha=0.4)
plt.scatter(min_price['minPrice'], min_price['neighbourhood_hosts'], color='skyblue', alpha=1, label='minimum price')
plt.scatter(max_price['maxPrice'], max_price['neighbourhood_hosts'], color='green', alpha=0.4 , label='maximum price')
plt.legend()
plt.title("Comparison of the minimum and maximum prices")
plt.xlabel('Value range')
plt.ylabel('Area')
plt.show()
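The same idea can probably be written more compactly with a single `groupby(...).agg(...)` call, which keeps the minimum and maximum in one frame so no reindexing is needed. The following is only a sketch under assumptions: the helper names `summarise_prices` and `plot_dumbbell` are made up for illustration, and the sample rows are invented rather than taken from the linked dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt


def summarise_prices(df):
    """Return one row per area with its lowest minPrice and highest maxPrice."""
    summary = (
        df.groupby("neighbourhood_hosts")
          .agg(minPrice=("minPrice", "min"), maxPrice=("maxPrice", "max"))
          .reset_index()
    )
    # Order the areas by the width of their price range, narrowest first.
    summary["range"] = summary["maxPrice"] - summary["minPrice"]
    return summary.sort_values("range").reset_index(drop=True)


def plot_dumbbell(summary):
    """Draw one grey line, one sky-blue dot and one green dot per area."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.hlines(y=summary["neighbourhood_hosts"], xmin=summary["minPrice"],
              xmax=summary["maxPrice"], color="grey", alpha=0.4)
    ax.scatter(summary["minPrice"], summary["neighbourhood_hosts"],
               color="skyblue", alpha=1, label="minimum price")
    ax.scatter(summary["maxPrice"], summary["neighbourhood_hosts"],
               color="green", alpha=0.4, label="maximum price")
    ax.legend()
    ax.set_title("Comparison of the minimum and maximum prices")
    ax.set_xlabel("Value range")
    ax.set_ylabel("Area")
    return ax


if __name__ == "__main__":
    # Invented sample rows (not from the real dataset): several rows per area.
    sample = pd.DataFrame({
        "neighbourhood_hosts": ["A", "A", "B", "B", "C"],
        "minPrice": [50, 30, 20, 80, 10],
        "maxPrice": [100, 200, 90, 120, 500],
    })
    plot_dumbbell(summarise_prices(sample))
    plt.show()
```

With the real data, `summarise_prices(df)` would replace the separate `max_price`, `min_price` and `var_price` frames, assuming `df` has the same three columns used in the script above.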