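Problem description
I am currently working on a data visualization project.
I want to draw many lines (about 200k), each representing journeys from one subway station to every other subway station. In other words, every pair of stations should be connected by a straight line.
The color of the lines does not matter much (red, blue, etc.); the opacity is what matters. The more journeys there are between two given stations, the more opaque that particular line should be, and vice versa.
I feel I am close to the desired output, but I cannot work out the right way to do it.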
The DataFrame I am using (df = pd.read_csv(...)) has the following columns: id_start_station, id_end_station, lat_start_station, long_start_station, lat_end_station, long_end_station, number_of_journeys.
I extract the coordinates with this code:
# start, end, then an empty slot so that each journey is its own segment
lons = np.empty(3 * len(df))
lons[::3] = df['long_start_station']
lons[1::3] = df['long_end_station']
lons[2::3] = None
lats = np.empty(3 * len(df))
lats[::3] = df['lat_start_station']
lats[1::3] = df['lat_end_station']
lats[2::3] = None
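The interleaving above can be checked on a tiny example. This is only a sketch: the two-row frame is made up for illustration, and it relies on the common Plotly pattern of using a missing value (None or NaN) to break a line between segments.

```python
import numpy as np
import pandas as pd

# Hypothetical two-journey frame using the question's column names
df = pd.DataFrame({
    "long_start_station": [-0.10, -0.20],
    "long_end_station": [-0.15, -0.25],
})

# Pre-fill with NaN so every third slot acts as a separator
lons = np.full(3 * len(df), np.nan)
lons[0::3] = df["long_start_station"].to_numpy()
lons[1::3] = df["long_end_station"].to_numpy()
print(lons)
```

Each journey occupies three consecutive slots (start, end, separator), so all segments can live in a single trace without being joined to one another.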
Then I created a figure:
fig = go.Figure()
and added a trace:
fig.add_trace(go.Scattermapbox(
    name='Journeys',
    lat=lats,
    lon=lons,
    mode='lines',
    line=dict(color='red', width=1),
    opacity= ¿?, # PROBLEM IS HERE [1]
))
[1] I tried a few different ways of passing the opacity:
- I created a new array holding one opacity value per line segment:
opacity = np.empty(3 * len(df))
opacity[::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity[1::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity[2::3] = None
and passed it at [1], but got the following error:
ValueError:
Invalid value of type 'numpy.ndarray' received for the 'opacity' property of scattermapbox
The 'opacity' property is a number and may be specified as:
- An int or float in the interval [0, 1]
- Then I thought of moving the "opacity" into the "color" by using the alpha channel of rgba, for example rgba(255,0,0,0.5).
So I first created a "map" of all the alpha values:
df['alpha'] = df['number_of_journeys'] / max(df['number_of_journeys'])
and then wrote a function that builds the color string for every alpha value:
colors_with_opacity = []
def colors_with_opacity_func(df, empty_list):
    for alpha in df['alpha']:
        empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
        empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
        empty_list.append(None)
colors_with_opacity_func(df, colors_with_opacity)
and passed the result to the color property of Scattermapbox, but got the following error:
ValueError:
Invalid value of type 'builtins.list' received for the 'color' property of scattermapbox.line
The 'color' property is a color and may be specified as:
- A hex string (e.g. '#ff0000')
- An rgb/rgba string (e.g. 'rgb(255,0,0)')
- An hsl/hsla string (e.g. 'hsl(0,100%,50%)')
- An hsv/hsva string (e.g. 'hsv(0,100%,100%)')
- A named CSS color:
aliceblue, antiquewhite, aqua, [...] , whitesmoke,
yellow, yellowgreen
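Because there are so many lines, adding one trace per line in a loop would cause performance problems.
Any help would be greatly appreciated. I cannot figure out the right way to do this.
Thank you in advance.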
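Edit 1: a follow-up question
I am adding this follow-up below because I believe it may help others looking into this particular topic.
Following Rob's helpful answer, I managed to apply the multiple opacities described above.
However, some of my colleagues suggested a change to improve how the map looks.
Now, in addition to having multiple opacities (one per trace, based on a value in the DataFrame), I would also like multiple widths (based on the same value).
Building on Rob's answer, I would need something like this: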
BINS_FOR_OPACITY=10
opacity_a = np.geomspace(0.001,1, BINS_FOR_OPACITY)
BINS_FOR_WIDTH=10
width_a = np.geomspace(1,3, BINS_FOR_WIDTH)
fig = go.Figure()
# Note the double "for" statement that follows
for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_OPACITY, labels=opacity_a)):
    for width, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_WIDTH, labels=width_a)):
        fig.add_traces(
            go.Scattermapbox(
                name=f"{d['number_of_journeys'].mean():.2E}",
                lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
                lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
                line_width=width,
                line_color="blue",
                opacity=opacity,
                mode="lines+markers",
            )
        )
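However, the code above obviously does not work, because it produces far more traces than it should (I cannot really explain why, but I guess it is because the two for statements force a nested loop).
It occurred to me that the solution might be hidden in the pd.cut part, since I need something along those lines, but I could not find the right way.
I also managed to create a pandas Series with:
widths = pd.cut(df["size"], bins=BINS_FOR_WIDTH, labels=width_a)
and iterated over that Series, but got the same result as before (too many traces).
To be clear, I do not need only multiple opacities or only multiple widths. I need both at the same time, and that is what is giving me trouble.
Thank you again for your help.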
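Recommended answer
- opacity is set per trace. For markers you can vary it per point through the color using rgba(a,b,c,d), but this does not work for lines. (The same applies to line scatter plots.)
- To demonstrate, I used London Underground stations (filtered to reduce the number of nodes), plus some extra effort to format the data as a CSV. The JSON used as the source is irrelevant to the solution.
- The code bins number_of_journeys so that each bin becomes one trace, with a geometric progression used to compute the opacity of each bin.
- This sample data set generates 83k sample lines.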
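As a small illustration of the binning idea in the bullets above, the sketch below shows how pd.cut can label equal-width bins with opacities taken from a geometric progression. The three journey counts and the use of 3 bins are made-up values chosen only to keep the output readable.

```python
import numpy as np
import pandas as pd

# Hypothetical journey counts, one per line
journeys = pd.Series([1, 50, 100], name="number_of_journeys")

BINS = 3
opacity_a = np.geomspace(0.01, 1, BINS)  # one opacity per bin, geometrically spaced

# Each value is replaced by the opacity label of the equal-width bin it falls into
binned = pd.cut(journeys, bins=BINS, labels=opacity_a)

# Grouping by the binned values yields one group (one future trace) per opacity
for opacity, group in journeys.groupby(binned, observed=True):
    print(f"opacity={opacity:.2f} -> journeys {group.tolist()}")
```

Because every group shares a single opacity, each group can be drawn as one trace, which keeps the number of traces equal to the number of bins rather than the number of lines.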
import requests
import geopandas as gpd
import plotly.graph_objects as go
import itertools
import numpy as np
import pandas as pd
from pathlib import Path
# get geometry of London Underground stations
gdf = gpd.GeoDataFrame.from_features(
    requests.get(
        "https://raw.githubusercontent.com/oobrien/vis/master/tube/data/tfl_stations.json"
    ).json()
)
# limit to zones 1 to 6 and to stations that have at least one line going through them
gdf = gdf.loc[gdf["zone"].isin(["1","2","3","4","5","6"]) & gdf["lines"].apply(len).gt(0)].reset_index(
    drop=True
).rename(columns={"id":"tfl_id", "name":"id"})
# want to join all valid combinations of stations...
combis = np.array(list(itertools.combinations(gdf.index, 2)))
# generate dataframe of all combinations of stations
gdf_c = (
    gdf.loc[combis[:, 0], ["geometry", "id"]]
    .assign(right=combis[:, 1])
    .merge(gdf.loc[:, ["geometry", "id"]], left_on="right", right_index=True, suffixes=("_start_station","_end_station"))
)
gdf_c["lat_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.y)
gdf_c["long_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.x)
gdf_c["lat_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.y)
gdf_c["long_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.x)
gdf_c = gdf_c.drop(
    columns=[
        "geometry_start_station",
        "right",
        "geometry_end_station",
    ]
).assign(number_of_journeys=np.random.randint(1,10**5,len(gdf_c)))
gdf_c
f = Path.cwd().joinpath("SO.csv")
gdf_c.to_csv(f, index=False)
# the question requires starting from a CSV, but no sample data was provided, so read back the CSV written above
df = pd.read_csv(f)
# makes use of ravel simpler...
df["none"] = None
# now it's simple to generate scattermapbox... a trace per required opacity
BINS=10
opacity_a = np.geomspace(0.001,1, BINS)
fig = go.Figure()
for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS, labels=opacity_a)):
    fig.add_traces(
        go.Scattermapbox(
            name=f"{d['number_of_journeys'].mean():.2E}",
            lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
            lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
            line_color="blue",
            opacity=opacity,
            mode="lines+markers",
        )
    )
fig.update_layout(
    mapbox={
        "style": "carto-positron",
        "center": {'lat': 51.520214996769255, 'lon': -0.097792388774743},
        "zoom": 9,
    },
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
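For the follow-up in Edit 1 (opacity and width at the same time), one possible approach is to group once by both bin indices instead of nesting two loops. In the nested version the inner loop regroups the whole DataFrame for every outer bin, so every width group is drawn again for each opacity. The sketch below is not part of the original answer. It assumes synthetic data with the question's column names, an arbitrary BINS_FOR_WIDTH of 5, a hypothetical helper named segments, and it builds plain dicts of trace properties rather than Plotly objects.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the journeys DataFrame (column names from the question)
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "lat_start_station": rng.uniform(51.4, 51.6, n),
    "long_start_station": rng.uniform(-0.3, 0.1, n),
    "lat_end_station": rng.uniform(51.4, 51.6, n),
    "long_end_station": rng.uniform(-0.3, 0.1, n),
    "number_of_journeys": rng.integers(1, 10**5, n),
})

BINS_FOR_OPACITY = 10
BINS_FOR_WIDTH = 5  # illustrative value, not from the question
opacity_a = np.geomspace(0.001, 1, BINS_FOR_OPACITY)
width_a = np.geomspace(1, 3, BINS_FOR_WIDTH)

# labels=False returns the integer bin index of every row
df["opacity_bin"] = pd.cut(df["number_of_journeys"], bins=BINS_FOR_OPACITY, labels=False)
df["width_bin"] = pd.cut(df["number_of_journeys"], bins=BINS_FOR_WIDTH, labels=False)

def segments(d, start_col, end_col):
    """Interleave start, end and NaN so each row becomes a separate line segment."""
    out = np.full(3 * len(d), np.nan)
    out[0::3] = d[start_col].to_numpy()
    out[1::3] = d[end_col].to_numpy()
    return out

trace_specs = []
# A single pass over the (opacity bin, width bin) pairs that actually occur
for (o_bin, w_bin), d in df.groupby(["opacity_bin", "width_bin"]):
    trace_specs.append(dict(
        name=f"{d['number_of_journeys'].mean():.2E}",
        lat=segments(d, "lat_start_station", "lat_end_station"),
        lon=segments(d, "long_start_station", "long_end_station"),
        mode="lines",
        opacity=float(opacity_a[o_bin]),
        line=dict(color="blue", width=float(width_a[w_bin])),
    ))

print(len(trace_specs), "traces for", len(df), "lines")
```

Each row ends up in exactly one trace, so the trace count is bounded by the number of bin pairs that occur in the data. Assuming plotly is installed, each spec could then be added to the figure with fig.add_trace(go.Scattermapbox(**spec)).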