Problem description
I use matplotlib to plot a scatter chart:
and label each bubble with a transparent box, following the tip in "matplotlib: how to annotate point on a scatter automatically placed arrow?"
Here is the code:
if show_annote:
    for i in range(len(x)):
        annote_text = annotes[i][0][0]  # STK_ID
        ax.annotate(annote_text, xy=(x[i], y[i]), xytext=(-10,3),
            textcoords='offset points', ha='center', va='bottom',
            bbox=dict(boxstyle='round,pad=0.2', fc='yellow', alpha=0.2),
            fontproperties=ANNOTE_FONT)
and the resulting plot:
But there is still room for improvement to reduce overlap (for instance the label box offset is fixed as (-10,3)). Are there algorithms that can:
- dynamically change the offset of label box according to the crowdedness of its neighbourhood
- dynamically place the label box remotely and add an arrow line between bubble and label box
- somewhat change the label orientation
- prefer a label_box overlapping a bubble over a label_box overlapping another label_box?
I just want to make the chart easy for human eyes to comprehend, so some overlap is OK; the constraint need not be as rigid as http://en.wikipedia.org/wiki/Automatic_label_placement suggests. The number of bubbles in a chart is less than 150 most of the time.
I find the so-called force-based label placement at http://bl.ocks.org/MoritzStefaner/1377729 quite interesting. I don't know if there is any Python code or package available that implements the algorithm.
I am not an academic and am not looking for an optimal solution. My Python code needs to label a great many charts, so speed and memory use are also considerations.
I am looking for a quick and effective solution. Any help (code,algorithm,tips,thoughts) on this subject? Thanks.
The following builds on tcaswell's answer.
Networkx layout methods such as nx.spring_layout rescale the positions so that they all fit in a unit square (by default). Even the positions of the fixed data_nodes are rescaled. So, to apply the pos to the original scatter_data, an unshifting and unscaling must be performed.
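The unscaling step can be checked on its own. The sketch below assumes the layout applied one linear map (a scale and a shift per axis) to the fixed points. The helper name fit_unscale and the sample numbers are made up for illustration and are not part of networkx.

```python
import numpy as np

def fit_unscale(pos_after, pos_before):
    """Fit the per-axis scale and shift that map rescaled positions
    back to the original data coordinates."""
    # np.polyfit with degree 1 returns (slope, intercept).
    scale_x, shift_x = np.polyfit(pos_after[:, 0], pos_before[:, 0], 1)
    scale_y, shift_y = np.polyfit(pos_after[:, 1], pos_before[:, 1], 1)
    return np.array([scale_x, scale_y]), np.array([shift_x, shift_y])

# Hypothetical data points, and a made-up rescaling standing in for
# whatever the layout routine did to them.
original = np.array([[1.0, 2.0], [4.0, 8.0], [9.0, 3.0], [6.0, 6.0]])
rescaled = (original - np.array([1.0, 2.0])) / 8.0

scale, shift = fit_unscale(rescaled, original)
recovered = rescaled * scale + shift
print(np.allclose(recovered, original))
```

Because the fixed data points are known both before and after the layout, the same fitted scale and shift can then be applied to the label positions, which are only known after the layout.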
Note also that nx.spring_layout has a k parameter which controls the optimal distance between nodes. As k increases, so does the distance of the annotations from the data points.
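To see why k sets the label distance, it may help to look at the forces on a single label tied to its data point. The sketch below assumes the textbook Fruchterman-Reingold forces (repulsion k²/d between two nodes, attraction d²/k along an edge), which is the algorithm spring_layout is documented to use. The functions net_force and equilibrium_distance are hypothetical helpers, not networkx API.

```python
def net_force(d, k):
    """Repulsion minus attraction for two linked nodes at distance d."""
    return k * k / d - d * d / k

def equilibrium_distance(k, lo=1e-9, hi=10.0, iters=200):
    """Bisect for the distance at which the net force vanishes."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if net_force(mid, k) > 0:
            lo = mid   # still repelling: the balance point is farther out
        else:
            hi = mid   # attraction dominates: the balance point is closer in
    return 0.5 * (lo + hi)

for k in (0.008, 0.011):
    print(k, equilibrium_distance(k))
```

For an isolated pair the two forces balance at d = k, so a larger k pushes the label farther from its point. In a full layout the repulsion from all the other nodes shifts the result, so this shows only the trend.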
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

np.random.seed(2016)
N = 20
scatter_data = np.random.rand(N, 3)*10

def repel_labels(ax, x, y, labels, k=0.01):
    # Build a graph with one fixed node per data point and one movable
    # node per label, joined by an edge.
    G = nx.DiGraph()
    data_nodes = []
    init_pos = {}
    for xi, yi, label in zip(x, y, labels):
        data_str = 'data_{0}'.format(label)
        G.add_node(data_str)
        G.add_node(label)
        G.add_edge(label, data_str)
        data_nodes.append(data_str)
        init_pos[data_str] = (xi, yi)
        init_pos[label] = (xi, yi)

    pos = nx.spring_layout(G, pos=init_pos, fixed=data_nodes, k=k)

    # undo spring_layout's rescaling (fit a scale and shift per axis)
    pos_after = np.vstack([pos[d] for d in data_nodes])
    pos_before = np.vstack([init_pos[d] for d in data_nodes])
    scale_x, shift_x = np.polyfit(pos_after[:,0], pos_before[:,0], 1)
    scale_y, shift_y = np.polyfit(pos_after[:,1], pos_before[:,1], 1)
    scale = np.array([scale_x, scale_y])
    shift = np.array([shift_x, shift_y])
    for key, val in pos.items():
        pos[key] = (val*scale) + shift

    # draw each label with an arrow pointing at its data point
    for label, data_str in G.edges():
        ax.annotate(label,
                    xy=pos[data_str], xycoords='data',
                    xytext=pos[label], textcoords='data',
                    arrowprops=dict(arrowstyle="->",
                                    shrinkA=0, shrinkB=0,
                                    connectionstyle="arc3",
                                    color='red'), )

    # expand limits by 15% of the span on each axis
    # (np.vstack needs a sequence, not a dict view)
    all_pos = np.vstack(list(pos.values()))
    span = np.ptp(all_pos, axis=0)
    mins = np.min(all_pos, axis=0) - span*0.15
    maxs = np.max(all_pos, axis=0) + span*0.15
    ax.set_xlim([mins[0], maxs[0]])
    ax.set_ylim([mins[1], maxs[1]])

fig, ax = plt.subplots()
ax.scatter(scatter_data[:, 0], scatter_data[:, 1],
           c=scatter_data[:, 2], s=scatter_data[:, 2] * 150)
labels = ['ano_{}'.format(i) for i in range(N)]
repel_labels(ax, scatter_data[:, 0], scatter_data[:, 1], labels, k=0.008)
plt.show()
with k=0.011 yields
[Figure: annotated scatter plot for k=0.011]
and with k=0.008 yields
[Figure: annotated scatter plot for k=0.008]