I'm trying to crack an R workflow for parsing SVG paths, using this file on this webpage. I'm encountering artifacts in the positioning of resulting polygons:
Some of the countries do not align with their neighbours - e.g. US/Canada, US/Mexico, Russia/Asian neighbours. Since the effect hits the countries with more complex polygons it seems likely to be a problem to do with cumulative summing, but I'm unclear where the problem lies in my workflow, which is:
- parse raw SVG as XML, and extract all the SVG path strings
- parse individual path strings with nodejs's svg-path-parser module
- process the resulting data.frames (which combine absolute and relative coordinates) into all absolute coordinates
I reproduce the full workflow here using R (for US/Canada), with an external call to nodejs:
require(dplyr)
require(purrr)
require(stringr)
require(tidyr)
require(ggplot2)
require(rvest)
require(xml2)
require(jsonlite)
# Get and parse the SVG
doc = read_xml('https://visionscarto.net/public/fonds-de-cartes-en/visionscarto-bertin1953.svg')
countries = doc %>% html_nodes('.country')
names(countries) = html_attr(countries, 'id')
cdi = str_which(names(countries), 'CIV') # unicode in Cote d'Ivoire breaks the code
countries = countries[-cdi]
# Extract SVG paths and parse with node's svg-path-parser module.
# If you don't have node you can use this instead (note this step might be the problem):
# d = read_csv('https://gist.githubusercontent.com/geotheory/b7353a7a8a480209b31418c806cb1c9e/raw/6d3ba2a62f6e8667eef15e29a5893d9d795e8bb1/bertin_svg.csv')
d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG(d)));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame()
# some initial processing
d1 = d %>% filter(country %in% c('USA United States','CAN Canada')) %>%
  mutate(x = replace_na(x, 0), y = replace_na(y, 0), # NAs need replacing
         relative = replace_na(relative, FALSE),
         grp = (command == 'closepath') %>% cumsum) # polygon grouping variable
# new object to loop through
d2 = d1 %>% mutate(x_adj = x, y_adj = y) %>% filter(command != 'closepath')
# loop through and change relative coords to absolute
for(i in 2:nrow(d2)){
  if(d2$relative[i]){ # cumulative sum where coords are relative
    d2$x_adj[i] = d2$x_adj[i-1] + d2$x_adj[i]
    d2$y_adj[i] = d2$y_adj[i-1] + d2$y_adj[i]
  } else{ # code M/L require no alteration
    if(d2$code[i] == 'V') d2$x_adj[i] = d2$x_adj[i-1] # absolute vertical transform inherits previous x
    if(d2$code[i] == 'H') d2$y_adj[i] = d2$y_adj[i-1] # absolute horizontal transform inherits previous y
  }
}
# plot result
d2 %>% ggplot(aes(x_adj, -y_adj, group = paste(country, grp))) +
  geom_polygon(fill='white', col='black', size=.3) +
  coord_equal() + guides(fill=F)
Any assistance appreciated. The SVG path syntax is specified at w3 and summarised more concisely here.
Edit (response to @ccprog)
Here is the data returned from svg-path-parser for the H command sequence:
code command x y relative country
<chr> <chr> <dbl> <dbl> <lgl> <chr>
1 l lineto -0.91 -0.6 TRUE CAN Canada
2 l lineto -0.92 -0.59 TRUE CAN Canada
3 H horizontal lineto 189. NA NA CAN Canada
4 l lineto -1.03 0.02 TRUE CAN Canada
5 l lineto -0.74 -0.07 TRUE CAN Canada
Here is what d2 looks like for the same sequence after the loop:
code command x y relative country grp x_adj y_adj
<chr> <chr> <dbl> <dbl> <lgl> <chr> <int> <dbl> <dbl>
1 l lineto -0.91 -0.6 TRUE CAN Canada 20 199. 143.
2 l lineto -0.92 -0.59 TRUE CAN Canada 20 198. 143.
3 H horizontal lineto 189. 0 FALSE CAN Canada 20 189. 143.
4 l lineto -1.03 0.02 TRUE CAN Canada 20 188. 143.
5 l lineto -0.74 -0.07 TRUE CAN Canada 20 187. 143.
Does this not look OK? When I look at the raw values of y_adj for the H row and the preceding rows, they are identical: 142.56.
Edit 2: working solution, thanks to @ccprog
d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG.makeAbsolute(parseSVG(d))));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame() %>%
  mutate(grp = (command == 'moveto') %>% cumsum)
d %>% ggplot(aes(x, -y, group = grp, fill=country)) +
  geom_polygon(col='black', size=.3, alpha=.5) +
  coord_equal() + guides(fill=F)
Look at your rendering of Canada, especially the southern coast of the Hudson sound. There is a very obvious error. Sifting through the path data, I found the following sequence in the original data:
h-2.28l-.91-.6-.92-.59H188.65l-1.03.02-.74-.07-.75-.07-.74-.07-.74-.06.88 1.09
I've loaded your rendering result into Inkscape and drawn the relevant part of the path on top, with an arrow marking the segment drawn by the absolute H command. (The z command has been removed, which is why a segment is missing.) It is obvious that somewhere in there a segment is too long.
It turns out the absolute H corrects the previous (horizontal) error. Look at the preceding point: it is 198., 143., but it should be 191.76,146.07. The vertical error remains at about -3.6.
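As a rough illustration of why the absolute H hides the horizontal drift but not the vertical one, here is a small sketch in R. The variable and function names are made up for this example; the coordinates are the ones quoted above, with 142.56 being the unrounded y_adj reported in the question and 198 the rounded x.

```r
expected <- c(x = 191.76, y = 146.07)  # where the point before H should be
rendered <- c(x = 198,    y = 142.56)  # where the loop put it (x is rounded)

# An absolute H replaces x with the given value and keeps the current y
apply_abs_H <- function(pt, x_new) c(x = x_new, y = unname(pt["y"]))

# Remaining error after both points go through the same H command
err_after <- apply_abs_H(rendered, 188.65) - apply_abs_H(expected, 188.65)
print(err_after)
```

The x component of the error drops to zero, while the y component stays at roughly -3.5, which is in line with the vertical offset described above.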
I've made a codepen that overlays the original path data with your rendering as precisely as possible. The path data have been divided into the (single-polygon) groups and converted to absolute by Inkscape. Unfortunately, the program cannot convert them to polygon primitives, so there are still V and H commands in there.
It shows this:
- The starting point of the path matches.
- The point described by the absolute H command has a matching horizontal value, but not vertical. (It is the only absolute command in the whole path.)
- Every path group (polygon) seems to be consistent in itself, but apart from group0 they all are removed from their intended place.
I've made some visual measurements of that deviation (error ~0.05), and they ultimately give the clue:
group01: 0.44,-0.73
group02: 0.84,-1.12
group03: 2.04,-1.44
group04: 2.94,-1.73
group05: 2.60,-1.86
group06: 3.14,-2.38
group07: 3.68,-2.54
group08: 4.03,-3.35
group09: 4.87,-2.97
group10: 6.08,-3.50 (begin)
group10: 0.00,-3.53 (end)
group11: 1.08,-1.95
group12: 2.05,-2.45
group13: 2.89,-2.84
group14: 3.64,-3.67
group15: 4.48,-3.44
group16: 4.04,-3.99
group17: 4.32,-3.08
group18: 4.75,-2.75
group19: 5.72,-2.95
group20: 5.40,-3.11
group21: 6.02,-2.95
group22: 6.63,-4.14
group23: 6.85,-5.00
group24: 7.14,-4.86
group25: 7.72,-4.39
group26: 8.65,-4.75
group27: 9.49,-4.39
group28: 10.20,-4.44
group29: 11.13,-4.58
You are removing the closepath commands, and then computing the first point of the next group relative to the last explicit point of the previous group. But closepath actually moves the current point: back to the position of the last moveto command. These two points may, but need not, be identical.
I can't give you a ready script in R, but what you need to do is this: at the beginning of a new group, cache the position of the first point. At the beginning of the next group, compute the new first point relative to that cached point.
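For what it's worth, a minimal sketch of that idea in base R might look like the following. It assumes a data frame shaped like the svg-path-parser output shown in the question (columns command, x, y, relative, with relative missing for absolute commands). The function name to_absolute and the toy path are hypothetical, and only moveto, lineto, horizontal/vertical lineto and closepath are handled, not curves.

```r
# Hypothetical helper: convert parsed path commands to absolute coordinates,
# remembering the start of the current subpath so closepath can reset the pen.
to_absolute <- function(d) {
  cur   <- c(0, 0)  # current point
  start <- c(0, 0)  # first point of the current subpath (the cached point)
  d$x_abs <- NA_real_
  d$y_abs <- NA_real_
  for (i in seq_len(nrow(d))) {
    rel <- isTRUE(d$relative[i])  # NA (absolute commands) counts as FALSE
    x <- d$x[i]
    y <- d$y[i]
    cmd <- d$command[i]
    if (cmd == "closepath") {
      cur <- start                # closepath moves back to the last moveto
    } else if (cmd == "horizontal lineto") {
      cur[1] <- if (rel) cur[1] + x else x
    } else if (cmd == "vertical lineto") {
      cur[2] <- if (rel) cur[2] + y else y
    } else {                      # moveto / lineto
      cur <- if (rel) cur + c(x, y) else c(x, y)
      if (cmd == "moveto") start <- cur
    }
    d$x_abs[i] <- cur[1]
    d$y_abs[i] <- cur[2]
  }
  d
}

# Toy path "M10 10 l5 0 l0 5 z m1 1 l2 0", written out by hand
toy <- data.frame(
  command  = c("moveto", "lineto", "lineto", "closepath", "moveto", "lineto"),
  x        = c(10, 5, 0, NA, 1, 2),
  y        = c(10, 0, 5, NA, 1, 0),
  relative = c(NA, TRUE, TRUE, NA, TRUE, TRUE),
  stringsAsFactors = FALSE
)
res <- to_absolute(toy)
print(res[, c("command", "x_abs", "y_abs")])
```

In the toy path the second subpath starts at (11, 11), because the relative m is applied to the cached start point (10, 10). Accumulating from the last explicit point (15, 15) would instead put it at (16, 16), which is the kind of per-group shift measured above.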