在R中解析SVG路径

在R中解析SVG路径

本文介绍了在R中解析SVG路径的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用此网页上的nofollow noreferrer>此文件.我在生成的多边形的定位中遇到了伪像:

有些国家与邻国不结盟-例如美国/加拿大,美国/墨西哥,俄罗斯/亚洲邻国.由于这种影响会影响到具有更复杂多边形的国家,这似乎可能与累积求和有关,但是我不清楚问题在我的工作流中在哪里,这是

  1. 将原始SVG解析为XML,并提取所有SVG路径字符串
  2. 使用nodejs svg-path-parser模块
  3. 将结果数据帧(组合绝对坐标和相对坐标)处理为所有绝对坐标

我在这里使用R(在美国/加拿大)并通过对nodejs的外部调用来再现完整的工作流程:

require(dplyr)
require(purrr)
require(stringr)
require(tidyr)
require(ggplot2)
require(rvest)
require(xml2)
require(jsonlite)

# Get and parse the SVG
doc = read_xml('https://visionscarto.net/public/fonds-de-cartes-en/visionscarto-bertin1953.svg')

countries = doc %>% html_nodes('.country')
names(countries) = html_attr(countries, 'id')
cdi = str_which(names(countries), 'CIV') # unicode in Cote d'Ivoire breaks the code
countries = countries[-cdi]

# Extract SVG paths and parse with node's svg-path-parser module.
# If you don't have node you can use this instead (note this step might be the problem):
# d = read_csv('https://gist.githubusercontent.com/geotheory/b7353a7a8a480209b31418c806cb1c9e/raw/6d3ba2a62f6e8667eef15e29a5893d9d795e8bb1/bertin_svg.csv')

d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG(d)));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame()


# some initial processing
d1 = d %>% filter(country %in% c('USA United States','CAN Canada')) %>%
  mutate(x = replace_na(x, 0), y = replace_na(y, 0), # NAs need replacing
         relative = replace_na(relative, FALSE),
         grp = (command == 'closepath') %>% cumsum)  # polygon grouping variable

# new object to loop through
d2 = d1 %>% mutate(x_adj = x, y_adj = y) %>% filter(command != 'closepath')

# loop through and change relative coords to absolute
for(i in 2:nrow(d2)){
  if(d2$relative[i]){ # cumulative sum where coords are relative
    d2$x_adj[i] = d2$x_adj[i-1] + d2$x_adj[i]
    d2$y_adj[i] = d2$y_adj[i-1] + d2$y_adj[i]
  } else{ # code M/L require no alteration
    if(d2$code[i] == 'V') d2$x_adj[i] = d2$x_adj[i-1] # absolute vertical transform inherits previous x
    if(d2$code[i] == 'H') d2$y_adj[i] = d2$y_adj[i-1] # absolute holrizontal transform etc
  }
}

# plot result
d2 %>% ggplot(aes(x_adj, -y_adj, group = paste(country, grp))) +
  geom_polygon(fill='white', col='black', size=.3) +
  coord_equal() + guides(fill=F)

任何帮助表示赞赏.在 w3 中指定了SVG路径语法,并更简洁地总结了此处.


编辑(响应@ccprog)

以下是H命令序列从svg-path-parser返回的数据:

  code  command                 x      y relative country
  <chr> <chr>               <dbl>  <dbl> <lgl>    <chr>
1 l     lineto              -0.91  -0.6  TRUE     CAN Canada
2 l     lineto              -0.92  -0.59 TRUE     CAN Canada
3 H     horizontal lineto  189.    NA    NA       CAN Canada
4 l     lineto              -1.03   0.02 TRUE     CAN Canada
5 l     lineto              -0.74  -0.07 TRUE     CAN Canada

以下是循环后相同序列的d2外观:

  code  command                 x     y relative country      grp x_adj y_adj
  <chr> <chr>               <dbl> <dbl> <lgl>    <chr>      <int> <dbl> <dbl>
1 l     lineto              -0.91 -0.6  TRUE     CAN Canada    20  199.  143.
2 l     lineto              -0.92 -0.59 TRUE     CAN Canada    20  198.  143.
3 H     horizontal lineto  189.    0    FALSE    CAN Canada    20  189.  143.
4 l     lineto              -1.03  0.02 TRUE     CAN Canada    20  188.  143.
5 l     lineto              -0.74 -0.07 TRUE     CAN Canada    20  187.  143.

这看起来不好吗?当我查看H和先前行的y_adj的原始值时,它们是相同的142.56.


有效的解决方案,这要感谢@ccprog

d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG.makeAbsolute(parseSVG(d))));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame() %>%
  mutate(grp = (command == 'moveto') %>% cumsum)

d %>% ggplot(aes(x, -y, group = grp, fill=country)) +
  geom_polygon(col='black', size=.3, alpha=.5) +
  coord_equal() + guides(fill=F)
解决方案

查看您对加拿大的渲染,尤其是哈德逊声音的南部海岸.有一个非常明显的错误.筛选路径数据,我在原始数据中发现了以下顺序:

h-2.28l-.91-.6-.92-.59H188.65l-1.03.02-.74-.07-.75-.07-.74-.07-.74-.06.88 1.09

我已将您的渲染结果加载到Inkscape中,并在顶部绘制了路径的相关部分,箭头标记了由绝对H命令绘制的线段. (z命令已删除,这就是缺少段的原因.)很明显,段中某处的段太长.

事实证明,绝对H 更正前一个(水平)错误.看上一点:它是198., 143.,但是应该是191.76,146.07.垂直误差保持在-3.6左右.

我制作了一个 codepen ,它可以将原始路径数据与您的渲染完全重叠尽可能.路径数据已分为(单多边形)组,并已由Inkscape转换为绝对数据.不幸的是,该程序无法将它们转换为多边形图元,因此其中仍然有V和H命令.

它显示了这一点:

  • 路径的起点匹配.
  • 由绝对H命令描述的点具有匹配的水平值,但没有垂直值. (这是整个路径中唯一的绝对命令.)
  • 每个路径组(多边形)本身似乎都是一致的,但是除了group0之外,它们都已从预期的位置移走了.

我已经对该偏差进行了一些视觉测量(误差〜0.05),并且最终给出了线索​​:

group01: 0.44,-0.73
group02: 0.84,-1.12
group03: 2.04,-1.44
group04: 2.94,-1.73
group05: 2.60,-1.86
group06: 3.14,-2.38
group07: 3.68,-2.54
group08: 4.03,-3.35
group09: 4.87,-2.97
group10: 6.08,-3.50 (begin)
group10: 0.00,-3.53 (end)
group11: 1.08,-1.95
group12: 2.05,-2.45
group13: 2.89,-2.84
group14: 3.64,-3.67
group15: 4.48,-3.44
group16: 4.04,-3.99
group17: 4.32,-3.08
group18: 4.75,-2.75
group19: 5.72,-2.95
group20: 5.40,-3.11
group21: 6.02,-2.95
group22: 6.63,-4.14
group23: 6.85,-5.00
group24: 7.14,-4.86
group25: 7.72,-4.39
group26: 8.65,-4.75
group27: 9.49,-4.39
group28: 10.20,-4.44
group29: 11.13,-4.58

您要删除closepath命令,然后相对于最后一组的最后一个显式点计算下一个组的第一个点.但是closepath实际上将当前点移动:回到最后一个moveto命令的位置.这些可能但不一定相同.

我无法在R中给您准备好的脚本,但是您需要做的是:在新组的开始处,缓存第一个点的位置.在下一组的开始处,计算相对于该缓存点的新的第一点.

I'm trying to crack an R workflow for parsing SVG paths, using this file on this webpage. I'm encountering artifacts in the positioning of resulting polygons:

Some of the countries do not align with their neighbours - e.g. US/Canada, US/Mexico, Russia/Asian neighbours. Since the effect hits the countries with more complex polygons it seems likely to be a problem to do with cumulative summing, but I'm unclear where the problem lies in my workflow, which is:

  1. parse raw SVG as XML, and extract all the SVG path strings
  2. parse individual path strings with nodejs's svg-path-parser module
  3. process the resulting data.frames (which combine absolute and relative coordinates) into all absolute coordinates

I reproduce the full workflow here using R (for US/Canada), with an external call to nodejs:

require(dplyr)
require(purrr)
require(stringr)
require(tidyr)
require(ggplot2)
require(rvest)
require(xml2)
require(jsonlite)

# Get and parse the SVG
doc = read_xml('https://visionscarto.net/public/fonds-de-cartes-en/visionscarto-bertin1953.svg')

countries = doc %>% html_nodes('.country')
names(countries) = html_attr(countries, 'id')
cdi = str_which(names(countries), 'CIV') # unicode in Cote d'Ivoire breaks the code
countries = countries[-cdi]

# Extract SVG paths and parse with node's svg-path-parser module.
# If you don't have node you can use this instead (note this step might be the problem):
# d = read_csv('https://gist.githubusercontent.com/geotheory/b7353a7a8a480209b31418c806cb1c9e/raw/6d3ba2a62f6e8667eef15e29a5893d9d795e8bb1/bertin_svg.csv')

d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG(d)));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame()


# some initial processing
d1 = d %>% filter(country %in% c('USA United States','CAN Canada')) %>%
  mutate(x = replace_na(x, 0), y = replace_na(y, 0), # NAs need replacing
         relative = replace_na(relative, FALSE),
         grp = (command == 'closepath') %>% cumsum)  # polygon grouping variable

# new object to loop through
d2 = d1 %>% mutate(x_adj = x, y_adj = y) %>% filter(command != 'closepath')

# loop through and change relative coords to absolute
for(i in 2:nrow(d2)){
  if(d2$relative[i]){ # cumulative sum where coords are relative
    d2$x_adj[i] = d2$x_adj[i-1] + d2$x_adj[i]
    d2$y_adj[i] = d2$y_adj[i-1] + d2$y_adj[i]
  } else{ # code M/L require no alteration
    if(d2$code[i] == 'V') d2$x_adj[i] = d2$x_adj[i-1] # absolute vertical transform inherits previous x
    if(d2$code[i] == 'H') d2$y_adj[i] = d2$y_adj[i-1] # absolute holrizontal transform etc
  }
}

# plot result
d2 %>% ggplot(aes(x_adj, -y_adj, group = paste(country, grp))) +
  geom_polygon(fill='white', col='black', size=.3) +
  coord_equal() + guides(fill=F)

Any assistance appreciated. The SVG path syntax is specified at w3 and summarised more concisely here.


Edit (response to @ccprog)

Here is data returned from svg-path-parser for the H command sequence:

  code  command                 x      y relative country
  <chr> <chr>               <dbl>  <dbl> <lgl>    <chr>
1 l     lineto              -0.91  -0.6  TRUE     CAN Canada
2 l     lineto              -0.92  -0.59 TRUE     CAN Canada
3 H     horizontal lineto  189.    NA    NA       CAN Canada
4 l     lineto              -1.03   0.02 TRUE     CAN Canada
5 l     lineto              -0.74  -0.07 TRUE     CAN Canada

Here is what d2 looks like for same sequence after the loop:

  code  command                 x     y relative country      grp x_adj y_adj
  <chr> <chr>               <dbl> <dbl> <lgl>    <chr>      <int> <dbl> <dbl>
1 l     lineto              -0.91 -0.6  TRUE     CAN Canada    20  199.  143.
2 l     lineto              -0.92 -0.59 TRUE     CAN Canada    20  198.  143.
3 H     horizontal lineto  189.    0    FALSE    CAN Canada    20  189.  143.
4 l     lineto              -1.03  0.02 TRUE     CAN Canada    20  188.  143.
5 l     lineto              -0.74 -0.07 TRUE     CAN Canada    20  187.  143.

Does this not look ok?. When I look at raw values for y_adj for H and previous rows they are identical 142.56.


Edit 2: working solution, thanks to @ccprog

d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG.makeAbsolute(parseSVG(d))));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame() %>%
  mutate(grp = (command == 'moveto') %>% cumsum)

d %>% ggplot(aes(x, -y, group = grp, fill=country)) +
  geom_polygon(col='black', size=.3, alpha=.5) +
  coord_equal() + guides(fill=F)
解决方案

Look at your rendering of Canada, especially the southern coast of the Hudson sound. There is a very obvious error. Sieveing through the path data, I found the following sequence in the original data:

h-2.28l-.91-.6-.92-.59H188.65l-1.03.02-.74-.07-.75-.07-.74-.07-.74-.06.88 1.09

I've loaded your rendering result into Inkscape, and drawn the relevant part of the path on top, the arrow marking the segment drawn by the absolute H command. (The z command has been removed, that is the reason for the missing segment.) It is obvious that somewhere in there a segment is too long.

It turns out the absolute H corrects the previous (horizontal) error. Look at the preceding point: it is 198., 143., but it should be 191.76,146.07. The vertical error remains at about -3.6.

I've made a codepen that overlays the original path data with your rendering as precisely as possible. The path data have been divided into the (single-polygon) groups and converted to absolute by Inkscape. Unfortunately, the program cannot convert them to polygon primitives, so there are still V and H commands in there.

It shows this:

  • The starting point of the path matches.
  • The point described by the absolute H command has a matching horizontal value, but not vertical. (It is the only absolute command in the whole path.)
  • Every path group (polygon) seems to be consistent in itself, but apart from group0 they all are removed from their intended place.

I've made some visual measurements of that deviation (error ~0.05), and they ultimately give the clue:

group01: 0.44,-0.73
group02: 0.84,-1.12
group03: 2.04,-1.44
group04: 2.94,-1.73
group05: 2.60,-1.86
group06: 3.14,-2.38
group07: 3.68,-2.54
group08: 4.03,-3.35
group09: 4.87,-2.97
group10: 6.08,-3.50 (begin)
group10: 0.00,-3.53 (end)
group11: 1.08,-1.95
group12: 2.05,-2.45
group13: 2.89,-2.84
group14: 3.64,-3.67
group15: 4.48,-3.44
group16: 4.04,-3.99
group17: 4.32,-3.08
group18: 4.75,-2.75
group19: 5.72,-2.95
group20: 5.40,-3.11
group21: 6.02,-2.95
group22: 6.63,-4.14
group23: 6.85,-5.00
group24: 7.14,-4.86
group25: 7.72,-4.39
group26: 8.65,-4.75
group27: 9.49,-4.39
group28: 10.20,-4.44
group29: 11.13,-4.58

You are removing the closepath commands, and then compute the first point of the next group relative to the last explicit point of the last group. But closepath actually moves the ccurrent point: back to the position of the last moveto command. These may, but need not be identical.

I can't give you a ready script in R, but what you need to do is this: at the beginning of a new group, cache the position of the first point. At the beginning of the next group, compute the new first point relative to that cached point.

这篇关于在R中解析SVG路径的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-14 15:07