Problem description
According to http://jsonlint.com/, the JSON in the following string is valid, but tidyjson rejects it:
library(dplyr)
library(tidyjson)
json <- '
[{"country":"us","city":"Portland","topics":[{"urlkey":"videogame","name":"Video Games","id":4471},{"urlkey":"board-games","name":"Board Games","id":19585},{"urlkey":"computer-programming","name":"Computer programming","id":48471},{"urlkey":"opensource","name":"Open Source","id":563}],"joined":1416349237000,"link":"http://www.meetup.com/members/156440062","bio":"Analytics engineer. Primarily work in the Hadoop space.","lon":-122.65,"other_services":{},"name":"Aaron Wirick","visited":1443078098000,"self":{"common":{}},"id":156440062,"state":"OR","lat":45.56,"status":"active"}]
'
json %>% as.tbl_json %>% gather_keys
I get:
Error in gather_keys(.) : 1 records are values not objects
Recommended answer
As mentioned in one of the comments, gather_keys is looking for objects, whereas you have an array. What you should probably be using here is gather_array.
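If it is not obvious which of the two you have, tidyjson's json_types can report the top-level type before you choose a gather function. A minimal sketch, using made-up one-field documents (arr and obj are illustrative names, not from the question):

```r
library(dplyr)
library(tidyjson)

# Illustrative documents: the same record, once wrapped in an array, once bare.
arr <- '[{"city": "Portland"}]'
obj <- '{"city": "Portland"}'

# json_types() adds a `type` column giving each document's top-level JSON type.
arr_type <- as.character((arr %>% as.tbl_json %>% json_types)$type)
obj_type <- as.character((obj %>% as.tbl_json %>% json_types)$type)

# After gather_array, each row holds one array element, here an object.
inner_type <- as.character(
  (arr %>% as.tbl_json %>% gather_array %>% json_types)$type
)

print(c(arr_type, obj_type, inner_type))
```

The array has to be unpacked with gather_array first; only then is each row an object that key-based functions can work on.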
Further, the other answer uses a more brute-force approach to parsing the JSON attribute that the tidyjson package creates. tidyjson provides methods for dealing with this in a somewhat cleaner pipeline, if desired:
library(dplyr)
library(tidyjson)
json <- '
[{"country":"us","city":"Portland"
,"topics":[
{"urlkey":"videogame","name":"Video Games","id":4471}
,{"urlkey":"board-games","name":"Board Games","id":19585}
,{"urlkey":"computer-programming","name":"Computer programming","id":48471}
,{"urlkey":"opensource","name":"Open Source","id":563}
]
,"joined":1416349237000
,"link":"http://www.meetup.com/members/156440062"
,"bio":"Analytics engineer. Primarily work in the Hadoop space."
,"lon":-122.65,"other_services":{}
,"name":"Aaron Wirick","visited":1443078098000
,"self":{"common":{}}
,"id":156440062,"state":"OR","lat":45.56,"status":"active"
}]
'
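# gather_array turns the top-level array into one row per member;
# distinct index column names keep the two gather_array calls from colliding
mydf <- json %>% as.tbl_json %>% gather_array(column.name = 'member.index') %>%
  # pull scalar fields of each member into columns
  spread_values(
    country=jstring('country')
    , city=jstring('city')
    , joined=jnumber('joined')
    , bio=jstring('bio')
  ) %>%
  # descend into the nested "topics" array and stack it, one row per topic
  enter_object('topics') %>%
  gather_array(column.name = 'topic.index') %>%
  spread_values(urlkey=jstring('urlkey'))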
This pipeline really shines if there are multiple such objects in the array. Hope that is helpful, even if very long after the fact!
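To illustrate the multiple-object case, here is a small sketch with two made-up members (the names, cities, and the member.index / topic.index column names are invented for the example, not taken from the question). The same sequence of steps should give one row per member/topic pair:

```r
library(dplyr)
library(tidyjson)

# Hypothetical data: two members with different numbers of topics.
members <- '[
  {"name": "A", "city": "Portland",
   "topics": [{"urlkey": "videogame"}, {"urlkey": "opensource"}]},
  {"name": "B", "city": "Seattle",
   "topics": [{"urlkey": "board-games"}]}
]'

topics_df <- members %>% as.tbl_json %>%
  gather_array(column.name = 'member.index') %>%   # one row per member
  spread_values(
    name = jstring('name'),
    city = jstring('city')
  ) %>%
  enter_object('topics') %>%                       # step into each member's topics
  gather_array(column.name = 'topic.index') %>%    # one row per topic
  spread_values(urlkey = jstring('urlkey'))

print(topics_df)
```

Member-level columns such as name and city are repeated on every topic row, so the result can be grouped or filtered with ordinary dplyr verbs.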