Error when parsing a JSON file in R

Problem description

I have Yelp business data with 100 instances, in the following format:

{
    "_id" : ObjectId("5aab338ffc08b46adb7a2320"),
    "business_id" : "Pd52CjgyEU3Rb8co6QfTPw",
    "name" : "Flight Deck Bar & Grill",
    "neighborhood" : "Southeast",
    "address" : "6730 S Las Vegas Blvd",
    "city" : "Las Vegas",
    "state" : "NV",
    "postal_code" : "89119",
    "latitude" : 36.0669136,
    "longitude" : -115.1708484,
    "stars" : 4.0,
    "review_count" : NumberInt(13),
    "is_open" : NumberInt(1),
    "attributes" : {
        "Alcohol" : "full_bar",
        "HasTV" : true,
        "NoiseLevel" : "average",
        "RestaurantsAttire" : "casual",
        "BusinessAcceptsCreditCards" : true,
        "Music" : {
            "dj" : false,
            "background_music" : true,
            "no_music" : false,
            "karaoke" : false,
            "live" : false,
            "video" : false,
            "jukebox" : false
        },
        "Ambience" : {
            "romantic" : false,
            "intimate" : false,
            "classy" : false,
            "hipster" : false,
            "divey" : false,
            "touristy" : false,
            "trendy" : false,
            "upscale" : false,
            "casual" : true
        },
        "RestaurantsGoodForGroups" : true,
        "Caters" : true,
        "WiFi" : "free",
        "RestaurantsReservations" : false,
        "RestaurantsTableService" : true,
        "RestaurantsTakeOut" : true,
        "GoodForKids" : true,
        "HappyHour" : true,
        "GoodForDancing" : false,
        "BikeParking" : true,
        "OutdoorSeating" : false,
        "RestaurantsPriceRange2" : NumberInt(2),
        "RestaurantsDelivery" : false,
        "BestNights" : {
            "monday" : false,
            "tuesday" : false,
            "friday" : false,
            "wednesday" : true,
            "thursday" : false,
            "sunday" : false,
            "saturday" : false
        },
        "GoodForMeal" : {
            "dessert" : false,
            "latenight" : false,
            "lunch" : true,
            "dinner" : false,
            "breakfast" : false,
            "brunch" : false
        },
        "BusinessParking" : {
            "garage" : false,
            "street" : false,
            "validated" : false,
            "lot" : true,
            "valet" : false
        },
        "CoatCheck" : false,
        "Smoking" : "no",
        "WheelchairAccessible" : true
    },
    "categories" : [
        "Nightlife",
        "Bars",
        "Barbeque",
        "Sports Bars",
        "American (New)",
        "Restaurants"
    ],
    "hours" : {
        "Monday" : "8:30-22:30",
        "Tuesday" : "8:30-22:30",
        "Friday" : "8:30-22:30",
        "Wednesday" : "8:30-22:30",
        "Thursday" : "8:30-22:30",
        "Sunday" : "8:30-22:30",
        "Saturday" : "8:30-22:30"
    }
}

I need to import this into R. I have the following code:

library('jsonlite')
data <- stream_in(file("~/Desktop/business100.json"))

When I run the code above, it gives the following error:

Error: lexical error: invalid char in json text.
                         {     "_id" : ObjectId("5aab338ffc08b46adb7a2
                     (right here) ------^

I think there is a problem with the format of the JSON, but when I view the file in MongoDB it looks fine. What can be done about it? Thank you!

Recommended answer

If mongolite (as suggested in the comments) is an option, that is likely the best way to go. If you are stuck and cannot use it for some reason, you can strip out the non-JSON wrappers (ObjectId(...) and NumberInt(...) are MongoDB shell syntax, not JSON) and then parse the result with a regular JSON parser.
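As a rough illustration of the mongolite route, the sketch below wraps the connection and query in a small function. The function name read_business and the database, collection, and URL values are hypothetical placeholders that do not come from the question; only mongolite::mongo() and its find() method are the package's own.

```r
# Sketch only: the database name, collection name, and URL are hypothetical
# placeholders; substitute the values for your own MongoDB deployment.
read_business <- function(collection = "business",
                          db = "yelp",
                          url = "mongodb://localhost") {
  con <- mongolite::mongo(collection = collection, db = db, url = url)
  # An empty query '{}' matches every document; find() returns a data frame.
  con$find("{}")
}

# Usage (needs the mongolite package and a reachable MongoDB server):
# business <- read_business()
# str(business)
```

Reading straight from the database avoids the exported text file altogether, so the ObjectId and NumberInt wrappers never need to be parsed.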

To generalize, create a vector of the (verbatim) wrapper names. I assume that each one has the form DiscardableProperty(save_all_here), so a good starting point based on the data you've provided is:

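# Names of the wrappers to strip, matched verbatim.
ptns <- c('ObjectId', 'NumberInt')
# jsontxt is assumed to hold the raw file contents as a single string.
str(jsontxt)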
#  chr "{ \n    \"_id\" : ObjectId(\"5aab338ffc08b46adb7a2320\"), \n    \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n    \"name\" : "| __truncated__
# For each wrapper name in turn, replace Wrapper(value) with just value.
jsontxt2 <- Reduce(function(txt, p) gsub(sprintf("%s\\(([^)]+)\\)", p), "\\1", txt),
                   ptns, init=jsontxt)
str(jsontxt2)
#  chr "{ \n    \"_id\" : \"5aab338ffc08b46adb7a2320\", \n    \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n    \"name\" : \"Flight D"| __truncated__

(Note the absence of ObjectId.)

This parses fine:

str(fromJSON(jsontxt2))
# List of 16
#  $ _id         : chr "5aab338ffc08b46adb7a2320"
#  $ business_id : chr "Pd52CjgyEU3Rb8co6QfTPw"
#  $ name        : chr "Flight Deck Bar & Grill"
#  $ neighborhood: chr "Southeast"
#  $ address     : chr "6730 S Las Vegas Blvd"
# ...
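The snippets above use jsontxt without showing where it comes from. One plausible way to build it, assuming the file holds a single document, is to read the lines and collapse them into one string. The temporary file and its two-field content below are a hypothetical stand-in for the real export, used only to keep the sketch self-contained.

```r
# Stand-in for the exported file: a temp file holding a short, hypothetical
# excerpt in the same MongoDB shell format as the question's data.
path <- tempfile(fileext = ".json")
writeLines(c("{",
             '    "_id" : ObjectId("5aab338ffc08b46adb7a2320"),',
             '    "review_count" : NumberInt(13)',
             "}"), path)

# readLines() returns one element per line; collapse them into one string.
jsontxt <- paste(readLines(path), collapse = "\n")
str(jsontxt)
```

With the real data, path would instead point at the exported file, such as the one named in the question.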

Edit: a single-pass replacement:

jsontxt2 <- gsub(sprintf("(%s)\\(([^)]+)\\)", paste(ptns, collapse = "|")),
                 "\\2", jsontxt)
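To try the single-pass replacement in isolation, here is a minimal sketch run on a short inline sample. The helper name strip_wrappers and the two-field sample string are illustrative assumptions, not part of the original answer; the regular expression is the one shown above.

```r
# Hypothetical two-field sample standing in for one exported document.
jsontxt <- '{ "_id" : ObjectId("5aab338ffc08b46adb7a2320"), "review_count" : NumberInt(13) }'

# Wrapper names to strip; extend this vector if the export contains others.
ptns <- c("ObjectId", "NumberInt")

# Build one alternation pattern, e.g. (ObjectId|NumberInt)\(([^)]+)\), and
# keep only the second capture group: the text between the parentheses.
strip_wrappers <- function(txt, wrappers) {
  pattern <- sprintf("(%s)\\(([^)]+)\\)", paste(wrappers, collapse = "|"))
  gsub(pattern, "\\2", txt)
}

cleaned <- strip_wrappers(jsontxt, ptns)
cat(cleaned, "\n")
# { "_id" : "5aab338ffc08b46adb7a2320", "review_count" : 13 }

# Parse the cleaned text if jsonlite is installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  str(jsonlite::fromJSON(cleaned))
}
```

Because the pattern stops at the first closing parenthesis, it assumes that no wrapped value itself contains a ")" character, which holds for the ObjectId and NumberInt values in the question.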
