Problem description
I have an issue when trying to parse a JSON file containing Russian (Cyrillic) text in R. The file looks like this:
[{"text": "Валера!", "type": "status"}, {"text": "когда выйдет", "type": "status"}, {"text": "КАК ДЕЛА?!)", "type": "status"}]
It is saved in UTF-8 encoding. I tried the rjson, RJSONIO and jsonlite libraries to parse it, but none of them works:
library(jsonlite)
allFiles <- fromJSON(txt="ru_json_example_short.txt")
This gives me the error:
Error in feed_push_parser(buf) :
lexical error: invalid char in json text.
[{"text": "Валера!", "
(right here) ------^
When I save the file in ANSI encoding, it parses without error, but the Russian letters are turned into question marks, so the output is unusable. Does anyone know how to parse such a JSON file in R?
The above applies to a UTF-8 file saved in Windows Notepad. When I save it in PSPad and then parse it, the result looks like this:
text type
1 <U+0412><U+0430><U+043B><U+0435><U+0440><U+0430>! status
2 <U+043A><U+043E><U+0433><U+0434><U+0430> <U+0432><U+044B><U+0439><U+0434><U+0435><U+0442> status
3 <U+041A><U+0410><U+041A> <U+0414><U+0415><U+041B><U+0410>?!) status
Recommended answer
Try the following:
dat <- fromJSON(sprintf("[%s]",
paste(readLines("./ru_json_example_short.txt"),
collapse=",")))
dat
[[1]]
text type
1 Валера! status
2 когда выйдет status
3 КАК ДЕЛА?!) status
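One possible explanation for the lexical error on the Notepad-saved file, which the question does not confirm, is that Notepad wrote a UTF-8 byte order mark (BOM) before the opening `[`. The sketch below assumes that is the cause: it reads the file as raw bytes, drops a leading BOM if one is present, marks the text as UTF-8, and only then passes it to `fromJSON`. The helper name `read_json_utf8` and the temporary demo file are hypothetical and used only for illustration.

```r
library(jsonlite)

# Hypothetical helper: parse a UTF-8 JSON file, tolerating a leading BOM.
read_json_utf8 <- function(path) {
  raw_bytes <- readBin(path, what = "raw", n = file.size(path))
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  # Drop the three BOM bytes if the file starts with them.
  if (length(raw_bytes) >= 3 && identical(raw_bytes[1:3], bom)) {
    raw_bytes <- raw_bytes[-(1:3)]
  }
  txt <- rawToChar(raw_bytes)
  # Declare the bytes to be UTF-8 so R does not reinterpret them
  # in the native (e.g. ANSI) encoding.
  Encoding(txt) <- "UTF-8"
  fromJSON(txt)
}

# Demo file (an assumption, not the asker's real file): the same kind of
# JSON, written with a BOM in front. \u escapes keep this script ASCII-only.
json_txt <- '[{"text": "\u0412\u0430\u043b\u0435\u0440\u0430!", "type": "status"}, {"text": "\u043a\u043e\u0433\u0434\u0430", "type": "status"}]'
demo_path <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(enc2utf8(json_txt))), demo_path)

dat <- read_json_utf8(demo_path)
```

If the file turns out not to start with a BOM, the helper simply parses it unchanged, so the problem would then lie elsewhere, for example in how the console displays non-ASCII text.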