R JSON UTF-8 parsing


Problem description

I have an issue when trying to parse a JSON file containing Russian (Cyrillic) text in R. The file looks like this:

[{"text": "Валера!", "type": "status"}, {"text": "когда выйдет", "type": "status"}, {"text": "КАК ДЕЛА?!)", "type": "status"}]

and it is saved in UTF-8 encoding. I tried the rjson, RJSONIO and jsonlite libraries to parse it, but none of them works:

library(jsonlite)
allFiles <- fromJSON(txt="ru_json_example_short.txt")

gives me the error

Error in feed_push_parser(buf) :
  lexical error: invalid char in json text.
                                       ď»ż[{"text": "Đ’Đ°Đ»ĐµŃ€Đ°!", "
                     (right here) ------^

When I save the file in ANSI encoding, it parses without error, but the Russian letters are converted to question marks, so the output is unusable. Does anyone know how to parse such a JSON file in R, please?

The above applies to a UTF-8 file saved in Windows Notepad. When I save it in PSPad and then parse it, the result looks like this:

    text   type
1                                         <U+0412><U+0430><U+043B><U+0435><U+0440><U+0430>! status
2 <U+043A><U+043E><U+0433><U+0434><U+0430> <U+0432><U+044B><U+0439><U+0434><U+0435><U+0442> status
3                              <U+041A><U+0410><U+041A> <U+0414><U+0415><U+041B><U+0410>?!) status
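The leading `ď»ż` in the error message is consistent with the three bytes EF BB BF (a UTF-8 byte-order mark) being decoded in a single-byte Windows code page, so one thing worth trying is to strip that mark before parsing. The following is only a minimal sketch under that assumption: `read_json_utf8` is a hypothetical helper, not a jsonlite function, and the temporary file stands in for `ru_json_example_short.txt`.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite): read a JSON file as raw bytes,
# drop a leading UTF-8 byte-order mark if present, and parse the rest as UTF-8.
read_json_utf8 <- function(path) {
  # Reading raw bytes avoids any locale-dependent decoding
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]
  }
  txt <- rawToChar(bytes)
  Encoding(txt) <- "UTF-8"  # mark the string as UTF-8 so R does not reinterpret it
  fromJSON(txt)
}

# Demo: write the sample data with a BOM to a temporary file, then parse it.
# The Cyrillic text is written with \u escapes so the script itself is ASCII.
json <- enc2utf8(paste0(
  '[{"text": "\u0412\u0430\u043b\u0435\u0440\u0430!", "type": "status"}, ',
  '{"text": "\u043a\u043e\u0433\u0434\u0430 \u0432\u044b\u0439\u0434\u0435\u0442", "type": "status"}, ',
  '{"text": "\u041a\u0410\u041a \u0414\u0415\u041b\u0410?!)", "type": "status"}]'
))
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(json)), tmp)

allFiles <- read_json_utf8(tmp)
```

If the parsed strings are correct but still print as `<U+0412>...`, that may be only how `print` renders characters the console's native encoding cannot show; comparing a value against a known string, or checking `Encoding(allFiles$text)`, is one way to tell the two cases apart.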

Recommended answer