Problem description
I am working on an R project. The data set I used is available at the following link: https://www.kaggle.com/ranjitha1/hotel-reviews-city-chennai/data
The code I used is:
df1 = read.csv("chennai.csv", header = TRUE)
library(dplyr)  # provides the %>% pipe used below
library(tidytext)
tidy_books <- df1 %>% unnest_tokens(word, Review_Text)
Here Review_Text is the text column. Yet I get the following error:
Error in check_input(x) :
Input must be a character vector of any length or a list of character
vectors, each of which has a length of 1.
Recommended answer
stringsAsFactors strikes again!
Your Review_Text column is a factor, not a character vector, which is what the error message says the function requires.
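To confirm that diagnosis on your own data, you can inspect the column's class. The following is a minimal sketch using a small made-up data frame in place of the real file; the hotel names and review strings are hypothetical, and stringsAsFactors = TRUE is set explicitly to mimic the old read.csv default.

```r
# Toy data standing in for the reviews file (values are hypothetical).
df1 <- data.frame(
  Hotel_name  = c("Hotel A", "Hotel B"),
  Review_Text = c("really nice place", "would stay again"),
  stringsAsFactors = TRUE  # mimics the read.csv default before R 4.0.0
)

# A factor column is what triggers the check_input() error.
class(df1$Review_Text)         # "factor"
is.character(df1$Review_Text)  # FALSE

# List every column that was read in as a factor.
factor_cols <- names(df1)[sapply(df1, is.factor)]
factor_cols
```

If class() reports "character" instead, the error has a different cause.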
I would strongly recommend using readr::read_csv over the default read.csv, as it's faster and its defaults don't cause this problem. Otherwise, just set stringsAsFactors to FALSE and you're good. (From R 4.0.0 onward, read.csv already defaults to stringsAsFactors = FALSE, so this error mainly appears on older versions.)
> tidytext::unnest_tokens(readr::read_csv("chennai_reviews.csv"), word, Review_Text)
Parsed with column specification:
cols(
Hotel_name = col_character(),
Review_Title = col_character(),
Review_Text = col_character(),
Sentiment = col_character(),
Rating_Percentage = col_character(),
X6 = col_integer(),
X7 = col_integer(),
X8 = col_character(),
X9 = col_character()
)
Warning: 1 parsing failure.
row # A tibble: 1 x 5 col row col expected actual expected <int> <chr> <chr> <chr> actual 1 2262 X7 an integer "Expedia Booking availability was , only for Non- AC ; ON REQUEST OVER PHONE got it.\n\nRecommended" file # ... with 1 more variables: file <chr>
# A tibble: 179,883 x 9
   Hotel_name          Review_Title                          Sentiment Rating_Percentage    X6    X7 X8    X9    word
   <chr>               <chr>                                 <chr>     <chr>             <int> <int> <chr> <chr> <chr>
 1 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  its
 2 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  really
 3 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  nice
 4 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  place
 5 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  to
 6 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  stay
 7 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  especially
 8 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  for
 9 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  business
10 Accord Metropolitan Excellent comfortableness during stay 3         100                  NA    NA <NA>  <NA>  and
# ... with 179,873 more rows
Warning message:
Missing column names filled in: 'X6' [6], 'X7' [7], 'X8' [8], 'X9' [9]
or
> tidytext::unnest_tokens(read.csv("chennai_reviews.csv", stringsAsFactors = FALSE), word, Review_Text)
Hotel_name
1 Accord Metropolitan
Review_Title
...snip...
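If the data frame is already loaded and you would rather not re-read the file, converting the column in place should work as well. This is a sketch under assumptions, not part of the original answer: the toy data frame and the name tidy_reviews are hypothetical, and it relies only on base R's as.character plus tidytext::unnest_tokens.

```r
library(tidytext)

# Toy data standing in for the reviews file (values are hypothetical).
df1 <- data.frame(
  Hotel_name  = c("Hotel A", "Hotel B"),
  Review_Text = c("really nice place", "would stay again"),
  stringsAsFactors = TRUE  # reproduces the factor column from the question
)

# Convert the factor column to character before tokenizing.
df1$Review_Text <- as.character(df1$Review_Text)

# One row per word; Review_Text is dropped and replaced by `word`.
tidy_reviews <- unnest_tokens(df1, word, Review_Text)
tidy_reviews
```

Converting only the text column leaves the other columns untouched, which matters if you rely on any of them being factors later on.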