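As an R newbie, I'm building a word cloud that shows two variables: frequency and rating. I'm using a generic table that lists, by state, a hypothetical number of colleges (font size = count, largest to smallest) and a hypothetical average college rating: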


1 = green (good),
3 = yellow (average),
5 = red (poor)


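I was able to create a cloud where font size reflects the number of colleges, but I can't tie the color to the rating in the third column. Here is my generic table: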

State   Colleges    Rating
Alabama        220      1
Alaska         100      3
Arizona         50      5
Arkansas       275      1
California     155      3
Colorado        68      5
Connecticut    235      1
Delaware       189      3
Florida         32      5
Georgia        219      1
Hawaii         117      3
Idaho           63      5
Illinois       264      1
Indiana        167      3
Iowa            76      5
Kansas         287      1
Kentucky       178      3
Louisiana       67      5
Maine          246      1
Maryland       169      3
Massachusetts   46      5
Michigan       225      1
Minnesota      132      3
Mississippi     23      5
Missouri       219      1
Montana        194      3
Nebraska        97      5


Here is my very simple script:

library(wordcloud)
library(RColorBrewer)  # package names are case-sensitive

data <- read.csv("wordcloud.csv", header = TRUE)
pal <- brewer.pal(9, "RdYlGn")
wordcloud(data$State, data$Colleges, scale = c(4,1), colors = pal, rot.per=.5)


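The script above makes the text size reflect the number of colleges, but I can't link it to a color scale running from 1 = green (good) through 3 = yellow (average) to 5 = red (poor). Any suggestions would be greatly appreciated.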

Best answer

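In this case, it is also possible to draw a comparison cloud.

To do this, we first convert the data from long format to wide format, so that each rating gets its own column: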

library(reshape2)
df1 <- dcast(df1,State + Colleges ~ Rating, value.var = "Colleges")


Then we perform a few standard operations to prepare a suitable matrix:

rownames(df1) <- df1[,1] #use name of States as row names
df1 <- df1[,-c(1,2)] #remove "States" and "Colleges" column
df1[is.na(df1)] <- 0  #set NA values to zero
df1 <- as.matrix(df1) #convert into matrix
colnames(df1) <- c("good", "average", "bad")
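
To make the target shape concrete, here is a minimal base-R sketch that builds the same kind of matrix for three rows of the data, without reshape2. The names `toy`, `ratings`, and `m` are made up for this illustration and are not part of the original answer; the sketch only assumes the ratings are exactly 1, 3, and 5.

```r
# Three rows taken from the data listed in this post.
toy <- data.frame(
  State    = c("Alabama", "Alaska", "Arizona"),
  Colleges = c(220L, 100L, 50L),
  Rating   = c(1L, 3L, 5L)
)

ratings <- c(1L, 3L, 5L)  # rating levels, in column order

# One row per state, one column per rating, filled with zeros.
m <- matrix(0, nrow = nrow(toy), ncol = length(ratings),
            dimnames = list(as.character(toy$State),
                            c("good", "average", "bad")))

# Put each state's college count into the column of its rating;
# every other cell stays 0, which mirrors the NA -> 0 step above.
m[cbind(seq_len(nrow(toy)), match(toy$Rating, ratings))] <- toy$Colleges

print(m)
```

Each row has a single non-zero entry, so `comparison.cloud` places every state in exactly one group and sizes it by its college count.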


Finally, we can draw the comparison cloud and assign colors to the groups as needed:

library(wordcloud)
comparison.cloud(df1,max.words=Inf,random.order=FALSE, scale = c(4,.5),
                     title.size = 1,  colors=c("green","orange","red"))
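
If a single cloud is preferred over a comparison cloud, one possible alternative is to pass one color per word and set `ordered.colors = TRUE` in `wordcloud()`. The following is only a sketch on a handful of rows: the lookup vector `rating_colors` and the name `word_colors` are hypothetical, and the colors are chosen to match the ones used above rather than a `brewer.pal` palette.

```r
# A few rows from the data listed in this post.
toy <- data.frame(
  State    = c("Alabama", "Alaska", "Arizona", "Kansas", "Florida"),
  Colleges = c(220L, 100L, 50L, 287L, 32L),
  Rating   = c(1L, 3L, 5L, 1L, 5L)
)

# Hypothetical lookup table: rating value -> color name.
rating_colors <- c("1" = "green", "3" = "orange", "5" = "red")

# One color per row, in the same order as the words.
word_colors <- unname(rating_colors[as.character(toy$Rating)])

# Only draw the cloud if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  # With ordered.colors = TRUE, colors[i] is applied to words[i],
  # so `colors` must have the same length as `words`.
  wordcloud::wordcloud(words = as.character(toy$State), freq = toy$Colleges,
                       scale = c(4, 1), colors = word_colors,
                       ordered.colors = TRUE, random.order = FALSE)
}
```

Here font size still encodes the number of colleges, while color encodes the rating directly, without splitting the states into separate regions of the plot.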



Data

df1 <- structure(list(State = structure(1:27, .Label = c("Alabama",
"Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut",
"Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois",
"Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine",
"Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
"Missouri", "Montana", "Nebraska"), class = "factor"), Colleges = c(220L,
100L, 50L, 275L, 155L, 68L, 235L, 189L, 32L, 219L, 117L, 63L,
264L, 167L, 76L, 287L, 178L, 67L, 246L, 169L, 46L, 225L, 132L,
23L, 219L, 194L, 97L), Rating = c(1L, 3L, 5L, 1L, 3L, 5L, 1L,
3L, 5L, 1L, 3L, 5L, 1L, 3L, 5L, 1L, 3L, 5L, 1L, 3L, 5L, 1L, 3L,
5L, 1L, 3L, 5L)), .Names = c("State", "Colleges", "Rating"),
class = "data.frame", row.names = c(NA, -27L))
