This question already has answers here:
ggplot: Order bars in faceted bar chart per facet

(3 answers)

Closed 3 years ago.

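I am working on some text analysis and am trying to display the top words per book using the inverse document frequency (a numeric value). I have been following along with Tidy Text Mining, but using the Harry Potter books.

Some of the top words (by IDF) are the same across several books (such as Lupin or Griphook), and when plotting, the bar order uses the maximum IDF of that word across all books. For example, griphook is a key word in both Sorcerer's Stone and Deathly Hallows. It has a value of .0007 in Deathly Hallows but only .0002 in Sorcerer's Stone, yet it is ordered as the highest value in Sorcerer's Stone.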


hp.plot <- hp.words %>%
  arrange(desc(tf_idf)) %>%
  mutate(word = factor(word, levels = rev(unique(word))))

##For correct ordering of books
hp.plot$book <- factor(hp.plot$book, levels = c('Sorcerer\'s Stone', 'Chamber of Secrets',
                                                 'Prisoner of Azkhaban', 'Goblet of Fire',
                                                 'Order of the Phoenix', 'Half-Blood Prince',
                                                 'Deathly Hallows'))

hp.plot %>%
  group_by(book) %>%
  top_n(10) %>%
  ungroup %>%
  ggplot(aes(x=word, y=tf_idf, fill = book, group = book)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf") +
  facet_wrap(~book, scales = "free") +
  coord_flip()

Also, here's an image of the data frame for reference.

I have tried sorting beforehand, but that does not seem to work. Any ideas?

Edit: CSV is here

Best answer

The reorder() function reorders the levels of a factor by the values of a specified variable (see ?reorder).
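As a minimal sketch of what reorder() does, here is a toy example in base R. Only the two griphook scores (.0007 and .0002) come from the question; the other words and values are made up for illustration. By default reorder() summarises repeated values with mean, and levels are sorted in ascending order of that summary.

```r
# Toy data: "griphook" appears twice, as it would across two books.
# Values other than the two griphook scores are hypothetical.
words  <- factor(c("wand", "griphook", "snitch", "griphook"))
tf_idf <- c(0.0005, 0.0007, 0.0003, 0.0002)

# Default: levels ordered by the mean tf_idf of each word
by_mean <- reorder(words, tf_idf)
levels(by_mean)

# Alternative: order by each word's maximum tf_idf instead
by_max <- reorder(words, tf_idf, FUN = max)
levels(by_max)
```

Note that in both cases each word gets exactly one position in the level order, no matter how many groups it appears in.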

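Inserting mutate(word = reorder(word, tf_idf)) after ungroup() in your last block, before plotting, should reorder the words by tf_idf. I don't have a sample of your data, but the same approach works with the janeaustenr package: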

library(dplyr)
library(tidytext)
library(janeaustenr)
library(ggplot2)

# Count how often each word occurs in each book
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

# Total number of words per book
total_words <- book_words %>%
  group_by(book) %>%
  summarize(total = sum(n))

book_words <- left_join(book_words, total_words, by = "book")

# Add the tf, idf and tf_idf columns
book_words <- book_words %>%
  bind_tf_idf(word, book, n)

# Keep the top 10 words per book, then reorder the word factor by tf_idf
book_words %>%
  group_by(book) %>%
  top_n(10, tf_idf) %>%
  ungroup() %>%
  mutate(word = reorder(word, tf_idf)) %>%
  ggplot(aes(x = word, y = tf_idf, fill = book, group = book)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf") +
  facet_wrap(~book, scales = "free") +
  coord_flip()
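
One caveat worth noting: reorder() gives word a single set of levels shared by all facets, so a word that appears in several books may still end up at the same relative position in every panel. A possible sketch for ordering within each facet is shown here, assuming your installed tidytext version provides reorder_within() and scale_x_reordered(). The toy data frame is hypothetical apart from the two griphook scores taken from the question.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical toy data; only the griphook values come from the question
toy <- data.frame(
  book   = c("Sorcerer's Stone", "Sorcerer's Stone",
             "Deathly Hallows", "Deathly Hallows"),
  word   = c("griphook", "quirrell", "griphook", "xenophilius"),
  tf_idf = c(0.0002, 0.0005, 0.0007, 0.0004)
)

# reorder_within() builds one factor level per word/book combination,
# so "griphook" can take a different position in each facet
toy_plot <- toy %>%
  mutate(word = reorder_within(word, tf_idf, within = book))

p <- ggplot(toy_plot, aes(x = word, y = tf_idf, fill = book)) +
  geom_col(show.legend = FALSE) +
  scale_x_reordered() +  # strips the book suffix from the axis labels
  labs(x = NULL, y = "tf-idf") +
  facet_wrap(~book, scales = "free") +
  coord_flip()

print(p)
```

With real data, the same mutate() call would replace mutate(word = reorder(word, tf_idf)) and scale_x_reordered() would be added to the plot; scales = "free" (or "free_y") is needed so each facet only shows its own words.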
