Problem description
I have a table of data in which each column holds the values of one lab test and each row is a study subject.
I want to generate a series of histograms showing the distribution of values for each lab test (i.e. each column). Ideally, each set of lab values would have its own bin width (some are integers with a range of hundreds, some are numeric with a range of 2-3).
How can I do this?
Recommended answer
If you combine the tidyr and ggplot2 packages, you can use facet_wrap to quickly make a set of histograms, one for each variable in your data.frame.
You need to reshape your data to long form with tidyr::gather, so that you have key and value columns like this:
library(tidyr)
library(ggplot2)
# or `library(tidyverse)`
mtcars %>% gather() %>% head()
#>   key value
#> 1 mpg  21.0
#> 2 mpg  21.0
#> 3 mpg  22.8
#> 4 mpg  21.4
#> 5 mpg  18.7
#> 6 mpg  18.1
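The tidyr documentation marks gather() as superseded and recommends pivot_longer() for new code. As a rough sketch of the same reshaping step, assuming a tidyr version that provides pivot_longer(), the call below keeps the key and value column names so the rest of the answer applies unchanged. The object name long is only an illustrative choice. The rows come out in a different order than with gather(), which does not matter for faceted histograms.

```r
library(tidyr)

# Reshape every column of mtcars into two columns:
# `key` holds the original column name, `value` holds the measurement.
long <- pivot_longer(
  mtcars,
  cols = everything(),
  names_to = "key",
  values_to = "value"
)

head(long)
```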
Using this as our data, we can map value to the x aesthetic and use facet_wrap to split the panels by the key column:
ggplot(gather(mtcars), aes(value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~key, scales = 'free_x')
The scales = 'free_x' argument is necessary unless your variables are all on a similar scale.
You can replace bins = 10 with anything that evaluates to a number, which may allow you to set the bins somewhat individually with some creativity. Alternatively, you can set binwidth, which may be more practical, depending on what your data looks like. Either way, binning will take some finesse.
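One possible way to get a different bin width per variable is to pass a function as binwidth. This sketch assumes a ggplot2 version in which geom_histogram accepts a function for binwidth and evaluates it separately for each panel. The helper fd_binwidth is a hypothetical name, not part of ggplot2. It applies the Freedman-Diaconis rule and falls back to a tenth of the range (or 1) when that rule yields a zero width. Treat it as a starting point rather than a definitive choice of binning rule.

```r
library(tidyr)
library(ggplot2)

# Hypothetical helper: Freedman-Diaconis bin width, 2 * IQR / n^(1/3),
# with fallbacks for variables whose IQR (or whole range) is zero.
fd_binwidth <- function(x) {
  x <- x[is.finite(x)]
  width <- 2 * IQR(x) / length(x)^(1 / 3)
  if (!is.finite(width) || width <= 0) {
    width <- diff(range(x)) / 10
  }
  if (!is.finite(width) || width <= 0) {
    width <- 1
  }
  width
}

# The function is called on the x values of each facet,
# so every variable gets its own bin width.
p <- ggplot(gather(mtcars), aes(value)) +
  geom_histogram(binwidth = fd_binwidth) +
  facet_wrap(~key, scales = "free_x")

print(p)
```

If particular lab tests need hand-picked widths, the same idea could be extended by having the function return different constants depending on the range of x, although that quickly becomes harder to maintain than a rule-based width.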