Problem description
I'm trying to graph the frequency of observations over time. I have a dataset in which hundreds of laws are coded 0-3, and I'd like to know whether outcomes 2-3 occur more often as time progresses. Here is a sample of mock data:
Data <- data.frame(
year = sample(1998:2004, 200, replace = TRUE),
score = sample(1:4, 200, replace = TRUE)
)
If I plot
plot(Data$year, Data$score)
I get a checkered matrix where every single spot is filled in, but I can't tell which numbers occur more often. Is there a way to color or resize each point by the number of observations for a given score/year combination?
A few notes that may help in answering the question:
1) I don't know how to sample data so that certain numbers occur more frequently than others; my procedure samples equally from all numbers. If there is a better way to create reproducible data that reflects more observations in later years, I would like to know how.
2) This seemed best visualized as a scatter plot, but I could be wrong. I'm open to other visualizations.
Thanks!
Recommended answer
Here's how I would approach this (hopefully it's what you need).
Create the data (note: when using sample() in questions, always call set.seed() too, so the example is reproducible):
set.seed(123)
Data <- data.frame(
year = sample(1998:2004, 200, replace = TRUE),
score = sample(1:4, 200, replace = TRUE)
)
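On note 1) of the question: sample() takes a prob argument that weights each value, and those weights can depend on the year. A minimal sketch, assuming a hypothetical linear drift from mostly low scores in 1998 to mostly high scores in 2004 (the weight vectors are made up for illustration, not derived from any real data):

```r
set.seed(123)
n <- 200
year <- sample(1998:2004, n, replace = TRUE)

# Hypothetical drift: weight w goes from 0 in 1998 to 1 in 2004
w <- (year - 1998) / (2004 - 1998)

# For each row, blend a "mostly low" and a "mostly high" probability vector
# for scores 1-4, then draw one score with sample(..., prob = p)
score <- vapply(w, function(wi) {
  p <- (1 - wi) * c(0.4, 0.3, 0.2, 0.1) + wi * c(0.1, 0.2, 0.3, 0.4)
  sample(1:4, 1, prob = p)
}, integer(1))

Data <- data.frame(year = year, score = score)
table(Data)
```

The blended vector always sums to 1, so each row is a valid distribution; changing the two endpoint vectors controls how strong the trend is.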
Count the observations for each year/score combination with table:
Data2 <- as.data.frame.matrix(table(Data))
Data2$year <- row.names(Data2)
Use melt to convert it back to long format:
library(reshape2)
Data2 <- melt(Data2, "year")
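As an aside, the table-then-melt round trip can likely be shortened: calling as.data.frame() directly on a table already returns long format, which removes the reshape2 dependency. A minimal sketch using the same mock data:

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# as.data.frame() on a table gives one row per year/score combination,
# with the count stored in a column named Freq (already long format)
counts <- as.data.frame(table(Data))
head(counts)
```

The resulting counts can then go into the same ggplot() call below, with score in place of variable and Freq in place of value. Both year and score come out as factors, so they are treated as discrete axes, just as in the melt version.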
Plot the data, with a different color per score group and point size relative to frequency:
library(ggplot2)
ggplot(Data2, aes(year, variable, size = value, color = variable)) +
geom_point()
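If adding ggplot2 is not an option, base graphics can do something similar through the cex argument of plot(). A sketch; the sqrt scaling and the maximum size of 4 are arbitrary choices on my part, not part of the answer above:

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

counts <- as.data.frame(table(Data))
# table() stores year and score as factors; convert the labels back to numbers
counts$year <- as.numeric(as.character(counts$year))
counts$score <- as.numeric(as.character(counts$score))

# sqrt() makes point area (not diameter) proportional to the count;
# the most frequent combination is drawn at cex = 4
plot(counts$year, counts$score,
     cex = 4 * sqrt(counts$Freq / max(counts$Freq)),
     pch = 19, xlab = "year", ylab = "score")
```

Base R also has sunflowerplot(Data$year, Data$score), which marks repeated points with one "petal" per duplicate and needs no counting step.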
Alternatively, you could use both fill and size to describe frequency, something like:
ggplot(Data2, aes(year, variable, size = value, fill = value)) +
geom_point(shape = 21)
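Since the underlying question is whether the higher outcomes become more common over time, it may also help to plot their share per year directly. A minimal sketch, assuming the mock data's 1-4 coding, so that scores 3 and 4 stand in for the question's outcomes 2-3:

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# Assumption: scores 3-4 in the mock data correspond to outcomes 2-3
Data$high <- Data$score >= 3

# The mean of a logical column is the proportion of TRUE values in each year
share <- aggregate(high ~ year, data = Data, FUN = mean)
share

plot(share$year, share$high, type = "b", ylim = c(0, 1),
     xlab = "year", ylab = "share of scores 3-4")
```

With uniformly sampled mock data the share should hover around 0.5 with no trend; a rising line on the real data would suggest the higher codes are becoming more frequent.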