Problem description
I am trying to plot two columns of a data frame with ggplot: a date against a numeric value. I created the data frame with dplyr:
is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type == "confirmed") %>%
  mutate(cumu_cases = cumsum(cases))
I suspect the plotting problem described below arises because the y value is a column produced by the cumsum function, but I am not sure.
The table looks something like this, with the last column being the intended y value:
  date       province country lat     long     type      cases cumu_cases
1 2020-01-22 NA       China   31.8257 117.2264 confirmed     1          1
2 2020-01-23 NA       China   31.8257 117.2264 confirmed     8          9
3 2020-01-24 NA       China   31.8257 117.2264 confirmed     6         15
4 2020-01-25 NA       China   31.8257 117.2264 confirmed    24         39
5 2020-01-26 NA       China   31.8257 117.2264 confirmed    21         60
6 2020-01-27 NA       China   31.8257 117.2264 confirmed    10         70
7 2020-01-28 NA       China   31.8257 117.2264 confirmed    36        106
8 2020-01-29 NA       China   31.8257 117.2264 confirmed    46        152
When I plot the cases column (second to last in the table), the graph looks fine, but when I plot the cumulative cases, the line is very volatile, and I am not sure why.
Recommended answer
You're attempting to group by country, but the filtered data contains just one country.
library(dplyr)
is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type == "confirmed") %>%
  mutate(date = as.Date(date))
unique(is_china$country)
# [1] "China"
However, the lat and long variables each take 33 distinct values, which indicates that this is panel data: one time series per location. If you ignore the panel structure, cumsum accumulates across all locations and produces strange values. Besides, the cumulative variable is already in the data, so there is no need to calculate it again. Altogether, this explains the strange lines you're getting.
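To illustrate why the panel structure matters, here is a minimal sketch on made-up data. The data frame toy, its loc column, and the case counts are hypothetical and are not taken from the real file; the only point is to compare an ungrouped running total with one that restarts per location.

```r
library(dplyr)

# Hypothetical toy panel: two locations observed on the same three dates.
toy <- data.frame(
  date  = rep(as.Date("2020-01-22") + 0:2, times = 2),
  loc   = rep(c("A", "B"), each = 3),
  cases = c(1, 8, 6, 2, 0, 5)
)

# Ungrouped: the running total carries over from location A into location B.
ungrouped <- toy %>%
  mutate(cumu_cases = cumsum(cases))

# Grouped: the running total restarts for each location.
grouped <- toy %>%
  group_by(loc) %>%
  mutate(cumu_cases = cumsum(cases)) %>%
  ungroup()

ungrouped$cumu_cases  # 1 9 15 17 17 22
grouped$cumu_cases    # 1 9 15 2 2 7
```

In the ungrouped version each date ends up with several unrelated y values, which is the kind of input that makes a single line jump up and down.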
Since the province variable is empty, we can combine lat and long into a new gps variable to use for grouping.
unique(is_china$lat)
# [1] 31.8257 40.1824 30.0572 26.0789 ... [33] 29.1832
unique(is_china$long)
# [1] 117.2264 116.4142 107.8740 117.9874 ... [33] 120.0934
is_china$gps <- apply(is_china[4:5], 1, function(x) Reduce(paste, x))
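As a possible alternative to the apply/Reduce call, paste() is vectorized and can build the same kind of "lat long" label directly. The sketch below uses a small hypothetical coords data frame in place of is_china, on the assumption that both coordinate columns are plain numeric vectors.

```r
# Hypothetical stand-in for the lat and long columns of is_china.
coords <- data.frame(
  lat  = c(31.8257, 31.8257, 22.3),
  long = c(117.2264, 117.2264, 114.2)
)

# paste() works element-wise, so it yields one "lat long" label per row.
coords$gps <- paste(coords$lat, coords$long)

coords$gps                  # "31.8257 117.2264" "31.8257 117.2264" "22.3 114.2"
length(unique(coords$gps))  # 2
```

On the real data this would be written as is_china$gps <- paste(is_china$lat, is_china$long), which avoids depending on column positions 4 and 5.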
Now we can plot the data, using gps as a factor.
library(ggplot2)
ggplot(is_china, aes(x = date, y = cumu_cases, color = factor(gps))) +
  geom_line()
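With 33 locations the color legend can get crowded. One option, sketched here on hypothetical toy data rather than the real file, is to map gps to the group aesthetic instead of color, so that each location still gets its own line but no legend is drawn.

```r
library(ggplot2)

# Hypothetical toy panel: two gps labels with three dates each.
toy <- data.frame(
  date       = rep(as.Date("2020-01-22") + 0:2, times = 2),
  gps        = rep(c("31.8257 117.2264", "22.3 114.2"), each = 3),
  cumu_cases = c(1, 9, 15, 2, 2, 7)
)

# group = gps draws a separate line per location without adding a legend.
p <- ggplot(toy, aes(x = date, y = cumu_cases, group = gps)) +
  geom_line()

# Count how many separate lines ggplot2 will draw.
n_lines <- length(unique(ggplot_build(p)$data[[1]]$group))
n_lines  # 2
```

Printing p renders the plot. Whether color or group reads better depends on whether the individual locations need to be identified.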
To plot only specific coordinates, you can subset the data first, e.g.:
ggplot(is_china[is_china$gps %in% c("30.9756 112.2707", "22.3 114.2"), ],
       aes(x = date, y = cumu_cases, color = factor(gps))) +
  geom_line()
Data:
confirmed_cases_worldwide <-
  read.csv("https://raw.githubusercontent.com/king-sules/Covid/master/china_vs_world.csv")