ggplot not displaying correctly

Question

I am currently trying to plot two columns of a data frame using ggplot.

I am plotting date against a numeric value. I used the dplyr library to create the data frame:

is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type=='confirmed') %>%
  mutate(cumu_cases = cumsum(cases))

I believe the cause is that the y value is a column produced by the cumsum function, but I am not sure.

The table looks something like this, with the last column being the target y value:

    date        province  country  lat      long      type       cases  cumu_cases
1   2020-01-22  NA        China    31.8257  117.2264  confirmed  1      1
2   2020-01-23  NA        China    31.8257  117.2264  confirmed  8      9
3   2020-01-24  NA        China    31.8257  117.2264  confirmed  6      15
4   2020-01-25  NA        China    31.8257  117.2264  confirmed  24     39
5   2020-01-26  NA        China    31.8257  117.2264  confirmed  21     60
6   2020-01-27  NA        China    31.8257  117.2264  confirmed  10     70
7   2020-01-28  NA        China    31.8257  117.2264  confirmed  36     106
8   2020-01-29  NA        China    31.8257  117.2264  confirmed  46     152

When I plot this with the cases column (second to last in the table), it looks fine, but when I plot the cumulative cases instead, the graph is very volatile, and I am not sure why.

Recommended answer

You're attempting to group by country, but there is just one country.

library(dplyr)
is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type=='confirmed') %>%
  mutate(date = as.Date(date))

unique(is_china$country)
# [1] "China"

However, the lat and long variables each have 33 distinct values, which indicates that we have panel data: one time series per location. If the panel structure is not taken into account, cumsum produces strange values; besides, the cumulative variable is already in the data, so we don't need to calculate it again. Altogether, this explains the strange lines you're getting.
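
To illustrate the point about panel structure, here is a minimal sketch on made-up data (the toy data frame and its numbers are assumptions for illustration, not taken from the real file). It compares an ungrouped cumsum with one that is grouped by lat and long:

```r
library(dplyr)

# Toy panel: two hypothetical locations, three days each (made-up numbers)
toy <- data.frame(
  date  = as.Date(rep(c("2020-01-22", "2020-01-23", "2020-01-24"), times = 2)),
  lat   = rep(c(10, 20), each = 3),
  long  = rep(c(100, 110), each = 3),
  cases = c(1, 2, 3, 10, 20, 30)
)

# Ungrouped: the running total carries over from one location to the next
ungrouped <- toy %>%
  mutate(cumu_cases = cumsum(cases))

# Grouped: the running total restarts for each lat/long pair
grouped <- toy %>%
  group_by(lat, long) %>%
  mutate(cumu_cases = cumsum(cases)) %>%
  ungroup()

print(ungrouped$cumu_cases)
print(grouped$cumu_cases)
```

In the ungrouped result the second location starts from the first location's total, which is the kind of distortion described above. In the grouped result each location gets its own running total.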

Since the province variable is empty, we can use lat and long to generate a new gps variable for grouping.

unique(is_china$lat)
# [1] 31.8257 40.1824 30.0572 26.0789 ...  [33] 29.1832
unique(is_china$long)
# [1] 117.2264 116.4142 107.8740 117.9874 ... [33] 120.0934

is_china$gps <- apply(is_china[4:5], 1, function(x) Reduce(paste, x))
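
As a possible simplification, and assuming lat and long are numeric columns (which the printed values suggest), the same key can likely be built with a vectorized paste() that refers to the columns by name instead of by position. The sketch below uses a small made-up data frame as a stand-in for is_china:

```r
# Toy stand-in for is_china; the rows are illustrative only
toy <- data.frame(
  lat  = c(31.8257, 31.8257, 22.3),
  long = c(117.2264, 117.2264, 114.2)
)

# Row-wise version, following the approach in the answer
gps_apply <- apply(toy[c("lat", "long")], 1, function(x) Reduce(paste, x))

# Vectorized version: paste() already works element-wise on columns
gps_paste <- paste(toy$lat, toy$long)

print(gps_paste)
```

On this toy data both versions give the same strings. Referring to the columns by name also avoids depending on lat and long being the fourth and fifth columns.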

Now we can plot the data using gps as a factor, so that each location gets its own line.

library(ggplot2)
ggplot(is_china, aes(x=date, y=cumu_cases, color=factor(gps))) +
  geom_line()

To select only specific coordinates, you can subset your data, e.g.:

ggplot(is_china[is_china$gps %in% c("30.9756 112.2707", "22.3 114.2"), ],
       aes(x=date, y=cumu_cases, color=factor(gps))) +
  geom_line()

Data:

confirmed_cases_worldwide <-
  read.csv("https://raw.githubusercontent.com/king-sules/Covid/master/china_vs_world.csv")
