This article describes how to plot wide-format data with ggplot in R.

Problem description

I have a data frame (see below) that shows sales by region by year. The final column is the sum of each region's sales over a three-year period.

I am new to R and would like to use ggplot to create a SINGLE scatter plot to analyze the data. The x-axis would be the years and the y-axis would be sales.

Ideally, each region would have its own line with points (apart from a few NAs) for 2013, 2014, 2015, and 2016. I would then like to color each line by its region. The sum column should not appear on the plot. Any ideas?

```r
df <- structure(list(
  Region = structure(1:6, .Label = c("A", "B", "C", "D", "E", "F", "G",
    "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"),
    class = "factor"),
  "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76),
  "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38),
  "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58),
  "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11),
  "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72)),
  row.names = c(NA, 6L), class = "data.frame")
```

Solution

Your data is in wide format, so it is better to convert it to long format to work with ggplot. Here I use tidyr::gather() to do that.

```r
library(tidyr)
library(ggplot2)

# Gather every column except Region into Year/Sales pairs
df_long <- df %>% gather(Year, Sales, -Region)
df_long
#>    Region        Year     Sales
#> 1       A        2016   8758.82
#> 2       B        2016  25559.89
#> 3       C        2016  30848.02
#> 4       D        2016   8696.99
#> 5       E        2016   3621.12
#> 6       F        2016   5468.76
#> 7       A        2015  26521.67
#> 8       B        2015  89544.93
#> 9       C        2015  92825.55
#> 10      D        2015  28916.40
#> 11      E        2015  14004.54
#> 12      F        2015  16618.38
#> 13      A        2014        NA
#> 14      B        2014        NA
#> 15      C        2014 199673.73
#> 16      D        2014  37108.09
#> 17      E        2014  16909.87
#> 18      F        2014  20610.58
#> 19      A        2013  27605.35
#> 20      B        2013        NA
#> 21      C        2013  78794.31
#> 22      D        2013  31824.75
#> 23      E        2013  17990.21
#> 24      F        2013  17307.11
#> 25      A Total Sales  35280.49
#> 26      B Total Sales 115104.82
#> 27      C Total Sales 323347.30
#> 28      D Total Sales  74721.48
#> 29      E Total Sales  34535.53
#> 30      F Total Sales  42697.72
```

Note that gathering with -Region also gathers the Total Sales column (rows 25 to 30), so it shows up as an extra category on the x-axis. Drop that column before gathering if it should not appear on the plot.

Plot: specify color = Region and group = Region inside aes() so ggplot knows how to pick colors and which points to connect with lines.

```r
ggplot(df_long, aes(x = Year, y = Sales, color = Region, group = Region)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = 'Dark2') +
  theme_classic(base_size = 12)
#> Warning: Removed 3 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_path).
```

You can also use facet_grid() to give each region its own panel:

```r
ggplot(df_long, aes(x = Year, y = Sales, group = Region)) +
  geom_point() +
  geom_line() +
  facet_grid(Region ~ ., scales = 'free_y') +
  theme_bw(base_size = 12)
#> Warning: Removed 3 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_path).
```

Created on 2018-10-12 by the reprex package (v0.2.1.9000)
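The question asks that the sum column stay off the plot, while gathering with -Region keeps it in the long data. The sketch below is one possible way to handle that, not the answer's original code: it drops Total Sales first, reshapes with tidyr::pivot_longer() (assumes tidyr 1.0.0 or later), and converts Year to an integer so the x-axis runs in chronological order. The three-region data frame and the names df_no_total, df_years, and p are stand-ins introduced for illustration.

```r
library(tidyr)
library(ggplot2)

# Illustrative stand-in for the question's data (first three regions only).
df <- data.frame(
  Region = factor(c("A", "B", "C")),
  `2016` = c(8758.82, 25559.89, 30848.02),
  `2015` = c(26521.67, 89544.93, 92825.55),
  `2014` = c(NA, NA, 199673.73),
  `2013` = c(27605.35, NA, 78794.31),
  `Total Sales` = c(35280.49, 115104.82, 323347.30),
  check.names = FALSE  # keep "2016" and "Total Sales" as column names
)

# Drop the sum column so it never reaches the plot.
df_no_total <- df[, setdiff(names(df), "Total Sales")]

# Wide to long: one row per Region/Year pair.
df_years <- pivot_longer(
  df_no_total,
  cols = -Region,
  names_to = "Year",
  values_to = "Sales"
)

# Year arrives as character; make it numeric for a chronological axis.
df_years$Year <- as.integer(df_years$Year)

# One colored line with points per region; NA sales leave gaps in the lines.
p <- ggplot(df_years, aes(x = Year, y = Sales, color = Region, group = Region)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = 2013:2016) +
  theme_classic(base_size = 12)
```

Calling print(p) renders the plot. The same warnings about removed missing values are expected, because the NA sales figures are still in the data.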