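I have a dataset of monthly ndvi and precipitation data for three landcover types, covering 26 stations in total with 13 years per station. I would like to run a loop that fills the matrix m1 with "ndvi" and "cv" for each year, for each station, for each landcover type. In the end, I need three plots, one for each of the three looped landcover matrices.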

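When I run the script shown below, I get an error. I am not sure why there would be "differing number of rows", since there should always be one maximum ndvi value and one cv value per station per year. Can anyone advise on my error? The script works for my analysis of mean ndvi, but somehow fails when I use max.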

dput(head(d))

structure(list(row.names = c(1L, 1769L, 2055L, 2341L, 2627L,
2913L), timestamp = 1:6, station = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Aiselukharka", "Anarmani", "BiratnagarAirport",
"Chainpur", "Chandragadhi", "Damak", "Dhankuta", "Diktel", "Dingla",
"Haraicha", "Ilam", "Kanyam", "Kechana", "KhotangBazar", "Leguwa",
"Letang", "ManebhanjyangBazar", "Muga", "Mulghat", "Num", "Okhaldunga",
"PakhribasBazar", "Phidim", "Sanischare", "Sankhuwasabha", "Tumlingtar"
), class = "factor"), year = c(2000L, 2000L, 2000L, 2000L, 2000L,
2000L), month = structure(c(5L, 4L, 8L, 1L, 9L, 7L), .Label = c("apr",
"aug", "dec", "feb", "jan", "jul", "jun", "mar", "may", "nov",
"oct", "sept"), class = "factor"), ndvi = c(0.4138, 0.4396, 0.4393,
0.6029, 0.4756, 0.4969), landcover = structure(c(3L, 3L, 3L,
3L, 3L, 3L), .Label = c("Cropland/Natural vegetation mosaic",
"Croplands", "Mixed forest"), class = "factor"), altitude = c(2143L,
2143L, 2143L, 2143L, 2143L, 2143L), altrange = structure(c(3L,
3L, 3L, 3L, 3L, 3L), .Label = c("0-500", "1501-2000", "2001+",
"501-1500"), class = "factor"), precipitation = c(16, 4, 25.5,
72.6, 241.7, 505.9)), .Names = c("row.names", "timestamp", "station",
"year", "month", "ndvi", "landcover", "altitude", "altrange",
"precipitation"), row.names = c(NA, 6L), class = "data.frame")

d <- read.csv("asort.csv", header = TRUE, sep = ",")
stations <- levels(d$station)
landcover <- levels(d$landcover)
allyears=d$year[ ! duplicated( d$year)]

for(lc in landcover) {
m1=NULL
for(j in stations){
  for (i in allyears){
      tmp <- d[d$landcover==lc & d$station==j & d$year==i,]
      ndvi<- tmp$ndvi[which.max(tmp$ndvi)];
      precip_2m<-tmp$precipitation[tmp$month %in% c("feb","mar","apr","may","jun","jul","aug")]
      cv<-sd(precip_2m,na.rm=T)/mean(precip_2m, na.rm=T)
      station=j
      landcover=lc
      year=i
      m1 = rbind(m1, data.frame(ndvi, cv,landcover, station, year))
  }
}
 cat("landcover=",lc)
 print(summary(aov(ndvi~cv,data=m1)))
 plot(ndvi~cv,main=lc,
     xlab="cv of growing season precipitation", ylab="max ndvi ", data=m1)
 abline(lm(ndvi~cv, data=m1))
 fit = summary(lm(ndvi~cv, data=m1))
 r2 = fit$adj.r.squared
 my.p = fit$coefficients[2,4]
 rp = vector('expression',2)
 rp[1] = substitute(expression(italic(R)^2 == value.r), list(value.r = format(r2,dig=3)))[2]
 rp[2] = substitute(expression(italic(p) == value.p), list(value.p = format(my.p, digits = 2)))[2]
legend('topright', legend = rp, bty = 'n')
}


Error in data.frame(ndvi, cv, landcover, station, year) :
arguments imply differing number of rows: 0, 1

Thanks!

Best answer

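The error occurs when a particular subset contains no rows (nrow(tmp)==0). The difference between mean and what you are doing now is that mean(NULL) actually returns a vector of length 1, whereas tmp$ndvi[which.max(tmp$ndvi)] returns a vector of length zero. So the values you assign explicitly (station, landcover, and so on) always have length 1, while the value you compute may have length zero, and data.frame() reports that mismatch as differing numbers of rows.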
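To see the length mismatch in isolation, here is a minimal sketch. The vector `empty` and the station name are only stand-ins for what tmp$ndvi and the loop variables look like when a subset has no rows; they are not taken from the real data.

```r
empty <- numeric(0)                  # stand-in for tmp$ndvi when tmp has no rows

length(mean(empty))                  # 1 (the value itself is NaN)
length(empty[which.max(empty)])      # 0

# Mixing a zero-length column with length-1 columns reproduces the error
res <- tryCatch(
  data.frame(ndvi = empty[which.max(empty)], cv = NA_real_, station = "Damak"),
  error = function(e) conditionMessage(e)
)
res
```

The message captured in `res` should be the same "arguments imply differing number of rows" error shown in the question.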

So there are two things you can do. The simplest is to replace

ndvi<- tmp$ndvi[which.max(tmp$ndvi)];

with

ndvi<- max(tmp$ndvi);

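because max behaves like mean in that it always returns something (for an empty input it returns -Inf, with a warning). But of course that means you end up with odd values in your final result. Alternatively, you can test for empty tmp data.frames with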
for (i in allyears){
    tmp <- d[d$landcover==lc & d$station==j & d$year==i,]
    if(nrow(tmp)>0) {
       ...
       m1 = rbind(m1, data.frame(ndvi, cv,landcover, station, year))
    }
}
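As a self-contained sketch of that guard, the loop below runs on a small made-up data frame (`toy` is hypothetical, with station "B" deliberately missing the year 2001). It leaves out the landcover level and the growing-season month filter from the question to stay short, and it collects rows in a list and binds them once at the end, which is a stylistic choice rather than part of the fix.

```r
# Toy data (hypothetical): station "B" has no rows for 2001
toy <- data.frame(
  station       = c("A", "A", "A", "A", "B", "B"),
  year          = c(2000, 2000, 2001, 2001, 2000, 2000),
  ndvi          = c(0.41, 0.60, 0.44, 0.50, 0.47, 0.49),
  precipitation = c(16, 72.6, 4, 25.5, 241.7, 505.9)
)

rows <- list()
for (j in unique(toy$station)) {
  for (i in unique(toy$year)) {
    tmp <- toy[toy$station == j & toy$year == i, ]
    if (nrow(tmp) > 0) {                          # skip empty subsets
      rows[[length(rows) + 1]] <- data.frame(
        station = j,
        year    = i,
        ndvi    = max(tmp$ndvi),
        cv      = sd(tmp$precipitation) / mean(tmp$precipitation)
      )
    }
  }
}
m1 <- do.call(rbind, rows)                        # one row per non-empty station/year
m1
```

The missing station/year combination simply does not appear in m1, so no -Inf or NaN values end up in the model or the plot.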

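However, it really seems that you should be able to compute most of these values with aggregate (though you may have to call it several times for the different summary functions).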
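A rough sketch of the aggregate route, again on made-up data (`toy` and the helper `cv` are hypothetical names, not from the question). It calls aggregate once per summary function and merges the results. To restrict cv to the growing season, you would presumably subset the data by month before the second call; that step is omitted here.

```r
# Toy data (hypothetical): station "B" has no rows for 2001
toy <- data.frame(
  landcover     = c(rep("Croplands", 4), rep("Mixed forest", 2)),
  station       = c("A", "A", "A", "A", "B", "B"),
  year          = c(2000, 2000, 2001, 2001, 2000, 2000),
  ndvi          = c(0.41, 0.60, 0.44, 0.50, 0.47, 0.49),
  precipitation = c(16, 72.6, 4, 25.5, 241.7, 505.9)
)

# Coefficient of variation, as computed in the question
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# One aggregate() call per summary function
max_ndvi <- aggregate(ndvi ~ landcover + station + year, data = toy, FUN = max)
cv_prec  <- aggregate(precipitation ~ landcover + station + year, data = toy, FUN = cv)

# Join the two summaries on the grouping columns
m1 <- merge(max_ndvi, cv_prec, by = c("landcover", "station", "year"))
names(m1)[names(m1) == "precipitation"] <- "cv"
m1
```

Only combinations that actually occur in the data show up in the result, so the empty-subset problem does not arise. You could then split m1 by landcover and fit or plot each piece.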
