Counting islands in a csv with R

Problem description

I would like to count islands along the rows of a .csv. By "islands" I mean runs of consecutive non-blank entries within a row of the .csv. If there are three consecutive non-blank entries in a row, I would like that to be counted as 1 island. Any run of fewer than three consecutive entries in a row counts as 1 "non-island". I would then like to write the output to a data frame:

Name,,,,,,,,,,,,,
Michael,,,1,1,1,,,,,,,,
Peter,,,,1,1,,,,,,,,,
John,,,,,1,,,,,,,,,

Desired data frame output:

Name,island,nonisland,
Michael,1,0,
Peter,0,1,
John,0,1,

Recommended answer

You can use rle like this:

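# Assumes df holds the csv with the names as row names, for example
# df <- read.csv("islands.csv", row.names = 1)  (the file name is hypothetical)

# Run rle along each row and count the runs of length 3 or more
output <- stack(sapply(apply(df, 1, rle), function(x) sum(x$lengths >= 3)))
names(output) <- c("island", "name")

# Flag every row that has no island as a non-island
output$nonisland <- 0
output$nonisland[output$island == 0] <- 1
output
#  island    name nonisland
#1      1 Michael         0
#2      0   Peter         1
#3      0    John         1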

Here rle is run across each row of your data frame. For each row, the runs with a length of 3 or more are then counted and added up.
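To see what rle returns for a single row, here is a minimal sketch. The vector row is a hypothetical stand-in for Michael's entries, with NA for each blank cell; it is not taken from the actual file. rle treats every NA as its own run of length 1, so blank stretches never reach the threshold of 3.

```r
# Hypothetical row: two blanks, three 1's, two blanks (NA = blank cell)
row <- c(NA, NA, 1, 1, 1, NA, NA)

runs <- rle(row)
runs$lengths  # 1 1 3 1 1 (each NA is a separate run)
runs$values   # NA NA 1 NA NA

# Count the runs of length 3 or more, as in the answer
islands <- sum(runs$lengths >= 3)

# A slightly more defensive variant that only counts non-NA runs
islands_strict <- sum(runs$lengths >= 3 & !is.na(runs$values))
```

The stricter variant may matter if blank cells were read as empty strings rather than NA, since three or more identical empty strings would then form a run of their own.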

Note that this solution assumes every island is made up of the same value (i.e. all 1's, as in your example). If that is not the case, you would first need to convert all the non-empty entries to a single value, for example with df[!is.na(df)] <- 1, before running rle.
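The normalization step can be sketched as follows. The data frame below is hypothetical (mixed numeric values, names stored as row names, blanks as NA) and only illustrates the idea; it is not the asker's data.

```r
# Hypothetical data: Michael has three different values in a row,
# Peter has two values separated by blanks
df <- data.frame(a = c(1, NA), b = c(2, 5), c = c(3, NA), d = c(NA, 7),
                 row.names = c("Michael", "Peter"))

# Without this step, 1, 2, 3 would be three separate runs of length 1
df[!is.na(df)] <- 1

# Same counting step as in the answer
counts <- sapply(apply(df, 1, rle), function(x) sum(x$lengths >= 3))
counts
```

After the replacement, Michael's row becomes 1, 1, 1, NA and is counted as one island, while Peter's row has no run of three.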
