How to locate a structured region of data inside an unstructured data frame in R?

Problem description

I have a certain kind of data frame that contains a subset of interest. The problem is that this subset is not consistent between the different data frames.
Nonetheless, at a more abstract level, it follows a general structure: a rectangular region inside the data frame.

    example1 <- data.frame(x = c("name", "129-2", NA, NA, "acc", 2, 3, 4, NA, NA),
                           y = c(NA, NA, NA, NA, "deb", 3, 2, 5, NA, NA),
                           z = c(NA, NA, NA, NA, "asset", 1, 1, 2, NA, NA))

    print(example1)
           x    y     z
    1   name <NA>  <NA>
    2  129-2 <NA>  <NA>
    3   <NA> <NA>  <NA>
    4   <NA> <NA>  <NA>
    5    acc  deb asset
    6      2    3     1
    7      3    2     1
    8      4    5     2
    9   <NA> <NA>  <NA>
    10  <NA> <NA>  <NA>

example1 contains a clear rectangular region with structured information:

    5 acc deb asset
    6   2   3     1
    7   3   2     1
    8   4   5     2

As mentioned before, the region is not always consistent:

- the position of the columns is not always the same
- the names of the variables inside the subset of interest are not always the same

Here is another example, example2:

    example2 <- data.frame(x = c("name", "129-2", "wallabe #23", NA, NA, "acc", 2, 3, 4, NA),
                           y = c(NA, NA, NA, NA, "balance", "deb", 3, 2, 5, NA),
                           z = c(NA, NA, NA, NA, NA, "asset", 1, 1, 2, NA),
                           u = c(NA, NA, NA, "currency:", NA, NA, NA, NA, NA, NA),
                           i = c(NA, NA, NA, "USD", "result", "win", 2, 3, 1, NA),
                           o = c(NA, NA, NA, NA, NA, "lose", 2, 2, 1, NA))

    print(example2)
                 x       y     z         u      i    o
    1         name    <NA>  <NA>      <NA>   <NA> <NA>
    2        129-2    <NA>  <NA>      <NA>   <NA> <NA>
    3  wallabe #23    <NA>  <NA>      <NA>   <NA> <NA>
    4         <NA>    <NA>  <NA> currency:    USD <NA>
    5         <NA> balance  <NA>      <NA> result <NA>
    6          acc     deb asset      <NA>    win lose
    7            2       3     1      <NA>      2    2
    8            3       2     1      <NA>      3    2
    9            4       5     2      <NA>      1    1
    10        <NA>    <NA>  <NA>      <NA>   <NA> <NA>

example2 contains a less clear-cut rectangular region:

    6 acc deb asset <NA> win lose
    7   2   3     1 <NA>   2    2
    8   3   2     1 <NA>   3    2
    9   4   5     2 <NA>   1    1

Is there a method to scan a data frame and locate this kind of region inside it? Any idea is appreciated.

Solution

You might want to try finding the longest sequence of rows with the same number of NAs:

    findTable <- function(df) {
      naSeq <- rowSums(is.na(df))             # How many NAs per row
      myRle <- rle(naSeq)$lengths             # Lengths of the runs of equal NA counts
      df[rep(myRle == max(myRle), myRle), ]   # Keep the rows of the longest run
    }

    findTable(example1)
        x   y     z
    5 acc deb asset
    6   2   3     1
    7   3   2     1
    8   4   5     2

    findTable(example2)
        x   y     z    u   i    o
    6 acc deb asset <NA> win lose
    7   2   3     1 <NA>   2    2
    8   3   2     1 <NA>   3    2
    9   4   5     2 <NA>   1    1
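The run-length idea in the answer keeps every run that ties for the longest, and it would return a block of empty rows if that block happened to be the longest run. The sketch below is one possible variation, not part of the original answer. The name `findTableStrict` is hypothetical. It assumes the region of interest is the first longest run of rows that are not entirely NA, and that the first row of that run holds the column headers with no NA in the columns that are kept.

```r
# Hypothetical helper (not from the original answer): find the longest run of
# consecutive rows with the same NA count, skipping rows that are entirely NA,
# then tidy the block into a regular table.
findTableStrict <- function(df) {
  naSeq  <- rowSums(is.na(df))          # number of NAs in each row
  runs   <- rle(naSeq)                  # runs of equal NA counts
  ends   <- cumsum(runs$lengths)        # last row index of each run
  starts <- ends - runs$lengths + 1     # first row index of each run

  # A run qualifies only if its rows are not completely empty
  candidate <- runs$values < ncol(df)
  if (!any(candidate)) return(df[0, , drop = FALSE])

  # which.max() returns the first maximum, so ties go to the earliest run
  best  <- which.max(ifelse(candidate, runs$lengths, 0))
  block <- df[starts[best]:ends[best], , drop = FALSE]

  # Drop columns that are NA throughout the block
  block <- block[, colSums(!is.na(block)) > 0, drop = FALSE]

  # Assumption: the first row of the block is the header row
  header <- as.character(unlist(block[1, ], use.names = FALSE))
  body   <- block[-1, , drop = FALSE]
  names(body)    <- header
  rownames(body) <- NULL
  body
}

# Usage with example2 from the question
example2 <- data.frame(
  x = c("name", "129-2", "wallabe #23", NA, NA, "acc", 2, 3, 4, NA),
  y = c(NA, NA, NA, NA, "balance", "deb", 3, 2, 5, NA),
  z = c(NA, NA, NA, NA, NA, "asset", 1, 1, 2, NA),
  u = c(NA, NA, NA, "currency:", NA, NA, NA, NA, NA, NA),
  i = c(NA, NA, NA, "USD", "result", "win", 2, 3, 1, NA),
  o = c(NA, NA, NA, NA, NA, "lose", 2, 2, 1, NA)
)
result <- findTableStrict(example2)
print(result)
```

With example2 this drops the empty column u and uses the acc, deb, asset, win, lose row as column names. The values stay as character strings, so any numeric conversion is left as a separate step.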
09-05 02:18