This article covers how to work with a data frame across multiple columns.

Problem description
I have a large dataset of samples with descriptors of whether the sample is viable. It looks (kind of) like this, where 'desc' is the description column and 'blank' indicates the sample is not viable:

         desc        x        y        z
    1   blank 4.529976 5.297952 5.581013
    2   blank 5.906855 4.557389 4.901660
    3  sample 4.322014 4.798248 4.995959
    4  sample 3.997565 5.975604 7.160871
    5   blank 4.898922 7.666193 5.551385
    6   blank 5.667884 5.195825 5.232072
    7   blank 5.524773 6.726074 4.767475
    8  sample 4.382937 5.926217 5.203737
    9  sample 4.976908 3.079191 4.614121
    10  blank 4.572954 4.772373 6.077195

I want to use an if else statement to set the rows with unusable data to NA. The final data set should look like this:

         desc        x        y        z
    1   blank       NA       NA       NA
    2   blank       NA       NA       NA
    3  sample 4.322014 4.798248 4.995959
    4  sample 3.997565 5.975604 7.160871
    5   blank       NA       NA       NA
    6   blank       NA       NA       NA
    7   blank       NA       NA       NA
    8  sample 4.382937 5.926217 5.203737
    9  sample 4.976908 3.079191 4.614121
    10  blank       NA       NA       NA

I have tried a for loop, but I'm having trouble getting it to change all the columns in one loop. My real dataset has 40 columns, so I'd rather not have to process it in separate loops! Here is the code to change one column at a time:

    for(i in 1:length(desc)){
      if(dat$desc[i] == "blank"){
        dat$x[i] <- NA
      } else {
        dat$x[i] <- dat$x[i]
      }
    }

I made the sample data with this script:

    desc <- c("blank", "blank", "sample", "sample", "blank",
              "blank", "blank", "sample", "sample", "blank")
    x <- rnorm(10, mean=5, sd=1)
    y <- rnorm(10, mean=5, sd=1)
    z <- rnorm(10, mean=5, sd=1)
    dat <- data.frame(desc,x,y,z)

Sorry if this is a basic question. I've spent all morning looking at forums and haven't been able to find a solution.

Any help is much appreciated!
Solution

For your example dataset, this will work.

Option 1, name the columns to change:

    dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA

In your actual data with 40 columns, if you just want to set the last 39 columns to NA, then one of the following may be simpler than naming each column to change.

Option 2, select columns using a range:

    dat[which(dat$desc == "blank"), 2:40] <- NA

Option 3, exclude the first column:

    dat[which(dat$desc == "blank"), -1] <- NA

Option 4, exclude a named column:

    dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA

As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will help you get a better understanding of the language.
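As a further illustration of the same row-and-column indexing idea, here is a minimal end-to-end sketch on data shaped like the question's. It is not one of the answer's four options: it picks the columns by type instead of by name or position, which assumes every column you want blanked is numeric and that `desc` is not. The fixed seed and the names `blank_rows` and `num_cols` are introduced here only for illustration.

```r
# Sample data in the same shape as the question (the values are arbitrary).
set.seed(42)  # assumption: fixed seed only so the example is reproducible
desc <- c("blank", "blank", "sample", "sample", "blank",
          "blank", "blank", "sample", "sample", "blank")
dat <- data.frame(desc,
                  x = rnorm(10, mean = 5, sd = 1),
                  y = rnorm(10, mean = 5, sd = 1),
                  z = rnorm(10, mean = 5, sd = 1))

# Rows to blank out. which() returns integer positions and drops any NA
# produced by the comparison, so a missing description is simply skipped.
blank_rows <- which(dat$desc == "blank")

# Columns to blank out: every numeric column, whatever its name or position.
# (Hypothetical variation; assumes all measurement columns are numeric.)
num_cols <- names(dat)[sapply(dat, is.numeric)]

# One assignment sets every selected cell to NA; no loop is needed.
dat[blank_rows, num_cols] <- NA

print(dat)
```

Selecting by type may be convenient when the 40 columns are not contiguous, but it would also blank any numeric identifier column, so Options 1 to 4 remain the safer choice when the layout is known.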