Looping over multiple files with three-digit numbering for a multi-input algorithm in R

Problem description

I am using a genetic interpretation software called SAIGE-GENE. The function looks like this (full documentation at https://github.com/weizhouUMICH/SAIGE/wiki/Genetic-association-tests-using-SAIGE#step-2--performing-the-region--or-gene-based-association-tests). It takes several different input files whose names contain the chromosome number (1 to 22).

SPAGMMATtest = function(
    vcfFile = "",
    vcfFileIndex = "",
    vcfField = "DS",
    groupFile = "",
    savFile = "",
    savFileIndex = "",
    sampleFile = "",
    idstoExcludeFile = "",
    idstoIncludeFile = "",
    rangestoExcludeFile = "",
    rangestoIncludeFile = "",
    chrom = "",
    start = 1,
    end = 250000000,
    IsDropMissingDosages = FALSE,
    minMAC = 0.5,
    minMAF = 0,
    maxMAFforGroupTest = 0.5,
    minInfo = 0,
    GMMATmodelFile = "",
    varianceRatioFile = "",
    SPAcutoff = 2,
    SAIGEOutputFile = "",
    numLinesOutput = 10000,
    IsSparse = TRUE,

......

I haven't put the whole thing here as it isn't relevant. I pass several different files to this function, and normally I name my files chr1_file_name.txt ... chr22_file_name.txt.

I then wrap the whole thing in a for loop in R and use the paste function to build the file names from the chromosome number:

for(i in 1:22){SPAGMMATtest = function(
    vcfFile = paste("chr",i,"_file_name.txt", sep=""),
    vcfFileIndex = "",
    vcfField = "DS",
    savFile = "",
    groupFile ="paste("chr",i,".group_file.txt", sep="")",

This works fine. However, I thought I would be clever and use three-digit numbering in my file names for this experiment: chr001_file_name.txt ... chr022_file_name.txt.

My previous loop no longer works, and changing the start of the loop to for(i in 001:022) doesn't work either.

What am I doing wrong and how can I fix this without renaming all my files?

Recommended answer

I suggest formatting the loop index with sprintf("%03d", i) inside the paste() or paste0() call in order to create character file names which include 3 digits and leading zeroes, e.g., 001, 002, ..., 022.

This can be further shortened by creating the filename completely with sprintf(), thereby removing the calls to paste() or paste0():

sprintf("chr%03d_file_name.txt", i)

With i <- 1, for example, sprintf("chr%03d_file_name.txt", i) returns "chr001_file_name.txt".
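As a small illustration of why for(i in 001:022) changes nothing and how the padded names can be produced for all chromosomes at once, here is a minimal sketch in base R. The variable names chromosomes, vcf_files, and padded are only illustrative, and formatC() is shown as an alternative that the answer itself does not mention.

```r
# Numeric literals carry no leading zeros, so 001:022 is the same
# integer sequence as 1:22; the zeros are lost before the loop starts.
identical(001:022, 1:22)

chromosomes <- 1:22

# sprintf() is vectorised: "%03d" pads each integer with zeros to width 3.
vcf_files <- sprintf("chr%03d_file_name.txt", chromosomes)
head(vcf_files, 3)

# Alternative: pad only the number with formatC(), then paste the name together.
padded <- formatC(chromosomes, width = 3, flag = "0")
identical(paste0("chr", padded, "_file_name.txt"), vcf_files)
```

Because the padding is applied when the string is built, the loop index itself can stay a plain 1:22.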

There is also a second observation.

The OP has posted this code snippet:

for(i in 1:22){SPAGMMATtest = function(
    vcfFile = paste("chr",i,"_file_name.txt", sep=""),
    vcfFileIndex = "",
    vcfField = "DS",
    savFile = "",
    groupFile ="paste("chr",i,".group_file.txt", sep="")",
    ...

This looks as if the OP has pulled the function definition into the for loop. I believe it is sufficient to call the function from within the for loop:

for (i in 1:22) {
     SPAGMMATtest(
         vcfFile = sprintf("chr%03d_file_name.txt", i),
         vcfFileIndex = "",
         vcfField = "DS",
         savFile = "",
         groupFile = sprintf("chr%03d.group_file.txt", i)
         ...
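To check the generated names before starting the real analysis, the loop can first be run against a stand-in function. The sketch below assumes nothing about SAIGE itself: dry_run_test() is a hypothetical placeholder that only records its arguments, and the file.exists() check is an extra precaution the answer does not mention.

```r
# Hypothetical stand-in for SPAGMMATtest(); it only records the file names
# it receives and does not run any association test.
dry_run_test <- function(vcfFile, groupFile, ...) {
  data.frame(vcfFile = vcfFile, groupFile = groupFile,
             stringsAsFactors = FALSE)
}

calls <- vector("list", 22)
for (i in 1:22) {
  # Call the function inside the loop; do not redefine it there.
  calls[[i]] <- dry_run_test(
    vcfFile   = sprintf("chr%03d_file_name.txt", i),
    groupFile = sprintf("chr%03d.group_file.txt", i)
  )
}

planned <- do.call(rbind, calls)

# Flag inputs that are missing from the working directory.
planned$vcf_exists <- file.exists(planned$vcfFile)
print(head(planned, 3))
```

Once every row reports an existing file, the stand-in can be swapped for the real call with the remaining arguments filled in.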

