Merging data from many files and plotting them

Question

I have written an application that analyzes data and writes the results to a CSV file. The file contains three columns: id, diff and count.

1. id is the id of the cycle; in theory, the greater the id, the lower diff should be.
2. diff is the sum of (Estimator - RealValue)^2 over the observations in the cycle.
3. count is the number of observations during the cycle.

For 15 different values of a parameter K, I generate a CSV file named %K%.csv, where %K% is the value used. My total number of files is 15.

What I would like to do is write a simple loop in R that plots the contents of my files, in order to let me decide which value of K is best (in general, the one for which diff is lowest). For a single file I am doing something like:

```r
ggplot(data = data) + geom_point(aes(x = id, y = sqrt(diff / count)))
```

Does what I am trying to do make sense?
Please note that statistics is completely not my domain, nor is R (but you could probably have figured that out already). Is there a better approach I could choose? And, from a theoretical point of view, am I doing what I expect to do? I would be very grateful for any comments, hints, criticism and answers.

Solution

Edited to clean up some typos and address the multiple-K-value issue.

I'm going to assume that you've placed all your .csv files in a single directory (and that there is nothing else in this directory). I will also assume that each .csv really does have the same structure (same number of columns, in the same order). I would begin by generating a list of the file names:

```r
myCSVs <- list.files("path/to/directory")
```

Then I would 'loop' over the list of file names using lapply, reading each file into a data frame with read.csv:

```r
setwd("path/to/directory")

# This function just reads in the file and appends a column with
# the K value taken from the file name. You may need to tinker
# with the particulars here.
myFun <- function(fn) {
    tmp <- read.csv(fn)
    tmp$K <- strsplit(fn, ".", fixed = TRUE)[[1]][1]
    tmp
}

dataList <- lapply(myCSVs, FUN = myFun)
```

Depending on the structure of your .csv files, you may need to pass some additional arguments to read.csv. Finally, I would combine this list of data frames into a single data frame:

```r
myData <- do.call(rbind, dataList)
```

You should then have all your data in a single data frame, myData, that you can pass to ggplot.

As for the statistical aspect of your question, it's a little difficult to offer an opinion without concrete examples of your data. Once you've figured out the programming part, you could ask a separate question that provides some sample data (either here or on stats.stackexchange.com), and folks will be able to suggest some visualization or analysis techniques that may help.
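Putting the steps above together, here is a minimal, self-contained sketch of the whole pipeline. The two sample files and the directory under tempdir() are fabricated purely so the code runs on its own; point csv_dir at your real directory of %K%.csv files instead. It also uses tools::file_path_sans_ext in place of the strsplit call to pull K from the file name, which avoids any issue with dots elsewhere in the path.

```r
# Fabricated sample data: a temporary directory standing in for the
# real directory of %K%.csv files. Replace csv_dir with your own path.
csv_dir <- file.path(tempdir(), "kfiles")
dir.create(csv_dir, showWarnings = FALSE)

write.csv(data.frame(id = 1:3, diff = c(9, 4, 1), count = c(3, 3, 3)),
          file.path(csv_dir, "5.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, diff = c(16, 9, 4), count = c(3, 3, 3)),
          file.path(csv_dir, "10.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

read_one <- function(fn) {
  tmp <- read.csv(fn)
  # "5.csv" -> "5": K recovered from the file name
  tmp$K <- tools::file_path_sans_ext(basename(fn))
  tmp
}

myData <- do.call(rbind, lapply(files, read_one))

# One panel per K; the best K is the one whose sqrt(diff / count)
# points sit lowest. Guarded so the data-handling part runs even
# where ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(myData, aes(x = id, y = sqrt(diff / count))) +
    geom_point() +
    facet_wrap(~ K)
  print(p)
}
```

Faceting with facet_wrap(~ K) gives one small panel per K; if you prefer all 15 series overlaid in one panel, map K to colour instead (aes(..., colour = K)) and drop the facet.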