Merging data from many files and plotting them

Question

I have written an application that analyzes data and writes the results to a CSV file. The file contains three columns: id, diff and count.

1. id is the id of the cycle; in theory, the greater the id, the lower diff should be.
2. diff is the sum of (Estimator - RealValue)^2 over the observations in the cycle.
3. count is the number of observations during the cycle.

For 15 different values of a parameter K, I generate a CSV file named %K%.csv, where %K% is the value used. My total number of files is 15.

What I would like to do is write a simple loop in R that plots the contents of my files, in order to let me decide which value of K is best (in general, the one for which diff is lowest). For a single file I am doing something like:

```r
ggplot(data = data) + geom_point(aes(x = id, y = sqrt(diff / count)))
```

Does what I am trying to do make sense?
Please note that statistics is completely not my domain, nor is R (but you could probably have figured that out already). Is there a better approach I could choose? And, from a theoretical point of view, am I doing what I expect to do? I would be very grateful for any comments, hints, criticism and answers.

Solution

Edited to clean up some typos and address the multiple-K-value issue.

I'm going to assume that you've placed all your .csv files in a single directory (and that there is nothing else in this directory). I will also assume that each .csv really does have the same structure (same number of columns, in the same order). I would begin by generating a list of the file names:

```r
myCSVs <- list.files("path/to/directory")
```

Then I would 'loop' over the list of file names using lapply, reading each file into a data frame with read.csv:

```r
setwd("path/to/directory")

# This function just reads in the file and appends a column with
# the K value taken from the file name. You may need to tinker
# with the particulars here.
myFun <- function(fn) {
    tmp <- read.csv(fn)
    tmp$K <- strsplit(fn, ".", fixed = TRUE)[[1]][1]
    tmp
}

dataList <- lapply(myCSVs, FUN = myFun)
```

Depending on the structure of your .csv files, you may need to pass some additional arguments to read.csv. Finally, I would combine this list of data frames into a single data frame:

```r
myData <- do.call(rbind, dataList)
```

You should then have all your data in a single data frame, myData, that you can pass to ggplot.

As for the statistical aspect of your question, it's a little difficult to offer an opinion without concrete examples of your data. Once you've figured out the programming part, you could ask a separate question that provides some sample data (either here or on stats.stackexchange.com), and folks will be able to suggest some visualization or analysis techniques that may help.
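Putting the steps above together, here is a minimal, self-contained sketch of the whole pipeline. The two sample files and the directory under tempdir() are fabricated purely so the code runs on its own; point csv_dir at your real directory of %K%.csv files instead. It also uses tools::file_path_sans_ext in place of the strsplit call to pull K from the file name, which avoids any issue with dots elsewhere in the path.

```r
# Fabricated sample data: a temporary directory standing in for the
# real directory of %K%.csv files. Replace csv_dir with your own path.
csv_dir <- file.path(tempdir(), "kfiles")
dir.create(csv_dir, showWarnings = FALSE)

write.csv(data.frame(id = 1:3, diff = c(9, 4, 1), count = c(3, 3, 3)),
          file.path(csv_dir, "5.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, diff = c(16, 9, 4), count = c(3, 3, 3)),
          file.path(csv_dir, "10.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

read_one <- function(fn) {
  tmp <- read.csv(fn)
  # "5.csv" -> "5": K recovered from the file name
  tmp$K <- tools::file_path_sans_ext(basename(fn))
  tmp
}

myData <- do.call(rbind, lapply(files, read_one))

# One panel per K; the best K is the one whose sqrt(diff / count)
# points sit lowest. Guarded so the data-handling part runs even
# where ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(myData, aes(x = id, y = sqrt(diff / count))) +
    geom_point() +
    facet_wrap(~ K)
  print(p)
}
```

Faceting with facet_wrap(~ K) gives one small panel per K; if you prefer all 15 series overlaid in one panel, map K to colour instead (aes(..., colour = K)) and drop the facet.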