Problem description
The Go code below reads a 10,000-record CSV (a timestamp column, times, and a float column, values), runs some operations on the data, and then writes the original values to another CSV along with an additional score column. However, it is terribly slow (hours, though most of that is spent in calculateStuff()), and I'm curious whether there are any inefficiencies in the CSV reading/writing that I can fix.
package main

import (
	"encoding/csv"
	"log"
	"os"
	"strconv"
)

// ReadCSV loads every record of the CSV file at filepath into memory.
func ReadCSV(filepath string) ([][]string, error) {
	csvfile, err := os.Open(filepath)
	if err != nil {
		return nil, err
	}
	defer csvfile.Close()

	reader := csv.NewReader(csvfile)
	// ReadAll returns the parse error, if any, so pass it through
	return reader.ReadAll()
}

func main() {
	// load data csv
	records, err := ReadCSV("./path/to/datafile.csv")
	if err != nil {
		log.Fatal(err)
	}

	// write results to a new csv
	outfile, err := os.Create("./where/to/write/resultsfile.csv")
	if err != nil {
		log.Fatalf("Unable to open output: %v", err)
	}
	defer outfile.Close()
	writer := csv.NewWriter(outfile)

	for i, record := range records {
		time := record[0]
		value := record[1]

		// the first row is the header: copy it and add the new column
		if i == 0 {
			if err := writer.Write([]string{time, value, "score"}); err != nil {
				log.Fatal(err)
			}
			continue
		}

		// get float values
		floatValue, err := strconv.ParseFloat(value, 64)
		if err != nil {
			log.Fatalf("Record: %v, Error: %v", value, err)
		}

		// calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
		score := calculateStuff(floatValue)

		valueString := strconv.FormatFloat(floatValue, 'f', 8, 64)
		scoreString := strconv.FormatFloat(score, 'f', 8, 64)
		//fmt.Printf("Result: %v\n", []string{time, valueString, scoreString})

		if err := writer.Write([]string{time, valueString, scoreString}); err != nil {
			log.Fatal(err)
		}
	}

	// Flush pushes buffered rows to the file; Error reports any write failure
	writer.Flush()
	if err := writer.Error(); err != nil {
		log.Fatal(err)
	}
}
I'm looking for help making this CSV read/write template code as fast as possible. For the scope of this question, we need not worry about the calculateStuff method.
Recommended answer
You're loading the whole file into memory first and only then processing it, which can be slow with a big file.
Instead, you need to loop, call .Read, and process one line at a time.
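// requires the imports "encoding/csv", "io" and "log"

// processCSV reads rc in a background goroutine and sends each data
// row on the returned channel, which is closed at end of input.
func processCSV(rc io.Reader) (ch chan []string) {
	ch = make(chan []string, 10)
	go func() {
		r := csv.NewReader(rc)
		if _, err := r.Read(); err != nil { // read (and discard) the header
			log.Fatal(err)
		}
		defer close(ch)
		for {
			rec, err := r.Read()
			if err != nil {
				if err == io.EOF {
					break
				}
				log.Fatal(err)
			}
			ch <- rec
		}
	}()
	return
}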
Note: this is roughly based on DaveC's comment.
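The answer shows only the reading side. As a rough sketch of how the channel returned by processCSV might be consumed, the program below pairs it with a csv.Writer so that each row is written as soon as it has been scored. Everything beyond processCSV is an assumption made for illustration: placeholderScore stands in for calculateStuff (which the question does not show), writeScores and scoreString are hypothetical helpers, and the output header names are guessed from the question's description because processCSV discards the original header.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strconv"
	"strings"
)

// processCSV is the answer's reader: it skips the header row and
// streams each data row over a buffered channel.
func processCSV(rc io.Reader) (ch chan []string) {
	ch = make(chan []string, 10)
	go func() {
		r := csv.NewReader(rc)
		if _, err := r.Read(); err != nil { // read header
			log.Fatal(err)
		}
		defer close(ch)
		for {
			rec, err := r.Read()
			if err != nil {
				if err == io.EOF {
					break
				}
				log.Fatal(err)
			}
			ch <- rec
		}
	}()
	return
}

// placeholderScore is a hypothetical stand-in for calculateStuff.
func placeholderScore(v float64) float64 { return v * 2 }

// writeScores (hypothetical helper) drains rows and writes one
// time, value, score row per input row to w.
func writeScores(w io.Writer, rows <-chan []string, score func(float64) float64) error {
	writer := csv.NewWriter(w)
	// Assumed header names; the original header was dropped by processCSV.
	if err := writer.Write([]string{"times", "values", "score"}); err != nil {
		return err
	}
	for rec := range rows {
		v, err := strconv.ParseFloat(rec[1], 64)
		if err != nil {
			return err
		}
		out := []string{
			rec[0],
			strconv.FormatFloat(v, 'f', 8, 64),
			strconv.FormatFloat(score(v), 'f', 8, 64),
		}
		if err := writer.Write(out); err != nil {
			return err
		}
	}
	// Flush the buffered rows, then report any deferred write error.
	writer.Flush()
	return writer.Error()
}

// scoreString (hypothetical helper) runs the pipeline on an in-memory
// CSV so the sketch can be tried without touching the file system.
func scoreString(input string) (string, error) {
	var sb strings.Builder
	err := writeScores(&sb, processCSV(strings.NewReader(input)), placeholderScore)
	return sb.String(), err
}

func main() {
	out, err := scoreString("times,values\n2015-01-01 00:00:00,1.5\n2015-01-01 00:05:00,2.25\n")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

With real files, an *os.File from os.Open could be passed to processCSV and one from os.Create to writeScores. One caveat of this sketch: if writeScores returns early on a bad row, the reader goroutine stays blocked on its channel send, which only matters in a program that keeps running afterwards.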