本文介绍了在Go中高效读写CSV的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下面的Go代码读取10,000条记录的CSV(时间戳为times,浮点为values),对数据执行一些操作,然后将原始值以及.但是,这非常慢(例如数小时,但大多数都是calculateStuff()),我很好奇我可以处理的CSV读写效率是否低下.

The Go code below reads in a 10,000 record CSV (of timestamp times and float values), runs some operations on the data, and then writes the original values to another CSV along with an additional column for score. However it is terribly slow (i.e. hours, but most of that is calculateStuff()) and I'm curious if there are any inefficiencies in the CSV reading/writing I can take care of.

package main

import (
  "encoding/csv"
  "log"
  "os"
  "strconv"
)

func ReadCSV(filepath string) ([][]string, error) {
  csvfile, err := os.Open(filepath)

  if err != nil {
    return nil, err
  }
  defer csvfile.Close()

  reader := csv.NewReader(csvfile)
  fields, err := reader.ReadAll()

  return fields, nil
}

func main() {
  // load data csv
  records, err := ReadCSV("./path/to/datafile.csv")
  if err != nil {
    log.Fatal(err)
  }

  // write results to a new csv
  outfile, err := os.Create("./where/to/write/resultsfile.csv"))
  if err != nil {
    log.Fatal("Unable to open output")
  }
  defer outfile.Close()
  writer := csv.NewWriter(outfile)

  for i, record := range records {
    time := record[0]
    value := record[1]

    // skip header row
    if i == 0 {
      writer.Write([]string{time, value, "score"})
      continue
    }

    // get float values
    floatValue, err := strconv.ParseFloat(value, 64)
    if err != nil {
      log.Fatal("Record: %v, Error: %v", floatValue, err)
    }

    // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
    score := calculateStuff(floatValue)

    valueString := strconv.FormatFloat(floatValue, 'f', 8, 64)
    scoreString := strconv.FormatFloat(prob, 'f', 8, 64)
    //fmt.Printf("Result: %v\n", []string{time, valueString, scoreString})

    writer.Write([]string{time, valueString, scoreString})
  }

  writer.Flush()
}

我正在寻找帮助,以使此CSV读/写模板代码尽快.对于此问题的范围,我们不必担心calculateStuff方法.

I'm looking for help making this CSV read/write template code as fast as possible. For the scope of this question we need not worry about the calculateStuff method.

推荐答案

您先将文件加载到内存中,然后再对其进行处理,这对于大文件而言可能会很慢.

You're loading the file in memory first then processing it, that can be slow with a big file.

您需要循环并调用.Read并一次处理一行.

You need to loop and call .Read and process one line at a time.

func processCSV(rc io.Reader) (ch chan []string) {
    ch = make(chan []string, 10)
    go func() {
        r := csv.NewReader(rc)
        if _, err := r.Read(); err != nil { //read header
            log.Fatal(err)
        }
        defer close(ch)
        for {
            rec, err := r.Read()
            if err != nil {
                if err == io.EOF {
                    break
                }
                log.Fatal(err)

            }
            ch <- rec
        }
    }()
    return
}

//请注意,这大致是基于DaveC的评论.

//note it's roughly based on DaveC's comment.

这篇关于在Go中高效读写CSV的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-12 13:58