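I am trying to write a Go program that finds a set of genes in very large DNA sequence files. I already have a Perl program that does this, but I would like to use goroutines to run the search in parallel ;)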

Because the files are huge, my idea is to read 100 sequences at a time, hand them to a goroutine for analysis, then read the next 100 sequences, and so on.

I would like to thank the members of this site for their really helpful explanations of slices and goroutines.

I made the suggested change so that each goroutine works on a copy of the slice. However, running with -race still reports a data race at the copy() call:

Thank you very much for your comments!

    ==================
WARNING: DATA RACE
Read by goroutine 6:
  runtime.slicecopy()
      /usr/lib/go-1.6/src/runtime/slice.go:113 +0x0
  main.main.func1()
      test_chan006.go:71 +0xd8

Previous write by main goroutine:
  main.main()
      test_chan006.go:63 +0x3b7

Goroutine 6 (running) created at:
  main.main()
      test_chan006.go:73 +0x4c9
==================
[>5HSAA098909 BA098909 ...]
Found 1 data race(s)
exit status 66

    line 71 is : copy(bufCopy, buf_Seq)
    line 63 is : buf_Seq = append(buf_Seq, line)
    line 73 is :}(genes, buf_Seq)




    package main

import (
    "bufio"
    "fmt"
    "os"
    "github.com/mathpl/golang-pkg-pcre/src/pkg/pcre"
    "sync"
)

// read_genes reads a list of genes and returns a slice of gene names
func read_genes(filename string) []string {
    var genes []string // slice of genes names
    // Open the file.
    f, _ := os.Open(filename)
    // Create a new Scanner for the file.
    scanner := bufio.NewScanner(f)
    // Loop over all lines in the file and collect them.
    for scanner.Scan() {
          line := scanner.Text()
        genes = append(genes, line)
    }
    return genes
}

// search_gene2 finds the sequences that match a gene in the genes slice
func search_gene2( genes []string, seqs []string) ([]string) {
  var res []string

  for r := 0 ; r <= len(seqs) - 1; r++ {
    for i := 0 ; i <= len(genes) - 1; i++ {

      match := pcre.MustCompile(genes[i], 0).MatcherString(seqs[r], 0)

      if (match.Matches() == true) {
          res = append( res, seqs[r])           // if the sequence matches a gene, it is appended to res
          break
      }
    }
  }

  return res
}
//###########################################

func main() {
    var slice []string
    var buf_Seq []string
    read_buff := 100    // the number of sequences analysed by one goroutine

    var wg sync.WaitGroup
    queue := make(chan []string, 100)

    filename := "fasta/sequences.tsv"
    f, _ := os.Open(filename)
    scanner := bufio.NewScanner(f)
    n := 0
    genes := read_genes("lists/genes.csv")

    for scanner.Scan() {
            line := scanner.Text()
            n += 1
            buf_Seq = append(buf_Seq, line) // store the sequences into buf_Seq
            if n == read_buff {   // when the read buffer contains 100 sequences one goroutine analyses them

          wg.Add(1)

          go func(genes, buf_Seq []string) {
            defer wg.Done()
                        bufCopy := make([]string, len(buf_Seq))
                        copy(bufCopy, buf_Seq)
            queue <- search_gene2( genes, bufCopy)
            }(genes, buf_Seq)
                        buf_Seq = buf_Seq[:0]   // reset buf_Seq
              n = 0 // reset the sequences counter

        }
    }
    go func() {
            wg.Wait()
            close(queue)
        }()

        for t := range queue {
            slice = append(slice, t...)
        }

        fmt.Println(slice)
}

Best answer

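The data race exists because slices are reference types in Go. A slice value is passed by value, but it only describes a section of an underlying array, so any change made to the elements through one slice value is visible through every other slice value that refers to the same array. Consider: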

package main

import "fmt"

func f(xs []string) {
    xs[0] = "changed_in_f"
}

func main() {
    xs := []string{"set_in_main", "asd"}
    fmt.Println("Before call:", xs)
    f(xs)
    fmt.Println("After call:", xs)

    var ys []string
    ys = xs
    ys[0] = "changed_through_ys"
    fmt.Println("After ys:", xs)

}

This prints:
Before call: [set_in_main asd]
After call: [changed_in_f asd]
After ys: [changed_through_ys asd]

This happens because all three slices share the same underlying array in memory. More details here
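As a minimal sketch of the difference between sharing and copying, the snippet below compares a plain assignment with a copy made through make and copy. The helper name cloneStrings is hypothetical and does not appear in the question.

```go
package main

import "fmt"

// cloneStrings is a hypothetical helper: it returns a slice with its own
// backing array, so later writes through src are not visible in the result.
func cloneStrings(src []string) []string {
	dst := make([]string, len(src))
	copy(dst, src)
	return dst
}

func main() {
	xs := []string{"a", "b"}
	alias := xs               // copies only the slice header; same backing array
	clone := cloneStrings(xs) // separate backing array

	xs[0] = "changed"

	fmt.Println(alias[0]) // changed
	fmt.Println(clone[0]) // a
}
```

Only the clone is safe to hand to another goroutine while the original keeps being modified.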

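This is most likely what happens when you pass buf_Seq to the goroutine that calls search_gene2. Each call receives a new slice value, but every one of those values may refer to the same underlying array, which creates a potential race condition (a call to append may modify the slice's underlying array). In the question's code the copy is made inside the goroutine, so it can run at the same time as the next append in main: after buf_Seq = buf_Seq[:0], append writes into the array the goroutine is still reading. That matches the report, which shows a read in runtime.slicecopy racing with a write at the append line.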
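The effect of resetting with [:0] and appending again can be seen without any goroutines. The sketch below is single-threaded and only illustrates the memory sharing. The function name reuseAfterReset and the sample strings are made up for illustration.

```go
package main

import "fmt"

// reuseAfterReset mimics the loop in main: it hands out a batch, resets the
// buffer with buf[:0], and appends again. It returns the batch as seen
// afterwards, showing that the append overwrote the batch's first element.
func reuseAfterReset() []string {
	buf := []string{"seq1", "seq2"}
	batch := buf              // what the goroutine would receive
	buf = buf[:0]             // length 0, but same backing array and capacity
	buf = append(buf, "seq3") // fits in the capacity, so it writes element 0
	return batch
}

func main() {
	fmt.Println(reuseAfterReset()) // [seq3 seq2]
}
```

When the reader of batch is a goroutine and the writer is main, the same overwrite becomes the data race reported by -race.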

To fix this, try the following in main, so that the copy is made before the goroutine is started:
bufCopy := make([]string, len(buf_Seq))
// make a copy of buf_Seq in an entirely separate slice,
// before the goroutine starts and before buf_Seq is reset
copy(bufCopy, buf_Seq)
go func(genes, seqs []string) {
    defer wg.Done()
    queue <- search_gene2(genes, seqs)
}(genes, bufCopy)
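Putting the pieces together, here is one possible self-contained sketch of the batching loop with the copy made in the producer. It rests on several assumptions that are not in the original: strings.Contains stands in for the pcre matching in search_gene2, the input comes from an in-memory slice instead of a file, batchSize is assumed to be positive, and the names matchAny and searchBatches are hypothetical. It also flushes the last partial batch, which the loop in the question would skip when the number of sequences is not a multiple of 100.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// matchAny returns the sequences that contain at least one gene name.
// strings.Contains is a stand-in for the pcre matching (an assumption).
func matchAny(genes, seqs []string) []string {
	var res []string
	for _, s := range seqs {
		for _, g := range genes {
			if strings.Contains(s, g) {
				res = append(res, s)
				break
			}
		}
	}
	return res
}

// searchBatches splits lines into batches of batchSize (assumed > 0) and
// searches each batch in its own goroutine, on a private copy of the batch.
func searchBatches(genes, lines []string, batchSize int) []string {
	var wg sync.WaitGroup
	queue := make(chan []string, 100)
	buf := make([]string, 0, batchSize)

	dispatch := func() {
		if len(buf) == 0 {
			return
		}
		batch := make([]string, len(buf))
		copy(batch, buf) // copy in the producer, before buf is reused
		wg.Add(1)
		go func() {
			defer wg.Done()
			queue <- matchAny(genes, batch)
		}()
		buf = buf[:0] // safe: no goroutine refers to buf's backing array
	}

	for _, line := range lines {
		buf = append(buf, line)
		if len(buf) == batchSize {
			dispatch()
		}
	}
	dispatch() // flush the final, possibly partial, batch

	// Close the channel once every worker has sent its result.
	go func() {
		wg.Wait()
		close(queue)
	}()

	var found []string
	for res := range queue {
		found = append(found, res...)
	}
	sort.Strings(found) // workers finish in any order; sort for stable output
	return found
}

func main() {
	genes := []string{"BA098909", "XY000001"}
	lines := []string{
		">5HSAA098909 BA098909", ">seq2 none", ">seq3 XY000001",
		">seq4 none", ">seq5 BA098909",
	}
	fmt.Println(searchBatches(genes, lines, 2))
}
```

Running a program like this with go run -race should no longer report the race, because the only slice each goroutine reads is one that main never touches again. The final sort is a choice made for this sketch; drop it if the order of results does not matter.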

Regarding "arrays - Goroutines sharing slices: trying to understand a data race", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38923237/
