Why is my Rust program slower than the equivalent Java program?

Problem description

I was playing around with binary serialization and deserialization in Rust and noticed that binary deserialization is several orders of magnitude slower than with Java. To eliminate the possibility of overhead due to, for example, allocations, I'm simply reading a binary stream in each program. Each program reads from a binary file on disk which contains a 4-byte integer giving the number of input values, followed by a contiguous chunk of 8-byte big-endian IEEE 754-encoded floating point numbers. Here's the Java implementation:
import java.io.*;

public class ReadBinary {
    public static void main(String[] args) throws Exception {
        DataInputStream input = new DataInputStream(new BufferedInputStream(new FileInputStream(args[0])));
        int inputLength = input.readInt();
        System.out.println("input length: " + inputLength);
        try {
            for (int i = 0; i < inputLength; i++) {
                double d = input.readDouble();
                if (i == inputLength - 1) {
                    System.out.println(d);
                }
            }
        } finally {
            input.close();
        }
    }
}
Here's the Rust implementation:
fn main() {
    use std::os;
    use std::io::File;
    use std::io::BufferedReader;

    let args = os::args();
    let fname = args[1].as_slice();
    let path = Path::new(fname);
    let mut file = BufferedReader::new(File::open(&path));
    let input_length = read_int(&mut file) as uint;
    for i in range(0u, input_length) {
        let d = read_double_slow(&mut file);
        if i == input_length - 1 {
            println!("{}", d);
        }
    }
}

fn read_int<R : Reader>(input: &mut R) -> i32 {
    match input.read_be_i32() {
        Ok(x) => x,
        Err(e) => fail!(e)
    }
}

fn read_double_slow<R : Reader>(input: &mut R) -> f64 {
    match input.read_be_f64() {
        Ok(x) => x,
        Err(e) => fail!(e)
    }
}
I'm outputting the last value to make sure that all of the input is actually being read. On my machine, when the file contains (the same) 30 million randomly-generated doubles, the Java version runs in 0.8 seconds, while the Rust version runs in 40.8 seconds.
Suspicious of inefficiencies in Rust's byte interpretation itself, I retried it with a custom floating point deserialization implementation. The internals are almost exactly the same as what's being done in Rust's Reader, without the IoResult wrappers:
fn read_double<R : Reader>(input: &mut R, buffer: &mut [u8]) -> f64 {
    use std::mem::transmute;
    match input.read_at_least(8, buffer) {
        Ok(n) => if n > 8 { fail!("n > 8") },
        Err(e) => fail!(e)
    };
    let mut val = 0u64;
    let mut i = 8;
    while i > 0 {
        i -= 1;
        val += buffer[7-i] as u64 << i * 8;
    }
    // No trailing semicolon: the transmuted value is the function's return value.
    unsafe {
        transmute::<u64, f64>(val)
    }
}
The only change I made to the earlier Rust code in order to make this work was create an 8-byte slice to be passed in and (re)used as a buffer in the read_double function. This yielded a significant performance gain, running in about 5.6 seconds on average. Unfortunately, this is still noticeably slower (and more verbose!) than the Java version, making it difficult to scale up to larger input sets. Is there something that can be done to make this run faster in Rust? More importantly, is it possible to make these changes in such a way that they can be merged into the default Reader implementation itself to make binary I/O less painful?
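For readers on current Rust, the byte-assembly step in read_double can be sketched without unsafe code. This is only an illustration, not the question author's code: the helper names be_u64_manual and decode_f64_be are hypothetical, and it assumes a toolchain where f64::from_be_bytes and f64::from_bits are available (both are in today's standard library, but did not exist when the question was written).

```rust
/// Assemble a big-endian u64 by hand, using the same shift-and-combine idea
/// as the loop in read_double. (Hypothetical helper for illustration.)
fn be_u64_manual(bytes: &[u8; 8]) -> u64 {
    let mut val = 0u64;
    for (i, &b) in bytes.iter().enumerate() {
        // bytes[0] is the most significant byte in big-endian order.
        val |= (b as u64) << ((7 - i) * 8);
    }
    val
}

/// Decode an IEEE 754 double from 8 big-endian bytes without `unsafe`.
/// (Hypothetical helper for illustration.)
fn decode_f64_be(bytes: [u8; 8]) -> f64 {
    f64::from_be_bytes(bytes)
}

fn main() {
    let bytes = 1.5f64.to_be_bytes();
    // Both routes produce the same value from the same bytes.
    assert_eq!(f64::from_bits(be_u64_manual(&bytes)), decode_f64_be(bytes));
    println!("{}", decode_f64_be(bytes));
}
```

The standard-library conversion replaces both the manual loop and the transmute, so the caller only needs to supply the 8-byte buffer.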
For reference, here's the code I'm using to generate the input file:
import java.io.*;
import java.util.Random;

public class MakeBinary {
    public static void main(String[] args) throws Exception {
        DataOutputStream output = new DataOutputStream(new BufferedOutputStream(System.out));
        int outputLength = Integer.parseInt(args[0]);
        output.writeInt(outputLength);
        Random rand = new Random();
        for (int i = 0; i < outputLength; i++) {
            output.writeDouble(rand.nextDouble() * 10 + 1);
        }
        output.flush();
    }
}
(Note that generating the random numbers and writing them to disk only takes 3.8 seconds on my test machine.)
Solution
When you build without optimisations, it will often be slower than it would be in Java. But build it with optimisations (rustc -O or cargo build --release) and it should be very much faster. If the standard version of it still ends up slower, it's something that should be examined carefully to figure out where the slowness is: perhaps something is being inlined that shouldn't be, or something isn't being inlined that should be, or perhaps some expected optimisation is not occurring.
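The question's code uses pre-1.0 APIs (std::io::File, BufferedReader, fail!) that are no longer part of Rust. As a rough sketch only, here is how the same read loop might look with today's standard library, assuming the file format described in the question (a 4-byte big-endian count followed by 8-byte big-endian doubles). The function name read_last_double is hypothetical, and treating a negative count as zero is an assumption made for the example. Built with rustc -O or cargo build --release, this is the version worth timing against the Java program.

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufReader, Read};

/// Read the 4-byte big-endian count, then that many big-endian doubles.
/// Returns the count and the last value read, if any.
/// (Hypothetical helper for illustration.)
fn read_last_double<R: Read>(input: &mut R) -> io::Result<(usize, Option<f64>)> {
    let mut len_buf = [0u8; 4];
    input.read_exact(&mut len_buf)?;
    // Assumption: a negative count is treated as zero.
    let count = i32::from_be_bytes(len_buf).max(0) as usize;

    let mut buf = [0u8; 8]; // reused for every value, so no per-item allocation
    let mut last = None;
    for _ in 0..count {
        input.read_exact(&mut buf)?;
        last = Some(f64::from_be_bytes(buf));
    }
    Ok((count, last))
}

fn main() -> io::Result<()> {
    let path = match env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: read_binary <file>");
            return Ok(());
        }
    };
    // BufReader batches the many small reads into larger reads from the file.
    let mut reader = BufReader::new(File::open(path)?);
    let (count, last) = read_last_double(&mut reader)?;
    println!("input length: {}", count);
    if let Some(d) = last {
        println!("{}", d);
    }
    Ok(())
}
```

Taking any R: Read keeps the decoding logic independent of the file, so it can be exercised against an in-memory std::io::Cursor as well as a buffered file.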