Why is my Julia code running so slow?

Question

using DelimitedFiles, Statistics, LinearAlgebra;

redim = 2;
# Loading data
iris_data = readdlm("iris_data.csv");
iris_target = vec(readdlm("iris_target.csv")); # vec: a Vector, so findall returns plain row indices

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, dims=1));
n_data, n_dim = size(iris_data);

Sw = zeros(n_dim, n_dim);
Sb = zeros(n_dim, n_dim);

C = cov(iris_data);


classes = unique(iris_target);

for i=1:length(classes)
    index = findall(x -> x==classes[i], iris_target);
    d = iris_data[index,:];
    classcov = cov(d);
    global Sw += length(index) / n_data .* classcov; # `global`: updating a global from a top-level loop in a script
end
Sb = C - Sw;

evals, evecs = eigen(Sw, Sb);
w = evecs[:,1:redim];
new_data = iris_data * w;

This code just performs LDA (linear discriminant analysis) on iris_data, reducing its dimensionality to 2. It takes about 4 seconds, but the equivalent Python (NumPy/SciPy) code takes only about 0.6 seconds. Why?
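For reference, this is what the code computes, written out as equations. It is my reading of the code, and the symbols are introduced here. X is the centered n×d data matrix, X_k is the set of rows belonging to class k, n_k is the number of those rows, and cov is the sample covariance computed by cov:

```latex
\begin{aligned}
C &= \operatorname{cov}(X), \qquad
S_W = \sum_{k} \frac{n_k}{n}\,\operatorname{cov}(X_k), \qquad
S_B = C - S_W, \\
S_W\,v &= \lambda\,S_B\,v, \qquad
W = [\,v_1 \;\; v_2\,], \qquad
Y = X W,
\end{aligned}
```

Here v_1 and v_2 are the first redim = 2 eigenvectors, taken in the order the solver returns them, and Y is new_data.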

Recommended answer

Here is what the Performance Tips section of the Julia manual says:


Excerpt:

Any code that is performance critical or being benchmarked should be inside a function.

We find that global names are frequently constants, and declaring them as such greatly improves performance.
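The two excerpts can be made concrete with a minimal sketch. The names x_global, X_CONST and the sum_* functions are made up for illustration. A function that reads an untyped, non-const global cannot be specialized, because that global's type could change at any time. A const global avoids this problem, and so does passing the data as an argument, which is the better option. You can compare the three versions with @time after one warm-up call. The exact numbers depend on your machine.

```julia
# Hypothetical example data, for illustration only.
x_global = rand(10^6)        # non-const global: its type may change later
const X_CONST = rand(10^6)   # const global: the binding (and hence its type) is fixed

# Reads a non-const global, so the compiler cannot infer the type of x_global.
function sum_global()
    s = 0.0
    for v in x_global
        s += v
    end
    return s
end

# Reads a const global, so the loop can be compiled for Vector{Float64}.
function sum_const()
    s = 0.0
    for v in X_CONST
        s += v
    end
    return s
end

# Preferred style: pass the data in; the function is specialized on the argument's type.
function sum_arg(x)
    s = 0.0
    for v in x
        s += v
    end
    return s
end

# Warm-up calls, so that compilation time is not included in the timings below.
sum_global()
sum_const()
sum_arg(x_global)

@time sum_global()
@time sum_const()
@time sum_arg(x_global)
```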


Since the script style (all procedural top-level code) is so pervasive among scientific computing users, I would recommend, for starters, at least wrapping the whole file in a let expression (let introduces a new local scope), i.e.:

using DelimitedFiles, Statistics, LinearAlgebra

let
    redim = 2
    # Loading data
    iris_data = readdlm("iris_data.csv")
    iris_target = vec(readdlm("iris_target.csv"))

    # Center data
    iris_data = broadcast(-, iris_data, mean(iris_data, dims=1))
    n_data, n_dim = size(iris_data)

    Sw = zeros(n_dim, n_dim)
    Sb = zeros(n_dim, n_dim)

    C = cov(iris_data)

    classes = unique(iris_target)

    for i=1:length(classes)
        index = findall(x -> x==classes[i], iris_target)
        d = iris_data[index,:]
        classcov = cov(d)
        Sw += length(index) / n_data .* classcov   # Sw is local to the let block, so no `global` is needed
    end
    Sb = C - Sw

    evals, evecs = eigen(Sw, Sb)
    w = evecs[:,1:redim]
    new_data = iris_data * w
end
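A quick way to see what let does is the minimal sketch below, which uses made-up variable names. Variables assigned inside the block are local to it, and the value of the block is its last expression. A loop inside the block can update those locals without any global declaration.

```julia
# `total` receives the value of the last expression in the let block.
total = let
    acc = 0.0            # local to the let block
    for v in 1:10
        acc += v         # updates the let-local `acc`; no `global` needed
    end
    acc
end

println(total)                 # prints 55.0
println(@isdefined(acc))       # prints false: `acc` did not leak into global scope
```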

But I would also urge you to refactor it into small functions and then compose a main function that calls the rest, something like the following. Notice how this refactoring makes your code general and reusable (and fast):

module LinearDiscriminantAnalysis

using DelimitedFiles, Statistics, LinearAlgebra

export load_data, center_data

"Return the data matrix and the target labels (as a vector)."
load_data(data_path, target_path) = (readdlm(data_path), vec(readdlm(target_path)))

"Center `data`, then project it onto the first `redim` LDA directions."
function center_data(data, target; redim=2)
    data = broadcast(-, data, mean(data, dims=1))
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    Sb = zeros(n_dim, n_dim)
    C = cov(data)
    classes = unique(target)
    for i=1:length(classes)
        index = findall(x -> x==classes[i], target)
        d = data[index,:]
        classcov = cov(d)
        Sw += length(index) / n_data .* classcov
    end
    Sb = C - Sw
    evals, evecs = eigen(Sw, Sb)
    w = evecs[:,1:redim]
    return data * w
end

end


# The leading dot refers to the module defined above in this same file.
using .LinearDiscriminantAnalysis

function main()
    iris_data, iris_target = load_data("iris_data.csv", "iris_target.csv")
    result = center_data(iris_data, iris_target)
    @show result
end

main()
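Because the computation now takes its inputs as arguments, you can exercise it without the CSV files. To keep the sketch below self-contained, it re-implements only the within-class covariance step, as a hypothetical helper called within_class_cov, and runs it on synthetic random data. The helper, the data, and the labels are all made up for illustration.

```julia
using Statistics, LinearAlgebra

# Hypothetical helper: the size-weighted within-class covariance from the loop above.
function within_class_cov(data::AbstractMatrix, labels::AbstractVector)
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    for c in unique(labels)
        index = findall(==(c), labels)                        # rows belonging to class c
        Sw .+= (length(index) / n_data) .* cov(data[index, :])
    end
    return Sw
end

# Synthetic stand-in for the iris data: 3 classes of 50 samples in 4 dimensions.
data = randn(150, 4)
labels = repeat(1:3, inner=50)

Sw = within_class_cov(data, labels)
Sb = cov(data) - Sw
println(size(Sw))     # prints (4, 4)
```

With a single class, the weight n_k/n is 1, so the helper reduces to cov(data). That makes it an easy sanity check.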

注意:

  • You don't need all those semicolons; in Julia, a trailing semicolon only suppresses display of the result at the REPL.
  • Anonymous functions were slow before Julia v0.5; that changed in v0.5. On older versions, you could use the FastAnonymous package if performance was critical.
  • In summary, read all the performance tips carefully and take them into account.
  • main is just a name; it could be anything else you like.
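From Julia v0.5 onward, the anonymous function in the class lookup is compiled like any named function, so it no longer needs a workaround. The sketch below shows a few equivalent ways to write that index lookup. The labels are made-up example data, and the one-argument form ==(c) requires Julia 1.2 or later.

```julia
# Made-up class labels, for illustration only.
labels = ["setosa", "versicolor", "setosa", "virginica"]
c = "setosa"

idx_lambda = findall(x -> x == c, labels)   # anonymous function, as in the code above
idx_fix2 = findall(==(c), labels)           # curried comparison, no lambda written out
mask = labels .== c                         # BitVector; can also index rows directly, e.g. data[mask, :]

println(idx_lambda)   # prints [1, 3]
```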

08-11 21:58