Why is my Julia code running so slowly?

Question

redim = 2;
# Loading data
iris_data = readdlm("iris_data.csv");
iris_target = readdlm("iris_target.csv");

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1));
n_data, n_dim = size(iris_data);

Sw = zeros(n_dim, n_dim);
Sb = zeros(n_dim, n_dim);

C = cov(iris_data);


classes = unique(iris_target);

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target);
    d = iris_data[index,:];
    classcov = cov(d);
    Sw += length(index) / n_data .* classcov;
end
Sb = C - Sw;

evals, evecs = eig(Sw, Sb);
w = evecs[:,1:redim];
new_data = iris_data * w;

This code just performs LDA (linear discriminant analysis) on iris_data, reducing its dimensionality to 2. It takes about 4 seconds, but Python (numpy/scipy) takes only about 0.6 seconds. Why?

Recommended answer

This is from the Julia manual (Performance Tips section).

Excerpt:

Any code that is performance critical or being benchmarked should be inside a function.

We find that global names are frequently constants, and declaring them as such greatly improves performance.
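To make the two excerpts concrete, here is a minimal sketch. It is written for Julia 1.x rather than the version used in the question, and the names `data`, `DATA`, `sum_global`, `sum_arg`, and `sum_const` are made up for illustration. It contrasts a loop that reads a non-constant global with the same loop taking its input as an argument, and with a loop over a `const` global.

```julia
# Hypothetical example, Julia 1.x syntax.
data = rand(10^6)            # non-constant global: its type could change at any time

# Reads the non-constant global directly, so the compiler cannot assume its type.
function sum_global()
    s = 0.0
    for x in data
        s += x
    end
    return s
end

# Takes the array as an argument, so the loop is compiled for the concrete type.
function sum_arg(v)
    s = 0.0
    for x in v
        s += x
    end
    return s
end

const DATA = data            # constant global: the binding, and hence the type, is fixed

function sum_const()
    s = 0.0
    for x in DATA
        s += x
    end
    return s
end

# Warm up once so that compilation time is not included in the timings.
sum_global(); sum_arg(data); sum_const()

@time sum_global()
@time sum_arg(data)
@time sum_const()
```

All three functions compute the same sum; only the way the input reaches the loop differs, which is what the timings are meant to expose.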

Knowing that the script style (all procedural top-level code) is so pervasive among scientific computing users, I would recommend that you at least wrap the whole file in a let expression for starters (let introduces a new local scope), i.e.:

let

redim = 2
# Loading data
iris_data = readdlm("iris_data.csv")
iris_target = readdlm("iris_target.csv")

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1))
n_data, n_dim = size(iris_data)

Sw = zeros(n_dim, n_dim)
Sb = zeros(n_dim, n_dim)

C = cov(iris_data)


classes = unique(iris_target)

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target)
    d = iris_data[index,:]
    classcov = cov(d)
    Sw += length(index) / n_data .* classcov
end
Sb = C - Sw

evals, evecs = eig(Sw, Sb)
w = evecs[:,1:redim]
new_data = iris_data * w

end
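As a small illustration of what wrapping code in let does, the sketch below (Julia 1.x syntax; `acc` and `result` are hypothetical names) keeps the accumulator local to the block instead of making it a global, and returns the final value as the value of the let expression.

```julia
# Hypothetical example: variables assigned inside `let` are local to the block.
result = let
    acc = 0.0                 # local variable, not a global
    for i in 1:1000
        acc += i              # updates the local `acc` of the enclosing let
    end
    acc                       # the value of the last expression is the block's value
end

println(result)
println(@isdefined(acc))      # checks whether `acc` leaked into global scope
```

Only `result` is visible afterwards, so intermediate variables do not end up as untyped globals.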

But I would also urge you to refactor it into small functions and then compose a main function that calls the rest, something like the following. Notice how this refactoring makes your code general and reusable (and fast):

module LinearDiscriminantAnalysis

export load_data, center_data

"Returns data and target Matrices."
load_data(data_path, target_path) = (readdlm(data_path), readdlm(target_path))

function center_data(data, target)
    data = broadcast(-, data, mean(data, 1))
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    Sb = zeros(n_dim, n_dim)
    C = cov(data)
    classes = unique(target)
    for i=1:length(classes)
        index = find(x -> x==classes[i], target)
        d = data[index,:]
        classcov = cov(d)
        Sw += length(index) / n_data .* classcov
    end
    Sb = C - Sw
    evals, evecs = eig(Sw, Sb)
    redim = 2
    w = evecs[:,1:redim]
    return data * w
end

end

using LinearDiscriminantAnalysis

function main()
    iris_data, iris_target = load_data("iris_data.csv", "iris_target.csv")
    result = center_data(iris_data, iris_target)
    @show result
end

main()
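The code above targets the Julia version current when the answer was written. For readers on Julia 1.x, the following is a rough transliteration of the same computation, offered as a sketch rather than a verified port. `lda_project` is a hypothetical name, and synthetic data stands in for the two CSV files so the block is self-contained (with real files, `readdlm` is available from the `DelimitedFiles` standard library). The renamed calls are `mean(x, dims = 1)`, `findall`, and `eigen`. The order in which `eigen` returns eigenvectors is not checked here, so the selected columns may differ from those of the original `eig` call.

```julia
# Sketch for Julia 1.x; `lda_project` and the synthetic data are illustrative only.
using Statistics        # mean, cov
using LinearAlgebra     # eigen

function lda_project(data, target; redim = 2)
    data = data .- mean(data, dims = 1)           # center each column
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    C = cov(data)
    labels = vec(target)                          # flatten an n-by-1 matrix of labels
    for c in unique(labels)
        index = findall(x -> x == c, labels)      # `find` is now `findall`
        Sw += length(index) / n_data .* cov(data[index, :])
    end
    Sb = C - Sw
    F = eigen(Sw, Sb)                             # `eig` is now `eigen`
    w = F.vectors[:, 1:redim]
    return data * w
end

# Synthetic stand-in for the iris files: 150 rows, 4 columns, 3 classes.
data = randn(150, 4)
target = repeat([1.0, 2.0, 3.0], inner = 50)
result = lda_project(data, target)
@show size(result)
```

Because everything runs inside a function that receives its inputs as arguments, the advice from the manual excerpts applies here as well.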

Notes:

You don't need all those semicolons.
Anonymous functions are currently slow, but that will change in v0.5. If performance is critical, you can use FastAnonymous for now.
In summary, read carefully and take into account all the performance tips.
main is just a name; it could be anything else you like.
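On the anonymous-function note: one way to sidestep the closure entirely is an elementwise comparison. The sketch below uses Julia 1.x names (`findall` where the question's code uses `find`) and made-up labels; both forms select the same indices.

```julia
# Hypothetical labels for illustration.
target = [1.0, 2.0, 1.0, 3.0, 2.0, 1.0]
c = 1.0

idx_closure   = findall(x -> x == c, target)   # anonymous-function form
idx_broadcast = findall(target .== c)          # elementwise comparison, no closure

println(idx_closure)
println(idx_broadcast)
```

Either index vector can then be used as `data[idx, :]` to pick out the rows of one class.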
