


I have a performance-critical piece of code in which I need to check one array for values below a threshold and then conditionally set the values of two other arrays. My code looks like this:

#include <Eigen/Dense>

int main(){
    Eigen::ArrayXXd
        a (1, 100),
        b (2, 100),
        c (3, 100);

    constexpr double minVal { 1e-8 };

    /* the code segment in question */
    /* option 1 */
    for ( int i=0; i<2; ++i ){
        b.row(i)   = (a < minVal).select( 0, c.row(i+1) / a );
        c.row(i+1) = (a < minVal).select( 0, c.row(i+1) );
    }

    /* option 2, which is slower */
    b = (a < minVal).replicate(2,1).select( 0, c.bottomRows(2) / a.replicate(2,1) );
    c.bottomRows(2) = (a < minVal).replicate(2,1).select( 0, c.bottomRows(2) );

    return 0;
}

The array a, whose values are checked for reaching the threshold minVal, has one row and a dynamic number of columns. The other two arrays b and c have two and three rows, respectively, and the same number of columns as a.
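To pin down the intended semantics independently of Eigen, here is a minimal plain-loop sketch of the same update. The `Row` alias, the `std::array` containers and the function name `thresholdUpdate` are hypothetical stand-ins chosen for this illustration, not part of the code above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ stand-in for one row of an Eigen array.
using Row = std::vector<double>;

// Reference semantics of the update: for every column j where a[j] is below
// minVal, the entries of b and of the last two rows of c are set to zero;
// elsewhere b receives c / a and c keeps its value. Row 0 of c is untouched.
void thresholdUpdate(const Row& a, std::array<Row, 2>& b,
                     std::array<Row, 3>& c, double minVal) {
    for (std::size_t i = 0; i < 2; ++i) {
        // All rows must have as many columns as a.
        assert(b[i].size() == a.size() && c[i + 1].size() == a.size());
        for (std::size_t j = 0; j < a.size(); ++j) {
            const bool below = a[j] < minVal;
            b[i][j]     = below ? 0.0 : c[i + 1][j] / a[j];
            c[i + 1][j] = below ? 0.0 : c[i + 1][j];
        }
    }
}
```

Any Eigen formulation should produce the same `b` and `c` as this loop for the same inputs.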

Now I would like to express the above logic in a more idiomatic Eigen way, without the loop in option 1, because Eigen typically has performance tricks up its sleeve that I can never hope to match when writing raw loops. However, the only alternative I could think of was option 2, which is noticeably slower than option 1.


What would be the right and efficient way to do the above? Or is the loop already my best option?



  • Define your array types with a fixed number of rows and a dynamic number of columns, i.e., you can replace Eigen::ArrayXXd with Eigen::Array<double, 1/2/3, Eigen::Dynamic>.
  • Use the fixed-size versions of the block operations (see https://eigen.tuxfamily.org/dox/group__TutorialBlockOperations.html), i.e., you can replace bottomRows(N) with bottomRows<N>() and similarly replicate(2,1) with replicate<2,1>().


I have changed the array types in your code and included a third option with the possible improvements that I have mentioned:

#include <Eigen/Dense>

#include <iostream>
#include <chrono>

constexpr int numberOfTrials = 1000000;
constexpr double minVal{ 1e-8 };

typedef Eigen::Array<double, 1, Eigen::Dynamic> Array1Xd;
typedef Eigen::Array<double, 2, Eigen::Dynamic> Array2Xd;
typedef Eigen::Array<double, 3, Eigen::Dynamic> Array3Xd;

inline void option1(const Array1Xd& a, Array2Xd& b, Array3Xd& c)
{
    for (int i = 0; i < 2; ++i) {
        b.row(i) = (a < minVal).select(0, c.row(i + 1) / a);
        c.row(i + 1) = (a < minVal).select(0, c.row(i + 1));
    }
}

inline void option2(const Array1Xd& a, Array2Xd& b, Array3Xd& c)
{
    b = (a < minVal).replicate(2, 1).select(0, c.bottomRows(2) / a.replicate(2, 1));
    c.bottomRows(2) = (a < minVal).replicate(2, 1).select(0, c.bottomRows(2));
}

inline void option3(const Array1Xd& a, Array2Xd& b, Array3Xd& c)
{
    b = (a < minVal).replicate<2, 1>().select(0, c.bottomRows<2>() / a.replicate<2, 1>());
    c.bottomRows<2>() = (a < minVal).replicate<2, 1>().select(0, c.bottomRows<2>());
}

int main() {
    Array1Xd a(1, 100);
    Array2Xd b(2, 100);
    Array3Xd c(3, 100);


    auto tpBegin1 = std::chrono::steady_clock::now();
    for (int i = 0; i < numberOfTrials; i++)
        option1(a, b, c);
    auto tpEnd1 = std::chrono::steady_clock::now();

    auto tpBegin2 = std::chrono::steady_clock::now();
    for (int i = 0; i < numberOfTrials; i++)
        option2(a, b, c);
    auto tpEnd2 = std::chrono::steady_clock::now();

    auto tpBegin3 = std::chrono::steady_clock::now();
    for (int i = 0; i < numberOfTrials; i++)
        option3(a, b, c);
    auto tpEnd3 = std::chrono::steady_clock::now();

    std::cout << "(Option 1) Average execution time: " << std::chrono::duration_cast<std::chrono::microseconds>(tpEnd1 - tpBegin1).count() / (long double)(numberOfTrials) << " us" << std::endl;
    std::cout << "(Option 2) Average execution time: " << std::chrono::duration_cast<std::chrono::microseconds>(tpEnd2 - tpBegin2).count() / (long double)(numberOfTrials) << " us" << std::endl;
    std::cout << "(Option 3) Average execution time: " << std::chrono::duration_cast<std::chrono::microseconds>(tpEnd3 - tpBegin3).count() / (long double)(numberOfTrials) << " us" << std::endl;

    return 0;
}

The average execution times I obtained are as follows (i7-9700K, MSVC 2019, optimizations enabled, NDEBUG defined):

(Option 1) Average execution time: 0.527717 us
(Option 2) Average execution time: 3.25618 us
(Option 3) Average execution time: 0.512029 us

And with AVX2+OpenMP enabled:

(Option 1) Average execution time: 0.374309 us
(Option 2) Average execution time: 3.31356 us
(Option 3) Average execution time: 0.260551 us


I'm not sure if it is the most "Eigen" way but I hope it helps!

