Pointer arithmetic across subobject boundaries

Problem description

Does the following code (which performs pointer arithmetic across subobject boundaries) have well-defined behavior for the types T for which it compiles (which, in C++11, do not necessarily have to be POD), or for any subset thereof?

#include <cassert>
#include <cstddef>

template<typename T>
struct Base
{
    // ensure alignment
    union
    {
        T initial;
        char begin;
    };
};

template<typename T, size_t N>
struct Derived : public Base<T>
{
    T rest[N - 1];
    char end;
};

int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.initial == 10);
    assert(&d.end - &d.begin == sizeof(float) * 10);
    return 0;
}

LLVM uses a variation of the above technique in the implementation of an internal vector type which is optimized to initially use the stack for small arrays but switches to a heap-allocated buffer once over initial capacity. (The reason for doing it this way is not clear from this example but is apparently to reduce template code bloat; this is clearer if you look through the code.)

NOTE: Before anyone complains, this is not exactly what they are doing and it might be that their approach is more standards-compliant than what I have given here, but I wanted to ask about the general case.

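Obviously, it works in practice, but I'm curious whether anything in the standard guarantees that. I'm inclined to say no, given N3242/expr.add, which leaves pointer subtraction undefined unless both pointers point to elements of the same array object or one past its last element.

But theoretically, the middle part of that paragraph (the part dealing with one-past-the-end pointers), combined with class layout and alignment guarantees, might allow the following (minor) adjustment to be valid: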

#include <cassert>
#include <cstddef>

template<typename T>
struct Base
{
    T initial[1];
};

template<typename T, size_t N>
struct Derived : public Base<T>
{
    T rest[N - 1];
};

int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.rest[0] == 9);
    assert(&d.rest[0] == &d.initial[1]);
    assert(&d.rest[0] - &d.initial[0] == 1);
    return 0;
}

which combined with various other provisions concerning union layout, convertibility to and from char *, etc., might arguably make the original code valid as well. (The main problem is the lack of transitivity in the definition of pointer arithmetic given above.)

Does anyone know for sure? N3242/expr.add seems to make clear that pointers must belong to the same "array object" for the operation to be defined, but it could hypothetically be the case that other guarantees in the standard, when combined, require a definition anyway in this case in order to remain logically self-consistent. (I'm not betting on it, but I would say it's at least conceivable.)

EDIT: @MatthieuM raises the objection that this class is not standard-layout and therefore might not be guaranteed to contain no padding between the base subobject and the first member of the derived, even if both are aligned to alignof(T). I'm not sure how true that is, but that opens up the following variant questions:

  • Would this be guaranteed to work if the inheritance were removed?

  • Would &d.end - &d.begin >= sizeof(float) * 10 be guaranteed even if &d.end - &d.begin == sizeof(float) * 10 were not?

LAST EDIT @ArneMertz argues for a very close reading of N3242/expr.add (yes, I know I'm reading a draft, but it's close enough), but does the standard really imply that the following has undefined behavior then if the swap line is removed? (same class definitions as above)

int main()
{
    Derived<float, 10> d;
    bool aligned;
    float * p = &d.initial[0], * q = &d.rest[0];

    ++p;
    if((aligned = (p == q)))
    {
        std::swap(p, q); // does it matter if this line is removed?
        *++p = 1.0;
    }

    assert(!aligned || d.rest[1] == 1.0);

    return 0;
}

Also, if == is not strong enough, what if we take advantage of the fact that std::less forms a total order over pointers, and change the conditional above to:

    if((aligned = (!std::less<float *>()(p, q) && !std::less<float *>()(q, p))))

Is code that assumes that two equal pointers point to the same array object really broken according to a strict reading of the standard?

EDIT Sorry, just want to add one more example, to eliminate the standard layout issue:

#include <cassert>
#include <cstddef>
#include <utility>
#include <functional>

// standard layout
struct Base
{
    float initial[1];
    float rest[9];
};

int main()
{
    Base b;
    bool aligned;
    float * p = &b.initial[0], * q = &b.rest[0];

    ++p;
    if((aligned = (p == q)))
    {
        std::swap(p, q); // does it matter if this line is removed?
        *++p = 1.0;
        q = &b.rest[1];
        // std::swap(p, q); // does it matter if this line is added?
        p -= 2; // is this UB?
    }
    assert(!aligned || b.rest[1] == 1.0);
    assert(p == &b.initial[0]);

    return 0;
}
Solution

Updated: This answer at first missed some information and thus led to wrong conclusions.

In your examples, initial and rest are clearly distinct (array) objects, so comparing pointers to initial (or its elements) with pointers to rest (or its elements) is

  • UB, if you use the difference of the pointers. (§5.7,6)
  • unspecified, if you use relational operators (§5.9,2)
  • well defined for == (so the second snippet is good, see below)
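The three cases above can be sketched as follows. This is a minimal illustration, not code from the question: the struct `TwoArrays` and both function names are hypothetical, chosen to mirror the standard-layout example with `initial` and `rest`. It uses only `==` and `std::less`, and deliberately omits the pointer subtraction.

```cpp
#include <cassert>
#include <functional>

// Hypothetical standard-layout aggregate with two distinct array members.
struct TwoArrays
{
    float initial[1];
    float rest[9];
};

// True if the one-past-the-end pointer of `initial` compares equal to the
// first element of `rest`. operator== is well defined for these pointers.
inline bool boundary_equal(const TwoArrays& t)
{
    const float* p = t.initial + 1; // may be formed, must not be dereferenced
    const float* q = t.rest;
    return p == q;
}

// The same question asked through std::less, which yields a total order
// over pointers even where the built-in < would be unspecified.
inline bool boundary_equivalent(const TwoArrays& t)
{
    const float* p = t.initial + 1;
    const float* q = t.rest;
    std::less<const float*> lt;
    return !lt(p, q) && !lt(q, p);
}

// Deliberately absent: `t.rest - t.initial`, which subtracts pointers into
// different arrays and is undefined behavior per §5.7,6.
```

Calling `boundary_equal(t)` on a `TwoArrays t{};` tells you whether the two members happen to be adjacent on your implementation; neither function guarantees that they are.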

First snippet:

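Taking the difference in the first snippet is undefined behavior according to the paragraph you quoted (§5.7,6), which defines pointer subtraction only when both pointers point to elements of the same array object or one past its last element.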

To clarify the UB parts of the first example code:

//first example
int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.initial == 10);            //!!! UB !!!
    assert(&d.end - &d.begin == sizeof(float) * 10);  //!!! UB !!! (*)
    return 0;
}

The line marked with (*) is interesting: d.begin and d.end are not elements of the same array, so the subtraction is UB. This holds even though you may reinterpret_cast<char*>(&d) and find both of their addresses inside the resulting char array. That array is a representation of all of d, so it is not to be seen as an access to parts of d. So while the operation will probably just work and give the expected result on any implementation one can dream of, it is still UB as a matter of definition.
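If the goal is only the byte distance between two members, one way to avoid subtracting unrelated pointers is `offsetof`. The sketch below is an aside and not part of the original answer. It assumes the inheritance is removed so that the type is standard-layout, and the names `Flat` and `byte_span` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical flattened, standard-layout counterpart of Derived<float, 10>.
struct Flat
{
    float initial[1];
    float rest[9];
    char end;
};

static_assert(std::is_standard_layout<Flat>::value,
              "offsetof is only guaranteed for standard-layout classes");

// Byte distance from the start of `initial` to `end`, computed from member
// offsets instead of subtracting pointers to unrelated objects.
inline std::size_t byte_span()
{
    return offsetof(Flat, end) - offsetof(Flat, initial);
}
```

Because members with the same access control are laid out in declaration order and cannot overlap, `byte_span() >= sizeof(float) * 10` follows from the layout rules. Exact equality depends on the implementation inserting no padding.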

Second snippet:

This is actually well-defined behavior, but with an implementation-defined result:

int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.rest[0] == 9);
    assert(&d.rest[0] == &d.initial[1]);         //(!)
    assert(&d.initial[1] - &d.initial[0] == 1);
    return 0;
}

The line marked with (!) is not UB, but its result is implementation-defined, since padding, alignment and the mentioned instrumentation might play a role. But if that assertion holds, you can use the two object parts like one array.
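The adjacency that the (!) line tests can be checked in two ways, sketched below under the assumption of a standard-layout struct (the names `TwoArrays`, `adjacent_by_offset` and `adjacent_by_pointer` are hypothetical). The pointer version forms the one-past-the-end pointer as `initial + 1` rather than `&initial[1]`, which avoids any question about applying `*` to that pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout struct mirroring the last example in the question.
struct TwoArrays
{
    float initial[1];
    float rest[9];
};

// Layout view: does `rest` start exactly where `initial` ends?
constexpr bool adjacent_by_offset()
{
    return offsetof(TwoArrays, rest)
        == offsetof(TwoArrays, initial) + sizeof(float[1]);
}

// Pointer view: the comparison from the line marked (!), using only ==.
inline bool adjacent_by_pointer(const TwoArrays& t)
{
    return t.initial + 1 == t.rest;
}
```

Both checks report what the implementation actually did; neither one is a guarantee from the standard that the members are adjacent.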

You would then know that rest[0] lies immediately after initial[0] in memory. At first sight, you could not easily make use of that equality:

  • initial[1] would point one-past-the-end of initial, dereferencing it is UB.
  • rest[-1] is clearly out of bounds.

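But enter §3.9.2,3, which says that a pointer whose value is the address of an object of type T is said to point to that object, regardless of how the value was obtained. Its note adds that the address one past the end of an array would be considered to point to an unrelated object of the array's element type that might be located at that address.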

So provided that &initial[1] == &rest[0], the two pointers are binary-identical, just as if there were only one array, and all will be OK.

You could iterate over both arrays, since you could apply some "pointer context switch" at the boundaries. So to your last snippet: the swap is not needed!
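A conservative sketch of such a boundary switch follows. It is an illustration under stated assumptions, not the answer's own code: `TwoArrays` and `sum_as_one_sequence` are hypothetical names, and the function continues from a pointer derived from `rest` itself instead of relying on the incremented pointer, so it does not depend on the §3.9.2,3 argument.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout struct mirroring the last example in the question.
struct TwoArrays
{
    float initial[1];
    float rest[9];
};

// Sums all ten floats, treating the two members as one sequence only after
// verifying that they are adjacent. Returns false if they are not.
inline bool sum_as_one_sequence(const TwoArrays& t, float& out)
{
    const float* p = t.initial;
    const float* const initial_end = t.initial + 1; // one past the end
    const float* const rest_end = t.rest + 9;       // one past the end

    float sum = 0.0f;
    for (; p != initial_end; ++p)
        sum += *p;

    // Stop at the boundary and check before continuing.
    if (p != t.rest)
        return false;

    // Explicit context switch: continue from a pointer derived from `rest`.
    p = t.rest;
    for (; p != rest_end; ++p)
        sum += *p;

    out = sum;
    return true;
}
```

The explicit reassignment costs nothing at run time when the addresses are equal, and it keeps every increment and dereference inside the array the pointer was derived from.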

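However, there are some caveats: rest[-1] is UB, and so would be initial[2], because of §5.7,5, which makes pointer addition undefined unless both the pointer operand and the result point to elements of the same array object or one past its last element. So how do these two fit together?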

  • "Good path": &initial[1] is OK, and since &initial[1] == &rest[0] you can take that address and go on to increment the pointer to access the other elements of rest, because of §3.9.2,3.
  • "Bad path": initial[2] is *(initial + 2), but by §5.7,5, initial + 2 is already UB, so you never get to use §3.9.2,3 here.

Together: you have to stop by at the boundary, take a short break to check that the addresses are equal and then you can move on.

