Is transmuting between T and UnsafeCell<T> safe and defined behavior?

Question

A recent question was looking for the ability to construct self-referential structures. In discussing possible answers for the question, one potential answer involved using an UnsafeCell for interior mutability and then "discarding" the mutability through a transmute.

Here's a small example of such an idea in action. I'm not deeply interested in the example itself, but it's just enough complication to require a bigger hammer like transmute as opposed to just using UnsafeCell::new and/or UnsafeCell::into_inner:

use std::{
    cell::UnsafeCell, mem, rc::{Rc, Weak},
};

// This is our real type.
struct ReallyImmutable {
    value: i32,
    myself: Weak<ReallyImmutable>,
}

fn initialize() -> Rc<ReallyImmutable> {
    // This mirrors ReallyImmutable but we use `UnsafeCell`
    // to perform some initial interior mutation.
    struct NotReallyImmutable {
        value: i32,
        myself: Weak<UnsafeCell<NotReallyImmutable>>,
    }

    let initial = NotReallyImmutable {
        value: 42,
        myself: Weak::new(),
    };

    // Without interior mutability, we couldn't update the `myself` field
    // after we've created the `Rc`.
    let second = Rc::new(UnsafeCell::new(initial));

    // Tie the recursive knot
    let new_myself = Rc::downgrade(&second);

    unsafe {
        // Should be safe as there can be no other accesses to this field
        (&mut *second.get()).myself = new_myself;

        // No one outside of this function needs the interior mutability
        // TODO: Is this call safe?
        mem::transmute(second)
    }
}

fn main() {
    let v = initialize();
    println!("{} -> {:?}", v.value, v.myself.upgrade().map(|v| v.value))
}

This code appears to print out what I'd expect, but that doesn't mean that it's safe or using defined semantics.

Is transmuting from an UnsafeCell<T> to a T memory safe? Does it invoke undefined behavior? What about transmuting in the opposite direction, from a T to an UnsafeCell<T>?

Answer

(I am still new to SO and not sure if "well, maybe" qualifies as an answer, but here you go. ;)

Disclaimer: The rules for these kinds of things are not (yet) set in stone. So, there is no definitive answer yet. I'm going to make some guesses based on (a) what kinds of compiler transformations LLVM does/we will eventually want to do, and (b) what kind of models I have in my head that would define the answer to this.

Also, I see two parts to this: The data layout perspective, and the aliasing perspective. The layout issue is that NotReallyImmutable could, in principle, have a totally different layout than ReallyImmutable. I don't know much about data layout, but with UnsafeCell becoming repr(transparent) and that being the only difference between the two types, I think the intent is for this to work. You are, however, relying on repr(transparent) being "structural" in the sense that it should allow you to replace things in larger types, which I am not sure has been written down explicitly anywhere. Sounds like a proposal for a follow-up RFC that extends the repr(transparent) guarantees appropriately?
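
The narrowest part of the layout claim can at least be spot-checked. The following is a minimal sketch, assuming only the documented guarantee that UnsafeCell<T> has the same in-memory representation as T; the helper name same_size_and_align is hypothetical. It compares size and alignment for a few types. Passing this check is necessary but not sufficient: it says nothing about whether two distinct structs such as NotReallyImmutable and ReallyImmutable are laid out identically, which is the "structural" question raised above.

```rust
use std::cell::UnsafeCell;
use std::mem::{align_of, size_of};

// Hypothetical helper: true if `UnsafeCell<T>` and `T` agree on size and alignment.
// This is only a sanity check, not a proof that a transmute is sound.
fn same_size_and_align<T>() -> bool {
    size_of::<UnsafeCell<T>>() == size_of::<T>()
        && align_of::<UnsafeCell<T>>() == align_of::<T>()
}

fn main() {
    // A few representative types: a primitive, a heap-owning type, a tuple with padding.
    println!("i32:       {}", same_size_and_align::<i32>());
    println!("String:    {}", same_size_and_align::<String>());
    println!("(u8, u64): {}", same_size_and_align::<(u8, u64)>());
}
```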

As far as aliasing is concerned, the issue is breaking the rules around &T. I'd say that, as long as you never have a live &T around anywhere when writing through the &UnsafeCell<T>, you are good -- but I don't think we can guarantee that quite yet. Let's look in more detail.

The relevant optimizations here are the ones that exploit &T being read-only. So if you reordered the last two lines (transmute and the assignment), that code would likely be UB as we may want the compiler to be able to "pre-fetch" the value behind the shared reference and re-use that value later (i.e. after inlining this).
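
To make the "read-only" assumption concrete, here is a small sketch in safe Rust; the function names sum_twice and sum_twice_cell are hypothetical. With a plain &i32 the compiler is allowed to assume the value does not change across the call to f, so it could reuse the first load. With a &Cell<i32> it cannot assume that, because the value sits behind an UnsafeCell and may legitimately change.

```rust
use std::cell::Cell;

// `x` is a plain shared reference: the compiler may assume `*x` is
// unchanged across the call to `f` and reuse the first load for `b`.
fn sum_twice(x: &i32, f: impl Fn()) -> i32 {
    let a = *x;
    f();
    let b = *x;
    a + b
}

// `x` points to a `Cell`, which wraps an `UnsafeCell`: `f` may change the
// value through another shared handle, so the second read must really happen.
fn sum_twice_cell(x: &Cell<i32>, f: impl Fn()) -> i32 {
    let a = x.get();
    f();
    let b = x.get();
    a + b
}

fn main() {
    let n = 21;
    println!("{}", sum_twice(&n, || {}));

    let c = Cell::new(1);
    // The closure mutates the cell between the two reads.
    println!("{}", sum_twice_cell(&c, || c.set(10)));
}
```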

But in your code, we would only emit "read-only" annotations (noalias in LLVM) after the transmute comes back, and the data is indeed read-only starting there. So, this should be good.

The "most aggressive" of my memory models essentially asserts that all values are always valid, and I think even that model should be fine with your code. &UnsafeCell is a special case in that model where validity just stops, and nothing is said about what lives behind this reference. The moment the transmute returns, we grab the memory it points to and make it all read-only, and even if we did that "recursively" through the Rc (which my model doesn't, but only because I couldn't figure out a good way to make it do so) you'd be fine as you don't mutate any more after the transmute. (As you may have noticed, this is the same restriction as in the compiler perspective. The point of these models is to allow compiler optimizations, after all. ;)

(As a side-note, I really wish miri was in better shape right now. Seems I have to try and get validation to work again in there, because then I could tell you to just run your code in miri and it'd tell you if that version of my model is okay with what you are doing :D )

I am thinking about other models currently that only check things "on access", but haven't worked out the UnsafeCell story for that model yet. What this example shows is that the model may have to contain ways for a "phase transition" of memory first being UnsafeCell, but later having normal sharing with read-only guarantees. Thanks for bringing this up, that will make for some nice examples to think about!

So, I think I can say that (at least from my side) there is the intent to allow this kind of code, and doing so does not seem to prevent any optimizations. Whether we'll actually manage to find a model that everybody can agree with and that still allows this, I cannot predict.
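
For the specific example in the question, the transmute can be sidestepped entirely on newer toolchains. This is a different technique from the one discussed above, not an answer to the transmute question itself: a sketch assuming a standard library that provides Rc::new_cyclic, which hands the closure a Weak pointer to the value being constructed. The function name initialize_cyclic is hypothetical.

```rust
use std::rc::{Rc, Weak};

struct ReallyImmutable {
    value: i32,
    myself: Weak<ReallyImmutable>,
}

// Builds the self-referential value without `UnsafeCell` or `transmute`.
// `weak` cannot be upgraded inside the closure; it becomes valid once
// `new_cyclic` returns the finished `Rc`.
fn initialize_cyclic() -> Rc<ReallyImmutable> {
    Rc::new_cyclic(|weak| ReallyImmutable {
        value: 42,
        myself: weak.clone(),
    })
}

fn main() {
    let v = initialize_cyclic();
    println!("{} -> {:?}", v.value, v.myself.upgrade().map(|v| v.value));
}
```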

Now, the opposite direction, from T to UnsafeCell<T>, is more interesting. The problem is that, as I said above, you must not have a &T live when writing through an UnsafeCell<T>. But what does "live" mean here? That's a hard question! In some of my models, this could be as weak as "a reference of that type exists somewhere and the lifetime is still active", i.e., it could have nothing to do with whether the reference is actually used. (That's useful because it lets us do more optimizations, like moving a load out of a loop even if we cannot prove that the loop ever runs -- which would introduce a use of an otherwise unused reference.) And since &T is Copy, you cannot even really get rid of such a reference either. So, if you have x: &T, then after let y: &UnsafeCell<T> = transmute(x), the old x is still around and its lifetime still active, so writing through y could well be UB.

I think you'd have to somehow restrict the aliasing that &T allows, very carefully making sure that nobody still holds such a reference. I'm not going to say "this is impossible" because people keep surprising me (especially in this community ;) but TBH I cannot think of a way to make this work. I'd be curious if you have an example though where you think this is reasonable.
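
For contrast, one case where the aliasing is restricted by construction is starting from &mut T instead of &T. A unique reference guarantees that no live &T exists, which is why the standard library can offer the safe conversion Cell::from_mut. The sketch below uses that existing API; the function name set_via_two_handles is hypothetical, and this illustrates the &mut T case only, not a way to make the &T transmute sound.

```rust
use std::cell::Cell;

// Starting from `&mut i32`, nobody else can hold a `&i32` to the same value,
// so viewing it as `&Cell<i32>` and writing through shared handles is fine.
fn set_via_two_handles(x: &mut i32) {
    let c1: &Cell<i32> = Cell::from_mut(x);
    let c2 = c1; // `&Cell<i32>` is `Copy`, so both handles alias the same value
    c1.set(2);
    c2.set(c2.get() + 40);
}

fn main() {
    let mut x = 1;
    set_via_two_handles(&mut x);
    println!("{}", x);
}
```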
