本文介绍了PHP“foreach"实际上是如何工作的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我先说我知道 foreach 是什么、做什么以及如何使用它.这个问题涉及它在引擎盖下的工作方式,我不想要任何类似这就是使用 foreach 循环数组的方式"这样的答案.

Let me prefix this by saying that I know what foreach is, does and how to use it. This question concerns how it works under the bonnet, and I don't want any answers along the lines of "this is how you loop an array with foreach".

很长一段时间我都认为 foreach 与数组本身一起工作.然后我发现许多关于它与数组的副本一起工作的事实的参考,并且我从那时起假设这就是故事的结尾.但是我最近就此事进行了讨论,经过一些实验后发现这实际上并非 100% 正确.

For a long time I assumed that foreach worked with the array itself. Then I found many references to the fact that it works with a copy of the array, and I have since assumed this to be the end of the story. But I recently got into a discussion on the matter, and after a little experimentation found that this was not in fact 100% true.

让我展示我的意思.对于以下测试用例,我们将使用以下数组:

Let me show what I mean. For the following test cases, we will be working with the following array:

$array = array(1, 2, 3, 4, 5);

测试用例 1:

foreach ($array as $item) {
  echo "$item
";
  $array[] = $item;
}
print_r($array);

/* Output in loop:    1 2 3 4 5
   $array after loop: 1 2 3 4 5 1 2 3 4 5 */

这清楚地表明我们没有直接使用源数组 - 否则循环将永远持续下去,因为我们在循环过程中不断地将项目推送到数组上.但只是为了确定情况确实如此:

This clearly shows that we are not working directly with the source array - otherwise the loop would continue forever, since we are constantly pushing items onto the array during the loop. But just to be sure this is the case:

测试案例 2:

foreach ($array as $key => $item) {
  $array[$key + 1] = $item + 2;
  echo "$item
";
}

print_r($array);

/* Output in loop:    1 2 3 4 5
   $array after loop: 1 3 4 5 6 7 */

这支持了我们最初的结论,我们在循环期间使用源数组的副本,否则我们将在循环期间看到修改后的值.但是...

This backs up our initial conclusion, we are working with a copy of the source array during the loop, otherwise we would see the modified values during the loop. But...

如果我们查看手册,我们会发现以下语句:

If we look in the manual, we find this statement:

当 foreach 第一次开始执行时,内部数组指针会自动重置为数组的第一个元素.

对...这似乎表明 foreach 依赖于源数组的数组指针.但是我们刚刚证明了我们没有使用源数组,对吗?嗯,不完全是.

Right... this seems to suggest that foreach relies on the array pointer of the source array. But we've just proved that we're not working with the source array, right? Well, not entirely.

测试用例 3:

// Move the array pointer on one to make sure it doesn't affect the loop
var_dump(each($array));

foreach ($array as $item) {
  echo "$item
";
}

var_dump(each($array));

/* Output
  array(4) {
    [1]=>
    int(1)
    ["value"]=>
    int(1)
    [0]=>
    int(0)
    ["key"]=>
    int(0)
  }
  1
  2
  3
  4
  5
  bool(false)
*/

因此,尽管我们没有直接使用源数组,但我们直接使用源数组指针——指针位于循环末尾的数组末尾这一事实表明了这一点.除了这不可能是真的 - 如果是这样,那么 测试用例 1 将永远循环.

So, despite the fact that we are not working directly with the source array, we are working directly with the source array pointer - the fact that the pointer is at the end of the array at the end of the loop shows this. Except this can't be true - if it was, then test case 1 would loop forever.

PHP 手册还指出:

由于 foreach 依赖于内部数组指针在循环内更改它可能会导致意外行为.

好吧,让我们找出意外行为"是什么(从技术上讲,任何行为都是意外的,因为我不再知道会发生什么).

Well, let's find out what that "unexpected behavior" is (technically, any behavior is unexpected since I no longer know what to expect).

测试案例 4:

foreach ($array as $key => $item) {
  echo "$item
";
  each($array);
}

/* Output: 1 2 3 4 5 */

测试用例 5:

foreach ($array as $key => $item) {
  echo "$item
";
  reset($array);
}

/* Output: 1 2 3 4 5 */

...没有什么出乎意料的,事实上它似乎支持源代码复制"理论.

...nothing that unexpected there, in fact it seems to support the "copy of source" theory.

问题

这里发生了什么?我的 C-fu 不够好,我只能通过查看 PHP 源代码来提取正确的结论,如果有人能帮我将其翻译成英文,我将不胜感激.

What is going on here? My C-fu is not good enough for me to able to extract a proper conclusion simply by looking at the PHP source code, I would appreciate it if someone could translate it into English for me.

在我看来 foreach 与数组的 copy 一起工作,但在循环后将源数组的数组指针设置为数组的末尾.

It seems to me that foreach works with a copy of the array, but sets the array pointer of the source array to the end of the array after the loop.

  • 这是正确的吗?完整的故事吗?
  • 如果不是,它到底在做什么?
  • 是否有任何情况在 foreach 期间使用调整数组指针的函数(each()reset() 等)> 会影响循环的结果吗?
  • Is this correct and the whole story?
  • If not, what is it really doing?
  • Is there any situation where using functions that adjust the array pointer (each(), reset() et al.) during a foreach could affect the outcome of the loop?

推荐答案

foreach 支持对三种不同类型的值进行迭代:

foreach supports iteration over three different kinds of values:

在下文中,我将尝试准确解释迭代在不同情况下的工作原理.到目前为止,最简单的情况是 Traversable 对象,至于这些 foreach 基本上只是这些代码的语法糖:

In the following, I will try to explain precisely how iteration works in different cases. By far the simplest case is Traversable objects, as for these foreach is essentially only syntax sugar for code along these lines:

foreach ($it as $k => $v) { /* ... */ }

/* translates to: */

if ($it instanceof IteratorAggregate) {
    $it = $it->getIterator();
}
for ($it->rewind(); $it->valid(); $it->next()) {
    $v = $it->current();
    $k = $it->key();
    /* ... */
}

对于内部类,通过使用本质上只是在 C 级别镜像 Iterator 接口的内部 API,避免了实际的方法调用.

For internal classes, actual method calls are avoided by using an internal API that essentially just mirrors the Iterator interface on the C level.

数组和普通对象的迭代要复杂得多.首先,需要注意的是,在 PHP 中,数组"是真正有序的字典,它们将按照这个顺序遍历(只要你没有使用诸如 sort).这与按键的自然顺序(其他语言中的列表通常如何工作)或根本没有定义的顺序(其他语言中的字典通常如何工作)进行迭代相反.

Iteration of arrays and plain objects is significantly more complicated. First of all, it should be noted that in PHP "arrays" are really ordered dictionaries and they will be traversed according to this order (which matches the insertion order as long as you didn't use something like sort). This is opposed to iterating by the natural order of the keys (how lists in other languages often work) or having no defined order at all (how dictionaries in other languages often work).

这同样适用于对象,因为对象属性可以被视为另一个(有序的)字典将属性名称映射到它们的值,加上一些可见性处理.在大多数情况下,对象属性实际上并没有以这种相当低效的方式存储.但是,如果您开始迭代一个对象,通常使用的打包表示将转换为真正的字典.在这一点上,普通对象的迭代变得非常类似于数组的迭代(这就是为什么我不会在这里过多讨论普通对象迭代).

The same also applies to objects, as the object properties can be seen as another (ordered) dictionary mapping property names to their values, plus some visibility handling. In the majority of cases, the object properties are not actually stored in this rather inefficient way. However, if you start iterating over an object, the packed representation that is normally used will be converted to a real dictionary. At that point, iteration of plain objects becomes very similar to iteration of arrays (which is why I'm not discussing plain-object iteration much in here).

到目前为止,一切都很好.迭代字典不会太难,对吧?当您意识到数组/对象可以在迭代过程中发生变化时,问题就开始了.发生这种情况的方式有多种:

So far, so good. Iterating over a dictionary can't be too hard, right? The problems begin when you realize that an array/object can change during iteration. There are multiple ways this can happen:

  • 如果您使用 foreach ($arr as &$v) 按引用进行迭代,则 $arr 将转换为引用,您可以在迭代期间更改它.
  • 在 PHP 5 中,即使您按值迭代也是如此,但该数组是事先引用的:$ref =&$arr;foreach ($ref as $v)
  • 对象具有通过句柄传递的语义,对于大多数实际目的而言,这意味着它们的行为类似于引用.因此,在迭代期间始终可以更改对象.
  • If you iterate by reference using foreach ($arr as &$v) then $arr is turned into a reference and you can change it during iteration.
  • In PHP 5 the same applies even if you iterate by value, but the array was a reference beforehand: $ref =& $arr; foreach ($ref as $v)
  • Objects have by-handle passing semantics, which for most practical purposes means that they behave like references. So objects can always be changed during iteration.

在迭代期间允许修改的问题是您当前所在的元素被删除的情况.假设您使用一个指针来跟踪您当前所在的数组元素.如果这个元素现在被释放,你会留下一个悬空指针(通常会导致段错误).

The problem with allowing modifications during iteration is the case where the element you are currently on is removed. Say you use a pointer to keep track of which array element you are currently at. If this element is now freed, you are left with a dangling pointer (usually resulting in a segfault).

解决这个问题有不同的方法.PHP 5 和 PHP 7 在这方面有很大不同,我将在下面描述这两种行为.总而言之,PHP 5 的方法相当愚蠢,会导致各种奇怪的边缘情况问题,而 PHP 7 的更多参与方法导致更可预测和一致的行为.

There are different ways of solving this issue. PHP 5 and PHP 7 differ significantly in this regard and I'll describe both behaviors in the following. The summary is that PHP 5's approach was rather dumb and lead to all kinds of weird edge-case issues, while PHP 7's more involved approach results in more predictable and consistent behavior.

作为最后的初步,应该注意的是,PHP 使用引用计数和写时复制来管理内存.这意味着如果你复制"一个值,你实际上只是重用旧值并增加它的引用计数(refcount).只有在您执行某种修改后,才会完成真正的副本(称为复制").请参阅您被欺骗以获得更广泛的信息关于这个话题的介绍.

As a last preliminary, it should be noted that PHP uses reference counting and copy-on-write to manage memory. This means that if you "copy" a value, you actually just reuse the old value and increment its reference count (refcount). Only once you perform some kind of modification a real copy (called a "duplication") will be done. See You're being lied to for a more extensive introduction on this topic.

PHP 5 中的数组有一个专用的内部数组指针"(IAP),它正确支持修改:每当删除一个元素时,都会检查 IAP 是否指向该元素.如果是,则改为前进到下一个元素.

Arrays in PHP 5 have one dedicated "internal array pointer" (IAP), which properly supports modifications: Whenever an element is removed, there will be a check whether the IAP points to this element. If it does, it is advanced to the next element instead.

虽然 foreach 确实使用了 IAP,但还有一个额外的问题:只有一个 IAP,但一个数组可以是多个 foreach 循环的一部分:

While foreach does make use of the IAP, there is an additional complication: There is only one IAP, but one array can be part of multiple foreach loops:

// Using by-ref iteration here to make sure that it's really
// the same array in both loops and not a copy
foreach ($arr as &$v1) {
    foreach ($arr as &$v) {
        // ...
    }
}

为了支持只有一个内部数组指针的两个同时循环,foreach 执行以下诡计:在循环体执行之前,foreach 将备份一个指向数组指针的指针当前元素及其散列到 per-foreach HashPointer.循环体运行后,如果仍然存在,IAP 将被设置回该元素.但是,如果该元素已被删除,我们将只使用 IAP 当前所在的位置.这个方案大体上有点作用,但是你可以摆脱它的很多奇怪的行为,我将在下面演示其中的一些.

To support two simultaneous loops with only one internal array pointer, foreach performs the following shenanigans: Before the loop body is executed, foreach will back up a pointer to the current element and its hash into a per-foreach HashPointer. After the loop body runs, the IAP will be set back to this element if it still exists. If however the element has been removed, we'll just use wherever the IAP is currently at. This scheme mostly-kinda-sort of works, but there's a lot of weird behavior you can get out of it, some of which I'll demonstrate below.

IAP 是数组的一个可见特性(通过 current 函数系列公开),因为在写时复制语义下,对 IAP 的更改算作修改.不幸的是,这意味着 foreach 在许多情况下被迫复制它正在迭代的数组.准确的条件是:

The IAP is a visible feature of an array (exposed through the current family of functions), as such changes to the IAP count as modifications under copy-on-write semantics. This, unfortunately, means that foreach is in many cases forced to duplicate the array it is iterating over. The precise conditions are:

  1. 数组不是引用(is_ref=0).如果是引用,则应该传播对它的更改,因此不应复制.
  2. 该数组的引用计数>1.如果 refcount 为 1,则数组不共享,我们可以直接修改它.
  1. The array is not a reference (is_ref=0). If it's a reference, then changes to it are supposed to propagate, so it should not be duplicated.
  2. The array has refcount>1. If refcount is 1, then the array is not shared and we're free to modify it directly.

如果数组没有重复(is_ref=0,refcount=1),那么只有它的refcount 会增加(*).此外,如果使用 foreach 引用,那么(可能重复的)数组将变成引用.

If the array is not duplicated (is_ref=0, refcount=1), then only its refcount will be incremented (*). Additionally, if foreach by reference is used, then the (potentially duplicated) array will be turned into a reference.

将此代码视为发生重复的示例:

Consider this code as an example where duplication occurs:

function iterate($arr) {
    foreach ($arr as $v) {}
}

$outerArr = [0, 1, 2, 3, 4];
iterate($outerArr);

这里,$arr 将被复制,以防止 $arr 上的 IAP 更改泄漏到 $outerArr.就上述条件而言,数组不是引用(is_ref=0),而是用在两个地方(refcount=2).这个要求很不幸,并且是次优实现的产物(这里没有迭代过程中的修改问题,所以我们实际上不需要首先使用 IAP).

Here, $arr will be duplicated to prevent IAP changes on $arr from leaking to $outerArr. In terms of the conditions above, the array is not a reference (is_ref=0) and is used in two places (refcount=2). This requirement is unfortunate and an artifact of the suboptimal implementation (there is no concern of modification during iteration here, so we don't really need to use the IAP in the first place).

(*) 增加 refcount 在这里听起来无害,但违反了写时复制 (COW) 语义:这意味着我们将修改 refcount=2 数组的 IAP,而COW 规定只能对 refcount=1 值执行修改.这种违规会导致用户可见的行为更改(而 COW 通常是透明的),因为迭代数组上的 IAP 更改将是可观察到的——但仅限于对数组进行第一次非 IAP 修改.相反,三个有效"选项将是 a) 始终重复,b) 不增加 refcount 从而允许在循环中任意修改迭代数组或 c) 不完全使用 IAP(PHP 7 解决方案).

(*) Incrementing the refcount here sounds innocuous, but violates copy-on-write (COW) semantics: This means that we are going to modify the IAP of a refcount=2 array, while COW dictates that modifications can only be performed on refcount=1 values. This violation results in user-visible behavior change (while a COW is normally transparent) because the IAP change on the iterated array will be observable -- but only until the first non-IAP modification on the array. Instead, the three "valid" options would have been a) to always duplicate, b) do not increment the refcount and thus allowing the iterated array to be arbitrarily modified in the loop or c) don't use the IAP at all (the PHP 7 solution).

为了正确理解下面的代码示例,您必须注意最后一个实现细节.循环遍历某些数据结构的正常"方式在伪代码中看起来像这样:

There is one last implementation detail that you have to be aware of to properly understand the code samples below. The "normal" way of looping through some data structure would look something like this in pseudocode:

reset(arr);
while (get_current_data(arr, &data) == SUCCESS) {
    code();
    move_forward(arr);
}

然而,foreach,作为一个相当特殊的雪花,选择做一些稍微不同的事情:

However foreach, being a rather special snowflake, chooses to do things slightly differently:

reset(arr);
while (get_current_data(arr, &data) == SUCCESS) {
    move_forward(arr);
    code();
}

也就是说,数组指针在循环体运行之前已经向前移动了.这意味着当循环体在元素 $i 上工作时,IAP 已经在元素 $i+1 上.这就是为什么在迭代期间显示修改的代码示例总是 unset next 元素,而不是当前元素的原因.

Namely, the array pointer is already moved forward before the loop body runs. This means that while the loop body is working on element $i, the IAP is already at element $i+1. This is the reason why code samples showing modification during iteration will always unset the next element, rather than the current one.

上述三个方面应该让您对 foreach 实现的特性有一个大致完整的印象,我们可以继续讨论一些示例.

The three aspects described above should provide you with a mostly complete impression of the idiosyncrasies of the foreach implementation and we can move on to discuss some examples.

此时您的测试用例的行为很容易解释:

The behavior of your test cases is simple to explain at this point:

  • 在测试用例 1 和 2 中,$array 以 refcount=1 开始,所以它不会被 foreach 复制:只有 refcount 递增.当循环体随后修改数组(在该点具有 refcount=2)时,将在该点发生重复.Foreach 将继续处理 $array 的未修改副本.

  • In test cases 1 and 2 $array starts off with refcount=1, so it will not be duplicated by foreach: Only the refcount is incremented. When the loop body subsequently modifies the array (which has refcount=2 at that point), the duplication will occur at that point. Foreach will continue working on an unmodified copy of $array.

在测试用例 3 中,数组再次没有重复,因此 foreach 将修改 $array 变量的 IAP.迭代结束时,IAP为NULL(表示迭代完成),each通过返回false表示.

In test case 3, once again the array is not duplicated, thus foreach will be modifying the IAP of the $array variable. At the end of the iteration, the IAP is NULL (meaning iteration has done), which each indicates by returning false.

在测试用例 4 和 5 中,eachreset 都是按引用函数.$array 在传递给它们时有一个 refcount=2,所以它必须被复制.因此 foreach 将再次处理一个单独的数组.

In test cases 4 and 5 both each and reset are by-reference functions. The $array has a refcount=2 when it is passed to them, so it has to be duplicated. As such foreach will be working on a separate array again.

展示各种重复行为的一个好方法是观察 current() 函数在 foreach 循环内的行为.考虑这个例子:

A good way to show the various duplication behaviors is to observe the behavior of the current() function inside a foreach loop. Consider this example:

foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 2 2 2 2 2 */

这里你应该知道 current() 是一个 by-ref 函数(实际上是:prefer-ref),即使它不修改数组.它必须与所有其他函数(如 next)一起使用,这些函数都是 by-ref.按引用传递意味着数组必须分开,因此 $arrayforeach-array 将是不同的.上面也提到了你得到 2 而不是 1 的原因:foreach 推进数组指针 before 运行用户代码,而不是之后.所以即使代码位于第一个元素,foreach 已经将指针移到第二个元素.

Here you should know that current() is a by-ref function (actually: prefer-ref), even though it does not modify the array. It has to be in order to play nice with all the other functions like next which are all by-ref. By-reference passing implies that the array has to be separated and thus $array and the foreach-array will be different. The reason you get 2 instead of 1 is also mentioned above: foreach advances the array pointer before running the user code, not after. So even though the code is at the first element, foreach already advanced the pointer to the second.

现在让我们尝试一个小的修改:

Now lets try a small modification:

$ref = &$array;
foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 2 3 4 5 false */

这里我们有 is_ref=1 的情况,所以数组没有被复制(就像上面一样).但是现在它是一个引用,传递给 by-ref current() 函数时不再需要复制数组.因此 current()foreach 在同一个数组上工作.不过,由于 foreach 使指针前进的方式,您仍然会看到逐一行为.

Here we have the is_ref=1 case, so the array is not copied (just like above). But now that it is a reference, the array no longer has to be duplicated when passing to the by-ref current() function. Thus current() and foreach work on the same array. You still see the off-by-one behavior though, due to the way foreach advances the pointer.

在进行 by-ref 迭代时您会得到相同的行为:

You get the same behavior when doing by-ref iteration:

foreach ($array as &$val) {
    var_dump(current($array));
}
/* Output: 2 3 4 5 false */

这里重要的是foreach在通过引用迭代时会让$array成为is_ref=1,所以基本上你的情况和上面一样.

Here the important part is that foreach will make $array an is_ref=1 when it is iterated by reference, so basically you have the same situation as above.

另一个小的变化,这次我们将数组分配给另一个变量:

Another small variation, this time we'll assign the array to another variable:

$foo = $array;
foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 1 1 1 1 1 */

这里 $array 的引用计数在循环开始时为 2,因此我们实际上必须预先进行复制.因此 $array 和 foreach 使用的数组从一开始就完全分开.这就是为什么您可以在循环之前的任何位置获取 IAP 的位置(在本例中它位于第一个位置).

Here the refcount of the $array is 2 when the loop is started, so for once we actually have to do the duplication upfront. Thus $array and the array used by foreach will be completely separate from the outset. That's why you get the position of the IAP wherever it was before the loop (in this case it was at the first position).

试图解释迭代期间的修改是我们所有 foreach 问题的根源,因此它有助于考虑这种情况的一些示例.

Trying to account for modifications during iteration is where all our foreach troubles originated, so it serves to consider some examples for this case.

考虑在同一个数组上的这些嵌套循环(其中使用 by-ref 迭代来确保它确实是相同的):

Consider these nested loops over the same array (where by-ref iteration is used to make sure it really is the same one):

foreach ($array as &$v1) {
    foreach ($array as &$v2) {
        if ($v1 == 1 && $v2 == 1) {
            unset($array[1]);
        }
        echo "($v1, $v2)
";
    }
}

// Output: (1, 1) (1, 3) (1, 4) (1, 5)

这里的预期部分是输出中缺少 (1, 2) 因为元素 1 被删除了.可能出乎意料的是,外循环在第一个元素之后停止.这是为什么?

The expected part here is that (1, 2) is missing from the output because element 1 was removed. What's probably unexpected is that the outer loop stops after the first element. Why is that?

这背后的原因是上面描述的嵌套循环黑客:在循环体运行之前,当前的 IAP 位置和散列被备份到 HashPointer.在循环体之后,它将被恢复,但前提是元素仍然存在,否则将使用当前 IAP 位置(无论它是什么).在上面的示例中,情况正是如此:外循环的当前元素已被删除,因此它将使用已被内循环标记为完成的 IAP!

The reason behind this is the nested-loop hack described above: Before the loop body runs, the current IAP position and hash is backed up into a HashPointer. After the loop body it will be restored, but only if the element still exists, otherwise the current IAP position (whatever it may be) is used instead. In the example above this is exactly the case: The current element of the outer loop has been removed, so it will use the IAP, which has already been marked as finished by the inner loop!

HashPointer 备份+恢复机制的另一个后果是通过reset() 等对IAP 的更改通常不会影响foreach.例如,下面的代码就好像 reset() 根本不存在一样执行:

Another consequence of the HashPointer backup+restore mechanism is that changes to the IAP through reset() etc. usually do not impact foreach. For example, the following code executes as if the reset() were not present at all:

$array = [1, 2, 3, 4, 5];
foreach ($array as &$value) {
    var_dump($value);
    reset($array);
}
// output: 1, 2, 3, 4, 5

原因是,在reset()临时修改IAP的同时,会在循环体之后恢复到当前的foreach元素.要强制 reset() 对循环产生影响,您必须额外删除当前元素,以便备份/恢复机制失败:

The reason is that, while reset() temporarily modifies the IAP, it will be restored to the current foreach element after the loop body. To force reset() to make an effect on the loop, you have to additionally remove the current element, so that the backup/restore mechanism fails:

$array = [1, 2, 3, 4, 5];
$ref =& $array;
foreach ($array as $value) {
    var_dump($value);
    unset($array[1]);
    reset($array);
}
// output: 1, 1, 3, 4, 5

但是,这些例子仍然是理智的.如果您还记得 HashPointer 恢复使用指向元素及其散列的指针来确定它是否仍然存在,那么真正的乐趣就开始了.但是:哈希有冲突,指针可以重用!这意味着,通过仔细选择数组键,我们可以让 foreach 相信被删除的元素仍然存在,所以它会直接跳转到它.一个例子:

But, those examples are still sane. The real fun starts if you remember that the HashPointer restore uses a pointer to the element and its hash to determine whether it still exists. But: Hashes have collisions, and pointers can be reused! This means that, with a careful choice of array keys, we can make foreach believe that an element that has been removed still exists, so it will jump directly to it. An example:

$array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
$ref =& $array;
foreach ($array as $value) {
    unset($array['EzFY']);
    $array['FYFY'] = 4;
    reset($array);
    var_dump($value);
}
// output: 1, 4

根据前面的规则,这里我们通常应该期望输出 1, 1, 3, 4 .发生的情况是 'FYFY' 与删除的元素 'EzFY' 具有相同的哈希值,并且分配器碰巧重用相同的内存位置来存储元素.所以foreach最终直接跳转到新插入的元素,从而缩短了循环.

Here we should normally expect the output 1, 1, 3, 4 according to the previous rules. How what happens is that 'FYFY' has the same hash as the removed element 'EzFY', and the allocator happens to reuse the same memory location to store the element. So foreach ends up directly jumping to the newly inserted element, thus short-cutting the loop.

我想提到的最后一个奇怪的情况是,PHP 允许您在循环期间替换迭代的实体.所以你可以开始迭代一个数组,然后在中途用另一个数组替换它.或者开始迭代一个数组,然后用一个对象替换它:

One last odd case that I'd like to mention, it is that PHP allows you to substitute the iterated entity during the loop. So you can start iterating on one array and then replace it with another array halfway through. Or start iterating on an array and then replace it with an object:

$arr = [1, 2, 3, 4, 5];
$obj = (object) [6, 7, 8, 9, 10];

$ref =& $arr;
foreach ($ref as $val) {
    echo "$val
";
    if ($val == 3) {
        $ref = $obj;
    }
}
/* Output: 1 2 3 6 7 8 9 10 */

正如您在本例中所看到的,PHP 将在替换发生后从头开始迭代另一个实体.

As you can see in this case PHP will just start iterating the other entity from the start once the substitution has happened.

如果你还记得的话,数组迭代的主要问题是如何处理迭代中元素的移除.为此,PHP 5 使用了单个内部数组指针 (IAP),这在某种程度上是次优的,因为必须拉伸一个数组指针以支持多个同时进行的 foreach 循环reset() 的交互 等等.

If you still remember, the main problem with array iteration was how to handle removal of elements mid-iteration. PHP 5 used a single internal array pointer (IAP) for this purpose, which was somewhat suboptimal, as one array pointer had to be stretched to support multiple simultaneous foreach loops and interaction with reset() etc. on top of that.

PHP 7 使用了不同的方法,即它支持创建任意数量的外部安全哈希表迭代器.这些迭代器必须在数组中注册,从那时起它们具有与 IAP 相同的语义:如果删除了一个数组元素,则指向该元素的所有哈希表迭代器将前进到下一个元素.

PHP 7 uses a different approach, namely, it supports creating an arbitrary amount of external, safe hashtable iterators. These iterators have to be registered in the array, from which point on they have the same semantics as the IAP: If an array element is removed, all hashtable iterators pointing to that element will be advanced to the next element.

这意味着 foreach 将不再使用 IAP.foreach 循环对 current() 等的结果绝对没有影响,它自己的行为永远不会受到像 reset()

This means that foreach will no longer use the IAP at all. The foreach loop will be absolutely no effect on the results of current() etc. and its own behavior will never be influenced by functions like reset() etc.

PHP 5 和 PHP 7 之间的另一个重要变化与数组重复有关.现在不再使用 IAP,在所有情况下,按值数组迭代只会执行 refcount 增量(而不是复制数组).如果在 foreach 循环期间修改了数组,那么此时将发生重复(根据写时复制)并且 foreach 将继续处理旧数组.

Another important change between PHP 5 and PHP 7 relates to array duplication. Now that the IAP is no longer used, by-value array iteration will only do a refcount increment (instead of duplication the array) in all cases. If the array is modified during the foreach loop, at that point a duplication will occur (according to copy-on-write) and foreach will keep working on the old array.

在大多数情况下,这种变化是透明的,除了更好的性能之外没有其他影响.但是,有一种情况会导致不同的行为,即数组事先是引用的情况:

In most cases, this change is transparent and has no other effect than better performance. However, there is one occasion where it results in different behavior, namely the case where the array was a reference beforehand:

$array = [1, 2, 3, 4, 5];
$ref = &$array;
foreach ($array as $val) {
    var_dump($val);
    $array[2] = 0;
}
/* Old output: 1, 2, 0, 4, 5 */
/* New output: 1, 2, 3, 4, 5 */

以前引用数组的按值迭代是特殊情况.在这种情况下,没有发生重复,因此迭代期间数组的所有修改都将反映在循环中.在 PHP 7 中,这种特殊情况消失了:数组的按值迭代将总是继续处理原始元素,而忽略循环期间的任何修改.

Previously by-value iteration of reference-arrays was special cases. In this case, no duplication occurred, so all modifications of the array during iteration would be reflected by the loop. In PHP 7 this special case is gone: A by-value iteration of an array will always keep working on the original elements, disregarding any modifications during the loop.

这当然不适用于按引用迭代.如果您通过引用进行迭代,则所有修改都将通过循环反映出来.有趣的是,对于普通对象的按值迭代也是如此:

This, of course, does not apply to by-reference iteration. If you iterate by-reference all modifications will be reflected by the loop. Interestingly, the same is true for by-value iteration of plain objects:

$obj = new stdClass;
$obj->foo = 1;
$obj->bar = 2;
foreach ($obj as $val) {
    var_dump($val);
    $obj->bar = 42;
}
/* Old and new output: 1, 42 */

这反映了对象的 by-handle 语义(即,即使在 by-value 上下文中,它们的行为也类似于引用).

This reflects the by-handle semantics of objects (i.e. they behave reference-like even in by-value contexts).

让我们考虑几个示例,从您的测试用例开始:

Let's consider a few examples, starting with your test cases:

  • 测试用例 1 和 2 保留相同的输出:按值数组迭代始终在原始元素上工作.(在这种情况下,即使 refcounting 和重复行为在 PHP 5 和 PHP 7 之间也完全相同.

  • Test cases 1 and 2 retain the same output: By-value array iteration always keep working on the original elements. (In this case, even refcounting and duplication behavior is exactly the same between PHP 5 and PHP 7).

测试用例 3 更改:Foreach 不再使用 IAP,因此 each() 不受循环影响.前后输出相同.

Test case 3 changes: Foreach no longer uses the IAP, so each() is not affected by the loop. It will have the same output before and after.

测试用例 4 和 5 保持不变:each()reset() 将在更改 IAP 之前复制数组,而 foreach 仍然使用原始数组.(即使阵列已共享,IAP 更改也不重要.)

Test cases 4 and 5 stay the same: each() and reset() will duplicate the array before changing the IAP, while foreach still uses the original array. (Not that the IAP change would have mattered, even if the array was shared.)

第二组示例与current() 在不同reference/refcounting 配置下的行为有关.这不再有意义,因为 current() 完全不受循环影响,因此它的返回值始终保持不变.

The second set of examples was related to the behavior of current() under different reference/refcounting configurations. This no longer makes sense, as current() is completely unaffected by the loop, so its return value always stays the same.

然而,在考虑迭代期间的修改时,我们会得到一些有趣的变化.我希望你会发现新的行为更理智.第一个例子:

However, we get some interesting changes when considering modifications during iteration. I hope you will find the new behavior saner. The first example:

$array = [1, 2, 3, 4, 5];
foreach ($array as &$v1) {
    foreach ($array as &$v2) {
        if ($v1 == 1 && $v2 == 1) {
            unset($array[1]);
        }
        echo "($v1, $v2)
";
    }
}

// Old output: (1, 1) (1, 3) (1, 4) (1, 5)
// New output: (1, 1) (1, 3) (1, 4) (1, 5)
//             (3, 1) (3, 3) (3, 4) (3, 5)
//             (4, 1) (4, 3) (4, 4) (4, 5)
//             (5, 1) (5, 3) (5, 4) (5, 5)

如您所见,外循环在第一次迭代后不再中止.原因是两个循环现在都有完全独立的哈希表迭代器,并且两个循环不再通过共享 IAP 产生任何交叉污染.

As you can see, the outer loop no longer aborts after the first iteration. The reason is that both loops now have entirely separate hashtable iterators, and there is no longer any cross-contamination of both loops through a shared IAP.

现在修复的另一个奇怪的边缘情况是,当您删除和添加碰巧具有相同哈希值的元素时会得到奇怪的效果:

Another weird edge case that is fixed now, is the odd effect you get when you remove and add elements that happen to have the same hash:

$array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
foreach ($array as &$value) {
    unset($array['EzFY']);
    $array['FYFY'] = 4;
    var_dump($value);
}
// Old output: 1, 4
// New output: 1, 3, 4

以前 HashPointer 恢复机制直接跳转到新元素,因为它看起来"与删除的元素相同(由于哈希和指针冲突).因为我们不再依赖元素哈希来做任何事情,所以这不再是一个问题.

Previously the HashPointer restore mechanism jumped right to the new element because it "looked" like it's the same as the removed element (due to colliding hash and pointer). As we no longer rely on the element hash for anything, this is no longer an issue.

这篇关于PHP“foreach"实际上是如何工作的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-23 16:03
查看更多