This article looks at the question "Is C++ considered a Von Neumann programming language?" and walks through an answer that should be a useful reference for anyone wondering the same thing.

Problem description

The term Von Neumann languages is applied to programming languages whose computational model is based on the Von Neumann computer architecture.

  • Is C++ considered a Von Neumann language, or if it's not (e.g., because of asynchronous execution with the advent of threads) was it ever considered a Von Neumann language?
  • Is there an architecture that C++'s computational model/abstract machine is based on and thus can be classified as a language of that architecture?

Solution

TL;DR: The C++ abstract machine is a type of PRAM (Parallel Random Access Machine).


The Von Neumann Languages Wikipedia article you linked observes that languages such as C, C++ and Java have "ceased to be strictly von Neumann" by adding support for threads.

Cease describes a transition from being to not-being. So yes, before C++11 added threads, C++ was strictly a Von Neumann language according to Wikipedia. (And after, it's still basically a VN language; having multiple threads sharing the same address-space doesn't fundamentally change how C++ works.)

The interesting parts of being a Von Neumann architecture in this context:

  • Having addressable RAM at all, allowing efficient access (modulo cache / paging) to any object at any time
  • Storing the program in RAM: function pointers are possible and efficient, without requiring an interpreter (a short sketch follows this list)
  • Having a program counter that steps through instructions in the stored program: The natural model is an imperative programming language that does one thing at a time. This is so fundamental that it's easy to forget it's not the only model! (vs. an FPGA or ASIC or something where all gates potentially do something in parallel every clock cycle. Or a MIMD GPU where a computational "kernel" you write is run over all the data potentially in parallel, without implicit sequencing of what order each element is processed in. Or Computational RAM: put ALUs in the memory chips to bypass the Von Neumann bottleneck)
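
To make the second bullet concrete, here is a minimal sketch (ordinary standard C++, not code from the original answer): a function's address is just a value you can store, reassign, and call through, precisely because the program sits in the same address space as the data.

    #include <cstdio>

    int add(int a, int b) { return a + b; }
    int sub(int a, int b) { return a - b; }

    int main() {
        // A function pointer is just an address into the stored program,
        // so an indirect call is cheap; no interpreter needed.
        int (*op)(int, int) = add;
        std::printf("%d\n", op(2, 3));   // prints 5
        op = sub;                        // re-point it at different code
        std::printf("%d\n", op(2, 3));   // prints -1
    }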

IDK why the wiki article mentions self-modifying code, though; like most languages, ISO C++ doesn't standardize that and is fully compatible with ahead-of-time compilation for a split-bus / split-address-space Harvard architecture. (No eval or anything else that would require an interpreter or JIT.) Or on a normal CPU (Von Neumann), strict W^X memory protection and never using mprotect to change page permissions from writeable to executable.

Of course most real C++ implementations do provide well-defined ways to write machine-code into a buffer and cast to a function pointer, as extensions. (e.g. GNU C/C++'s __builtin___clear_cache(start, end) is named for I-cache sync, but defined in terms of making it safe to call data as a function wrt. dead-store elimination optimizations as well, so it's possible for code to break without it even on x86 which has coherent I-caches.) So implementations can extend ISO C++ to take advantage of this feature of Von Neumann architectures; ISO C++ is intentionally limited in scope to allow for differences between OSes and stuff like that.
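
As a hedged illustration of that kind of extension (this sketch assumes a POSIX system that permits writable+executable pages, an x86-64 target where the hard-coded bytes encode mov eax, 42 then ret, and the GNU __builtin___clear_cache builtin mentioned above; none of it is portable ISO C++):

    #include <cstdio>
    #include <cstring>
    #include <sys/mman.h>   // POSIX mmap/munmap

    int main() {
        // x86-64 machine code for:  mov eax, 42 ; ret
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        // Ask the OS for memory we may both write and execute.
        void* buf = mmap(nullptr, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;

        std::memcpy(buf, code, sizeof(code));

        // GNU extension: tell the compiler (and, on non-x86 targets, the hardware)
        // that the bytes just stored are about to be executed.
        __builtin___clear_cache(static_cast<char*>(buf),
                                static_cast<char*>(buf) + sizeof(code));

        auto fn = reinterpret_cast<int (*)()>(buf);
        std::printf("%d\n", fn());   // prints 42

        munmap(buf, sizeof(code));
    }

On an OS that enforces strict W^X you would instead map the page read+write, copy the bytes, then mprotect it to read+execute before the call; either way it relies on exactly the Von Neumann "code is data" capability the paragraph above describes.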

Note that being Von Neumann does not strictly imply supporting indirect addressing modes. Some early CPUs didn't, and self-modifying code (to rewrite an address hard-coded in an instruction) was necessary to implement things that we now use indirection for.

Also note that John Von Neumann was a really famous guy, with his name attached to a lot of fundamental things. Some of the connotations of Von Neumann architecture (as opposed to Harvard) aren't really relevant in all contexts. e.g. the "Von Neumann language" term doesn't so much care about Von Neumann vs. Harvard; it cares about stored-program with a program counter vs. something like Cellular Automata or a Turing machine (with a real tape). Getting extra bandwidth by using a separate bus (or just split caches) to fetch instructions (Harvard) is just a performance optimization, not a fundamental change.


What is an abstract machine model / model of computation anyway?

First of all, there are some models of computation that are weaker than Turing machines, like Finite State Machines. There are also non-sequential models of computation, for example Cellular Automata (Conway's Game of Life), where multiple things happen in parallel at each "step".

The Turing machine is the most widely-known (and mathematically simple) sequential abstract machine that is as "strong" as we know how to make. Without any kind of absolute memory addressing, just relative movement on the tape, it naturally provides infinite storage. This is important, and makes all other kinds of abstract machines very unlike real CPUs in some ways. Remember, these models of computation are used for theoretical computer science, not engineering. Problems like finite amounts of memory or performance aren't relevant to what's computable in theory, only in practice.
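
As a toy illustration (a C++17 sketch with a made-up rule table, not anything from the answer itself): a Turing-machine stepper only ever moves the head by a relative step, and the tape grows on demand, bounded in practice only by real memory, which is exactly the theory-vs-engineering gap just mentioned.

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    // move: +1 = right, -1 = left, 0 = stay
    struct Rule { char write; int move; std::string next; };

    int main() {
        // A machine that flips 0s and 1s until it reads a blank ('_').
        std::map<std::pair<std::string, char>, Rule> delta = {
            {{"scan", '0'}, {'1', +1, "scan"}},
            {{"scan", '1'}, {'0', +1, "scan"}},
            {{"scan", '_'}, {'_',  0, "halt"}},
        };

        std::map<long, char> tape = {{0, '1'}, {1, '0'}, {2, '1'}};  // sparse tape, blank elsewhere
        long head = 0;
        std::string state = "scan";

        while (state != "halt") {
            char sym = tape.count(head) ? tape[head] : '_';
            Rule r = delta.at({state, sym});
            tape[head] = r.write;
            head += r.move;              // relative movement only, no absolute addresses
            state = r.next;
        }
        for (auto& [pos, c] : tape)
            if (c != '_') std::cout << c;    // prints 010 (the input 101 with each bit flipped)
        std::cout << '\n';
    }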

If you can compute something on a Turing machine, you can compute it on any other Turing-complete model of computation (by definition), perhaps with a much simpler program or perhaps not. Turing machines aren't very nice to program, or at least very different from assembly language for any real CPU. Most notably, the memory isn't random-access. And they can't easily model parallel computing / algorithms. (If you want to prove things about an algorithm in the abstract, having an implementation of it for an abstract machine of some sort is probably a good thing.)

It's also potentially interesting to prove what features an abstract machine needs to have in order to be Turing complete, so that's another motivation for developing more of them.

There are many other models of computation that are equivalent in terms of computability. The RAM machine model is most like real-world CPUs that have an array of memory. But being a simple abstract machine, it doesn't bother with registers. In fact, just to make things more confusing, it calls its memory cells an array of registers. A RAM machine supports indirect addressing, so the correct analogy to real world CPUs is definitely to memory, not CPU registers. (And there are an unbounded number of registers, each of unbounded size. Addresses keep going forever and every "register" needs to be able to hold a pointer.) A RAM machine can be Harvard: program stored in a separate finite-state portion of the machine. Think of it like a machine with memory-indirect addressing modes so you can keep "variables" in known locations, and use some of them as pointers to unbounded-size data structures.

The program for an abstract RAM machine looks like assembly language, with load/add/jnz and whatever other selection of instructions you want it to have. The operands can be immediates or register numbers (what normal people would call absolute addresses). Or if the model has an accumulator, then you have a load/store machine with an accumulator, a lot more like a real CPU.
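
Here is a hedged sketch of such a RAM abstract machine written as a tiny interpreter; the particular instruction set (load-immediate, add, indirect load/store, jnz) is one arbitrary choice of the "whatever other selection of instructions you want" above, not something any standard prescribes.

    #include <cstdio>
    #include <vector>

    // One arbitrary RAM-machine flavour: "registers" are just numbered memory
    // cells, and indirect addressing (LOADI/STOREI) is what makes pointers work.
    enum Op { LI, ADD, LOADI, STOREI, JNZ, HALT };

    struct Insn { Op op; long a, b, c; };

    void run(const std::vector<Insn>& prog, std::vector<long>& r) {
        for (long pc = 0; prog[pc].op != HALT; ) {            // the program counter
            const Insn& i = prog[pc++];
            switch (i.op) {
                case LI:     r[i.a] = i.b;               break;  // r[a] = constant b
                case ADD:    r[i.a] = r[i.b] + r[i.c];   break;  // r[a] = r[b] + r[c]
                case LOADI:  r[i.a] = r[r[i.b]];         break;  // r[a] = *r[b]  (indirect)
                case STOREI: r[r[i.a]] = r[i.b];         break;  // *r[a] = r[b]  (indirect)
                case JNZ:    if (r[i.a] != 0) pc = i.b;  break;  // conditional jump
                case HALT:                               break;
            }
        }
    }

    int main() {
        // Sum the three values stored at cells 10..12 by walking a "pointer" in r1.
        std::vector<long> r(32, 0);
        r[10] = 5; r[11] = 7; r[12] = 9;

        std::vector<Insn> prog = {
            {LI,     1, 10, 0},   // r1 = 10  (pointer to the array)
            {LI,     2,  3, 0},   // r2 = 3   (element count)
            {LI,     3,  0, 0},   // r3 = 0   (running sum)
            {LI,     4,  1, 0},   // r4 = 1   (constant one)
            {LOADI,  5,  1, 0},   // r5 = *r1
            {ADD,    3,  3, 5},   // r3 += r5
            {ADD,    1,  1, 4},   // r1 += 1  (advance the pointer)
            {LI,     6, -1, 0},   // r6 = -1
            {ADD,    2,  2, 6},   // r2 -= 1
            {JNZ,    2,  4, 0},   // loop back to the LOADI while r2 != 0
            {HALT,   0,  0, 0},
        };

        run(prog, r);
        std::printf("%ld\n", r[3]);   // prints 21
    }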

If you ever wondered why a "3-address" machine like MIPS was called that instead of 3-operand, it's probably 1. because the instruction encoding needs room / I-fetch bandwidth through the Von Neumann bottleneck for 3 explicit operand locations (register number) and 2. because in a RAM abstract machine, operands are memory addresses = register numbers.


C++ can't be Turing complete: pointers have a finite size.

Of course, C++ has huge differences from a CS abstract machine model: C++ requires every type to have a compile-time-constant finite sizeof, so C++ can't be Turing-complete if you include the infinite-storage requirement. Everything in Is C actually Turing-complete? on cs.SE applies to C++, too: the requirement that types have a fixed width is a showstopper for infinite storage. See also https://en.wikipedia.org/wiki/Random-access_machine#Finite_vs_unbounded
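
A small illustration of that finiteness argument (the exact numbers are whatever your implementation defines; the point is only that sizeof is a compile-time constant, so the set of distinct pointer values is bounded at compile time):

    #include <climits>
    #include <cstdio>

    int main() {
        // sizeof is a constant expression, so it can feed static_assert:
        static_assert(sizeof(void*) >= 1, "every type has a fixed, finite size");

        // A pointer has exactly sizeof(void*) * CHAR_BIT bits, which bounds the
        // number of distinct addresses a program can ever name, unlike a RAM
        // abstract machine whose "registers" hold arbitrarily large addresses.
        constexpr unsigned long long pointer_bits = sizeof(void*) * CHAR_BIT;
        std::printf("a pointer here has %llu bits, so at most 2^%llu distinct values\n",
                    pointer_bits, pointer_bits);
    }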


So Computer Science abstract machines are silly, what about the C++ Abstract machine?

They of course have their purposes, but there's a lot more interesting stuff we can say about C++ and what kind of machine it assumes if we get a bit less abstract and also talk about what a machine can do efficiently. Once we talk about finite machines and performance, these differences become relevant.

First, to run C++ at all, and second, to run without huge and/or unacceptable performance overheads. (e.g. the HW will need to support pointers fairly directly, probably not with self-modifying code that stores the pointer value into every load/store instruction that uses it. And that wouldn't work in C++11 where threading is part of the language: the same code can be operating on 2 different pointers at once.)
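
A minimal sketch of that last point: the same compiled function can be live on two threads with two different pointer values at the same instant, so the pointer has to live in data (a register or memory cell per thread) rather than be patched into the instruction bytes.

    #include <cstdio>
    #include <thread>

    // One function, compiled once, but at run time it executes simultaneously
    // with two different values of `p`; the pointer can't be stored in the code.
    void fill(int* p, int v, int n) {
        for (int i = 0; i < n; ++i) p[i] = v;
    }

    int main() {
        int a[4], b[4];
        std::thread t1(fill, a, 1, 4);
        std::thread t2(fill, b, 2, 4);
        t1.join();
        t2.join();
        std::printf("%d %d\n", a[0], b[0]);   // prints 1 2
    }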

We can look in more detail at the model of computation assumed by the ISO C++ standard, which describes how the language works in terms of what happens on the Abstract Machine. Real implementations are required to run code on real hardware that runs "as-if" the abstract machine were executing C++ source, reproducing any/all observable behaviour (observable by other parts of the program without invoking UB).

C/C++ has memory and pointers, so it's pretty definitely a type of RAM machine.

Or these days, a Parallel random-access machine, adding shared memory to the RAM model, and giving each thread its own program counter. Given that std::atomic<> release-sequences make all previous operations visible to other threads, the "establishing a happens-before relationship" model of synchronization is based around coherent shared memory. Emulating it on top of something that required manual triggering of syncing / flushing would be horrible for performance. (Very clever optimizations may prove when that can be delayed so not every release-store has to suffer, but seq-cst will probably be horrible. seq-cst has to establish a global order of operations that all threads agree on; that's hard unless a store becomes visible to all other threads at the same time.)
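
A hedged sketch of that happens-before machinery using plain standard std::atomic (nothing implementation-specific): the release store publishes every write sequenced before it, and the acquire load in the other thread is what establishes the ordering.

    #include <atomic>
    #include <cstdio>
    #include <thread>

    int payload = 0;                       // ordinary non-atomic data
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;                                   // (1) plain store
        ready.store(true, std::memory_order_release);   // (2) publishes (1)
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}   // (3) pairs with (2)
        std::printf("%d\n", payload);   // (1) happens-before this read: prints 42
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }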

But note that in C++, actual simultaneous access is UB unless you do it with atomic<T>. This allows the optimizer to freely use CPU registers for locals, temporaries, and even globals without exposing registers as a language feature. UB allows optimization in general; that's why modern C/C++ implementations are not portable assembly language.
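
One concrete consequence, as a compile-only sketch (actually racing on the plain flag would be UB, and whether the hoist happens depends on your compiler and optimization level): a non-atomic flag polled in a loop may legally be kept in a register, while the atomic version must be re-loaded every iteration.

    #include <atomic>

    bool plain_stop = false;
    std::atomic<bool> atomic_stop{false};

    void spin_plain() {
        // If another thread writes plain_stop, that's a data race (UB), so the
        // optimizer may load it once, keep it in a register, and effectively
        // compile this into `while (true) {}`.
        while (!plain_stop) {}
    }

    void spin_atomic() {
        // The atomic load has to be performed each iteration, so a store from
        // another thread is eventually observed.
        while (!atomic_stop.load(std::memory_order_relaxed)) {}
    }

Looking at the generated assembly for these two functions at -O2 is a quick way to see the difference most mainstream compilers make.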

The historical register keyword in C/C++ means a variable can't have its address taken, so even a non-optimizing compiler can keep it in a CPU register, not memory. We're talking about CPU registers, not the computer science RAM Machine "register = addressable memory location". (Like rax..rsp/r8..r15 on x86, or r0..r31 on MIPS). Modern compilers do escape analysis and naturally keep locals in registers normally, unless they have to spill them. Other types of CPU registers are possible, e.g. a register-stack like x87 FP registers. Anyway, the register keyword existed to optimize for this type of machine. But it doesn't rule out running on a machine with no registers, only memory-memory instructions.

C++ is designed to run well on a Von Neumann machine with CPU registers, but the C++ abstract machine (that the standard uses to define the language) doesn't allow execution of data as code, or say anything about registers. Each C++ thread does have its own execution context, though, and that models PRAM threads/cores each having their own program counter and callstack (or whatever an implementation uses for automatic storage, and for figuring out where to return.) In a real machine with CPU registers, they're private to each thread.

All real-world CPUs are Random Access Machines, and have CPU registers separate from addressable / indexable RAM. Even CPUs that can only compute with a single accumulator register typically have at least one pointer or index register that at least allows some limited array indexing. At least all CPUs that work well as C compiler targets.

Without registers, every machine instruction encoding would need absolute memory addresses for all operands. (Maybe like a 6502 where the "zero page", the low 256 bytes of memory, was special, and there are addressing modes that use a word from the zero page as the index or pointer, to allow 16-bit pointers without any 16-bit architectural registers. Or something like that.) See Why do C to Z80 compilers produce poor code? on RetroComputing.SE for some interesting stuff about real-world 8-bit CPUs where a fully compliant C implementation (supporting recursion and reentrancy) is quite expensive to implement. A lot of the slowness is that 6502 / Z80 systems were too small to host an optimizing compiler. But even a hypothetical modern optimizing cross-compiler (like a gcc or LLVM back-end) would have a hard time with some things. See also a recent answer on What is an unused memory address? for a nice explanation of 6502's zero-page indexed addressing mode: 16-bit pointer from an absolute 8-bit address in memory + 8-bit register.

A machine without indirect addressing at all couldn't easily support array indexing, linked lists, and definitely not pointer variables as first-class objects. (Not efficiently anyway)


What's efficient on real machines -> what idioms are natural

Most of C's early history was on PDP-11, which is a normal mem + register machine where any register can work as a pointer. Automatic storage maps to registers, or to space on the callstack when they need to be spilled. Memory is a flat array of bytes (or chunks of char), no segmentation.

Array indexing is just defined in terms of pointer arithmetic instead of being its own thing perhaps because PDP-11 could do that efficiently: any register can hold an address and be dereferenced. (vs. some machines with only a couple special registers of pointer width, and the rest narrower. That was common on an 8-bit machine, but early 16-bit machines like PDP-11 had little enough RAM that one 16-bit register was enough for an address).
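
That definition is still visible directly in the language: a[i] is specified as *(a + i) once the array has decayed to a pointer, which is also why the odd-looking 2[arr] compiles. A quick check:

    #include <cstdio>

    int main() {
        int arr[4] = {10, 20, 30, 40};
        int* p = arr;                     // array-to-pointer decay

        // All four expressions denote the same element, by definition:
        std::printf("%d %d %d %d\n", arr[2], *(arr + 2), *(p + 2), 2[arr]);
        // prints: 30 30 30 30
    }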

See Dennis Ritchie's article The Development of the C Language for more history; C grew out of B on PDP-7 Unix. (The first Unix was written in PDP-7 asm). I don't know much about PDP-7, but apparently BCPL and B also use pointers that are just integers, and arrays are based on pointer-arithmetic.

PDP-7 is an 18-bit word-addressable ISA. That's probably why B has no char type. But its registers are wide enough to hold pointers so it does naturally support B and C's pointer model (that pointers aren't really special, you can copy them around and deref them, and you can take the address of anything). So flat memory model, no "special" area of memory like you find on segmented machines or some 8-bit micros with a zero page.

Things like C99 VLAs (and unlimited size local variables) and unlimited reentrancy and recursion imply a callstack or other allocation mechanism for function local-variable context (aka stack frames on a normal machine that uses a stack pointer.)
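
A small sketch of why (the function name here is made up for illustration): each active call has its own depth argument, its own local buffer, and its own place to return to, and the recursion depth isn't known until run time, so some per-activation storage (stack frames on an ordinary machine with a stack pointer) is unavoidable.

    #include <cstdio>

    // Every activation keeps its own copy of `local`; the number of live
    // activations is decided at run time, so the storage can't be static.
    void count_down(int depth) {
        char local[32];
        std::snprintf(local, sizeof local, "frame for depth %d", depth);
        if (depth > 0) count_down(depth - 1);   // many activations alive at once
        std::puts(local);                       // each frame still has its own copy
    }

    int main() {
        count_down(3);   // prints the four frames' messages, innermost first
    }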

That wraps up this article on whether C++ is considered a Von Neumann programming language. We hope the answer above is a useful reference, and thank you for your support!
