Problem description
std::chrono offers several clocks to measure time. At the same time, I guess the only way a CPU can evaluate time is by counting cycles.
Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?
If that is the case, because the way a computer counts cycles will never be as precise as an atomic clock, it means that a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than an actual second, causing differences in the long run for time measurements between the computer clock and, let's say, GPS.
Question 2: Is that correct?
Some hardware has varying frequencies (for example idle mode and turbo modes). In that case, it would mean that the number of cycles would vary during a second.
Question 3: Does the "cycle count" measured by CPUs and GPUs vary depending on the hardware frequency? If yes, then how does std::chrono deal with it? If not, what does a cycle correspond to (like what is the "fundamental" time)? Is there a way to access the conversion at compile time? Is there a way to access the conversion at runtime?
Counting cycles, yes, but cycles of what?
On a modern x86, the timesource used by the kernel (internally and for clock_gettime and other system calls) is typically a fixed-frequency counter that counts "reference cycles" regardless of turbo, power-saving, or clock-stopped idle. (This is the counter you get from rdtsc, or __rdtsc() in C/C++.)
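To make the "reference cycles" idea concrete, here is a rough sketch that reads the counter with __rdtsc() and estimates how many reference ticks elapse per second by comparing against std::chrono::steady_clock. The helper names read_tsc and estimate_tsc_hz are made up for illustration, the include of x86intrin.h assumes GCC or Clang, and on non-x86 targets the sketch just returns 0. rdtsc is not a serializing instruction, so treat the result as an approximation only.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC/Clang (assumed toolchain)
#define HAVE_TSC 1
#elif defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>     // __rdtsc() on MSVC
#define HAVE_TSC 1
#else
#define HAVE_TSC 0
#endif

// Hypothetical helper: read the time-stamp counter, or 0 where there is none.
inline std::uint64_t read_tsc() {
#if HAVE_TSC
    return __rdtsc();
#else
    return 0;
#endif
}

// Hypothetical helper: estimate reference ticks per second by counting
// TSC ticks across an interval timed with steady_clock.
inline double estimate_tsc_hz(std::chrono::milliseconds interval) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const std::uint64_t c0 = read_tsc();
    std::this_thread::sleep_for(interval);
    const std::uint64_t c1 = read_tsc();
    const auto t1 = clock::now();

    const std::chrono::duration<double> secs = t1 - t0;
    if (secs.count() <= 0.0) return 0.0;  // avoid dividing by zero
    return static_cast<double>(c1 - c0) / secs.count();
}
```

Calling estimate_tsc_hz(std::chrono::milliseconds(100)) while the machine is idle and again while it is busy should give roughly the same number on a CPU with a fixed-frequency counter, even though the core clock differs between the two runs.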
Normal std::chrono implementations will use an OS-provided function like clock_gettime on Unix. (On Linux, this can run purely in user-space, code + scale factor data in a VDSO page mapped by the kernel into every process's address space. Low-overhead timesources are nice. Avoiding a user->kernel->user round trip helps a lot with Meltdown + Spectre mitigation enabled.)
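On the compile-time / runtime conversion part of the question: at the std::chrono level, each clock's tick period is a std::ratio that is available at compile time, and durations are converted at runtime with duration_cast. Note that this period describes the units of the clock's representation, not the frequency of whatever hardware counter sits underneath. A minimal sketch (seconds_per_tick and elapsed_ns are illustrative names, not library functions):

```cpp
#include <chrono>
#include <cstdint>

// Seconds represented by one tick of Clock's duration type, computed at
// compile time from Clock::period (a std::ratio).
template <class Clock>
constexpr double seconds_per_tick() {
    return static_cast<double>(Clock::period::num) /
           static_cast<double>(Clock::period::den);
}

// std::chrono::seconds is the duration whose period is std::ratio<1>.
static_assert(std::chrono::seconds::period::num == 1 &&
              std::chrono::seconds::period::den == 1,
              "seconds uses period = std::ratio<1>");

// Runtime use: time a callable with steady_clock and convert the result
// to nanoseconds with duration_cast.
template <class F>
std::int64_t elapsed_ns(F&& work) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    work();
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

For example, seconds_per_tick<std::chrono::steady_clock>() gives the unit that steady_clock::now() counts in, and elapsed_ns([] { /* work */ }) gives a wall-time interval that does not depend on the current core frequency.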
Profiling a tight loop that's not memory bound might want to use actual core clock cycles, so it will be insensitive to the actual speed of the current core. (And doesn't have to worry about ramping up the CPU to max turbo, etc.) e.g. using perf stat ./a.out or perf record ./a.out. For an example, see the question Can x86's MOV really be "free"? Why can't I reproduce this at all?
Some systems didn't / don't have a wall-clock-equivalent counter built right in to the CPU, so either the OS would maintain a time in RAM that it updates on timer interrupts, or time-query functions would read the time from a separate chip.
(System call + hardware I/O = higher overhead, which is part of the reason that x86's rdtsc instruction morphed from a profiling thing into a clocksource thing.)
All of these clock frequencies are ultimately derived from a crystal oscillator on the mobo. But the scale factors to extrapolate time from cycle counts can be adjusted to keep the clock in sync with atomic time, typically using the Network Time Protocol (NTP), as @Tony points out.
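One way to observe that scale-factor adjustment, assuming a Linux-like POSIX system: CLOCK_MONOTONIC is subject to NTP rate adjustment (slewing), while CLOCK_MONOTONIC_RAW is documented as not being adjusted. The sketch below times the same interval with both and reports the difference in parts per million. The helper names now_ns and slew_ppm are invented for this example, and where CLOCK_MONOTONIC_RAW is not defined it simply returns 0. Short intervals are dominated by measurement noise, so only long intervals say anything meaningful.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <time.h>  // POSIX clock_gettime (assumed platform)

// Hypothetical helper: read a POSIX clock as integer nanoseconds.
inline std::int64_t now_ns(clockid_t id) {
    timespec ts{};
    clock_gettime(id, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: how much faster (+) or slower (-) the adjusted clock
// ran than the raw one over the interval, in parts per million.
inline double slew_ppm(std::chrono::milliseconds interval) {
#ifdef CLOCK_MONOTONIC_RAW
    const std::int64_t adj0 = now_ns(CLOCK_MONOTONIC);
    const std::int64_t raw0 = now_ns(CLOCK_MONOTONIC_RAW);
    std::this_thread::sleep_for(interval);
    const std::int64_t adj1 = now_ns(CLOCK_MONOTONIC);
    const std::int64_t raw1 = now_ns(CLOCK_MONOTONIC_RAW);

    const double adj = static_cast<double>(adj1 - adj0);
    const double raw = static_cast<double>(raw1 - raw0);
    if (raw <= 0.0) return 0.0;  // nothing measurable
    return (adj - raw) / raw * 1e6;
#else
    (void)interval;
    return 0.0;  // no raw clock on this platform
#endif
}
```

On a machine that is synced by NTP, slew_ppm(std::chrono::milliseconds(10000)) would typically be a small non-zero number, reflecting the correction applied to the crystal's nominal rate.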