Question
I have an embedded Linux system running on an Atom, which is a new enough CPU to have an invariant TSC (time stamp counter), whose frequency the kernel measures on startup. I use the TSC in my own code to keep time (avoiding kernel calls), and my startup code measures the TSC rate, but I'd rather just use the kernel's measurement. Is there any way to retrieve this from the kernel? It's not in /proc/cpuinfo anywhere.
Recommended answer
BPFtrace
As root, you can retrieve the kernel's TSC rate with bpftrace:
# bpftrace -e 'BEGIN { printf("%u\n", *kaddr("tsc_khz")); exit(); }' | tail -n 1
(tested it on CentOS 7 and Fedora 29)
Alternatively, also as root, you can read it from /proc/kcore, e.g.:
# gdb /dev/null /proc/kcore -ex 'x/uw 0x'$(grep '\<tsc_khz\>' /proc/kallsyms \
| cut -d' ' -f1) -batch 2>/dev/null | tail -n 1 | cut -f2
(tested it on CentOS 7 and Fedora 29)
If the system has neither bpftrace nor gdb available but does have SystemTap, you can get it like this (as root):
# cat tsc_khz.stp
#!/usr/bin/stap -g
function get_tsc_khz() %{ /* pure */
    THIS->__retvalue = tsc_khz;
%}
probe oneshot {
    printf("%u\n", get_tsc_khz());
}
# ./tsc_khz.stp
Of course, you can also write a small kernel module that provides access to tsc_khz via the /sys pseudo file system. Even better, somebody already did that and a tsc_freq_khz module is available on GitHub. With that, the following should work:
# modprobe tsc_freq_khz
$ cat /sys/devices/system/cpu/cpu0/tsc_freq_khz
(tested on Fedora 29, reading the sysfs file doesn't require root)
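Since the question is about using the rate from your own code, here is a minimal C sketch for reading such a file. The helper name read_khz_file is made up for illustration, and it assumes the file holds a single decimal kHz value, as the cat output above suggests.

```c
#include <stdio.h>

/* Hypothetical helper: read one unsigned decimal value (kHz) from a
 * sysfs-style text file. Returns 0 on success and -1 if the file cannot
 * be opened or does not start with a number. */
static int read_khz_file(const char *path, unsigned long *khz)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%lu", khz) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass /sys/devices/system/cpu/cpu0/tsc_freq_khz as the path and fall back to its own calibration when the helper returns -1, e.g. because the module isn't loaded.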
If none of the above is an option, you can parse the TSC rate from the kernel logs. But this gets ugly fast because you see different kinds of messages on different hardware and kernels, e.g. on a Fedora 29 i7 system:
$ journalctl --boot | grep 'kernel: tsc:' -i | cut -d' ' -f5-
kernel: tsc: Detected 2800.000 MHz processor
kernel: tsc: Detected 2808.000 MHz TSC
But on a Fedora 29 Intel Atom just:
kernel: tsc: Detected 2200.000 MHz processor
While on a CentOS 7 i5 system:
kernel: tsc: Fast TSC calibration using PIT
kernel: tsc: Detected 1895.542 MHz processor
kernel: tsc: Refined TSC clocksource calibration: 1895.614 MHz
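To illustrate what that parsing might look like in C, here is a rough sketch that handles only the message variants shown above. The function name parse_tsc_log_line is made up, and the code assumes the frequency is always printed with exactly three decimals, as in these samples. Other kernels may word things differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the frequency in kHz from one log line.
 * Returns 1 for a "Refined TSC clocksource calibration" line, 0 for a
 * "Detected ... MHz" line, and -1 if the line carries no frequency.
 * Assumes the value is printed as <MHz>.<three digits>. */
static int parse_tsc_log_line(const char *line, unsigned long *khz)
{
    static const char *const tags[] = {
        "tsc: Detected ",
        "tsc: Refined TSC clocksource calibration: ",
    };
    for (int i = 0; i < 2; i++) {
        const char *p = strstr(line, tags[i]);
        unsigned long mhz, frac;
        if (p && sscanf(p + strlen(tags[i]), "%lu.%3lu MHz", &mhz, &frac) == 2) {
            *khz = mhz * 1000 + frac;
            return i;
        }
    }
    return -1;
}
```

The return value lets the caller prefer a refined calibration over the initial "Detected" figure when both are present.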
Perf Values
The Linux kernel doesn't provide an API to read the TSC rate yet. But it does provide one for getting the mult and shift values that can be used to convert TSC counts to nanoseconds. Those values are derived from tsc_khz, also in arch/x86/kernel/tsc.c, where tsc_khz is initialized and calibrated, and they are shared with userspace.
Example program that uses the perf API and accesses the shared page:
#include <asm/unistd.h>
#include <inttypes.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open(2), so go through syscall(2) */
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

int main(int argc, char **argv)
{
    /* any event will do, we only need its mmap'ed metadata page */
    struct perf_event_attr pe = {
        .type = PERF_TYPE_HARDWARE,
        .size = sizeof(struct perf_event_attr),
        .config = PERF_COUNT_HW_INSTRUCTIONS,
        .disabled = 1,
        .exclude_kernel = 1,
        .exclude_hv = 1
    };
    /* pid 0, cpu -1: the calling thread on any CPU */
    int fd = perf_event_open(&pe, 0, -1, -1, 0);
    if (fd == -1) {
        perror("perf_event_open failed");
        return 1;
    }
    /* the first page of the mapping is struct perf_event_mmap_page */
    void *addr = mmap(NULL, 4*1024, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { /* mmap signals failure with MAP_FAILED, not NULL */
        perror("mmap failed");
        return 1;
    }
    struct perf_event_mmap_page *pc = addr;
    if (pc->cap_user_time != 1) {
        fprintf(stderr, "Perf system doesn't support user time\n");
        return 1;
    }
    printf("%16s %5s\n", "mult", "shift");
    printf("%16" PRIu32 " %5" PRIu16 "\n", pc->time_mult, pc->time_shift);
    close(fd);
    return 0;
}
Tested on Fedora 29; it also works for non-root users.
Those values can be used to convert a TSC count to nanoseconds with a function like this one:
static uint64_t mul_u64_u32_shr(uint64_t cyc, uint32_t mult, uint32_t shift)
{
    __uint128_t x = cyc;
    x *= mult;
    x >>= shift;
    return x;
}
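If what you want is the rate itself rather than a conversion, the relation can be inverted: since ns = (cycles * mult) >> shift, one cycle lasts mult / 2^shift nanoseconds, so the rate in kHz is about 10^6 * 2^shift / mult. A minimal sketch, with a made-up function name; the result is only as exact as the rounding that went into mult allows:

```c
#include <stdint.h>

/* Hypothetical helper: approximate the TSC rate in kHz from a mult/shift
 * pair, rounding to the nearest kHz. Returns 0 for an unusable mult.
 * Relies on __uint128_t, i.e. GCC or Clang on a 64-bit target. */
static uint64_t tsc_khz_from_mult_shift(uint32_t mult, uint16_t shift)
{
    if (mult == 0)
        return 0;
    __uint128_t num = (__uint128_t)1000000 << shift;
    return (uint64_t)((num + mult / 2) / mult);
}
```

For instance, a mult of 1533916891 with a shift of 32 (illustrative values, not taken from a real machine) corresponds to 2800000 kHz.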
CPUID/MSR
Another way to obtain the TSC rate is to follow DPDK's lead.
DPDK on x86_64 basically uses the following strategy:
- Read the 'Time Stamp Counter and Nominal Core Crystal Clock Information Leaf' via cpuid intrinsics (doesn't require special privileges), if available
- Read it from the MSR (requires the rawio capability and read permissions on /dev/cpu/*/msr), if possible
- Calibrate it in userspace by other means, otherwise
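The first step can be sketched in C as follows. This is not DPDK's code, just a minimal illustration using the helpers from GCC/Clang's cpuid.h, and the function names are made up. Leaf 0x15 reports the TSC/crystal ratio as EBX/EAX and the crystal frequency in Hz in ECX; any of these can be 0 when the CPU doesn't enumerate it, in which case the sketch gives up and returns 0.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* TSC Hz = crystal Hz * numerator / denominator; 0 if any field is missing. */
static uint64_t tsc_hz_from_leaf15(uint32_t denom, uint32_t numer,
                                   uint32_t crystal_hz)
{
    if (denom == 0 || numer == 0 || crystal_hz == 0)
        return 0;
    return (uint64_t)crystal_hz * numer / denom;
}

/* Query CPUID leaf 0x15; returns 0 if the leaf is absent or incomplete
 * (or when not built for x86). */
static uint64_t tsc_hz_from_cpuid(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, 0) < 0x15)
        return 0;
    __cpuid_count(0x15, 0, eax, ebx, ecx, edx);
    (void)edx; /* reserved in this leaf */
    return tsc_hz_from_leaf15(eax, ebx, ecx);
#else
    return 0;
#endif
}
```

A return value of 0 means the caller has to move on to the MSR or to a userspace calibration.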
FWIW, a quick test shows that the cpuid leaf doesn't seem to be that widely available, e.g. an i7 Skylake and a Goldmont Atom don't have it. Otherwise, as can be seen from the DPDK code, using the MSR requires a bunch of intricate case distinctions.
However, in case the program already uses DPDK, getting the TSC rate, getting TSC values or converting TSC values is just a matter of using the right DPDK API.