问题描述
我要为我的C#应用程序非常高的分辨率计时器。我想访问RDTSC汇编指令。有没有办法做到这一点。
I want a very high resolution timer for my C# application. I'd like to access the RDTSC assembly instruction. Is there a way to do this?
编辑:我移植一些C ++代码,并试图保留相同的功能原件。我可以切换到更多的东西。NET,但需要评估RDTSC指令,所以我可以比较结果到原来的。
I am porting some C++ code and trying to retain the same functionality as the original. I may switch to something more .NET, but want to evaluate the RDTSC instruction so I can compare results to the original.
推荐答案
这里是你如何能做到这一点:
Here is how you can do it:
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;
public static class Rdtsc
{
[StructLayout(LayoutKind.Sequential)]
private struct SystemInfo
{
public ushort wProcessorArchitecture;
public ushort wReserved;
public uint dwPageSize;
public IntPtr lpMinimumApplicationAddress;
public IntPtr lpMaximumApplicationAddress;
public IntPtr dwActiveProcessorMask;
public uint dwNumberOfProcessors;
public uint dwProcessorType;
public uint dwAllocationGranularity;
public ushort wProcessorLevel;
public ushort wProcessorRevision;
}
[DllImport("kernel32.dll", ExactSpelling = true)]
private static extern void GetNativeSystemInfo(out SystemInfo lpSystemInfo);
[DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
private static extern IntPtr VirtualAlloc(IntPtr lpAddress, IntPtr dwSize, uint flAllocationType, uint flProtect);
[DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool VirtualProtect(IntPtr lpAddress, IntPtr dwSize, uint flAllocationType, out uint lpflOldProtect);
[DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool VirtualFree(IntPtr lpAddress, IntPtr dwSize, uint dwFreeType);
private const uint PAGE_READWRITE = 0x04;
private const uint PAGE_EXECUTE = 0x10;
private const uint MEM_COMMIT = 0x1000;
private const uint MEM_RELEASE = 0x8000;
[SuppressUnmanagedCodeSecurity]
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate ulong TimestampDelegate();
public static readonly TimestampDelegate Timestamp;
static Rdtsc()
{
SystemInfo systemInfo;
GetNativeSystemInfo(out systemInfo);
if (systemInfo.wProcessorArchitecture != 0 /* PROCESSOR_ARCHITECTURE_INTEL */ &&
systemInfo.wProcessorArchitecture != 9 /* PROCESSOR_ARCHITECTURE_AMD64 */)
{
// Fallback for ARM/IA64/...
Timestamp = StopwatchGetTimestamp;
return;
}
byte[] body;
if (Environment.Is64BitProcess)
{
body = new byte[]
{
0x0f, 0x31, // rdtsc
0x48, 0xc1, 0xe2, 0x20, // shl rdx,20h
0x48, 0x0b, 0xc2, // or rax,rdx
0xc3, // ret
};
}
else
{
body = new byte[]
{
0x0f, 0x31, // rdtsc
0xc3, // ret
};
}
IntPtr buf = IntPtr.Zero;
try
{
// We VirtualAlloc body.Length bytes, with R/W access
// Note that from what I've read, MEM_RESERVE is useless
// if the first parameter is IntPtr.Zero
buf = VirtualAlloc(IntPtr.Zero, (IntPtr)body.Length, MEM_COMMIT, PAGE_READWRITE);
if (buf == IntPtr.Zero)
{
throw new Win32Exception();
}
// Copy our instructions in the buf
Marshal.Copy(body, 0, buf, body.Length);
// Change the access of the allocated memory from R/W to Execute
uint oldProtection;
bool result = VirtualProtect(buf, (IntPtr)body.Length, PAGE_EXECUTE, out oldProtection);
if (!result)
{
throw new Win32Exception();
}
// Create a delegate to the "function"
Timestamp = (TimestampDelegate)Marshal.GetDelegateForFunctionPointer(buf, typeof(TimestampDelegate));
buf = IntPtr.Zero;
}
finally
{
// There was an error!
if (buf != IntPtr.Zero)
{
// Free the allocated memory
bool result = VirtualFree(buf, IntPtr.Zero, MEM_RELEASE);
if (!result)
{
throw new Win32Exception();
}
}
}
}
// Fallback if rdtsc isn't available
private static ulong StopwatchGetTimestamp()
{
return unchecked((ulong)Stopwatch.GetTimestamp());
}
}
一些注意事项:
Some notes:
- 我已经包含了ARM处理器(
Stopwatch.GetTimestamp()
)。 $ b备用$ b - 我的不可以使用CPUID停止被移动时,该RDTSC指令(的)。我不强劲,装配,所以我不知道该怎么做。如果要修改代码,请随时做到这一点,在正确的操作码添加注释使用
- 我使用RDTSC和不可以 RDTSCP(同样的问题,装配不我的语言)
- 是非常慢......比方说,等效的Visual C ++代码是没有所谓的内联18-24蜱RDTSC,而C#版本,初始预热后,被称为在RDTSC的27-100蜱。
- I've included a fallback for ARM processors (
Stopwatch.GetTimestamp()
). - I'm not using CPUID to stop the RDTSC instruction from being moved around ("cpuid" before "rdtsc"). I'm not strong with assembly, so I don't know how to do it. If you want to modify the code, feel free to do it and add a comment on the "right" OpCodes to use
- I'm using RDTSC and not RDTSCP (same problem, assembly isn't my language)
- It is very slow... Let's say that the equivalent Visual C++ code is called without inlining in 18-24 ticks of RDTSC, while the C# version, after the initial warmup, is called in 27-100 ticks of RDTSC.
Visual C ++的比较代码:
The Visual C++ comparison code:
__declspec(noinline) uint64_t __stdcall Rdtsc(void)
{
return __rdtsc();
}
Aaaaand现在... 全面落实与rdtscp
public static class Rdtsc
{
[StructLayout(LayoutKind.Sequential)]
private struct SystemInfo
{
public ushort wProcessorArchitecture;
public ushort wReserved;
public uint dwPageSize;
public IntPtr lpMinimumApplicationAddress;
public IntPtr lpMaximumApplicationAddress;
public IntPtr dwActiveProcessorMask;
public uint dwNumberOfProcessors;
public uint dwProcessorType;
public uint dwAllocationGranularity;
public ushort wProcessorLevel;
public ushort wProcessorRevision;
}
[DllImport("kernel32.dll", ExactSpelling = true)]
private static extern void GetNativeSystemInfo(out SystemInfo lpSystemInfo);
[DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
private static extern IntPtr VirtualAlloc(IntPtr lpAddress, IntPtr dwSize, uint flAllocationType, uint flProtect);
[DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool VirtualProtect(IntPtr lpAddress, IntPtr dwSize, uint flAllocationType, out uint lpflOldProtect);
[DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool VirtualFree(IntPtr lpAddress, IntPtr dwSize, uint dwFreeType);
private const uint PAGE_READWRITE = 0x04;
private const uint PAGE_EXECUTE = 0x10;
private const uint PAGE_EXECUTE_READWRITE = 0x40;
private const uint MEM_COMMIT = 0x1000;
private const uint MEM_RELEASE = 0x8000;
[SuppressUnmanagedCodeSecurity]
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate ulong FuncUInt64();
/// <summary>
/// Uses rdtsc. On non-Intel uses Stopwatch.GetTimestamp.
/// </summary>
public static readonly FuncUInt64 Timestamp;
/// <summary>
/// Uses rdtscp if present. Otherwise uses cpuid + rdtsc. On
/// non-Intel uses Stopwatch.GetTimestamp.
/// </summary>
public static readonly FuncUInt64 TimestampP;
public static readonly bool IsRdtscSupported;
public static readonly bool IsRdtscPSupported;
static Rdtsc()
{
SystemInfo systemInfo;
GetNativeSystemInfo(out systemInfo);
if (systemInfo.wProcessorArchitecture != 0 /* PROCESSOR_ARCHITECTURE_INTEL */ &&
systemInfo.wProcessorArchitecture != 9 /* PROCESSOR_ARCHITECTURE_AMD64 */)
{
// Fallback for ARM/IA64/...
Timestamp = StopwatchGetTimestamp;
TimestampP = StopwatchGetTimestamp;
IsRdtscSupported = false;
IsRdtscPSupported = false;
return;
}
byte[] cpuid, rdtsc, rdtscp, rdtsccpuid;
IsRdtscSupported = true;
// Assembly generated with https://defuse.ca/online-x86-assembler.htm
if (Environment.Is64BitProcess)
{
/* CPUID x64:
push rbx;
mov eax, 0x80000000;
cpuid;
mov ebx, 0x80000001;
cmp eax, ebx;
jb Error;
mov eax, ebx;
cpuid;
mov eax, ecx;
shl rax, 0x20;
or rax, rdx
jmp End;
Error:
xor rax, rax;
End:
pop rbx;
ret;
0: 53 push rbx
1: b8 00 00 00 80 mov eax,0x80000000
6: 0f a2 cpuid
8: bb 01 00 00 80 mov ebx,0x80000001
d: 39 d8 cmp eax,ebx
f: 72 0f jb 20 <Error>
11: 89 d8 mov eax,ebx
13: 0f a2 cpuid
15: 89 c8 mov eax,ecx
17: 48 c1 e0 20 shl rax,0x20
1b: 48 09 d0 or rax,rdx
1e: eb 03 jmp 23 <End>
0000000000000020 <Error>:
20: 48 31 c0 xor rax,rax
0000000000000023 <End>:
23: 5b pop rbx
24: c3 ret
*/
cpuid = new byte[] { 0x53, 0xB8, 0x00, 0x00, 0x00, 0x80, 0x0F, 0xA2, 0xBB, 0x01, 0x00, 0x00, 0x80, 0x39, 0xD8, 0x72, 0x16, 0x89, 0xD8, 0x48, 0xC7, 0xC2, 0xFF, 0xFF, 0xFF, 0xFF, 0x0F, 0xA2, 0x89, 0xC8, 0x48, 0xC1, 0xE0, 0x20, 0x48, 0x09, 0xD0, 0xEB, 0x03, 0x48, 0x31, 0xC0, 0x5B, 0xC3 };
/* RDTSC x64:
rdtsc;
shl rdx, 0x20;
or rax,rdx;
ret;
0: 0f 31 rdtsc
2: 48 c1 e2 20 shl rdx,0x20
6: 48 09 d0 or rax,rdx
9: c3 ret
*/
rdtsc = new byte[] { 0x0F, 0x31, 0x48, 0xC1, 0xE2, 0x20, 0x48, 0x09, 0xD0, 0xC3 };
/* RDTSCP x64
rdtscp;
shl rdx, 0x20;
or rax, rdx;
ret;
0: 0f 01 f9 rdtscp
3: 48 c1 e2 20 shl rdx,0x20
7: 48 09 d0 or rax,rdx
a: c3 ret
*/
rdtscp = new byte[] { 0x0F, 0x01, 0xF9, 0x48, 0xC1, 0xE2, 0x20, 0x48, 0x09, 0xD0, 0xC3 };
/* RDTSC + CPUID x64
push rbx;
xor eax, eax;
cpuid;
rdtsc;
shl rdx, 0x20;
or rax, rdx;
pop rbx;
ret;
0: 53 push rbx
1: 31 c0 xor eax,eax
3: 0f a2 cpuid
5: 0f 31 rdtsc
7: 48 c1 e2 20 shl rdx,0x20
b: 48 09 d0 or rax,rdx
e: 5b pop rbx
f: c3 ret
*/
rdtsccpuid = new byte[] { 0x53, 0x31, 0xC0, 0x0F, 0xA2, 0x0F, 0x31, 0x48, 0xC1, 0xE2, 0x20, 0x48, 0x09, 0xD0, 0x5B, 0xC3 };
}
else
{
/* CPUID x86:
push ebx;
mov eax, 0x80000000;
cpuid;
mov ebx, 0x80000001;
cmp eax, ebx;
jb Error;
mov eax, ebx;
cpuid;
mov eax, edx;
mov edx, ecx;
jmp End;
Error:
xor eax, eax;
xor edx, edx;
End:
pop ebx;
ret;
0: 53 push ebx
1: b8 00 00 00 80 mov eax,0x80000000
6: 0f a2 cpuid
8: bb 01 00 00 80 mov ebx,0x80000001
d: 39 d8 cmp eax,ebx
f: 72 0a jb 1b <Error>
11: 89 d8 mov eax,ebx
13: 0f a2 cpuid
15: 89 d0 mov eax,edx
17: 89 ca mov edx,ecx
19: eb 04 jmp 1f <End>
0000001b <Error>:
1b: 31 c0 xor eax,eax
1d: 31 d2 xor edx,edx
0000001f <End>:
1f: 5b pop ebx
20: c3 ret
*/
cpuid = new byte[] { 0x53, 0xB8, 0x00, 0x00, 0x00, 0x80, 0x0F, 0xA2, 0xBB, 0x01, 0x00, 0x00, 0x80, 0x39, 0xD8, 0x72, 0x0A, 0x89, 0xD8, 0x0F, 0xA2, 0x89, 0xD0, 0x89, 0xCA, 0xEB, 0x04, 0x31, 0xC0, 0x31, 0xD2, 0x5B, 0xC3 };
/* RDTSC x86:
rdtsc;
ret;
0: 0f 31 rdtsc
2: c3 ret
*/
rdtsc = new byte[] { 0x0F, 0x31, 0xC3 };
/* RDTSCP x86
rdtscp;
ret;
0: 0f 01 f9 rdtscp
3: c3 ret
*/
rdtscp = new byte[] { 0x0F, 0x01, 0xF9, 0xC3 };
/* RDTSC + CPUID x86
push ebx;
xor eax,eax;
cpuid;
rdtsc;
pop ebx;
ret;
0: 53 push ebx
1: 31 c0 xor eax,eax
3: 0f a2 cpuid
5: 0f 31 rdtsc
7: 5b pop ebx
8: c3 ret
*/
rdtsccpuid = new byte[] { 0x53, 0x31, 0xC0, 0x0F, 0xA2, 0x0F, 0x31, 0x5B, 0xC3 };
}
IntPtr buf = IntPtr.Zero;
try
{
// We pad the functions to 64 bytes (the length of a cache
// line on the Intel processors)
int cpuidLength = (cpuid.Length & 63) != 0 ? (cpuid.Length | 63) + 1 : cpuid.Length;
int rdtscLength = (rdtsc.Length & 63) != 0 ? (rdtsc.Length | 63) + 1 : rdtsc.Length;
int rdtscpLength = (rdtscp.Length & 63) != 0 ? (rdtscp.Length | 63) + 1 : rdtscp.Length;
int rdtsccpuidLength = (rdtsccpuid.Length & 63) != 0 ? (rdtsccpuid.Length | 63) + 1 : rdtsccpuid.Length;
// We don't know which one of rdtscp or rdtsccpuid we will
// use, so we calculate space for the biggest one.
// Note that it is very unlikely that we will go over 4096
// bytes (the minimum size of memory allocated by
// VirtualAlloc)
int totalLength = cpuidLength + rdtscLength + Math.Max(rdtscpLength, rdtsccpuidLength);
// We VirtualAlloc totalLength bytes, with R/W access
// Note that from what I've read, MEM_RESERVE is useless
// if the first parameter is IntPtr.Zero
buf = VirtualAlloc(IntPtr.Zero, (IntPtr)totalLength, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (buf == IntPtr.Zero)
{
throw new Win32Exception();
}
// Copy cpuid instructions in the buf
Marshal.Copy(cpuid, 0, buf, cpuid.Length);
for (int i = cpuid.Length; i < cpuidLength; i++)
{
Marshal.WriteByte(buf, i, 0x90); // nop
}
// Copy rdtsc instructions in the buf
Marshal.Copy(rdtsc, 0, buf + cpuidLength, rdtsc.Length);
for (int i = rdtsc.Length; i < rdtscLength; i++)
{
Marshal.WriteByte(buf, cpuidLength + i, 0x90); // nop
}
var cpuidFunc = (FuncUInt64)Marshal.GetDelegateForFunctionPointer(buf, typeof(FuncUInt64));
// We use cpuid, EAX=0x80000001 to check for the rdtscp
ulong supportedFeatures = cpuidFunc();
byte[] rdtscpSelected;
int rdtscpSelectedLength;
// Check the rdtscp flag
if ((supportedFeatures & (1L << 27)) != 0)
{
// rdtscp supported
rdtscpSelected = rdtscp;
rdtscpSelectedLength = rdtscpLength;
IsRdtscPSupported = true;
}
else
{
// rdtscp not supported. We use cpuid + rdtsc
rdtscpSelected = rdtsccpuid;
rdtscpSelectedLength = rdtsccpuidLength;
IsRdtscPSupported = false;
}
// Copy rdtscp/rdtsccpuid instructions in the buf
Marshal.Copy(rdtscpSelected, 0, buf + cpuidLength + rdtscLength, rdtscpSelected.Length);
for (int i = rdtscpSelected.Length; i < rdtscpSelectedLength; i++)
{
Marshal.WriteByte(buf, cpuidLength + rdtscLength + i, 0x90); // nop
}
// Change the access of the allocated memory from R/W to Execute
uint oldProtection;
bool result = VirtualProtect(buf, (IntPtr)totalLength, PAGE_EXECUTE, out oldProtection);
if (!result)
{
throw new Win32Exception();
}
// Create a delegate to the "function"
Timestamp = (FuncUInt64)Marshal.GetDelegateForFunctionPointer(buf + cpuidLength, typeof(FuncUInt64));
TimestampP = (FuncUInt64)Marshal.GetDelegateForFunctionPointer(buf + cpuidLength + rdtscLength, typeof(FuncUInt64));
buf = IntPtr.Zero;
}
finally
{
// There was an error!
if (buf != IntPtr.Zero)
{
// Free the allocated memory
bool result = VirtualFree(buf, IntPtr.Zero, MEM_RELEASE);
if (!result)
{
throw new Win32Exception();
}
}
}
}
// Fallback if rdtsc isn't available. We can't use directly
// Stopwatch.GetTimestamp() because the return type is different.
private static ulong StopwatchGetTimestamp()
{
return unchecked((ulong)Stopwatch.GetTimestamp());
}
}
有更长......有两种方法
It is much longer... There are two methods,
ulong ts1 = Rdtsc.Timestamp();
ulong ts2 = Rdtsc.TimestampP();
第一个使用 RDTSC
,而第二个用途 rdtscp
。 rdtscp
比 RDTSC
更好,因为它不是在管道中重新排序。在 TimestampP
方法人有对老年人的处理器,使用 CPUID + RDTSC
回退,但后备相当慢。对于这两个有使用非Intel / AMD处理器备用的 Stopwatch.GetTimestamp()
。在内部类使用 CPUID
指令来检查 rdtscp
指令的存在。有两个字段, IsRdtscSupported
和 IsRdtscPSupported
,告诉处理器是否支持 RDTSC
和 rdtscp
。
The first one uses rdtsc
, while the second one uses rdtscp
. rdtscp
is better than rdtsc
because it isn't reordered in the pipeline. The TimestampP
method one has a fallback for older processors, using cpuid + rdtsc
, but the fallback is quite slower. For both there is a fallback for non-Intel/Amd processors using the Stopwatch.GetTimestamp()
. Internally the class uses the cpuid
instruction to check for the presence of the rdtscp
instruction. There are two fields, IsRdtscSupported
and IsRdtscPSupported
that tells if the processor supports rdtsc
and rdtscp
.
这篇关于有没有一种方法调用从C#的RDTSC汇编指令?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!