Do Intel 3rd-generation (and later) CPUs execute this code incorrectly?

Problem description

My current understanding is that in some cases (when heavy YMM reads/writes occur) 2nd-generation Intel CPUs execute them improperly; when each YMM register access is replaced by the corresponding four QWORD accesses, the code works. The test case:

/*
; 'Tsubame' decompression loop, 96-15+6=135 bytes long, 40 instructions:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140";
; mark_description "-TP -O3 -QxSSE4.1 -D_N_YMM -D_N_prefetch_4096 -D_N_HIGH_PRIORITY -FAcs";

.B16.3::
  00015 41 0f 18 8b 00
        10 00 00         prefetcht0 BYTE PTR [4096+r11]
  0001d 41 8b 13         mov edx, DWORD PTR [r11]
  00020 89 d1            mov ecx, edx
  00022 83 e1 03         and ecx, 3
  00025 75 34            jne .B16.7
.B16.4::
  00027 0f b6 d2         movzx edx, dl
  0002a 85 d2            test edx, edx
  0002c 74 0a            je .B16.6
.B16.5::
  0002e c4 c1 7e 6f 43
        01               vmovdqu ymm0, YMMWORD PTR [1+r11]
  00034 c5 fe 7f 00      vmovdqu YMMWORD PTR [rax], ymm0
.B16.6::
  00038 89 d1            mov ecx, edx
  0003a 41 b9 01 00 00
        00               mov r9d, 1
  00040 ba 00 00 00 00   mov edx, 0
  00045 41 0f 44 d1      cmove edx, r9d
  00049 c1 e9 03         shr ecx, 3
  0004c c1 e2 04         shl edx, 4
  0004f 03 d1            add edx, ecx
  00051 ff c1            inc ecx
  00053 48 03 c2         add rax, rdx
  00056 4c 03 d9         add r11, rcx
  00059 eb 38            jmp .B16.8
.B16.7::
  0005b c1 e1 03         shl ecx, 3
  0005e 41 b9 ff ff ff
        ff               mov r9d, -1
  00064 41 d3 e9         shr r9d, cl
  00067 44 23 ca         and r9d, edx
  0006a 83 e2 0c         and edx, 12
  0006d 41 c1 e9 04      shr r9d, 4
  00071 f7 da            neg edx
  00073 83 c2 10         add edx, 16
  00076 49 f7 d9         neg r9
  00079 4c 03 c8         add r9, rax
  0007c c1 e9 03         shr ecx, 3
  0007f f7 d9            neg ecx
  00081 83 c1 04         add ecx, 4
  00084 c4 c1 7e 6f 01   vmovdqu ymm0, YMMWORD PTR [r9]
  00089 c5 fe 7f 00      vmovdqu YMMWORD PTR [rax], ymm0
  0008d 48 03 c2         add rax, rdx
  00090 4c 03 d9         add r11, rcx
.B16.8::
  00093 4d 3b d8         cmp r11, r8
  00096 0f 82 79 ff ff
        ff               jb .B16.3
*/





Since I only have a Core 2 and an i5-2540M, I cannot check whether this decompression function works properly on 3rd-generation (3???) and later Intel CPUs, so I am asking someone to run this command line and report whether it prints 'FAILED':

D:\Tsubame\buggy_AVX_compile>Nakamichi_Tsubame_YMM_PREFETCH_4096_Intel_15.0_64bit_SSE41.exe alice29.txt
Nakamichi 'Tsubame', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Note: Conor Stokes' LZSSE2(FASTEST Textual Decompressor) is embedded, all credits along with many thanks go to him.
Limitation: Uncompressed 8192 MB of filesize.
Current priority class is HIGH_PRIORITY_CLASS.
Allocating Source-Buffer 0 MB ...
Allocating Target-Buffer 32 MB ...
Allocating Verification-Buffer 0 MB ...
Compressing 152,089 bytes ...
-; Each rotation means 64KB are encoded; Done 100%
NumberOfFullLiterals (lower-the-better): 4
NumberOf(Tiny)Matches[Tiny]Window (4): 157
NumberOf(Short)Matches[Tiny]Window (8): 52
NumberOf(Medium)Matches[Tiny]Window (12): 11
RAM-to-RAM performance: 11 KB/s.
Compressed to 73,071 bytes.
Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x1366,78ee
Target-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x8cec,be70
Decompressing 73,071 (being the compressed stream) bytes ...
RAM-to-RAM performance: 1152 MB/s.
Verification (input and output sizes match) OK.
Verification (input and output blocks mismatch) FAILED!





The command line that interests me:

D:\Tsubame\buggy_AVX_compile>Nakamichi_Tsubame_YMM_PREFETCH_4096_Intel_15.0_64bit_SSE41.exe alice29.txt








The test suite is a 241 KB zip file containing the executables, the source, and the test data file.

I asked the same question on Intel's forum; sadly, no one seems to care:

YMMWORD != 4xQWORD

What I have tried:

Laptop Toshiba i5-2540M, Windows 7, Intel C Optimizer v15.0

Recommended answer




