Problem description

Hi!

On a machine of *given architecture* (in terms of endianness etc.), I
want to access the individual bytes of a long (*once-off*) as fast as
possible.

Is version A, version B, or version C better? Are there other
alternatives?

/**** Version A ******/
{
long mylong = -1;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
(unsigned char) mylong , \
(unsigned char) (mylong >> 8), \
(unsigned char) (mylong >>16), \
(unsigned char) (mylong >>24));
}

/**** Version B ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", f_b[0], f_b[1], f_b[2],
f_b[3]);
}

/**** Version C ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

printf("0x%02x 0x%02x 0x%02x 0x%02x\n", \
four.four_b[0], \
four.four_b[1], \
four.four_b[2], \
four.four_b[3]);
}
My feeling is that Version C is best.

What can be said about the alignment of array f_b and mylong in
Version B?
(I think that in Version B the array f_b and mylong might be aligned
differently, in which case it is slower than C. If f_b and mylong are
aligned the same way in Version B, is Version B then identical to
Version C?)

..
..
..

Now what if one needs to access the individual bytes the *whole time*?
Is A2, B2, C2 or D2 faster?

/**** Version A2 ******/
{
long mylong = -1;
unsigned char b0, b1, b2, b3;

b0 = (unsigned char) mylong;
b1 = (unsigned char) (mylong >> 8);
b2 = (unsigned char) (mylong >>16);
b3 = (unsigned char) (mylong >>24);

// access: b0, b1, b2, b3
}

/**** Version B2 ******/
{
long mylong = -1;
unsigned char f_b[4];

*((long *)&f_b) = mylong;

// access: f_b[0], f_b[1], f_b[2], f_b[3]
}

/**** Version C2 ******/
{
union align_array_and_long {
unsigned char four_b[4];
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_b[0], four.four_b[1], four.four_b[2],
//         four.four_b[3]
}

/**** Version D2 ******/
{
struct four_struct {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};

union align_array_and_long {
struct four_struct four_s;
long dummy;
};

long mylong = -1;
union align_array_and_long four;

four = (union align_array_and_long) mylong;

// access: four.four_s.byte0, four.four_s.byte1,
//         four.four_s.byte2, four.four_s.byte3

}

My feeling is that Version D2 is best: mylong is loaded into four in
one shot (no shifts etc. as in A2).

And in D2 the compiler always knows that we specify exactly which byte
we want:
four.four_s.byte0
This is different in C2: four.four_b[which_byte]
Or is it really different? Are these two equivalent:
four.four_s.byte0 <--> four.four_b[0] ?

..
..
..

Version A and A2 are portable in terms of endianness, but the question
is not about portability - it's about optimization for a given
platform.

Thanks.

anon.asdf

Recommended answers



Measure them and find out.

--
Chris "performance is nothing without measurement" Dollin

Hewlett-Packard Limited registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN




Mu.

Rule one of micro-optimisation:
Don't Do It.
Rule two of micro-optimisation (for experts only!):
Don't Do It Yet.
Rule three of micro-optimisation (only under duress):
Measure, Measure, Measure.

Unless you _know_ that it matters, assume that it doesn't, and write the
clearest code. If you think you do know that it matters, first gather
evidence. Only by measuring which is the fastest will you know which is
the fastest - on your machine, using your implementation, in your
project, under your optimisation settings. And don't be surprised to
find out that you were wrong, and the difference is no more than 0.5%,
with an error of 1%.

Richard
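
As a rough illustration of what "measure" could look like here, the sketch below times a shift-based extraction against a byte-pointer extraction. It is only a sketch: the names sum_shift, sum_bytes and time_it are made up for this example, clock() is a coarse timer, and calling through a function pointer may itself affect the result. The pointer version reads the first four bytes in memory, so the two functions only agree for every input on a little-endian machine.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/* Shift style (as in Version A2), on an unsigned value so the shifts are well defined. */
unsigned sum_shift(unsigned long v)
{
    return (unsigned)(v & 0xFFu) + (unsigned)((v >> 8) & 0xFFu)
         + (unsigned)((v >> 16) & 0xFFu) + (unsigned)((v >> 24) & 0xFFu);
}

/* Pointer style: read the first four bytes of the object representation. */
unsigned sum_bytes(unsigned long v)
{
    const unsigned char *p = (const unsigned char *)&v;
    assert(sizeof v >= 4);  /* assumption: a long occupies at least four bytes */
    return (unsigned)p[0] + p[1] + p[2] + p[3];
}

/* CPU seconds spent calling fn on the values 0 .. iters-1. */
double time_it(unsigned (*fn)(unsigned long), unsigned long iters)
{
    volatile unsigned sink = 0;  /* stops the compiler discarding the results */
    unsigned long i;
    clock_t start = clock();
    for (i = 0; i < iters; i++)
        sink += fn(i);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_it(sum_shift, n) and time_it(sum_bytes, n) from main for a large n, several times over and at the optimisation level the real project uses, gives the numbers to compare. If the runs differ from each other by more than the two functions do, the difference is noise.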




For the fastest, try:

printf("0x%08lx\n", mylong); /* :-) */

Versions B and C invoke undefined behaviour. The defined way to do
version B is:

void *vp = &mylong;
unsigned char *cp = vp;
/* now do what you want with cp[0] to cp[sizeof(long) - 1] */

There is no need to lie about having an array. Version C is very
likely to work, but the standard does not guarantee accesses to any
union member other than the last one assigned to (barring the special
exception for "common initial members").

Similar comments apply to your other code fragments.

--
Ben.
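
For reference, the unsigned char pointer approach described above might be written out in full as below. This is a sketch, not anything from the thread: the function names long_to_bytes and print_long_bytes are invented for the example. It loops over sizeof value rather than assuming four bytes, since a long is eight bytes on many platforms, and the byte order it shows is whatever the platform uses.

```c
#include <stdio.h>
#include <string.h>

/* Copy up to n bytes of the object representation of value into out.
   Returns the number of bytes copied. Byte order is the platform's. */
size_t long_to_bytes(long value, unsigned char *out, size_t n)
{
    size_t count = sizeof value < n ? sizeof value : n;
    memcpy(out, &value, count);
    return count;
}

/* Print every byte of value by reading it through an unsigned char pointer. */
void print_long_bytes(long value)
{
    const unsigned char *cp = (const unsigned char *)&value;
    size_t i;
    for (i = 0; i < sizeof value; i++)
        printf("0x%02x ", (unsigned)cp[i]);
    putchar('\n');
}
```

Both functions only ever touch the long through unsigned char or memcpy, so neither relies on the alignment of a separate array or on reading an inactive union member.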

