Question
This is a follow-up to my previous question: Does .NET interop copy array data back and forth, or does it pin the array?
My method is a COM interface method (rather than a DllImport method). The C# signature looks like this:
void Next(ref int pcch,
[In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)]
char [] pchText);
MSDN says:
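When the managed Char type, which has Unicode formatting by default, is passed to unmanaged code, the interop marshaler converts the character set to ANSI. You can apply the DllImportAttribute attribute to platform invoke declarations and the StructLayoutAttribute attribute to a COM interop declaration to control which character set a marshaled Char type uses.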
Also, @HansPassant in his answer here says:
A char[] cannot be marshaled as LPWSTR; it must be LPArray. Now the CharSet attribute plays a role: since you didn't specify it, the char[] will be marshaled as an 8-bit char[], not a 16-bit wchar_t[]. The marshaled array elements are not the same size (the array is not "blittable"), so the marshaller must copy the array.
Pretty undesirable, particularly given that your C++ code expects wchar_t. A very easy way to tell in this specific case is not getting anything back in the array. If the array is marshaled by copying then you have to tell the marshaller explicitly that the array needs to be copied back after the call. You'd have to apply the [In, Out] attribute on the argument. You'll get Chinese.
I couldn't find an analog of CharSet (normally used with DllImportAttribute and StructLayoutAttribute) that could be applied to a COM interface method.
Nevertheless, I don't get "Chinese" on the output. Everything seems to work fine, I do get correct Unicode characters back from COM.
Does this mean char is always interpreted as WCHAR for COM interop methods?
I couldn't find any documentation confirming or denying this.
Answer
I think this is a good question, and the interop behavior of char (System.Char) does deserve some attention.
In managed code, sizeof(char) is always equal to 2 (two bytes), because in .NET characters are always Unicode (UTF-16 code units).
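To see the managed size next to what the marshaler treats as the platform's default character size, a tiny console program is enough. This is a minimal sketch, assuming a modern .NET (5 or later) console app; the class name CharSizeDemo is just illustrative. On current runtimes Marshal.SystemDefaultCharSize is expected to report 2, while on legacy ANSI platforms such as Windows 95 it reported 1.

```csharp
using System;
using System.Runtime.InteropServices;

static class CharSizeDemo
{
    static void Main()
    {
        // Managed size of System.Char: always 2 bytes (one UTF-16 code unit).
        // sizeof on a built-in type does not require an unsafe context.
        Console.WriteLine($"sizeof(char) = {sizeof(char)}");

        // The host platform's default character size as seen by the marshaler.
        Console.WriteLine($"Marshal.SystemDefaultCharSize = {Marshal.SystemDefaultCharSize}");

        // A non-ASCII character such as U+20AC ('€') still fits in a single managed char.
        char euro = '\u20AC';
        Console.WriteLine($"euro as a char = 0x{(int)euro:X4}");
    }
}
```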
Nevertheless, the marshaling rules for char differ between P/Invoke (calling an exported DLL API) and COM (calling a COM interface method).
For P/Invoke, CharSet can be used explicitly with any [DllImport] attribute, or implicitly via [module|assembly: DefaultCharSet(CharSet.Auto|Ansi|Unicode)], to change the default setting for all [DllImport] declarations per module or per assembly.
The default value is CharSet.Ansi, which means there will be a Unicode-to-ANSI conversion. I usually change the default to Unicode with [module: DefaultCharSet(CharSet.Unicode)], and then selectively use [DllImport(CharSet = CharSet.Ansi)] in those rare cases where I need to call an ANSI API.
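Here is a sketch of that pattern, assuming .NET 5 or later (for OperatingSystem.IsWindows) and using kernel32's lstrlenW/lstrlenA purely as convenient examples of a Unicode and an ANSI export; the NativeMethods and Program names are illustrative. The declarations compile on any OS, but the calls are only made on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

// Module-wide default: every [DllImport] below without its own CharSet uses Unicode.
[module: DefaultCharSet(CharSet.Unicode)]

static class NativeMethods
{
    // Inherits CharSet.Unicode: char[] elements are passed as 2-byte WCHARs, and since
    // ExactSpelling defaults to false, the runtime resolves the "W" export, lstrlenW.
    [DllImport("kernel32.dll")]
    public static extern int lstrlen(char[] text);

    // Per-declaration override for the rare ANSI API: each char is converted to
    // a single byte in the current ANSI code page before the call.
    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true)]
    public static extern int lstrlenA(char[] text);
}

static class Program
{
    static void Main()
    {
        if (!OperatingSystem.IsWindows())
        {
            Console.WriteLine("kernel32 is only available on Windows; skipping the calls.");
            return;
        }

        // Both functions expect a NUL-terminated buffer.
        char[] buffer = { 'a', 'b', 'c', '\0' };
        Console.WriteLine(NativeMethods.lstrlen(buffer));  // 3 (Unicode path)
        Console.WriteLine(NativeMethods.lstrlenA(buffer)); // 3 (ANSI path)
    }
}
```

Both calls return the same length here; the difference is only in how the buffer is laid out in unmanaged memory (two bytes per char versus one).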
It is also possible to alter the marshaling of any specific char-typed parameter with MarshalAs(UnmanagedType.U1|U2), or of a char[] parameter with MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1|U2). E.g., you may have something like this:
[DllImport("Test.dll", ExactSpelling = true, CharSet = CharSet.Unicode)]
static extern bool TestApi(
int length,
[In, Out, MarshalAs(UnmanagedType.LPArray)] char[] buff1,
[In, Out, MarshalAs(UnmanagedType.LPArray,
ArraySubType = UnmanagedType.U1)] char[] buff2);
In this case, buff1 will be passed as an array of two-byte values (as is), but buff2 will be converted to and from an array of single-byte values. Note that this will still be a smart Unicode-to-current-OS-code-page (and back) conversion for buff2. E.g., a Unicode '\x20AC' (€) will become \x80 in the unmanaged code (provided the OS code page is Windows-1252). This is how the marshaling of [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1)] char[] buff differs from that of [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1)] ushort[] buff. For ushort, 0x20AC would simply be truncated to 0xAC.
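The difference between the two U1 cases can be modeled in pure managed code. This is only an approximation of what the marshaler does, under stated assumptions: Windows-1252 stands in for the OS ANSI code page (the real marshaler uses the current ANSI code page with best-fit mapping), and it assumes .NET 5 or later, where CodePagesEncodingProvider ships in the box (older runtimes may need the System.Text.Encoding.CodePages package). The class and method names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text;

public static class U1ConversionDemo
{
    // Legacy code pages must be registered once before Encoding.GetEncoding(1252) works on .NET Core/5+.
    private static readonly Encoding Cp1252 = CreateCp1252();

    private static Encoding CreateCp1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Models char[] + ArraySubType = U1: each char goes through a code-page conversion.
    public static byte[] CharsToAnsi(char[] chars) => Cp1252.GetBytes(chars);

    // Models ushort[] + ArraySubType = U1: a plain numeric narrowing that keeps the low byte.
    public static byte[] UShortsToU1(ushort[] values) =>
        values.Select(v => unchecked((byte)v)).ToArray();

    static void Main()
    {
        char[] text = { '\u20AC' };  // '€'
        ushort[] raw = { 0x20AC };   // same bits, but typed as a number

        Console.WriteLine($"char[]   -> 0x{CharsToAnsi(text)[0]:X2}"); // 0x80 under Windows-1252
        Console.WriteLine($"ushort[] -> 0x{UShortsToU1(raw)[0]:X2}");  // 0xAC (low byte only)
    }
}
```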
For calling a COM interface method, the story is quite different. There, char is always treated as a two-byte value representing a Unicode character. Perhaps the reason for this design decision can be inferred from Don Box's "Essential COM" (quoting the footnote from this page: http://books.google.com.au/books?id=kfRWvKSePmAC&lpg=PA74&ots=o8aWicbKdo&dq=OLECHAR%20type&pg=PA74#v=onepage&q=OLECHAR%20type&f=false):
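The OLECHAR type was chosen in favor of the common TCHAR data type used by the Win32 API to alleviate the need to support two versions of each interface (CHAR and WCHAR). By supporting only one character type, object developers are decoupled from the state of the UNICODE preprocessor symbol used by their clients.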
Apparently, the same concept made its way to .NET. I'm pretty confident this is true even for legacy ANSI platforms (like Windows 95, where Marshal.SystemDefaultCharSize == 1).
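If you prefer the Unicode behavior to be visible in the declaration rather than implied, you can spell it out with ArraySubType = UnmanagedType.U2. This is a declaration-only sketch: the interface name ITextSource and its GUID are hypothetical placeholders, and actually calling Next would require a real COM object implementing the interface, so the program only inspects the type via reflection.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical COM interface; the GUID is a placeholder, not a real IID.
[ComImport]
[Guid("00000000-0000-0000-0000-000000000001")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface ITextSource
{
    // No CharSet applies to COM methods. Per the answer, char[] here is marshaled as
    // 2-byte WCHARs anyway; ArraySubType = U2 just makes that explicit.
    void Next(
        ref int pcch,
        [In, Out, MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U2, SizeParamIndex = 0)]
        char[] pchText);
}

static class Program
{
    static void Main()
    {
        // Confirms the declaration compiles and is marked as a COM import.
        Type t = typeof(ITextSource);
        Console.WriteLine($"{t.Name}: IsImport = {t.IsImport}");
    }
}
```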
Note that DefaultCharSet doesn't have any effect on char when it's part of a COM interface method signature. Nor is there a way to apply CharSet explicitly. However, you still have full control over the marshaling behavior of each individual parameter with MarshalAs, in exactly the same way as for P/Invoke above. E.g., your Next method might look like the following, in case the unmanaged COM code expects a buffer of ANSI characters:
void Next(ref int pcch,
[In, Out, MarshalAs(UnmanagedType.LPArray,
ArraySubType = UnmanagedType.U1, SizeParamIndex = 0)] char [] pchText);