Problem description
For performance reasons I would like a zero-copy cast of ByteString (strict, for now) to a Vector. Since Vector is just a ByteArray# under the hood, and ByteString is a ForeignPtr, this might look something like:

    caseBStoVector :: ByteString -> Vector a
    caseBStoVector (BS fptr off len) =
      withForeignPtr fptr $ \ptr -> do
        let ptr' = plusPtr ptr off
            p = alignPtr ptr' (alignment (undefined :: a))
            barr = ptrToByteArray# p len -- I want this function, or something similar
            barr' = ByteArray barr
            alignI = minusPtr p ptr
            size = (len-alignI) `div` sizeOf (undefined :: a)
        return (Vector 0 size barr')

That certainly isn't right. Even with the missing function ptrToByteArray#, this seems to require ptr to escape the withForeignPtr scope. So my questions are:

- This post probably advertises my primitive understanding of ByteArray#. If anyone can talk a bit about ByteArray#, its representation, how it is managed (GCed), etc., I'd be grateful.
- The fact that ByteArray# lives on the GCed heap and ForeignPtr is external seems to be a fundamental issue: all the access operations are different. Perhaps I should look at redefining Vector from = ByteArray !Int !Int to something with another indirection? Something like = Location !Int !Int where data Location = LocBA ByteArray | LocFPtr ForeignPtr, and provide wrapping operations for both those types? This indirection might hurt performance too much, though.
- Failing to marry these two together, maybe I can just access arbitrary element types in a ForeignPtr in a more efficient manner. Does anyone know of a library that treats ForeignPtr (or ByteString) as an array of arbitrary Storable or Primitive types? This would still lose me the stream fusion and tuning from the Vector package.
Disclaimer: everything here is an implementation detail and specific to GHC and the internal representations of the libraries in question at the time of posting.
This response comes a couple of years after the fact, but it is indeed possible to get a pointer to a bytearray's contents. Doing so is problematic because the GC likes to move data around in the heap, while things kept outside of the GC heap can leak, which isn't ideal either. GHC solves this with:

    newPinnedByteArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #)

Primitive bytearrays (internally typedef'd C char arrays) can be statically pinned to an address. The GC guarantees not to move them. You can convert a bytearray reference to a pointer with this function:

    byteArrayContents# :: ByteArray# -> Addr#

The address type forms the basis of Ptr and ForeignPtr types. Ptrs are addresses marked with a phantom type and ForeignPtrs are that plus optional references to GHC memory and IORef finalizers.
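To make the pinned/unpinned distinction concrete, here is a minimal sketch that allocates one array of each kind and asks the runtime whether it is pinned. The name pinnedFlags is made up for illustration, and the sketch assumes a GHC recent enough to provide isMutableByteArrayPinned# in GHC.Exts.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

-- Hypothetical helper: allocate a pinned and an ordinary 16-byte array
-- and report, for each, whether the runtime considers it pinned.
pinnedFlags :: IO (Bool, Bool)
pinnedFlags = IO $ \s0 ->
  case newPinnedByteArray# 16# s0 of
    (# s1, pinned# #) -> case newByteArray# 16# s1 of
      (# s2, plain# #) ->
        (# s2, ( isTrue# (isMutableByteArrayPinned# pinned#)
               , isTrue# (isMutableByteArrayPinned# plain#) ) #)

main :: IO ()
main = pinnedFlags >>= print
```

Only the array that reports as pinned is safe to hand to byteArrayContents#; taking the address of an array the GC is free to move gives a pointer that can go stale.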
Disclaimer: This will only work if your ByteString was built in Haskell. Otherwise, you can't get a reference to the bytearray. You cannot dereference an arbitrary Addr#. Don't try to cast or coerce your way to a bytearray; that way lie segfaults. Example:

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.IO
    import GHC.Prim
    import GHC.Types

    -- Note: written against the GHC of the time. On newer GHC releases the
    -- Int64 primops may expect Int64# rather than Int#; adjust if needed.
    main :: IO ()
    main = test

    test :: IO ()
    -- Create the test array.
    test = IO $ \s0 -> case newPinnedByteArray# 8# s0 of {(# s1, mbarr# #) ->
      -- Write something and read it back as baseline.
      case writeInt64Array# mbarr# 0# 1# s1 of {s2 ->
      case readInt64Array# mbarr# 0# s2 of {(# s3, x# #) ->
      -- Print it. Should match what was written.
      case unIO (print (I# x#)) s3 of {(# s4, _ #) ->
      -- Convert bytearray to pointer.
      case byteArrayContents# (unsafeCoerce# mbarr#) of {addr# ->
      -- Dereference the pointer.
      case readInt64OffAddr# addr# 0# s4 of {(# s5, x'# #) ->
      -- Print what's read. Should match the above.
      case unIO (print (I# x'#)) s5 of {(# s6, _ #) ->
      -- Coerce the pointer into an array and try to read.
      case readInt64Array# (unsafeCoerce# addr#) 0# s6 of {(# s7, y# #) ->
      -- Haskell is not C. Arrays are not pointers.
      -- This won't match. It might segfault. At best, it's garbage.
      case unIO (print (I# y#)) s7 of (# s8, _ #) -> (# s8, () #)}}}}}}}}

Output:
1
1
(some garbage value)
To get the bytearray from a ByteString, you need to import the constructor from Data.ByteString.Internal and pattern match.

    data ByteString = PS !(ForeignPtr Word8) !Int !Int

    (\(PS foreignPointer offset length) -> foreignPointer)

Now we need to rip the goods out of the ForeignPtr. This part is entirely implementation-specific. For GHC, import from GHC.ForeignPtr.

    data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents

    (\(ForeignPtr addr# foreignPointerContents) -> foreignPointerContents)

    data ForeignPtrContents = PlainForeignPtr !(IORef (Finalizers, [IO ()]))
                            | MallocPtr (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
                            | PlainPtr (MutableByteArray# RealWorld)

In GHC, ByteString is built with PlainPtrs, which are wrapped around pinned byte arrays. They carry no finalizers. They are GC'd like regular Haskell data when they fall out of scope. Addr#s don't count as references, though: GHC assumes they point to things outside of the GC heap. If the bytearray itself falls out of scope, you're left with a dangling pointer.
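As a rough illustration of the point above, the following sketch inspects which ForeignPtrContents constructor sits behind a given ByteString. The function name backing is invented for this example. It uses toForeignPtr from Data.ByteString.Internal rather than matching on the ByteString constructor directly, on the assumption that this is less sensitive to the bytestring version. The catch-all case stands in for any constructor that carries no byte array.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (toForeignPtr)
import GHC.ForeignPtr (ForeignPtr (..), ForeignPtrContents (..))

-- Hypothetical helper: name the kind of storage behind a ByteString.
backing :: B.ByteString -> String
backing bs = case toForeignPtr bs of
  (ForeignPtr _ contents, _, _) -> case contents of
    PlainPtr _    -> "PlainPtr: pinned byte array, no finalizers"
    MallocPtr _ _ -> "MallocPtr: pinned byte array with finalizers"
    _             -> "no byte array reachable from this ForeignPtr"

main :: IO ()
main = putStrLn (backing (B.pack [1, 2, 3]))
```

A ByteString that wraps memory obtained from C would be expected to land in the last case, which is exactly the situation where no bytearray can be recovered.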

    -- The constructor of interest, from ForeignPtrContents:
    --   PlainPtr (MutableByteArray# RealWorld)

    (\(PlainPtr mutableByteArray#) -> mutableByteArray#)

MutableByteArrays have the same runtime representation as ByteArrays. If you want true zero-copy construction, make sure you either unsafeCoerce# or unsafeFreezeByteArray# to a bytearray. Otherwise, GHC creates a duplicate.

    mbarrTobarr :: MutableByteArray# s -> ByteArray#
    mbarrTobarr = unsafeCoerce#

And now you have the raw contents of the ByteString ready to be turned into a vector.
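Putting the steps together, a sketch of the whole extraction might look like the following. Everything named here (Bytes, toBytes, wrap, byteAt) is invented for illustration and is not part of any library. It assumes the ByteString is backed by a pinned GHC byte array, and it recomputes the byte offset from the ForeignPtr's address because the Addr# may already point past the start of the array.

```haskell
{-# LANGUAGE MagicHash #-}
import qualified Data.ByteString as B
import Data.ByteString.Internal (toForeignPtr)
import GHC.Exts
import GHC.ForeignPtr (ForeignPtr (..), ForeignPtrContents (..))
import GHC.Word (Word8 (..))

-- Hypothetical wrapper: the shared array, a byte offset, and a byte length.
data Bytes = Bytes ByteArray# Int Int

-- Zero-copy view of a ByteString, if it is backed by a GHC byte array.
toBytes :: B.ByteString -> Maybe Bytes
toBytes bs = case toForeignPtr bs of
  (ForeignPtr addr# contents, off, len) -> case contents of
    PlainPtr mbarr#    -> Just (wrap addr# mbarr# off len)
    MallocPtr mbarr# _ -> Just (wrap addr# mbarr# off len)
    _                  -> Nothing

wrap :: Addr# -> MutableByteArray# RealWorld -> Int -> Int -> Bytes
wrap addr# mbarr# off len =
  -- Same representation, so coercing does not copy (as described above).
  case (unsafeCoerce# mbarr# :: ByteArray#) of
    barr# ->
      -- Distance from the start of the array to where the pointer points.
      let base = I# (minusAddr# addr# (byteArrayContents# barr#))
      in  Bytes barr# (base + off) len

-- Read one byte relative to the view. No bounds checking.
byteAt :: Bytes -> Int -> Word8
byteAt (Bytes barr# (I# off#) _) (I# i#) =
  W8# (indexWord8Array# barr# (off# +# i#))

main :: IO ()
main = do
  let bs = B.drop 2 (B.pack [10, 20, 30, 40, 50])
  case toBytes bs of
    Nothing -> putStrLn "not backed by a GHC byte array"
    Just b@(Bytes _ _ len) ->
      print (map (byteAt b) [0 .. len - 1] == B.unpack bs)
```

Because Bytes holds the ByteArray# itself rather than an address, the GC keeps the array alive for as long as the view is reachable. Turning this into a Vector of some element type would additionally require converting the byte offset and length into element counts and checking alignment, which this sketch does not attempt.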
Best Wishes,