Is there any hope to cast ForeignPtr to ByteArray# (for a function :: ByteString -> Vector)?

Problem description


    For performance reasons I would like a zero-copy cast of ByteString (strict, for now) to a Vector. Since Vector is just a ByteArray# under the hood, and ByteString is a ForeignPtr this might look something like:

    caseBStoVector :: ByteString -> Vector a
    caseBStoVector (BS fptr off len) =
        withForeignPtr fptr $ \ptr -> do
            let ptr' = plusPtr ptr off
                p = alignPtr ptr' (alignment (undefined :: a))
                barr = ptrToByteArray# p len  -- I want this function, or something similar 
                barr' = ByteArray barr
                alignI = minusPtr p ptr
                size = (len-alignI) `div` sizeOf (undefined :: a)
            return (Vector 0 size barr')
    

    That certainly isn't right. Even with the missing function ptrToByteArray#, this seems to require the ptr to escape the withForeignPtr scope. So my questions are:

    1. This post probably advertises my primitive understanding of ByteArray#. If anyone can talk a bit about ByteArray#, its representation, how it is managed (GCed), etc., I'd be grateful.

    2. The fact that ByteArray# lives on the GCed heap and ForeignPtr is external seems to be a fundamental issue: all the access operations are different. Perhaps I should look at redefining Vector from = ByteArray !Int !Int to something with another indirection? Something like = Location !Int !Int where data Location = LocBA ByteArray | LocFPtr ForeignPtr, with wrapping operations for both of those types? This indirection might hurt performance too much, though.

    3. Failing to marry these two together, maybe I can just access arbitrary element types in a ForeignPtr in a more efficient manner. Does anyone know of a library that treats ForeignPtr (or ByteString) as an array of arbitrary Storable or Primitive types? This would still lose me the stream fusion and tuning from the Vector package.
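On question 3, one option that may fit is the Storable flavour of the vector package, which is backed by a ForeignPtr rather than a ByteArray#. The following is a minimal sketch, assuming the `vector` and `bytestring` packages are available. `byteStringToVector` is a hypothetical helper name, and `toForeignPtr` is an internal bytestring function whose status may vary between versions.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.Vector.Storable as VS
import Data.Word (Word8)

-- | Zero-copy view of a strict ByteString as a Storable vector of bytes.
-- The vector shares the ByteString's ForeignPtr, so the bytes stay alive
-- for as long as the vector does. (Hypothetical helper, not a library API.)
byteStringToVector :: B.ByteString -> VS.Vector Word8
byteStringToVector bs = VS.unsafeFromForeignPtr fptr off len
  where
    (fptr, off, len) = BI.toForeignPtr bs

main :: IO ()
main = do
  let v = byteStringToVector (B.pack [1, 2, 3, 4])
  print (VS.toList v)
  print (VS.sum (VS.map fromIntegral v) :: Int)
```

For element types other than Word8, `VS.unsafeCast` can reinterpret the vector, but it checks neither alignment nor endianness, so whether that is acceptable depends on the data.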

    Solution

    Disclaimer: everything here is an implementation detail and specific to GHC and the internal representations of the libraries in question at the time of posting.

    This response is a couple years after the fact, but it is indeed possible to get a pointer to bytearray contents. It's problematic as the GC likes to move data in the heap around, and things outside of the GC heap can leak, which isn't necessarily ideal. GHC solves this with:

    newPinnedByteArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #)

    Primitive bytearrays (internally typedef'd C char arrays) can be statically pinned to an address. The GC guarantees not to move them. You can convert a bytearray reference to a pointer with this function:

    byteArrayContents# :: ByteArray# -> Addr#
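As a small illustration of pinning, the sketch below allocates a pinned array and asks the runtime whether it regards the array as pinned. It assumes a GHC recent enough to provide `isMutableByteArrayPinned#` (GHC 8.2 or later, as far as I know). `pinnedIsPinned` is a hypothetical helper name.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts
import GHC.IO (IO (..))

-- | Allocate a pinned byte array of the given size and report whether the
-- runtime considers it pinned. (Hypothetical helper for illustration.)
pinnedIsPinned :: Int -> IO Bool
pinnedIsPinned (I# n#) = IO $ \s0 ->
  case newPinnedByteArray# n# s0 of
    (# s1, mba# #) -> (# s1, isTrue# (isMutableByteArrayPinned# mba#) #)

main :: IO ()
main = pinnedIsPinned 8 >>= print
```

Taking `byteArrayContents#` of an array is only meaningful while the array is pinned and still reachable, which is why the check can be useful when debugging.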

    The address type forms the basis of Ptr and ForeignPtr types. Ptrs are addresses marked with a phantom type and ForeignPtrs are that plus optional references to GHC memory and IORef finalizers.

    Disclaimer: This will only work if your ByteString was built in Haskell. Otherwise, you can't get a reference to the bytearray. You cannot dereference an arbitrary addr. Don't try to cast or coerce your way to a bytearray; that way lies segfaults. Example:

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    
    import GHC.IO
    import GHC.Exts      -- re-exports the GHC.Prim primops; newer GHCs export unsafeCoerce# from here
    import GHC.Types
    
    main :: IO()
    main = test
    
    test :: IO ()        -- Create the test array.
    test = IO $ \s0 -> case newPinnedByteArray# 8# s0 of {(# s1, mbarr# #) ->
                         -- Write something and read it back as baseline.
                       case writeInt64Array# mbarr# 0# 1# s1 of {s2 ->
                       case readInt64Array# mbarr# 0# s2 of {(# s3, x# #) ->
                         -- Print it. Should match what was written.
                       case unIO (print (I# x#)) s3 of {(# s4, _ #) ->
                         -- Convert bytearray to pointer.
                       case byteArrayContents# (unsafeCoerce# mbarr#) of {addr# ->
                         -- Dereference the pointer.
                       case readInt64OffAddr# addr# 0# s4 of {(# s5, x'# #) ->
                         -- Print what's read. Should match the above.
                       case unIO (print (I# x'#)) s5 of {(# s6, _ #) ->
                         -- Coerce the pointer into an array and try to read.
                       case readInt64Array# (unsafeCoerce# addr#) 0# s6 of {(# s7, y# #) ->
                         -- Haskell is not C. Arrays are not pointers.
                         -- This won't match. It might segfault. At best, it's garbage.
                       case unIO (print (I# y#)) s7 of (# s8, _ #) -> (# s8, () #)}}}}}}}}
    
    
    Output:
       1
       1
     (some garbage value)
    

    To get the bytearray from a ByteString, you need to import the constructor from Data.ByteString.Internal and pattern match.

    data ByteString = PS !(ForeignPtr Word8) !Int !Int
    (\(PS foreignPointer offset length) -> foreignPointer)
    

    Now we need to rip the goods out of the ForeignPtr. This part is entirely implementation-specific. For GHC, import from GHC.ForeignPtr.

    data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents
    (\(ForeignPtr addr# foreignPointerContents) -> foreignPointerContents)
    
    data ForeignPtrContents = PlainForeignPtr !(IORef (Finalizers, [IO ()]))
                            | MallocPtr      (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
                            | PlainPtr       (MutableByteArray# RealWorld)
    

    In GHC, ByteString is built with PlainPtrs, which are wrapped around pinned byte arrays. They carry no finalizers. They are GC'd like regular Haskell data when they fall out of scope. Bare Addr# values don't count as references, though: GHC assumes they point to things outside of the GC heap. If the bytearray itself falls out of scope, you're left with a dangling pointer.

    -- PlainPtr is a constructor of ForeignPtrContents, not a type of its own:
    --   PlainPtr (MutableByteArray# RealWorld)
    (\(PlainPtr mutableByteArray#) -> mutableByteArray#)
    

    MutableByteArray# and ByteArray# share the same runtime representation. If you want true zero-copy construction, make sure you convert with either unsafeCoerce# or unsafeFreezeByteArray#. A copying conversion would create a duplicate.

    mbarrTobarr :: MutableByteArray# s -> ByteArray#
    mbarrTobarr = unsafeCoerce#
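The freezing route can be sketched as follows. This is only an illustration using base's primops: it writes one machine word into a fresh pinned array, freezes it in place with `unsafeFreezeByteArray#`, and reads the word back from the immutable view. `freezeDemo` is a hypothetical name.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts
import GHC.IO (IO (..))

-- | Write an Int into a fresh pinned array, freeze the array without
-- copying, and read the value back through the immutable ByteArray#.
freezeDemo :: Int -> IO Int
freezeDemo (I# v#) = IO $ \s0 ->
  case newPinnedByteArray# 8# s0 of { (# s1, mba# #) ->
  case writeIntArray# mba# 0# v# s1 of { s2 ->
  case unsafeFreezeByteArray# mba# s2 of { (# s3, ba# #) ->
  (# s3, I# (indexIntArray# ba# 0#) #) }}}

main :: IO ()
main = freezeDemo 42 >>= print
```

After freezing, the mutable reference must no longer be written to, since both names refer to the same memory.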
    

    And now you have the raw contents of the ByteString ready to be turned into a vector.
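Putting the steps together, a minimal sketch might look like the following. It assumes the ByteString is backed by a PlainPtr and that `GHC.ForeignPtr` exposes its constructors as described above. `Slice`, `freeze#`, `byteStringSlice` and `indexSlice` are hypothetical names, and `toForeignPtr` is used in place of matching on the ByteString constructor directly because that constructor has changed shape between bytestring versions.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import GHC.Exts
import GHC.ForeignPtr (ForeignPtr (..), ForeignPtrContents (..))
import GHC.Word (Word8 (..))

-- | The backing array plus the slice's byte offset and length.
-- Holding the ByteArray# itself keeps the memory alive for the GC.
data Slice = Slice ByteArray# Int Int

-- | Reinterpret the mutable array as immutable without copying.
freeze# :: MutableByteArray# s -> ByteArray#
freeze# mba# = unsafeCoerce# mba#

-- | Recover the backing array of a ByteString, if it has one.
byteStringSlice :: B.ByteString -> Maybe Slice
byteStringSlice bs =
  case BI.toForeignPtr bs of
    (ForeignPtr addr# (PlainPtr mba#), off, len) ->
      let ba# = freeze# mba#
          -- Distance from the start of the array payload to the pointer.
          delta = I# (minusAddr# addr# (byteArrayContents# ba#))
      in Just (Slice ba# (delta + off) len)
    _ -> Nothing  -- foreign memory, literals, etc.: no byte array to find

-- | Read the i-th byte of the slice (no bounds check).
indexSlice :: Slice -> Int -> Word8
indexSlice (Slice ba# off _) i =
  case off + i of
    I# j# -> W8# (indexWord8Array# ba# j#)

main :: IO ()
main =
  case byteStringSlice (B.drop 2 (B.pack [10, 20, 30, 40])) of
    Just s@(Slice _ _ len) -> print [indexSlice s i | i <- [0 .. len - 1]]
    Nothing -> putStrLn "not backed by a PlainPtr byte array"
```

The `Nothing` branch matters in practice: a ByteString that wraps foreign memory or a string literal has no byte array behind it, and the sketch declines rather than coercing.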

    Best Wishes,

10-29 08:22