Problem description
I've gathered some knowledge of several components (both software and hardware) that are involved in general DMA transactions on ARM-based boards, but I don't understand how it all fits together, and I haven't found a full, coherent description of this.
I'll write down, at a high level, what I already know, and I hope someone can correct me where I'm wrong and fill in the missing parts so that the whole picture becomes clear. My description starts with the userspace software and drills down to the hardware components. The parts I don't understand are in italic-bold format.
- The user-mode application requests a read from or a write to some device, i.e. performs an I/O operation.
- The operating system receives the request and hands it to the appropriate driver (every OS has its own mechanism for this; I don't need a further drill-down here, but if you want to share insights you are welcome to).
- The driver in charge of handling the I/O request has to know the address at which the device is mapped (since I'm interested in ARM-based boards, afaik there is only memory-mapped I/O and no port I/O). In most cases (if we consider smartphone-like boards) there is a Linux kernel that parses the device addresses from the device tree handed over by the bootloader at boot time (the modern approach), or the kernel is precompiled for a specific model family and board with the device addresses hardcoded in its source code (the older, obsolete? approach). In some cases (this happens a lot on smartphones) some of the drivers are precompiled and just packaged into the kernel, i.e. their source is closed, so the addresses corresponding to the devices are unknown. Is this correct?
- Given that the driver knows the addresses of the relevant registers of the device it wants to communicate with, it allocates a buffer (usually in kernel space) to which the device will write its data (with the help of DMA). The driver needs to inform the device of the location of that buffer, but the addresses that devices work with (to manipulate memory) are different from the addresses that drivers (the CPU) work with; hence, the driver needs to give the device the 'bus address' of the buffer it has just allocated. How does the driver inform the device of that address? How common is it to use an IOMMU? When using an IOMMU, is there one hardware component that manages addressing, or one per device?
- Then the driver commands the device to do its job (by manipulating its registers) and the device transfers output data directly to the allocated buffer in memory. Here I'm a bit confused about the relation device-driver:bus:bus-controller:actual-device. Take, for example, some imaginary device that communicates over the I2C protocol; the SoC specifies an I2C bus interface - what is this, actually? Does the I2C bus have some kind of bus controller? Does the CPU communicate with the I2C bus interface or directly with the device (i.e. is the I2C bus interface seamless)? I guess that someone with some experience of device drivers could answer this easily.
- The device populates a DMA channel. Since the device is not connected directly to memory, but rather through some bus to the DMA controller (which masters the bus), it interacts with the DMA controller to transfer the required data to the allocated buffer in memory. When the board vendor uses ARM IP cores and bus specifications, this step involves transactions over a bus from the AMBA spec (i.e. AHB/multi-AHB/AXI), and some protocol between the device and a DMAC on top of it. I would like to know more about this step: what actually happens? There are many specifications for DMA controllers by ARM; which one is popular, and which is obsolete?
- When the device is done, it sends an interrupt, which travels to the OS through the interrupt controller, and the OS's interrupt handler directs it to the appropriate driver, which now knows that the DMA transfer is complete.
Recommended answer
You've slightly conflated two things here - there are some devices (e.g. UARTs, MMC controllers, audio controllers, typically lower-bandwidth devices) which rely on an external DMA controller ("DMA engine" in Linux terminology), but many devices are simply bus masters in their own right and perform their own DMA directly (e.g. GPUs, USB host controllers, and of course the DMA controllers themselves). The former involves a bunch of extra complexity with the CPU programming the DMA controller, so I'm going to ignore it and just consider straightforward bus-master DMA.
In a typical ARM SoC, the CPU clusters and other master peripherals, and the memory controller and other slave peripherals, are all connected together with various AMBA interconnects, forming a single "bus" (generally all mapped to the "platform bus" in Linux), over which masters address slaves according to the address maps of the interconnect. You can safely assume that the device drivers know (whether by device tree or hardcoded) where devices appear in the CPU's physical address map, because otherwise they'd be useless.
On simpler systems, there is a single address map, so the physical addresses used by the CPU to address RAM and peripherals can be freely shared with other masters as DMA addresses. Other systems are more complex - one of the more well-known is the Raspberry Pi's BCM2835, in which the CPU and GPU have different address maps; e.g. the interconnect is hard-wired such that where the GPU sees peripherals at "bus address" 0x7e000000, the CPU sees them at "physical address" 0x20000000. Furthermore, in LPAE systems with 40-bit physical addresses, the interconnect might need to provide different views to different masters - e.g. in the TI Keystone 2 SoCs, all the DRAM is above the 32-bit boundary from the CPUs' point of view, so the 32-bit DMA masters would be useless if the interconnect didn't show them a different address map. For Linux, check out the dma-ranges device tree property for how such CPU→bus translations are described. The CPU must take these translations into account when telling a master to access a particular RAM or peripheral address; Linux drivers should be using the DMA mapping API, which provides appropriately-translated DMA addresses.
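To make the fixed-offset translation concrete, here is a minimal Python sketch of a CPU-to-bus address window, loosely in the spirit of a dma-ranges entry. The `Window` class and the `cpu_to_bus`/`bus_to_cpu` helpers are hypothetical names for illustration, not kernel APIs, and the 16 MiB window size is an assumption; only the two base addresses come from the BCM2835 example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One fixed translation window between bus and CPU address maps."""
    bus_base: int  # start address as seen by the bus master
    cpu_base: int  # start address as seen by the CPU
    size: int      # window length in bytes


def cpu_to_bus(windows, cpu_addr):
    """Translate a CPU physical address to the address a master must use."""
    for w in windows:
        if w.cpu_base <= cpu_addr < w.cpu_base + w.size:
            return cpu_addr - w.cpu_base + w.bus_base
    raise ValueError(f"{cpu_addr:#x} is not visible through any window")


def bus_to_cpu(windows, bus_addr):
    """Inverse translation: bus address back to CPU physical address."""
    for w in windows:
        if w.bus_base <= bus_addr < w.bus_base + w.size:
            return bus_addr - w.bus_base + w.cpu_base
    raise ValueError(f"{bus_addr:#x} is not visible through any window")


# Base addresses from the BCM2835 example; the 16 MiB size is assumed.
PERIPHERALS = [Window(bus_base=0x7E000000, cpu_base=0x20000000, size=0x01000000)]

if __name__ == "__main__":
    print(hex(cpu_to_bus(PERIPHERALS, 0x20200000)))  # 0x7e200000
```

In a real Linux driver none of this arithmetic is done by hand; the DMA mapping API applies the platform's translation and returns the address to hand to the device.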
IOMMUs provide more flexibility than fixed interconnect offsets - typically, addresses can be remapped dynamically, and for system integrity masters can be prevented from accessing any addresses other than those mapped for DMA at any given time. Furthermore, in an LPAE or AArch64 system with more than 4GB of RAM, an IOMMU becomes necessary if a 32-bit peripheral needs to be able to access buffers anywhere in RAM. You'll see IOMMUs on a lot of the current 64-bit systems for the purpose of integrating legacy 32-bit devices, but they are also increasingly popular for the purpose of device virtualisation.
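The 4GB limitation can be illustrated with a small check of whether a buffer is reachable by a master that can only drive a given number of address bits. This is a sketch with a hypothetical helper name (`dma_addressable`); it is loosely analogous to the DMA mask idea in Linux, not a reproduction of any kernel function.

```python
GIB = 1 << 30


def dma_addressable(addr, size, addr_bits):
    """Return True if the whole buffer [addr, addr + size) lies within the
    range a master with addr_bits address lines can reach."""
    if size <= 0 or addr < 0:
        return False
    return addr + size <= (1 << addr_bits)


if __name__ == "__main__":
    low_buf = 1 * GIB    # buffer below the 4 GiB boundary
    high_buf = 5 * GIB   # buffer above the 4 GiB boundary
    print(dma_addressable(low_buf, 4096, 32))   # True
    print(dma_addressable(high_buf, 4096, 32))  # False
    print(dma_addressable(high_buf, 4096, 40))  # True
```

A buffer that fails this check for a 32-bit master has to be either copied through a lower bounce buffer or remapped to a lower address by an IOMMU.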
IOMMU topology depends on the system and the IOMMUs in use - the system I'm currently working with has 7 separate ARM MMU-401/400 devices in front of individual bus-master peripherals; the ARM MMU-500 on the other hand can be implemented as a single system-wide device with a separate TLB for each master; other vendors have their own designs. Either way, from a Linux perspective, most device drivers should be using the aforementioned DMA mapping API to allocate and prepare physical buffers for DMA, which will also automatically set up the appropriate IOMMU mappings if the device is attached to one. That way, individual device drivers need not care about the presence of an IOMMU or not. Other drivers (typically GPU drivers) however, depend on an IOMMU and want complete control, so manage the mappings directly via the IOMMU API. Essentially, the IOMMU's page tables are set up to map certain ranges of physical addresses to ranges of I/O virtual addresses, those IOVAs are given to the device as DMA (i.e. bus) addresses, and the IOMMU translates the IOVAs back to physical addresses as the device accesses them. Once the DMA operation is finished, the driver typically removes the IOMMU mapping, both to free up IOVA space and so that the device no longer has access to RAM.
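The map / translate / unmap cycle described above can be sketched as a toy model. Everything here (`ToyIommu`, `IommuFault`, the 4 KiB page size, the IOVA base) is a hypothetical illustration, not the Linux IOMMU API; in particular, the simple bump allocator never reuses IOVA space, which a real implementation would.

```python
PAGE_SIZE = 4096  # assumed page granule for this toy model


class IommuFault(Exception):
    """Raised when a device accesses an IOVA that is not mapped."""


class ToyIommu:
    def __init__(self, iova_base=0x10000000):
        self.page_table = {}        # IOVA page number -> physical page number
        self.next_iova = iova_base  # bump allocator; never reuses IOVA space

    @staticmethod
    def _pages(size):
        return -(-size // PAGE_SIZE)  # ceiling division

    def map(self, phys_addr, size):
        """Map a page-aligned physical range and return the IOVA to give
        to the device as its DMA address."""
        if phys_addr % PAGE_SIZE:
            raise ValueError("physical address must be page-aligned")
        iova = self.next_iova
        for i in range(self._pages(size)):
            self.page_table[iova // PAGE_SIZE + i] = phys_addr // PAGE_SIZE + i
        self.next_iova += self._pages(size) * PAGE_SIZE
        return iova

    def translate(self, iova):
        """What the hardware does on each device access."""
        try:
            ppn = self.page_table[iova // PAGE_SIZE]
        except KeyError:
            raise IommuFault(f"unmapped IOVA {iova:#x}") from None
        return ppn * PAGE_SIZE + iova % PAGE_SIZE

    def unmap(self, iova, size):
        """Remove the mapping so the device loses access to that memory."""
        for i in range(self._pages(size)):
            self.page_table.pop(iova // PAGE_SIZE + i, None)


if __name__ == "__main__":
    iommu = ToyIommu()
    dma_addr = iommu.map(0x140000000, 8192)  # buffer above 4 GiB
    print(hex(dma_addr))                            # 0x10000000
    print(hex(iommu.translate(dma_addr + 0x1234)))  # 0x140001234
    iommu.unmap(dma_addr, 8192)
```

Note how the device only ever sees a 32-bit-friendly IOVA even though the buffer sits above the 4 GiB boundary, and how any access after `unmap` faults instead of reaching RAM.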
Note that in some cases the DMA transfer is cyclic and never "finishes". With something like a display controller, the CPU might just map a buffer for DMA, pass that address to the controller and trigger it to start, and it will then continuously perform DMA reads to scan out whatever the CPU writes to that buffer until it is told to stop.
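As a rough illustration of a cyclic transfer, the following sketch models a display controller that re-reads the same buffer on every refresh until it is stopped. `ToyDisplayController` and its methods are hypothetical; RAM is modelled as a `bytearray` and each call to `scan_out_frame` stands in for one pass of DMA reads.

```python
class ToyDisplayController:
    """Toy scan-out engine: reads the same DMA buffer once per refresh."""

    def __init__(self, memory, dma_addr, length):
        self.memory = memory      # stands in for system RAM
        self.dma_addr = dma_addr  # address handed over by the driver
        self.length = length
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def scan_out_frame(self):
        """One pass of DMA reads over the buffer; None once stopped."""
        if not self.running:
            return None
        return bytes(self.memory[self.dma_addr:self.dma_addr + self.length])


if __name__ == "__main__":
    ram = bytearray(64)
    ctrl = ToyDisplayController(ram, dma_addr=16, length=4)
    ctrl.start()
    print(ctrl.scan_out_frame())       # buffer contents before the CPU writes
    ram[16:20] = b"\x01\x02\x03\x04"   # CPU updates the framebuffer in place
    print(ctrl.scan_out_frame())       # next refresh picks up the new data
    ctrl.stop()
```

The buffer is mapped once and stays mapped; no completion interrupt marks the "end" of the transfer because there is none.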
Other peripheral buses beyond the SoC interconnect, like I2C/SPI/USB/etc. work as you suspect - there is a bus controller (which is itself a device on the AMBA bus, so any of the above might apply to it) with its own device driver. In a crude generalisation, the CPU doesn't communicate directly with devices on the external bus - where a driver for an AMBA device says "write X to register Y", that just happens by the CPU performing a store to a memory-mapped address; where an I2C device driver says "write X to register Y", the OS usually has some bus abstraction layer which the bus controller driver implements, whereby the CPU programs the controller with a command saying "write X to register Y on device Z", the bus controller hardware will go off and do that, then notify the OS of the peripheral device's response via an interrupt or some other means.
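The layering of device driver, bus abstraction and bus controller can be sketched as follows. All class names, the 0x48 device address and the register layout are invented for illustration; the controller is modelled in software, whereas real hardware would carry out the transaction on the wire and report completion via an interrupt.

```python
class ToyI2cDevice:
    """The external chip: just a register file hanging off the bus."""

    def __init__(self):
        self.regs = {}


class ToyI2cController:
    """Bus controller: the only thing the CPU programs directly."""

    def __init__(self):
        self.devices = {}  # bus address -> attached device
        self.log = []      # record of transactions performed

    def attach(self, dev_addr, device):
        self.devices[dev_addr] = device

    def write_reg(self, dev_addr, reg, value):
        """'Write X to register Y on device Z'."""
        if dev_addr not in self.devices:
            raise OSError(f"no ACK from device {dev_addr:#x}")
        self.devices[dev_addr].regs[reg] = value
        self.log.append(("write", dev_addr, reg, value))

    def read_reg(self, dev_addr, reg):
        if dev_addr not in self.devices:
            raise OSError(f"no ACK from device {dev_addr:#x}")
        self.log.append(("read", dev_addr, reg))
        return self.devices[dev_addr].regs.get(reg, 0)


class ToySensorDriver:
    """Device driver: knows the chip's registers, not how the bus works."""

    CTRL_REG = 0x00  # hypothetical control register

    def __init__(self, bus, dev_addr):
        self.bus = bus
        self.dev_addr = dev_addr

    def enable(self):
        # "Write X to register Y": the bus layer adds "on device Z".
        self.bus.write_reg(self.dev_addr, self.CTRL_REG, 0x01)


if __name__ == "__main__":
    bus = ToyI2cController()
    chip = ToyI2cDevice()
    bus.attach(0x48, chip)
    ToySensorDriver(bus, 0x48).enable()
    print(bus.log)  # [('write', 72, 0, 1)]
```

The sensor driver would work unchanged on top of any controller that implements the same `write_reg`/`read_reg` interface, which is the point of the bus abstraction layer.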
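*Technically, the IOMMU itself, being more or less just another device, could have different addresses in the interconnect's address maps as previously described, but I would doubt the sanity of anyone who really built such a system.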