Problem description
I am pretty new to Verilog, but I would like to understand it properly. Currently I am implementing a TxRx on an FPGA. I noticed that my code consumes a huge amount of logic, although it should not. So I did not write my code properly. I think I know where the mistake is: my for loop is parallelizing the expressions (especially because this for loop is nested inside another for loop). What is the right way to write the code to avoid this? The code works, but it is not efficient. Feel free to comment and make suggestions. I am still learning, so every piece of advice will probably help. Thank you in advance.
Recommended answer
Each line of your inner loop performs three multiplications on the data, an addition, and some other smaller operations (e.g. %16). The synthesizer unrolls the loops and tries to build logic that performs all of these operations in a single clock cycle: every one of the 16 × 16 = 256 iterations of the nested loops gets its own copy of the hardware, which adds up to 6*256 multipliers. This takes a lot of area and leaves very little room for resource sharing.
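The question's code is not shown, so the following is only a hypothetical minimal sketch of the pattern described above; the computation (a sum of `a[i]*b[j]` products) and all names such as `unrolled_sum`, `a_flat` and `b_flat` are made up for illustration. A nested `for` loop inside a clocked `always` block is not executed step by step in hardware: synthesis unrolls it, so with `N = M = 16` it becomes 256 multipliers plus an adder chain that must all settle within one clock period.

```verilog
// Hypothetical example, not the asker's code: sum of a[i]*b[j] over an N x M grid.
module unrolled_sum #(
    parameter N = 16,  // outer loop trip count (16 x 16 = 256 iterations)
    parameter M = 16,  // inner loop trip count
    parameter W = 8    // operand width in bits
) (
    input  wire            clk,
    input  wire [N*W-1:0]  a_flat,  // N packed W-bit operands
    input  wire [M*W-1:0]  b_flat,  // M packed W-bit operands
    output reg  [2*W+7:0]  sum      // 8 extra bits: room for up to 256 products
);
    integer i, j;
    reg [2*W+7:0] acc;  // temporary; blocking assignments form one long combinational chain

    always @(posedge clk) begin
        acc = 0;
        // Synthesis unrolls both loops: N*M multipliers, all evaluated in ONE clock cycle.
        for (i = 0; i < N; i = i + 1)
            for (j = 0; j < M; j = j + 1)
                acc = acc + a_flat[i*W +: W] * b_flat[j*W +: W];
        sum <= acc;  // only the final result is registered
    end
endmodule
```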
You can trade off some performance for area. I would try the following:
Implement each iteration of the loop in a single cycle: compute that iteration, save the results in registers, then use them in the next clock cycle. This reduces the arithmetic area by a factor of about 256, but it takes 256 clock cycles to finish, i.e., you can accept a new input only once every 256 clock cycles. You can experiment with different numbers of iterations per clock cycle. For example, you can compute one whole iteration of your outer loop per cycle. This reduces your area by a factor of about 16, and each calculation takes 16 clock cycles.
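A minimal sketch of the one-iteration-per-cycle approach, assuming the same hypothetical sum-of-products computation (all names are illustrative, not taken from the question). The loop variables `i` and `j` become registers, the loop body is instantiated once so a single multiplier is reused every cycle, and `done` pulses for one cycle after `N*M` cycles.

```verilog
// Hypothetical sketch: the same sum of products, one loop iteration per clock cycle.
module iterative_sum #(
    parameter N = 16,  // outer loop trip count
    parameter M = 16,  // inner loop trip count
    parameter W = 8    // operand width in bits
) (
    input  wire            clk,
    input  wire            rst,     // synchronous reset
    input  wire            start,   // pulse while idle to latch inputs and begin
    input  wire [N*W-1:0]  a_flat,
    input  wire [M*W-1:0]  b_flat,
    output reg  [2*W+7:0]  sum,
    output reg             done     // high for one cycle when sum is valid
);
    reg [N*W-1:0] a_reg;            // inputs held stable during the computation
    reg [M*W-1:0] b_reg;
    reg [2*W+7:0] acc;
    reg [7:0]     i, j;             // former loop variables, now registers (N, M <= 256)
    reg           busy;

    // The loop body, instantiated once: one multiplier and one adder, reused every cycle.
    wire [2*W+7:0] next_acc = acc + a_reg[i*W +: W] * b_reg[j*W +: W];

    always @(posedge clk) begin
        done <= 1'b0;
        if (rst) begin
            busy <= 1'b0;
        end else if (!busy) begin
            if (start) begin
                a_reg <= a_flat;
                b_reg <= b_flat;
                acc   <= 0;
                i     <= 0;
                j     <= 0;
                busy  <= 1'b1;
            end
        end else begin
            acc <= next_acc;
            if (j == M - 1) begin
                j <= 0;
                if (i == N - 1) begin   // last iteration: publish the result
                    sum  <= next_acc;
                    done <= 1'b1;
                    busy <= 1'b0;
                end else begin
                    i <= i + 1;
                end
            end else begin
                j <= j + 1;
            end
        end
    end
endmodule
```

The indexed part-selects such as `a_reg[i*W +: W]` synthesize to multiplexers, so some area goes into selection logic and control; this is why the saving is roughly, not exactly, a factor of `N*M`.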
If performance is very important, you can try pipelining your circuit. This makes your code a bit more complex, but it significantly increases your throughput. For example, you can split the computation into 256 stages: the area stays about the same plus the overhead of the pipeline registers, but your clock period can be up to 256 times shorter, and you can still accept a new input every clock cycle. Again, you can experiment with different numbers of pipeline stages and choose the one that best fits your needs.
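A minimal pipelining sketch under the same hypothetical assumptions (names are illustrative), with one stage per outer-loop iteration: stage `k` adds row `k` of the products to the partial sum it receives from stage `k-1`. The `for` loop here uses nonblocking assignments, so it describes `N` separate register stages rather than one long combinational chain. The total number of multipliers is unchanged, but only `M` of them sit between two registers, and a new input can be accepted every cycle with a latency of `N` cycles.

```verilog
// Hypothetical sketch: the sum of products pipelined with one stage per outer-loop iteration.
module pipelined_sum #(
    parameter N = 16,  // number of pipeline stages (outer loop trip count)
    parameter M = 16,  // products computed in parallel inside each stage
    parameter W = 8    // operand width in bits
) (
    input  wire            clk,
    input  wire            in_valid,
    input  wire [N*W-1:0]  a_flat,
    input  wire [M*W-1:0]  b_flat,
    output wire [2*W+7:0]  sum,
    output wire            out_valid
);
    // Pipeline registers: entry k holds the state leaving stage k.
    reg [2*W+7:0] acc_r [0:N-1];   // partial sum of rows 0..k
    reg [N*W-1:0] a_r   [0:N-1];   // operands travel alongside their partial sum
    reg [M*W-1:0] b_r   [0:N-1];
    reg [N-1:0]   v_r;             // valid bit per stage

    // Contribution of one row: sum over j of a[row] * b[j] (M multipliers per stage).
    function automatic [2*W+7:0] row_sum;
        input [N*W-1:0] a;
        input [M*W-1:0] b;
        input [31:0]    row;
        integer j;
        begin
            row_sum = 0;
            for (j = 0; j < M; j = j + 1)
                row_sum = row_sum + a[row*W +: W] * b[j*W +: W];
        end
    endfunction

    integer k;
    always @(posedge clk) begin
        // Stage 0 takes the module inputs.
        acc_r[0] <= row_sum(a_flat, b_flat, 0);
        a_r[0]   <= a_flat;
        b_r[0]   <= b_flat;
        v_r[0]   <= in_valid;
        // Stages 1..N-1: nonblocking assignments, so each iteration is a separate
        // register stage that reads the PREVIOUS cycle's value of the stage before it.
        for (k = 1; k < N; k = k + 1) begin
            acc_r[k] <= acc_r[k-1] + row_sum(a_r[k-1], b_r[k-1], k);
            a_r[k]   <= a_r[k-1];
            b_r[k]   <= b_r[k-1];
            v_r[k]   <= v_r[k-1];
        end
    end

    assign sum       = acc_r[N-1];
    assign out_valid = v_r[N-1];
endmodule
```

For simplicity the sketch carries the full `a_r`/`b_r` operands through every stage; a real design would drop operands once they have been consumed to save registers.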
Here is an example of implementing an iterative algorithm either in a single clock cycle or over multiple clock cycles (see the simple_mult module).
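As a hypothetical illustration of the same idea (it is not the `simple_mult` module mentioned above), here is a sketch of a shift-and-add multiplier, `shift_add_mult`, whose `STEPS` parameter sets how many iterations of the algorithm are unrolled per clock cycle. `STEPS = W` finishes in one cycle, `STEPS = 1` takes `W` cycles with a single adder, and values in between trade area for latency. It assumes `W` is a multiple of `STEPS` and unsigned operands.

```verilog
// Hypothetical sketch (not the simple_mult module): unsigned shift-and-add multiplier
// that unrolls STEPS iterations of the algorithm per clock cycle.
module shift_add_mult #(
    parameter W     = 8,  // operand width; must be a multiple of STEPS
    parameter STEPS = 2   // iterations per clock: W/STEPS cycles per product
) (
    input  wire           clk,
    input  wire           rst,    // synchronous reset
    input  wire           start,  // pulse while idle to begin a multiplication
    input  wire [W-1:0]   x,      // multiplicand
    input  wire [W-1:0]   y,      // multiplier
    output reg  [2*W-1:0] p,      // product x*y
    output reg            done    // high for one cycle when p is valid
);
    reg [2*W-1:0] mcand;   // multiplicand, shifted left once per iteration
    reg [W-1:0]   mplier;  // multiplier, shifted right once per iteration
    reg [2*W-1:0] acc;     // running partial product
    reg [7:0]     count;   // clock cycles remaining (W/STEPS must fit in 8 bits)
    reg           busy;

    // STEPS iterations of the loop body, unrolled into combinational logic.
    reg [2*W-1:0] acc_n, mcand_n;
    reg [W-1:0]   mplier_n;
    integer s;
    always @* begin
        acc_n    = acc;
        mcand_n  = mcand;
        mplier_n = mplier;
        for (s = 0; s < STEPS; s = s + 1) begin
            if (mplier_n[0])
                acc_n = acc_n + mcand_n;  // add the shifted multiplicand for a 1 bit
            mcand_n  = mcand_n << 1;
            mplier_n = mplier_n >> 1;
        end
    end

    always @(posedge clk) begin
        done <= 1'b0;
        if (rst) begin
            busy <= 1'b0;
        end else if (!busy) begin
            if (start) begin
                acc    <= 0;
                mcand  <= x;          // zero-extended to 2*W bits
                mplier <= y;
                count  <= W / STEPS;
                busy   <= 1'b1;
            end
        end else begin
            // Register the result of this cycle's STEPS iterations.
            acc    <= acc_n;
            mcand  <= mcand_n;
            mplier <= mplier_n;
            count  <= count - 1;
            if (count == 1) begin     // final cycle: publish the product
                p    <= acc_n;
                done <= 1'b1;
                busy <= 1'b0;
            end
        end
    end
endmodule
```

The same pattern applies to the loops in the question: the iterations unrolled inside the combinational `always @*` block cost area, and the iterations spread across clock edges cost cycles.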