FPGA-YOLO, page 15

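Creating a software simulation testbench with Python:

Online binary multiplication tool: https://www.osgeo.cn/app/s2841

Verification is a painful business...

Testing the designed MAC (multiply-accumulate) unit.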

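The basic idea of an automated testbench is to have a verified golden model that always produces the correct output for a given set of inputs. A golden model can be built in many ways, but my favourite has always been Python: it is easy to use and offers a powerful set of libraries for almost anything in numerical computing. Python makes life easier, especially when the task involves DSP algorithms, which are straightforward to express in software.

Python model of the multiply-accumulate unit:

The code below generates random floating-point data that we can pass to our Verilog module, whose result is then compared with the result of this Python code. truncate cuts a number (1.234545...) off at the given number of decimal places, so truncate(1.234545, 2) returns 1.23.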

import numpy as np 
import subprocess
import os
import math

# Truncate (not round) a floating-point number to the given number of decimal places
def truncate(number, digits) -> float:
    stepper = 10.0 ** digits
    return math.trunc(stepper * number) / stepper

NoT = 10 # Number of tests to be run

for i in range(NoT):
    # choose the values of a, b, c randomly in [-1, 1)
    a = np.random.uniform(-1,1,1)[0]
    b = np.random.uniform(-1,1,1)[0]
    c = np.random.uniform(-1,1,1)[0]
    # perform the MAC (multiply and add) operation
    p_golden = a*b + c
    # truncate to two decimal places, because the value generated by our hardware will never be
    # exactly the same owing to the precision loss of the fixed-point representation
    p_golden_trunc = truncate(p_golden,2)
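
To hand the random floats to the Verilog module they first have to be turned into fixed-point words. The following is a minimal sketch of such a conversion, not the author's code: to_fixed and from_fixed are hypothetical helper names, and truncation toward zero is an assumption (it does reproduce the a, b, c bit patterns shown in the error log later in this post).

```python
def to_fixed(value: float, n: int = 16, q: int = 12) -> str:
    """Convert a float to an n-bit two's complement string with q fractional bits."""
    scaled = int(value * (1 << q))          # int() truncates toward zero (assumption)
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= scaled <= hi:
        raise OverflowError(f"{value} does not fit in the format N={n}, Q={q}")
    # Masking to n bits yields the two's complement pattern for negative values
    return format(scaled & ((1 << n) - 1), f"0{n}b")


def from_fixed(bits: str, q: int = 12) -> float:
    """Convert a two's complement bit string back to a float."""
    n = len(bits)
    raw = int(bits, 2)
    if raw >= 1 << (n - 1):                 # sign bit set: value is negative
        raw -= 1 << n
    return raw / (1 << q)


print(to_fixed(0.11748478081607838))
print(from_fixed("1111101101111001"))
```

from_fixed can also be used to turn the words written out by the simulator back into floats before comparing them with p_golden_trunc.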

The fixed-point multiplier and fixed-point adder are built as follows:

//file: qmult.v
// (Q,N) = (12,16) => 1 sign-bit + 3 integer-bits + 12 fractional-bits = 16 total-bits
//                    |S|III|FFFFFFFFFFFF|
// The same thing in A(I,F) format would be A(3,12)
module qmult #(
	//Parameterized values
	parameter N = 16,
	parameter Q = 12
	)
	(
	 input                  clk,
	 input                  rst,
	 input			[N-1:0]	a,
	 input			[N-1:0]	b,
	 output         [N-1:0] q_result,    //output quantized to same number of bits as the input
     output			overflow             //signal to indicate output greater than the range of our format
	 );
	 
	 //	The underlying assumption here is that both fixed-point values have the same format (N,Q)
	 //	Because of this, the full result is N+N = 2N bits long
	 //	This also simplifies the hand-back of results, as the binary point
	 //	will always be in the same location
	
	wire [2*N-1:0]	f_result;		//	Multiplication by 2 values of N bits requires a 
									//	register that is N+N = 2N deep
	wire [N-1:0]   multiplicand;
	wire [N-1:0]	multiplier;
	wire [N-1:0]    a_2cmp, b_2cmp;
	wire [N-2:0]    quantized_result,quantized_result_2cmp;
	
	assign a_2cmp = {~a[N-1], ~a[N-2:0] + 1'b1};  // 2's complement of a
	assign b_2cmp = {~b[N-1], ~b[N-2:0] + 1'b1};  // 2's complement of b
	
    assign multiplicand = (a[N-1]) ? a_2cmp : a;              
    assign multiplier   = (b[N-1]) ? b_2cmp : b;
    //We remove the sign bit for multiplication
    assign f_result = multiplicand[N-2:0] * multiplier[N-2:0];  
    //Sign bit of the output is the XOR of the input sign bits
    assign q_result[N-1] = a[N-1]^b[N-1];             
    //Quantization of output to required number of bits                                               
    assign quantized_result = f_result[N-2+Q:Q];              
    //2's complement of quantized_result                                                             
    assign quantized_result_2cmp = ~quantized_result[N-2:0] + 1'b1;  
     //If the result is negative, we return a 2's complement representation of the output value
    assign q_result[N-2:0] = (a[N-1]^b[N-1]) ? quantized_result_2cmp : quantized_result; 
    																					 
    assign overflow = (f_result[2*N-2:N-1+Q] > 0) ? 1'b1 : 1'b0;

endmodule

//file: qadd.v
module qadd #(
	parameter N = 16,
	parameter Q = 12
	)
	(
    input [N-1:0] a,
    input [N-1:0] b,
    output [N-1:0] c
    );

// (Q,N) = (12,16) => 1 sign-bit + 3 integer-bits + 12 fractional-bits = 16 total-bits
//                    |S|III|FFFFFFFFFFFF|
// The same thing in A(I,F) format would be A(3,12)

//Since we supply every negative number in its 2's complement form by default, all we 
//need to do is add these two numbers together (note that to subtract a binary number 
//is the same as to add its two's complement)
assign c = a + b;

//If for whatever reason your system (the software/testbench feeding this hardware with 
//inputs) does not supply negative numbers in their two's complement form (some people 
//prefer to keep the magnitude as it is and make the sign bit '1' to represent negatives),
//then you should take a look at the fixed point arithmetic modules at opencores linked 
//above this code.

endmodule
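
For checking the simulator output word by word, it can help to mirror the two modules in Python. The sketch below is an illustrative bit-level model written from the Verilog above, not part of the original project; qmult_model and qadd_model are hypothetical names, and both operate on raw N-bit words (unsigned integers holding the two's complement pattern).

```python
def qmult_model(a: int, b: int, n: int = 16, q: int = 12):
    """Mirror of qmult.v on raw n-bit words. Returns (q_result, overflow)."""
    low_mask = (1 << (n - 1)) - 1                   # the N-1 bits below the sign bit
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    # Magnitudes: two's complement of the low bits when the operand is negative
    mag_a = (-a & low_mask) if sign_a else (a & low_mask)
    mag_b = (-b & low_mask) if sign_b else (b & low_mask)
    f_result = mag_a * mag_b                        # full-width product
    quantized = (f_result >> q) & low_mask          # f_result[N-2+Q:Q]
    sign = sign_a ^ sign_b                          # q_result[N-1]
    low = (-quantized & low_mask) if sign else quantized
    overflow = int((f_result >> (n - 1 + q)) > 0)   # f_result[2N-2:N-1+Q] > 0
    return (sign << (n - 1)) | low, overflow


def qadd_model(a: int, b: int, n: int = 16) -> int:
    """Mirror of qadd.v: plain addition, wrapped to n bits."""
    return (a + b) & ((1 << n) - 1)


# 1.5 * 2.0 in the (N,Q) = (16,12) format: 1.5 -> 6144, 2.0 -> 8192
result, overflow = qmult_model(6144, 8192)
print(result / 4096, overflow)
```

Because the model follows the hardware bit for bit, it reproduces the hardware's quantization behaviour, including its corner cases, rather than the ideal floating-point result.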

Simulation error:

a = 0000000111100001 = 0.11748478081607838
b = 1111111111111110 = -0.0005302183628016488  // a number close to 0
c = 1111101101111001 = -0.283025178155329
p_golden = -0.28 // software simulation
p_practical = 7.71 // hardware implementation
abs_diff = 7.43

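After feeding these inputs in by hand and analysing the module's internal signals, I was able to find the error. It is caused by the following line in the qmult module: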

assign quantized_result = f_result[N-2+Q:Q];

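Essentially, we truncate the multiplier output to a fixed number of bits so that the datapath does not grow unreasonably wide after every multiplication. Note, however, that this truncation loses very small numbers: when all bits inside the kept range are zero and the non-zero bits lie below it, nothing is captured in the quantized result. In the case above the magnitude is truncated to zero while the sign bit, computed separately as a[N-1]^b[N-1], is still 1, so q_result becomes 1000000000000000, which is -8 in this format; adding c then wraps around to about 7.71.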
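
The failing case can be replayed step by step with plain integer arithmetic. This is a sketch that follows the assignments in qmult.v and qadd.v for the three input words from the log; the variable names mirror the Verilog signals, and to_float is a hypothetical helper.

```python
N, Q = 16, 12
a = int("0000000111100001", 2)   # about  0.1175
b = int("1111111111111110", 2)   # about -0.0005
c = int("1111101101111001", 2)   # about -0.2830

low_mask = (1 << (N - 1)) - 1
mag_a = a & low_mask                          # a is positive: 481
mag_b = -b & low_mask                         # b is negative: magnitude 2
f_result = mag_a * mag_b                      # 481 * 2 = 962
quantized = (f_result >> Q) & low_mask        # 962 >> 12 = 0, the product vanishes
sign = (a >> (N - 1)) ^ (b >> (N - 1))        # 0 ^ 1 = 1, still "negative"
q_result = (sign << (N - 1)) | (-quantized & low_mask)
total = (q_result + c) & ((1 << N) - 1)       # qadd: addition wraps to N bits


def to_float(word: int) -> float:
    """Interpret a raw N-bit word as a signed value with Q fractional bits."""
    if word >> (N - 1):
        word -= 1 << N
    return word / (1 << Q)


print(format(q_result, "016b"), to_float(q_result))   # 1000000000000000 -8.0
print(format(total, "016b"), to_float(total))         # 0111101101111001 7.717041015625
```

Truncated to two decimal places, the last value is the 7.71 reported as p_practical.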

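There are two ways to address this:
1. Increase the number of bits used for the fractional part. This, however, reduces the number of bits available for the integer part, so the largest representable number becomes smaller than the current limit (just under 8 with N=16, Q=12).
2. Increase the total number of bits used for the fixed-point number (32 instead of 16). For example, when I changed N to 32 and Q to 26, the problem was solved.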
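
A quick way to see why the wider format helps is to quantize the magnitudes of the failing a and b in both formats and look at what survives the shift by Q. This is only a sketch: quantized_product is a hypothetical helper that models the magnitude path alone, assuming inputs are truncated toward zero when converted.

```python
a = 0.11748478081607838
b = -0.0005302183628016488


def quantized_product(x: float, y: float, q: int) -> int:
    """Magnitude of x*y after both operands and the product are truncated to q fractional bits."""
    mag_x = int(abs(x) * (1 << q))
    mag_y = int(abs(y) * (1 << q))
    return (mag_x * mag_y) >> q


for n, q in [(16, 12), (32, 26)]:
    raw = quantized_product(a, b, q)
    print(f"N={n}, Q={q}: raw={raw}, value={raw / (1 << q):.10f}, exact={abs(a * b):.10f}")
```

With Q=12 the raw product is zero, while with Q=26 enough fractional bits remain for the small product to be represented.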

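Multiplication:

N=16, Q=12 (fractional bits): precision 0.01

Software implementation: the full product of the 16-bit operands, x*y = 00000000001001100011111111110100

Expected hardware result: drop the low Q=12 bits and keep 16 bits, starting from the 13th bit counted from the low end: '0000001001100011'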
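
The slicing described above can be checked in a few lines of Python. This sketch simply applies "shift right by Q, keep the low 16 bits" to the 32-bit product quoted in the text; the variable names are illustrative.

```python
N, Q = 16, 12
product_bits = "00000000001001100011111111110100"   # full 32-bit product from the text

full = int(product_bits, 2)
kept = (full >> Q) & ((1 << N) - 1)      # drop the low Q bits, keep the next N bits
result_bits = format(kept, f"0{N}b")

print(result_bits)                        # 0000001001100011
```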

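N=16, Q=8 (fractional bits), 7 integer bits; the vast majority of values lie between 0 and 8.

Result: 14.92 vs 14.94, an error on the order of 0.01

N=16, Q=9 (fractional bits), 6 integer bits; the vast majority of values lie between 0 and 8.

Addition: c = a+b

N=16, Q=9 (fractional bits), 6 integer bits; most convolution outputs lie between 0 and 8, that is, the sum of nine products generally does not exceed 8.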
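
When weighing one choice of Q against another, it helps to see the representable range and the step size side by side. The helper below is a small sketch for a signed N-bit word with Q fractional bits; q_format_range is a hypothetical name.

```python
def q_format_range(n: int, q: int):
    """Return (min, max, resolution) of a signed n-bit word with q fractional bits."""
    resolution = 1.0 / (1 << q)
    lo = -(1 << (n - 1)) * resolution
    hi = ((1 << (n - 1)) - 1) * resolution
    return lo, hi, resolution


for q in (12, 8, 9):
    lo, hi, res = q_format_range(16, q)
    print(f"N=16, Q={q}: range [{lo}, {hi}], resolution {res}")
```

Each fractional bit given up doubles the range and halves the resolution, which is the trade-off behind moving from Q=12 to Q=8 or Q=9.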

https://www.99cankao.com/digital-computation/binaryarith.php

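Software simulation and log output:

Python helper tools:

Completed so far:

1. Checking whether the output file from Vivado agrees with the Python test results, and how large the error is

2. Python data logs of the inputs and outputs for single convolution and multi-channel convolution (with zero padding)

3. Logs for activation and pooling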
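
The first item, comparing the simulator output with the Python results, might look roughly like the sketch below. The file layout is an assumption (one two's complement binary word per line), as are the function name compare_outputs, the default Q=9 and the 0.01 tolerance; the actual tool may differ.

```python
def compare_outputs(hw_lines, golden, q=9, tol=0.01):
    """Compare hardware words (binary strings) against golden float values.

    Returns a list of (index, hw_value, golden_value, abs_diff) for every
    entry whose absolute difference exceeds tol.
    """
    mismatches = []
    for i, (line, expected) in enumerate(zip(hw_lines, golden)):
        bits = line.strip()
        raw = int(bits, 2)
        if bits[0] == "1":                  # sign bit set: negative value
            raw -= 1 << len(bits)
        value = raw / (1 << q)
        diff = abs(value - expected)
        if diff > tol:
            mismatches.append((i, value, expected, diff))
    return mismatches


# In practice hw_lines would come from the simulator's output file,
# e.g. open(path).readlines(); an in-memory list is used here instead.
hw_lines = ["0000001000000000\n", "1111111000000000\n"]
print(compare_outputs(hw_lines, [1.0, -1.0]))
print(compare_outputs(hw_lines, [1.0, 0.5]))
```

Returning the mismatching entries, rather than a single pass/fail flag, makes it easy to log which outputs differ and by how much.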
