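Building a software simulation testbench with Python:
Online binary multiplication calculator: https://www.osgeo.cn/app/s2841
Verification is a painful business...
The design under test is a MAC (multiply-accumulate) unit.
The basic idea behind an automated testbench is to have a verified golden model that always produces the correct output for a given set of inputs. A golden model can be built in many ways, but my favorite has always been Python: it is easy to use and offers a strong set of libraries for just about anything in numerical computing. Python makes life easier, especially when the task involves DSP algorithms, which are easy to express in software.
Python model of the multiply-accumulate unit:
The code below generates random floating-point data that we can feed to our Verilog module; the hardware results are then compared against the results computed by this same Python code. truncate cuts a number (1.234545...) to the given number of decimal places without rounding, e.g. truncate(1.234545, 2) returns 1.23.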
import numpy as np
import subprocess
import os
import math
#Function to truncate (not round) any floating point number to two decimal places
def truncate(number, digits) -> float:
    stepper = 10.0 ** digits
    return math.trunc(stepper * number) / stepper

NoT = 10 # Number of tests to be run
for i in range(NoT):
    #choosing the values of a,b,c randomly
    a = np.random.uniform(-1,1,1)[0]
    b = np.random.uniform(-1,1,1)[0]
    c = np.random.uniform(-1,1,1)[0]
    #performs the MAC (Multiply and Add) operation
    p_golden = a*b + c
    #truncating to two decimal places because the value generated by our hardware will never be the
    #exact same owing to the precision loss due to fixed point representation.
    p_golden_trunc = truncate(p_golden,2)
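Before these values can be handed to the Verilog module, each float has to become an N-bit two's-complement word. The following is a minimal sketch of that conversion, not the original script: float_to_fixed and fixed_to_float are hypothetical helper names, and round-to-nearest quantization is an assumption (it does reproduce the bit patterns shown in the simulation log further down).

```python
def float_to_fixed(x, n=16, q=12):
    """Quantize x to an n-bit two's-complement bit string with q fractional bits.

    Round-to-nearest is an assumption; the original post does not state
    which rounding mode was used.
    """
    scaled = round(x * (1 << q))
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= scaled <= hi:
        raise OverflowError(f"{x} does not fit in a signed ({n},{q}) format")
    # Masking maps negative integers onto their two's-complement encoding.
    return format(scaled & ((1 << n) - 1), f"0{n}b")


def fixed_to_float(bits, q=12):
    """Interpret a two's-complement bit string with q fractional bits as a float."""
    n = len(bits)
    word = int(bits, 2)
    if word >= 1 << (n - 1):  # sign bit set, so the value is negative
        word -= 1 << n
    return word / (1 << q)


print(float_to_fixed(0.11748478081607838))     # 0000000111100001
print(float_to_fixed(-0.0005302183628016488))  # 1111111111111110
print(fixed_to_float("1111101101111001"))      # -0.282958984375
```

Converting back with fixed_to_float shows how much precision the (16,12) format costs: -0.283025... becomes -0.282958..., which is why the comparison is done on truncated values.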
The fixed-point multiplier and fixed-point adder are built as follows:
//file: qmult.v
// (Q,N) = (12,16) => 1 sign-bit + 3 integer-bits + 12 fractional-bits = 16 total-bits
// |S|III|FFFFFFFFFFFF|
// The same thing in A(I,F) format would be A(3,12)
module qmult #(
//Parameterized values
parameter N = 16,
parameter Q = 12
)
(
input clk,
input rst,
input [N-1:0] a,
input [N-1:0] b,
output [N-1:0] q_result, //output quantized to same number of bits as the input
output overflow //signal to indicate output greater than the range of our format
);
// The underlying assumption, here, is that both fixed-point values are of the same length (N,Q)
// Because of this, the results will be of length N+N = 2N bits
// This also simplifies the hand-back of results, as the binimal point
// will always be in the same location
wire [2*N-1:0] f_result; // Multiplication by 2 values of N bits requires a
// register that is N+N = 2N deep
wire [N-1:0] multiplicand;
wire [N-1:0] multiplier;
wire [N-1:0] a_2cmp, b_2cmp;
wire [N-2:0] quantized_result,quantized_result_2cmp;
assign a_2cmp = {~a[N-1],~a[N-2:0]+ 1'b1}; //2's complement of a
assign b_2cmp = {~b[N-1],~b[N-2:0]+ 1'b1}; //2's complement of b
assign multiplicand = (a[N-1]) ? a_2cmp : a;
assign multiplier = (b[N-1]) ? b_2cmp : b;
//We remove the sign bit for multiplication
assign f_result = multiplicand[N-2:0] * multiplier[N-2:0];
//Sign bit of output would be XOR of input sign bits
assign q_result[N-1] = a[N-1]^b[N-1];
//Quantization of output to required number of bits
assign quantized_result = f_result[N-2+Q:Q];
//2's complement of quantized_result
assign quantized_result_2cmp = ~quantized_result[N-2:0] + 1'b1;
//If the result is negative, we return a 2's complement representation of the output value
assign q_result[N-2:0] = (a[N-1]^b[N-1]) ? quantized_result_2cmp : quantized_result;
assign overflow = (f_result[2*N-2:N-1+Q] > 0) ? 1'b1 : 1'b0;
endmodule
//file: qadd.v
module qadd #(
parameter N = 16,
parameter Q = 12
)
(
input [N-1:0] a,
input [N-1:0] b,
output [N-1:0] c
);
// (Q,N) = (12,16) => 1 sign-bit + 3 integer-bits + 12 fractional-bits = 16 total-bits
// |S|III|FFFFFFFFFFFF|
// The same thing in A(I,F) format would be A(3,12)
//Since we supply every negative number in its 2's complement form by default, all we
//need to do is add these two numbers together (note that to subtract a binary number
//is the same as to add its two's complement)
assign c = a + b;
//If for whatever reason your system (the software/testbench feeding this hardware with
//inputs) does not supply negative numbers in their two's complement form,(some people
//prefer to keep the magnitude as it is and make the sign bit '1' to represent negatives)
// then you should take a look at the fixed point arithmetic modules at opencores linked
//above this code.
endmodule
Simulation error:
a = 0000000111100001 = 0.11748478081607838
b = 1111111111111110 = -0.0005302183628016488 // a number very close to 0
c = 1111101101111001 = -0.283025178155329
p_golden = -0.28 // software simulation
p_practical = 7.71 // hardware implementation
abs_diff = 7.43
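After feeding these inputs in by hand and analyzing the module's internal signals, I was able to track down the error. It is caused by the following line in the qmult module: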
assign quantized_result = f_result[N-2+Q:Q];
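Basically, we truncate the multiplier output to a fixed number of bits so that the datapath does not grow unreasonably wide after every multiplication. The catch is that this truncation discards very small numbers: when every bit inside the retained range is zero and the nonzero bits all lie below that range, nothing is captured in the quantized result. There are two possible fixes:
- Increase the number of bits used for the fractional part. This reduces the number of bits left for the integer part, however, so the largest representable value drops below its current limit (just under 2^3 = 8 with 3 integer bits).
- Increase the total number of bits used for the fixed-point number (32 instead of 16). For example, changing N to 32 and Q to 26 resolved the problem.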
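The failure can be reproduced in software with a bit-level model of the two modules. The sketch below is my reading of the qmult and qadd listings above, not code from the original project; qmult_model, qadd_model, to_word, to_float and mac are hypothetical names, and round-to-nearest input quantization is an assumption. With the failing inputs, the quantized magnitude comes out as zero while the sign bit is still set, so the multiplier returns 1000000000000000 (which is -8 in this format), and the wrapping add with c lands near 7.71.

```python
import math


def truncate(number, digits):
    stepper = 10.0 ** digits
    return math.trunc(stepper * number) / stepper


def to_word(x, n, q):
    """Float -> n-bit two's-complement word (round-to-nearest is an assumption)."""
    return round(x * (1 << q)) & ((1 << n) - 1)


def to_float(word, n, q):
    """n-bit two's-complement word -> float."""
    if word >> (n - 1):
        word -= 1 << n
    return word / (1 << q)


def qmult_model(a, b, n, q):
    """Mirror of the qmult assigns: magnitude multiply, truncate, re-apply sign."""
    mask = (1 << (n - 1)) - 1                      # the low N-1 bits
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    mag_a = ((~a & mask) + 1) & mask if sign_a else a & mask
    mag_b = ((~b & mask) + 1) & mask if sign_b else b & mask
    f_result = mag_a * mag_b
    quantized = (f_result >> q) & mask             # f_result[N-2+Q:Q]
    sign = sign_a ^ sign_b
    low = ((~quantized & mask) + 1) & mask if sign else quantized
    return (sign << (n - 1)) | low


def qadd_model(a, b, n):
    """Mirror of qadd: plain addition that wraps at n bits."""
    return (a + b) & ((1 << n) - 1)


def mac(a, b, c, n, q):
    product = qmult_model(to_word(a, n, q), to_word(b, n, q), n, q)
    return to_float(qadd_model(product, to_word(c, n, q), n), n, q)


a, b, c = 0.11748478081607838, -0.0005302183628016488, -0.283025178155329
product16 = qmult_model(to_word(a, 16, 12), to_word(b, 16, 12), 16, 12)
print(format(product16, "016b"))          # 1000000000000000
print(truncate(mac(a, b, c, 16, 12), 2))  # 7.71
print(truncate(mac(a, b, c, 32, 26), 2))  # -0.28
```

With N=32 and Q=26 the product magnitude no longer truncates to zero, so the model agrees with the golden value again.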
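Multiplication:
N=16, Q=12 (fractional bits): precision 0.01
Software: the full 32-bit product of the two 16-bit operands, x*y = 00000000001001100011111111110100
Expected hardware result: discard the low Q=12 bits and keep 16 bits, starting from the 13th least-significant bit: '0000001001100011'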
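The bit selection described here can be checked in a few lines. This is only an illustration of the shift-and-mask step, using the product value quoted above; quantize_product is a hypothetical helper name.

```python
def quantize_product(product_bits, n=16, q=12):
    """Drop the low q bits of a 2n-bit product and keep the next n bits."""
    product = int(product_bits, 2)
    kept = (product >> q) & ((1 << n) - 1)
    return format(kept, f"0{n}b")


full_product = "00000000001001100011111111110100"
print(quantize_product(full_product))  # 0000001001100011
```

Note that the qmult listing keeps only N-1 magnitude bits at this step and handles the sign bit separately; for this positive product both views give the same word.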
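N=16, Q=8 (fractional bits), 7 integer bits; the vast majority of values lie between 0 and 8.
Result: 14.92 vs 14.94, an error on the order of 0.01
N=16, Q=9 (fractional bits), 6 integer bits; the vast majority of values lie between 0 and 8.
Addition: c = a+b
N=16, Q=9 (fractional bits), 6 integer bits; the vast majority of convolution output values lie between 0 and 8, i.e. the sum of nine products generally does not exceed 8.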
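To compare the candidate formats side by side, the range and resolution of a signed (N,Q) format can be computed directly. A small sketch, assuming the 1 sign bit + (N-1-Q) integer bits + Q fractional bits layout used in the Verilog comments and a two's-complement interpretation; q_format_info is a hypothetical helper name.

```python
def q_format_info(n, q):
    """Range and resolution of a signed two's-complement format with q fractional bits."""
    int_bits = n - 1 - q            # one bit is reserved for the sign
    resolution = 2.0 ** -q          # value of one least-significant bit
    return {
        "integer_bits": int_bits,
        "min": -(2.0 ** int_bits),
        "max": 2.0 ** int_bits - resolution,
        "resolution": resolution,
    }


for n, q in [(16, 12), (16, 9), (16, 8), (32, 26)]:
    info = q_format_info(n, q)
    print(f"N={n:2d} Q={q:2d} int_bits={info['integer_bits']} "
          f"range=[{info['min']}, {info['max']}] lsb={info['resolution']}")
```

Moving from Q=12 to Q=9 makes each LSB eight times coarser but leaves plenty of headroom above 8 for accumulated sums.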
https://www.99cankao.com/digital-computation/binaryarith.php
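Software simulation and log output:
Python helper tools:
Completed so far:
1. Check whether the output file produced in Vivado matches the Python test results, and how large the error is
2. Python data logs of the inputs and outputs for single convolution and multi-channel convolution (zero padding)
3. Logs for activation and pooling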
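A check like item 1 could look roughly like the sketch below. The file layout is an assumption (one N-bit binary word per line, in the same order as the golden values); the actual Vivado output format is not described here, and compare_outputs is a hypothetical helper name.

```python
def compare_outputs(hw_lines, golden, n=16, q=12, tol=0.01):
    """Compare hardware words against golden floats.

    hw_lines: iterable of strings, one n-bit binary word per line (assumed format).
    golden:   iterable of expected float values in the same order.
    Returns a list of (index, expected, actual, abs_error, passed) tuples.
    """
    report = []
    for idx, (line, expected) in enumerate(zip(hw_lines, golden)):
        word = int(line.strip(), 2)
        if word & (1 << (n - 1)):       # two's-complement sign handling
            word -= 1 << n
        actual = word / (1 << q)
        err = abs(actual - expected)
        report.append((idx, expected, actual, err, err <= tol))
    return report


# In practice hw_lines would come from open(path) on the simulator's output file.
sample_hw = ["0111101101111001\n", "1111101101111000\n"]
sample_golden = [-0.28, -0.28]
for idx, expected, actual, err, ok in compare_outputs(sample_hw, sample_golden):
    print(f"test {idx}: golden={expected} hw={actual:.4f} err={err:.4f} {'PASS' if ok else 'FAIL'}")
```

Logging the absolute error for every test, not just pass/fail, makes it easier to tell precision loss from a real logic bug.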