The torch.roll function

A question about the mask of window attention:

https://github.com/microsoft/Swin-Transformer/issues/38

torch.roll(input, shifts, dims=None) → Tensor

Roll the tensor input along the given dimension(s). Elements that are shifted beyond the last position are re-introduced at the first position. If dims is None, the tensor will be flattened before rolling and then restored to the original shape.

Parameters

input (Tensor) – the input tensor.
shifts (int or tuple of python:ints) – The number of places by which the elements of the tensor are shifted. If shifts is a tuple, dims must be a tuple of the same size, and each dimension will be rolled by the corresponding value
dims (int or tuple of python:ints) – Axis along which to roll

Put simply: a positive shifts is like squeezing a tube of toothpaste downward, with whatever is squeezed out pushed back in at the top; a negative shifts squeezes upward, with whatever comes out pushed back in at the bottom.

When shifts is a tuple, e.g. shifts=(x, y), dims must be a tuple of the same size, e.g. dims=(a, b); dimension a is then shifted by x places and dimension b by y places.

Example:

>>> x = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(4, 2)
>>> x
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
>>> torch.roll(x, 1)
tensor([[8, 1],
        [2, 3],
        [4, 5],
        [6, 7]])

'''Shift dimension 0 down by 1; the overflowing row [7, 8] wraps around to the top'''
>>> torch.roll(x, 1, 0)
tensor([[7, 8],
        [1, 2],
        [3, 4],
        [5, 6]])

'''Shift dimension 0 up by 1; the overflowing row [1, 2] wraps around to the bottom'''
>>> torch.roll(x, -1, 0)
tensor([[3, 4],
        [5, 6],
        [7, 8],
        [1, 2]])

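'''With tuples, shifts and dims are matched one to one:
dimension 0 is shifted down by 2, and the overflowing rows [5, 6] and [7, 8] wrap around to the top;
dimension 1 is shifted right by 1, and the overflowing column [6, 8, 2, 4] wraps around to the far left'''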
>>> torch.roll(x, shifts=(2, 1), dims=(0, 1))
tensor([[6, 5],
        [8, 7],
        [2, 1],
        [4, 3]])
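
The wrap-around behaviour in the examples above can be reproduced with plain list slicing. The following is only a minimal sketch in pure Python, not how torch.roll is implemented; roll_1d and roll_2d are hypothetical helper names introduced for illustration.

```python
def roll_1d(seq, shift):
    """Cyclically shift a list; a positive shift moves elements toward the end."""
    n = len(seq)
    if n == 0:
        return list(seq)
    k = shift % n  # normalise negative and oversized shifts
    return list(seq[n - k:]) + list(seq[:n - k])


def roll_2d(rows, shift, dim):
    """Roll a list of rows along dim 0 (whole rows) or dim 1 (within each row)."""
    if dim == 0:
        return roll_1d(rows, shift)
    return [roll_1d(row, shift) for row in rows]


x = [[1, 2], [3, 4], [5, 6], [7, 8]]
down_one = roll_2d(x, 1, dim=0)                  # like torch.roll(x, 1, 0)
up_one = roll_2d(x, -1, dim=0)                   # like torch.roll(x, -1, 0)
both = roll_2d(roll_2d(x, 2, dim=0), 1, dim=1)   # like shifts=(2, 1), dims=(0, 1)
print(down_one)
print(up_one)
print(both)
```

Rolling along several dimensions is simply one roll per dimension applied in sequence, which is why the tuple form needs shifts and dims of the same length.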

Dilated convolution

Multi-Scale Context Aggregation by Dilated Convolutions

A simple example (animations from vdumoulin/conv_arithmetic):

Standard Convolution with a 3 x 3 kernel (and padding)

Dilated Convolution with a 3 x 3 kernel and dilation rate 2

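The advantages of dilated convolution are already apparent: it preserves the internal data structure and avoids down-sampling. How to design a structure built entirely on dilated convolution, however, is a new problem.

Potential problem 1: The Gridding Effect

Suppose we simply stack several 3 x 3 kernels with dilation rate 2. The following problem then appears:

The kernel is not contiguous, i.e. not every pixel is used in the computation, so treating the information in this checker-board fashion loses its continuity. For pixel-level dense prediction tasks this is fatal.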
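
The gridding effect can be checked numerically in one dimension. The sketch below is an illustration under simplifying assumptions (1-D, 3-tap kernels, no padding effects); touched_offsets and coverage are hypothetical helper names. It lists which input offsets contribute to a single output pixel after stacking dilated convolutions.

```python
def touched_offsets(dilation_rates, k=3):
    """Return the set of 1-D input offsets that reach one output pixel
    after stacking k-tap convolutions with the given dilation rates."""
    half = (k - 1) // 2
    offsets = {0}
    for r in dilation_rates:
        taps = [t * r for t in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def coverage(dilation_rates, k=3):
    """Return (number of positions used, width of the receptive field)."""
    offsets = touched_offsets(dilation_rates, k)
    span = max(offsets) - min(offsets) + 1
    return len(offsets), span


used_222, span_222 = coverage([2, 2, 2])
used_125, span_125 = coverage([1, 2, 5])
print(used_222, span_222)  # prints 7 13: only the even offsets are used
print(used_125, span_125)  # prints 17 17: every position is used
```

With three stacked rate-2 layers only 7 of the 13 positions inside the receptive field are ever read, which is the checker-board pattern described above; mixing rates 1, 2 and 5 leaves no holes.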

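Potential problem 2: Long-ranged information might not be relevant.

The design background of dilated convolution suggests that it is meant to capture long-ranged information. However, information gathered only at large dilation rates may help segment large objects while doing more harm than good for small ones. Handling objects of different sizes at the same time is therefore the key to designing a good dilated convolution network.

Toward a standardized design: Hybrid Dilated Convolution (HDC)

For the problems raised in the previous section, the paper from the TuSimple group proposes a good solution: a design structure they call HDC.

The first property is that the dilation rates of the stacked convolutions must not have a common divisor greater than 1. For example, [2, 4, 6] is not a good three-layer design, because the gridding effect still appears.

The second property is that the dilation rates are arranged in a sawtooth pattern, e.g. the repeating structure [1, 2, 5, 1, 2, 5].

The third property is that the following relation must hold: M_i = max[M_{i+1} − 2r_i, M_{i+1} − 2(M_{i+1} − r_i), r_i]

Here r_i is the dilation rate of layer i and M_i is the maximum dilation rate at layer i. With n layers in total, M_n = r_n by default. For a k x k kernel the goal is M_2 ≤ k, so that at least a dilation rate of 1, i.e. a standard convolution, can cover all the holes.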
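
The recurrence for M_i is easy to evaluate by hand, but a few lines of code make it simple to try other rate lists. This is a minimal sketch of the third property only; hdc_max_distances and satisfies_hdc are hypothetical names, and the rate list [1, 2, 9] is an illustrative counter-example chosen here, not one taken from the text.

```python
def hdc_max_distances(rates):
    """Return [M_1, ..., M_n] for dilation rates [r_1, ..., r_n],
    using M_n = r_n and the recurrence for M_i given in the text."""
    m = [0] * len(rates)
    m[-1] = rates[-1]
    for i in range(len(rates) - 2, -1, -1):
        nxt, r = m[i + 1], rates[i]
        m[i] = max(nxt - 2 * r, nxt - 2 * (nxt - r), r)
    return m


def satisfies_hdc(rates, k=3):
    """Check the design goal M_2 <= k (M_2 is index 1 in zero-based terms)."""
    return hdc_max_distances(rates)[1] <= k


good = hdc_max_distances([1, 2, 5])
bad = hdc_max_distances([1, 2, 9])
print(good, satisfies_hdc([1, 2, 5]))  # prints [1, 2, 5] True
print(bad, satisfies_hdc([1, 2, 9]))   # prints [3, 5, 9] False
```

For [1, 2, 5] the computation gives M_2 = max[5 − 4, 5 − 2(5 − 2), 2] = 2 ≤ 3, so a 3 x 3 kernel covers every hole; for [1, 2, 9] it gives M_2 = 5 > 3.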

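A simple example: dilation rate [1, 2, 5] with 3 x 3 kernel (a feasible design)

The sawtooth pattern itself also serves small and large objects well at the same time (small dilation rates attend to nearby information, large dilation rates to distant information).

This way the convolution remains contiguous and is therefore still consistent with the VGG group's observation that a large convolution is a regularised stack of small convolutions.

Code (plotting which pixels a dilated convolution uses):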

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap


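def dilated_conv_one_pixel(center: tuple, feature_map: np.ndarray, k: int = 3, r: int = 1, v: int = 1):
    """
    With the dilated kernel centered at `center`, find which pixels it touches
    and add the increment v at each of those positions.
    Args:
        center: (x, y) coordinate of the dilated kernel's center
        feature_map: map recording how many times each pixel is used
        k: kernel size of the dilated convolution (must be odd)
        r: dilation rate of the dilated convolution
        v: usage-count increment
    """
    assert k % 2 == 1, "kernel size must be odd"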

    # left-top: (x, y)
    left_top = (center[0] - ((k - 1) // 2) * r, center[1] - ((k - 1) // 2) * r)
    for i in range(k):
        for j in range(k):
            feature_map[left_top[1] + i * r][left_top[0] + j * r] += v


def dilated_conv_all_map(dilated_map: np.ndarray,
                         k: int = 3,
                         r: int = 1):
    """
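    Given which pixels of the output feature map are used and how often,
    compute, for a dilated convolution with kernel size k and rate r,
    which pixels of the input feature map are used and how often.
    Args:
        dilated_map: map recording how many times each output pixel is used
        k: kernel size of the dilated convolution
        r: dilation rate of the dilated convolution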
    """
    new_map = np.zeros_like(dilated_map)
    for i in range(dilated_map.shape[0]):
        for j in range(dilated_map.shape[1]):
            if dilated_map[i][j] > 0:
                dilated_conv_one_pixel((j, i), new_map, k=k, r=r, v=dilated_map[i][j])

    return new_map


def plot_map(matrix: np.ndarray):
    plt.figure()

    c_list = ['white', 'blue', 'red']
    new_cmp = LinearSegmentedColormap.from_list('chaos', c_list)
    plt.imshow(matrix, cmap=new_cmp)

    ax = plt.gca()
    ax.set_xticks(np.arange(-0.5, matrix.shape[1], 1), minor=True)
    ax.set_yticks(np.arange(-0.5, matrix.shape[0], 1), minor=True)

    # show the color bar
    plt.colorbar()

    # annotate each cell with its count
    thresh = 5
    for x in range(matrix.shape[1]):
        for y in range(matrix.shape[0]):
            # note: index as matrix[y, x], not matrix[x, y]
            info = int(matrix[y, x])
            ax.text(x, y, info,
                    verticalalignment='center',
                    horizontalalignment='center',
                    color="white" if info > thresh else "black")
    ax.grid(which='minor', color='black', linestyle='-', linewidth=1.5)
    plt.show()
    plt.close()


def main():
    # dilation rates listed from the bottom layer to the top layer
    dilated_rates = [1, 2, 3]
    # initialise the feature map with a single used pixel at the center
    size = 31
    m = np.zeros(shape=(size, size), dtype=np.int32)
    center = size // 2
    m[center][center] = 1
    # print(m)
    # plot_map(m)

    # walk from the top layer back down to the input
    for dilated_r in dilated_rates[::-1]:
        m = dilated_conv_all_map(m, r=dilated_r)
    print(m)
    plot_map(m)


if __name__ == '__main__':
    main()

Plot result:

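The torch grid_sample() function

By default grid_sample uses bilinear interpolation to resample the input tensor to a specified size. How does it differ from interpolate?
interpolate samples on a regular (uniform) grid, whereas grid_sample does not have to sample regularly, which makes it more flexible: the sampling points are determined by the grid tensor.

The signature of grid_sample in PyTorch is: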

torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None)

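The official documentation describes what the function does as follows:

Given an input and a flow-field grid, computes the output using input values and pixel locations from grid.

Put simply, you supply an input tensor and a matching flow-field grid (optical flow, voxel flow, and so on). Each position of the grid holds a coordinate (the coordinate of a pixel in input); the value of input at that coordinate is written to the same position of the output, which yields the final result.

The shapes of input, grid and output are as follows (input may also be a 5-D tensor; only the 4-D case is considered here). Note that output can be larger than input, so grid_sample can be used for upsampling.

input: (N, C, H_in, W_in)
grid: (N, H_out, W_out, 2)
output: (N, C, H_out, W_out)

Here input and output are images or feature maps inside a network. The key part is grid: its last dimension has size 2 and holds the position (x, y) of a pixel in input. x and y are normalized to [−1, 1], with (−1, −1) referring to the top-left pixel of input and (1, 1) to the bottom-right pixel. Coordinates outside this range are handled according to the padding_mode argument:

padding_mode='zeros': out-of-bound positions are filled with pixel value 0.
padding_mode='border': out-of-bound positions are filled with the pixel value at the border.
padding_mode='reflection': out-of-bound positions are filled with the value mirrored about the border.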
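
How a normalized coordinate in [−1, 1] maps to a pixel index depends on align_corners. The sketch below writes out the two mappings as understood from the documented behaviour (with align_corners=True the extremes refer to the centers of the corner pixels, otherwise to their outer edges); unnormalize and normalize are hypothetical helper names, not PyTorch functions.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a (fractional) pixel index."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels
    return ((coord + 1) * size - 1) / 2


def normalize(index, size, align_corners):
    """Inverse mapping: pixel index -> normalized coordinate."""
    if align_corners:
        return 2 * index / (size - 1) - 1
    return (2 * index + 1) / size - 1


width = 4
print(unnormalize(-1, width, True), unnormalize(1, width, True))    # 0.0 3.0
print(unnormalize(-1, width, False), unnormalize(1, width, False))  # -0.5 3.5
print(normalize(0, width, False))                                   # -0.75
```

With align_corners=False a coordinate of exactly −1 lands half a pixel outside the first pixel center, so padding_mode already influences samples at the very edge of the grid.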

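The mode argument defines how pixel values are interpolated at the requested position in input. Why is interpolation needed? As noted above, the positions x and y stored in grid lie in [−1, 1], so input has to be sampled at floating-point coordinates. mode can be 'bilinear', 'nearest' or 'bicubic'. nearest simply takes the pixel closest to (x, y); bilinear takes a weighted average of the four pixels surrounding (x, y); mode='bicubic' supports only 4-D input.

Bilinear interpolation: 4 (2*2) points of the original image are used to compute 1 point of the new image.

Bicubic interpolation: 16 (4*4) points of the original image are used to compute 1 point of the new image; the result is better but the computational cost is considerably higher.

Bilinear interpolation takes a weighted average of the four pixel values around (x, y), so what is the weight of each position? See the bilinear interpolation example in the figure below:

The result of the bilinear interpolation is:

The following figure gives a more intuitive picture of bilinear interpolation:

As the figure shows, bilinear interpolation first interpolates in the zoy plane between f(x0,y0) and f(x0,y1) to get z1 and between f(x1,y0) and f(x1,y1) to get z2, then interpolates in the zox plane to get the final value z. Each in-plane interpolation is just the two-point form of a line from high-school algebra: find the equation of the line, then substitute the coordinate of the independent variable (x or y). Chaining the two steps gives the bilinear result, which is where the name "bilinear" comes from.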

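The formal derivation follows.

The four known points are:

Q11=(x1,y1,f(x1,y1)), Q21=(x2,y1,f(x2,y1)), Q12=(x1,y2,f(x1,y2)), Q22=(x2,y2,f(x2,y2))

where z=f(x,y).

Interpolating in the x direction first gives:

f(x,y1) ≈ (x2−x)/(x2−x1) f(Q11) + (x−x1)/(x2−x1) f(Q21)
f(x,y2) ≈ (x2−x)/(x2−x1) f(Q12) + (x−x1)/(x2−x1) f(Q22)

Interpolating in the y direction then gives:

f(x,y) ≈ (y2−y)/(y2−y1) f(x,y1) + (y−y1)/(y2−y1) f(x,y2)
       = [f(Q11)(x2−x)(y2−y) + f(Q21)(x−x1)(y2−y) + f(Q12)(x2−x)(y−y1) + f(Q22)(x−x1)(y−y1)] / ((x2−x1)(y2−y1))

This is the final expression for bilinear interpolation. The four weights share the same denominator (x2−x1)(y2−y1), which equals 1 when the four points are adjacent pixels, so it can be dropped. With that in mind, the earlier example should be clear at a glance.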
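
The two-step procedure (interpolate along x at both y levels, then along y) translates directly into code. The function below is a minimal sketch in pure Python; the name bilinear and its argument order are choices made here for illustration, with q11, q21, q12 and q22 standing for the values at (x1,y1), (x2,y1), (x1,y2) and (x2,y2).

```python
def bilinear(x, y, x1, y1, x2, y2, q11, q21, q12, q22):
    """Bilinear interpolation at (x, y) inside the rectangle [x1, x2] x [y1, y2]."""
    # Step 1: interpolate along x on the lines y = y1 and y = y2
    fx_y1 = (x2 - x) / (x2 - x1) * q11 + (x - x1) / (x2 - x1) * q21
    fx_y2 = (x2 - x) / (x2 - x1) * q12 + (x - x1) / (x2 - x1) * q22
    # Step 2: interpolate along y between the two intermediate values
    return (y2 - y) / (y2 - y1) * fx_y1 + (y - y1) / (y2 - y1) * fx_y2


center = bilinear(0.5, 0.5, 0, 0, 1, 1, 10, 20, 30, 40)
off_center = bilinear(0.25, 0.75, 0, 0, 1, 1, 10, 20, 30, 40)
corner = bilinear(1, 0, 0, 0, 1, 1, 10, 20, 30, 40)
print(center, off_center, corner)  # prints 25.0 27.5 20.0
```

On a unit cell the denominators are all 1, so each weight is just the area of the sub-rectangle opposite the corresponding corner.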

Example:

import torch
from torch.nn import functional as F

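inp = torch.ones(1, 1, 4, 4)
print(inp)
# Goal: obtain a tensor whose height and width are both 20
out_h = 20
out_w = 20
# Building the grid this way is equivalent to using meshgrid
new_h = torch.linspace(-1, 1, out_h).view(-1, 1).repeat(1, out_w)
new_w = torch.linspace(-1, 1, out_w).repeat(out_h, 1)
# The last dimension of grid is (x, y): x indexes width, y indexes height
grid = torch.stack((new_w, new_h), dim=2)
grid = grid.unsqueeze(0)  # insert a batch dimension of size 1 at position 0
print(grid.shape)  # torch.Size([1, 20, 20, 2])
# align_corners=True makes -1 and 1 refer to the centers of the corner pixels
outp = F.grid_sample(inp, grid=grid, mode='bilinear', align_corners=True)
print(outp.shape)  # torch.Size([1, 1, 20, 20])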

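In the example above, a 4×4 tensor is converted into a 20×20 one. The size of grid determines the output size; each grid position holds an (x, y) coordinate, and the output value at that position is interpolated from the four neighbors of (x, y) in input.

Figure from SFNet (ECCV 2020): the flow field is grid, the low-resolution feature is input, and the high-resolution feature is output.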

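Python type comments (# type)

Type Comments

Annotations were introduced in Python 3 and have not been backported to Python 2. This means that code which must support legacy Python cannot use annotations.

Type comments can be used instead. To add type comments to a function, do the following: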

import math

def circumference(radius):
    # type: (float) -> float
    return 2 * math.pi * radius

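Type comments are just comments, so they can be used in any version of Python.

Type comments are handled directly by the type checker, so they do not appear in the __annotations__ dictionary:

>>> circumference.__annotations__
{}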
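
The difference is easy to observe at runtime. The sketch below defines the same function twice, once with a type comment and once with Python 3 annotations; the two function names are made up here for the comparison.

```python
import math


def circumference_commented(radius):
    # type: (float) -> float
    return 2 * math.pi * radius


def circumference_annotated(radius: float) -> float:
    return 2 * math.pi * radius


commented = circumference_commented.__annotations__
annotated = circumference_annotated.__annotations__
print(commented)  # prints {}
print(annotated)  # prints {'radius': <class 'float'>, 'return': <class 'float'>}
```

Only the annotated version leaves anything behind for runtime introspection; the type comment exists solely for tools that read the source.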

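A type comment must start with the literal type: and sit on the same line as the function definition or on the following line. To annotate a function with several arguments, separate the types with commas:

def headline(text, width=80, fill_char="-"):
    # type: (str, int, str) -> str
    return f" {text.title()} ".center(width, fill_char)

print(headline("type comments work", width=40))

You can also write each argument on a separate line with its own type comment: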

# headlines.py

def headline(
    text,           # type: str
    width=80,       # type: int
    fill_char="-",  # type: str
):                  # type: (...) -> str
    return f" {text.title()} ".center(width, fill_char)

print(headline("type comments work", width=40))

Run the example through Python and Mypy:

$ python headlines.py
---------- Type Comments Work ----------
$ mypy headlines.py
$

If a string such as width="full" is passed in, running Mypy again produces the following error:

$ mypy headlines.py
headlines.py:10: error: Argument "width" to "headline" has incompatible
                        type "str"; expected "int"

You can also add type comments to variables, in the same way as for arguments:

pi = 3.142  # type: float

In this example the type checker treats pi as a float.

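AI Deployment Series: Do You Know the Little Secrets of Model Weights?

Today let's talk briefly about model weights, the thing we usually just call the weights.

In deep learning we keep training models, updating their weights through back-propagated gradients, until we end up with a model that generalizes well. If we skip training and only initialize the weights randomly, we still get a model of exactly the same size. The two are the same size, but the distributions of their weights differ enormously: one head is full of knowledge, the other is full of water, so to speak.

The so-called deployment stage of an AI model is, bluntly, moving the trained weights somewhere else to run. In general the weight values and their distribution barely change (the precision may change, and some weights may be merged).

What does change are the operators that execute the model's operations (convolution, fully connected, deconvolution), for example when going from PyTorch to TensorRT or from TensorFlow to TFLite. In other words, the implementation of the operator changes: the same convolution has one implementation in PyTorch and another in TensorRT. The underlying principle is identical, but precision and speed differ. TensorRT can take the convolution weights trained in PyTorch and perform the same operation as PyTorch, only possibly faster.

Weights / Checkpoints

So what kinds of weights are there, and what do they look like?

That is actually hard to describe: they are just a pile of numbers. Yes, the weights we painstakingly tune and train are nothing more than data. Yet this data, combined with the various neural-network operators, is what makes detection, classification and recognition tasks work.

In the figure above, for example, Netron is used to inspect the weight of the first convolution of an ONNX model. This convolution clearly has only a weight W and no bias b. The weight has shape [64,3,7,7], i.e. 3 input channels, 64 output channels and a 7x7 kernel.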

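Looking more closely, the values of this weight span a very wide range. The largest are only on the order of 0.1, while the smallest, judging by eye (a proper statistic would be better), are on the order of 1e-10.

During training the input values are usually in 0-1, though 0-255 also occurs. Either way, as long as nothing exceeds the upper or lower limit of the number format, there is no problem. For FP32, 1e-10 is trivial; for FP16 that is not necessarily the case.

The representable range of FP16 is roughly 5.96e−8 (6.10e−5 for normal numbers) … 65504. Leaving the details aside, it is clear that a value of 1e-10 falls below FP16's lower limit and underflows. If most of a model's weights sit near that limit, the metrics of the model after conversion to FP16 may drop considerably.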
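
The underflow can be demonstrated without any deep-learning framework, because Python's struct module supports IEEE 754 half precision through the "e" format. This is a small sketch of the rounding behaviour only; to_fp16 is a hypothetical helper, and a real conversion pipeline may treat such values differently (for example by keeping sensitive layers in FP32).

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


tiny = to_fp16(1e-10)        # far below the smallest FP16 subnormal
subnormal = to_fp16(6e-8)    # close to the smallest subnormal, 2**-24
normal = to_fp16(0.1)        # representable, but with visible rounding error
largest = to_fp16(65504.0)   # largest finite FP16 value
print(tiny, subnormal, normal, largest)
```

A weight of 1e-10 becomes exactly 0.0, values just under 6.10e−5 survive only as subnormals with very few significant bits, and even an ordinary value such as 0.1 is stored with an error of about 2e-5.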

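Besides FP16 there are of course many other formats (TF32, BF16, INT8). They are not covered here, but there is an article discussing the various precisions that is worth reading first.

Back to the point: how do we gather statistics on the weights of this layer? PyTorch's built-in functions are enough:

# Suppose v is the weight of some conv layer; the following commands give a quick look at its distribution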
v.max()
tensor(0.8559)
v.min()
tensor(-0.9568)
v.abs()
tensor([[0.0314, 0.0045, 0.0182,  ..., 0.0309, 0.0204, 0.0345],
        [0.0295, 0.0486, 0.0746,  ..., 0.0363, 0.0262, 0.0108],
        [0.0328, 0.0582, 0.0149,  ..., 0.0932, 0.0444, 0.0221],
        ...,
        [0.0337, 0.0518, 0.0280,  ..., 0.0174, 0.0078, 0.0010],
        [0.0022, 0.0297, 0.0167,  ..., 0.0472, 0.0006, 0.0128],
        [0.0631, 0.0144, 0.0232,  ..., 0.0072, 0.0704, 0.0479]])
v.abs().min() # the smallest absolute weight value is on the order of 1e-10
tensor(2.0123e-10)
v.abs().max()
tensor(0.9568)
torch.histc(v.abs()) # histogram of the absolute values in 100 bins (the default), spanning the data's min and max, here [2.0123e-10, 0.9568]
tensor([3.3473e+06, 3.2437e+06, 3.0395e+06, 2.7606e+06, 2.4251e+06, 2.0610e+06,
        1.6921e+06, 1.3480e+06, 1.0352e+06, 7.7072e+05, 5.5376e+05, 3.8780e+05,
        2.6351e+05, 1.7617e+05, 1.1414e+05, 7.3327e+04, 4.7053e+04, 3.0016e+04,
        1.9576e+04, 1.3106e+04, 9.1220e+03, 6.4780e+03, 4.6940e+03, 3.5140e+03,
        2.8330e+03, 2.2040e+03, 1.7220e+03, 1.4020e+03, 1.1130e+03, 1.0200e+03,
        8.2400e+02, 7.0600e+02, 5.7900e+02, 4.6400e+02, 4.1600e+02, 3.3400e+02,
        3.0700e+02, 2.4100e+02, 2.3200e+02, 1.9000e+02, 1.5600e+02, 1.1900e+02,
        1.0800e+02, 9.9000e+01, 6.9000e+01, 5.2000e+01, 4.9000e+01, 2.2000e+01,
        1.8000e+01, 2.8000e+01, 1.2000e+01, 1.3000e+01, 8.0000e+00, 3.0000e+00,
        4.0000e+00, 3.0000e+00, 1.0000e+00, 1.0000e+00, 0.0000e+00, 1.0000e+00,
        1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.0000e+00,
        0.0000e+00, 2.0000e+00, 1.0000e+00, 0.0000e+00, 1.0000e+00, 0.0000e+00,
        2.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00])

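If this is not intuitive enough, you can also plot it yourself or view it in TensorBoard.

So what is the weight distribution good for?

It is certainly useful: during both training and deployment, the weight distribution is an important indicator of whether the model is healthy and whether precision has been preserved. That topic is not expanded on here.

Layers with weights deserve special attention

During training, many weights are updated through back-propagation. Common ones are:

Convolution layers
Fully connected layers
Normalization layers (BN, or the other variants LN, IN, GN)
Transformer encoder layers
DCN layers

These layers are usually the core of a neural network. They all carry parameters that take part in back-propagation updates, and they are the parameters that need attention when training a model.

# Part of the conv layer code in PyTorch, showing the parameter shapes and related information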
self._reversed_padding_repeated_twice = _reverse_repeat_tuple(self.padding, 2)
if transposed:
    self.weight = Parameter(torch.Tensor(
        in_channels, out_channels // groups, *kernel_size))
else:
    self.weight = Parameter(torch.Tensor(
        out_channels, in_channels // groups, *kernel_size))
if bias:
    self.bias = Parameter(torch.Tensor(out_channels))

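There are also parameters that do not take part in back-propagation but are still updated as training proceeds. The most common are running_mean and running_var in BN layers:

# Excerpt from the BN layer code in PyTorch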
def __init__(
    self,
    num_features: int,
    eps: float = 1e-5,
    momentum: float = 0.1,
    affine: bool = True,
    track_running_stats: bool = True
) -> None:
    super(_NormBase, self).__init__()
    self.num_features = num_features
    self.eps = eps
    self.momentum = momentum
    self.affine = affine
    self.track_running_stats = track_running_stats
    if self.affine:
        self.weight = Parameter(torch.Tensor(num_features))
        self.bias = Parameter(torch.Tensor(num_features))
    else:
        self.register_parameter('weight', None)
        self.register_parameter('bias', None)
    if self.track_running_stats:
        # With track_running_stats enabled, the BN layer keeps these three buffers updated
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))
        self.register_buffer('num_batches_tracked', torch.tensor(0, dtype=torch.long))
    else:
        self.register_parameter('running_mean', None)
        self.register_parameter('running_var', None)
        self.register_parameter('num_batches_tracked', None)
    self.reset_parameters()

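Note the difference in how things are registered in the code above: the weight and bias of a BN layer are registered as parameters (register_parameter), while running_mean and running_var are registered with register_buffer. What is the difference? Tensors registered as buffers do not take part in the back-propagation computation, but they are still updated while the model trains, so they need to be treated with equal care.

BN layers hide some pitfalls when converting and training models, which is worth keeping in mind.

The layers described so far all have parameters. Are there layers without parameters? Of course: a network contains many ops that only change dimensions, index values, or up/down-sample, for example: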

Reshape
Squeeze
Unsqueeze
Split
Transpose
Gather

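and so on. These operations have no parameters and only reshape the tensor passed in from the previous layer, which makes some "show-off" tricks possible. Some of those tricks are very useful; others are rather pointless.

Taken one at a time, every op in the jumble shown in the figure above is familiar; chained together as in the figure, they are barely recognizable.

Joking aside, converting from PyTorch to ONNX occasionally produces odd results. A simple reshape, for instance, may be split into gather+slice+concat, which only makes things more complicated. This can usually be optimized away with onnx-simplifier; more complicated cases have to be optimized by hand.

By the way, some of these shape-manipulating operators do have parameters, such as the reshape in the figure below:

Ops like this can be troublesome. Converting such an ONNX model to TensorRT will almost certainly run into problems, because TensorRT's ONNX parser does not support a reshape layer whose shape is an input tensor and instead treats the shape as an attribute, whereas ONNX's own inference framework does support it.

These are minor problems, though; most of the time they can be solved cheaply by changing the model or swapping the structure. Other, more complex problems may need closer study.

Extracting weights

To deploy a trained model from one platform to another, the first step is to transfer the weights. In practice most converters (onnx-TensorRT, for example) already do this for us, so there is nothing to worry about.

Still, to get an overall feel for model weights, it is worth trying it by hand.

Caffe2Pytorch

First, a few words on converting weights between Caffe and PyTorch. The open-source repository Caffe-python is recommended here: it already implements extracting the weights from a caffemodel and building the corresponding PyTorch model structure from the prototxt, so there is no need to reinvent the wheel.

In Caffe the weights are stored in a caffemodel and the corresponding structure in a prototxt. In the figure above, the prototxt is on the left and the caffemodel on the right; the caffemodel is serialized with protobuf. Naturally it has to be read first: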

model = caffe_pb2.NetParameter()
print('Loading caffemodel: ' + caffemodel)
with open(caffemodel, 'rb') as fp:
    model.ParseFromString(fp.read())

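caffe_pb2 is the protobuf definition of the caffemodel format; see the repository recommended above for details. In short, it defines the structure of Caffe models.

Once the weights are loaded, each layer listed in the prototxt is looked up in the caffemodel's protobuf weights and its weights are copied to the PyTorch side. Look closely at the line caffe_weight = torch.from_numpy(caffe_weight).view_as(self.models[lname].weight): self.models[lname] is the already-built PyTorch convolution layer, and self.models[lname].weight.data.copy_(caffe_weight) then copies the Caffe weight into PyTorch.

Simple enough.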

if ltype in ['Convolution', 'Deconvolution']:
    print('load weights %s' % lname)
    convolution_param = layer['convolution_param']
    bias = True
    if 'bias_term' in convolution_param and convolution_param['bias_term'] == 'false':
        bias = False
    # weight_blob = lmap[lname].blobs[0]
    # print('caffe weight shape', weight_blob.num, weight_blob.channels, weight_blob.height, weight_blob.width)
    caffe_weight = np.array(lmap[lname].blobs[0].data)
    caffe_weight = torch.from_numpy(caffe_weight).view_as(self.models[lname].weight)
    # print("caffe_weight", caffe_weight.view(1,-1)[0][0:10])
    self.models[lname].weight.data.copy_(caffe_weight)
    if bias and len(lmap[lname].blobs) > 1:
        self.models[lname].bias.data.copy_(torch.from_numpy(np.array(lmap[lname].blobs[1].data)))
        print("convolution %s has bias" % lname)

Pytorch2TensorRT

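Start with a simple example. Models are usually trained in PyTorch, and the trained weights are usually saved with torch.save() in .pth format.

A .pth file is written and read by PyTorch through Python's built-in pickle module. Let's look at one in Netron.

As shown, it contains only the layers that have parameter weights, not the model structure. The .pth weights can nevertheless be loaded one by one into the model defined in the .py file.

Here are the keys of the state_dict after reading the .pth. These keys correspond to the weight names registered for each layer when the model was built, together with the weight data (including shape, dtype and so on).

A .pth can of course contain other fields as well, e.g. {'epoch': 190, 'state_dict': OrderedDict([('conv1.weight', tensor([[..., such as the epoch reached during training or the learning rate.

The weights in a .pth can be extracted with the following code and stored in the TensorRT weight format.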

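import struct

import torch


def extract_weight(args):
    # Load the checkpoint; args.weight is the .pth path, args.save_path the output file
    state_dict = torch.load(args.weight, map_location="cpu")
    with open(args.save_path, "w") as f:
        # First line: number of weight blobs
        f.write("{}\n".format(len(state_dict.keys())))
        for k, v in state_dict.items():
            vr = v.reshape(-1).cpu().numpy()
            # One blob per line: name, element count, then each float32 as big-endian hex
            f.write("{} {}".format(k, len(vr)))
            for vv in vr:
                f.write(" ")
                f.write(struct.pack(">f", float(vv)).hex())
            f.write("\n")

Note that the TensorRT weight format here refers to the weights before the build step. TensorRT only uses them to construct the network: the weights of each parsed layer are passed in, and the engine is then built from the TensorRT network.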

// Load weights from files shared with TensorRT samples.
// TensorRT weight files have a simple space delimited format:
// [type] [size] <data x size in hex>
std::map<std::string, Weights> loadWeights(const std::string file)
{
    std::cout << "Loading weights: " << file << std::endl;
    std::map<std::string, Weights> weightMap;

    // Open weights file
    std::ifstream input(file);
    assert(input.is_open() && "Unable to load weight file.");

    // Read number of weight blobs
    int32_t count;
    input >> count;
    assert(count > 0 && "Invalid weight map file.");

    while (count--)
    {
        Weights wt{DataType::kFLOAT, nullptr, 0};
        uint32_t size;

        // Read name and type of blob
        std::string name;
        input >> name >> std::dec >> size;
        wt.type = DataType::kFLOAT;

        // Load blob
        uint32_t *val = reinterpret_cast<uint32_t *>(malloc(sizeof(val) * size));
        for (uint32_t x = 0, y = size; x < y; ++x)
        {
            input >> std::hex >> val[x];
        }
        wt.values = val;
        wt.count = size;
        weightMap[name] = wt;
    }
    std::cout << "Finished Load weights: " << file << std::endl;
    return weightMap;
}
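
The text format is simple enough to round-trip in a few lines, which is a handy way to check an exported file before handing it to the C++ loader. The sketch below mirrors the layout described above (name, element count, then one big-endian float32 per value in hex); dump_blob and parse_blob are hypothetical helpers written for this illustration and are not part of TensorRT.

```python
import struct


def float_to_hex(value):
    """Encode a float as the hex string of its big-endian float32 bit pattern."""
    return struct.pack(">f", float(value)).hex()


def hex_to_float(text):
    """Parse a hex string as a 32-bit pattern and reinterpret it as float32."""
    bits = int(text, 16)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def dump_blob(name, values):
    """Serialize one weight blob as a single line of the weight file."""
    return "{} {} {}".format(name, len(values), " ".join(float_to_hex(v) for v in values))


def parse_blob(line):
    """Read one line back into (name, list of floats)."""
    name, count, *hex_values = line.split()
    assert int(count) == len(hex_values)
    return name, [hex_to_float(h) for h in hex_values]


line = dump_blob("conv1.weight", [1.0, -2.5, 0.15625])
name, values = parse_blob(line)
print(line)          # prints conv1.weight 3 3f800000 c0200000 3e200000
print(name, values)  # prints conv1.weight [1.0, -2.5, 0.15625]
```

The C++ side does the same thing as hex_to_float: it reads each hex token into a uint32_t and later treats the buffer as float data.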

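What does the model look like after TensorRT has optimized it, and where did the weights go?

They are inside the built engine, of course, but because of TensorRT's optimizations they may have been fused, removed, or merged.

There is a lot to learn about model parameters, and plenty of recent research as well. Re-parameterization, for example, is very solid work that is often used in training and deployment.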

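Chinese text cleaning and feature extraction

Excerpted from Zhihu:

嵌入式AI算法研究 (Embedded AI Algorithm Research)

Chinese text cleaning

Chinese text cleaning covers:

– Removing specified useless symbols

– Keeping only Chinese characters

– Removing emoticons from text

– Converting between Traditional and Simplified Chinese

Chinese text cleaning class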

import re
from opencc import OpenCC
from bs4 import BeautifulSoup
import jieba
from glob import glob

import torch
from tqdm.auto import tqdm

import sys
# !ls ../package/  (IPython shell escape; only valid inside a notebook)
sys.path.insert(0, "../package/")
from ltp import LTP
nlp = LTP(path="base")

class TextCleaner:
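    '''
        Clean text data in batches
    '''
    def __init__(self,
                 remove_space=True, # remove spaces
                 remove_suspension=True, # convert ellipses to full stops
                 only_zh=False, # keep only Chinese characters
                 remove_sentiment_character=True, # remove emoticon characters
                 to_simple=True, # convert to Simplified Chinese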
                 remove_html_label=True,
                 remove_stop_words=False,
                 stop_words_dir="./停用词/",
                 with_space=False,
                 batch_size=256):
        self._remove_space = remove_space
        self._remove_suspension = remove_suspension
        self._remove_sentiment_character = remove_sentiment_character

        self._only_zh = only_zh
        self._to_simple = to_simple

        self._remove_html_label = remove_html_label
        self._remove_stop_words = remove_stop_words
        self._stop_words_dir = stop_words_dir

        self._with_space = with_space
        self._batch_size = batch_size

    def clean_single_text(self, text):
        if self._remove_space:
            text = self.remove_space(text)
        if self._remove_suspension:
            text = self.remove_suspension(text)
        if self._remove_sentiment_character:
            text = self.remove_sentiment_character(text)
        if self._to_simple:
            text = self.to_simple(text)
        if self._only_zh:
            text = self.get_zh_only(text)
        if self._remove_html_label:
            text = self.remove_html(text)
        return text

    def clean_text(self, text_list):
        text_list = [self.clean_single_text(text) for text in tqdm(text_list)]
        tokenized_words_list = self.tokenizer_batch_text(text_list)
        if self._remove_stop_words:
            text_list = [self.remove_stop_words(words_list, self._stop_words_dir, self._with_space) for words_list in tokenized_words_list]
        return text_list

    def remove_space(self, text):
        return text.replace(' ', '')   # strip every space from the text

    def remove_suspension(self, text):
        return text.replace('...', '。')   # turn an ellipsis into a full stop

    def get_zh_only(self, text):
        def is_chinese(uchar):
            # Common CJK ideographs lie in \u4e00 - \u9fa5 (the full block runs to \u9fff)
            return '\u4e00' <= uchar <= '\u9fa5'

        return ''.join(ch for ch in text if is_chinese(ch))

    def remove_sentiment_character(self, sentence):
        # Keep Chinese characters, English letters, digits and basic punctuation; drop everything else.
        # Only the leading ^ negates the class; a ^ anywhere else would be kept as a literal caret.
        pattern = re.compile("[^\u4e00-\u9fa5,.!，。?？！a-zA-Z0-9]")
        # To keep only Chinese, English letters and digits, use [^\u4e00-\u9fa5a-zA-Z0-9]
        line = re.sub(pattern, '', sentence)   # replace every matched character with the empty string
        new_sentence = ''.join(line.split())   # remove whitespace
        return new_sentence

    def to_simple(self, sentence):
        new_sentence = OpenCC('t2s').convert(sentence)   # Traditional to Simplified
        return new_sentence

    def to_tradition(self, sentence):
        new_sentence = OpenCC('s2t').convert(sentence)   # Simplified to Traditional
        return new_sentence

    def remove_html(self, text):
        return BeautifulSoup(text, 'html.parser').get_text()   # strip HTML tags

    def tokenizer_batch_text(self, text_list):
        tokenized_text = []
        with torch.no_grad():
            steps = self._batch_size
            # Slicing past the end is safe, so the last (shorter) batch needs no special case
            for start_idx in tqdm(range(0, len(text_list), steps)):
                tokenized_text += nlp.seg(text_list[start_idx:start_idx + steps])[0]
        return tokenized_text

    def remove_stop_words(self, words_list, stop_words_dir, with_space=False):
        """
        Remove Chinese stop words; stopwords_chineses.txt is stored in the author's Cnblogs files.
        :param words_list: tokens of one sentence
        :param stop_words_dir: directory containing stop-word lists (*.txt, one word per line)
        :param with_space: join the remaining tokens with spaces if True
        :return: the sentence with stop words removed
        """
        stop_words = set()
        for stop_word_filepath in glob(stop_words_dir + "/*.txt"):
            with open(stop_word_filepath) as fp:
                # Accumulate across files; the set also removes duplicates
                stop_words.update(line.rstrip() for line in fp)
        words = [w for w in words_list if w not in stop_words]   # drop the stop words
        if with_space:
            return ' '.join(words)
        else:
            return ''.join(words)


cleaner = TextCleaner(remove_stop_words=True, with_space=True)
contents = ['   大家好， 欢迎一起来学习文本的空格   去除   ！', '   大家好，文本的空格   去除   ！']
results = cleaner.clean_text(contents)
print(results)
['好 ， 学习 文本 空格 去除 ！', '好 ， 文本 空格 去除 ！']

Removing spaces

# Remove spaces
contents = '   大家好， 欢迎一起来学习文本的空格   去除   ！'
print('Text before: ' + contents)
def process(our_data):
    content = our_data.replace(' ', '')   # strip every space from the text
    print('Text after: ' + content)
process(contents)
Text before:    大家好， 欢迎一起来学习文本的空格   去除   ！
Text after: 大家好，欢迎一起来学习文本的空格去除！

Removing spaces and converting ellipses to full stops

# Remove spaces and convert ellipses to full stops
contents = '   大家好， 这里还有  很多的知识...一起拉学习吧 ！'
print('Text before: ' + contents)
def process(data):
    content1 = data.replace(' ', '')    # strip every space from the text
    content2 = content1.replace('...', '。')    # turn the ellipsis into a full stop
    print('Text after: ' + content2)
process(contents)
Text before:    大家好， 这里还有  很多的知识...一起拉学习吧 ！
Text after: 大家好，这里还有很多的知识。一起拉学习吧！

Keeping only Chinese characters

def is_chinese(uchar):
    # True if uchar is a Chinese character (common CJK range \u4e00 - \u9fa5)
    return '\u4e00' <= uchar <= '\u9fa5'

def allcontents(contents):
    content = ''.join(ch for ch in contents if is_chinese(ch))
    print('\nProcessed sentence:\n' + content)

contents = '1,2,3...我们开始吧， 加油！'
print('Original sentence:\n' + contents)
allcontents(contents)
Original sentence:
1,2,3...我们开始吧， 加油！

Processed sentence:
我们开始吧加油

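Removing emoticons from text

import re
sentence = '现在听着音乐,duo rui mi,很开心*_*'
print('Original sentence:\n' + sentence)

def clear_character(sentence):
    # Keep Chinese characters, English letters, digits and , . ! ; drop everything else.
    # Only the leading ^ negates the class; a ^ elsewhere would be kept as a literal caret.
    pattern = re.compile("[^\u4e00-\u9fa5,.!a-zA-Z0-9]")
    # To keep only Chinese, English letters and digits, use [^\u4e00-\u9fa5a-zA-Z0-9]
    line = re.sub(pattern, '', sentence)   # replace every matched character with the empty string
    new_sentence = ''.join(line.split())   # remove whitespace
    print('\nProcessed sentence:\n' + new_sentence)

clear_character(sentence)
Original sentence:
现在听着音乐,duo rui mi,很开心*_*

Processed sentence:
现在听着音乐,duoruimi,很开心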
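
A detail of regular-expression syntax matters for this kind of filter: inside a character class only a leading ^ negates the class, and any later ^ is an ordinary character that becomes part of the set. The sketch below compares a pattern that repeats ^ before every item with one that lists the items plainly; the sample string is made up for the demonstration.

```python
import re

# A ^ before every item: the extra carets are literals, so '^' itself is kept
with_extra_carets = re.compile("[^\u4e00-\u9fa5^,^.^!^a-z^A-Z^0-9]")
# A single leading ^: everything not listed, including '^', is removed
single_caret = re.compile("[^\u4e00-\u9fa5,.!a-zA-Z0-9]")

sample = "很开心^_^,ok!"
kept_with_extra = with_extra_carets.sub("", sample)
kept_with_single = single_caret.sub("", sample)
print(kept_with_extra)   # prints 很开心^^,ok!
print(kept_with_single)  # prints 很开心,ok!
```

For emoticons such as ^_^ the difference is visible: the first pattern strips only the underscore and leaves the carets behind.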

Converting between Traditional and Simplified Chinese

from opencc import OpenCC

sentence = '你现在读的这里是简体，这里是繁体，能看懂吗？'
print('Original sentence:\n' + sentence)

def Simplified(sentence):
    new_sentence = OpenCC('t2s').convert(sentence)   # Traditional to Simplified
    print('\nProcessed sentence:\n' + new_sentence)

def Traditional(sentence):
    new_sentence = OpenCC('s2t').convert(sentence)   # Simplified to Traditional
    print('\nProcessed sentence:\n' + new_sentence)

Simplified(sentence)
Traditional(sentence)
Original sentence:
你现在读的这里是简体，这里是繁体，能看懂吗？

Processed sentence:
你现在读的这里是简体，这里是繁体，能看懂吗？

Processed sentence:
你现在读的这里是简体，这里是繁体，能看懂吗？

OpenCC conversion configurations:

- hk2s: Traditional Chinese (Hong Kong standard) to Simplified Chinese
- s2hk: Simplified Chinese to Traditional Chinese (Hong Kong standard)
- s2t: Simplified Chinese to Traditional Chinese
- s2tw: Simplified Chinese to Traditional Chinese (Taiwan standard)
- s2twp: Simplified Chinese to Traditional Chinese (Taiwan standard, with phrases)
- t2hk: Traditional Chinese to Traditional Chinese (Hong Kong standard)
- t2s: Traditional Chinese to Simplified Chinese
- t2tw: Traditional Chinese to Traditional Chinese (Taiwan standard)
- tw2s: Traditional Chinese (Taiwan standard) to Simplified Chinese
- tw2sp: Traditional Chinese (Taiwan standard) to Simplified Chinese (with phrases)

Removing HTML tags and stop words

from bs4 import BeautifulSoup
import jieba
from glob import glob

def clean_chineses_text(text, with_space=False):
    """
    Chinese text cleaning; stopwords_chineses.txt is stored in the author's Cnblogs files.
    :param text: raw text, possibly containing HTML
    :return: the text with HTML tags and stop words removed
    """
    text = BeautifulSoup(text, 'html.parser').get_text()   # strip HTML tags
    text = jieba.lcut(text)   # tokenize
    stop_words = set()
    for stop_word_filepath in glob("./停用词/*.txt"):
        with open(stop_word_filepath) as fp:
            # Accumulate across files; the set also removes duplicates
            stop_words.update(line.rstrip() for line in fp)
    words = [w for w in text if w not in stop_words]   # drop the stop words
    if with_space:
        return ' '.join(words)
    else:
        return ''.join(words)
clean_chineses_text("你现在读的这里是简体，这里是繁体，能看懂吗？", with_space=True)
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.703 seconds.
Prefix dict has been built successfully.





'读 简体 ， 这里 繁体 ， 能看懂 吗 ？'
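
When stop words are spread over several files, the word lists have to be merged rather than overwritten file by file. The following is a self-contained sketch of that loading step using temporary files; load_stop_words, the file names a.txt and b.txt, and the UTF-8 encoding are assumptions made for this illustration.

```python
import tempfile
from pathlib import Path


def load_stop_words(stop_words_dir):
    """Merge every *.txt file in the directory into one set of stop words."""
    stop_words = set()
    for path in sorted(Path(stop_words_dir).glob("*.txt")):
        with open(path, encoding="utf-8") as fp:
            stop_words.update(line.strip() for line in fp if line.strip())
    return stop_words


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not stop words."""
    return [t for t in tokens if t not in stop_words]


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("的\n是\n", encoding="utf-8")
    Path(tmp, "b.txt").write_text("吗\n", encoding="utf-8")
    stop_words = load_stop_words(tmp)

tokens = ["这里", "是", "繁体", "吗"]
filtered = remove_stop_words(tokens, stop_words)
print(sorted(stop_words), filtered)
```

Loading the set once and reusing it for every sentence also avoids re-reading the files for each call, which matters when cleaning a large corpus.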
ENGLISH_STOP_WORDS = frozenset([
    "about", "above", "across", "after", "afterwards", "again", "against",
    "all", "almost", "alone", "along", "already", "also", "although", "always",
    "am", "among", "amongst", "amoungst", "amount", "an", "and", "another",
    "any", "anyhow", "anyone", "anything", "anyway", "anywhere", "are",
    "around", "as", "at", "back", "be", "became", "because", "become",
    "becomes", "becoming", "been", "before", "beforehand", "behind", "being",
    "below", "beside", "besides", "between", "beyond", "bill", "both",
    "bottom", "but", "by", "call", "can", "cannot", "cant", "co", "con",
    "could", "couldnt", "cry", "de", "describe", "detail", "do", "done",
    "down", "due", "during", "each", "eg", "eight", "either", "eleven", "else",
    "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone",
    "everything", "everywhere", "except", "few", "fifteen", "fifty", "fill",
    "find", "fire", "first", "five", "for", "former", "formerly", "forty",
    "found", "four", "from", "front", "full", "further", "get", "give", "go",
    "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter",
    "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his",
    "how", "however", "hundred", "ie", "if", "in", "inc", "indeed",
    "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter",
    "latterly", "least", "less", "ltd", "made", "many", "may", "me",
    "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly",
    "move", "much", "must", "my", "myself", "name", "namely", "neither",
    "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone",
    "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on",
    "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our",
    "ours", "ourselves", "out", "over", "own", "part", "per", "perhaps",
    "please", "put", "rather", "re", "same", "see", "seem", "seemed",
    "seeming", "seems", "serious", "several", "she", "should", "show", "side",
    "since", "sincere", "six", "sixty", "so", "some", "somehow", "someone",
    "something", "sometime", "sometimes", "somewhere", "still", "such",
    "system", "take", "ten", "than", "that", "the", "their", "them",
    "themselves", "then", "thence", "there", "thereafter", "thereby",
    "therefore", "therein", "thereupon", "these", "they", "thick", "thin",
    "third", "this", "those", "though", "three", "through", "throughout",
    "thru", "thus", "to", "together", "too", "top", "toward", "towards",
    "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us",
    "very", "via", "was", "we", "well", "were", "what", "whatever", "when",
    "whence", "whenever", "where", "whereafter", "whereas", "whereby",
    "wherein", "whereupon", "wherever", "whether", "which", "while", "whither",
    "who", "whoever", "whole", "whom", "whose", "why", "will", "with",
    "within", "without", "would", "yet", "you", "your", "yours", "yourself",
    "yourselves", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"])

Feature extraction

BOW
TF-IDF
LDA
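
As a rough illustration of the first two items, bag-of-words counts and a TF-IDF weighting can be sketched with the standard library alone. The toy corpus is made up, and the idf formula used is the smoothed form ln((1 + n) / (1 + df)) + 1 without any normalization afterwards; this is an assumption chosen for the sketch, not the exact pipeline used in these notes.

```python
import math
from collections import Counter

# Toy corpus, already tokenised with spaces (hypothetical data).
docs = ["good movie", "bad movie", "good good plot"]
tokenised = [doc.split(" ") for doc in docs]

# Bag of words: one count vector per document over a sorted vocabulary.
vocab = sorted({tok for toks in tokenised for tok in toks})
bow = [[Counter(toks)[term] for term in vocab] for toks in tokenised]

# Smoothed idf: ln((1 + n_docs) / (1 + df)) + 1, no normalisation afterwards.
n_docs = len(tokenised)
df = {term: sum(term in toks for toks in tokenised) for term in vocab}
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
tfidf = [[count * idf[term] for count, term in zip(row, vocab)] for row in bow]

print(vocab)  # ['bad', 'good', 'movie', 'plot']
print(bow)    # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
```

Terms that occur in fewer documents ("bad", "plot") receive a larger idf than terms that occur in more of them ("good", "movie"), which is the whole point of the TF-IDF reweighting.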

Text feature extraction class

import numpy as np
import pandas as pd
from tqdm.auto import tqdm
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, HashingVectorizer

import sys
!ls ../package/
sys.path.insert(0, "../package/")
from ltp import LTP
nlp = LTP(path="base")

from gensim.models import Word2Vec

class TextFeatures:
    def __init__(self, ngram_range=(1, 2)):
        self.cvt = CountVectorizer(tokenizer=self.tokenizer, ngram_range=ngram_range)
        self.tvt = TfidfVectorizer(tokenizer=self.tokenizer, ngram_range=ngram_range)
        self.hvt = HashingVectorizer(tokenizer=self.tokenizer, ngram_range=ngram_range)
        self.cleaner = TextCleaner(remove_html_label=True, remove_stop_words=True, with_space=True)

    def clean_text(self, text_list):
        return self.cleaner.clean_text(text_list)

    def tokenizer(self, text):
        return text.split(" ")

    def get_bow(self, text_list):
        return self.cvt.fit_transform(text_list)

    def get_tfidf(self, text_list):
        return self.tvt.fit_transform(text_list)

    def get_hashing(self, text_list):
        return self.hvt.fit_transform(text_list)
ltp


file /root/.cache/torch/ltp/8909177e47aa4daf900c569b86053ac68838d09da28c7bbeb42b8efcb08f56aa-edb9303f86310d4bcfd1ac0fa20a744c9a7e13ee515fe3cf88ad31921ed616b2-extracted/config.json not found
train_df = pd.read_csv("../0.数据/1.情感分析/NLPCC14-SC/train.tsv", sep="\t", on_bad_lines="skip")  # pandas >= 1.3; error_bad_lines=False was removed in pandas 2.0
train_df.head()

	label	text_a

set(train_df["label"]), train_df.shape
({0, 1}, (10000, 2))
cleaner = TextCleaner(remove_html_label=True, remove_stop_words=True, with_space=True)
contents = ['   大家好， 欢迎一起来学习文本的空格   去除   ！']
results = cleaner.clean_text(contents)
print(results)
['好 ， 学习 文本 空格 去除 ！']
tqdm.pandas(desc="clean data")
train_df["cleaned_text"] = cleaner.clean_text(train_df["text_a"].values)
train_df.to_csv("cleaned_train.csv", index=None)
# import torch
# from tqdm.auto import tqdm

# tokenized_text = []
# text_list = list(train_df["cleaned_text"].values)
# with torch.no_grad():
#     steps = 256
#     for start_idx in tqdm(range(0, train_df.shape[0], steps)):
# #         print(start_idx)
#         if start_idx + steps > train_df.shape[0]:
#             tokenized_text += nlp.seg(text_list[start_idx:])[0]
#         else:
#             tokenized_text += nlp.seg(text_list[start_idx:start_idx+steps])[0]
# from joblib import dump, load
# Release the GPU memory held by this process
# from numba import cuda

# cuda.select_device(0)
# cuda.close()

BOW

!ls ../1.基础/停用词/
中文停用词库.txt  哈工大停用词表.txt  四川大学停用词表.txt  百度停用词表.txt
from glob import glob
# Stop-word list
stop_words = []
txt_list = glob("../1.基础/停用词/*.txt")
for txt_path in txt_list:
    with open(txt_path, "r") as fp:
        lines = fp.readlines()
    stop_words += [line.strip() for line in lines]
len(stop_words)
3893
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, HashingVectorizer
from sklearn.linear_model import Ridge, Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
def tokenizer(text):
    return text.split(" ")
# corpus = [" ".join(text_list) for text_list in tokenized_text]
# corpus[:2]
corpus = train_df["cleaned_text"].values
cvt = CountVectorizer(stop_words=stop_words, tokenizer=tokenizer, ngram_range=(1, 2))
x_cvt = cvt.fit_transform(corpus)
len(cvt.vocabulary_)
137525
y = train_df["label"].values
X_train, X_val, y_train, y_val = train_test_split(x_cvt, y, test_size=0.1)

clf = Ridge(alpha=500.)
clf.fit(X_train, y_train)

print("train score: ")
y_pred = clf.predict(X_train)
print(roc_auc_score(y_train, y_pred), accuracy_score(y_train, y_pred>0.5))
print()
print("valid score: ")
y_pred = clf.predict(X_val)
print(roc_auc_score(y_val, y_pred), accuracy_score(y_val, y_pred>0.5))
train score: 
0.8657380740314067 0.798

valid score: 
0.8009079767378523 0.733

TFIDF

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, HashingVectorizer
tvt = TfidfVectorizer(stop_words=stop_words, tokenizer=tokenizer, ngram_range=(1, 2))
x_tvt = tvt.fit_transform(corpus)
len(tvt.vocabulary_)
137525
y = train_df["label"].values
X_train, X_val, y_train, y_val = train_test_split(x_tvt, y, test_size=0.1)

clf = Ridge(alpha=10.)
clf.fit(X_train, y_train)

print("train score: ")
y_pred = clf.predict(X_train)
print(roc_auc_score(y_train, y_pred), accuracy_score(y_train, y_pred>0.5))
print()
print("valid score: ")
y_pred = clf.predict(X_val)
print(roc_auc_score(y_val, y_pred), accuracy_score(y_val, y_pred>0.5))
train score: 
0.9349220324539836 0.8745555555555555

valid score: 
0.7963706773775423 0.728

HashingVectorizer

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, HashingVectorizer
hvt = HashingVectorizer(stop_words=stop_words, tokenizer=tokenizer, ngram_range=(1, 2))
x_hvt = hvt.fit_transform(corpus)
y = train_df["label"].values
X_train, X_val, y_train, y_val = train_test_split(x_hvt, y, test_size=0.1)

clf = Ridge(alpha=1.)
clf.fit(X_train, y_train)

print("train score: ")
y_pred = clf.predict(X_train)
print(roc_auc_score(y_train, y_pred), accuracy_score(y_train, y_pred>0.5))
print()
print("valid score: ")
y_pred = clf.predict(X_val)
print(roc_auc_score(y_val, y_pred), accuracy_score(y_val, y_pred>0.5))
train score: 
0.99204728016389 0.969

valid score: 
0.8349841394447204 0.749

LDA

train_df = pd.read_csv("./cleaned_train.csv")
train_df.head()

	label	text_a	cleaned_text

from glob import glob
# Stop-word list
stop_words = []
txt_list = glob("../1.基础/停用词/*.txt")
for txt_path in txt_list:
    with open(txt_path, "r") as fp:
        lines = fp.readlines()
    stop_words += [line.strip() for line in lines]
len(stop_words)
3893
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, HashingVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import Ridge, Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
def tokenizer(text):
    return text.split(" ")

corpus = train_df["cleaned_text"].values
corpus = [string if isinstance(string, str) else "" for string in corpus]  # empty cleaned texts are read back from the CSV as NaN
cvt = CountVectorizer(tokenizer=tokenizer, ngram_range=(1, 2))
x_cvt = cvt.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=32, doc_topic_prior=None, topic_word_prior=None, learning_method='batch', 
                                learning_decay=0.7, learning_offset=50.0, max_iter=10, batch_size=128, evaluate_every=-1, 
                                total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, 
                                n_jobs=None, verbose=0, random_state=402)
docres = lda.fit_transform(x_cvt)
docres.shape
(10000, 32)
y = train_df["label"].values
X_train, X_val, y_train, y_val = train_test_split(docres, y, test_size=0.1)

clf = Ridge(alpha=500.)
clf.fit(X_train, y_train)

print("train score: ")
y_pred = clf.predict(X_train)
print(roc_auc_score(y_train, y_pred), accuracy_score(y_train, y_pred>0.5))
print()
print("valid score: ")
y_pred = clf.predict(X_val)
print(roc_auc_score(y_val, y_pred), accuracy_score(y_val, y_pred>0.5))
train score: 
0.5984059229289742 0.5741111111111111

valid score: 
0.5797141495568878 0.57

gensim

corpus = [string.split(" ") for string in corpus]
from gensim import corpora
dictionary = corpora.Dictionary(corpus)
dictionary.filter_extremes(no_below=20, no_above=0.5)
dictionary.compactify()
dictionary.save('qzone.dict')  # save after filtering so the stored ids match the corpus below
corpus = [dictionary.doc2bow(s) for s in corpus]
corpora.MmCorpus.serialize('corpus_bow.mm', corpus)  # save the bag-of-words corpus
from gensim.models import LdaModel

num_topics = 100
chunksize = 2000
passes = 20
iterations = 400
eval_every = None 

temp = dictionary[0]  # accessing an item only serves to make gensim build the id2token mapping
id2word = dictionary.id2token

model = LdaModel(
    corpus=corpus,
    id2word=id2word,
    chunksize=chunksize,
    alpha='auto',
    eta='auto',
    iterations=iterations,
    num_topics=num_topics,
    passes=passes,
    eval_every=eval_every
)

model.save('qzone.model')
top_topics = model.top_topics(corpus)
avg_topic_coherence = sum([t[1] for t in top_topics]) / num_topics
print('Average topic coherence: %.4f.' % avg_topic_coherence)
Average topic coherence: -5.7200.
len(top_topics), len(corpus)
(100, 10000)

LTP feature extraction

import sys
!ls ../package/

sys.path.insert(0, "../package/")

from ltp import LTP
nlp = LTP(path="base")
ltp


file /root/.cache/torch/ltp/8909177e47aa4daf900c569b86053ac68838d09da28c7bbeb42b8efcb08f56aa-edb9303f86310d4bcfd1ac0fa20a744c9a7e13ee515fe3cf88ad31921ed616b2-extracted/config.json not found
seg, hidden = nlp.seg(["他叫汤姆去拿外衣。"])
pos = nlp.pos(hidden)
ner = nlp.ner(hidden)
srl = nlp.srl(hidden)
dep = nlp.dep(hidden)
sdp = nlp.sdp(hidden)

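For details on the features LTP extracts, refer to the LTP documentation.

Static word vectors
Dynamic (contextual) word vectors

Writing Python packages and modules, and using the __all__ variable

I. Modules

Anyone who has written Python has seen import ... statements at the top of a file. These statements import modules, and every file with a .py suffix is a module.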

import jieba
import os

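1. What is a module?

Logically, a module is a group of related functionality. Concretely, a module is a file containing Python definitions and statements; the file name is the module name plus the .py suffix.

The modules that import can load fall into four general categories:

a. Code written in Python (.py files);
b. C or C++ extensions compiled into shared libraries or DLLs;
c. Packages that contain a set of modules;
d. Built-in modules written in C and linked into the Python interpreter.

How do you use a module?
To use a module you must first load it with the import or from keyword. Note that the module and the current file live in different namespaces.

2. What a module contains

A module can contain executable statements as well as function definitions. The statements are there to initialize the module, and they run only the first time the module name is encountered in an import statement. An import statement can appear anywhere in a program and the same module can be imported many times. To avoid repeated work, Python caches the module object in memory (in sys.modules) on the first import; every later import simply binds another reference to the already-loaded module object and does not re-run the module's statements.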
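
The caching behavior described above can be checked with a small experiment. This is a minimal sketch: the module name first_demo and its contents are made up for the test, and the module is written to a temporary directory so nothing is left behind.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module whose body counts how many times it has been executed.
SOURCE = "counter = globals().get('counter', 0) + 1\na = 1\n"

counts = []
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "first_demo.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    try:
        import first_demo                  # first import: the module body runs
        counts.append(first_demo.counter)
        import first_demo as again         # served from sys.modules: the body does not run again
        counts.append(again.counter)
        cached = again is sys.modules["first_demo"]
        importlib.reload(first_demo)       # an explicit reload re-executes the body
        counts.append(first_demo.counter)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("first_demo", None)

print(counts, cached)  # [1, 1, 2] True
```

The second import leaves the counter at 1, and only importlib.reload runs the module body a second time.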

II. Importing modules

1. Importing a whole module

Suppose we have a folder named myModule containing a file first.py with the following contents:

a = 1
def myfun(s):
    print(s + 1)

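Open a terminal/cmd in the myModule folder and type python to start the interactive interpreter.
Once the module has been imported, its variables and functions can be accessed through the module name: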

>>> import first
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> first.a
1
>>> first.myfun(2)
3

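2. Importing specific functions/variables

So first.py is a module that can be loaded with import, and its variables and functions must be referenced with the first. prefix. To drop the prefix, or to use only one function from the module, import just that name: from module_name import function_name.
An imported variable can be read directly by name, and an imported function can be called as function_name() without the module name in front.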

# Import a variable
>>> from first import a
>>> a
1
# Import a function
>>> from first import myfun
>>> myfun(3)
4
# Import several names at once
>>> from first import a,myfun
>>> a
1
>>> myfun(5)
6
# Import every public name from the module
>>> from first import *
>>> a
1
>>> myfun(5)
6

3. Giving a module an alias with as

Append as to give the module an alias: import module_name as new_name.

>>> import first as f
>>> f.a
1
>>> f.myfun(6)
7

Likewise, as can give an imported function an alias: from module_name import function_name as new_function.

>>> from first import myfun as add
>>> add(8)
9

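III. Packages and libraries

A module is simply a .py file that defines functions, classes, variables and so on.
A package is a folder that groups multiple modules; it can contain several .py files as well as nested folders.
"Library" is a term borrowed from other programming languages for a collection of code that provides some functionality; in Python a library takes the form of modules and packages.

The layout of a package: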

sound/                          Top-level package
      __init__.py               Initialize the sound package
      formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              ...
      filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...

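Python treats a directory as a (regular) package only if it contains an __init__.py file. This prevents directories with a common name such as string from unintentionally hiding valid modules that occur later on the module search path. In the simplest case __init__.py is an empty file, but it can also run initialization code for the package or set the __all__ variable, described below.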

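IV. Importing packages

What importing a package means: importing a package executes the __init__.py file in that package.

As long as a folder contains an __init__.py file, the folder can be treated as a package.

Importing a package works much like importing a module, except that the __init__.py in the package directory is executed rather than the statements of a module. Also, if you import only the package and its __init__.py does no further initialization, the modules inside the package are not imported automatically.

A few more points to note:

When __init__.py needs a variable from another module, from abcd import b does not work in Python 3 even if __init__.py and abcd.py are in the same folder. Write the import starting from the package name, from folder.abcd import b (or use the relative form from .abcd import b).
Nested folders inside folder can be imported like modules even without their own __init__.py (Python 3.3+ treats them as namespace packages), but it is still common to create the file because it makes frequently used names easy to import.
__init__.py is a special file that acts as the module named folder: after import folder, the variables defined in __init__.py are accessible as attributes of folder.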

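V. __all__

What happens when you write from sound.effects import *? Ideally this statement would search the file system for every submodule in the package and import them all. That could take a long time, and importing submodules might have unwanted side effects that should only happen when a submodule is imported explicitly.

The only solution is for the package author to provide an explicit index of the package. The import statement uses the following convention: if a package's __init__.py defines a list named __all__, it is taken to be the list of module names that from package import * should import. It is up to the package author to keep this list up to date when a new version is released. Authors may also choose not to provide it if they see no use for importing * from their package. For example, sound/effects/__init__.py could contain the following code: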

__all__ = ["echo", "surround", "reverse"]

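This means that from sound.effects import * imports the three named submodules of the sound.effects package.

If __all__ is not defined, from sound.effects import * does not import all submodules of sound.effects into the current namespace. It only ensures that the package sound.effects has been imported (possibly running any initialization code in __init__.py) and then imports whatever names are defined in the package. This includes any names defined by __init__.py (and submodules it explicitly loads), as well as any submodules of the package that were explicitly loaded by previous import statements.

Benefit of __all__: only the names listed in __all__ are exported, which avoids polluting the importing namespace and avoids loading submodules that are not needed.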

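1. A convention for a module's public interface
__all__ can expose an interface at module level, in this form:
__all__ = ["foo", "bar"]
Python has no native visibility control; visibility is maintained by conventions that everyone is expected to follow, for example that names starting with an underscore are private.
__all__ is one such convention for a module's public interface: it exposes names as a whitelist. If __all__ is defined, from xxx import * in another file imports only the members listed in __all__ and excludes everything else.
For example, take three files test1.py, test2.py and test3.py:
test1.py
# __all__ = ['func']
def func():
    pass

test2.py
import test1

__all__ = ['func2', 'test1']
def func2():
    pass

def func22():
    pass

test3.py
from test2 import *

func2()       # works
test1.func()  # works, because 'test1' is listed in __all__
func22()      # NameError: func22 is not listed in __all__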

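2. Controlling the behavior of from xxx import *
Python discourages the from xxx import * form. If a module xxx does not define __all__, from xxx import * imports every member of xxx whose name does not start with an underscore (including names that xxx itself imported from other modules) into the current namespace, which can pollute it. With __all__ declared explicitly, import * imports only the listed members, and if __all__ names something that does not exist an exception is raised, which makes mistakes easy to spot.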
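
Both behaviors can be observed directly. The sketch below writes two throwaway modules (the names all_good_demo and all_bad_demo are invented for the demonstration) and runs the star-import inside a fresh namespace with exec, since import * is only allowed at module level.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical module: one listed name, one unlisted public name, one private name.
GOOD = (
    "__all__ = ['func2']\n"
    "def func2(): return 'func2'\n"
    "def func22(): return 'func22'\n"
    "_hidden = 1\n"
)
# Hypothetical module whose __all__ names something that does not exist.
BAD = "__all__ = ['missing']\n"

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "all_good_demo.py").write_text(GOOD)
    Path(tmp, "all_bad_demo.py").write_text(BAD)
    sys.path.insert(0, tmp)
    try:
        namespace = {}
        exec("from all_good_demo import *", namespace)  # star-import into a fresh namespace
        imported = sorted(k for k in namespace if k != "__builtins__")
        try:
            exec("from all_bad_demo import *", {})
            error = None
        except AttributeError as exc:                    # a wrong __all__ fails loudly
            error = type(exc).__name__
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("all_good_demo", None)
        sys.modules.pop("all_bad_demo", None)

print(imported, error)  # ['func2'] AttributeError
```

Only func2 arrives in the importing namespace, and the module with the misspelled entry raises AttributeError at import time rather than failing silently.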

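3. Helping lint and other code-checking tools
When writing a library, __init__.py often exposes the API of the whole package while the implementation lives in other modules of the package. With just from xxx import a, b, some checkers such as pyflakes report that a and b are imported but unused. One workaround is to suppress the warning with from xxx import a, b  # noqa (read as "no quality assurance"), but a better approach is to define __all__ explicitly so that the checker understands the names are re-exported and stops reporting them as unused.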

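4. Things to watch when defining __all__

__all__ should be written as a list. With other types, lint tools such as pyflakes may not recognize it.
Do not generate __all__ dynamically, for example with a list comprehension. Its purpose is to define the public interface, so it should be written out explicitly as a literal.
Even when __all__ is defined, do not use from xxx import * in anything but throwaway code, and do not use tooling to imitate Ruby's automatic imports. Unlike Ruby, Python has no Module member of that kind; the module itself is what enforces namespace isolation. Breaking that layer and introducing dynamic behavior makes production code unpredictable and hard to debug.
PEP 8 places module-level dunder names such as __all__ after the module docstring and before the import statements (other than from __future__ imports).
If a module's exposed interface changes often, __all__ can be written like this:

__all__ = [
    "foo",
    "bar",
    "egg",
]
Changing an exposed name then touches only one line, which keeps version-control diffs easy to read. The trailing comma is allowed in Python and is consistent with PEP 8.

In short, import * imports exactly the names listed in __all__, whether or not they start with an underscore. This keeps import * from pulling in too many names and polluting the namespace, and filters out intermediate variables.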

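VI. Absolute and relative imports

Python imports are either absolute or relative. The difference is how the imported module is located.

An absolute import joins names with . to spell out the full path from the top-level package (folder) down to the target module. All the usages above are absolute imports.

A relative import gives the position of the target module relative to the current file: a single leading . means the current package (the directory containing the current file), and .. means its parent package.

Absolute import: from folder.abcd import myclass
Relative import: from .abcd import myclass

In practice, with either form, whether an import works depends on where the code is run from: relative imports only work inside a package, and absolute imports need the top-level package to be on the module search path.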
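
The two forms can be compared on a throwaway package. In this sketch the package name folder_demo is hypothetical (it stands in for folder above), and the package is created in a temporary directory that is put on sys.path so the absolute import can find it.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "folder_demo")             # hypothetical package name
    pkg.mkdir()
    (pkg / "abcd.py").write_text("class myclass:\n    pass\n")
    # Relative import: '.' is the package that contains this __init__.py.
    (pkg / "__init__.py").write_text("from .abcd import myclass\n")
    sys.path.insert(0, tmp)                    # makes the top-level package findable
    try:
        import folder_demo                     # runs __init__.py
        from folder_demo.abcd import myclass   # absolute import of the same class
        same = folder_demo.myclass is myclass
        package_of_abcd = sys.modules["folder_demo.abcd"].__package__
    finally:
        sys.path.remove(tmp)
        for name in ("folder_demo", "folder_demo.abcd"):
            sys.modules.pop(name, None)

print(same, package_of_abcd)  # True folder_demo
```

Both spellings resolve to the same module object; the relative one simply starts from the __package__ of the importing file instead of from sys.path.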

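The difference between model.eval(), model.train() and with torch.no_grad() in PyTorch

1. model.eval() versus model.train()

model.train() and model.eval() differ mainly in how Batch Normalization and Dropout layers behave.

Official documentation, model.train():
Puts Batch Normalization and Dropout into training behavior.
If the model contains BN (Batch Normalization) or Dropout layers, call model.train() before training. In this mode a BN layer normalizes with the mean and variance of each batch (and updates its running statistics), and Dropout randomly keeps only part of the connections when updating parameters.

Official documentation, model.eval():
Puts Batch Normalization and Dropout into evaluation behavior.
If the model contains BN or Dropout layers, call model.eval() before testing. In this mode a BN layer uses the running mean and variance accumulated during training, so its statistics stay fixed during testing, and Dropout uses all connections, i.e. no units are randomly dropped.

After training on the training samples, the resulting model is used on test samples. Call model.eval() before model(test); otherwise, even without any training step, feeding input data changes the model's state and output (the BN running statistics are updated and Dropout stays random). This follows from having BN and Dropout layers in the model.

This matters especially for one-class classification, where the training and test samples follow different distributions.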

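2. model.eval() versus with torch.no_grad()

During validation in PyTorch, model.eval() switches the model to evaluation mode. This mode:

mainly tells dropout and batchnorm layers to switch between train and val behavior.
In train mode, a dropout layer zeroes each activation with the configured probability p (so the keep probability is 1 - p), and a batchnorm layer keeps computing the batch mean and var and updating its running statistics.
In val mode, a dropout layer lets all activations through, and a batchnorm layer stops computing and updating mean and var and directly uses the values learned during training.
does not affect how gradients are computed: the autograd graph is built and stored exactly as in training mode; validation code simply does not call backward (backpropagation).

with torch.no_grad() mainly stops the autograd engine from working, which speeds things up and saves GPU memory. Concretely it stops gradient tracking, saving GPU compute and memory, but it does not affect the behavior of dropout and batchnorm layers.

When to use which:
If GPU memory and compute time are not a concern, model.eval() alone is enough to get correct validation results. Adding with torch.no_grad() further speeds up evaluation and saves GPU memory (no gradients are computed or stored), so it runs faster and allows larger test batches.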
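
To make the train/eval switch concrete without needing a GPU or PyTorch itself, here is a pure-Python sketch of a dropout layer. ToyDropout is a made-up class, not PyTorch's implementation; it only mirrors the documented behavior that training mode zeroes units with probability p and rescales the survivors by 1 / (1 - p), while evaluation mode passes everything through.

```python
import random

class ToyDropout:
    """Pure-Python stand-in for a dropout layer (illustrative only)."""

    def __init__(self, p=0.5, seed=0):
        self.p = p                      # probability of zeroing a unit
        self.training = True
        self._rng = random.Random(seed)

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, xs):
        if not self.training:
            return list(xs)             # eval mode: every unit passes through unchanged
        scale = 1.0 / (1.0 - self.p)    # inverted dropout keeps the expected value the same
        return [0.0 if self._rng.random() < self.p else x * scale for x in xs]

layer = ToyDropout(p=0.25)
xs = [1.0, 2.0, 3.0, 4.0]
train_out = layer.train()(xs)           # random: some units zeroed, the rest scaled by 4/3
eval_out = layer.eval()(xs)             # deterministic: identical to the input
print(train_out, eval_out)
```

Nothing in this switch touches gradient bookkeeping, which is why eval mode and no_grad are independent and are normally used together for validation.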

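Python decorators: what the @ symbol does in Python, and the @torch.no_grad() seen so often in torch

The @ symbol is syntactic sugar for decorators. It is used when defining a function and saves a separate assignment.

Decorators are an important part of Python. Put simply, they are functions that modify the behavior of other functions. They help make code shorter and more Pythonic. Most beginners do not know where to use them, so this section shows where decorators can make code more concise. First, let's look at how to write your own decorator.

Using '@' as a function decorator was added in Python 2.4. A decorator must appear on the line before the function definition and cannot be on the same line, so @A def f(): is illegal. Decorators can be applied to functions and methods and, since Python 2.6 / 3.0, to classes as well. A decorator is itself a callable that takes the decorated function as its argument and returns the decorated function under the same name, or some other callable.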

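Example (1):

def spamrun(fn):
    def sayspam(*args):
        # fn is never called, so the decorated function's body is replaced entirely
        print("spam,spam,spam")
    return sayspam

@spamrun
def useful(a, b):
    print(a**2 + b**2)

Run: useful(3, 4)

Output: spam,spam,spam

Example (2):

def addspam(fn):
    def new(*args):
        print("spam,spam,spam")
        return fn(*args)  # call the original function after the extra output
    return new

@addspam
def useful(a, b):
    print(a**2 + b**2)

Run: useful(4, 3)

Output:

spam,spam,spam
25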

@torch.no_grad()

@torch.no_grad()
def eval():
    ...

Tensors computed inside a function decorated with @torch.no_grad() are not tracked for gradients, so no backpropagation runs through them.
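
An object that works both as with ...: and as @...() can be built with the standard library's contextlib.ContextDecorator. The sketch below is only an analogy: no_grad_sketch and the STATE dictionary are hypothetical stand-ins, and this is not how PyTorch implements torch.no_grad.

```python
from contextlib import ContextDecorator

STATE = {"grad_enabled": True}   # hypothetical switch standing in for autograd's global mode

class no_grad_sketch(ContextDecorator):
    def __enter__(self):
        self._previous = STATE["grad_enabled"]
        STATE["grad_enabled"] = False
        return self

    def __exit__(self, exc_type, exc, tb):
        STATE["grad_enabled"] = self._previous   # restore even if the body raised
        return False                             # do not swallow exceptions

@no_grad_sketch()
def evaluate():
    return STATE["grad_enabled"]

inside_decorated = evaluate()        # the whole call runs with the switch off
with no_grad_sketch():
    inside_with = STATE["grad_enabled"]
after = STATE["grad_enabled"]        # restored once the block has exited
print(inside_decorated, inside_with, after)  # False False True
```

The parentheses in @no_grad_sketch() matter for this sketch: the decorator is an instance of the class, and decorating a function wraps every call in that instance's __enter__/__exit__ pair.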

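Python decorators:

A decorator is essentially a Python function that lets other functions gain extra behavior without any change to their code; the decorator's return value is also a function object. It is often used for cross-cutting concerns such as logging, performance measurement, transaction handling, caching and permission checks. Decorators are an excellent fit for this kind of problem: they let us factor out large amounts of repeated code that has nothing to do with the function's own job and reuse it. In short, a decorator adds extra functionality to an existing object.

Start with a simple example:

def foo():
    print('i am foo')

Now there is a new requirement: record an execution log for the function. So logging code is added:

import logging

def foo():
    print('i am foo')
    logging.info("foo is running")

bar() and bar2() have the same requirement. What then? Write another logging call inside bar? That produces a lot of duplicated code. To reduce the repetition we can define a separate function that handles the logging and then runs the real business code:

def use_logging(func):
    logging.warning("%s is running" % func.__name__)
    func()

def bar():
    print('i am bar')

use_logging(bar)

The logic is easy to follow, but now a function has to be passed to use_logging every time. This also breaks the original code structure: the business logic used to be run with bar(), and now it has to be use_logging(bar). Is there a better way? Yes: decorators.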

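A simple decorator

def use_logging(func):

    def wrapper(*args, **kwargs):
        logging.warning("%s is running" % func.__name__)
        return func(*args, **kwargs)
    return wrapper

def bar():
    print('i am bar')

bar = use_logging(bar)
bar()

The function use_logging is the decorator. It wraps func, which does the real work, so it looks as if bar is decorated by use_logging. In this example the points where the function is entered and exited are called a cross-cutting aspect, and this style of programming is called Aspect-Oriented Programming.

The @ symbol is syntactic sugar for decorators. It is used when defining a function and saves a separate assignment.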

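Option 1: without the @ syntactic sugar

# decorator without arguments
f = decorator(function_name)

# decorator with arguments
f = (decorator(arguments))(function_name)


Option 2: with the @ syntactic sugar

# an already-defined decorator
@decorator
def f():
    pass

# call the decorated function
f()

def use_logging(func):

    def wrapper(*args, **kwargs):
        logging.warning("%s is running" % func.__name__)
        return func(*args, **kwargs)  # forward keyword arguments too
    return wrapper

@use_logging
def foo():
    print("i am foo")

@use_logging
def bar():
    print("i am bar")

bar()

With the @ form, the bar = use_logging(bar) line is no longer needed, and calling bar() directly gives the desired result. Other similar functions can be decorated the same way, with no need to modify each function or add another wrapper. This improves reuse and readability.

Decorators are this convenient in Python because functions are ordinary objects: they can be passed as arguments to other functions, assigned to other variables, returned as values, and defined inside another function.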

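Decorators with arguments

Decorators can be more flexible still, for example by taking arguments. In the usage above, such as @use_logging, the decorator's only argument is the function that does the work. The decorator syntax also lets us pass other arguments at the point of use, such as @decorator(a), which gives more flexibility in writing and using decorators.

def use_logging(level):
    def decorator(func):
        def wrapper(*args, **kwargs):
            if level == "warn":
                logging.warning("%s is running" % func.__name__)
            return func(*args, **kwargs)  # forward keyword arguments too
        return wrapper

    return decorator

@use_logging(level="warn")
def foo(name='foo'):
    print("i am %s" % name)

foo()

The use_logging above is a decorator that accepts arguments. It is really a function that wraps the original decorator and returns a decorator, and can be understood as a closure that carries parameters. When we write @use_logging(level="warn"), Python first calls use_logging(level="warn") and then applies the decorator it returns to the function, so the argument is available inside the decorator.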

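Class decorators

Now look at class decorators. Compared with function decorators they offer more flexibility, high cohesion and encapsulation. A class decorator relies on the class's __call__ method: attaching the decorator to a function with @ creates an instance of the class, and calling the decorated function invokes __call__.

The __call__ method runs whenever an instance of the class is called like a function.

@Foo runs Foo(bar), which creates an instance through __init__; each later bar() call then invokes __call__.

class Foo(object):
    def __init__(self, func):
        self._func = func

    def __call__(self):
        print('class decorator running')
        self._func()
        print('class decorator ending')

@Foo
def bar():
    print('bar')

bar()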

functools.wraps

Decorators greatly improve code reuse, but they have one drawback: the original function's metadata, such as its docstring, __name__ and parameter list, is lost. Look at an example first:

The decorator

def logged(func):
    def with_logging(*args, **kwargs):
        print(func.__name__ + " was called")
        return func(*args, **kwargs)
    return with_logging

The function

@logged
def f(x):
    """does some math"""
    return x + x * x

This is exactly equivalent to:

def f(x):
    """does some math"""
    return x + x * x
f = logged(f)

The function f has been replaced by with_logging, so its docstring and __name__ are now those of with_logging.

print(f.__name__)    # prints 'with_logging'
print(f.__doc__)     # prints None

This is a real problem. Fortunately there is functools.wraps. wraps is itself a decorator; it copies the original function's metadata onto the wrapper function, so the wrapper carries the same metadata as the original.

from functools import wraps
def logged(func):
    @wraps(func)
    def with_logging(*args, **kwargs):
        print(func.__name__ + " was called")
        return func(*args, **kwargs)
    return with_logging

@logged
def f(x):
    """does some math"""
    return x + x * x

print(f.__name__)  # prints 'f'
print(f.__doc__)   # prints 'does some math'

Built-in decorators

@staticmethod, @classmethod, @property

@property

Lets a method be used like an attribute. It must return a value and acts as the getter.

If no method decorated with @<name>.setter is defined, the attribute is read-only.

class Car:

    def __init__(self, name, price):
        self._name = name
        self._price = price

    @property
    def car_name(self):
        return self._name

    # car_name is a readable and writable attribute
    @car_name.setter
    def car_name(self, value):
        self._name = value

    # car_price is a read-only attribute
    @property
    def car_price(self):
        return str(self._price) + '万'

benz = Car('benz', 30)

print(benz.car_name)   # benz
benz.car_name = "baojun"
print(benz.car_name)   # baojun
print(benz.car_price)  # 30万

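@staticmethod

A static method takes neither self (the instance) nor cls (the class) and is used just like a plain function.

When to use a static method:

If a method does not need any instance methods or attributes and simply takes arguments and returns data, it is a good fit for a static method. It can be called without instantiating an object. Such a method could equally live outside the class as a module-level function; placing it in the class signals that it serves only this class.

@classmethod

A class method takes no self parameter, but its first parameter must be cls, which refers to the class itself.

When to use a class method:

As a factory method that creates instances. For example, the date class in the built-in datetime module uses class methods extensively as factories for date objects. A class method is also appropriate when the method needs to call a static method of the class: if it were a static method instead, it would have to refer to class A explicitly by name, which works against inheritance.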
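
The factory use case can be sketched as follows. date.fromisoformat is a real class-method factory in the standard library (Python 3.7+); Temperature and LabTemperature are hypothetical classes added only to show why receiving cls matters for inheritance.

```python
from datetime import date

class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is whichever class the method was called on, so subclasses get their own type
        return cls((fahrenheit - 32) * 5 / 9)

class LabTemperature(Temperature):
    pass

d = date.fromisoformat("2024-01-31")     # a standard-library factory class method
t = LabTemperature.from_fahrenheit(212)  # returns a LabTemperature, not a Temperature
print(d.year, type(t).__name__, t.celsius)  # 2024 LabTemperature 100.0
```

Had from_fahrenheit been a static method that returned Temperature(...) by name, the subclass call would have produced the base class instead.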

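Example

class Demo(object):

    text = "comparing the three kinds of methods"

    def instance_method(self):
        print("instance method called")

    @classmethod
    def class_method(cls):
        print("class method called")
        print("in class method, class attribute text: {}".format(cls.text))
        print("in class method, instance_method returned: {}".format(cls().instance_method()))

    @staticmethod
    def static_method():
        print("static method called")
        print("in static method, class attribute text: {}".format(Demo.text))
        print("in static method, instance_method returned: {}".format(Demo().instance_method()))

if __name__ == "__main__":
    # Create an instance
    d = Demo()

    # An instance can access instance methods, class methods and static methods
    # Access the text attribute through the instance
    print(d.text)

    # Call the instance method through the instance
    d.instance_method()

    # Call the class method through the instance
    d.class_method()

    # Call the static method through the instance
    d.static_method()

    # A class can access class methods and static methods
    # Access the text attribute through the class
    print(Demo.text)

    # Call the class method through the class
    Demo.class_method()

    # Call the static method through the class
    Demo.static_method()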

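@staticmethod versus @classmethod: differences and when to use each

The example above shows the following.

Differences

When defining them: inside a static method decorated with @staticmethod, accessing a class attribute or calling an instance method requires writing the class name explicitly;

inside a class method decorated with @classmethod, a cls parameter representing the class is passed in, which avoids hard-coding the class name.

When calling them: the two look much the same, usually ClassName.static_method() or ClassName.class_method(). An instance can access instance methods, class methods and static methods; a class can access class methods and static methods.

Instances can also call static and class methods, but to keep them distinct from instance methods it is better to call them on the class.

When to use each

So, when defining a class:

if the method needs no class-related attributes or methods, use @staticmethod;

if it needs class-related attributes or methods and should apply to the class as a whole rather than to a particular object, use @classmethod.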

Order of decorators

@a
@b
@c
def f():
    pass

is equivalent to

f = a(b(c(f)))
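
A quick way to see this ordering is to have each decorator record when its wrapper runs. The tag factory below is a hypothetical helper written for this check.

```python
calls = []

def tag(name):
    """Hypothetical decorator factory that records when its wrapper runs."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            calls.append(name)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@tag("a")
@tag("b")
@tag("c")
def f():
    calls.append("f")

f()
print(calls)  # ['a', 'b', 'c', 'f']
```

Decorators are applied bottom-up (c wraps f first), so at call time the outermost wrapper, a, runs first and the original function runs last.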

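CE Loss and BCE Loss (loss functions for classification)

Two questions used to puzzle me:

Why is MSE loss a regression loss that should not be used for classification, where CE or BCE is used instead?
Why is CE paired with the softmax activation and BCE with sigmoid? What is the reason?

After studying this, I found that the question can be understood mathematically from several angles that all lead to the same conclusion. One angle is laid out here; others may be added later.

This line of reasoning comes from Professor Hung-yi Lee's machine learning course.

This shows that if MSE loss is used to train a classifier, the gradient is small whether the prediction is close to the true value or close to the wrong value. That explains why CE or BCE loss is needed for classification.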
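
The derivation behind this conclusion is not reproduced in these notes. Under the usual setup it can be sketched as follows; the symbols are assumptions introduced here: a single logit z, a sigmoid output, a label y in {0, 1}, and the MSE loss with a factor of 1/2.

```latex
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
L_{\mathrm{MSE}} = \tfrac{1}{2}\,(\hat{y} - y)^2

\frac{\partial L_{\mathrm{MSE}}}{\partial z}
  = (\hat{y} - y)\,\sigma'(z)
  = (\hat{y} - y)\,\hat{y}\,(1 - \hat{y})
```

For y = 1 the gradient tends to 0 as the prediction approaches 1 (correct), but it also tends to 0 as the prediction approaches 0 (badly wrong), because of the factor ŷ(1 − ŷ). A confidently wrong prediction therefore receives almost no learning signal.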

2. The BCE loss function

Since the gradient of the MSE loss does not meet the needs of classification, we now derive the gradient of the BCE loss.
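
The derivation itself is missing from these notes; a sketch of the standard result follows, again assuming (as notation introduced here) a single logit z, a sigmoid output and a label y in {0, 1}.

```latex
\hat{y} = \sigma(z), \qquad
L_{\mathrm{BCE}} = -\bigl[\, y \ln \hat{y} + (1 - y)\ln(1 - \hat{y}) \,\bigr]

\frac{\partial L_{\mathrm{BCE}}}{\partial \hat{y}}
  = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}
  = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})}

\frac{\partial L_{\mathrm{BCE}}}{\partial z}
  = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})} \cdot \hat{y}\,(1 - \hat{y})
  = \hat{y} - y
```

The sigmoid derivative cancels, so the gradient is simply the prediction error: large when the prediction is far from the label and small only when it is close.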

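3. The CE loss function

A common saying is that cross-entropy loss can be used as either "sigmoid + BCE" or "softmax + CE". For the former, the previous part derived the back-propagated gradient and showed that it is well behaved.

This part derives the gradient for "softmax + CE".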
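
The well-known result of that derivation is that the gradient of the cross-entropy loss with respect to logit i equals softmax(z)_i minus the one-hot target y_i. Since the derivation is not written out here, the sketch below checks the formula numerically with central finite differences; the logits and the target class are made-up values.

```python
import math

def softmax(zs):
    m = max(zs)                                   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(zs, target):
    return -math.log(softmax(zs)[target])

zs, target = [3.0, 1.0, -3.0], 0                  # hypothetical logits and true class
# Analytic gradient: softmax output minus the one-hot target.
analytic = [p - (1.0 if i == target else 0.0) for i, p in enumerate(softmax(zs))]

# Numerical gradient by central differences on each logit.
eps = 1e-6
numeric = []
for i in range(len(zs)):
    up = [z + eps if j == i else z for j, z in enumerate(zs)]
    down = [z - eps if j == i else z for j, z in enumerate(zs)]
    numeric.append((cross_entropy(up, target) - cross_entropy(down, target)) / (2 * eps))

print(analytic)
print(numeric)
```

The two lists agree to within the finite-difference error, and the gradient components sum to zero because both the softmax output and the one-hot target sum to 1.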

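4. Application

In PyTorch, "sigmoid + BCE" corresponds to torch.nn.BCEWithLogitsLoss, and "softmax + CE" corresponds to torch.nn.CrossEntropyLoss.

For parameters and usage, see the documentation for BCEWithLogitsLoss and CrossEntropyLoss.

In classification, if the classes are not mutually exclusive, only "sigmoid + BCE" applies.

If the classes are mutually exclusive (only one class can win), both approaches are used in practice: "sigmoid + BCE", which turns the task into several binary problems, and "softmax + CE", which classifies directly.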

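Differences and connections between the Softmax and Sigmoid functions

1. Introduction

Softmax and Sigmoid are covered in two parts: the first for classification tasks in general, the second for binary classification (in detail).

Advantages: 1. The Sigmoid output lies in (0, 1); the bounded range makes optimization stable and lets it serve as an output layer. 2. It is continuous and easy to differentiate.

Disadvantages: 1. Most notably, saturation: the derivative approaches 0 on both sides, which easily causes vanishing gradients. 2. Activation offset: Sigmoid outputs are all greater than 0, so the output is not zero-centered; the next layer receives inputs with a non-zero mean, which affects the gradients. 3. Relatively high computational cost, because Sigmoid involves an exponential.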
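
The saturation point is easy to verify numerically. The sketch below uses the identity σ'(z) = σ(z)(1 − σ(z)) and evaluates it at a few arbitrarily chosen inputs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)        # derivative of the sigmoid expressed through its output

for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.6f}  grad={sigmoid_grad(z):.6f}")
```

The derivative peaks at 0.25 for z = 0 and is already below 1e-4 at |z| = 10, and every output is positive, which is the non-zero-centered behavior mentioned above.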

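2.2 The Softmax function

Softmax = multi-class classification = exactly one correct answer = mutually exclusive outputs (for example handwritten digits or iris). When building a classifier for problems with a single correct answer, apply Softmax to the raw output values. The Softmax denominator combines all the raw outputs, which means the probabilities it produces are tied to one another.

Softmax generalizes the binary Sigmoid function to multiple classes so that multi-class results are presented as probabilities. For example, raw outputs 3, 1, -3 are mapped by Softmax to values in (0, 1) that sum to 1 (satisfying the properties of a probability), so they can be read as probabilities, and the output node with the largest probability (the largest value) is chosen as the prediction.

Because Softmax first widens the differences between the elements of the input vector (through the exponential) and only then normalizes to a probability distribution, the class probabilities it produces in classification are more clearly separated, and the largest value gets a probability closer to 1, so the output distribution looks more like the true (one-hot) distribution.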
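
The widening effect can be seen on the raw outputs 3, 1, -3 mentioned above by comparing softmax with a plain linear normalization. The linear baseline (shift by the minimum, then divide by the sum) is an assumption chosen here purely for contrast.

```python
import math

def softmax(zs):
    m = max(zs)                                   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 1.0, -3.0]
probs = softmax(logits)

# Linear baseline for contrast: shift so the minimum is 0, then normalise.
shifted = [z - min(logits) for z in logits]
linear = [s / sum(shifted) for s in shifted]

print([round(p, 4) for p in probs])
print([round(p, 4) for p in linear])
```

Softmax gives the largest logit roughly 0.88 of the probability mass, whereas the linear normalization gives it only 0.6; both outputs sum to 1.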

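Softmax can be explained from three angles, each of which gives a better understanding of where it applies:

Softmax is a smooth approximation of arg max. Where arg max bluntly picks a single maximum (producing a one-hot vector), softmax smooths that output, redistributing the 1 at the maximum position of the one-hot output to the other positions according to the size of the input elements.
Softmax normalizes the input vector into a class probability distribution, i.e. a distribution over n classes (as mentioned earlier). That is why deep learning often uses softmax as the last layer of an MLP together with the cross-entropy loss (a measure of the difference between distributions).
From the viewpoint of probabilistic graphical models, the form of softmax can be read as a joint probability on an undirected graph. This is why the conditional maximum-entropy model and the softmax regression model turn out to be the same, and there are many similar examples. Since graphical models borrow heavily from the theory of thermodynamic systems, softmax can also be given a physical interpretation.

2.3 Summary

If the model's output classes are not mutually exclusive and several can be chosen at once, apply the Sigmoid function to the network's raw outputs.
If the output classes are mutually exclusive and only one can be chosen, apply the Softmax function to the network's raw outputs.
Sigmoid can be used for multi-label problems; Softmax is used for single-label problems.^[1]
For a given classification scenario, whenever Softmax can be used, Sigmoid can be used too.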

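3. Binary classification

For binary classification the two are, in theory, no different. In practice, because of how frameworks such as PyTorch and TensorFlow set up the computation, they still differ during backpropagation. Experiments show a difference as well: depending on the classification model, either Sigmoid or Softmax may work better.

Next, why the two still differ (using PyTorch as the example):

First, with Sigmoid the last fully connected layer has 1 neuron, while with Softmax it has 2. This is easy to see: Sigmoid only distinguishes target from non-target, so there is really a single target class and the other is background, whereas Softmax treats the problem as two classes and therefore needs two neurons. This is the main reason the two differ.

Sigmoid is designed for the two-point (Bernoulli) distribution. It squashes the network output into (0, 1); the result can be read as the probability P of belonging to the target class, and the probability of not belonging to it is (1 - P), which is the typical two-point form.

Softmax is designed for the multinomial distribution and reduces to the binomial case when there are 2 classes. The real difference from Sigmoid is that the binomial case involves two classes (call them A and B), whereas the two-point distribution models a single class and the probability of the other is obtained directly as 1 - P.

Put simply, Sigmoid "models" one class, and the opposite class is obtained by subtracting from 1. Softmax models both classes, and the two probabilities likewise sum to 1.

When a neural network does binary classification, using Softmax or Sigmoid leads to clearly different setups. Since Softmax models two classes (positive and negative, usually labeled 0/1), an NLP model (for example a BERT-style model) reduces the BERT output to 2 dimensions with an nn.Linear() layer and then applies Softmax (in PyTorch, by attaching torch.nn.CrossEntropyLoss directly). Sigmoid models only one class (usually the correct, positive class), so the BERT output is reduced to 1 dimension with nn.Linear() and followed by Sigmoid (in torch, torch.nn.BCEWithLogitsLoss).

In short, Softmax and Sigmoid can be reduced to the same mathematical expression in the binary case, but that does not mean they have the same meaning, and their inputs and outputs differ. Sigmoid yields "the probability of being assigned to the correct class and the probability of not being assigned to it"; Softmax yields "the probability of being assigned to the correct class and the probability of being assigned to the wrong class".

A common mistake (in NLP) is to conflate Softmax and Sigmoid: reducing the BERT output to 2 dimensions but then applying Sigmoid to the result. What does the result mean in that case?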

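Suppose the BERT output layer, after being reduced by nn.Linear(), gives a two-dimensional vector:

[-0.9419267177581787, 1.944047451019287]

The corresponding classes are (0, 1). Applying Sigmoid gives:

tensor([0.2805, 0.8748])

Here 0.2805 is read as the probability of class 0 and 0.8748 as the probability of class 1. The two are treated as independent, as if from two separate experiments (which clearly does not fit here, because classes 0 and 1 are not two independent Bernoulli events). As a result the two values do not sum to 1.

With softmax the result is:

tensor([0.0529, 0.9471])

These sum to 1, which is the correct choice here.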
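
These numbers can be reproduced with the standard library, and the same sketch shows the sense in which the two functions share one mathematical form in the binary case: a two-way softmax equals a sigmoid applied to the difference of the two logits. The helper functions are written here for illustration and are not the PyTorch implementations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)                                   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-0.9419267177581787, 1.944047451019287]  # the two-dimensional output quoted above
per_unit = [sigmoid(z) for z in logits]            # independent sigmoids: need not sum to 1
joint = softmax(logits)                            # softmax: always sums to 1
via_difference = sigmoid(logits[1] - logits[0])    # same value as joint[1]

print([round(p, 4) for p in per_unit])
print([round(p, 4) for p in joint])
print(round(via_difference, 4))
```

So a 2-unit softmax head behaves like a 1-unit sigmoid head whose single logit is the difference of the two outputs; applying sigmoid to each of the two units separately is a different model.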

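Rules of thumb:

In NLP the two do differ, and the Softmax approach is sometimes slightly better than the Sigmoid approach.

In CV the two also differ, and the Sigmoid approach is sometimes slightly better than the Softmax approach.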