In recent years, tremendous progress has been made in the field of 3D Machine Learning, an interdisciplinary field that fuses computer vision, computer graphics, and machine learning. This repo is derived from my study notes and will be used as a place for triaging new research papers.
I’ll use the following icons to differentiate 3D representations:
📷 Multi-view Images
👾 Volumetric
🎲 Point Cloud
💎 Polygonal Mesh
💊 Primitive-based
To find related papers and their relationships, check out Connected Papers, which provides a neat way to visualize the academic field in a graph representation.
Get Involved
To contribute to this Repo, you may add content through pull requests or open an issue to let me know.
⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ We have also created a Slack workspace for people around the globe to ask questions, share knowledge, and facilitate collaborations. Together, I’m sure we can advance this field as a collaborative effort. Join the community with this link. ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐
[ECCV 2020] DTVNet: Dynamic Time-lapse Video Generation via Single Still Image [paper][code]
[SIGGRAPH Asia 2019] Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis [paper][code][project page]
[CVPR 2018] Learning to Generate Time-lapse Videos Using Multi-stage Dynamic Generative Adversarial Networks [paper][code][project page]
Some Other Papers
Some other interesting papers on novel view synthesis or cinemagraphs.
[arXiv 2022] Make-A-Video: Text-to-Video Generation without Text-Video Data [paper][project page]
[ECCV 2022] SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [paper][code][project page] 🚕
[CVPR 2022] Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image [paper][code][project page]
Torchaudio is a library for audio and signal processing with PyTorch. It provides I/O, signal and data processing functions, datasets, model implementations and application components.
Recent breakthroughs in generative modeling of images have been predicated on the availability of high-quality and large-scale datasets such as MNIST, CIFAR and ImageNet. We recognized the need for an audio dataset that was as approachable as those in the image domain.
Audio signals found in the wild contain multi-scale dependencies that prove particularly difficult to model, leading many previous efforts at data-driven audio synthesis to focus on more constrained domains such as texture synthesis or training small parametric models.
We encourage the broader community to use NSynth as a benchmark and entry point into audio machine learning. We also view NSynth as a building block for future datasets and envision a high-quality multi-note dataset for tasks like generation and transcription that involve learning complex language-like dependencies.
Description
NSynth is an audio dataset containing 305,979 musical notes, each with a unique pitch, timbre, and envelope. For 1,006 instruments from commercial sample libraries, we generated four-second, monophonic 16kHz audio snippets, referred to as notes, by ranging over every pitch of a standard MIDI piano (21-108) as well as five different velocities (25, 50, 75, 100, 127). The note was held for the first three seconds and allowed to decay for the final second.
Some instruments are not capable of producing all 88 pitches in this range, resulting in an average of 65.4 pitches per instrument. Furthermore, the commercial sample packs occasionally contain duplicate sounds across multiple velocities, leaving an average of 4.75 unique velocities per pitch.
We also annotated each of the notes with three additional pieces of information based on a combination of human evaluation and heuristic algorithms:
Source: The method of sound production for the note’s instrument. This can be one of acoustic or electronic for instruments that were recorded from acoustic or electronic instruments, respectively, or synthetic for synthesized instruments. See their frequencies below.
Family: The high-level family of which the note’s instrument is a member. Each instrument is a member of exactly one family. See the complete list and their frequencies below.
Qualities: Sonic qualities of the note. See the quality descriptions and their co-occurrences below. Each note is annotated with zero or more qualities.
Format
Files
The NSynth dataset can be downloaded in two formats:
Train [tfrecord | json/wav]: A training set with 289,205 examples. Instruments do not overlap with valid or test.
Valid [tfrecord | json/wav]: A validation set with 12,678 examples. Instruments do not overlap with train.
Test [tfrecord | json/wav]: A test set with 4,096 examples. Instruments do not overlap with train.
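For the json/wav download, here is a minimal loader sketch. It is a hedged assumption about layout: it presumes each split unpacks to an `examples.json` metadata file alongside an `audio/` directory with one `<note_str>.wav` per note.

```python
import json
import os

def index_examples(json_path):
    """Pair each note's metadata with the path of its wav file.

    Assumes (hedged) that the json/wav archive unpacks to an
    `examples.json` file next to an `audio/` directory containing
    one `<note_str>.wav` per note.
    """
    with open(json_path) as f:
        examples = json.load(f)
    audio_dir = os.path.join(os.path.dirname(json_path), "audio")
    return {
        note_str: (meta, os.path.join(audio_dir, note_str + ".wav"))
        for note_str, meta in examples.items()
    }
```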
Below we detail how the note features are encoded in the Example protocol buffers and JSON files.
Example Features
Each Example contains the following features.
| Feature | Type | Description |
| --- | --- | --- |
| note | int64 | A unique integer identifier for the note. |
| note_str | bytes | A unique string identifier for the note in the format `<instrument_str>-<pitch>-<velocity>`. |
| instrument | int64 | A unique, sequential identifier for the instrument the note was synthesized from. |
| instrument_str | bytes | A unique string identifier for the instrument this note was synthesized from, in the format `<instrument_family_str>-<instrument_production_str>-<instrument_name>`. |
| pitch | int64 | The 0-based MIDI pitch in the range [0, 127]. |
| velocity | int64 | The 0-based MIDI velocity in the range [0, 127]. |
| sample_rate | int64 | The samples per second for the audio feature. |
| audio* | [float] | A list of audio samples represented as floating-point values in the range [-1, 1]. |
| qualities | [int64] | A binary vector indicating which sonic qualities are present in this note. |
| qualities_str | [bytes] | A list of IDs of the qualities present in this note, drawn from the sonic qualities list. |
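As a quick sanity check on the `note_str` field, a minimal parser sketch (the identifier in the example is made up, but follows the documented `<instrument_str>-<pitch>-<velocity>` pattern; it assumes pitch and velocity are always the last two hyphen-separated fields, so it splits from the right):

```python
def parse_note_str(note_str):
    # Pitch and velocity are the last two hyphen-separated fields,
    # so split from the right to keep the instrument identifier intact.
    instrument_str, pitch, velocity = note_str.rsplit("-", 2)
    return instrument_str, int(pitch), int(velocity)

# Hypothetical identifier following the documented pattern:
name, pitch, velocity = parse_note_str("guitar_acoustic_010-060-075")
print(name, pitch, velocity)  # guitar_acoustic_010 60 75
```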
import multiprocessing as mp

def function1(id):  # the child process
    print(f'id {id}')

def run__process():  # the main process
    process = [mp.Process(target=function1, args=(1,)),
               mp.Process(target=function1, args=(2,)), ]
    [p.start() for p in process]  # start both processes
    [p.join() for p in process]   # wait for both processes to finish in turn

# Calling run__process() here, outside the if-guard, is discouraged. This
# example is simple enough that you might get away with it, but don't rely on that.
if __name__ == '__main__':
    run__process()  # correct: only call the entry point inside the if-guard
Error when using PyTorch with CUDA and multiprocessing: UserWarning: semaphore_tracker
(written on 2021-03-03)
The error looks like this:
multiprocessing/semaphore_tracker.py:144:
UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
Issue with multiprocessing semaphore tracking
import time

def func2(args):  # multiple parameters (arguments)
    # x, y = args
    x = args[0]  # written this way, it is easier to locate errors
    y = args[1]  # written this way, it is easier to locate errors
    time.sleep(1)  # pretend it is a time-consuming operation
    return x - y

def run__pool():  # main process
    from multiprocessing import Pool

    cpu_worker_num = 3
    process_args = [(1, 1), (9, 9), (4, 4), (3, 3), ]

    print(f'| inputs:  {process_args}')
    start_time = time.time()
    with Pool(cpu_worker_num) as p:
        outputs = p.map(func2, process_args)
    print(f'| outputs: {outputs}    TimeUsed: {time.time() - start_time:.1f}\n')

    '''Another way (I don't recommend):
    using functools.partial. See https://stackoverflow.com/a/25553970/9293137
    from functools import partial
    # pool.map(partial(f, a, b), iterable)
    '''

if __name__ == '__main__':
    run__pool()
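The `functools.partial` alternative mentioned in the comment above looks like this in full. This is a sketch of the approach from the linked Stack Overflow answer; the function `f` and its arguments are made up for illustration:

```python
from functools import partial
from multiprocessing import Pool

def f(constant, x):  # the first argument will be frozen by partial
    return constant + x

if __name__ == '__main__':
    with Pool(2) as pool:
        # partial(f, 10) fixes constant=10, so map only has to supply x
        print(pool.map(partial(f, 10), [1, 2, 3]))  # [11, 12, 13]
```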
So yes, pipes are faster than queues – but only by 1.5 to 2 times. What did surprise me was that Python 3 is MUCH slower than Python 2 – most other tests I have done have been a bit up and down (as long as it is Python 3.4 – Python 3.2 seems to be a bit of a dog, especially for memory usage).
You can `import queue` to use Python's built-in single-process queue; multiprocessing provides its own with `from multiprocessing import Queue`. Everything below refers to the multiprocessing queue.

A Queue works much like the Pipe covered earlier: both the main process and child processes can access it, and every object put into it is deep-copied. The difference is that a Pipe has only two endpoints, while a Queue has the usual queue semantics and is more flexible. For details, see Stack Overflow: Multiprocessing – Pipe vs Queue.
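For contrast with the Queue example below, a minimal two-endpoint Pipe sketch (the child function and payload are made up for illustration):

```python
from multiprocessing import Pipe, Process

def child(conn):
    # Receive an object from the other end of the pipe and echo back double.
    obj = conn.recv()
    conn.send(obj * 2)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # exactly two endpoints, duplex by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send(21)
    print(parent_conn.recv())  # 42
    p.join()
```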
import time

def func1(queue):  # the child process: consume one item from the queue
    time.sleep(1)
    print(f'args {queue.get()}')

def run__queue():
    from multiprocessing import Process, Queue

    queue = Queue(maxsize=4)  # the following methods can be called from anywhere
    queue.put(True)
    queue.put([0, None, object])  # objects put into the queue are deep-copied (pickled)
    queue.qsize()  # the current length of the queue
    print(queue.get())  # First In, First Out
    print(queue.get())  # First In, First Out
    queue.qsize()  # the current length of the queue

    queue.put(1)  # leave one item for each child process to consume
    queue.put(2)
    process = [Process(target=func1, args=(queue,)),
               Process(target=func1, args=(queue,)), ]
    [p.start() for p in process]
    [p.join() for p in process]

if __name__ == '__main__':
    run__queue()
For those interested in using Python 3.8’s shared_memory module: it still has an unfixed bug that affects Python 3.8/3.9/3.10 as of now (2021-01-15). The bug is that the resource tracker destroys shared memory segments while other processes should still have valid access to them. So take care if you use it in your code.
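A minimal single-process sketch of the `shared_memory` API in question (Python 3.8+). The values are made up; note that on the affected versions, the resource tracker may warn about or prematurely unlink segments attached from unrelated processes:

```python
from multiprocessing import shared_memory

# Create a 4-byte shared block and write into its buffer.
shm = shared_memory.SharedMemory(create=True, size=4)
shm.buf[:4] = b"1234"

# Attach a second handle by name, as another process would.
other = shared_memory.SharedMemory(name=shm.name)
print(bytes(other.buf[:4]))  # b'1234'

other.close()  # every handle must be closed
shm.close()
shm.unlink()   # exactly one owner should unlink the segment
```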
Taking the earlier HTTP-server example again: “http” is a built-in Python package that has no “__main__.py” file, so running it with “-m” raises an error: No module named http.__main__; ‘http’ is a package and cannot be directly executed.
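This behaviour is easy to reproduce programmatically; the sketch below just shells out to the current interpreter and checks for the error message quoted above:

```python
import subprocess
import sys

# `http` is a package without a __main__.py, so `python -m http` must fail.
result = subprocess.run([sys.executable, "-m", "http"],
                        capture_output=True, text=True)
print(result.returncode != 0)                          # True
print("cannot be directly executed" in result.stderr)  # True
```

By contrast, `python -m http.server` works, because `http.server` is a module that can be executed directly.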