生成对抗网络系列论文+pytorch实现

github链接:

https://github.com/eriklindernoren/PyTorch-GAN

This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as a collaborator send me an email at eriklindernoren@gmail.com.

PyTorch-GAN

Collection of PyTorch implementations of Generative Adversarial Network varieties presented in research papers. Model architectures will not always mirror the ones proposed in the papers, but I have chosen to focus on getting the core ideas covered instead of getting every layer configuration right. Contributions and suggestions of GANs to implement are very welcomed.

See also: Keras-GAN

Table of Contents

Installation

$ git clone https://github.com/eriklindernoren/PyTorch-GAN
$ cd PyTorch-GAN/
$ sudo pip3 install -r requirements.txt

Implementations

Auxiliary Classifier GAN

Auxiliary Classifier Generative Adversarial Network

Authors

Augustus Odena, Christopher Olah, Jonathon Shlens

Abstract

Synthesizing high resolution photorealistic images has been a long-standing challenge in machine learning. In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128×128 resolution image samples exhibiting global coherence. We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models. These analyses demonstrate that high resolution samples provide class information not present in low resolution samples. Across 1000 ImageNet classes, 128×128 samples are more than twice as discriminable as artificially resized 32×32 samples. In addition, 84.7% of the classes have samples exhibiting diversity comparable to real ImageNet data.

[Paper] [Code]

Run Example

$ cd implementations/acgan/
$ python3 acgan.py

Adversarial Autoecoder

Adversarial Autoencoder

Authors

Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey

Abstract

n this paper, we propose the “adversarial autoencoder” (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution. Matching the aggregated posterior to the prior ensures that generating from any part of prior space results in meaningful samples. As a result, the decoder of the adversarial autoencoder learns a deep generative model that maps the imposed prior to the data distribution. We show how the adversarial autoencoder can be used in applications such as semi-supervised classification, disentangling style and content of images, unsupervised clustering, dimensionality reduction and data visualization. We performed experiments on MNIST, Street View House Numbers and Toronto Face datasets and show that adversarial autoencoders achieve competitive results in generative modeling and semi-supervised classification tasks.

[Paper] [Code]

Run Example

$ cd implementations/aae/
$ python3 aae.py

BEGAN

BEGAN: Boundary Equilibrium Generative Adversarial Networks

Authors

David Berthelot, Thomas Schumm, Luke Metz

Abstract

We propose a new equilibrium enforcing method paired with a loss derived from the Wasserstein distance for training auto-encoder based Generative Adversarial Networks. This method balances the generator and discriminator during training. Additionally, it provides a new approximate convergence measure, fast and stable training and high visual quality. We also derive a way of controlling the trade-off between image diversity and visual quality. We focus on the image generation task, setting a new milestone in visual quality, even at higher resolutions. This is achieved while using a relatively simple model architecture and a standard training procedure.

[Paper] [Code]

Run Example

$ cd implementations/began/
$ python3 began.py

BicycleGAN

Toward Multimodal Image-to-Image Translation

Authors

Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman

Abstract

Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a \emph{distribution} of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.

[Paper] [Code]

Run Example

$ cd data/
$ bash download_pix2pix_dataset.sh edges2shoes
$ cd ../implementations/bicyclegan/
$ python3 bicyclegan.py

Various style translations by varying the latent code.​

Boundary-Seeking GAN

Boundary-Seeking Generative Adversarial Networks

Authors

R Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio

Abstract

Generative adversarial networks (GANs) are a learning framework that rely on training a discriminator to estimate a measure of difference between a target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on Celeba, Large-scale Scene Understanding (LSUN) bedrooms, and Imagenet without conditioning.

[Paper] [Code]

Run Example

$ cd implementations/bgan/
$ python3 bgan.py

Cluster GAN

ClusterGAN: Latent Space Clustering in Generative Adversarial Networks

Authors

Sudipto Mukherjee, Himanshu Asnani, Eugene Lin, Sreeram Kannan

Abstract

Generative Adversarial networks (GANs) have obtained remarkable success in many unsupervised learning tasks and unarguably, clustering is an important unsupervised learning problem. While one can potentially exploit the latent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the GAN latent space. In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an inverse network (which projects the data to the latent space) trained jointly with a clustering specific loss, we are able to achieve clustering in the latent space. Our results show a remarkable phenomenon that GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors. We compare our results with various clustering baselines and demonstrate superior performance on both synthetic and real datasets.

[Paper] [Code]

Code based on a full PyTorch [implementation].

Run Example

$ cd implementations/cluster_gan/
$ python3 clustergan.py

Conditional GAN

Conditional Generative Adversarial Nets

Authors

Mehdi Mirza, Simon Osindero

Abstract

Generative Adversarial Nets [8] were recently introduced as a novel way to train generative models. In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator. We show that this model can generate MNIST digits conditioned on class labels. We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.

[Paper] [Code]

Run Example

$ cd implementations/cgan/
$ python3 cgan.py

Context-Conditional GAN

Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks

Authors

Emily Denton, Sam Gross, Rob Fergus

Abstract

We introduce a simple semi-supervised learning approach for images based on in-painting using an adversarial loss. Images with random patches removed are presented to a generator whose task is to fill in the hole, based on the surrounding pixels. The in-painted images are then presented to a discriminator network that judges if they are real (unaltered training images) or not. This task acts as a regularizer for standard supervised training of the discriminator. Using our approach we are able to directly train large VGG-style networks in a semi-supervised fashion. We evaluate on STL-10 and PASCAL datasets, where our approach obtains performance comparable or superior to existing methods.

[Paper] [Code]

Run Example

$ cd implementations/ccgan/
$ python3 ccgan.py

Context Encoder

Context Encoders: Feature Learning by Inpainting

Authors

Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros

Abstract

We present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. By analogy with auto-encoders, we propose Context Encoders — a convolutional neural network trained to generate the contents of an arbitrary image region conditioned on its surroundings. In order to succeed at this task, context encoders need to both understand the content of the entire image, as well as produce a plausible hypothesis for the missing part(s). When training context encoders, we have experimented with both a standard pixel-wise reconstruction loss, as well as a reconstruction plus an adversarial loss. The latter produces much sharper results because it can better handle multiple modes in the output. We found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures. We quantitatively demonstrate the effectiveness of our learned features for CNN pre-training on classification, detection, and segmentation tasks. Furthermore, context encoders can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.

[Paper] [Code]

Run Example

$ cd implementations/context_encoder/
<follow steps at the top of context_encoder.py>
$ python3 context_encoder.py

Rows: Masked | Inpainted | Original | Masked | Inpainted | Original​

Coupled GAN

Coupled Generative Adversarial Networks

Authors

Ming-Yu Liu, Oncel Tuzel

Abstract

We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images. In contrast to the existing approaches, which require tuples of corresponding images in different domains in the training set, CoGAN can learn a joint distribution without any tuple of corresponding images. It can learn a joint distribution with just samples drawn from the marginal distributions. This is achieved by enforcing a weight-sharing constraint that limits the network capacity and favors a joint distribution solution over a product of marginal distributions one. We apply CoGAN to several joint distribution learning tasks, including learning a joint distribution of color and depth images, and learning a joint distribution of face images with different attributes. For each task it successfully learns the joint distribution without any tuple of corresponding images. We also demonstrate its applications to domain adaptation and image transformation.

[Paper] [Code]

Run Example

$ cd implementations/cogan/
$ python3 cogan.py

Generated MNIST and MNIST-M images​

CycleGAN

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Authors

Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros

Abstract

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G:X→Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F:Y→X and introduce a cycle consistency loss to push F(G(X))≈X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.

[Paper] [Code]

Run Example

$ cd data/
$ bash download_cyclegan_dataset.sh monet2photo
$ cd ../implementations/cyclegan/
$ python3 cyclegan.py --dataset_name monet2photo

Monet to photo translations.​

Deep Convolutional GAN

Deep Convolutional Generative Adversarial Network

Authors

Alec Radford, Luke Metz, Soumith Chintala

Abstract

In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks – demonstrating their applicability as general image representations.

[Paper] [Code]

Run Example

$ cd implementations/dcgan/
$ python3 dcgan.py

DiscoGAN

Learning to Discover Cross-Domain Relations with Generative Adversarial Networks

Authors

Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, Jiwon Kim

Abstract

While humans easily recognize relations between data from different domains without any supervision, learning to automatically discover them is in general very challenging and needs many ground-truth pairs that illustrate the relations. To avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data. We propose a method based on generative adversarial networks that learns to discover relations between different domains (DiscoGAN). Using the discovered relations, our proposed network successfully transfers style from one domain to another while preserving key attributes such as orientation and face identity.

[Paper] [Code]

Run Example

$ cd data/
$ bash download_pix2pix_dataset.sh edges2shoes
$ cd ../implementations/discogan/
$ python3 discogan.py --dataset_name edges2shoes

Rows from top to bottom: (1) Real image from domain A (2) Translated image from
domain A (3) Reconstructed image from domain A (4) Real image from domain B (5)
Translated image from domain B (6) Reconstructed image from domain B​

DRAGAN

On Convergence and Stability of GANs

Authors

Naveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira

Abstract

We propose studying GAN training dynamics as regret minimization, which is in contrast to the popular view that there is consistent minimization of a divergence between real and generated distributions. We analyze the convergence of GAN training from this new point of view to understand why mode collapse happens. We hypothesize the existence of undesirable local equilibria in this non-convex game to be responsible for mode collapse. We observe that these local equilibria often exhibit sharp gradients of the discriminator function around some real data points. We demonstrate that these degenerate local equilibria can be avoided with a gradient penalty scheme called DRAGAN. We show that DRAGAN enables faster training, achieves improved stability with fewer mode collapses, and leads to generator networks with better modeling performance across a variety of architectures and objective functions.

[Paper] [Code]

Run Example

$ cd implementations/dragan/
$ python3 dragan.py

DualGAN

DualGAN: Unsupervised Dual Learning for Image-to-Image Translation

Authors

Zili Yi, Hao Zhang, Ping Tan, Minglun Gong

Abstract

Conditional Generative Adversarial Networks (GANs) for cross-domain image-to-image translation have made much progress recently. Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN. However, human labeling is expensive, even impractical, and large quantities of data may not always be available. Inspired by dual learning from natural language translation, we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In our architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. The closed loop made by the primal and dual tasks allows images from either domain to be translated and then reconstructed. Hence a loss function that accounts for the reconstruction error of images can be used to train the translators. Experiments on multiple image translation tasks with unlabeled data show considerable performance gain of DualGAN over a single GAN. For some tasks, DualGAN can even achieve comparable or slightly better results than conditional GAN trained on fully labeled data.

[Paper] [Code]

Run Example

$ cd data/
$ bash download_pix2pix_dataset.sh facades
$ cd ../implementations/dualgan/
$ python3 dualgan.py --dataset_name facades

Energy-Based GAN

Energy-based Generative Adversarial Network

Authors

Junbo Zhao, Michael Mathieu, Yann LeCun

Abstract

We introduce the “Energy-based Generative Adversarial Network” model (EBGAN) which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to the probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function allows to use a wide variety of architectures and loss functionals in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of EBGAN framework as using an auto-encoder architecture, with the energy being the reconstruction error, in place of the discriminator. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. We also show that a single-scale architecture can be trained to generate high-resolution images.

[Paper] [Code]

Run Example

$ cd implementations/ebgan/
$ python3 ebgan.py

Enhanced Super-Resolution GAN

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Authors

Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang

Abstract

The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN – network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at this https URL.

[Paper] [Code]

Run Example

$ cd implementations/esrgan/
<follow steps at the top of esrgan.py>
$ python3 esrgan.py

Nearest Neighbor Upsampling | ESRGAN​

GAN

Generative Adversarial Network

Authors

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

Abstract

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

[Paper] [Code]

Run Example

$ cd implementations/gan/
$ python3 gan.py

InfoGAN

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Authors

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

Abstract

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.

[Paper] [Code]

Run Example

$ cd implementations/infogan/
$ python3 infogan.py

Result of varying categorical latent variable by column.​

Result of varying continuous latent variable by row.​

Least Squares GAN

Least Squares Generative Adversarial Networks

Authors

Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang, Stephen Paul Smolley

Abstract

Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson χ2 divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stable during the learning process. We evaluate LSGANs on five scene datasets and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.

[Paper] [Code]

Run Example

$ cd implementations/lsgan/
$ python3 lsgan.py

MUNIT

Multimodal Unsupervised Image-to-Image Translation

Authors

Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz

Abstract

Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any pairs of corresponding images. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to the state-of-the-art approaches further demonstrates the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at this https URL

[Paper] [Code]

Run Example

$ cd data/
$ bash download_pix2pix_dataset.sh edges2shoes
$ cd ../implementations/munit/
$ python3 munit.py --dataset_name edges2shoes

Results by varying the style code.​

Pix2Pix

Unpaired Image-to-Image Translation with Conditional Adversarial Networks

Authors

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros

Abstract

We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

[Paper] [Code]

Run Example

$ cd data/
$ bash download_pix2pix_dataset.sh facades
$ cd ../implementations/pix2pix/
$ python3 pix2pix.py --dataset_name facades

Rows from top to bottom: (1) The condition for the generator (2) Generated image
based of condition (3) The true corresponding image to the condition​

PixelDA

Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks

Authors

Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan

Abstract

Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that attempt to map representations between the two domains or learn to extract features that are domain-invariant. In this work, we present a new approach that learns, in an unsupervised manner, a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training.

[Paper] [Code]

MNIST to MNIST-M Classification

Trains a classifier on images that have been translated from the source domain (MNIST) to the target domain (MNIST-M) using the annotations of the source domain images. The classification network is trained jointly with the generator network to optimize the generator for both providing a proper domain translation and also for preserving the semantics of the source domain image. The classification network trained on translated images is compared to the naive solution of training a classifier on MNIST and evaluating it on MNIST-M. The naive model manages a 55% classification accuracy on MNIST-M while the one trained during domain adaptation achieves a 95% classification accuracy.

$ cd implementations/pixelda/
$ python3 pixelda.py
MethodAccuracy
Naive55%
PixelDA95%

Rows from top to bottom: (1) Real images from MNIST (2) Translated images from
MNIST to MNIST-M (3) Examples of images from MNIST-M​

Relativistic GAN

The relativistic discriminator: a key element missing from standard GAN

Authors

Alexia Jolicoeur-Martineau

Abstract

In standard generative adversarial network (SGAN), the discriminator estimates the probability that the input data is real. The generator is trained to increase the probability that fake data is real. We argue that it should also simultaneously decrease the probability that real data is real because 1) this would account for a priori knowledge that half of the data in the mini-batch is fake, 2) this would be observed with divergence minimization, and 3) in optimal settings, SGAN would be equivalent to integral probability metric (IPM) GANs. We show that this property can be induced by using a relativistic discriminator which estimate the probability that the given real data is more realistic than a randomly sampled fake data. We also present a variant in which the discriminator estimate the probability that the given real data is more realistic than fake data, on average. We generalize both approaches to non-standard GAN loss functions and we refer to them respectively as Relativistic GANs (RGANs) and Relativistic average GANs (RaGANs). We show that IPM-based GANs are a subset of RGANs which use the identity function. Empirically, we observe that 1) RGANs and RaGANs are significantly more stable and generate higher quality data samples than their non-relativistic counterparts, 2) Standard RaGAN with gradient penalty generate data of better quality than WGAN-GP while only requiring a single discriminator update per generator update (reducing the time taken for reaching the state-of-the-art by 400%), and 3) RaGANs are able to generate plausible high resolutions images (256×256) from a very small sample (N=2011), while GAN and LSGAN cannot; these images are of significantly better quality than the ones generated by WGAN-GP and SGAN with spectral normalization.

[Paper] [Code]

Run Example

$ cd implementations/relativistic_gan/
$ python3 relativistic_gan.py # Relativistic Standard GAN
$ python3 relativistic_gan.py --rel_avg_gan # Relativistic Average GAN

Semi-Supervised GAN

Semi-Supervised Generative Adversarial Network

Authors

Augustus Odena

Abstract

We extend Generative Adversarial Networks (GANs) to the semi-supervised context by forcing the discriminator network to output class labels. We train a generative model G and a discriminator D on a dataset with inputs belonging to one of N classes. At training time, D is made to predict which of N+1 classes the input belongs to, where an extra class is added to correspond to the outputs of G. We show that this method can be used to create a more data-efficient classifier and that it allows for generating higher quality samples than a regular GAN.

[Paper] [Code]

Run Example

$ cd implementations/sgan/
$ python3 sgan.py

Softmax GAN

Softmax GAN

Authors

Min Lin

Abstract

Softmax GAN is a novel variant of Generative Adversarial Network (GAN). The key idea of Softmax GAN is to replace the classification loss in the original GAN with a softmax cross-entropy loss in the sample space of one single batch. In the adversarial learning of N real training samples and M generated samples, the target of discriminator training is to distribute all the probability mass to the real samples, each with probability 1M, and distribute zero probability to generated data. In the generator training phase, the target is to assign equal probability to all data points in the batch, each with probability 1M+N. While the original GAN is closely related to Noise Contrastive Estimation (NCE), we show that Softmax GAN is the Importance Sampling version of GAN. We futher demonstrate with experiments that this simple change stabilizes GAN training.

[Paper] [Code]

Run Example

$ cd implementations/softmax_gan/
$ python3 softmax_gan.py

StarGAN

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Authors

Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo

Abstract

Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN’s superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.

[Paper] [Code]

Run Example

$ cd implementations/stargan/
<follow steps at the top of stargan.py>
$ python3 stargan.py

Original | Black Hair | Blonde Hair | Brown Hair | Gender Flip | Aged​

Super-Resolution GAN

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Authors

Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi

Abstract

Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

[Paper] [Code]

Run Example

$ cd implementations/srgan/
<follow steps at the top of srgan.py>
$ python3 srgan.py

Nearest Neighbor Upsampling | SRGAN​

UNIT

Unsupervised Image-to-Image Translation Networks

Authors

Ming-Yu Liu, Thomas Breuel, Jan Kautz

Abstract

Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available in this https URL.

[Paper] [Code]

Run Example

$ cd data/
$ bash download_cyclegan_dataset.sh apple2orange
$ cd implementations/unit/
$ python3 unit.py --dataset_name apple2orange

Wasserstein GAN

Wasserstein GAN

Authors

Martin Arjovsky, Soumith Chintala, Léon Bottou

Abstract

We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions.

[Paper] [Code]

Run Example

$ cd implementations/wgan/
$ python3 wgan.py

Wasserstein GAN GP

Improved Training of Wasserstein GANs

Authors

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville

Abstract

Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.

[Paper] [Code]

Run Example

$ cd implementations/wgan_gp/
$ python3 wgan_gp.py

Wasserstein GAN DIV

Wasserstein Divergence for GANs

Authors

Jiqing Wu, Zhiwu Huang, Janine Thoma, Dinesh Acharya, Luc Van Gool

Abstract

In many domains of computer vision, generative adversarial networks (GANs) have achieved great success, among which the fam- ily of Wasserstein GANs (WGANs) is considered to be state-of-the-art due to the theoretical contributions and competitive qualitative performance. However, it is very challenging to approximate the k-Lipschitz constraint required by the Wasserstein-1 metric (W-met). In this paper, we propose a novel Wasserstein divergence (W-div), which is a relaxed version of W-met and does not require the k-Lipschitz constraint.As a concrete application, we introduce a Wasserstein divergence objective for GANs (WGAN-div), which can faithfully approximate W-div through optimization. Under various settings, including progressive growing training, we demonstrate the stability of the proposed WGAN-div owing to its theoretical and practical advantages over WGANs. Also, we study the quantitative and visual performance of WGAN-div on standard image synthesis benchmarks, showing the superior performance of WGAN-div compared to the state-of-the-art methods.

[Paper] [Code]

Run Example

$ cd implementations/wgan_div/
$ python3 wgan_div.py

命名张量

张量命名是一个非常有用的方法,这样可以方便地使用维度的名字来做索引或其他操作,大大提高了可读性、易用性,防止出错。

命名的对象是张量的维度:

NCHW = [‘N’, ‘C’, ‘H’, ‘W’]
#张量命名
images = torch.randn(32, 3, 56, 56, names=NCHW)
images.sum('C')
images.select('C', index=0)
# 也可以这么设置
tensor = torch.rand(3,4,1,2,names=('C', 'N', 'H', 'W'))
# 使用align_to可以对维度方便地排序
tensor = tensor.align_to('N', 'C', 'H', 'W')

命名张量允许用户为张量维度指定明确的名称。在大多数情况下,采用维度参数的操作将接受维度名称,从而无需按位置跟踪维度。此外,命名张量使用名称来自动检查 API 在运行时是否被正确使用,从而提供额外的安全性。名称还可用于重新排列维度,例如,支持“按名称广播”而不是“按位置广播”。

使用方法:对于任意张量,采用一个 names 参数,参数值是一个字符类型列表or元组,值表示维度名称,将名称与每个维度相关联。

>>> torch.zeros(2, 3, names=('N', 'C'))
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('N', 'C'))


<!-- wp:preformatted -->
<pre id="codecell0" class="wp-block-preformatted">>>> <strong>torch.zeros(</strong>2<strong>,</strong> 3<strong>,</strong> <strong>names=(</strong>'N'<strong>,</strong> 'C'<strong>))</strong>
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('N', 'C'))


#获得维度的名称为i的tensor
tensor.names[i]#获得为维度的名称i的tensor

以下函数支持命名张量:

使用tensor.names访问张量的维度名称和 tensor.rename()重命名命名:

>>> imgs = torch.randn(1, 2, 2, 3 , names=('N', 'C', 'H', 'W'))
>>> imgs.names
('N', 'C', 'H', 'W')

>>> renamed_imgs = imgs.rename(H='height', W='width')
>>> renamed_imgs.names
('N', 'C', 'height', 'width)

命名张量可以与未命名张量共存;命名张量是 的实例 torch.Tensor。未命名张量具有命名None维度。命名张量不需要命名所有维度。

>>> imgs = torch.randn(1, 2, 2, 3 , names=(None, 'C', 'H', 'W'))
>>> imgs.names
(None, 'C', 'H', 'W')

名称传播语义

命名张量使用名称来自动检查 API 在运行时是否被正确调用。这发生在一个名为name inference的过程中。更正式地说,名称推断包括以下两个步骤:

  • 检查名称:操作员可以在运行时执行自动检查,以检查某些维度名称是否必须匹配。
  • 传播名称:名称推断将名称传播到输出张量。

所有支持命名张量的操作都会传播名称。

>>> x = torch.randn(3, 3, names=('N', 'C'))
>>> x.abs().names
('N', 'C')

官方文档地址: https://pytorch.org/docs/stable/named_tensor.html

PyTorch常用代码段合集

挺有用的,保留下来,撸代码的时候参考备用。

作者丨Jack Stark@知乎

来源丨https://zhuanlan.zhihu.com/p/104019160

本文是PyTorch常用代码段合集,涵盖基本配置、张量处理、模型定义与操作、数据处理、模型训练与测试等5个方面,还给出了多个值得注意的Tips,内容非常全面。

PyTorch最好的资料是官方文档。本文是PyTorch常用代码段,在参考资料[1](张皓:PyTorch Cookbook)的基础上做了一些修补,方便使用时查阅。

1. 基本配置

导入包和版本查询

import torch
import torch.nn as nn
import torchvision
print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
print(torch.cuda.get_device_name(0))

可复现性

在硬件设备(CPU、GPU)不同时,完全的可复现性无法保证,即使随机种子相同。但是,在同一个设备上,应该保证可复现性。具体做法是,在程序开始的时候固定torch的随机种子,同时也把numpy的随机种子固定。

np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

显卡设置

如果只需要一张显卡

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

如果需要指定多张显卡,比如0,1号显卡。

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

也可以在命令行运行代码时设置显卡:

CUDA_VISIBLE_DEVICES=0,1 python train.py

清除显存

torch.cuda.empty_cache()

也可以使用在命令行重置GPU的指令

nvidia-smi --gpu-reset -i [gpu_id]

2. 张量(Tensor)处理

张量的数据类型

PyTorch有9种CPU张量类型和9种GPU张量类型。

张量基本信息

tensor = torch.randn(3,4,5)
print(tensor.type())  # 数据类型
print(tensor.size())  # 张量的shape,是个元组
print(tensor.dim())   # 维度的数量

命名张量

张量命名是一个非常有用的方法,这样可以方便地使用维度的名字来做索引或其他操作,大大提高了可读性、易用性,防止出错。、

# 在PyTorch 1.3之前,需要使用注释
# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)
images.sum(dim=1)
images.select(dim=1, index=0)

# PyTorch 1.3之后
NCHW = [‘N’, ‘C’, ‘H’, ‘W’]
images = torch.randn(32, 3, 56, 56, names=NCHW)
images.sum('C')
images.select('C', index=0)
# 也可以这么设置
tensor = torch.rand(3,4,1,2,names=('C', 'N', 'H', 'W'))
# 使用align_to可以对维度方便地排序
tensor = tensor.align_to('N', 'C', 'H', 'W')

数据类型转换


# 设置默认类型,pytorch中的FloatTensor远远快于DoubleTensor
torch.set_default_tensor_type(torch.FloatTensor)

# 类型转换
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()

torch.Tensor与np.ndarray转换

除了CharTensor,其他所有CPU上的张量都支持转换为numpy格式然后再转换回来。

ndarray = tensor.cpu().numpy()
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float() # If ndarray has negative stride.

Torch.tensor与PIL.Image转换

# pytorch中的张量默认采用[N, C, H, W]的顺序,并且数据范围在[0,1],需要进行转置和规范化
# torch.Tensor -> PIL.Image
image = PIL.Image.fromarray(torch.clamp(tensor*255, min=0, max=255).byte().permute(1,2,0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # Equivalently way

# PIL.Image -> torch.Tensor
path = r'./figure.jpg'
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))).permute(2,0,1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path)) # Equivalently way

np.ndarray与PIL.Image的转换

image = PIL.Image.fromarray(ndarray.astype(np.uint8))

ndarray = np.asarray(PIL.Image.open(path))

从只包含一个元素的张量中提取值


value = torch.rand(1).item()

item() → number  将单个元素的tensor数值转换为普通的python数值。
Returns the value of this tensor as a standard Python number. This only works for tensors with one element. For other cases, see tolist().

This operation is not differentiable.

Example:

>>> x = torch.tensor([1.0])
>>> x.item()
1.0

torch向量转python list :tolist()

tolist() -> list or number

Returns the tensor as a (nested) list. For scalars, a standard Python number is returned, just like with item(). Tensors are automatically moved to the CPU first if necessary.

This operation is not differentiable.

Examples:

>>> a = torch.randn(2, 2)
>>> a.tolist()
[[0.012766935862600803, 0.5415473580360413],
 [-0.08909505605697632, 0.7729271650314331]]
>>> a[0,0].tolist()
0.012766935862600803

张量形变

# 在将卷积层输入全连接层的情况下通常需要对张量做形变处理,
# 相比torch.view,torch.reshape可以自动处理输入张量不连续的情况。
tensor = torch.rand(2,3,4)
shape = (6, 4)
tensor = torch.reshape(tensor, shape)

打乱顺序

tensor = tensor[torch.randperm(tensor.size(0))]  # 打乱第一个维度

水平翻转

# pytorch不支持tensor[::-1]这样的负步长操作,水平翻转可以通过张量索引实现
# 假设张量的维度为[N, D, H, W].
tensor = tensor[:,:,:,torch.arange(tensor.size(3) - 1, -1, -1).long()]

复制张量

# Operation                 |  New/Shared memory | Still in computation graph |
tensor.clone()            # |        New         |          Yes               |
tensor.detach()           # |      Shared        |          No                |
tensor.detach.clone()()   # |        New         |          No                |

张量拼接


'''
注意torch.cat和torch.stack的区别在于torch.cat沿着给定的维度拼接,
而torch.stack会新增一维。例如当参数是3个10x5的张量,torch.cat的结果是30x5的张量,
而torch.stack的结果是3x10x5的张量。
'''
tensor = torch.cat(list_of_tensors, dim=0)
tensor = torch.stack(list_of_tensors, dim=0)

将整数标签转为one-hot编码


# pytorch的标记默认从0开始
tensor = torch.tensor([0, 2, 1, 3])
N = tensor.size(0)
num_classes = 4
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(tensor, dim=1), src=torch.ones(N, num_classes).long())

得到非零元素

torch.nonzero(tensor)               # index of non-zero elements
torch.nonzero(tensor==0)            # index of zero elements
torch.nonzero(tensor).size(0)       # number of non-zero elements
torch.nonzero(tensor == 0).size(0)  # number of zero elements

判断两个张量相等
torch.allclose(tensor1, tensor2)  # float tensor
torch.equal(tensor1, tensor2)     # int tensor
张量扩展
# Expand tensor of shape 64*512 to shape 64*512*7*7.
tensor = torch.rand(64,512)
torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7)
矩阵乘法
# Matrix multiplcation: (m*n) * (n*p) * -> (m*p).
result = torch.mm(tensor1, tensor2)

# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p)
result = torch.bmm(tensor1, tensor2)

# Element-wise multiplication.
result = tensor1 * tensor2

计算两组数据之间的两两欧式距离
利用broadcast机制
dist = torch.sqrt(torch.sum((X1[:,None,:] - X2) ** 2, dim=2))

3. 模型定义和操作

一个简单两层卷积网络的示例


# convolutional neural network (2 convolutional layers)
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out


model = ConvNet(num_classes).to(device)

卷积层的计算和展示可以用这个网站辅助。

双线性汇合(bilinear pooling)

X = torch.reshape(N, D, H * W)                        # Assume X has shape N*D*H*W
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)  # Bilinear pooling
assert X.size() == (N, D, D)
X = torch.reshape(X, (N, D * D))
X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)   # Signed-sqrt normalization
X = torch.nn.functional.normalize(X)                  # L2 normalization

多卡同步 BN(Batch normalization)

当使用 torch.nn.DataParallel 将代码运行在多张 GPU 卡上时,PyTorch 的 BN 层默认操作是各卡上数据独立地计算均值和标准差,同步 BN 使用所有卡上的数据一起计算 BN 层的均值和标准差,缓解了当批量大小(batch size)比较小时对均值和标准差估计不准的情况,是在目标检测等任务中一个有效的提升性能的技巧。

sync_bn = torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, 
                                 track_running_stats=True)

将已有网络的所有BN层改为同步BN层

def convertBNtoSyncBN(module, process_group=None):
    '''Recursively replace all BN layers to SyncBN layer.

    Args:
        module[torch.nn.Module]. Network
    '''
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        sync_bn = torch.nn.SyncBatchNorm(module.num_features, module.eps, module.momentum, 
                                         module.affine, module.track_running_stats, process_group)
        sync_bn.running_mean = module.running_mean
        sync_bn.running_var = module.running_var
        if module.affine:
            sync_bn.weight = module.weight.clone().detach()
            sync_bn.bias = module.bias.clone().detach()
        return sync_bn
    else:
        for name, child_module in module.named_children():
            setattr(module, name) = convert_syncbn_model(child_module, process_group=process_group))
        return module

类似 BN 滑动平均

如果要实现类似 BN 滑动平均的操作,在 forward 函数中要使用原地(inplace)操作给滑动平均赋值。

class BN(torch.nn.Module)
    def __init__(self):
        ...
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X):
        ...
        self.running_mean += momentum * (current - self.running_mean)
计算模型整体参数量

num_parameters = sum(torch.numel(parameter) for parameter in model.parameters())

查看网络中的参数

可以通过model.state_dict()或者model.named_parameters()函数查看现在的全部可训练参数(包括通过继承得到的父类中的参数)
params = list(model.named_parameters())
(name, param) = params[28]
print(name)
print(param.grad)
print('-------------------------------------------------')
(name2, param2) = params[29]
print(name2)
print(param2.grad)
print('----------------------------------------------------')
(name1, param1) = params[30]
print(name1)
print(param1.grad)

模型可视化(使用pytorchviz)

szagoruyko/pytorchvizgithub.com
类似 Keras 的 model.summary() 输出模型信息,使用pytorch-summary
sksq96/pytorch-summarygithub.com

模型权重初始化
注意 model.modules() 和 model.children() 的区别:model.modules() 会迭代地遍历模型的所有子层,而 model.children() 只会遍历模型下的一层。

# Common practise for initialization.
for layer in model.modules():
    if isinstance(layer, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out',
                                      nonlinearity='relu')
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.BatchNorm2d):
        torch.nn.init.constant_(layer.weight, val=1.0)
        torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.Linear):
        torch.nn.init.xavier_normal_(layer.weight)
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)

# Initialization with given tensor.
layer.weight = torch.nn.Parameter(tensor)

提取模型中的某一层
modules()会返回模型中所有模块的迭代器,它能够访问到最内层,比如self.layer1.conv1这个模块,还有一个与它们相对应的是name_children()属性以及named_modules(),这两个不仅会返回模块的迭代器,还会返回网络层的名字。
# 取模型中的前两层
new_model = nn.Sequential(*list(model.children())[:2] 
# 如果希望提取出模型中的所有卷积层,可以像下面这样操作:
for layer in model.named_modules():
    if isinstance(layer[1],nn.Conv2d):
         conv_model.add_module(layer[0],layer[1])


部分层使用预训练模型
注意如果保存的模型是 torch.nn.DataParallel,则当前的模型也需要是
model.load_state_dict(torch.load('model.pth'), strict=False)
将在 GPU 保存的模型加载到 CPU
model.load_state_dict(torch.load('model.pth', map_location='cpu'))

导入另一个模型的相同部分到新的模型
模型导入参数时,如果两个模型结构不一致,则直接导入参数会报错。用下面方法可以把另一个模型的相同的部分导入到新的模型中。
# model_new代表新的模型
# model_saved代表其他模型,比如用torch.load导入的已保存的模型
model_new_dict = model_new.state_dict()
model_common_dict = {k:v for k, v in model_saved.items() if k in model_new_dict.keys()}
model_new_dict.update(model_common_dict)
model_new.load_state_dict(model_new_dict)

4. 数据处理

计算数据集的均值和标准差
import os
import cv2
import numpy as np
from torch.utils.data import Dataset
from PIL import Image


def compute_mean_and_std(dataset):
    # 输入PyTorch的dataset,输出均值和标准差
    mean_r = 0
    mean_g = 0
    mean_b = 0

    for img, _ in dataset:
        img = np.asarray(img) # change PIL Image to numpy array
        mean_b += np.mean(img[:, :, 0])
        mean_g += np.mean(img[:, :, 1])
        mean_r += np.mean(img[:, :, 2])

    mean_b /= len(dataset)
    mean_g /= len(dataset)
    mean_r /= len(dataset)

    diff_r = 0
    diff_g = 0
    diff_b = 0

    N = 0

    for img, _ in dataset:
        img = np.asarray(img)

        diff_b += np.sum(np.power(img[:, :, 0] - mean_b, 2))
        diff_g += np.sum(np.power(img[:, :, 1] - mean_g, 2))
        diff_r += np.sum(np.power(img[:, :, 2] - mean_r, 2))

        N += np.prod(img[:, :, 0].shape)

    std_b = np.sqrt(diff_b / N)
    std_g = np.sqrt(diff_g / N)
    std_r = np.sqrt(diff_r / N)

    mean = (mean_b.item() / 255.0, mean_g.item() / 255.0, mean_r.item() / 255.0)
    std = (std_b.item() / 255.0, std_g.item() / 255.0, std_r.item() / 255.0)
    return mean, std


得到视频数据基本信息
import cv2
video = cv2.VideoCapture(mp4_path)
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = int(video.get(cv2.CAP_PROP_FPS))
video.release()
TSN 每段(segment)采样一帧视频
K = self._num_segments
if is_train:
    if num_frames > K:
        # Random index for each segment.
        frame_indices = torch.randint(
            high=num_frames // K, size=(K,), dtype=torch.long)
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.randint(
            high=num_frames, size=(K - num_frames,), dtype=torch.long)
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), frame_indices)))[0]
else:
    if num_frames > K:
        # Middle index for each segment.
        frame_indices = num_frames / K // 2
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.sort(torch.cat((                              
            torch.arange(num_frames), torch.arange(K - num_frames))))[0]
assert frame_indices.size() == (K,)
return [frame_indices[i] for i in range(K)]



常用训练和验证数据预处理
其中 ToTensor 操作会将 PIL.Image 或形状为 H×W×D,数值范围为 [0, 255] 的 np.ndarray 转换为形状为 D×H×W,数值范围为 [0.0, 1.0] 的 torch.Tensor。
train_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(size=224,
                                             scale=(0.08, 1.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
 ])
 val_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])

5. 模型训练和测试

分类模型训练代码
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i ,(images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimizer
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch: [{}/{}], Step: [{}/{}], Loss: {}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))


分类模型测试代码
# Test the model
model.eval()  # eval mode(batch norm uses moving mean/variance 
              #instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test accuracy of the model on the 10000 test images: {} %'
          .format(100 * correct / total))



自定义loss
继承torch.nn.Module类写自己的loss。
class MyLoss(torch.nn.Moudle):
    def __init__(self):
        super(MyLoss, self).__init__()

    def forward(self, x, y):
        loss = torch.mean((x - y) ** 2)
        return loss



标签平滑(label smoothing)
写一个label_smoothing.py的文件,然后在训练代码里引用,用LSR代替交叉熵损失即可。label_smoothing.py内容如下:
import torch
import torch.nn as nn


class LSR(nn.Module):

    def __init__(self, e=0.1, reduction='mean'):
        super().__init__()

        self.log_softmax = nn.LogSoftmax(dim=1)
        self.e = e
        self.reduction = reduction

    def _one_hot(self, labels, classes, value=1):
        """
            Convert labels to one hot vectors

        Args:
            labels: torch tensor in format [label1, label2, label3, ...]
            classes: int, number of classes
            value: label value in one hot vector, default to 1

        Returns:
            return one hot format labels in shape [batchsize, classes]
        """

        one_hot = torch.zeros(labels.size(0), classes)

        #labels and value_added  size must match
        labels = labels.view(labels.size(0), -1)
        value_added = torch.Tensor(labels.size(0), 1).fill_(value)

        value_added = value_added.to(labels.device)
        one_hot = one_hot.to(labels.device)

        one_hot.scatter_add_(1, labels, value_added)

        return one_hot

    def _smooth_label(self, target, length, smooth_factor):
        """convert targets to one-hot format, and smooth
        them.
        Args:
            target: target in form with [label1, label2, label_batchsize]
            length: length of one-hot format(number of classes)
            smooth_factor: smooth factor for label smooth

        Returns:
            smoothed labels in one hot format
        """
        one_hot = self._one_hot(target, length, value=1 - smooth_factor)
        one_hot += smooth_factor / (length - 1)

        return one_hot.to(target.device)

    def forward(self, x, target):

        if x.size(0) != target.size(0):
            raise ValueError('Expected input batchsize ({}) to match target batch_size({})'
                    .format(x.size(0), target.size(0)))

        if x.dim() < 2:
            raise ValueError('Expected input tensor to have least 2 dimensions(got {})'
                    .format(x.size(0)))

        if x.dim() != 2:
            raise ValueError('Only 2 dimension tensor are implemented, (got {})'
                    .format(x.size()))


        smoothed_target = self._smooth_label(target, x.size(1), self.e)
        x = self.log_softmax(x)
        loss = torch.sum(- x * smoothed_target, dim=1)

        if self.reduction == 'none':
            return loss

        elif self.reduction == 'sum':
            return torch.sum(loss)

        elif self.reduction == 'mean':
            return torch.mean(loss)

        else:
            raise ValueError('unrecognized option, expect reduction to be one of none, mean, sum')
或者直接在训练文件里做label smoothing
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()
    N = labels.size(0)
    # C is the number of classes.
    smoothed_labels = torch.full(size=(N, C), fill_value=0.1 / (C - 1)).cuda()
    smoothed_labels.scatter_(dim=1, index=torch.unsqueeze(labels, dim=1), value=0.9)

    score = model(images)
    log_prob = torch.nn.functional.log_softmax(score, dim=1)
    loss = -torch.sum(log_prob * smoothed_labels) / N
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()



Mixup训练
beta_distribution = torch.distributions.beta.Beta(alpha, alpha)
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()

    # Mixup images and labels.
    lambda_ = beta_distribution.sample([]).item()
    index = torch.randperm(images.size(0)).cuda()
    mixed_images = lambda_ * images + (1 - lambda_) * images[index, :]
    label_a, label_b = labels, labels[index]

    # Mixup loss.
    scores = model(mixed_images)
    loss = (lambda_ * loss_function(scores, label_a)
            + (1 - lambda_) * loss_function(scores, label_b))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


L1 正则化
l1_regularization = torch.nn.L1Loss(reduction='sum')
loss = ...  # Standard cross-entropy loss
for param in model.parameters():
    loss += torch.sum(torch.abs(param))
loss.backward()



不对偏置项进行权重衰减(weight decay)
pytorch里的weight decay相当于l2正则
bias_list = (param for name, param in model.named_parameters() if name[-4:] == 'bias')
others_list = (param for name, param in model.named_parameters() if name[-4:] != 'bias')
parameters = [{'parameters': bias_list, 'weight_decay': 0},                
              {'parameters': others_list}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)



梯度裁剪(gradient clipping)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20)



得到当前学习率
# If there is one global learning rate (which is the common case).
lr = next(iter(optimizer.param_groups))['lr']

# If there are multiple learning rates for different layers.
all_lr = []
for param_group in optimizer.param_groups:
    all_lr.append(param_group['lr'])
另一种方法,在一个batch训练代码里,当前的lr是optimizer.param_groups[0]['lr']



学习率衰减
# Reduce learning rate when validation accuarcy plateau.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=5, verbose=True)
for t in range(0, 80):
    train(...)
    val(...)
    scheduler.step(val_acc)

# Cosine annealing learning rate.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)
# Reduce learning rate by 10 at given epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 70], gamma=0.1)
for t in range(0, 80):
    scheduler.step()    
    train(...)
    val(...)

# Learning rate warmup by 10 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda t: t / 10)
for t in range(0, 10):
    scheduler.step()
    train(...)
    val(...)



优化器链式更新
从1.4版本开始,torch.optim.lr_scheduler 支持链式更新(chaining),即用户可以定义两个 schedulers,并交替在训练中使用。
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = ExponentialLR(optimizer, gamma=0.9)
scheduler2 = StepLR(optimizer, step_size=3, gamma=0.1)
for epoch in range(4):
    print(epoch, scheduler2.get_last_lr()[0])
    optimizer.step()
    scheduler1.step()
    scheduler2.step()



模型训练可视化
PyTorch可以使用tensorboard来可视化训练过程。
安装和运行TensorBoard。
pip install tensorboard
tensorboard --logdir=runs
使用SummaryWriter类来收集和可视化相应的数据,放了方便查看,可以使用不同的文件夹,比如'Loss/train'和'Loss/test'。
from torch.utils.tensorboard import SummaryWriter
import numpy as np

writer = SummaryWriter()

for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)



保存与加载断点
注意为了能够恢复训练,我们需要同时保存模型和优化器的状态,以及当前的训练轮数。
start_epoch = 0
# Load checkpoint.
if resume: # resume为参数,第一次训练时设为0,中断再训练时设为1
    model_path = os.path.join('model', 'best_checkpoint.pth.tar')
    assert os.path.isfile(model_path)
    checkpoint = torch.load(model_path)
    best_acc = checkpoint['best_acc']
    start_epoch = checkpoint['epoch']
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    print('Load checkpoint at epoch {}.'.format(start_epoch))
    print('Best accuracy so far {}.'.format(best_acc))

# Train the model
for epoch in range(start_epoch, num_epochs): 
    ... 

    # Test the model
    ...

    # save checkpoint
    is_best = current_acc > best_acc
    best_acc = max(current_acc, best_acc)
    checkpoint = {
        'best_acc': best_acc,
        'epoch': epoch + 1,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }
    model_path = os.path.join('model', 'checkpoint.pth.tar')
    best_model_path = os.path.join('model', 'best_checkpoint.pth.tar')
    torch.save(checkpoint, model_path)
    if is_best:
        shutil.copy(model_path, best_model_path)


提取 ImageNet 预训练模型某层的卷积特征
# VGG-16 relu5-3 feature.
model = torchvision.models.vgg16(pretrained=True).features[:-1]
# VGG-16 pool5 feature.
model = torchvision.models.vgg16(pretrained=True).features
# VGG-16 fc7 feature.
model = torchvision.models.vgg16(pretrained=True)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-3])
# ResNet GAP feature.
model = torchvision.models.resnet18(pretrained=True)
model = torch.nn.Sequential(collections.OrderedDict(
    list(model.named_children())[:-1]))

with torch.no_grad():
    model.eval()
    conv_representation = model(image)




提取 ImageNet 预训练模型多层的卷积特征
class FeatureExtractor(torch.nn.Module):
    """Helper class to extract several convolution features from the given
    pre-trained model.

    Attributes:
        _model, torch.nn.Module.
        _layers_to_extract, list<str> or set<str>

    Example:
        >>> model = torchvision.models.resnet152(pretrained=True)
        >>> model = torch.nn.Sequential(collections.OrderedDict(
                list(model.named_children())[:-1]))
        >>> conv_representation = FeatureExtractor(
                pretrained_model=model,
                layers_to_extract={'layer1', 'layer2', 'layer3', 'layer4'})(image)
    """
    def __init__(self, pretrained_model, layers_to_extract):
        torch.nn.Module.__init__(self)
        self._model = pretrained_model
        self._model.eval()
        self._layers_to_extract = set(layers_to_extract)

    def forward(self, x):
        with torch.no_grad():
            conv_representation = []
            for name, layer in self._model.named_children():
                x = layer(x)
                if name in self._layers_to_extract:
                    conv_representation.append(x)
            return conv_representation



微调全连接层
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(512, 100)  # Replace the last fc layer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)


以较大学习率微调全连接层,较小学习率微调卷积层
model = torchvision.models.resnet18(pretrained=True)
finetuned_parameters = list(map(id, model.fc.parameters()))
conv_parameters = (p for p in model.parameters() if id(p) not in finetuned_parameters)
parameters = [{'params': conv_parameters, 'lr': 1e-3}, 
              {'params': model.fc.parameters()}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)

6. 其他注意事项


不要使用太大的线性层。因为nn.Linear(m,n)使用的是的内存,线性层太大很容易超出现有显存。

不要在太长的序列上使用RNN。因为RNN反向传播使用的是BPTT算法,其需要的内存和输入序列的长度呈线性关系。

model(x) 前用 model.train() 和 model.eval() 切换网络状态。
不需要计算梯度的代码块用 with torch.no_grad() 包含起来。

model.eval() 和 torch.no_grad() 的区别在于,model.eval() 是将网络切换为测试状态,例如 BN 和dropout在训练和测试阶段使用不同的计算方法。torch.no_grad() 是关闭 PyTorch 张量的自动求导机制,以减少存储使用和加速计算,得到的结果无法进行 loss.backward()。

model.zero_grad()会把整个模型的参数的梯度都归零, 而optimizer.zero_grad()只会把传入其中的参数的梯度归零.

torch.nn.CrossEntropyLoss 的输入不需要经过 Softmax。torch.nn.CrossEntropyLoss 等价于 torch.nn.functional.log_softmax + torch.nn.NLLLoss。

loss.backward() 前用 optimizer.zero_grad() 清除累积梯度。

torch.utils.data.DataLoader 中尽量设置 pin_memory=True,对特别小的数据集如 MNIST 设置 pin_memory=False 反而更快一些。num_workers 的设置需要在实验中找到最快的取值。

用 del 及时删除不用的中间变量,节约 GPU 存储。
使用 inplace 操作可节约 GPU 存储,如x = torch.nn.functional.relu(x, inplace=True)
减少 CPU 和 GPU 之间的数据传输。例如如果你想知道一个 epoch 中每个 mini-batch 的 loss 和准确率,先将它们累积在 GPU 中等一个 epoch 结束之后一起传输回 CPU 会比每个 mini-batch 都进行一次 GPU 到 CPU 的传输更快。

使用半精度浮点数 half() 会有一定的速度提升,具体效率依赖于 GPU 型号。需要小心数值精度过低带来的稳定性问题。
时常使用 assert tensor.size() == (N, D, H, W) 作为调试手段,确保张量维度和你设想中一致。

除了标记 y 外,尽量少使用一维张量,使用 n*1 的二维张量代替,可以避免一些意想不到的一维张量计算结果。

统计代码各部分耗时

with torch.autograd.profiler.profile(enabled=True, use_cuda=False) as profile:    ...print(profile)# 或者在命令行运行python -m torch.utils.bottleneck main.py

使用TorchSnooper来调试PyTorch代码,程序在执行的时候,就会自动 print 出来每一行的执行结果的 tensor 的形状、数据类型、设备、是否需要梯度的信息。
# pip install torchsnooperimport torchsnooper# 对于函数,使用修饰器@torchsnooper.snoop()# 如果不是函数,使用 with 语句来激活 TorchSnooper,把训练的那个循环装进 with 语句中去。with torchsnooper.snoop():    原本的代码

https://github.com/zasdfgbnm/TorchSnoopergithub.com
模型可解释性,使用captum库:https://captum.ai/captum.ai

常用git命令汇总

  • Workspace:工作区
  • Index / Stage:暂存区
  • Repository:仓库区(或本地仓库)
  • Remote:远程仓库

git缓存修改:如果你没有改完这个分支,不想提交,但需要切换到别的分支,使用暂存:

git stash –include-untracked

  • 在其他分支上修改完以后再切回之前的feature分支,把暂存的修改拿出来
git stash pop

如何将自己的代码上传到git远程仓库:

  1. 打开 Git Bash。
  2. 将当前工作目录更改为您的本地仓库。
  3. 将要提交的文件暂存到本地仓库。$ git add . # Adds the file to your local repository and stages it for commit. 要取消暂存文件,请使用 'git reset HEAD YOUR-FILE'。
  4. 提交暂存在本地仓库中的文件。$ git commit -m "Add existing file" # Commits the tracked changes and prepares them to be pushed to a remote repository. 要删除此提交并修改文件,请使用 'git reset --soft HEAD~1' 并再次提交和添加文件。
  5. 推送更改(本地仓库中)到 GitHub.com。$ git push origin your-branch
  6. 注意如果https://github.com/vina_MolecularDockingScript.git不可以的话,尝试将 origin改成 git@github.com:/vina_MolecularDockingScript.git

git 比较不同分支差异

git diff  分支1 分支2 –stat (加上 –stat 是显示文件列表, 默认是文件内容diff)

git diff branch1 branch2 –stat                   //显示出所有有差异的文件列表


git diff branch1 branch2   具体文件路径   //显示指定文件的详细差异


git diff branch1 branch2                            //显示出所有有差异的文件的详细差异

git diff 被修改的文件名 就可以看到被修改过的原始内容


 

Git 出现!reject的情况

解决办法:首先将本地修改的文件备份,然后强制让远程文件覆盖本地仓库文件

Git fetch –all

Git reset –hard origin/liangxianchen

当删除本地文件后,如何与远端同步?

首先不能直接在本地按del键删除,这样对远端没有任何影响,必须使用git rm filename 命令删除,其次,删除完之后,git commit -m “delete filename” 一下,最后 git push就OK了

假设我们需要从远程pull下代码,本地修改后在传到远程完整流程:

1、本地新建文件夹:test,并在test内部打开 git

2、输入 git init 初始化git仓库

$ git config --global user.name "runoob"
$ git config --global user.email test@runoob.com

3、输入

git pull <远程主机名> <远程分支名>:<本地分支名>

比如:git pull git@10.112.55.138:CNN/code.git master:liangxianchen

把我的远程分支master拉到了本地 liangxianchen 分支

4、git branch -a 查看本地和远程分支

5、此时本地已经有了远程分支代码,可以在本地对代码进行修改和增加删除

修改完成后:

首先一定要先拉取远程分支!!!!:

比如:git pull git@10.112.55.138:CNN/code.git master: liangxianchen

然后:git add .

git commit -m “提交的备注信息”

git push git@10.112.55.138:CNN/code.git liangxianchen :liangxianchen

如果想要 将liangxianchen分支合并到master分支:

现在本地将两个分支合并,在使用

git push git@10.112.55.138:CNN/code.git master:master

注:git 还可以指定某些文件不参与git提交:gitignore,我们在git目录下创建一个.gitignore文件,然后在这个文件当中列举出我们不希望提交的文件即可。

凡是列在这个文件当中的名称,当我们在使用git add的时候都会替我们忽略掉。我们也没有必要从头开始编写这个gitignore文件,因为git当中已经替我们写好了很多模板,我们可以直接拿过来参考。

模板的地址:https://github.com/github/gitignore

一般来说,项目的log文件不需要传输,放到ignore中

git branch liangxianchen 新建分支

git checkout liangxianchen 切换分支

git 使用merge 对本地分支进行合并 并进行代码提交的流程

1、为啥需要git创建分支:

对于本地来说,其实一个分支是足够的,但假如你目前正在修改整个项目的partA,还没修改完,老板让你今天加个班,把partB改一下然后提交上去,这时候就需要分支了,你可以切换不同的分支去完成不同的工作,这样的化,你只需要把partB的分支推上去。如果你只有一个分支,那推上去的就是所有的修改。

第二种情况:如果你需要去临时修改一部分内容,可以新建分支,在该分支修改,修改完成,在切换回主分支,继续之前的工作。

1.只有当将修改内容commit后 该修改才完全生效,进行merge前需要将两个分支修改的内容都进行commit

2.假设本地两个分支   用于开发的分支:dev    用于同步远程仓库的分支:master 

3、我要同时开发几个功能

3.切换到master分支 进行 (git pull origin 远程分支) 不要在master 分支进行开发(也不要在master分支进行add commit),以此保证当在master分支进行git pull 不会产生冲突(如果不慎在master分支修改了内容, 可以先撤销所有修改,再将版本回退到没有冲突的地方)

4.在master分支拉取了最新代码后,如果没有在master分支进行过开发,那么这个分支内容就是没有冲突的最新的内容

5.切换到dev分支, 将所有的修改进行add 以及commit

dev分支的工作完成,我们就可以切换回master分支:

$ git checkout master
Switched to branch 'master'

切换回master分支后,再查看一个readme.txt文件,刚才添加的内容不见了!因为那个提交是在dev分支上,而master分支此刻的提交点并没有变

dev分支的工作成果合并到master分支上 git merge命令用于合并指定分支到当前分支 :

git merge dev(分支名)

合并完成后,就可以放心地删除dev分支了:

$ git branch -d dev
  • merge 遇见冲突后会直接停止,等待手动解决冲突并重新提交 commit 后,才能再次 merge
  • merge 是一个合并操作,会将两个分支的修改合并在一起,默认操作的情况下会提交合并中修改的内容

二、假设远程仓库只有mater分支

1. 克隆代码

git clone https://github.com/master-dev.git  
# 这个git路径是无效的,示例而已

2. 查看所有分支

git branch --all  
# 默认只有master分支,所以会看到如下两个分支
# master[本地主分支] origin/master[远程主分支]
# 新克隆下来的代码默认master和origin/master是关联的,也就是他们的代码保持同步

3. 创建本地新的dev分支

git branch dev  # 创建本地分支
git branch  # 查看分支
# 这是会看到master和dev,而且master上会有一个星号
# 这个时候dev是一个本地分支,远程仓库不知道它的存在
# 本地分支可以不同步到远程仓库,我们可以在dev开发,然后merge到master,使用master同步代码,当然也可以同步

4. 发布dev分支

发布dev分支指的是同步dev分支的代码到远程服务器

git push origin dev:dev  # 这样远程仓库也有一个dev分支了

5. 在dev分支开发代码

git checkout dev  # 切换到dev分支进行开发
# 开发代码之后,我们有两个选择
# 第一个:如果功能开发完成了,可以合并主分支
git checkout master  # 切换到主分支
git merge dev  # 把dev分支的更改和master合并
git push  # 提交主分支代码远程
git checkout dev  # 切换到dev远程分支
git push  # 提交dev分支到远程
# 第二个:如果功能没有完成,可以直接推送
git push  # 提交到dev远程分支
# 注意:在分支切换之前最好先commit全部的改变,除非你真的知道自己在做什么

6. 删除分支

git push origin :dev  # 删除远程dev分支,危险命令哦
# 下面两条是删除本地分支
git checkout master  # 切换到master分支
git branch -d dev  # 删除本地dev分支

git取消Commit,取消add,回滚代码

git commit之后取消的操作使用reset指令进行。 1.git reset –soft HEAD^,撤销commit,但是不撤销add动作。 2.git reset –hard HEAD^,撤销commit,并且撤销add动作。 3.git reset HEAD <文件名>,撤回add动作。 4.git checkout .,丢弃本次修改内容(文件被修改了,但未执行git add操作

git reset HEAD filename 取消某个文件的add

git fetch

使用 git fetch 指令将远程分支上的最新的修改下载下来。

<div><br class=”Apple-interchange-newline”>git merge</div>





git fetch upstream
git merge upstream/master

pull=fetch+merge,pull的话,下拉远程分支并与本地分支合并。fetch只是下拉远程分支,怎么合并,可以自己再做选择。

git pull

git pull 指令实际做了两件事:git fetch 和 git merge。

将分支代码拉取到本地,修改后如何上传到git分支

如果是多人合作,推送之前最好先更新一遍代码,因为可能别人更新过该分支,防止覆盖别人更新后的代码。将最新的分支代码拉取到本地后,再进行提交

1. 创建文件夹并初始化本地仓库

mkdir test
cd test
git init

关联本地仓库和远程仓库

git remote add origin git@github.com:liqiangyz/learngit.git

说明:

  • origin表示远程库的名字,可以随意,一般默认为origin;
  • origin后面表示远程仓库的真实地址,如下图所示。我这里使用的是SSH地址,当然也可以使用HTTPS地址,复制过来就行。

git remote 删除已添加的远程仓库地址

添加远程仓库出错,因为已经存在了一个错误的地址

查看当前配置的远程仓库地址:

git remote -v

删除本地指定的远程地址:

git remote remove origin
再执行-v的查看,可以添加新的远程仓库地址了

3. 拉取分支代码

git fetch是将远程主机的最新内容拉到本地,用户在检查了以后决定是否合并到工作本机分支中。

git pull 则是将远程主机的最新内容拉下来后直接合并,即:git pull = git fetch + git merge,这样可能会产生冲突,需要手动解决。

git fetch origin develop #可以使用git fetch origin 拉取全部

说明:

  • develop为我的分支名字,根据自己的分支决定。
  • 有的同学可能会用git pull,git pull = git fetch + git merge,因为pull拉取会合并本地文件,可能会导致冲突。

4. 创建本地分支,并切换到本地分支

经过上一步,在本地还看不到拉取的代码,需要手动创建一下:

不要使用git checkout -b develop,如果没有关联远程,会出问题!!!

git checkout -b develop origin/develop

说明:

git checkout表示切换分支或恢复工作树文件。
-b表示进入git checkout之前执行git branch 创建分支操作。
综合起来考虑,这一步的操作相当于执行checkout命令检出远程拉取分支,并进入该分支。
使用ls命令可以看到下拉的文件,并且使用git branch命令可以看到当前停留的分支

5. 更新分支代码

如果远程分支上有更新,可以使用pull命令对本地进行更新,如果没有,则可以跳过此步骤。

git pull origin develop(分支)

1、查看文件状态

git status
2、添加全部文件到暂存区

git add .

3、将暂存区内容添加到本地仓库中

git commit -m “cpp”

6、推送(push)本地分支到远程分支

推送之前最好先更新一遍代码!防止覆盖!!!自己去理解一下哈。

git pull origin develop
git push origin develop

一、新建代码库


# 在当前目录新建一个Git代码库
$ git init

# 新建一个目录,将其初始化为Git代码库
$ git init [project-name]

# 下载一个项目和它的整个代码历史
$ git clone [url]

二、配置

Git的设置文件为.gitconfig,它可以在用户主目录下(全局配置),也可以在项目目录下(项目配置)。


# 显示当前的Git配置
$ git config --list

# 编辑Git配置文件
$ git config -e [--global]

# 设置提交代码时的用户信息
$ git config [--global] user.name "[name]"
$ git config [--global] user.email "[email address]"

三、增加/删除文件


# 添加指定文件到暂存区
$ git add [file1] [file2] ...

# 添加指定目录到暂存区,包括子目录
$ git add [dir]

# 添加当前目录的所有文件到暂存区
$ git add .

git add -u :将文件的修改、文件的删除,添加到暂存区。

git add . :将文件的修改,文件的新建,添加到暂存区。

git add -A :将文件的修改,文件的删除,文件的新建,添加到暂存区。

# 添加每个变化前,都会要求确认
# 对于同一个文件的多处变化,可以实现分次提交
$ git add -p

# 删除工作区文件,并且将这次删除放入暂存区
$ git rm [file1] [file2] ...

# 停止追踪指定文件,但该文件会保留在工作区
$ git rm --cached [file]

# 改名文件,并且将这个改名放入暂存区
$ git mv [file-original] [file-renamed]

四、代码提交


# 提交暂存区到仓库区
$ git commit -m [message]

# 提交暂存区的指定文件到仓库区
$ git commit [file1] [file2] ... -m [message]

# 提交工作区自上次commit之后的变化,直接到仓库区
$ git commit -a

# 提交时显示所有diff信息
$ git commit -v

# 使用一次新的commit,替代上一次提交
# 如果代码没有任何新变化,则用来改写上一次commit的提交信息
$ git commit --amend -m [message]

# 重做上一次commit,并包括指定文件的新变化
$ git commit --amend [file1] [file2] ...

五、分支


# 列出所有本地分支
$ git branch

# 列出所有远程分支
$ git branch -r

# 列出所有本地分支和远程分支
$ git branch -a

# 新建一个分支,但依然停留在当前分支
$ git branch [branch-name]

# 新建一个分支,并切换到该分支
$ git checkout -b [branch]

# 新建一个分支,指向指定commit
$ git branch [branch] [commit]

# 新建一个分支,与指定的远程分支建立追踪关系
$ git branch --track [branch] [remote-branch]

# 切换到指定分支,并更新工作区
$ git checkout [branch-name]

# 切换到上一个分支
$ git checkout -

# 建立追踪关系,在现有分支与指定的远程分支之间
$ git branch --set-upstream [branch] [remote-branch]

# 合并指定分支到当前分支
$ git merge [branch]

# 选择一个commit,合并进当前分支
$ git cherry-pick [commit]

# 删除分支
$ git branch -d [branch-name]

# 删除远程分支
$ git push origin --delete [branch-name]
$ git branch -dr [remote/branch]

六、标签


# 列出所有tag
$ git tag

# 新建一个tag在当前commit
$ git tag [tag]

# 新建一个tag在指定commit
$ git tag [tag] [commit]

# 删除本地tag
$ git tag -d [tag]

# 删除远程tag
$ git push origin :refs/tags/[tagName]

# 查看tag信息
$ git show [tag]

# 提交指定tag
$ git push [remote] [tag]

# 提交所有tag
$ git push [remote] --tags

# 新建一个分支,指向某个tag
$ git checkout -b [branch] [tag]

七、查看信息


# 显示有变更的文件
$ git status

# 显示当前分支的版本历史
$ git log

# 显示commit历史,以及每次commit发生变更的文件
$ git log --stat

# 搜索提交历史,根据关键词
$ git log -S [keyword]

# 显示某个commit之后的所有变动,每个commit占据一行
$ git log [tag] HEAD --pretty=format:%s

# 显示某个commit之后的所有变动,其"提交说明"必须符合搜索条件
$ git log [tag] HEAD --grep feature

# 显示某个文件的版本历史,包括文件改名
$ git log --follow [file]
$ git whatchanged [file]

# 显示指定文件相关的每一次diff
$ git log -p [file]

# 显示过去5次提交
$ git log -5 --pretty --oneline

# 显示所有提交过的用户,按提交次数排序
$ git shortlog -sn

# 显示指定文件是什么人在什么时间修改过
$ git blame [file]

# 显示暂存区和工作区的差异
$ git diff

# 显示暂存区和上一个commit的差异
$ git diff --cached [file]

# 显示工作区与当前分支最新commit之间的差异
$ git diff HEAD

# 显示两次提交之间的差异
$ git diff [first-branch]...[second-branch]

# 显示今天你写了多少行代码
$ git diff --shortstat "@{0 day ago}"

# 显示某次提交的元数据和内容变化
$ git show [commit]

# 显示某次提交发生变化的文件
$ git show --name-only [commit]

# 显示某次提交时,某个文件的内容
$ git show [commit]:[filename]

# 显示当前分支的最近几次提交
$ git reflog

八、远程同步


# 下载远程仓库的所有变动
$ git fetch [remote]

# 显示所有远程仓库
$ git remote -v

# 显示某个远程仓库的信息
$ git remote show [remote]

# 增加一个新的远程仓库,并命名
$ git remote add [shortname] [url]

# 取回远程仓库的变化,并与本地分支合并
$ git pull [remote] [branch]

# 上传本地指定分支到远程仓库
$ git push [remote] [branch]

# 强行推送当前分支到远程仓库,即使有冲突
$ git push [remote] --force

# 推送所有分支到远程仓库
$ git push [remote] --all

九、撤销


# 恢复暂存区的指定文件到工作区
$ git checkout [file]

# 恢复某个commit的指定文件到暂存区和工作区
$ git checkout [commit] [file]

# 恢复暂存区的所有文件到工作区
$ git checkout .

# 重置暂存区的指定文件,与上一次commit保持一致,但工作区不变
$ git reset [file]

# 重置暂存区与工作区,与上一次commit保持一致
$ git reset --hard

# 重置当前分支的指针为指定commit,同时重置暂存区,但工作区不变
$ git reset [commit]

# 重置当前分支的HEAD为指定commit,同时重置暂存区和工作区,与指定commit一致
$ git reset --hard [commit]

# 重置当前HEAD为指定commit,但保持暂存区和工作区不变
$ git reset --keep [commit]

# 新建一个commit,用来撤销指定commit
# 后者的所有变化都将被前者抵消,并且应用到当前分支
$ git revert [commit]

# 暂时将未提交的变化移除,稍后再移入
$ git stash
$ git stash pop

十、其他


# 生成一个可供发布的压缩包
$ git archive

汇总表:

git init # 初始化本地git仓库(创建新仓库)
git config –global user.name “xxx” # 配置用户名
git config –global user.email “xxx@xxx.com” # 配置邮件
git config –global color.ui true # git status等命令自动着色
git config –global color.status auto
git config –global color.diff auto
git config –global color.branch auto
git config –global color.interactive auto
git config –global –unset http.proxy # remove proxy configuration on git
git clone git+ssh://git@192.168.53.168/VT.git # clone远程仓库
git status # 查看当前版本状态(是否修改)
git add xyz # 添加xyz文件至index
git add . # 增加当前子目录下所有更改过的文件至index
git commit -m ‘xxx’ # 提交
git commit –amend -m ‘xxx’ # 合并上一次提交(用于反复修改)
git commit -am ‘xxx’ # 将add和commit合为一步
git rm xxx # 删除index中的文件
git rm -r * # 递归删除
git log # 显示提交日志
git log -1 # 显示1行日志 -n为n行
git log -5
git log –stat # 显示提交日志及相关变动文件
git log -p -m
git show dfb02e6e4f2f7b573337763e5c0013802e392818 # 显示某个提交的详细内容
git show dfb02 # 可只用commitid的前几位
git show HEAD # 显示HEAD提交日志
git show HEAD^ # 显示HEAD的父(上一个版本)的提交日志 ^^为上两个版本 ^5为上5个版本
git tag # 显示已存在的tag
git tag -a v2.0 -m ‘xxx’ # 增加v2.0的tag
git show v2.0 # 显示v2.0的日志及详细内容
git log v2.0 # 显示v2.0的日志
git diff # 显示所有未添加至index的变更
git diff –cached # 显示所有已添加index但还未commit的变更
git diff HEAD^ # 比较与上一个版本的差异
git diff HEAD — ./lib # 比较与HEAD版本lib目录的差异
git diff origin/master..master # 比较远程分支master上有本地分支master上没有的
git diff origin/master..master –stat # 只显示差异的文件,不显示具体内容
git remote add origin git+ssh://git@192.168.53.168/VT.git # 增加远程定义(用于push/pull/fetch)
git branch # 显示本地分支
git branch –contains 50089 # 显示包含提交50089的分支
git branch -a # 显示所有分支
git branch -r # 显示所有原创分支
git branch –merged # 显示所有已合并到当前分支的分支
git branch –no-merged # 显示所有未合并到当前分支的分支
git branch -m master master_copy # 本地分支改名
git checkout -b master_copy # 从当前分支创建新分支master_copy并检出
git checkout -b master master_copy # 上面的完整版
git checkout features/performance # 检出已存在的features/performance分支
git checkout –track hotfixes/BJVEP933 # 检出远程分支hotfixes/BJVEP933并创建本地跟踪分支
git checkout v2.0 # 检出版本v2.0
git checkout -b devel origin/develop # 从远程分支develop创建新本地分支devel并检出
git checkout — README # 检出head版本的README文件(可用于修改错误回退)
git merge origin/master # 合并远程master分支至当前分支
git cherry-pick ff44785404a8e # 合并提交ff44785404a8e的修改
git push origin master # 将当前分支push到远程master分支
git push origin :hotfixes/BJVEP933 # 删除远程仓库的hotfixes/BJVEP933分支
git push –tags # 把所有tag推送到远程仓库
git fetch # 获取所有远程分支(不更新本地分支,另需merge)
git fetch –prune # 获取所有原创分支并清除服务器上已删掉的分支
git pull origin master # 获取远程分支master并merge到当前分支
git mv README README2 # 重命名文件README为README2
git reset –hard HEAD # 将当前版本重置为HEAD(通常用于merge失败回退)
git rebase
git branch -d hotfixes/BJVEP933 # 删除分支hotfixes/BJVEP933(本分支修改已合并到其他分支)
git branch -D hotfixes/BJVEP933 # 强制删除分支hotfixes/BJVEP933
git ls-files # 列出git index包含的文件
git show-branch # 图示当前分支历史
git show-branch –all # 图示所有分支历史
git whatchanged # 显示提交历史对应的文件修改
git revert dfb02e6e4f2f7b573337763e5c0013802e392818 # 撤销提交dfb02e6e4f2f7b573337763e5c0013802e392818
git ls-tree HEAD # 内部命令:显示某个git对象
git rev-parse v2.0 # 内部命令:显示某个ref对于的SHA1 HASH
git reflog # 显示所有提交,包括孤立节点
git show HEAD@{5}
git show master@{yesterday} # 显示master分支昨天的状态
git log –pretty=format:’%h %s’ –graph # 图示提交日志
git show HEAD~3
git show -s –pretty=raw 2be7fcb476
git stash # 暂存当前修改,将所有至为HEAD状态
git stash list # 查看所有暂存
git stash show -p stash@{0} # 参考第一次暂存
git stash apply stash@{0} # 应用第一次暂存
git grep “delete from” # 文件中搜索文本“delete from”
git grep -e ‘#define’ –and -e SORT_DIRENT
git gc
git fsck
参考:
https://git-scm.com/docs/git-add
https://www.ruanyifeng.com/blog/2015/12/git-cheat-sheet.html
https://gist.github.com/guweigang/9848271#file-git_toturial

自己理解:

编写前

Git checkout  liangchen

#先拉去远程分支代码

Git pull origin liangxianchen:liangxianchen

Git add 我改过的代码文件  #只提交自己该过的代码

Git commit -m

Git push origin liangchen:liangxianchen(从liangchen本地分支推送到liangxianchen远程分支)

编写代码

如果想回推代码在没有提交前

Git status

git diff 被修改的文件名

就可以看到被修改过的原始内容

Git 出现!reject的情况

解决办法:首先将本地修改的文件备份,然后强制让远程文件覆盖本地仓库文件

Git fetch –all

Git reset –hard origin/liangxianchen

当删除本地文件后,如何与远端同步?

首先不能直接在本地按del键删除,这样对远端没有任何影响,必须使用git rm filename 命令删除,其次,删除完之后,git commit -m “delete filename” 一下,最后 git push就OK了

YOLO系列(一)

What is YOLOv5


YOLO an acronym for ‘You only look once’, is an object detection algorithm that divides images into a grid system. Each cell in the grid is responsible for detecting objects within itself.

YOLO is one of the most famous object detection algorithms due to its speed and accuracy.

The History of YOLO


YOLOv5

Shortly after the release of YOLOv4 Glenn Jocher introduced YOLOv5 using the Pytorch framework.
The open source code is available on GitHub

Author: Glenn Jocher
Released: 18 May 2020

YOLOv4

With the original authors work on YOLO coming to a standstill, YOLOv4 was released by Alexey Bochoknovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. The paper was titled YOLOv4: Optimal Speed and Accuracy of Object Detection

Author: Alexey BochoknovskiyChien-Yao Wang, and Hong-Yuan Mark Liao
Released: 23 April 2020

yolov4-tiny : https://arxiv.org/abs/2011.04244

code yolov4-tiny

https://github.com/bubbliiiing/yolov4-tiny-pytorch

yolov4 tiny结构

YOLOv3

YOLOv3 improved on the YOLOv2 paper and both Joseph Redmon and Ali Farhadi, the original authors, contributed.
Together they published YOLOv3: An Incremental Improvement

The original YOLO papers were are hosted here

Author: Joseph Redmon and Ali Farhadi
Released: 8 Apr 2018

YOLOv2

YOLOv2 was a joint endevor by Joseph Redmon the original author of YOLO and Ali Farhadi.
Together they published YOLO9000:Better, Faster, Stronger

Author: Joseph Redmon and Ali Farhadi
Released: 25 Dec 2016

YOLOv1

YOLOv1 was released as a research paper by Joseph Redmon.
The paper was titled You Only Look Once: Unified, Real-Time Object Detection

Author: Joseph Redmon
Released: 8 Jun 2015

mAP—–目标检测

在评价一个目标检测算法的“好坏”程度的时候,往往采用的是pascal voc 2012的评价标准mAP。

官方定义:

  1. Compute a version of the measured precision/recall curve with precision monotonically decreasing, by setting the precision for recall r to the maximum precision obtained for any recall r′ ≥ r.
  2. Compute the AP as the area under this curve by numerical integration. No approximation is involved since the curve is piecewise constant.

Note that prior to 2010 the AP is computed by sampling the monotonically

decreasing curve at a fixed set of uniformly-spaced recall values 0, 0.1, 0.2, . . . , 1. By contrast, VOC2010–2012 effectively samples the curve at all unique recall values.

mAP(均值平均精度)是目标检测中的最常用的评价指标,详细理解其计算方式有助于我们评估算法的有效性,并针对评测指标对算法进行调整。mean Average Precision, 即各类别AP的平均值

TP TN FP FN:

T或者N代表的是该样本是否被分类分对,P或者N代表的是该样本被分为什么

TP(True Positives)意思我们倒着来翻译就是“被分为正样本,并且分对了”,TN(True Negatives)意思是“被分为负样本,而且分对了”,FP(False Positives)意思是“被分为正样本,但是分错了”,FN(False Negatives)意思是“被分为负样本,但是分错了”。

按下图来解释,左半矩形是正样本,右半矩形是负样本。一个2分类器,在图上画了个圆,分类器认为圆内是正样本,圆外是负样本。那么左半圆分类器认为是正样本,同时它确实是正样本,那么就是“被分为正样本,并且分对了”即TP,左半矩形扣除左半圆的部分就是分类器认为它是负样本,但是它本身却是正样本,就是“被分为负样本,但是分错了”即FN。右半圆分类器认为它是正样本,但是本身却是负样本,那么就是“被分为正样本,但是分错了”即FP。右半矩形扣除右半圆的部分就是分类器认为它是负样本,同时它本身确实是负样本,那么就是“被分为负样本,而且分对了”即TN。

Precision(精度)和Recall(召回率)的概念:

Precision(精度 )翻译成中文就是“分类器认为是正类并且确实是正类的部分占所有分类器认为是正类的比例”,衡量的是一个分类器分出来的正类的确是正类的概率。两种极端情况就是,如果精度是100%,就代表所有分类器分出来的正类确实都是正类。如果精度是0%,就代表分类器分出来的正类没一个是正类。光是精度还不能衡量分类器的好坏程度,比如50个正样本和50个负样本,我的分类器把49个正样本和50个负样本都分为负样本,剩下一个正样本分为正样本,这样我的精度也是100%,但是傻子也知道这个分类器很垃圾。

Recall(召回率) ,翻译成中文就是“分类器认为是正类并且确实是正类的部分占所有确实是正类的比例”,衡量的是一个分类能把所有的正类都找出来的能力。两种极端情况,如果召回率是100%,就代表所有的正类都被分类器分为正类。如果召回率是0%,就代表没一个正类被分为正类。

IOU(交并比)

它是模型所预测的检测框(bbox)和真实的检测框(ground truth)的交集和并集之间的比例

IOU
def Iou(rec1,rec2):
  x1,x2,y1,y2 = rec1 #分别是第一个矩形左右上下的坐标
  x3,x4,y3,y4 = rec2 #分别是第二个矩形左右上下的坐标
  area_1 = (x2-x1)*(y1-y2)
  area_2 = (x4-x3)*(y3-y4)
  sum_area = area_1 + area_2
  w1 = x2 - x1#第一个矩形的宽
  w2 = x4 - x3#第二个矩形的宽
  h1 = y1 - y2
  h2 = y3 - y4
  W = min(x1,x2,x3,x4)+w1+w2-max(x1,x2,x3,x4)#交叉部分的宽
  H = min(y1,y2,y3,y4)+h1+h2-max(y1,y2,y3,y4)#交叉部分的高
  Area = W*H#交叉的面积
  Iou = Area/(sum_area-Area)
  return Iou

举例计算mAP

有3张图如下,要求算法找出face。蓝色框代表标签label,绿色框代表算法给出的结果pre,旁边的红色小字代表置信度。设定第一张图的检出框叫pre1,第一张的标签框叫label1。第二张、第三张同理。

首先我们计算每张图的pre和label的IOU,根据IOU是否大于0.5来判断该pre是属于TP(分成正样本,分对了)还是属于FP(被分为正样本,但是分错了)。显而易见,pre1是TP,pre2是FP,pre3是TP。

根据每个pre的置信度进行从高到低排序,这里pre1、pre2、pre3置信度刚好就是从高到低。

在不同置信度阈值下获得Precision和Recall:

首先,设置阈值为0.9,无视所有小于0.9的pre。那么检测器检出的所有框pre即TP+FP=1,并且pre1是TP,那么Precision=1/1。因为所有的label=3,所以Recall=1/3。这样就得到一组P、R值。

然后,设置阈值为0.8,无视所有小于0.8的pre。那么检测器检出的所有框pre即TP+FP=2,因为pre1是TP,pre2是FP,那么Precision=1/2=0.5。因为所有的label=3,所以Recall=1/3=0.33。这样就又得到一组P、R值。

再然后,设置阈值为0.7,无视所有小于0.7的pre。那么检测器检出的所有框pre即TP+FP=3,因为pre1是TP,pre2是FP,pre3是TP,那么Precision=2/3=0.67。因为所有的label=3,所以Recall=2/3=0.67。这样就又得到一组P、R值。

根据上面3组PR值绘制PR曲线如下。然后每个“峰值点”往左画一条线段直到与上一个峰值点的垂直线相交。这样画出来的红色线段与坐标轴围起来的面积就是AP值。

计算mAP


AP衡量的是对一个类检测好坏,mAP就是对多个类的检测好坏。就是简单粗暴的把所有类的AP值取平均就好了。比如有两类,类A的AP值是0.5,类B的AP值是0.2,那么mAP=(0.5+0.2)/2=0.35

计算mAP的github地址:

https://github.com/Cartucho/mAP

# AP的计算
def _average_precision(self, rec, prec):
    """
    Params:
    ----------
    rec : numpy.array
            cumulated recall
    prec : numpy.array
            cumulated precision
    Returns:
    ----------
    ap as float
    """
    if rec is None or prec is None:
        return np.nan
    ap = 0.
    for t in np.arange(0., 1.1, 0.1):  #十一个点的召回率,对应精度最大值
        if np.sum(rec >= t) == 0:
            p = 0
        else:
            p = np.max(np.nan_to_num(prec)[rec >= t])
        ap += p / 11.  #加权平均
    return ap

正确理解 YOLO 的辨识准确率

2021: A Year Full of Amazing AI papers – A Review

A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.

comefrom:

https://www.louisbouchard.ai/2021-ai-papers-review/

Table of content

  • DALL·E: Zero-Shot Text-to-Image Generation from OpenAI [1]
  • VOGUE: Try-On by StyleGAN Interpolation Optimization [2]
  • Taming Transformers for High-Resolution Image Synthesis [3]
  • Thinking Fast And Slow in AI [4]
  • Automatic detection and quantification of floating marine macro-litter in aerial images [5]
  • ShaRF: Shape-conditioned Radiance Fields from a Single View [6]
  • Generative Adversarial Transformers [7]
  • We Asked Artificial Intelligence to Create Dating Profiles. Would You Swipe Right? [8]
  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [9]
  • IMAGE GANS MEET DIFFERENTIABLE RENDERING FOR INVERSE GRAPHICS AND INTERPRETABLE 3D NEURAL RENDERING [10]
  • Deep nets: What have they ever done for vision? [11]
  • Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [12]
  • Portable, Self-Contained Neuroprosthetic Hand with Deep Learning-Based Finger Control [13]
  • Total Relighting: Learning to Relight Portraits for Background Replacement [14]
  • LASR: Learning Articulated Shape Reconstruction from a Monocular Video [15]
  • Enhancing Photorealism Enhancement [16]
  • DefakeHop: A Light-Weight High-Performance Deepfake Detector [17]
  • High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network [18]
  • Barbershop: GAN-based Image Compositing using Segmentation Masks [19]
  • TextStyleBrush: Transfer of text aesthetics from a single example [20]
  • Animating Pictures with Eulerian Motion Fields [21]
  • CVPR 2021 Best Paper Award: GIRAFFE – Controllable Image Generation [22]
  • GitHub Copilot & Codex: Evaluating Large Language Models Trained on Code [23]
  • Apple: Recognizing People in Photos Through Private On-Device Machine Learning [24]
  • Image Synthesis and Editing with Stochastic Differential Equations [25]
  • Sketch Your Own GAN [26]
  • Tesla’s Autopilot Explained [27]
  • Styleclip: Text-driven manipulation of StyleGAN imagery [28]
  • TimeLens: Event-based Video Frame Interpolation [29]
  • Diverse Generation from a Single Video Made Possible [30]
  • Skillful Precipitation Nowcasting using Deep Generative Models of Radar [31]
  • The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks [32]
  • ADOP: Approximate Differentiable One-Pixel Point Rendering [33]
  • (Style)CLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis [34]
  • SwinIR: Image restoration using swin transformer [35]
  • EditGAN: High-Precision Semantic Image Editing [36]
  • CityNeRF: Building NeRF at City Scale [37]
  • ClipCap: CLIP Prefix for Image Captioning [38]
  • Paper references

A Parameterisable FPGA-tailored Architecture for YOLOv3-tiny

————– YOLOv3-tiny 的ZYNQ实现

摘要:

实验表明,当针对低端 FPGA 设备时,与设备的硬核处理器相比,所提出的架构实现了 290 倍的延迟改进,同时与原始模型相比,mAP 下降了2.5pp(30.9% 对 33.4%)。 所呈现的工作为低端 FPGA 设备上的低延迟对象检测开辟了道路。

introduction:

目标检测技术用于检测图像和视频中的对象实例。这项技术的应用可以在高级智能系统的部署中找到,如先进的驾驶辅助系统(adas)和视频监控。精确的对象分类和对象位置的标识通常是必需的,因为这些信息构成了应用程序管道的其余部分进一步处理和决策的基础。最近,利用机器学习的最新进展,特别是深层神经网络的发展,研究人员和实践者开发了强大的目标检测系统,可以在许多具有挑战性的情况下提供准确的检测。此外,在需要较低处理延迟的情况下,该领域的工作已经从扫描多个位置的图像转向应用图像分类器(即将目标检测问题转化为多窗口的分类问题) ,转而将上述不同步骤组合在一个单一的管道中,通常基于深层神经网络。

这篇论文的创新贡献是针对YOLOv3tiny工作负载定制的延迟优化参数化架构可以根据硬件资源,自定义参数),该架构可根据任何目标FPGA设备的资源可用性进行调整。为了实现上述功能,我们使用Vivado HLS开发并实现了一个可参数化的体系结构。导出了性能和资源模型,用于指导设计空间探索(DSE)阶段,以确定优化系统延迟的设计点,同时满足资源约束。
针对的是低功耗和资源有限的FPGA设备,需要使用片外存储器来存储网络的参数和中间结果,从而能够在资源极其有限的情况下部署YOLOv3 tiny。

网络结构:

YOLOv3 tiny接受416×416分辨率的RGB图像作为输入。与YOLOv3相反,YOLOv3 tiny只在两种不同的尺度上预测边界框。第一个比例将输入图像划分为13×13个网格,而第二个比例在26×26个网格上操作。该框架在每个网格中生成三个边界框。网络输出一个3d张量,其中包含关于边界框、对象可信度和类预测的信息。该网络主要使用五种类型的层;卷积、最大池、路由、上采样和Yolo层。路由层负责在网络中创建不同的流,其中上采样用于支持多个检测比例。Yolo层负责生成输出向量。

yolov3网络模型

这项工作的目标是设计一种延迟优化和FPGA定制架构,该架构可以根据可用的FPGA资源进行定制,以加速YOLOv3微小模型的推理阶段。所提出的架构在HLS中是定制的,编译时可参数化,以有限资源的低端FPGA设备为目标,因此系统不会对用于存储数据的足够片上内存施加硬约束。

模块设计:

FPGA硬件加速器由五个主要计算模块组成;卷积、累加、最大池、上采样和yolo块

FPGA硬件加速器表示提议的FPGA架构,由三级管道组成,其中每一级对应一层YOLOv3微型网络。加速器由负责系统总体控制的ARM处理器控制。数据和权重通过DMA接口在加速器和片外存储器之间传输。FPGA加速器由三级流水线组成。管道的第一级支持卷积层的执行,其输出在管道的第二级中累积。根据在给定时间执行的网络结构,累加结果将发送到Max pooling、Upsample或YLO层进行进一步处理

在推理阶段,ARM处理器充当主处理器并控制推理过程。网络的计算被分解为更小的组件,即层批次,由处理器调度并按顺序执行。图4捕获了单层批次的处理流程。更具体地说,ARM处理器首先为硬件加速器中的每个单独块设置参数,并配置DMA模块。然后,它通过DMA流启动权重加载和输入数据传输到硬件加速器。FPGA加速块开始处理数据,并将输出数据传输回片外DDR存储器。相应缓存区域的必要失效由处理器执行,以确保正确的数据传输。

实验验证:

使用具有Xilinx XC7Z020 SoC和512 MB DDR3的Zedboard开发工具包对提议的框架进行评估。可编程逻辑与处理系统的时钟频率为100MHz和666。分别为7MHz。设计空间探索阶段以整个设备的利用率为目标,确定满足可用资源所施加约束的多个设计点,并预测每个点的延迟数字。穿越的空间如图5所示。性能最好的设计实现了每次推断(在电路板上测量)532ms的延迟,需要185个BRAM、160个DSP、25个CPU。9k LUT和46。7k自由流速度。测量的功耗为3。36W

论文:

Yu Z, Bouganis C S, Rincón F, et al. A Parameterisable FPGA-Tailored Architecture for YOLOv3-Tiny[C]//ARC. 2020: 330-344.

Improved Regularization of Convolutional Neural Networks with Cutou

–cutout 正则化

cutout,是一种数据增强的方法,主要应用于分类任务中。
  cutout的实现方法为,在图像中随机选取一个点作为中心点,覆盖一个固定大小的方形zero-mask。mask的大小是一个超参数,在文中是通过网格搜索得到的长度。mask区域可以在图像外。

cutout方法提出的出发点是作为一个正则化方法,防止CNN过拟合。cutcout方法很简单,就是在训练的时候,在随机位置应用一个方形矩阵。
  作者认为这种技术鼓励网络去利用整个图片的信息,而不是依赖于小部分特定的视觉特征。

  相比于dropout,cutout更像是数据增强的一种手段,而不是添加噪声。

  在刚开始应用maks的时候,作者也尝试应用mask于关键部位(那些激活值最大的区域),并得到了不错的结果(如下图所示)。但后来发现随机去除固定大小区域和直接在目标区域的效果是相当的,所以之后都采用移除固定大小区域的策略。

 同时,作者发现zero-mask区域大小的选择比形状的选择更重要。大小的选择在文中是通过网格搜索完成的,但都是应用于较小的数据集(CIFAR10/CIFAR100/SVHN)上。在选择应用区域的时候,发现zero-mask随机应用效果比较好,即部分mask在图像外。作者解释为,部分mask在图像外是实现良好性能的关键。