In recent years, tremendous progress has been made in the field of 3D Machine Learning, an interdisciplinary field that fuses computer vision, computer graphics, and machine learning. This repo is derived from my study notes and will be used as a place for triaging new research papers.
I’ll use the following icons to differentiate 3D representations:
📷 Multi-view Images
👾 Volumetric
🎲 Point Cloud
💎 Polygonal Mesh
💊 Primitive-based
To find related papers and their relationships, check out Connected Papers, which provides a neat way to visualize the academic field in a graph representation.
Get Involved
To contribute to this Repo, you may add content through pull requests or open an issue to let me know.
⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ We have also created a Slack workspace for people around the globe to ask questions, share knowledge, and facilitate collaborations. Together, I’m sure we can advance this field as a collaborative effort. Join the community with this link. ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐
[ECCV 2020] DTVNet: Dynamic Time-lapse Video Generation via Single Still Image [paper][code]
[SIGGRAPH Asia 2019] Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis [paper][code][project page]
[CVPR 2018] Learning to Generate Time-lapse Videos Using Multi-stage Dynamic Generative Adversarial Networks [paper][code][project page]
Some Other Papers
Some other interesting papers on novel view synthesis or cinemagraphs.
[arXiv 2022] Make-A-Video: Text-to-Video Generation without Text-Video Data [paper][project page]
[ECCV 2022] SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [paper][code][project page] 🚕
[CVPR 2022] Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image [paper][code][project page]
Torchaudio is a library for audio and signal processing with PyTorch. It provides I/O, signal and data processing functions, datasets, model implementations and application components.
Recent breakthroughs in generative modeling of images have been predicated on the availability of high-quality and large-scale datasets such as MNIST, CIFAR and ImageNet. We recognized the need for an audio dataset that was as approachable as those in the image domain.
Audio signals found in the wild contain multi-scale dependencies that prove particularly difficult to model, leading many previous efforts at data-driven audio synthesis to focus on more constrained domains such as texture synthesis or training small parametric models.
We encourage the broader community to use NSynth as a benchmark and entry point into audio machine learning. We also view NSynth as a building block for future datasets and envision a high-quality multi-note dataset for tasks like generation and transcription that involve learning complex language-like dependencies.
Description
NSynth is an audio dataset containing 305,979 musical notes, each with a unique pitch, timbre, and envelope. For 1,006 instruments from commercial sample libraries, we generated four-second, monophonic 16kHz audio snippets, referred to as notes, by ranging over every pitch of a standard MIDI piano (21-108) as well as five different velocities (25, 50, 75, 100, 127). The note was held for the first three seconds and allowed to decay for the final second.
Some instruments are not capable of producing all 88 pitches in this range, resulting in an average of 65.4 pitches per instrument. Furthermore, the commercial sample packs occasionally contain duplicate sounds across multiple velocities, leaving an average of 4.75 unique velocities per pitch.
We also annotated each of the notes with three additional pieces of information based on a combination of human evaluation and heuristic algorithms:
Source: The method of sound production for the note’s instrument. This can be one of acoustic or electronic for instruments that were recorded from acoustic or electronic instruments, respectively, or synthetic for synthesized instruments. See their frequencies below.
Family: The high-level family of which the note’s instrument is a member. Each instrument is a member of exactly one family. See the complete list and their frequencies below.
Qualities: Sonic qualities of the note. See the quality descriptions and their co-occurrences below. Each note is annotated with zero or more qualities.
Format
Files
The NSynth dataset can be downloaded in two formats:
Train [tfrecord | json/wav]: A training set with 289,205 examples. Instruments do not overlap with valid or test.
Valid [tfrecord | json/wav]: A validation set with 12,678 examples. Instruments do not overlap with train.
Test [tfrecord | json/wav]: A test set with 4,096 examples. Instruments do not overlap with train.
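For the json/wav download, here is a minimal loader sketch. It is a hedged assumption about layout: it presumes each split unpacks to an `examples.json` metadata file alongside an `audio/` directory with one `<note_str>.wav` per note.

```python
import json
import os

def index_examples(json_path):
    """Pair each note's metadata with the path of its wav file.

    Assumes (hedged) that the json/wav archive unpacks to an
    `examples.json` file next to an `audio/` directory containing
    one `<note_str>.wav` per note.
    """
    with open(json_path) as f:
        examples = json.load(f)
    audio_dir = os.path.join(os.path.dirname(json_path), "audio")
    return {
        note_str: (meta, os.path.join(audio_dir, note_str + ".wav"))
        for note_str, meta in examples.items()
    }
```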
Below we detail how the note features are encoded in the Example protocol buffers and JSON files.
Example Features
Each Example contains the following features.
| Feature | Type | Description |
| --- | --- | --- |
| note | int64 | A unique integer identifier for the note. |
| note_str | bytes | A unique string identifier for the note in the format `<instrument_str>-<pitch>-<velocity>`. |
| instrument | int64 | A unique, sequential identifier for the instrument the note was synthesized from. |
| instrument_str | bytes | A unique string identifier for the instrument this note was synthesized from, in the format `<instrument_family_str>-<instrument_production_str>-<instrument_name>`. |
| pitch | int64 | The 0-based MIDI pitch in the range [0, 127]. |
| velocity | int64 | The 0-based MIDI velocity in the range [0, 127]. |
| sample_rate | int64 | The samples per second for the audio feature. |
| audio* | [float] | A list of audio samples represented as floating-point values in the range [-1, 1]. |
| qualities | [int64] | A binary vector indicating which sonic qualities are present in this note. |
| qualities_str | [bytes] | A list of IDs of the qualities present in this note, drawn from the sonic qualities list. |
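As a quick sanity check on the `note_str` field, a minimal parser sketch (the identifier in the example is made up, but follows the documented `<instrument_str>-<pitch>-<velocity>` pattern; it assumes pitch and velocity are always the last two hyphen-separated fields, so it splits from the right):

```python
def parse_note_str(note_str):
    # Pitch and velocity are the last two hyphen-separated fields,
    # so split from the right to keep the instrument identifier intact.
    instrument_str, pitch, velocity = note_str.rsplit("-", 2)
    return instrument_str, int(pitch), int(velocity)

# Hypothetical identifier following the documented pattern:
name, pitch, velocity = parse_note_str("guitar_acoustic_010-060-075")
print(name, pitch, velocity)  # guitar_acoustic_010 60 75
```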
import multiprocessing as mp

def function1(id):  # the child process
    print(f'id {id}')

def run__process():  # the main process
    process = [mp.Process(target=function1, args=(1,)),
               mp.Process(target=function1, args=(2,)), ]
    [p.start() for p in process]  # start both processes
    [p.join() for p in process]   # wait for both processes to finish in turn

# Calling run__process() here, outside the if-guard, is discouraged. This
# example is simple enough that you might get away with it, but don't rely on that.
if __name__ == '__main__':
    run__process()  # correct: only call the entry point inside the if-guard
Error when using PyTorch with CUDA and multiprocessing: UserWarning: semaphore_tracker
(written on 2021-03-03)
The error looks like this:
multiprocessing/semaphore_tracker.py:144:
UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
Issue with multiprocessing semaphore tracking
import time

def func2(args):  # multiple parameters (arguments)
    # x, y = args
    x = args[0]  # written this way, it is easier to locate errors
    y = args[1]  # written this way, it is easier to locate errors
    time.sleep(1)  # pretend it is a time-consuming operation
    return x - y

def run__pool():  # main process
    from multiprocessing import Pool

    cpu_worker_num = 3
    process_args = [(1, 1), (9, 9), (4, 4), (3, 3), ]

    print(f'| inputs:  {process_args}')
    start_time = time.time()
    with Pool(cpu_worker_num) as p:
        outputs = p.map(func2, process_args)
    print(f'| outputs: {outputs}    TimeUsed: {time.time() - start_time:.1f}\n')

    '''Another way (I don't recommend):
    using functools.partial. See https://stackoverflow.com/a/25553970/9293137
    from functools import partial
    # pool.map(partial(f, a, b), iterable)
    '''

if __name__ == '__main__':
    run__pool()
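The `functools.partial` alternative mentioned in the comment above looks like this in full. This is a sketch of the approach from the linked Stack Overflow answer; the function `f` and its arguments are made up for illustration:

```python
from functools import partial
from multiprocessing import Pool

def f(constant, x):  # the first argument will be frozen by partial
    return constant + x

if __name__ == '__main__':
    with Pool(2) as pool:
        # partial(f, 10) fixes constant=10, so map only has to supply x
        print(pool.map(partial(f, 10), [1, 2, 3]))  # [11, 12, 13]
```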
So yes, pipes are faster than queues – but only by 1.5 to 2 times. What did surprise me was that Python 3 is MUCH slower than Python 2 – most other tests I have done have been a bit up and down (as long as it is Python 3.4 – Python 3.2 seems to be a bit of a dog, especially for memory usage).
You can `import queue` to use Python's built-in single-process queue; multiprocessing provides its own with `from multiprocessing import Queue`. Everything below refers to the multiprocessing queue.

A Queue works much like the Pipe covered earlier: both the main process and child processes can access it, and every object put into it is deep-copied. The difference is that a Pipe has only two endpoints, while a Queue has the usual queue semantics and is more flexible. For details, see Stack Overflow: Multiprocessing – Pipe vs Queue.
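For contrast with the Queue example below, a minimal two-endpoint Pipe sketch (the child function and payload are made up for illustration):

```python
from multiprocessing import Pipe, Process

def child(conn):
    # Receive an object from the other end of the pipe and echo back double.
    obj = conn.recv()
    conn.send(obj * 2)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # exactly two endpoints, duplex by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send(21)
    print(parent_conn.recv())  # 42
    p.join()
```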
import time

def func1(queue):  # the child process: consume one item from the queue
    time.sleep(1)
    print(f'args {queue.get()}')

def run__queue():
    from multiprocessing import Process, Queue

    queue = Queue(maxsize=4)  # the following methods can be called from anywhere
    queue.put(True)
    queue.put([0, None, object])  # objects put into the queue are deep-copied (pickled)
    queue.qsize()  # the current length of the queue
    print(queue.get())  # First In, First Out
    print(queue.get())  # First In, First Out
    queue.qsize()  # the current length of the queue

    queue.put(1)  # leave one item for each child process to consume
    queue.put(2)
    process = [Process(target=func1, args=(queue,)),
               Process(target=func1, args=(queue,)), ]
    [p.start() for p in process]
    [p.join() for p in process]

if __name__ == '__main__':
    run__queue()
For those interested in using Python 3.8’s shared_memory module: it still has an unfixed bug that affects Python 3.8/3.9/3.10 as of now (2021-01-15). The bug is that the resource tracker destroys shared memory segments while other processes should still have valid access to them. So take care if you use it in your code.
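A minimal single-process sketch of the `shared_memory` API in question (Python 3.8+). The values are made up; note that on the affected versions, the resource tracker may warn about or prematurely unlink segments attached from unrelated processes:

```python
from multiprocessing import shared_memory

# Create a 4-byte shared block and write into its buffer.
shm = shared_memory.SharedMemory(create=True, size=4)
shm.buf[:4] = b"1234"

# Attach a second handle by name, as another process would.
other = shared_memory.SharedMemory(name=shm.name)
print(bytes(other.buf[:4]))  # b'1234'

other.close()  # every handle must be closed
shm.close()
shm.unlink()   # exactly one owner should unlink the segment
```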
Taking the earlier HTTP-server example again: “http” is a built-in Python package that has no “__main__.py” file, so running it with “-m” raises an error: No module named http.__main__; ‘http’ is a package and cannot be directly executed.
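This behaviour is easy to reproduce programmatically; the sketch below just shells out to the current interpreter and checks for the error message quoted above:

```python
import subprocess
import sys

# `http` is a package without a __main__.py, so `python -m http` must fail.
result = subprocess.run([sys.executable, "-m", "http"],
                        capture_output=True, text=True)
print(result.returncode != 0)                          # True
print("cannot be directly executed" in result.stderr)  # True
```

By contrast, `python -m http.server` works, because `http.server` is a module that can be executed directly.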