torchaudio – chenpaopao

Official site: https://pytorch.org/audio/stable/torchaudio.html

Torchaudio is a library for audio and signal processing with PyTorch. It provides I/O, signal and data processing functions, datasets, model implementations and application components.

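Loading audio:

Use torchaudio.load to load audio data. torchaudio.load accepts both path-like objects and file-like objects. It returns a tuple of the waveform (a tensor) and the sample rate (an int). By default the resulting tensor has dtype torch.float32, with values normalized to the range [-1.0, 1.0].
waveform, sr = torchaudio.load(filepath, frame_offset=0, num_frames=-1, normalize=True, channels_first=True)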
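Parameters:

filepath (str): path to the source audio file;
frame_offset (int): number of frames to skip before reading starts; default 0;
num_frames (int): maximum number of frames to read. The default, -1, reads from frame_offset to the end of the file. If the file does not contain that many frames, the function may return only the frames that actually remain;
normalize (bool): when True, the function always returns float32 with all values normalized to [-1, 1]. When False and the input is an integer-typed WAV file, the output is an integer tensor. Note that this parameter only has an effect on integer-typed WAV files; default True;
channels_first (bool): when True, the returned tensor has shape [channel, time]; otherwise it has shape [time, channel]; default True.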
Returns:

waveform (torch.Tensor): an integer tensor if the input is an integer-typed WAV file and normalize=False, otherwise float32; if channels_first=True, waveform.shape is [channel, time].
sr (int): the sample rate.
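Because frame_offset and num_frames are expressed in frames rather than seconds, a small conversion step is usually needed to load a time window. The following is a minimal sketch, not part of torchaudio: segment_to_frames and load_segment are hypothetical helper names, and the sketch assumes the file's sample rate is already known to the caller.

```python
def segment_to_frames(start_s: float, duration_s: float, sample_rate: int) -> tuple[int, int]:
    """Convert a (start, duration) window in seconds to (frame_offset, num_frames)."""
    frame_offset = int(round(start_s * sample_rate))
    num_frames = int(round(duration_s * sample_rate))
    return frame_offset, num_frames


def load_segment(filepath: str, start_s: float, duration_s: float, sample_rate: int):
    """Load only part of a file; `sample_rate` is assumed to be known in advance."""
    # Imported lazily so that segment_to_frames can be used without torchaudio installed.
    import torchaudio

    frame_offset, num_frames = segment_to_frames(start_s, duration_s, sample_rate)
    # Returns (waveform, sr); waveform is [channel, time] because channels_first defaults to True.
    return torchaudio.load(filepath, frame_offset=frame_offset, num_frames=num_frames)
```

For example, load_segment("speech.wav", 1.0, 0.5, 16000) would request 8000 frames starting at frame 16000; the file name here is only a placeholder.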
Resampling
waveform = torchaudio.transforms.Resample(orig_freq=16000, new_freq=16000)(waveform)
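Parameters:

orig_freq (int, optional): original sample rate, default: 16000;
new_freq (int, optional): target sample rate, default: 16000;
resampling_method (str, optional): resampling method, default: 'sinc_interpolation' (recent torchaudio releases name this method 'sinc_interp_hann');
waveform (torch.Tensor): the input audio, passed when the transform is called. Its shape is [..., time], for example [channel, time]. Resampling is applied along the last dimension, so a [time, channel] tensor should be transposed first;
Returns:

waveform (torch.Tensor): the output has the same layout as the input, but the size of the time dimension changes because of the resampling;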
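The change in the time dimension follows from the ratio of the two sample rates. The sketch below illustrates this under some assumptions: expected_num_frames and resample_to are hypothetical helper names, and the ceiling-based length is an approximation of what Resample produces, not a documented guarantee.

```python
def expected_num_frames(num_frames: int, orig_freq: int, new_freq: int) -> int:
    """Approximate length of the time dimension after resampling (ceiling of the ratio)."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-num_frames * new_freq // orig_freq)


def resample_to(waveform, orig_freq: int, new_freq: int):
    """Resample a [..., time] tensor; returns the input unchanged if the rates already match."""
    # Imported lazily so that expected_num_frames can be used without torchaudio installed.
    import torchaudio

    if orig_freq == new_freq:
        return waveform
    resampler = torchaudio.transforms.Resample(orig_freq=orig_freq, new_freq=new_freq)
    return resampler(waveform)
```

When many files share the same pair of sample rates, building the Resample transform once and reusing it avoids recomputing its filter kernel on every call.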
Saving audio
torchaudio.save(filepath, src, sample_rate, channels_first)
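Parameters:

filepath (str or pathlib.Path): path to save to;
src (torch.Tensor): the audio data, which must be two-dimensional (note: the tensor needs to be moved to the CPU first);
sample_rate (int): the sample rate;
channels_first (bool): if True, src must have shape [channel, time]; otherwise [time, channel].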
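The two requirements above (a 2-D tensor, located on the CPU) can be checked before calling torchaudio.save. This is a minimal sketch under stated assumptions: channels_and_frames and save_waveform are hypothetical helper names, not torchaudio functions.

```python
def channels_and_frames(shape: tuple, channels_first: bool = True) -> tuple[int, int]:
    """Interpret a 2-D shape as (num_channels, num_frames); raise if it is not 2-D."""
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D tensor, got shape {shape}")
    return (shape[0], shape[1]) if channels_first else (shape[1], shape[0])


def save_waveform(filepath, src, sample_rate: int, channels_first: bool = True) -> None:
    """Validate the layout, move the tensor to the CPU, then save it."""
    # Imported lazily so that channels_and_frames can be used without torchaudio installed.
    import torchaudio

    channels_and_frames(tuple(src.shape), channels_first)  # raises ValueError on non-2-D input
    # detach() drops any autograd history; cpu() is a no-op if the tensor is already on the CPU.
    torchaudio.save(filepath, src.detach().cpu(), sample_rate, channels_first=channels_first)
```

A mono signal stored as a 1-D tensor would fail the check; adding a channel dimension with src.unsqueeze(0) gives the [channel, time] layout expected when channels_first=True.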
