Dataset: https://github.com/LAION-AI/audio-dataset/
https://github.com/LAION-AI/audio-dataset/blob/main/data_collection/README.md
Dataset List
This dataset list includes all the raw datasets we have found up to now. You may also find their Data Type* as well as their status*.
Most datasets are public and can be downloaded through the URL in the list. You may find download scripts for some of them in audio-dataset/utils/. Datasets without a link in the list were purchased by LAION, so we cannot make them public due to license issues. Please contact us if you want to process them.
To use the exact processed datasets for training your models, please contact LAION.
*Data Type Terminology Explanation
- Caption: A natural language sentence describing the content of the audio.
  - Example: A wooden door creaks open and closed multiple times
- Class label: Labels that are often manually annotated for classification in curated datasets. Each audio clip can be assigned one or several class labels.
  - Example: Cat, Dog, Water
- Tag: Tags of the audio that are commonly associated with the data on websites. An audio clip may be associated with several tags.
  - Example: phone recording, city, sound effect
- Relative text: Any text about the audio. May be comments on the audio, or other metadata. Can be very long.
  - Example: An impact sound that I would hear over an action scene, with some cinematic drums for more tension and a high pitched preexplosion sound followed by the impact of the explosion. Please rate only if you like it, haha. Thanks!
- Transcription: Transcription of human speech. Only used for Speech Datasets.
- Translation: Transcription in a language other than the one the speaker uses.
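
To make the data types above concrete, here is a minimal sketch of how one clip's text annotations could be held in a single record. The class name `AudioTextRecord` and all field names are hypothetical and chosen only for illustration; this is not the schema used by the LAION processing scripts.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AudioTextRecord:
    """Hypothetical per-clip metadata; field names are illustrative only."""

    captions: List[str] = field(default_factory=list)
    class_labels: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    relative_text: str = ""
    transcription: str = ""
    translation: str = ""

    def text_types(self) -> List[str]:
        """Return the names of the data types that are actually filled in."""
        return [name for name, value in asdict(self).items() if value]


# A clip that carries one caption and several tags, reusing the examples above.
record = AudioTextRecord(
    captions=["A wooden door creaks open and closed multiple times"],
    tags=["phone recording", "city", "sound effect"],
)
print(record.text_types())
```

A record like this makes it easy to see at a glance which kinds of text a dataset provides, which is what the Data Type column in the tables below summarizes.
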
*Status Terminology Explanation
- processed: Dataset already converted to webdataset format.
- processing: Dataset already downloaded and processing is under way.
- metadata downloaded: We have already scraped the dataset website, whereas the dataset itself is not yet downloaded.
- assigned: Someone has begun work on the dataset.
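
The "processed" status refers to the webdataset format, in which samples are stored in tar shards and files sharing a basename form one sample. The sketch below shows that grouping with the Python standard library only. The `.flac` plus `.json` pairing, the `text` metadata key, and the helper names `write_shard` and `list_shard` are assumptions made for illustration, not a description of the project's own scripts.

```python
import io
import json
import os
import tarfile
import tempfile


def write_shard(shard_path, samples):
    """Write (key, audio_bytes, metadata) triples into one tar shard.

    Files that share a basename (the key) belong to the same sample.
    """
    with tarfile.open(shard_path, "w") as tar:
        for key, audio_bytes, metadata in samples:
            members = {
                f"{key}.flac": audio_bytes,  # assumed audio extension
                f"{key}.json": json.dumps(metadata).encode("utf-8"),
            }
            for name, payload in members.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def list_shard(shard_path):
    """Return the member names of a shard in storage order."""
    with tarfile.open(shard_path, "r") as tar:
        return tar.getnames()


# Demo with placeholder bytes instead of real audio.
with tempfile.TemporaryDirectory() as tmp:
    shard = os.path.join(tmp, "000000.tar")
    sample = (
        "clip_0001",
        b"placeholder bytes, not real FLAC",
        {"text": ["A wooden door creaks open and closed multiple times"]},
    )
    write_shard(shard, [sample])
    names = list_shard(shard)
print(names)
```

Keeping the audio and its metadata side by side in the same shard means a training loader can stream samples sequentially without a separate index.
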
General Sound Dataset
Name | Description | URL | Data Type | Total Duration | Total Audio Number | Status |
---|---|---|---|---|---|---|
AudioSet | The AudioSet dataset is a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. To collect all our data we worked with human annotators who verified the presence of sounds they heard within YouTube segments. To nominate segments for annotation, we relied on YouTube metadata and content-based search. The sound events in the dataset consist of a subset of the AudioSet ontology. You can learn more about the dataset construction in our ICASSP 2017 paper. Explore the dataset annotations by sound class below. There are 2,084,320 YouTube videos containing 527 labels. | Click here | class labels, video, audio | 5420hrs | 1951460 | processed |
AudioSet Strong | Audio events from AudioSet clips with single class label annotation | Click here | 1 class label, video, audio | 625.93hrs | 1074359 | processed (@marianna13#7139) |
BBC sound effects | 33066 sound effects with text description. Type: mostly environmental sound. Each audio has a natural text description. (need to check the license) | Click here | 1 caption, audio | 463.48hrs | 15973 | processed |
AudioCaps | 40 000 audio clips of 10 seconds, organized in three splits: a training split, a validation split, and a testing split. Type: environmental sound. | Click here | 1 caption, audio | 144.94hrs | 52904 | processed |
Audio Caption Hospital & Car Dataset | 3700 audio clips from the “Hospital” scene and around 3600 audio clips from the “Car” scene. Every audio clip is 10 seconds long and is annotated with five captions. Type: environmental sound. | Click here | 5 captions, audio | 10.64 + 20.91hrs | 3709 + 7336 | we don’t need that |
Clotho dataset | Clotho consists of 6974 audio samples, and each audio sample has five captions (a total of 34 870 captions). Audio samples are of 15 to 30 s duration and captions are eight to 20 words long. Type: environmental sound. | Click here | 5 captions, audio | 37.0hrs | 5929 | processed |
Audiostock | Royalty Free Music Library. 436864 audio effects (of which 10k available), each with a text description. | Click here | 1 caption & tags, audio | 46.30hrs | 10000 | 10k sound effects processed(@marianna13#7139) |
ESC-50 | 2000 environmental audio recordings with 50 classes | Click here | 1 class label, audio | 2.78hrs | 2000 | processed(@marianna13#7139) |
VGG-Sound | VGG-Sound is an audio-visual correspondent dataset consisting of short clips of audio sounds, extracted from videos uploaded to YouTube | Click here | 1 class label, video, audio | 560hrs | 200,000 + | processed(@marianna13#7139) |
FUSS | The Free Universal Sound Separation (FUSS) dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation. FUSS is based on the FSD50K corpus. | Click here | no class label, audio | 61.11hrs | 22000 | |
UrbanSound8K | 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes | Click here | 1 class label, audio | 8.75hrs | 8732 | processed(@Yuchen Hui#8574) |
FSD50K | 51,197 audio clips of 200 classes | Click here | class labels, audio | 108.3hrs | 51197 | processed(@Yuchen Hui#8574) |
YFCC100M | YFCC100M is a dataset that contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license, including 8081 hours of audio. | Click here | title, tags, audio, video, Flickr identifier, owner name, camera, geo, media source | 8081hrs | | requested access (@marianna13#7139) |
ACAV100M | 100M video clips with audio, each 10 sec, with automatic AudioSet, Kinetics400 and Imagenet labels. -> Noisy, but LARGE. | Click here | class labels/tags, audio | 31 years | 100 million | |
Free To Use Sounds | 10000+ for 23$ 🙂 | Click here | 1 caption & tags, audio | 175.73hrs | 6370 | |
MACS – Multi-Annotator Captioned Soundscapes | This is a dataset containing audio captions and corresponding audio tags for a number of 3930 audio files of the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool. Each file is annotated by multiple annotators that provided tags and a one-sentence description of the audio content. The data also includes annotator competence estimated using MACE (Multi-Annotator Competence Estimation). | Click here | multiple captions & tags, audio | 10.92hrs | 3930 | processed(@marianna13#7139 & @krishna#1648 & Yuchen Hui#8574) |
Sonniss Game effects | Sound effects | no link | tags & filenames, audio | 84.6hrs | 5049 | processed |
WeSoundEffects | Sound effects | no link | tags & filenames, audio | 12.00hrs | 488 | processed |
Paramount Motion – Odeon Cinematic Sound Effects | Sound effects | no link | 1 tag, audio | 19.49hrs | 4420 | processed |
Free Sound | Audio with text description (noisy) | Click here | pertinent text, audio | 3003.38hrs | 515581 | processed(@Chr0my#0173 & @Yuchen Hui#8574) |
Sound Ideas | Sound effects library | Click here | 1 caption, audio | | | |
Boom Library | Sound effects library | Click here | 1 caption, audio | | | assigned(@marianna13#7139) |
Epidemic Sound (Sound effect part) | Royalty free music and sound effects | Click here | Class labels, audio | 220.41hrs | 75645 | metadata downloaded(@Chr0my#0173), processed (@Yuchen Hui#8547) |
Audio Grounding dataset | The dataset is an augmented audio captioning dataset. Hard to describe. Please refer to the URL for details. | Click here | 1 caption, many tags, audio | 12.57hrs | 4590 | |
Fine-grained Vocal Imitation Set | This dataset includes 763 crowd-sourced vocal imitations of 108 sound events. | Click here | 1 class label, audio | 1.55hrs | 1468 | processed(@marianna13#7139) |
Vocal Imitation | The VocalImitationSet is a collection of crowd-sourced vocal imitations of a large set of diverse sounds collected from Freesound (https://freesound.org/), which were curated based on Google’s AudioSet ontology (https://research.google.com/audioset/). | Click here | 1 class label, audio | 24.06hrs | 9100 files | processed(@marianna13#7139) |
VocalSketch | Dataset contains thousands of vocal imitations of a large set of diverse sounds. The dataset also contains data on hundreds of people’s ability to correctly label these vocal imitations, collected via Amazon’s Mechanical Turk. | Click here | 1 class label, audio | 18.86hrs | 16645 | processed(@marianna13#7139) |
VimSketch Dataset | VimSketch Dataset combines two publicly available datasets (VocalSketch + Vocal Imitation, but VimSketch deletes some parts of the previous two datasets). | Click here | class labels, audio | Not important | Not important | |
OtoMobile Dataset | OtoMobile dataset is a collection of recordings of failing car components, created by the Interactive Audio Lab at Northwestern University. OtoMobile consists of 65 recordings of vehicles with failing components, along with annotations. | Click here (restricted access) | class labels & tags, audio | Unknown | 59 | |
DCASE17 Task 4 | DCASE Task 4: Large-scale weakly supervised sound event detection for smart cars | Click here | | | | |
Knocking Sound Effects With Emotional Intentions | A dataset of knocking sound effects with emotional intention recorded at a professional foley studio. Five types of emotions are portrayed in the dataset: anger, fear, happiness, neutral and sadness. | Click here | 1 class label & audio | | 500 | processed(@marianna13#7139) |
WavText5K | WavText5K collection consisting of 4525 audios, 4348 descriptions, 4525 audio titles and 2058 tags. | Click here | 1 label, tags & audio | | 4525 audio files | processed(@marianna13#7139) |
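
The Total Duration and Total Audio Number columns can be cross-checked against each dataset's description by computing the mean clip length. The sketch below does this for cells written the way they appear in the table above (for example `2.78hrs` or `10.64 + 20.91hrs`). The helper names `parse_hours`, `parse_count` and `mean_clip_seconds` are hypothetical, and cells in other units such as `31 years` are deliberately rejected rather than guessed at.

```python
import re


def parse_hours(cell: str) -> float:
    """Parse a Total Duration cell such as '2.78hrs' or '10.64 + 20.91hrs'."""
    if "hrs" not in cell:
        raise ValueError(f"unsupported duration cell: {cell!r}")
    return sum(float(n) for n in re.findall(r"\d+(?:\.\d+)?", cell))


def parse_count(cell: str) -> int:
    """Parse a Total Audio Number cell such as '2000' or '3709 + 7336'."""
    numbers = re.findall(r"\d[\d,]*", cell)
    if not numbers:
        raise ValueError(f"unsupported count cell: {cell!r}")
    return sum(int(n.replace(",", "")) for n in numbers)


def mean_clip_seconds(duration_cell: str, count_cell: str) -> float:
    """Mean clip length in seconds implied by one table row."""
    return parse_hours(duration_cell) * 3600.0 / parse_count(count_cell)


# ESC-50 row: 2.78 hours over 2000 clips, roughly 5 seconds per clip.
print(round(mean_clip_seconds("2.78hrs", "2000"), 3))
# UrbanSound8K row: excerpts are described as at most 4 seconds long.
print(round(mean_clip_seconds("8.75hrs", "8732"), 3))
```

A mean that disagrees strongly with the description (for example far above the stated clip length) is a hint that the row's numbers refer to a subset or were recorded at a different processing stage.
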
Speech Dataset
Name | Description | URL | Data Type | Status |
---|---|---|---|---|
People’s Speech | 30k+ hours en-text | Click here | transcription, audio | assigned(@PiEquals4#1909) |
Multilingual Spoken Words | 6k+ hours of 1sec audio clips with words of 50+ languages | Click here | transcription, audio | processing(@PiEquals4#1909) |
AISHELL-2 | Contains 1000 hours of clean read-speech data from iOS; it is free for academic usage. | Click here | transcription, audio | |
Surfing AI Speech Dataset | 30k+ – proprietary | Click here | transcription, audio | |
LibriSpeech | A collection of approximately 1,000 hours of audiobooks that are a part of the LibriVox project. | Click here | transcription, audio | processed(@marianna13#7139) |
Libri-light | 60K hours of unlabelled speech from audiobooks in English and a small labelled dataset (10h, 1h, and 10 min) plus metrics, trainable baseline models, and pretrained models that use these datasets. | Click here | transcription, audio | |
Europarl-ST | A Multilingual Speech Translation Corpus that contains paired audio-text samples for Speech Translation, constructed using the debates carried out in the European Parliament in the period between 2008 and 2012. | Click here | translation, audio | processed(@Antoniooooo#4758) |
CoVoST | A large-scale multilingual ST corpus based on Common Voice, to foster ST research with the largest ever open dataset. Its latest version covers translations from English into 15 languages (Arabic, Catalan, Welsh, German, Estonian, Persian, Indonesian, Japanese, Latvian, Mongolian, Slovenian, Swedish, Tamil, Turkish, Chinese) and from 21 languages into English, including the 15 target languages as well as Spanish, French, Italian, Dutch, Portuguese, Russian. It has a total of 2,880 hours of speech and is diversified with 78K speakers. | Click here | translation & transcription, audio | assigned(@PiEquals4#1909) |
GigaSpeech | An evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. | Click here | transcription, audio | processing(@PiEquals4#1909) |
LJSpeech Dataset | This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours. | Click here or download | transcription, audio | processed(@PiEquals4#1909) |
Spotify English-Language Podcast Dataset | This dataset consists of 100,000 episodes from different podcast shows on Spotify. The dataset is available for research purposes. We are releasing this dataset more widely to facilitate research on podcasts through the lens of speech and audio technology, natural language processing, information retrieval, and linguistics. The dataset contains about 50,000 hours of audio, and over 600 million transcribed words. The episodes span a variety of lengths, topics, styles, and qualities. Only non-commercial research is permitted on this dataset. | Click here | transcription, audio | requested access(@marianna13#7139) |
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) | The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. | Click here | transcription, audio | processed(@PiEquals4#1909) |
CREMA-D | CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74 coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy, Neutral and Sad) and four different emotion levels (Low, Medium, High and Unspecified). | Click here | transcription, audio | processed(@PiEquals4#1909) |
EmoV-DB | The Emotional Voices Database. This dataset is built for the purpose of emotional speech synthesis. It includes recordings for four speakers: two males and two females. The emotional styles are neutral, sleepiness, anger, disgust and amused. | Click here | transcription, class labels, audio | assigned(@PiEquals4#1909) |
CMU_Arctic | The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers. | Click here | transcription, tags, audio,…TBD | processed(@marianna13#7139) |
IEMOCAP database | The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal and multispeaker database. It contains approximately 12 hours of audiovisual data, including video, speech, motion capture of face, text transcriptions. | Click here | transcription, video, audio,…TBD | assigned(@marianna13#7139) |
YouTube dataset | YouTube video/audio + automatically generated subtitles. For details, please ask @marianna13#7139. | No link (please contact @marianna13#7139) | transcription, audio, video | processed(@marianna13#7139) |
The Hume Vocal Burst Competition Dataset (H-VB) | labels, audio | Click here | labels, audio | assigned(@Yuchen Hui#8574) |
Music Dataset
Name | Description | URL | Text Type | Status |
---|---|---|---|---|
Free Music Archive | We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community’s growing interest in feature and end-to-end learning is however restrained by the limited availability of large audio datasets. The FMA aims to overcome this hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length and high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies. We here describe the dataset and how it was created, propose a train/validation/test split and three subsets, discuss some suitable MIR tasks, and evaluate some baselines for genre recognition. Code, data, and usage examples are available at https://github.com/mdeff/fma. | Click here | tags/class labels, audio | processed(@marianna13#7139) |
MusicNet | MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note’s position in the metrical structure of the composition. The labels are acquired from musical scores aligned to recordings by dynamic time warping. The labels are verified by trained musicians; we estimate a labeling error rate of 4%. We offer the MusicNet labels to the machine learning and music communities as a resource for training models and a common benchmark for comparing results. URL: https://homes.cs.washington.edu/~thickstn/musicnet.html | Click here | class labels, audio | processed(@IYWO#9072) |
MetaMIDI Dataset | We introduce the MetaMIDI Dataset (MMD), a large scale collection of 436,631 MIDI files and metadata. In addition to the MIDI files, we provide artist, title and genre metadata that was collected during the scraping process when available. MIDIs in (MMD) were matched against a collection of 32,000,000 30-second audio clips retrieved from Spotify, resulting in over 10,796,557 audio-MIDI matches. In addition, we linked 600,142 Spotify tracks with 1,094,901 MusicBrainz recordings to produce a set of 168,032 MIDI files that are matched to the MusicBrainz database. These links augment many files in the dataset with the extensive metadata available via the Spotify API and the MusicBrainz database. We anticipate that this collection of data will be of great use to MIR researchers addressing a variety of research topics. | Click here | tags, audio | |
MUSDB18-HQ | MUSDB18 consists of a total of 150 full-track songs of different styles and includes both the stereo mixtures and the original sources, divided between a training subset and a test subset. | Click here | 1 class label, audio | processed(@marianna13#7139) |
Cambridge-mt Multitrack Dataset | Here’s a list of multitrack projects which can be freely downloaded for mixing practice purposes. All these projects are presented as ZIP archives containing uncompressed WAV files (24-bit or 16-bit resolution and 44.1kHz sample rate). | Click here | 1 class label, audio | processed(@marianna13#7139) |
Slakh | The Synthesized Lakh (Slakh) Dataset contains 2100 automatically mixed tracks and accompanying MIDI files synthesized using a professional-grade sampling engine. | Click here | 1 class label, audio | processed(krishna#1648) |
Tunebot | The Tunebot project is an online Query By Humming system. Users sing a song to Tunebot and it returns a ranked list of song candidates available on Apple’s iTunes website. The database that Tunebot compares to sung queries is crowdsourced from users as well. Users contribute new songs to Tunebot by singing them on the Tunebot website. The more songs people contribute, the better Tunebot works. Tunebot is no longer online but the dataset lives on. | Click here | song name (so transcription), audio | processed(@marianna13#7139) |
Juno | A music review website | Click here | pertinent text/class labels, audio | metadata downloaded(@dicknascarsixtynine#3885) & processed(@marianna13#7139) |
Pitch Fork | Music review website | Click here | pertinent text (long paragraphs), audio | |
Genius | Music lyrics website | | pertinent text (long paragraphs), audio | assigned(@marianna13#7139) |
IDMT-SMT-Audio-Effects | The IDMT-SMT-Audio-Effects database is a large database for automatic detection of audio effects in recordings of electric guitar and bass and related signal processing. | Click here | class label, audio | |
MIDI50K | Music generated from MIDI files using the synthesizer available at https://pypi.org/project/midi2audio/ | Temporarily not available, will be added soon | MIDI files, audio | Processing(@marianna13#7139) |
MIDI130K | Music generated from MIDI files using the synthesizer available at https://pypi.org/project/midi2audio/ | Temporarily not available, will be added soon | MIDI files, audio | Processing(@marianna13#7139) |
MillionSongDataset | 72222 hours of general music as 30 second clips, one million different songs. | Temporarily not available | tags, artist names, song titles, audio | |
synth1B1 | One million hours of audio: one billion 4-second synthesized sounds. The corpus is multi-modal: Each sound includes its corresponding synthesis parameters. Since it is faster to render synth1B1 in-situ than to download it, torchsynth includes a replicable script for generating synth1B1 within the GPU. | Click here | synthesis parameters, audio | |
Epidemic Sound (music part) | Royalty free music and sound effects | Click here | class label, tags, audio | assigned(@chr0my#0173) |
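
The MIDI50K and MIDI130K rows above mention rendering MIDI files with the midi2audio package. The sketch below shows one way such a batch render could look, using the package's `FluidSynth.midi_to_audio` call. The directory layout, the `.wav` output choice, and the helper names `planned_outputs` and `render_all` are assumptions for illustration; the table does not describe the actual rendering script. Running `render_all` also requires the FluidSynth program and a SoundFont file to be installed.

```python
from pathlib import Path


def planned_outputs(midi_dir, out_dir):
    """Pair every .mid file in midi_dir with a .wav path in out_dir."""
    midi_dir, out_dir = Path(midi_dir), Path(out_dir)
    return [
        (midi_path, out_dir / midi_path.with_suffix(".wav").name)
        for midi_path in sorted(midi_dir.glob("*.mid"))
    ]


def render_all(midi_dir, out_dir, sound_font=None):
    """Render each MIDI file to audio with midi2audio (needs FluidSynth)."""
    # Imported lazily so planned_outputs() works without the package installed.
    from midi2audio import FluidSynth

    synth = FluidSynth(sound_font) if sound_font else FluidSynth()
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for midi_path, wav_path in planned_outputs(midi_dir, out_dir):
        synth.midi_to_audio(str(midi_path), str(wav_path))


# Example call (paths are placeholders):
# render_all("midi_files", "rendered_audio", sound_font="font.sf2")
```

Separating the path planning from the rendering makes it possible to check which files would be produced before starting a long synthesis run.
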