{"id":23593,"date":"2025-01-02T16:24:19","date_gmt":"2025-01-02T08:24:19","guid":{"rendered":"http:\/\/139.9.1.231\/?p=23593"},"modified":"2025-02-14T15:24:33","modified_gmt":"2025-02-14T07:24:33","slug":"speech-datasets-collection","status":"publish","type":"post","link":"http:\/\/139.9.1.231\/index.php\/2025\/01\/02\/speech-datasets-collection\/","title":{"rendered":"Speech Datasets Collection-\u8bed\u97f3\u6570\u636e\u96c6\u6c47\u603b"},"content":{"rendered":"\n<p class=\"has-light-gray-background-color has-background\"><em>\u6765\u6e90\uff1a<a href=\"https:\/\/github.com\/RevoSpeechTech\/speech-datasets-collection\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/RevoSpeechTech\/speech-datasets-collection<\/a><\/em><\/p>\n\n\n\n<p><strong>\u6587\u672c\u7ffb\u8bd1\u6570\u636e\u96c6\uff1a<a rel=\"noreferrer noopener\" href=\"https:\/\/opus.nlpl.eu\/\" target=\"_blank\">https:\/\/opus.nlpl.eu\/<\/a><\/strong><\/p>\n\n\n\n<h2> <strong>openslr \u4e0b\u8f7d\uff1a<\/strong><\/h2>\n\n\n\n<p>1\uff09\u4fee\u6539\u4e3a\u56fd\u5185\u5730\u5740<\/p>\n\n\n\n<p>\u4f8b\u5982 aishell\uff0c\u9ed8\u8ba4\u7684run.sh\u91cc\u5199\u7684\u662f<strong>www.openslr.org\/resources\/33<\/strong>\uff0c\u9700\u8981\u6539\u4e3a\u56fd\u5185\u7ad9\u70b9\uff0c<strong>http:\/\/openslr.magicdatatech.com\/resources\/33<\/strong>\u3002<\/p>\n\n\n\n<p>\u5176\u4ed6\u76ee\u5f55\u53ef\u4ee5\u770b\uff1a&nbsp;<a rel=\"noreferrer noopener\" href=\"http:\/\/openslr.magicdatatech.com\/resources.php\" target=\"_blank\">http:\/\/openslr.magicdatatech.com\/resources.php<\/a><\/p>\n\n\n\n<p>\u5728\u4f7f\u7528 <code>wget<\/code> \u4e0b\u8f7d\u6587\u4ef6\u65f6\uff0c\u5982\u679c\u9047\u5230\u4e0b\u8f7d\u901f\u5ea6\u6162\u7684\u95ee\u9898\uff0c\u53ef\u4ee5\u901a\u8fc7\u4ee5\u4e0b\u51e0\u79cd\u65b9\u6cd5\u52a0\u901f\u4e0b\u8f7d\uff1a<\/p>\n\n\n\n<h3>1. 
Use multiple connections<\/h3>\n\n\n\n<p><code>wget<\/code> uses only a single connection per download by default, but a tool such as <code>aria2<\/code> supports multi-connection downloads and can speed up downloads significantly. <code>aria2<\/code> can be installed with:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>sudo apt install aria2  # For Ubuntu\/Debian\nbrew install aria2  # For macOS\n<\/code><\/pre>\n\n\n\n<p>Then download the file with <code>aria2<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>aria2c -x 16 -s 16 &lt;URL&gt;\n<\/code><\/pre>\n\n\n\n<p><code>-x 16<\/code> allows up to 16 connections per server for the download, and <code>-s 16<\/code> splits the file into 16 parts.<\/p>\n\n\n\n<h3>2. Limit the download rate with <code>--limit-rate<\/code><\/h3>\n\n\n\n<p>This does not speed up a download by itself, but if the transfer rate is unstable, setting a reasonable rate limit can keep bandwidth fluctuations from hurting throughput. Add the <code>--limit-rate<\/code> option to the command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>wget --limit-rate=1m &lt;URL&gt;\n<\/code><\/pre>\n\n\n\n<p>This limits the download speed to 1 MB per second.<\/p>\n\n\n\n<h3>3.
Enable resumable downloads<\/h3>\n\n\n\n<p>If a download is interrupted, use the <code>-c<\/code> or <code>--continue<\/code> option to resume it from where it stopped:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>wget -c &lt;URL&gt;<\/code><\/pre>\n\n\n\n<p>This is a curated list of open speech datasets for speech-related research (mainly for Automatic Speech Recognition).<\/p>\n\n\n\n<p>Over&nbsp;<strong>110<\/strong>&nbsp;speech datasets are collected in this repository, and more than&nbsp;<strong>70<\/strong>&nbsp;of them can be downloaded directly without further application or registration.<\/p>\n\n\n\n<p><strong>Notice:<\/strong><\/p>\n\n\n\n<ol><li>This repository does not list the license of each dataset. In general, these datasets are fine to use for research purposes only; please make sure the license is suitable before using a dataset for commercial purposes.<\/li><li>Some small-scale speech corpora are omitted for concision.<\/li><\/ol>\n\n\n\n<h3>1.
Data Overview<a href=\"https:\/\/github.com\/RevoSpeechTech\/speech-datasets-collection#1-data-overview\"><\/a><\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><strong>Dataset Acquisition<\/strong><\/th><th><strong>Sup\/Unsup<\/strong><\/th><th><strong>All Languages (Hours)<\/strong><\/th><th><strong>Mandarin (Hours)<\/strong><\/th><th><strong>English (Hours)<\/strong><\/th><\/tr><\/thead><tbody><tr><td>download directly<\/td><td>supervised<\/td><td>199k +<\/td><td>2110 +<\/td><td>34k +<\/td><\/tr><tr><td>download directly<\/td><td>unsupervised<\/td><td>530k +<\/td><td>1360 +<\/td><td>68k +<\/td><\/tr><tr><td>download directly<\/td><td>total<\/td><td>729k +<\/td><td>3470 +<\/td><td>102k +<\/td><\/tr><tr><td>need application<\/td><td>supervised<\/td><td>53k +<\/td><td>16740 +<\/td><td>50k +<\/td><\/tr><tr><td>need application<\/td><td>unsupervised<\/td><td>60k +<\/td><td>12400 +<\/td><td>57k +<\/td><\/tr><tr><td>need application<\/td><td>total<\/td><td>113k +<\/td><td>29140 +<\/td><td>107k +<\/td><\/tr><tr><td>total<\/td><td>supervised<\/td><td>252k +<\/td><td>18850 +<\/td><td>84k +<\/td><\/tr><tr><td>total<\/td><td>unsupervised<\/td><td>590k +<\/td><td>13760 +<\/td><td>125k +<\/td><\/tr><tr><td>total<\/td><td>total<\/td><td>842k +<\/td><td>32610 +<\/td><td>209k +<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<ul><li><strong>Mandarin<\/strong>&nbsp;here includes Mandarin-English CS corpora.<\/li><li><strong>Sup<\/strong>&nbsp;means supervised speech corpus with high-quality transcription.<\/li><li><strong>Unsup<\/strong>&nbsp;means unsupervised or weakly-supervised speech corpus.<\/li><\/ul>\n\n\n\n<h3>2. List of ASR corpora<a href=\"https:\/\/github.com\/RevoSpeechTech\/speech-datasets-collection#2-list-of-asr-corpora\"><\/a><\/h3>\n\n\n\n<h4>a. 
datasets can be downloaded directly<a href=\"https:\/\/github.com\/RevoSpeechTech\/speech-datasets-collection#a-datasets-can-be-downloaded-directly\"><\/a><\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><strong>id<\/strong><\/th><th><strong>Name<\/strong><\/th><th><strong>Language<\/strong><\/th><th><strong>Type\/Domain<\/strong><\/th><th><strong>Paper Link<\/strong><\/th><th><strong>Data Link<\/strong><\/th><th><strong>Size (Hours)<\/strong><\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>Librispeech<\/td><td>English<\/td><td>Reading<\/td><td><a href=\"https:\/\/www.danielpovey.com\/files\/2015_icassp_librispeech.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/12\/\">[dataset]<\/a><\/td><td>960<\/td><\/tr><tr><td>2<\/td><td>TED_LIUM v1<\/td><td>English<\/td><td>Talks<\/td><td><a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2012\/pdf\/698_Paper.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/19\/\">[dataset]<\/a><\/td><td>118<\/td><\/tr><tr><td>3<\/td><td>TED_LIUM v2<\/td><td>English<\/td><td>Talks<\/td><td><a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2014\/pdf\/1104_Paper.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/19\">[dataset]<\/a><\/td><td>207<\/td><\/tr><tr><td>4<\/td><td>TED_LIUM v3<\/td><td>English<\/td><td>Talks<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1805.04699.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/51\">[dataset]<\/a><\/td><td>452<\/td><\/tr><tr><td>5<\/td><td>MLS<\/td><td>Multilingual<\/td><td>Reading<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2012.03411.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/94\">[dataset]<\/a><\/td><td>50k +<\/td><\/tr><tr><td>6<\/td><td>thchs30<\/td><td>Mandarin<\/td><td>Reading<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1512.01882.pdf\">[paper]<\/a><\/td><td><a 
href=\"https:\/\/www.openslr.org\/18\/\">[dataset]<\/a><\/td><td>35<\/td><\/tr><tr><td>7<\/td><td>ST-CMDS<\/td><td>Mandarin<\/td><td>Commands<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.openslr.org\/38\/\">[dataset]<\/a><\/td><td>100<\/td><\/tr><tr><td>8<\/td><td>aishell<\/td><td>Mandarin<\/td><td>Recording<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1709.05522.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/33\/\">[dataset]<\/a><\/td><td>178<\/td><\/tr><tr><td>9<\/td><td>aishell-3<\/td><td>Mandarin<\/td><td>Recording<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2010.11567.pdf\">[paper]<\/a><\/td><td><a href=\"http:\/\/www.openslr.org\/93\/\">[dataset]<\/a><\/td><td>85<\/td><\/tr><tr><td>10<\/td><td>aishell-4<\/td><td>Mandarin<\/td><td>Meeting<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2104.03603.pdf\">[paper]<\/a><\/td><td><a href=\"http:\/\/www.openslr.org\/111\/\">[dataset]<\/a><\/td><td>120<\/td><\/tr><tr><td>11<\/td><td>aishell-eval<\/td><td>Mandarin<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/aishelltech.com\/aishell_2018_eval\">[dataset]<\/a><\/td><td>80 +<\/td><\/tr><tr><td>12<\/td><td>Primewords<\/td><td>Mandarin<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.openslr.org\/47\/\">[dataset]<\/a><\/td><td>100<\/td><\/tr><tr><td>13<\/td><td>aidatatang_200zh<\/td><td>Mandarin<\/td><td>Record<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.openslr.org\/62\/\">[dataset]<\/a><\/td><td>200<\/td><\/tr><tr><td>14<\/td><td>MagicData<\/td><td>Mandarin<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.openslr.org\/68\/\">[dataset]<\/a><\/td><td>755<\/td><\/tr><tr><td>15<\/td><td>MagicData-RAMC<\/td><td>Mandarin<\/td><td>Conversational<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2203.16844.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/openslr.org\/123\/\">[dataset]<\/a><\/td><td>180<\/td><\/tr><tr><td>16<\/td><td>Heavy Accent 
Corpus<\/td><td>Mandarin<\/td><td>Conversational<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/magichub.com\/datasets\/mandarin-heavy-accent-conversational-speech-corpus\/\">[dataset]<\/a><\/td><td>58 +<\/td><\/tr><tr><td>17<\/td><td>AliMeeting<\/td><td>Mandarin<\/td><td>Meeting<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2202.03647.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/119\/\">[dataset]<\/a><\/td><td>120<\/td><\/tr><tr><td>18<\/td><td>CN-Celeb<\/td><td>Mandarin<\/td><td>Misc<\/td><td><a href=\"http:\/\/cnceleb.org\/static\/CN-Celeb_A_Challenging_Chinese_Speaker_Recognition_Dataset.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/82\/\">[dataset]<\/a><\/td><td>unsup(274)<\/td><\/tr><tr><td>19<\/td><td>CN-Celeb2<\/td><td>Mandarin<\/td><td>Misc<\/td><td><a href=\"http:\/\/aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com\/CN-Celeb_Multi-Genre_Speaker_Recognition.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/82\/\">[dataset]<\/a><\/td><td>unsup(1090)<\/td><\/tr><tr><td>20<\/td><td>The People&#8217;s Speech<\/td><td>English<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2111.09344.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/mlcommons.org\/en\/peoples-speech\/\">[dataset]<\/a><\/td><td>30k +<\/td><\/tr><tr><td>21<\/td><td>Multilingual TEDx<\/td><td>Multilingual<\/td><td>Talks<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2102.01757.pdf\">[paper]<\/a><\/td><td><a href=\"http:\/\/www.openslr.org\/100\">[dataset]<\/a><\/td><td>760 +<\/td><\/tr><tr><td>22<\/td><td>VoxPopuli<\/td><td>Multilingual<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2101.00390.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/facebookresearch\/voxpopuli\">[dataset]<\/a><\/td><td>sup(1.8k)<br>unsup(400k)<\/td><\/tr><tr><td>23<\/td><td>Libri-Light<\/td><td>English<\/td><td>Reading<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1912.07875.pdf\">[paper]<\/a><\/td><td><a 
href=\"https:\/\/github.com\/facebookresearch\/libri-light\/tree\/main\/data_preparation\">[dataset]<\/a><\/td><td>unsup(60k)<\/td><\/tr><tr><td>24<\/td><td>Common Voice (Multilingual)<\/td><td>Multilingual<\/td><td>Recording<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1912.06670.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/commonvoice.mozilla.org\/\">[dataset]<\/a><\/td><td>sup(15k)<br>unsup(5k)<\/td><\/tr><tr><td>25<\/td><td>Common Voice (English)<\/td><td>English<\/td><td>Recording<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1912.06670.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/commonvoice.mozilla.org\/en\/datasets\">[dataset]<\/a><\/td><td>sup(2200)<br>unsup(700)<\/td><\/tr><tr><td>26<\/td><td>JTubeSpeech<\/td><td>Japanese<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2112.09323.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/sarulab-speech\/jtubespeech\">[dataset]<\/a><\/td><td>1300<\/td><\/tr><tr><td>27<\/td><td>ai4bharat NPTEL2020<\/td><td>English(Indian)<\/td><td>Lectures<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/github.com\/AI4Bharat\/NPTEL2020-Indian-English-Speech-Dataset\">[dataset]<\/a><\/td><td>weaksup(15.7k)<\/td><\/tr><tr><td>28<\/td><td>open_stt<\/td><td>Russian<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/github.com\/snakers4\/open_stt\">[dataset]<\/a><\/td><td>20k +<\/td><\/tr><tr><td>29<\/td><td>ASCEND<\/td><td>Mandarin-English CS<\/td><td>Conversational<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2112.06223.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/huggingface.co\/datasets\/CAiRE\/ASCEND\">[dataset]<\/a><\/td><td>10 +<\/td><\/tr><tr><td>30<\/td><td>Crowd-Sourced Speech<\/td><td>Multilingual<\/td><td>Recording<\/td><td><a href=\"https:\/\/www.isca-speech.org\/archive\/pdfs\/sltu_2018\/kjartansson18_sltu.pdf\">[paper]<\/a><\/td><td><a 
href=\"https:\/\/github.com\/coqui-ai\/open-speech-corpora\/blob\/150d316869c7ba468efd1f7b473555b0c76cc5e6\/README.md?plain=1#L80\">[dataset]<\/a><\/td><td>1200 +<\/td><\/tr><tr><td>31<\/td><td>Spoken Wikipedia<\/td><td>Multilingual<\/td><td>Recording<\/td><td><a href=\"https:\/\/arne.chark.eu\/static\/spoken-wp-corpus-collection.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/corpora.uni-hamburg.de\/hzsk\/de\/islandora\/object\/spoken-corpus:swc-2.0#additional-files\">[dataset]<\/a><\/td><td>1000 +<\/td><\/tr><tr><td>32<\/td><td>MuST-C<\/td><td>Multilingual<\/td><td>Talks<\/td><td><a href=\"https:\/\/aclanthology.org\/N19-1202.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/ict.fbk.eu\/must-c-release-v1-2\/\">[dataset]<\/a><\/td><td>6000 +<\/td><\/tr><tr><td>33<\/td><td>M-AILABS<\/td><td>Multilingual<\/td><td>Reading<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.caito.de\/2019\/01\/03\/the-m-ailabs-speech-dataset\/\">[dataset]<\/a><\/td><td>1000<\/td><\/tr><tr><td>34<\/td><td>CMU Wilderness<\/td><td>Multilingual<\/td><td>Misc<\/td><td><a href=\"http:\/\/www.cs.cmu.edu\/~awb\/papers\/2019_Black_ICASSP.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/festvox\/datasets-CMU_Wilderness\">[dataset]<\/a><\/td><td>unsup(14k)<\/td><\/tr><tr><td>35<\/td><td>Gram_Vaani<\/td><td>Hindi<\/td><td>Recording<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2203.16973.pdf\">[paper]<\/a>&nbsp;<a href=\"https:\/\/github.com\/anish9208\/gramvaani_hindi_asr\">[code]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/118\/\">[dataset]<\/a><\/td><td>sup(100)<br>unsup(1k)<\/td><\/tr><tr><td>36<\/td><td>VoxLingua107<\/td><td>Multilingual<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2011.12998.pdf\">[paper]<\/a><\/td><td><a href=\"http:\/\/bark.phon.ioc.ee\/voxlingua107\/\">[dataset]<\/a><\/td><td>unsup(6600 +)<\/td><\/tr><tr><td>37<\/td><td>Kazakh Corpus<\/td><td>Kazakh<\/td><td>Recording<\/td><td><a 
href=\"https:\/\/arxiv.org\/pdf\/2009.10334.pdf\">[paper]<\/a>&nbsp;<a href=\"https:\/\/github.com\/IS2AI\/ISSAI_SAIDA_Kazakh_ASR\">[code]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/102\/\">[dataset]<\/a><\/td><td>335<\/td><\/tr><tr><td>38<\/td><td>Voxforge<\/td><td>English<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"http:\/\/www.voxforge.org\/home\/downloads\">[dataset]<\/a><\/td><td>130<\/td><\/tr><tr><td>39<\/td><td>Tatoeba<\/td><td>English<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/downloads.tatoeba.org\/audio\/\">[dataset]<\/a><\/td><td>200<\/td><\/tr><tr><td>40<\/td><td>IndicWav2Vec<\/td><td>Multilingual<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2111.03945.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/AI4Bharat\/IndicWav2Vec\/tree\/main\/data_prep_scripts\/pret_scripts\">[dataset]<\/a><\/td><td>unsup(17k +)<\/td><\/tr><tr><td>41<\/td><td>VoxCeleb<\/td><td>English<\/td><td>Misc<\/td><td><a href=\"https:\/\/www.robots.ox.ac.uk\/~vgg\/publications\/2017\/Nagrani17\/nagrani17.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/mm.kaist.ac.kr\/datasets\/voxceleb\">[dataset]<\/a><\/td><td>unsup(352)<\/td><\/tr><tr><td>42<\/td><td>VoxCeleb2<\/td><td>English<\/td><td>Misc<\/td><td><a href=\"https:\/\/www.robots.ox.ac.uk\/~vgg\/publications\/2018\/Chung18a\/chung18a.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/mm.kaist.ac.kr\/datasets\/voxceleb\">[dataset]<\/a><\/td><td>unsup(2442)<\/td><\/tr><tr><td>43<\/td><td>RuLibrispeech<\/td><td>Russian<\/td><td>Read<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.openslr.org\/96\/\">[dataset]<\/a><\/td><td>98<\/td><\/tr><tr><td>44<\/td><td>MediaSpeech<\/td><td>Multilingual<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2103.16193.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/108\/\">[dataset]<\/a><\/td><td>40<\/td><\/tr><tr><td>45<\/td><td>MUCS 2021 task1<\/td><td>Multilingual<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a 
href=\"https:\/\/www.openslr.org\/103\/\">[dataset]<\/a><\/td><td>300<\/td><\/tr><tr><td>46<\/td><td>MUCS 2021 task2<\/td><td>Multilingual<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.openslr.org\/104\/\">[dataset]<\/a><\/td><td>150<\/td><\/tr><tr><td>47<\/td><td>nicolingua-west-african<\/td><td>Multilingual<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2104.13083.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/105\/\">[dataset]<\/a><\/td><td>140 +<\/td><\/tr><tr><td>48<\/td><td>Samromur 21.05<\/td><td>Samromur<\/td><td>Misc<\/td><td><a href=\"https:\/\/github.com\/cadia-lvl\/samromur-asr\">[code]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/112\/\">[dataset]<\/a>&nbsp;<a href=\"https:\/\/www.openslr.org\/116\/\">[dataset]<\/a><a href=\"https:\/\/www.openslr.org\/117\">[dataset]<\/a><\/td><td>145<\/td><\/tr><tr><td>49<\/td><td>Puebla-Nahuatl<\/td><td>Puebla-Nahuatl<\/td><td>Misc<\/td><td><a href=\"https:\/\/aclanthology.org\/2021.americasnlp-1.7.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/92\/\">[dataset]<\/a><\/td><td>150 +<\/td><\/tr><tr><td>50<\/td><td>Golos<\/td><td>Russian<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2106.10161.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/114\/\">[dataset]<\/a><\/td><td>1240<\/td><\/tr><tr><td>51<\/td><td>ParlaSpeech-HR<\/td><td>Croatian<\/td><td>Parliament<\/td><td><a href=\"https:\/\/office.clarin.eu\/v\/CE-2021-1923-CLARIN2021_ConferenceProceedings.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.clarin.si\/repository\/xmlui\/handle\/11356\/1494\">[dataset]<\/a><\/td><td>1816<\/td><\/tr><tr><td>52<\/td><td>Lyon Corpus<\/td><td>French<\/td><td>Recording<\/td><td><a href=\"https:\/\/www.mq.edu.au\/__data\/assets\/pdf_file\/0006\/910077\/2008Demuth-and-Tremblay.pdf\">[paper]<\/a><\/td><td><a 
href=\"https:\/\/phonbank.talkbank.org\/access\/French\/Lyon.html\">[dataset]<\/a><\/td><td>185<\/td><\/tr><tr><td>53<\/td><td>Providence Corpus<\/td><td>English<\/td><td>Recording<\/td><td><a href=\"https:\/\/d1wqtxts1xzle7.cloudfront.net\/66484402\/2006DemuthetalL_S-libre.pdf?1619046276=&amp;response-content-disposition=inline%3B+filename%3DWord_minimality_Epenthesis_and_Coda_Lice.pdf&amp;Expires=1652977685&amp;Signature=d3EpWElGBNwVe6wvbA-Erk9bhbykEtwwSJN3JRcLRPU4dSB2iHz8FOjsYKf9YQVLQVHtNF-5L7EF325B7jWfaBXewazatJ9f-uC2qqQO~JPhD9GQgfTXims4pfu7cm1irdRT7fgYeqAbTT6xM9LMB0LdyMsevxB6tCJCX3IZwUdUaYsmNgm9iROxn7MZnr74gmQTekpRNK0AJFjpR261oYR5ORf8sgnpdVmjlbhlOTVraj12huOIvxEoIZ~QoFwA1mFSrLArBj83gdNVPvHpBFNoup4Dsejq1MbDOogFkoh~fW3C21xnjpM5PvUuq7SeT~gDgZQ~aZo14IS474pMtw__&amp;Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA\">[paper]<\/a><\/td><td><a href=\"https:\/\/phonbank.talkbank.org\/access\/Eng-NA\/Providence.html\">[dataset]<\/a><\/td><td>364<\/td><\/tr><tr><td>54<\/td><td>CLARIN Spoken Corpora<\/td><td>Czech<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.clarin.eu\/resource-families\/spoken-corpora\">[dataset]<\/a><\/td><td>1120 +<\/td><\/tr><tr><td>55<\/td><td>Czech Parliament Plenary<\/td><td>Czech<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/lindat.mff.cuni.cz\/repository\/xmlui\/handle\/11234\/1-3126\">[dataset]<\/a><\/td><td>444<\/td><\/tr><tr><td>56<\/td><td>(Youtube) Regional American Corpus<\/td><td>English (Accented)<\/td><td>Misc<\/td><td><a href=\"http:\/\/cc.oulu.fi\/~scoats\/YouTube_Corpus_19a.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/stcoats\/YouTube_Corpus\">[dataset]<\/a><\/td><td>29k +<\/td><\/tr><tr><td>57<\/td><td>NISP Dataset<\/td><td>Multilingual<\/td><td>Recording<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2007.06021.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/iiscleap\/NISP-Dataset?utm_source=catalyzex.com\">[dataset]<\/a><\/td><td>56 +<\/td><\/tr><tr><td>58<\/td><td>Regional African 
American<\/td><td>English (Accented)<\/td><td>Recording<\/td><td><a href=\"http:\/\/lingtools.uoregon.edu\/coraal\/userguide\/CORAALUserGuide_current.pdf\">[paper]<\/a><\/td><td><a href=\"http:\/\/lingtools.uoregon.edu\/coraal\/\">[dataset]<\/a><\/td><td>130 +<\/td><\/tr><tr><td>59<\/td><td>Indonesian Unsup<\/td><td>Indonesian<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/github.com\/Wikidepia\/indonesian_datasets\/tree\/master\/speech\/unsupervised\">[dataset]<\/a><\/td><td>unsup (3000+)<\/td><\/tr><tr><td>60<\/td><td>Librivox-Spanish<\/td><td>Spanish<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.kaggle.com\/datasets\/carlfm01\/120h-spanish-speech\">[dataset]<\/a><\/td><td>120<\/td><\/tr><tr><td>61<\/td><td>AVSpeech<\/td><td>English<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1804.03619.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/looking-to-listen.github.io\/avspeech\/download.html\">[dataset]<\/a><\/td><td>unsup(4700)<\/td><\/tr><tr><td>62<\/td><td>CMLR<\/td><td>Mandarin<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1908.04917.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.vipazoo.cn\/CMLR.html\">[dataset]<\/a><\/td><td>100 +<\/td><\/tr><tr><td>63<\/td><td>Speech Accent Archive<\/td><td>English<\/td><td>Accented<\/td><td><a href=\"https:\/\/brill.com\/view\/book\/edcoll\/9789401206884\/B9789401206884-s014.xml\">[paper]<\/a><\/td><td><a href=\"http:\/\/accent.gmu.edu\/browse_language.php\">[dataset]<\/a><\/td><td>TBC<\/td><\/tr><tr><td>64<\/td><td>BibleTTS<\/td><td>Multilingual<\/td><td>TTS<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2203.14456.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/masakhane-io.github.io\/bibleTTS\/\">[dataset]<\/a><\/td><td>86<\/td><\/tr><tr><td>65<\/td><td>NST-Norwegian<\/td><td>Norwegian<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a 
href=\"https:\/\/www.nb.no\/sprakbanken\/en\/resource-catalogue\/oai-nb-no-sbr-54\/\">[dataset]<\/a><\/td><td>540<\/td><\/tr><tr><td>66<\/td><td>NST-Danish<\/td><td>Danish<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.nb.no\/sprakbanken\/en\/resource-catalogue\/oai-nb-no-sbr-55\/\">[dataset]<\/a><\/td><td>500 +<\/td><\/tr><tr><td>67<\/td><td>NST-Swedish<\/td><td>Swedish<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.nb.no\/sprakbanken\/en\/resource-catalogue\/oai-nb-no-sbr-56\/\">[dataset]<\/a><\/td><td>300 +<\/td><\/tr><tr><td>68<\/td><td>NPSC<\/td><td>Norwegian<\/td><td>Parliament<\/td><td><a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2022\/pdf\/2022.lrec-1.106.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.nb.no\/sprakbanken\/en\/resource-catalogue\/oai-nb-no-sbr-58\/\">[dataset]<\/a><\/td><td>140<\/td><\/tr><tr><td>69<\/td><td>CI-AVSR<\/td><td>Cantonese<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2201.03804.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/HLTCHKUST\/CI-AVSR\">[dataset]<\/a><\/td><td>8 +<\/td><\/tr><tr><td>70<\/td><td>Aalto Finnish Parliament<\/td><td>Finnish<\/td><td>Parliament<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2203.14876.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.kielipankki.fi\/corpora\/fi-parliament-asr\/\">[dataset]<\/a><\/td><td>3100 +<\/td><\/tr><tr><td>71<\/td><td>UserLibri<\/td><td>English<\/td><td>Reading<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2207.00706.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.kaggle.com\/datasets\/google\/userlibri\">[dataset]<\/a><\/td><td>&#8211;<\/td><\/tr><tr><td>72<\/td><td>Ukrainian Speech<\/td><td>Ukrainian<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/github.com\/egorsmkv\/speech-recognition-uk#-datasets\">[dataset]<\/a><\/td><td>1300+<\/td><\/tr><tr><td>73<\/td><td>UCLA-ASR-corpus<\/td><td>Multilingual<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a 
href=\"https:\/\/github.com\/Open-Speech-EkStep\/ULCA-asr-dataset-corpus\">[dataset]<\/a><\/td><td>unsup(15k)<br>sup(9k)<\/td><\/tr><tr><td>74<\/td><td>ReazonSpeech<\/td><td>Japanese<\/td><td>Misc<\/td><td><a href=\"https:\/\/research.reazon.jp\/_static\/reazonspeech_nlp2023.pdf\">[paper]<\/a>&nbsp;<a href=\"https:\/\/github.com\/reazon-research\/ReazonSpeech\">[code]<\/a><\/td><td><a href=\"https:\/\/huggingface.co\/datasets\/reazon-research\/reazonspeech\">[dataset]<\/a><\/td><td>15k<\/td><\/tr><tr><td>75<\/td><td>Bundestag<\/td><td>German<\/td><td>Debate<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2302.06008v1.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/opendata.iisys.de\/datasets.html#bundestag\">[dataset]<\/a><\/td><td>sup(610)<br>unsup(1038)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h4>b. datasets can be downloaded after application<\/h4>\n\n\n\n<p><a href=\"https:\/\/github.com\/RevoSpeechTech\/speech-datasets-collection#b-datasets-can-be-downloaded-after-application\"><\/a><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><strong>id<\/strong><\/th><th><strong>Name<\/strong><\/th><th><strong>Language<\/strong><\/th><th><strong>Type\/Domain<\/strong><\/th><th><strong>Paper Link<\/strong><\/th><th><strong>Data Link<\/strong><\/th><th><strong>Size (Hours)<\/strong><\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>Fisher<\/td><td>English<\/td><td>Conversational<\/td><td><a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2004\/pdf\/767.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/catalog.ldc.upenn.edu\/search\">[dataset]<\/a><\/td><td>2000<\/td><\/tr><tr><td>2<\/td><td>WenetSpeech<\/td><td>Mandarin<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2110.03370.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.openslr.org\/121\">[dataset]<\/a><\/td><td>sup(10k)<br>weaksup(2.4k)<br>unsup(10k)<\/td><\/tr><tr><td>3<\/td><td>aishell-2<\/td><td>Mandarin<\/td><td>Recording<\/td><td><a 
href=\"https:\/\/arxiv.org\/pdf\/1808.10583.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.aishelltech.com\/aishell_2\">[dataset]<\/a><\/td><td>1000<\/td><\/tr><tr><td>4<\/td><td>aidatatang_1505zh<\/td><td>Mandarin<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/www.datatang.com\/opensource\">[dataset]<\/a><\/td><td>1505<\/td><\/tr><tr><td>5<\/td><td>SLT 2021 CSRC<\/td><td>Mandarin<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2011.06724.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.data-baker.com\/csrc_challenge.html\">[dataset]<\/a><\/td><td>400<\/td><\/tr><tr><td>6<\/td><td>GigaSpeech<\/td><td>English<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2106.06909.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/SpeechColab\/GigaSpeech\">[dataset]<\/a><\/td><td>sup(10k)<br>unsup(23k)<\/td><\/tr><tr><td>7<\/td><td>SPGISpeech<\/td><td>English<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2104.02014.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/datasets.kensho.com\/datasets\/spgispeech\">[dataset]<\/a><\/td><td>5000<\/td><\/tr><tr><td>8<\/td><td>AESRC 2020<\/td><td>English (accented)<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2102.10233.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.datatang.com\/INTERSPEECH2020\">[dataset]<\/a><\/td><td>160<\/td><\/tr><tr><td>9<\/td><td>LaboroTVSpeech<\/td><td>Japanese<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2103.14736.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/laboro.ai\/activity\/column\/engineer\/eg-laboro-tv-corpus-jp\/\">[dataset]<\/a><\/td><td>2000 +<\/td><\/tr><tr><td>10<\/td><td>TAL_CSASR<\/td><td>Mandarin-English CS<\/td><td>Lectures<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/ai.100tal.com\/dataset\">[dataset]<\/a><\/td><td>587<\/td><\/tr><tr><td>11<\/td><td>ASRU 2019 ASR<\/td><td>Mandarin-English CS<\/td><td>Reading<\/td><td>&#8211;<\/td><td><a 
href=\"https:\/\/www.datatang.com\/competition\">[dataset]<\/a><\/td><td>700 +<\/td><\/tr><tr><td>12<\/td><td>SEAME<\/td><td>Mandarin-English CS<\/td><td>Recording<\/td><td><a href=\"https:\/\/www.isca-speech.org\/archive\/pdfs\/interspeech_2010\/lyu10_interspeech.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/catalog.ldc.upenn.edu\/LDC2015S04\">[dataset]<\/a><\/td><td>196<\/td><\/tr><tr><td>13<\/td><td>Fearless Steps<\/td><td>English<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/fearless-steps.github.io\/ChallengePhase3\/#19k_Corpus_Access\">[dataset]<\/a><\/td><td>unsup(19k)<\/td><\/tr><tr><td>14<\/td><td>FTSpeech<\/td><td>Danish<\/td><td>Meeting<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2005.12368.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/ftspeech.github.io\/\">[dataset]<\/a><\/td><td>1800 +<\/td><\/tr><tr><td>15<\/td><td>KeSpeech<\/td><td>Mandarin<\/td><td>Recording<\/td><td><a href=\"https:\/\/openreview.net\/pdf?id=b3Zoeq2sCLq\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/KeSpeech\/KeSpeech\">[dataset]<\/a><\/td><td>1542<\/td><\/tr><tr><td>16<\/td><td>KsponSpeech<\/td><td>Korean<\/td><td>Conversational<\/td><td><a href=\"https:\/\/www.mdpi.com\/2076-3417\/10\/19\/6936\">[paper]<\/a><\/td><td><a href=\"https:\/\/huggingface.co\/datasets\/cheulyop\/ksponspeech\">[dataset]<\/a><\/td><td>969<\/td><\/tr><tr><td>17<\/td><td>RVTE database<\/td><td>Spanish<\/td><td>TV<\/td><td><a href=\"https:\/\/catedrartve.unizar.es\/reto2022\/RTVE2022DB.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/catedrartve.unizar.es\/rtvedatabase.html\">[dataset]<\/a><\/td><td>800 +<\/td><\/tr><tr><td>18<\/td><td>DiDiSpeech<\/td><td>Mandarin<\/td><td>Recording<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2010.09275.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/athena-team\/DiDiSpeech\">[dataset]<\/a><\/td><td>800<\/td><\/tr><tr><td>19<\/td><td>Babel<\/td><td>Multilingual<\/td><td>Telephone<\/td><td><a 
href=\"https:\/\/eprints.whiterose.ac.uk\/152840\/8\/Gales%20et%20al%202014.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.nist.gov\/itl\/iad\/mig\/openkws16-evaluation\">[dataset]<\/a><\/td><td>1000 +<\/td><\/tr><tr><td>20<\/td><td>National Speech Corpus<\/td><td>English (Singapore)<\/td><td>Misc<\/td><td><a href=\"https:\/\/www.isca-speech.org\/archive_v0\/Interspeech_2019\/pdfs\/1525.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.imda.gov.sg\/programme-listing\/digital-services-lab\/national-speech-corpus\">[dataset]<\/a><\/td><td>3000 +<\/td><\/tr><tr><td>21<\/td><td>MyST Children&#8217;s Speech<\/td><td>English<\/td><td>Recording<\/td><td>&#8211;<\/td><td><a href=\"http:\/\/boulderlearning.com\/request-the-myst-corpus\/\">[dataset]<\/a><\/td><td>393<\/td><\/tr><tr><td>22<\/td><td>L2-ARCTIC<\/td><td>L2 English<\/td><td>Recording<\/td><td><a href=\"https:\/\/www.isca-speech.org\/archive_v0\/Interspeech_2018\/pdfs\/1110.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/psi.engr.tamu.edu\/l2-arctic-corpus\/\">[dataset]<\/a><\/td><td>20 +<\/td><\/tr><tr><td>23<\/td><td>JSpeech<\/td><td>Multilingual<\/td><td>Recording<\/td><td><a href=\"https:\/\/ieeexplore.ieee.org\/stamp\/stamp.jsp?tp=&amp;arnumber=8639658\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/miras-tech\/jspeech\">[dataset]<\/a><\/td><td>1332 +<\/td><\/tr><tr><td>24<\/td><td>LRS2-BBC<\/td><td>English<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1809.02108.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.robots.ox.ac.uk\/~vgg\/data\/lip_reading\/lrs2.html\">[dataset]<\/a><\/td><td>220 +<\/td><\/tr><tr><td>25<\/td><td>LRS3-TED<\/td><td>English<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1809.00496.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/www.robots.ox.ac.uk\/~vgg\/data\/lip_reading\/lrs3.html\">[dataset]<\/a><\/td><td>470 +<\/td><\/tr><tr><td>26<\/td><td>LRS3-Lang<\/td><td>Multilingual<\/td><td>Audio-Visual<\/td><td>&#8211;<\/td><td><a 
href=\"https:\/\/www.robots.ox.ac.uk\/~vgg\/data\/lip_reading\/lrs3-lang.html\">[dataset]<\/a><\/td><td>1300 +<\/td><\/tr><tr><td>27<\/td><td>QASR<\/td><td>Arabic<\/td><td>Dialects<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2106.13000.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/arabicspeech.org\/qasr\/\">[dataset]<\/a><\/td><td>2000 +<\/td><\/tr><tr><td>28<\/td><td>ADI (MGB-5)<\/td><td>Arabic<\/td><td>Dialects<\/td><td><a href=\"https:\/\/swshon.github.io\/pdf\/ali_asru2019_mgb5.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/arabicspeech.org\/mgb5\/#adi17\">[dataset]<\/a><\/td><td>unsup (3000 +)<\/td><\/tr><tr><td>29<\/td><td>MGB-2<\/td><td>Arabic<\/td><td>TV<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/1609.05625.pdf\">[paper]<\/a><\/td><td><a href=\"http:\/\/www.mgb-challenge.org\/MGB-2.html\">[dataset]<\/a><\/td><td>1200 +<\/td><\/tr><tr><td>30<\/td><td>3MASSIV<\/td><td>Multilingual<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/openaccess.thecvf.com\/content\/CVPR2022\/papers\/Gupta_3MASSIV_Multilingual_Multimodal_and_Multi-Aspect_Dataset_of_Social_Media_Short_CVPR_2022_paper.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/ShareChatAI\/3MASSIV\">[dataset]<\/a><\/td><td>sup(310)<br>unsup(600)<\/td><\/tr><tr><td>31<\/td><td>MDCC<\/td><td>Cantonese<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2201.02419.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/HLTCHKUST\/cantonese-asr\">[dataset]<\/a><\/td><td>73 +<\/td><\/tr><tr><td>32<\/td><td>Lahjoita Puhetta<\/td><td>Finnish<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2203.12906.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/aalto-speech\/lahjoita-puhetta-resources\">[dataset]<\/a><\/td><td>sup(1600)<br>unsup(2000)<\/td><\/tr><tr><td>33<\/td><td>SDS-200<\/td><td>Swiss German<\/td><td>Dialects<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2205.09501.pdf\">[paper]<\/a><\/td><td><a 
href=\"https:\/\/swissnlp.org\/datasets\/\">[dataset]<\/a><\/td><td>200<\/td><\/tr><tr><td>34<\/td><td>Modality Corpus<\/td><td>Multilingual<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10844-016-0438-z.pdf\">[paper]<\/a><\/td><td><a href=\"http:\/\/www.modality-corpus.org\/\">[dataset]<\/a><\/td><td>30 +<\/td><\/tr><tr><td>35<\/td><td>Hindi-Tamil-English<\/td><td>Multilingual<\/td><td>Misc<\/td><td>&#8211;<\/td><td><a href=\"https:\/\/sites.google.com\/view\/indian-language-asrchallenge\/home\">[dataset]<\/a><\/td><td>690<\/td><\/tr><tr><td>36<\/td><td>English-Vietnamese Corpus<\/td><td>English, Vietnamese<\/td><td>Misc<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2208.04243.pdf\">[paper]<\/a><\/td><td><a href=\"https:\/\/github.com\/VinAIResearch\/PhoST\">[dataset]<\/a><\/td><td>500+<\/td><\/tr><tr><td>37<\/td><td>OLKAVS<\/td><td>Korean<\/td><td>Audio-Visual<\/td><td><a href=\"https:\/\/arxiv.org\/pdf\/2301.06375.pdf\">[paper]<\/a>&nbsp;<a href=\"https:\/\/github.com\/IIP-Sogang\/olkavs-avspeech\">[code]<\/a><\/td><td><a href=\"https:\/\/aihub.or.kr\/aihubdata\/data\/view.do?currMenu=115&amp;topMenu=100&amp;aihubDataSe=realm&amp;dataSetSn=538\">[dataset]<\/a><\/td><td>1150<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3>3. 
References<\/h3>\n\n\n\n<ul><li><a href=\"https:\/\/github.com\/coqui-ai\/open-speech-corpora\">https:\/\/github.com\/coqui-ai\/open-speech-corpora<\/a><\/li><li><a href=\"https:\/\/openslr.org\/resources.php\">https:\/\/openslr.org\/resources.php<\/a><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>\u6765\u6e90\uff1ahttps:\/\/github.com\/RevoSpeechTech\/speech-datasets-co &hellip; <a href=\"http:\/\/139.9.1.231\/index.php\/2025\/01\/02\/speech-datasets-collection\/\" class=\"more-link\">\u7ee7\u7eed\u9605\u8bfb<span class=\"screen-reader-text\">Speech Datasets Collection-\u8bed\u97f3\u6570\u636e\u96c6\u6c47\u603b<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,38,34],"tags":[],"_links":{"self":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/23593"}],"collection":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/comments?post=23593"}],"version-history":[{"count":11,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/23593\/revisions"}],"predecessor-version":[{"id":24850,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/23593\/revisions\/24850"}],"wp:attachment":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/media?parent=23593"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/categories?post=23593"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/tags?p
ost=23593"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}