{"id":22635,"date":"2024-12-02T10:46:12","date_gmt":"2024-12-02T02:46:12","guid":{"rendered":"http:\/\/139.9.1.231\/?p=22635"},"modified":"2024-12-02T10:46:13","modified_gmt":"2024-12-02T02:46:13","slug":"funcodecopen-source-toolkit-for-neural-speech-codec","status":"publish","type":"post","link":"http:\/\/139.9.1.231\/index.php\/2024\/12\/02\/funcodecopen-source-toolkit-for-neural-speech-codec\/","title":{"rendered":"FunCodec: An Open-Source Audio Codec Toolkit for Audio Quantization, Text-to-Speech Synthesis, Music Generation, and More"},"content":{"rendered":"\n<ul class=\"has-light-pink-background-color has-background\"><li><strong>Demo: <a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/funcodec.github.io\/\">funcodec.github.io\/<\/a><\/strong><\/li><li><strong>GitHub: <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/modelscope\/FunCodec\/\" target=\"_blank\">https:\/\/github.com\/modelscope\/FunCodec\/<\/a><\/strong><\/li><li><strong>Paper: <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/2309.07405\" target=\"_blank\">FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec<\/a><\/strong><\/li><\/ul>\n\n\n\n<p class=\"has-text-align-center\"><strong>A fundamental, reproducible, and integrable open-source toolkit for neural speech codecs<\/strong><\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\"><strong>Features:<\/strong><\/p>\n\n\n\n<ul><li><em>FunCodec reproduces state-of-the-art models, including SoundStream, Encodec, and others.<\/em><\/li><li><em>FunCodec can be easily extended to downstream tasks such as ASR and TTS.<\/em><\/li><li><em>FunCodec 
can train models on distributed GPUs and run inference in batch mode.<\/em><\/li><li><em>FunCodec natively supports the frequency domain, which is better suited to speech signals.<\/em><\/li><li>FunCodec models can be augmented with semantic tokens, such as phonemes and HuBERT embeddings.<\/li><\/ul>\n\n\n\n<p class=\"has-light-gray-background-color has-background\"><strong>Available models<\/strong>:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"691\" height=\"363\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image.png\" alt=\"\" class=\"wp-image-22656\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image.png 691w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-300x158.png 300w\" sizes=\"(max-width: 691px) 100vw, 691px\" \/><\/figure>\n\n\n\n<p class=\"has-light-gray-background-color has-background\"><em>audio_codec-freqcodec_ model features: a frequency-domain model that fully exploits the short-time structure of speech signals; very few parameters (0.52M); very low computational complexity (0.34G flops); trained with structured dropout, so a single model supports various bandwidths at inference; quantizes raw speech waveforms into sequences of discrete tokens.<\/em><\/p>\n\n\n\n<p class=\"has-light-gray-background-color 
has-background\"><em>audio_codec-encodec_ model features: trained on a large-scale in-house dataset and robust across many scenarios; achieves higher codec quality at low bandwidths; trained with structured dropout, so a single model supports various bandwidths at inference; quantizes raw speech waveforms into sequences of discrete tokens.<\/em><\/p>\n\n\n\n<p>Compared with&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2210.13438\">EnCodec<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2107.03312\">SoundStream<\/a>, the models are trained with the following improved techniques, which yield higher codec quality and higher&nbsp;<a href=\"https:\/\/github.com\/google\/visqol\">ViSQOL<\/a>&nbsp;scores at the same bandwidth:<\/p>\n\n\n\n<ul><li>A magnitude spectrum loss enhances the mid- and high-frequency signals.<\/li><li>Structured dropout smooths the code space and enables various bandwidths in a single model.<\/li><li>Codewords are initialized from k-means clusters rather than random values.<\/li><li>Codebooks are maintained with an exponential moving average and a dead-code elimination mechanism, so codebook utilization is high.<\/li><\/ul>\n\n\n\n<p class=\"has-light-gray-background-color has-background\"><strong>Model composition:<\/strong><\/p>\n\n\n\n<ul><li><strong>A FunCodec model consists of five modules: the domain transformation module, the encoder, the RVQ 
module, the decoder, and the domain inversion module.<\/strong><\/li><li><strong>Domain transformation: converts the signal into the time domain, the short-time frequency domain, the magnitude-angle domain, or the magnitude-phase domain.<\/strong><\/li><li><strong>Encoder: encodes the signal into a compact representation using stacked convolutional layers and LSTM layers.<\/strong><\/li><li><strong>Semantic tokens (optional): augment the encoder output with semantic tokens to enrich content information; not used in this model.<\/strong><\/li><li><strong>RVQ: quantizes the representation into parallel sequences of discrete tokens using cascaded vector quantizers.<\/strong><\/li><li><strong>Decoder: decodes the quantized embeddings into the same signal domain as the inputs.<\/strong><\/li><li><strong>Domain inversion: re-synthesizes a perceptible waveform from the different domains.<\/strong><\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-1.png\" alt=\"\" class=\"wp-image-22677\" width=\"581\" height=\"293\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-1.png 762w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-1-300x152.png 300w\" sizes=\"(max-width: 581px) 100vw, 581px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-2.png\" alt=\"\" class=\"wp-image-22678\" width=\"559\" height=\"186\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-2.png 727w, 
http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-2-300x100.png 300w\" sizes=\"(max-width: 559px) 100vw, 559px\" \/><\/figure>\n\n\n\n<p class=\"has-light-gray-background-color has-background\"><strong>Results<\/strong><\/p>\n\n\n\n<p>Compared with other open-source audio codec training frameworks:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-3.png\" alt=\"\" class=\"wp-image-22683\" width=\"439\" height=\"371\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-3.png 750w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-3-300x254.png 300w\" sizes=\"(max-width: 439px) 100vw, 439px\" \/><\/figure><\/div>\n\n\n\n<p>1. Comparison of academic models in terms of ViSQOL scores on the LibriTTS dataset. \u2020 means the model is causal.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"742\" height=\"327\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-4.png\" alt=\"\" class=\"wp-image-22686\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-4.png 742w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-4-300x132.png 300w\" sizes=\"(max-width: 742px) 100vw, 742px\" \/><\/figure>\n\n\n\n<p>2. Comparison between FunCodec and other toolkits under (a) lower and (b) higher token rate. LS denotes Librispeech test sets. 
While Librispeech and gigaspeech are English corpora, aishell and Wenet are Mandarin corpora.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-5.png\" alt=\"\" class=\"wp-image-22689\" width=\"403\" height=\"491\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-5.png 655w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-5-246x300.png 246w\" sizes=\"(max-width: 403px) 100vw, 403px\" \/><\/figure>\n\n\n\n<p>3. Comparison of FreqCodec and other time domain models in terms of ViSQOL score on LibriTTS. Mag denotes magnitude spectrogram. C_in represents the channel number of inputs.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"766\" height=\"415\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-6.png\" alt=\"\" class=\"wp-image-22692\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-6.png 766w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-6-300x163.png 300w\" sizes=\"(max-width: 766px) 100vw, 766px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Demo: funcodec.github.io\/ GitHub: https:\/\/github.com\/mode &hellip; <a href=\"http:\/\/139.9.1.231\/index.php\/2024\/12\/02\/funcodecopen-source-toolkit-for-neural-speech-codec\/\" class=\"more-link\">Continue reading<span 
class=\"screen-reader-text\">FunCodec: An Open-Source Audio Codec Toolkit for Audio Quantization, Text-to-Speech Synthesis, Music Generation, and More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[40,4,9,38,34],"tags":[],"_links":{"self":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/22635"}],"collection":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/comments?post=22635"}],"version-history":[{"count":49,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/22635\/revisions"}],"predecessor-version":[{"id":22693,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/22635\/revisions\/22693"}],"wp:attachment":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/media?parent=22635"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/categories?post=22635"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/tags?post=22635"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}