{"id":19319,"date":"2024-09-20T17:18:16","date_gmt":"2024-09-20T09:18:16","guid":{"rendered":"http:\/\/139.9.1.231\/?p=19319"},"modified":"2024-10-06T13:58:17","modified_gmt":"2024-10-06T05:58:17","slug":"mel-vq-tts","status":"publish","type":"post","link":"http:\/\/139.9.1.231\/index.php\/2024\/09\/20\/mel-vq-tts\/","title":{"rendered":"TTS Work Based on MEL Spectrograms + VQ"},"content":{"rendered":"\n<p>Zero-shot TTS work often <strong>uses the MEL spectrogram as an intermediate feature<\/strong> and then, on top of the mel spectrogram, either <strong>uses VQ to provide discrete tokens or uses a CNN to extract continuous latents<\/strong>. Work based on <strong>MEL+VQ<\/strong> includes tortoise-tts, xtts 1&amp;2, MegaTTS 1&amp;2, and BASE TTS.<\/p>\n\n\n\n\n\n<h2><strong>Tortoise-tts<\/strong><\/h2>\n\n\n\n<p>Paper: <a href=\"https:\/\/arxiv.org\/abs\/2305.07243\">https:\/\/arxiv.org\/abs\/2305.07243<\/a><\/p>\n\n\n\n<p>Tortoise-tts is a well-known open-source English TTS model. <strong>Its author now works at OpenAI and is also a major contributor to GPT-4o (by his own account on his blog).<\/strong> Tortoise-tts uses MEL+VQVAE to obtain MEL tokens for speech, then models the MEL tokens together with the text tokens using GPT-style autoregression. Speech decoding naturally takes two steps: first, a diffusion model converts the MEL 
tokens into a MEL spectrogram, a step that closely resembles text-to-image generation, which makes a diffusion model a natural choice; then a vocoder converts the MEL spectrogram into an audio waveform. Both tortoise-tts and VALL-E are built around autoregressive modeling; they differ mainly in the tokens they use.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"499\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-94-1024x499.png\" alt=\"\" class=\"wp-image-21317\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-94-1024x499.png 1024w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-94-300x146.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-94-768x374.png 768w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-94.png 1327w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2><strong>MegaTTS 1&amp;2<\/strong><\/h2>\n\n\n\n<p><a href=\"https:\/\/arxiv.org\/abs\/2306.03509\" target=\"_blank\" rel=\"noreferrer noopener\">Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/arxiv.org\/abs\/2307.07218\" target=\"_blank\" rel=\"noreferrer noopener\">Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech 
Synthesis<\/a><\/p>\n\n\n\n<p>ByteDance's MegaTTS series explicitly compresses the information carried by the speech tokens so that they encode only prosody, which depends strongly on context, and then models that prosody with GPT-style autoregression. The remaining information is handled more conventionally: timbre is largely global, so a single timbre encoder that extracts it from the reference audio is sufficient; for the semantic content of the text, the model borrows heavily from the non-autoregressive FastSpeech 2.<\/p>\n\n\n\n<p>Speech decoding again takes two steps: a MEL decoder first reconstructs the MEL spectrogram, and a vocoder then decodes it into an audio waveform. MegaTTS 
2 is broadly similar to MegaTTS 1, with improvements to timbre encoding (phoneme-level encoding and multiple reference utterances), speech prompt length (training directly on longer same-speaker speech context, which allows longer audio prompts), and duration modeling (also GPT-style autoregression), along with a larger training set. The backend TTS model of Jianying (\u526a\u6620) is megatts2. The model also performs well in the evaluations reported in other papers.<\/p>\n\n\n\n<p>The Mega-TTS paper argues that speech can be decomposed into several attributes (e.g., content, timbre, prosody, and phase), each of which should be modeled by a module with an appropriate inductive bias. From this perspective, the authors design a large zero-shot TTS system, Mega-TTS, that is trained on large-scale in-the-wild data and models different attributes in different ways: 1) Instead of using latents encoded by an audio codec as the intermediate feature, they keep the spectrogram, because it separates phase from the other attributes well. Phase can be reconstructed adequately by a GAN-based vocoder and does not need to be modeled by the language model. 2) 
They model timbre with global vectors, because timbre is a global attribute that changes slowly over time. 3) They use a VQGAN-based acoustic model to generate the spectrogram and a latent-code language model to fit the distribution of prosody, because prosody changes quickly over time within a sentence and language models can capture both local and long-range dependencies.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"405\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-95-1024x405.png\" alt=\"\" class=\"wp-image-21318\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-95-1024x405.png 1024w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-95-300x119.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-95-768x304.png 768w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-95.png 1157w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The second paper introduces a generic prompting mechanism for zero-shot TTS, Mega-TTS 
2, to address the challenges raised in the paper. Specifically, the authors design a powerful acoustic autoencoder that encodes prosody and timbre separately into a compressed latent space while providing high-quality reconstruction. They then propose a multi-reference timbre encoder and a prosody latent language model (P-LLM) to extract useful information from multi-sentence prompts, and further use the probabilities derived from multiple P-LLM outputs to produce transferable and controllable prosody. Experimental results show that Mega-TTS 2 can not only synthesize identity-preserving speech from a short prompt of an unseen speaker from an arbitrary source, but also consistently outperforms the fine-tuning method when the amount of data ranges from 10 seconds to 5 minutes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"987\" height=\"338\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-96.png\" alt=\"\" class=\"wp-image-21320\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-96.png 987w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-96-300x103.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/10\/image-96-768x263.png 768w\" sizes=\"(max-width: 987px) 100vw, 987px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Zero-shot 
TTS work often uses the MEL spectrogram as an intermediate feature and then, on top of the mel spectrogram, either uses VQ to provide discrete tok &hellip; <a href=\"http:\/\/139.9.1.231\/index.php\/2024\/09\/20\/mel-vq-tts\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\">TTS Work Based on MEL Spectrograms + VQ<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[40,4,9,38,34],"tags":[],"_links":{"self":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/19319"}],"collection":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/comments?post=19319"}],"version-history":[{"count":12,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/19319\/revisions"}],"predecessor-version":[{"id":21323,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/19319\/revisions\/21323"}],"wp:attachment":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/media?parent=19319"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/categories?post=19319"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/tags?post=19319"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}