<h1>MWER Loss Implementation</h1>

<p><em>2025-06-23</em></p>

<h2>1. Paraformer negative-sampling strategy</h2>

<p class="has-text-align-center">An implementation of the Minimum Word Error Rate (MWER) training loss based on the negative-sampling strategy from <em>Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition</em>:</p>

<p class="has-text-align-center"><a href="https://gist.github.com/TeaPoly/234429e6c2d74d10fcb4987bc541d528"><em><strong>https://gist.github.com/TeaPoly/234429e6c2d74d10fcb4987bc541d528</strong></em></a></p>

<pre class="wp-block-code"><code>import torch


def create_sampling_mask(log_softmax, n):
    """Generate the negative-sampling mask.

    Ref: "Paraformer: Fast and Accurate Parallel Transformer for
    Non-autoregressive End-to-End Speech Recognition"
    https://arxiv.org/abs/2206.08317

    Args:
        log_softmax: log-softmax outputs, float32 (batch, maxlen_out, vocab_size)
        n: number of candidate paths, int
    Returns:
        sampling_mask: the sampling mask (nbest, batch, maxlen_out, vocab_size)
    """
    b, s, v = log_softmax.size()

    # Random 0/1 mask for each of the n candidate paths
    nbest_random_mask = torch.randint(
        0, 2, (n, b, s, v), device=log_softmax.device
    )

    # Greedy-search decoding for the best path, shape (batch, maxlen_out)
    top1_score_indices = log_softmax.argmax(dim=-1)

    # One-hot mask of the top-1 token at every position
    top1_score_indices_mask = torch.zeros(
        (b, s, v), dtype=torch.int, device=log_softmax.device)
    top1_score_indices_mask.scatter_(-1, top1_score_indices.unsqueeze(-1), 1)

    # Sampling mask: the random mask restricted to top-1 token positions
    sampling_mask = nbest_random_mask * top1_score_indices_mask.unsqueeze(0)

    return sampling_mask</code></pre>

<h2>2. MWER loss for a CTC decoder</h2>

<p><a href="https://github.com/Mddct/losses/blob/main/py/mwer.py"><strong><em>https://github.com/Mddct/losses/blob/main/py/mwer.py</em></strong></a></p>

<p>Key step: computing the candidate paths with CTC prefix beam search:</p>

<p><em><strong>self.ctc_prefix_beam_decoder = CTCDecoder(beam_width=beam_width, top_paths=beam_width)</strong></em></p>

<ul><li><strong><em>wenet/wenet/transducer/search/greedy_search.py</em></strong></li><li><strong><em>wenet/wenet/transducer/search/prefix_beam_search.py</em></strong></li></ul>

<h2>3. Other references: wenet</h2>

<p class="has-text-align-center"><a rel="noreferrer noopener" href="https://github.com/wenet-e2e/wenet/tree/main#" target="_blank"><em><strong>https://github.com/wenet-e2e/wenet/tree/main#</strong></em></a></p>

<p class="has-text-align-center"><strong><em>wenet/wenet/transformer/ctc.py</em></strong></p>

<p class="has-text-align-center"><strong><em>wenet/wenet/transformer/label_smoothing_loss.py</em></strong></p>

<p class="has-text-align-center"><em><strong>Beam-search decoding for attention-based ASR models: </strong></em><strong><em>wenet/wenet/transformer/search.py</em></strong></p>
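<p>The defining property of <code>create_sampling_mask</code> is that the random mask is intersected with the one-hot top-1 mask, so negatives can only be sampled at positions on the greedy path. A minimal self-contained check of that invariant (shapes and seed here are arbitrary):</p>

```python
import torch

# Re-create the mask logic inline (mirrors create_sampling_mask above) and
# check its key invariant: the mask is nonzero only at top-1 token positions.
torch.manual_seed(0)
n, b, s, v = 3, 2, 4, 6  # nbest, batch, maxlen_out, vocab_size
log_softmax = torch.randn(b, s, v).log_softmax(dim=-1)

nbest_random_mask = torch.randint(0, 2, (n, b, s, v))
top1 = log_softmax.argmax(dim=-1)                      # (b, s)
top1_mask = torch.zeros(b, s, v, dtype=torch.int)
top1_mask.scatter_(-1, top1.unsqueeze(-1), 1)          # one-hot over vocab
sampling_mask = nbest_random_mask * top1_mask.unsqueeze(0)

# At most one active vocab entry per position, and every active entry
# sits on the greedy (top-1) token.
assert torch.all(sampling_mask.sum(dim=-1) <= 1)
assert torch.all((sampling_mask.argmax(dim=-1) == top1.unsqueeze(0))
                 | (sampling_mask.sum(dim=-1) == 0))
```

Each of the <code>n</code> slices of the mask then marks a different random subset of greedy-path tokens to perturb, which is how the n candidate paths differ from one another.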
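<p>The key step of section 2, generating n-best candidate paths with CTC prefix beam search, can be sketched in plain Python. This is a simplified version of the idea behind wenet's <em>prefix_beam_search.py</em>; the function names here are illustrative, not wenet's API:</p>

```python
import math
from collections import defaultdict


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_width=3, blank=0):
    """log_probs: (T, V) per-frame log-probabilities as nested lists.

    Returns up to beam_width (prefix, log_prob) pairs, best first.
    """
    # Each prefix keeps two scores: log-probability of paths ending in
    # blank (pb) and of paths ending in a non-blank token (pnb).
    beams = {(): (0.0, -math.inf)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (-math.inf, -math.inf))
        for prefix, (pb, pnb) in beams.items():
            for c, lp in enumerate(frame):
                if c == blank:
                    # Blank extends the same prefix.
                    nb, nnb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(nb, pb + lp, pnb + lp), nnb)
                elif prefix and c == prefix[-1]:
                    # Repeated token: same prefix if no blank in between,
                    # extended prefix only if the path last ended in blank.
                    nb, nnb = next_beams[prefix]
                    next_beams[prefix] = (nb, logsumexp(nnb, pnb + lp))
                    ext = prefix + (c,)
                    nb, nnb = next_beams[ext]
                    next_beams[ext] = (nb, logsumexp(nnb, pb + lp))
                else:
                    ext = prefix + (c,)
                    nb, nnb = next_beams[ext]
                    next_beams[ext] = (nb, logsumexp(nnb, pb + lp, pnb + lp))
        # Prune to the beam_width most probable prefixes.
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: -logsumexp(*kv[1]))[:beam_width])
    return [(p, logsumexp(*scores)) for p, scores in beams.items()]
```

For example, with two frames of p(blank)=0.4 and p(token 1)=0.6, the search merges the alignments (1,1), (1,blank) and (blank,1) into the single prefix (1,) with total probability 0.36 + 0.24 + 0.24 = 0.84, which is exactly the merging of label-equivalent paths that makes this decoding (rather than greedy search) suitable for scoring n-best candidates in an MWER loss.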
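<p>Given the n-best candidates, the MWER loss itself renormalizes the sequence probabilities over the candidate set and weights each candidate by its word-error count relative to the average (the standard formulation from Prabhavalkar et al., "Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models", which the linked implementations follow in spirit). A minimal torch sketch; the tensor layout is an assumption, not the linked code's exact interface:</p>

```python
import torch


def mwer_loss(seq_log_probs, word_errors):
    """Minimum Word Error Rate loss over an n-best candidate list.

    Args:
        seq_log_probs: (nbest, batch) total log-probability of each candidate
            sequence (sum of its token log-probabilities).
        word_errors: (nbest, batch) float edit distance of each candidate
            against the reference transcript.
    Returns:
        Scalar loss: expected relative word error over the n-best list.
    """
    # Renormalize probability mass over the n-best candidates only.
    probs = torch.softmax(seq_log_probs, dim=0)
    # Subtracting the mean error acts as a variance-reducing baseline.
    avg_err = word_errors.mean(dim=0, keepdim=True)
    return (probs * (word_errors - avg_err)).sum(dim=0).mean()
```

Minimizing this loss pushes probability mass toward candidates with fewer word errors; when the low-error candidate already carries the higher score, the loss goes negative, and with uniform candidate scores it is exactly zero.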