{"id":19008,"date":"2024-09-21T15:14:12","date_gmt":"2024-09-21T07:14:12","guid":{"rendered":"http:\/\/139.9.1.231\/?p=19008"},"modified":"2024-12-18T14:43:52","modified_gmt":"2024-12-18T06:43:52","slug":"qwen2-technical-report","status":"publish","type":"post","link":"http:\/\/139.9.1.231\/index.php\/2024\/09\/21\/qwen2-technical-report\/","title":{"rendered":"Qwen2 \u6280\u672f\u62a5\u544a"},"content":{"rendered":"\n<p class=\"has-text-align-center\"><strong>Abs\uff1a<a href=\"https:\/\/arxiv.org\/abs\/2407.10671\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/arxiv.org\/abs\/2407.10671<\/a><br>Code\uff1a<a href=\"https:\/\/github.com\/QwenLM\/Qwen2\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/QwenLM\/Qwen2<\/a><\/strong><\/p>\n\n\n\n<p>Qwen \u56e2\u961f\u8fd1\u65e5\u53d1\u5e03\u4e86\u76ee\u524d\u6700\u5f3a\u5f00\u6e90\u5927\u6a21\u578b Qwen2 \u7684\u6280\u672f\u62a5\u544a\u3002\u6587\u4e2d\u4ecb\u7ecd\u4e86\u9884\u8bad\u7ec3\u548c\u5bf9\u9f50\u8fc7\u7a0b\u4e2d\u6240\u91c7\u7528\u7684\u6280\u672f\u548c\u65b9\u6cd5\uff0c\u6700\u540e\u5bf9\u6a21\u578b\u8fdb\u884c\u4e86\u8be6\u7ec6\u7684\u8bc4\u4f30\u3002<\/p>\n\n\n\n\n\n<h2>&nbsp;Abstract<\/h2>\n\n\n\n<p>\u53d1\u5e03\u4e86\u4e00\u7cfb\u5217\u7684 Base \u6a21\u578b\u548c Instruct \u6a21\u578b\uff0c\u53c2\u6570\u91cf\u4ece 0.5B \u5230 72B\uff0c\u5305\u62ec dense \u7cfb\u5217\u6a21\u578b\u548c\u4e00\u4e2a MoE \u6a21\u578b\u3002Qwen2 \u8d85\u8d8a\u4e86\u5305\u62ec Qwen1.5 \u5728\u5185\u7684\u5927\u591a\u6570\u5148\u524d\u7684\u5f00\u6e90\u6a21\u578b\u3002\u4e0e\u95ed\u6e90\u6a21\u578b\u76f8\u6bd4\uff0c\u5728\u8bed\u8a00\u7406\u89e3\u3001\u751f\u6210\u3001\u591a\u8bed\u8a00\u3001\u4ee3\u7801\u3001\u6570\u5b66\u3001\u63a8\u7406\u7b49\u4e0d\u540c\u57fa\u51c6\u4e0a\u4e5f\u8868\u73b0\u51fa\u4e86\u6709\u7ade\u4e89\u529b\u7684\u6027\u80fd\u3002\u65d7\u8230\u7684 Qwen2-72B-Base \u548c Qwen2-72B-Instruct 
\u5728\u8bb8\u591a\u8bc4\u6d4b\u96c6\u4e0a\u8868\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0cQwen2 \u4e5f\u5c55\u793a\u51fa\u4e86\u9c81\u68d2\u7684\u591a\u8bed\u8a00\u6027\u80fd\uff0c\u7cbe\u901a\u5927\u7ea6 30 \u79cd\u8bed\u8a00\uff0c\u51f8\u663e\u4e86\u5176\u5168\u80fd\u6027\u3002Qwen2 \u6a21\u578b\u6743\u91cd\u5728 Hugging Face \u548c ModelScope \u4e0a\u8fdb\u884c\u4e86\u5f00\u6e90\uff0c\u6837\u4f8b\u4ee3\u7801\u5728 Github \u4e0a\u8fdb\u884c\u4e86\u5f00\u6e90\u3002<\/p>\n\n\n\n<p><strong>\u652f\u6301\u7684\u8bed\u8a00\uff1a<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"500\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-63-1024x500.png\" alt=\"\" class=\"wp-image-23224\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-63-1024x500.png 1024w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-63-300x146.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-63-768x375.png 768w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/12\/image-63.png 1198w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2>Introduction<\/h2>\n\n\n\n<p class=\"has-text-align-left\">\u968f\u7740 ChatGPT \u7684\u95ee\u4e16\uff0c\u5728\u5168\u7403\u6380\u8d77\u4e86 LLM \u7684\u70ed\u6f6e\u3002Llama \u7cfb\u5217\u6a21\u578b\u7684\u53d1\u5e03\u8fdb\u4e00\u6b65\u6fc0\u53d1\u4e86\u5f00\u6e90\u793e\u533a\u7684\u5174\u8da3\u3002\u6700\u8fd1\uff0cClaude-3 Opus \u548c GPT-4o \u8fc5\u901f\u767b\u9876 Chatbot Arena \u738b\u5ea7\u3002\u6b64\u5916\uff0cLlama-3 \u5df2\u7ecf\u6210\u4e3a SOTA \u7684\u5f00\u6e90\u7cfb\u5217\u6a21\u578b\uff0c\u7f29\u5c0f\u4e86\u4e0e\u9886\u5148\u7684\u95ed\u6e90\u6a21\u578b\u7684\u5dee\u8ddd\uff0c\u88ab\u8ba4\u4e3a\u8fbe\u5230\u4e86 GPT-4 \u6c34\u5e73\u3002\u8d8a\u6765\u8d8a\u591a\u7684 LLM \u6b63\u5728\u8ffd\u6c42 OpenAI \u7684 GPT \u7cfb\u5217\u7684\u8fdb\u6b65\uff0c\u5176\u4e2d\u5305\u62ec 
Qwen\u3001Mistral\u3001Gemma \u7b49\u5728\u5185\u7684\u8bb8\u591a\u7cfb\u5217\u90fd\u662f\u4ee5\u5f00\u6e90\u5f62\u5f0f\u53d1\u5e03\u7684\u3002<\/p>\n\n\n\n<p class=\"has-text-align-left\">\u8fd1\u51e0\u4e2a\u6708\u6765\uff0cQwen \u56e2\u961f\u6210\u529f\u53d1\u5e03\u4e86 Qwen \u7cfb\u5217\u6a21\u578b\u5e76\u8fdb\u5316\u4e3a Qwen1.5\u3002\u540c\u65f6\uff0cQwen \u56e2\u961f\u63a8\u51fa\u4e86 vision-language \u6a21\u578b Qwen-VL \u548c audio-language \u6a21\u578b Qwen-Audio\u3002<\/p>\n\n\n\n<p class=\"has-text-align-left\">\u5728\u8fd9\u7bc7\u6280\u672f\u62a5\u544a\u4e2d\uff0cQwen \u56e2\u961f\u53d1\u5e03\u4e86\u6700\u65b0\u7684 Qwen2 \u7cfb\u5217 LLM\u3002Transformer \u67b6\u6784\uff0cnext-token prediction \u8bad\u7ec3\u3002\u53d1\u5e03\u4e86\u5305\u62ec 0.5B\u30011.5B\u30017B\u300172B \u5728\u5185\u7684 4 \u4e2a dense \u6a21\u578b\u4ee5\u53ca 1 \u4e2a 57B \u7684 MoE \u6a21\u578b\uff08\u6bcf\u4e2a token \u6fc0\u6d3b\u5176\u4e2d 14B \u53c2\u6570\uff09\u3002\u6240\u6709\u7684\u6a21\u578b\u5728\u4e00\u4e2a\u8d85\u8fc7 7T tokens \u7684\u9ad8\u8d28\u91cf\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u6570\u636e\u6db5\u76d6\u5e7f\u6cdb\u7684\u9886\u57df\u548c\u8bed\u79cd\u3002\u4e0e\u4e4b\u524d\u7684 Qwen \u7248\u672c\u76f8\u6bd4\uff0cQwen2 \u5305\u542b\u4e86\u66f4\u5e7f\u6cdb\u7684\u8bed\u8a00\u6570\u636e\uff0c\u63d0\u9ad8\u4e86\u4ee3\u7801\u548c\u6570\u5b66\u6570\u636e\u7684\u6570\u91cf\u548c\u8d28\u91cf\u3002\u901a\u5e38\u8ba4\u4e3a\u8fd9\u53ef\u4ee5\u63d0\u9ad8 LLM \u7684\u63a8\u7406\u80fd\u529b\u3002\u9884\u8bad\u7ec3\u540e\uff0c\u6240\u6709\u6a21\u578b\u90fd\u8fdb\u884c\u4e86 SFT \u548c DPO\u3002<\/p>\n\n\n\n<p class=\"has-text-align-left\">Qwen \u56e2\u961f\u5bf9 Qwen2 \u8fdb\u884c\u4e86\u5168\u9762\u8bc4\u4f30\uff0c\u57fa\u672c\u90fd\u4f18\u4e8e\u7ade\u54c1\u6a21\u578b\u3002Qwen2-72B-Instruct \u5728 MT-Bench \u5f97\u5206\u4e3a 9.1\uff0cArena-Hard 48.1\uff0cLiveCodeBench 35.7\u3002Qwen2-72B-Base \u5728 MMLU \u5f97\u5206 
84.2\uff0cGPQA 37.9\uff0cHumanEval 64.6\uff0cGSM8K 89.5\uff0cBBH 82.4\u3002<\/p>\n\n\n\n<h2>Tokenizer &amp; Model<\/h2>\n\n\n\n<p>\u8fd9\u4e00\u8282\u4ecb\u7ecd Qwen2 \u7684 Tokenizer \u548c\u6a21\u578b\u8bbe\u8ba1\u3002<\/p>\n\n\n\n<h3>Tokenizer<\/h3>\n\n\n\n<p>\u91c7\u7528\u4e0e Qwen \u76f8\u540c\u7684\u57fa\u4e8e byte-leval \u7684 BPE Tokenizer\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u8fd9\u79cd Tokenizer \u5177\u6709\u5f88\u9ad8\u7684\u7f16\u7801\u6548\u7387\uff0c\u4ece\u800c\u4fc3\u8fdb\u4e86 Qwen2 \u7684\u591a\u8bed\u8a00\u80fd\u529b\u3002\u3010\u5b57\u8282\u5bf9\u7f16\u7801\uff08BPE, Byte Pair Encoder\uff09\uff0c\u53c8\u79f0 digram coding \u53cc\u5b57\u6bcd\u7ec4\u5408\u7f16\u7801\uff0c\u662f\u4e00\u79cd<strong>\u6570\u636e\u538b\u7f29<\/strong>&nbsp;\u7b97\u6cd5\uff0c\u7528\u6765\u5728\u56fa\u5b9a\u5927\u5c0f\u7684\u8bcd\u8868\u4e2d\u5b9e\u73b0\u53ef\u53d8\u2ed3\u5ea6\u7684\u5b50\u8bcd\u3002\u8be5\u7b97\u6cd5\u7b80\u5355\u6709\u6548\uff0c\u56e0\u800c\u76ee\u524d\u5b83\u662f\u6700\u6d41\u884c\u7684\u65b9\u6cd5\u3002\u3011<\/p>\n\n\n\n<p>\u8bcd\u6c47\u8868\u5305\u542b 151643 \u4e2a\u5e38\u89c4 tokens \u548c 3 \u4e2a \u7279\u6b8a\u63a7\u5236 tokens\u3002<\/p>\n\n\n\n<h3>Model Architecture<\/h3>\n\n\n\n<p>\u57fa\u4e8e<strong>decoder-only<\/strong>  Transformer \u67b6\u6784\uff0c\u5e26\u6709 causal masks \u7684 self-attention\u3002<\/p>\n\n\n\n<p>\u5305\u62ec 4 \u4e2a\u89c4\u6a21\u7684 dense \u6a21\u578b\u548c 1 \u4e2a MoE \u6a21\u578b\u3002<\/p>\n\n\n\n<h4>Qwen2 Dense Model:<\/h4>\n\n\n\n<p>\u6a21\u578b\u67b6\u6784\u5305\u62ec\u591a\u4e2a Transformer \u5c42\uff0c\u6bcf\u5c42\u5177\u6709 causal attention \u673a\u5236\u548c FFN\u3002<\/p>\n\n\n\n<p>\u4e0e\u5148\u524d Qwen \u7cfb\u5217\u6a21\u578b\u7684\u4e3b\u8981\u533a\u522b\u5982\u4e0b\uff1a<\/p>\n\n\n\n<p>1\u3001\u91c7\u7528 GQA \u6765\u4ee3\u66ff\u4e86\u4f20\u7edf\u7684 multi-head attention\u3002GQA \u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u4f18\u5316\u4e86 KV 
cache\uff0c\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u541e\u5410\u91cf\u3002<\/p>\n\n\n\n<p><em>Paper\uff1aGQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2305.13245<\/em><\/p>\n\n\n\n<p>2\u3001\u4e3a\u4e86\u6269\u5c55 Qwen2 \u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\uff0c\u5b9e\u73b0\u4e86 DCA\uff0c\u5c06\u957f\u5e8f\u5217\u5206\u5272\u6210\u5177\u6709\u53ef\u7ba1\u7406\u957f\u5ea6\u7684 chunks\u3002\u5982\u679c\u8f93\u5165\u53ef\u4ee5\u5728\u4e00\u4e2a chunk \u4e2d\u5904\u7406\uff0c\u5219 DCA \u8ddf\u539f\u59cb attention \u8f93\u51fa\u76f8\u540c\u7684\u7ed3\u679c\u3002\u5426\u5219\uff0cDCA \u901a\u8fc7\u6709\u6548\u6355\u83b7 chunks \u5185\u548c chunks \u4e4b\u95f4 tokens \u7684\u76f8\u5bf9\u4f4d\u7f6e\u4fe1\u606f\u6765\u63d0\u5347\u4e0a\u4e0b\u6587\u6027\u80fd\u3002\u6b64\u5916\uff0c\u8fd8\u901a\u8fc7 YARN \u6765\u91cd\u65b0\u7f29\u653e\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u4ee5\u5b9e\u73b0\u66f4\u597d\u7684\u957f\u5ea6\u5916\u63a8\u3002<\/p>\n\n\n\n<ul><li><em>Paper\uff1aTraining-Free Long-Context Scaling of Large Language Models<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2402.17463<\/em><\/li><li><em>Paper\uff1aYaRN: Efficient Context Window Extension of Large Language Models<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2309.00071<\/em><\/li><\/ul>\n\n\n\n<p>3\u3001\u6fc0\u6d3b\u51fd\u6570 SwiGLU\uff0c\u4f4d\u7f6e\u7f16\u7801 RoPE\uff0cQKV bias\uff0cRMSNorm\uff0cpre-norm\u3002<\/p>\n\n\n\n<h4>Qwen2 Mixture-of-Experts Model<\/h4>\n\n\n\n<p>Qwen2 MoE \u7684\u67b6\u6784\u548c Qwen1.5-MoE-A2.7B \u975e\u5e38\u76f8\u4f3c\u3002<\/p>\n\n\n\n<p>MoE FFN \u7531 n \u4e2a\u5355\u72ec\u7684 FFN \u7ec4\u6210\uff0c\u6bcf\u4e2a FFN \u662f\u4e00\u4e2a\u4e13\u5bb6\u3002\u8f93\u5165\u7684\u6bcf\u4e2a token \u88ab\u5b9a\u5411\u5230\u7279\u5b9a\u7684\u4e13\u5bb6&nbsp;<em>Ei<\/em>&nbsp;\u6765\u6839\u636e\u95e8\u63a7\u7f51\u7edc&nbsp;<em>G<\/em>&nbsp;\u5206\u914d\u7684\u6982\u7387\u8fdb\u884c\u8ba1\u7b97\uff1a<\/p>\n\n\n\n<div 
class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-109.png\" alt=\"\" class=\"wp-image-19494\" width=\"203\" height=\"78\"\/><\/figure><\/div>\n\n\n\n<p>\u63a5\u4e0b\u6765\uff0c\u4ecb\u7ecd Qwen2 MoE \u4e2d\u91cd\u8981\u7684\u8bbe\u8ba1\u8003\u8651\u3002<\/p>\n\n\n\n<ul><li>&nbsp;<strong>Expert Granularity<\/strong><\/li><\/ul>\n\n\n\n<p>MoE \u6a21\u578b\u548c dense \u6a21\u578b\u95f4\u7684\u5173\u952e\u7ed3\u6784\u5dee\u5f02\u5728\u4e8e MoE \u5c42\u6709\u591a\u4e2a FFN\uff0c\u6bcf\u4e2a FFN \u662f\u4e00\u4e2a\u72ec\u7acb\u7684\u4e13\u5bb6\u3002\u56e0\u6b64\uff0c\u76f4\u63a5\u5c06\u6bcf\u4e2a\u4e13\u5bb6\u7684\u53c2\u6570\u8bbe\u7f6e\u4e3a\u539f\u59cb dense \u6a21\u578b\u4e2d FFN \u7684\u53c2\u6570\u5373\u53ef\u4ece dense \u67b6\u6784\u8fc7\u5ea6\u5230 MoE \u67b6\u6784\u3002<\/p>\n\n\n\n<p>Mistral-7B \u5230 Mixtral 8x7B \u6bcf\u6b21\u4ece 8 \u4e2a\u4e13\u5bb6\u4e2d\u6fc0\u6d3b 2 \u4e2a\u3002<strong>\u800c Qwen2 MoE \u91c7\u7528\u7ec6\u7c92\u5ea6\u4e13\u5bb6\uff0c\u5728\u521b\u5efa\u5c0f\u89c4\u6a21\u4e13\u5bb6\u7684\u540c\u65f6\u6fc0\u6d3b\u66f4\u591a\u6570\u91cf\u7684\u4e13\u5bb6<\/strong>\u3002\u7ed9\u5b9a\u76f8\u540c\u6570\u91cf\u7684\u4e13\u5bb6\u53c2\u6570\u548c\u6fc0\u6d3b\u53c2\u6570\uff0c\u7ec6\u7c92\u5ea6\u4e13\u5bb6\u53ef\u4ee5\u63d0\u4f9b\u66f4\u4e30\u5bcc\u7684\u4e13\u5bb6\u7ec4\u5408\u3002<\/p>\n\n\n\n<p>\u901a\u8fc7\u5229\u7528\u8fd9\u4e9b\u7ec6\u7c92\u5ea6\u4e13\u5bb6\uff0cQwen2 MoE \u4fc3\u8fdb\u4e86\u66f4\u591a\u6837\u548c\u52a8\u6001\u7684\u4e13\u5bb6\u4f7f\u7528\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u6574\u4f53\u7684\u6027\u80fd\u548c\u9002\u914d\u6027\u3002<\/p>\n\n\n\n<ul><li><strong>Expert Routing<\/strong><\/li><\/ul>\n\n\n\n<p>\u4e13\u5bb6\u8def\u7531\u673a\u5236\u7684\u8bbe\u8ba1\u5bf9\u63d0\u5347 MoE \u6a21\u578b\u7684\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002<\/p>\n\n\n\n<p>\u8fd1\u671f\uff0c\u5728 MoE \u5c42\u5185\u6574\u5408 
shared \u548c routing-specific \u4e13\u5bb6\u6709\u7740\u663e\u8457\u8d8b\u52bf\u3002Qwen2 MoE \u91c7\u7528\u4e86\u8fd9\u79cd\u65b9\u6cd5\uff0c\u56e0\u4e3a\u8fd9\u6837\u4fc3\u8fdb\u4e86 shared \u4e13\u5bb6\u5728\u4e0d\u540c\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\uff0c\u540c\u65f6\u4fdd\u7559\u4e86\u5176\u4ed6\u4e13\u5bb6\u5728\u7279\u5b9a\u8def\u7531\u573a\u666f\u4e2d\u7684\u9009\u62e9\u6027\u5e94\u7528\u3002shared \u548c specialized \u4e13\u5bb6\u7684\u5f15\u5165\u4e3a MoE \u8def\u7531\u673a\u5236\u63d0\u4f9b\u4e86\u4e00\u79cd\u9002\u5e94\u6027\u66f4\u5f3a\u3001\u66f4\u6709\u6548\u7684\u65b9\u6848\u3002<\/p>\n\n\n\n<ul><li><strong>Expert Initialization<\/strong><\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper\uff1aSparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2212.05055<\/code><\/pre>\n\n\n\n<p>\u521d\u59cb\u5316\u4e13\u5bb6\u7684\u65b9\u5f0f\u4e0e upcycling \u76f8\u4f3c\uff0c\u5229\u7528 dense \u6a21\u578b\u7684\u6743\u91cd\u3002\u4f46 Qwen2 MoE \u7684\u65b9\u6cd5\u5f3a\u8c03\u7ec6\u7c92\u5ea6\u4e13\u5bb6\u4e4b\u95f4\u7684\u591a\u6837\u5316\u3002<\/p>\n\n\n\n<p>\u7ed9\u5b9a\u6307\u5b9a\u7ec6\u7c92\u5ea6\u4e13\u5bb6\u7684\u4e2d\u95f4\u5c42\u5927\u5c0f&nbsp;<em>h<\/em>E&nbsp;\uff0c\u4e13\u5bb6\u6570\u91cf&nbsp;<em>n<\/em>&nbsp;\uff0c\u539f\u59cb FFN \u4e2d\u95f4\u5c42\u5927\u5c0f&nbsp;<em>h<\/em>FFN&nbsp;\uff0c\u5219\u5c06 FFN \u590d\u5236&nbsp;\u2308<em>h<\/em>FFN<em>n<\/em>\u00d7<em>h<\/em>E\u2309&nbsp;\u6b21\u3002\u53ef\u4ee5\u5bb9\u7eb3\u4efb\u610f\u6570\u91cf\u7684\u4e13\u5bb6\u3002<\/p>\n\n\n\n<p>\u4e3a\u4e86\u4fc3\u8fdb\u6bcf\u4e2a FFN copy \u7684\u591a\u6837\u6027\uff0c\u5bf9\u53c2\u6570\u6cbf\u7740\u4e2d\u95f4\u5c42\u7eac\u5ea6\u8fdb\u884c shuffle\u3002\u8fd9\u4fdd\u8bc1\u4e86\u6bcf\u4e2a\u7ec6\u7c92\u5ea6\u4e13\u5bb6\u5373\u4f7f\u5728\u4e0d\u540c\u7684 FFN copy \u4e2d\u8868\u73b0\u51fa\u72ec\u7279\u7684\u7279\u5f81\u3002<\/p>\n\n\n\n<p>\u4e4b\u540e\uff0c\u4ece FFN copies 
\u4e2d\u63d0\u53d6\u51fa\u7ec6\u7c92\u5ea6\u4e13\u5bb6\uff0c\u4e22\u5f03\u5176\u4f59\u7684\u7ef4\u5ea6\u3002<\/p>\n\n\n\n<p>\u5bf9\u4e8e\u6bcf\u4e2a\u7ec6\u7c92\u5ea6\u4e13\u5bb6\uff0c\u5c06 50% \u7684\u53c2\u6570\u91cd\u65b0\u968f\u673a\u521d\u59cb\u5316\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u5f15\u5165\u4e86\u989d\u5916\u7684\u968f\u673a\u6027\uff0c\u53ef\u80fd\u6709\u52a9\u4e8e\u589e\u5f3a\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u63a2\u7d22\u80fd\u529b\u3002<\/p>\n\n\n\n<h4>Model Configuration<\/h4>\n\n\n\n<p>5 \u4e2a\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\uff1a<\/p>\n\n\n\n<ul><li>&nbsp;Qwen2-0.5B<\/li><li>&nbsp;Qwen2-1.5B<\/li><li>&nbsp;Qwen2-7B<\/li><li>&nbsp;Qwen2-57B-A14B<\/li><li>&nbsp;Qwen2-72B<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"808\" height=\"396\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-110.png\" alt=\"\" class=\"wp-image-19496\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-110.png 808w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-110-300x147.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-110-768x376.png 768w\" sizes=\"(max-width: 808px) 100vw, 808px\" \/><\/figure>\n\n\n\n<p>Qwen2-57B-A14B \u662f\u4ece Qwen2-7B \u5347\u7ea7\u800c\u6765\u3002<\/p>\n\n\n\n<p>\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4e0e Qwen1.5 \u76f8\u6bd4\uff0cQwen2 \u6a21\u578b\u6bcf\u4e2a token \u7684 Key-Value\uff08KV\uff09\u5927\u5c0f\u8981\u4f4e\u5f97\u591a\u3002\u8fd9\u6837\u53ef\u4ee5\u51cf\u5c11\u5185\u5b58\u5360\u7528\uff0c\u5728\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u4e2d\u66f4\u6709\u4f18\u52bf\u3002<\/p>\n\n\n\n<h2>Pre-Training<\/h2>\n\n\n\n<p>\u5bf9\u4e8e Qwen2 \u7684\u9884\u8bad\u7ec3\uff0c\u91cd\u70b9\u662f\u4f18\u5316\u6570\u636e\u96c6\u5e76\u4e14\u63a2\u7d22\u4e0a\u4e0b\u6587\u957f\u5ea6\u6269\u5c55\u7684\u6709\u6548\u65b9\u6cd5\u3002<\/p>\n\n\n\n<h3>Pre-Training 
Data<\/h3>\n\n\n\n<p>\u4e00\u4e2a\u65b0\u7684\u3001\u5927\u89c4\u6a21\u3001\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u8a00\u6570\u636e\u96c6\u3002<\/p>\n\n\n\n<p>\u4e0e\u4e4b\u524d\u7684 Qwen \u548c Qwen1.5 \u76f8\u6bd4\uff0c\u5728\u8bed\u6599\u5e93\u4e0a\u6709\u6240\u6539\u8fdb\uff0c\u589e\u5f3a\u4e86\u51e0\u4e2a\u5173\u952e\u9886\u57df\u7684\u6570\u636e\u89c4\u6a21\u3001\u8d28\u91cf\u548c\u591a\u6837\u6027\u3002<\/p>\n\n\n\n<h4>Quality Enhancement<\/h4>\n\n\n\n<p><strong>\u901a\u8fc7\u989d\u5916\u7684\u542f\u53d1\u5f0f\u7b97\u6cd5\u548c\u57fa\u4e8e\u6a21\u578b\u7684\u65b9\u6cd5\u5bf9\u6570\u636e\u8fc7\u6ee4\u7b97\u6cd5\u8fdb\u884c\u6539\u8fdb\uff0c\u5305\u62ec\u4f7f\u7528 Qwen \u6a21\u578b\u8fc7\u6ee4\u51fa\u4f4e\u8d28\u91cf\u7684\u6570\u636e\u3002\u6b64\u5916\uff0c\u8fd9\u4e9b\u6a21\u578b\u88ab\u7528\u4e8e\u5408\u6210\u9ad8\u8d28\u91cf\u7684\u9884\u8bad\u7ec3\u6570\u636e\u3002<\/strong><\/p>\n\n\n\n<h4>Data Expansion<\/h4>\n\n\n\n<p>\u4e0e Qwen1.5 \u76f8\u6bd4\uff0c<strong>Qwen2 \u6536\u96c6\u4e86\u5927\u91cf\u9ad8\u8d28\u91cf\u7684\u4ee3\u7801\u3001\u6570\u5b66\u548c\u591a\u8bed\u8a00\u6570\u636e\uff0c\u589e\u5f3a\u4e86\u6a21\u578b\u5bf9\u5e94\u7684\u80fd\u529b\u3002<\/strong><\/p>\n\n\n\n<p>\u65b0\u7684\u6570\u636e\u96c6\u652f\u6301\u5927\u7ea6 30 \u79cd\u8bed\u8a00\uff0c\u4f8b\u5982\u82f1\u8bed\u3001\u4e2d\u6587\u3001\u897f\u73ed\u7259\u8bed\u3001\u6cd5\u8bed\u3001\u5fb7\u8bed\u3001\u963f\u62c9\u4f2f\u8bed\u3001\u4fc4\u8bed\u3001\u97e9\u8bed\u3001\u65e5\u8bed\u3001\u6cf0\u8bed\u548c\u8d8a\u5357\u8bed\u3002<\/p>\n\n\n\n<h4>Distribution Improvement<\/h4>\n\n\n\n<p>\u4e3a\u4e86\u786e\u4fdd\u6a21\u578b\u5b66\u4e60\u7684\u5206\u5e03\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\uff0c\u5bf9\u5c0f\u6a21\u578b\u8fdb\u884c\u4e86\u5b9e\u9a8c\u6765\u4f18\u5316\u4e0d\u540c\u6765\u6e90\u548c\u9886\u57df\u6570\u636e\u7684\u6df7\u5408\u3002<\/p>\n\n\n\n<p>\u57fa\u4e8e\u6b64\uff0c\u9884\u8bad\u7ec3\u6570\u636e\u4ece Qwen1.5 \u7684 3T tokens \u589e\u52a0\u5230\u4e86 7T 
tokens\u3002\u8fdb\u4e00\u6b65\u653e\u5bbd\u9608\u503c\u5219\u6709 12T tokens\u3002<\/p>\n\n\n\n<p>\u7136\u800c\uff0c\u5728 12T tokens \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u7684\u6a21\u578b\uff080.5B\uff09\u6ca1\u6709\u6bd4 7T tokens \u4e0a\u7684\u6a21\u578b\u6709\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u53ef\u4ee5\u6000\u7591\u6570\u636e\u91cf\u7684\u589e\u52a0\u4e0d\u4e00\u5b9a\u6709\u76ca\u4e8e\u6a21\u578b\u7684\u8bad\u7ec3\u3002<\/p>\n\n\n\n<p>\u56e0\u6b64\uff0c\u8003\u8651\u5230\u6210\u672c\uff0c\u9009\u62e9\u4f7f\u7528\u66f4\u9ad8\u8d28\u91cf\u7684 7T tokens \u6570\u636e\u96c6\u6765\u8bad\u7ec3\u66f4\u5927\u7684\u6a21\u578b\u3002<\/p>\n\n\n\n<p>\u6240\u6709\u7684 Qwen2 dense \u6a21\u578b\uff08\u4e0d\u5305\u62ec Qwen2-0.5B\uff09\u90fd\u662f\u5728 7T tokens \u6570\u636e\u96c6\u4e0a\u9884\u8bad\u7ec3\u7684\u3002Qwen-0.5B \u5728 12T tokens \u6570\u636e\u96c6\u4e0a\u9884\u8bad\u7ec3\u3002MoE \u6a21\u578b\u6839\u636e upcycling \u539f\u5219\u989d\u5916\u7ecf\u8fc7\u4e864.5T tokens \u9884\u8bad\u7ec3\u3002<\/p>\n\n\n\n<p>\u4e0e\u4e4b\u524d\u7684 Qwen \u6a21\u578b\u7c7b\u4f3c\uff0c\u9ad8\u8d28\u91cf\u7684\u591a\u4efb\u52a1 instruction \u6570\u636e\u88ab\u6574\u5408\u5230 Qwen2 \u7684\u9884\u8bad\u7ec3\u4e4b\u4e2d\uff0c\u4ee5\u63d0\u9ad8\u6a21\u578b\u7684 ICL \u548c instruction-following \u80fd\u529b\u3002<\/p>\n\n\n\n<h3>Long-Context Training<\/h3>\n\n\n\n<p>\u4e3a\u4e86\u589e\u5f3a Qwen2 \u7684\u957f\u4e0a\u4e0b\u6587\u80fd\u529b\uff0c\u5728\u9884\u8bad\u7ec3\u7684\u7ed3\u675f\u9636\u6bb5\u5c06\u4e0a\u4e0b\u6587\u957f\u5ea6\u4ece 4096 tokens \u589e\u52a0\u5230\u4e86 32768 tokens\u3002\u8fd9\u4e00\u6269\u5c55\u8fc7\u7a0b\u5f15\u5165\u4e86\u5927\u91cf\u9ad8\u8d28\u91cf\u3001\u66f4\u957f\u7684\u6570\u636e\u3002\u540c\u65f6\u5c06 RoPE \u7684\u57fa\u9891\u4ece 10000 \u4fee\u6539\u4e3a 
1000000\uff0c\u4ee5\u4f18\u5316\u957f\u4e0a\u4e0b\u6587\u573a\u666f\u4e2d\u7684\u6027\u80fd\u3002<\/p>\n\n\n\n<p>\u4e3a\u4e86\u5145\u5206\u5229\u7528\u6a21\u578b\u7684\u957f\u5ea6\u5916\u63a8\u6f5c\u529b\uff0c\u91c7\u7528\u4e86 YARN \u673a\u5236\u548c Dual Chunk Attention \u673a\u5236\u3002\u8fd9\u4e9b\u7b56\u7565\u4f7f\u6a21\u578b\u53ef\u4ee5\u5904\u7406\u591a\u8fbe 131072 tokens \u5e8f\u5217\uff0c\u540c\u65f6\u4fdd\u6301\u8f83\u9ad8\u7684\u6027\u80fd\u3002<\/p>\n\n\n\n<h1>Post-Training<\/h1>\n\n\n\n<p>\u5728\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u4e4b\u540e\uff0c\u5bf9 Qwen2 \u8fdb\u884c\u4e86 post-training\u3002\u8fd9\u4e00\u8fc7\u7a0b\u5bf9\u4e8e\u63d0\u5347\u6a21\u578b\u5728\u5305\u62ec\u4ee3\u7801\u3001\u6570\u5b66\u3001\u903b\u8f91\u63a8\u7406\u3001\u6307\u4ee4\u9075\u5faa\u3001\u591a\u8bed\u8a00\u7406\u89e3\u7b49\u9886\u57df\u7684\u80fd\u529b\u5f88\u91cd\u8981\u3002\u6b64\u5916\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u4e5f\u786e\u4fdd\u4e86\u6a21\u578b\u751f\u6210\u7684\u5185\u5bb9\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u5bf9\u9f50\uff0c\u4f7f\u6a21\u578b helpful\u3001\u8bda\u5b9e\u3001\u65e0\u5bb3\u3002<\/p>\n\n\n\n<p>\u4e0e\u4e25\u91cd\u4f9d\u8d56\u5927\u91cf\u4eba\u5de5\u76d1\u7763\u7684\u4f20\u7edf\u65b9\u6cd5\u4e0d\u540c\uff0cQwen2 \u66f4\u4fa7\u91cd\u4e8e\u53ef\u6269\u5c55\u7684\u5bf9\u9f50\uff0c\u5c3d\u91cf\u51cf\u5c11\u4eba\u5de5\u6807\u6ce8\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u7814\u7a76\u4e86\u83b7\u53d6 SFT \u548c RLHF \u9ad8\u8d28\u91cf\u6570\u636e\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u63d0\u5347\u6570\u636e\u8d28\u91cf\u548c\u53ef\u9760\u6027\u7684\u540c\u65f6\u51cf\u5c11\u5bf9\u4eba\u7c7b\u6807\u67f1\u7684\u9700\u6c42\u3002<\/p>\n\n\n\n<h3>Post-Training Data<\/h3>\n\n\n\n<p>\u6570\u636e\u4e3b\u8981\u7531\u4e24\u90e8\u5206\u7ec4\u6210\uff1a<\/p>\n\n\n\n<ul><li>&nbsp;demonstration 
\u6570\u636e&nbsp;D={(<em>x<\/em><em>i<\/em>,<em>y<\/em><em>i<\/em>)}<\/li><li>\u2022&nbsp;\u504f\u597d\u6570\u636e&nbsp;P={(<em>x<\/em><em>i<\/em>,<em>y<\/em><em>i<\/em>+,<em>y<\/em><em>i<\/em>\u2212)}<\/li><\/ul>\n\n\n\n<p>\u5176\u4e2d\uff0c&nbsp;<em>x<\/em><em>i<\/em>&nbsp;\u8868\u793a\u6307\u4ee4\uff08instruction\uff09\uff0c&nbsp;<em>y<\/em><em>i<\/em>&nbsp;\u8868\u793a response\uff0c&nbsp;<em>y<\/em><em>i<\/em>+&nbsp;\u548c&nbsp;<em>y<\/em><em>i<\/em>\u2212&nbsp;\u662f&nbsp;<em>x<\/em><em>i<\/em>&nbsp;\u7684\u4e24\u4e2a response\uff0c\u4f46&nbsp;<em>y<\/em><em>i<\/em>+\u6bd4<em>y<\/em><em>i<\/em>\u2212&nbsp;\u66f4\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u3002SFT \u4f7f\u7528\u7684\u6570\u636e\u96c6\u4e3a&nbsp;D&nbsp;\uff0cRLHF \u4f7f\u7528\u7684\u6570\u636e\u96c6\u4e3a&nbsp;P&nbsp;\u3002<\/p>\n\n\n\n<p>\u8bad\u7ec3\u6570\u636e\u7684\u6784\u5efa\u5305\u62ec\u4e24\u4e2a\u6b65\u9aa4\uff1a\u534f\u4f5c\u5f0f\u6570\u636e\u6807\u6ce8\u548c\u81ea\u52a8\u5316\u6570\u636e\u5408\u6210\u3002<\/p>\n\n\n\n<p>\u9996\u5148\uff0c\u4ece\u5927\u89c4\u6a21\u7684\u6307\u4ee4\u8bed\u6599\u5e93\u4e2d\u63d0\u53d6\u6570\u636e\u672c\u4f53\uff0c\u4ece\u800c\u5f97\u5230\u4e00\u6279\u5e7f\u6cdb\u800c\u591a\u6837\u7684\u9ad8\u8d28\u91cf\u6307\u4ee4\u3002\u8fd9\u4e9b\u6307\u4ee4\u901a\u8fc7\u7cfb\u7edf\u5316\u589e\u5f3a\u6765\u589e\u52a0\u590d\u6742\u6027\u3002\u901a\u8fc7\u4eba\u5de5\u6807\u6ce8\uff0c\u5f97\u5230\u4e86&nbsp;<em>t<\/em><em>a<\/em><em>r<\/em><em>g<\/em><em>e<\/em><em>t<\/em><em>res<\/em><em>p<\/em><em>o<\/em><em>n<\/em><em>se<\/em><em>y<\/em><em>i<\/em>&nbsp;\u548c\u6b63\u8d1f\u4f8b\u5bf9&nbsp;(<em>y<\/em><em>i<\/em>+,<em>y<\/em><em>i<\/em>\u2212)&nbsp;\u3002<\/p>\n\n\n\n<p>\u968f\u540e\uff0c\u8bb8\u591a\u81ea\u52a8\u5316\u5bf9\u9f50\u7b56\u7565\u88ab\u5e94\u7528\u4e8e\u5927\u91cf\u5408\u6210\u4ee3\u7801\u3001\u6570\u5b66\u3001\u6307\u4ee4\u9075\u5faa\u3001\u521b\u4f5c\u3001\u89d2\u8272\u626e\u6f14\u3001\u5b89\u5168\u7b49\u9886\u57df\u7684\u4eba\u5de5\u6807\u6ce8\u6570\u636e\u300
2<\/p>\n\n\n\n<h4>Collaborative Data Annotation<\/h4>\n\n\n\n<ul><li>\u2022<strong>Automatic Ontology Extraction<\/strong><\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper\uff1aInsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2308.07074<\/code><\/pre>\n\n\n\n<p>\u9996\u5148\u5e94\u7528\u4e00\u4e2a\u5f00\u653e\u5f0f\u7ec6\u7c92\u5ea6\u6807\u6ce8\u5668 InsTag \u6765\u4ece\u5927\u89c4\u6a21\u6307\u4ee4\u6570\u636e\u4e2d\u63d0\u53d6\u51fa\u5e95\u5c42\u672c\u4f53\u3002\u540e\u7eed\u8fdb\u884c\u624b\u5de5\u4f18\u5316\u786e\u4fdd\u63d0\u53d6\u7684\u672c\u4f53\u7684\u51c6\u786e\u6027\u3002<\/p>\n\n\n\n<ul><li><strong>Instruction Selection<\/strong><\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper\uff1aHow Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2310.05492<\/code><\/pre>\n\n\n\n<p>\u6bcf\u4e2a\u5e26\u6709 tag \u6807\u8bb0\u7684\u6307\u4ee4\u90fd\u4f1a\u6839\u636e tag \u591a\u6837\u6027\u3001\u8bed\u4e49\u4e30\u5bcc\u884c\u3001\u590d\u6742\u6027\u3001\u610f\u56fe\u5b8c\u6574\u6027\u8fdb\u884c\u8bc4\u4f30\u3002\u9009\u62e9\u4e00\u7ec4\u5177\u6709\u4ee3\u8868\u6027\u7684\u6307\u4ee4\u3002<\/p>\n\n\n\n<ul><li><strong>Instruction Evolution<\/strong><\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper\uff1aChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2406.12793<\/code><\/pre>\n\n\n\n<p>\u4e3a\u4e86\u4e30\u5bcc\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u91c7\u7528\u4e86 self-evolution \u7b56\u7565\u6765 prompt Qwen \u6a21\u578b\u5411\u73b0\u6709\u7684\u6307\u4ee4\u6dfb\u52a0\u7ea6\u675f\u6216\u8981\u6c42\uff0c\u4ece\u800c\u589e\u52a0\u6307\u4ee4\u7684\u590d\u6742\u5ea6\u5e76\u786e\u4fdd\u6570\u636e\u96c6\u4e2d\u96be\u5ea6\u5206\u5e03\u7684\u591a\u6837\u6027\u3002<\/p>\n\n\n\n<ul><li><strong>Human 
Annotation<\/strong><\/li><\/ul>\n\n\n\n<p>\u4f7f\u7528\u4e0d\u540c\u7684\u751f\u6210\u7b56\u7565\u548c\u4e0d\u540c\u89c4\u6a21\u7684 Qwen \u6a21\u578b\u83b7\u5f97\u5bf9\u6307\u4ee4\u7684\u591a\u4e2a responses\u3002\u6807\u6ce8\u8005\u6839\u636e\u4ed6\u4eec\u7684\u504f\u597d\u5bf9\u8fd9\u4e9b responses \u8fdb\u884c\u6392\u5e8f\uff0c\u786e\u4fdd\u6700\u4f73 response \u7b26\u5408\u8981\u6c42\uff0c\u8fdb\u800c\u4ea7\u751f demonstration \u548c\u504f\u597d\u6570\u636e\u3002<\/p>\n\n\n\n<h4>Automated Data Synthesis<\/h4>\n\n\n\n<p>\u4fdd\u8bc1\u6307\u4ee4 response \u7684\u6807\u6ce8\u8d28\u91cf\u662f\u4e00\u4e2a\u5de8\u5927\u7684\u6311\u6218\uff0c\u7279\u522b\u662f\u90a3\u4e9b\u9700\u8981\u4e13\u4e1a\u77e5\u8bc6\u3001\u7ecf\u9a8c\u3001\u7ec6\u5fc3\u6216\u8010\u5fc3\u7684\u6807\u6ce8\u3002\u56e0\u6b64\uff0c\u8bbe\u8ba1\u4e86\u591a\u79cd\u81ea\u52a8\u5bf9\u9f50\u7b56\u7565\u6765\u5927\u89c4\u6a21\u5408\u6210\u6570\u636e\u3002<\/p>\n\n\n\n<ul><li><strong>Rejection Sampling<\/strong><\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper\uff1aScaling Relationship on Learning Mathematical Reasoning with Large Language Models<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2308.01825<\/code><\/pre>\n\n\n\n<p>\u5bf9\u4e8e\u6570\u5b66\u7b49\u5e26\u6709\u660e\u786e\u6700\u7ec8\u7b54\u6848\u7684\u4efb\u52a1\uff0c\u91c7\u7528\u62d2\u7edd\u91c7\u6837\u6765\u63d0\u5347 solution \u7684\u8d28\u91cf\u3002<\/p>\n\n\n\n<p>LLM \u88ab\u7528\u6765\u4e3a\u6bcf\u4e2a\u6307\u4ee4\u751f\u6210\u591a\u4e2a\u63a8\u7406\u8def\u5f84 responses\u3002\u4fdd\u7559\u53ef\u4ee5\u5f97\u51fa\u51c6\u786e\u7ed3\u679c\u5e76\u4e14\u88ab\u6a21\u578b\u8ba4\u4e3a\u5408\u7406\u7684\u63a8\u7406\u8def\u5f84\uff0c\u4f5c\u4e3a demonstration \u6570\u636e\u3002\u901a\u8fc7\u5bf9\u6bd4\u6b63\u786e\u548c\u4e0d\u6b63\u786e\u7684\u63a8\u7406\u8def\u5f84\u6765\u83b7\u5f97\u504f\u597d\u6570\u636e\u3002<\/p>\n\n\n\n<ul><li><strong>Execution Feedback<\/strong><\/li><\/ul>\n\n\n\n<p>\u5bf9\u4e8e\u4ee3\u7801\u4efb\u52a1\uff0cLLM 
\u88ab\u7528\u6765\u751f\u6210\u89e3\u51b3\u65b9\u6848\u548c\u76f8\u5173\u6d4b\u8bd5\u7528\u4f8b\u3002\u901a\u8fc7\u6839\u636e\u6d4b\u8bd5\u7528\u4f8b\u7f16\u8bd1\u548c\u6267\u884c\u751f\u6210\u7684\u89e3\u51b3\u65b9\u6848\u6765\u8bc4\u4f30\u5176\u6709\u6548\u6027\uff0c\u4ece\u800c\u5f97\u5230 demonstration \u548c \u504f\u597d\u6570\u636e\u3002<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper\uff1aSelf-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2406.13542<\/code><\/pre>\n\n\n\n<p>\u8fd9\u79cd\u65b9\u6cd5\u4e5f\u53ef\u4ee5\u5e94\u7528\u4e8e\u8bc4\u4f30\u6307\u4ee4\u9075\u5faa\u80fd\u529b\u3002\u5bf9\u4e8e\u6709\u7ea6\u675f\u6761\u4ef6\u7684\u6307\u4ee4\uff0cLLM \u53ef\u4ee5\u751f\u6210 Python \u9a8c\u8bc1\u51fd\u6570\uff0c\u786e\u4fdd response \u7b26\u5408\u6307\u4ee4\u7684\u8981\u6c42\u3002<\/p>\n\n\n\n<ul><li><strong>Data Repurposing<\/strong><\/li><\/ul>\n\n\n\n<p>\u5bf9\u4e8e\u6ca1\u6709\u53d7\u8fc7\u4e13\u4e1a\u8bad\u7ec3\u7684\u6807\u6ce8\u4eba\u5458\u6765\u8bf4\uff0c\u5728\u6587\u5b66\u5199\u4f5c\u4efb\u52a1\u4e2d\u5f88\u96be\u7ed9\u51fa\u719f\u7ec3\u7684 response\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u4ece\u516c\u5f00\u9886\u57df\u6536\u96c6\u9ad8\u8d28\u91cf\u7684\u6587\u5b66\u4f5c\u54c1\uff0c\u7528 LLM \u6765\u7ed9\u51fa\u5e26\u6709\u4e0d\u540c\u7a0b\u5ea6\u7684\u7ec6\u8282\u7684\u6307\u4ee4\u3002\u5f97\u5230\u7684\u6307\u4ee4\u548c\u539f\u59cb\u6587\u5b66\u4f5c\u54c1\u4e00\u8d77\u4f5c\u4e3a demonstration \u6570\u636e\u3002<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper\uff1aLarge Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment<br>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2401.12474<\/code><\/pre>\n\n\n\n<p>\u4f8b\u5982\uff0c\u4e3a\u4e86\u5f97\u5230\u751f\u52a8\u903c\u771f\u7684\u89d2\u8272\u626e\u6f14 
responses, detailed character profiles are obtained from knowledge bases such as Wikipedia, and the LLM is then instructed to generate the corresponding instructions and responses. This process, which resembles a reading comprehension task, ensures that the integrity of the character profile is preserved.<\/p>\n\n\n\n<ul><li><strong>Constitutional Feedback<\/strong><\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper: Constitutional AI: Harmlessness from AI Feedback<br>Abs: https:\/\/arxiv.org\/abs\/2212.08073<\/code><\/pre>\n\n\n\n<p>Constitutional AI refers to the process of guiding an LLM to generate responses based on predefined rules. To ensure adherence to guidelines such as safety and values, a constitution dataset was compiled. The dataset describes the principles the LLM should follow and those it should avoid, and is used to instruct the LLM to produce responses that either align with or deviate from these rules, which in turn serve as references for demonstration and preference data.<\/p>\n\n\n\n<h3>Supervised Fine-Tuning<\/h3>\n\n\n\n<p>A large-scale SFT dataset of more than 500,000 examples was collected, covering instruction following, coding, mathematics, logical reasoning, role-play, multilingual capability, and safety.<\/p>\n\n\n\n<p>The models are fine-tuned for 2 epochs with a sequence length of 32,768 tokens. The learning rate is gradually decreased from 7\u00d710<sup>\u22126<\/sup> to 7\u00d710<sup>\u22127<\/sup>.<\/p>\n\n\n\n<p>To mitigate overfitting, a weight decay of 0.1 is applied and gradients are clipped at a maximum value of 1.0.<\/p>\n\n\n\n<h3>Reinforcement Learning from Human Feedback<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>Paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model<br>Abs: https:\/\/arxiv.org\/abs\/2305.18290<br><br>Paper: Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment<br>Abs: https:\/\/arxiv.org\/abs\/2405.17931<\/code><\/pre>\n\n\n\n<p>RLHF training consists of two sequential stages: offline training and online training.<\/p>\n\n\n\n<p>In the offline stage, DPO training is performed on a pre-compiled preference dataset P.<\/p>\n\n\n\n<p>In the online stage, the model iteratively improves in real time using feedback from a reward model (RM). Specifically, multiple responses are sampled from the current policy model, and the RM selects the best and the worst response to form the preference pair used for DPO in each episode. In addition, the Online Merging Optimizer is used to mitigate the alignment tax, that is, the performance degradation caused by aligning the model with human preferences.<\/p>\n\n\n\n<h1>Evaluation<\/h1>\n\n\n\n<p>To thoroughly assess the Qwen2 Base and Instruct models, a comprehensive evaluation protocol was implemented. The protocol examines a range of capabilities, including general knowledge understanding, language understanding, generation, coding, mathematics, reasoning, and other areas of expertise.<\/p>\n\n\n\n<p>Specifically, unless otherwise stated, Base models are evaluated on established LLM benchmarks, with responses elicited through few-shot prompting. For Instruct models, in addition to benchmark evaluation, human preference assessment is prioritized.<\/p>\n\n\n\n<h3>Base Language Models<\/h3>\n\n\n\n<h4>Core Capabilities<\/h4>\n\n\n\n<p>Only the 72B and 7B results are shown here; more detailed results can be found in the original paper.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"987\" height=\"997\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-111.png\" alt=\"\" class=\"wp-image-19501\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-111.png 987w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-111-297x300.png 297w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-111-768x776.png 768w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-111-120x120.png 120w\" sizes=\"(max-width: 987px) 100vw, 987px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"1006\" height=\"1008\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-112.png\" alt=\"\" class=\"wp-image-19502\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-112.png 1006w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-112-300x300.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-112-150x150.png 150w, 
http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-112-768x770.png 768w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-112-120x120.png 120w\" sizes=\"(max-width: 1006px) 100vw, 1006px\" \/><\/figure>\n\n\n\n<h3>Instruction-Tuned Model<\/h3>\n\n\n\n<h4>Open Benchmark Evaluation<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"1006\" height=\"819\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-113.png\" alt=\"\" class=\"wp-image-19503\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-113.png 1006w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-113-300x244.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-113-768x625.png 768w\" sizes=\"(max-width: 1006px) 100vw, 1006px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"1003\" height=\"802\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-114.png\" alt=\"\" class=\"wp-image-19504\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-114.png 1003w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-114-300x240.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-114-768x614.png 768w\" sizes=\"(max-width: 1003px) 100vw, 1003px\" \/><\/figure>\n\n\n\n<h4>In-House Automatic Evaluation<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"1021\" height=\"727\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-115.png\" alt=\"\" class=\"wp-image-19505\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-115.png 1021w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-115-300x214.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-115-768x547.png 768w\" sizes=\"(max-width: 1021px) 100vw, 1021px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"1021\" height=\"817\" 
src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-116.png\" alt=\"\" class=\"wp-image-19506\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-116.png 1021w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-116-300x240.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-116-768x615.png 768w\" sizes=\"(max-width: 1021px) 100vw, 1021px\" \/><\/figure>\n\n\n\n<h4>Long Context Capabilities<\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"960\" height=\"1024\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-117-960x1024.png\" alt=\"\" class=\"wp-image-19507\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-117-960x1024.png 960w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-117-281x300.png 281w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-117-768x819.png 768w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-117.png 988w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><\/figure>\n\n\n\n<h4>Multilingual Evaluation<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"1003\" height=\"553\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-118.png\" alt=\"\" class=\"wp-image-19508\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-118.png 1003w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-118-300x165.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-118-768x423.png 768w\" sizes=\"(max-width: 1003px) 100vw, 1003px\" \/><\/figure>\n\n\n\n<h4>Safety &amp; Responsibility<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"1011\" height=\"297\" src=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-119.png\" alt=\"\" class=\"wp-image-19509\" srcset=\"http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-119.png 1011w, 
http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-119-300x88.png 300w, http:\/\/139.9.1.231\/wp-content\/uploads\/2024\/09\/image-119-768x226.png 768w\" sizes=\"(max-width: 1011px) 100vw, 1011px\" \/><\/figure>\n\n\n\n<h1>Conclusion<\/h1>\n\n\n\n<p>This technical report presents the Qwen2 series, a versatile suite of Base and Instruct language models ranging from 0.5B to 72B parameters and covering both dense and MoE models.<\/p>\n\n\n\n<p>Qwen2 outperforms previous open-source models and is competitive with closed-source models across a broad range of benchmarks covering language understanding, generation, multilingual capability, coding, mathematics, and reasoning.<\/p>\n\n\n\n<p>In this Qwen2 update, we paid particular attention to long context, multilingual capability, coding, mathematics, and safety and responsibility.<\/p>\n\n\n\n<p>To foster openness and growth in the community, we have open-sourced the Qwen2 model weights so that researchers and developers can make full use of Qwen2 in a wide range of applications and research projects. Our goal is to contribute to the advancement of AI technology and to its positive impact on society.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Abs\uff1ahttps:\/\/arxiv.org\/abs\/2407.10671Code\uff1ahttps:\/\/github &hellip; <a href=\"http:\/\/139.9.1.231\/index.php\/2024\/09\/21\/qwen2-technical-report\/\" class=\"more-link\">\u7ee7\u7eed\u9605\u8bfb<span class=\"screen-reader-text\">Qwen2 
\u6280\u672f\u62a5\u544a<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,9,38,34],"tags":[],"_links":{"self":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/19008"}],"collection":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/comments?post=19008"}],"version-history":[{"count":27,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/19008\/revisions"}],"predecessor-version":[{"id":23227,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/19008\/revisions\/23227"}],"wp:attachment":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/media?parent=19008"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/categories?post=19008"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/tags?post=19008"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}