{"id":23320,"date":"2024-09-10T21:02:49","date_gmt":"2024-09-10T13:02:49","guid":{"rendered":"https:\/\/17aitech.com\/?p=23320"},"modified":"2024-09-10T21:02:49","modified_gmt":"2024-09-10T13:02:49","slug":"%e5%8d%95%e4%b8%aa4090%e5%8f%af%e6%8e%a8%e7%90%86%ef%bc%8c2000%e4%ba%bf%e7%a8%80%e7%96%8f%e5%a4%a7%e6%a8%a1%e5%9e%8b%e3%80%8c%e5%a4%a9%e5%b7%a5moe%e3%80%8d%e5%bc%80%e6%ba%90","status":"publish","type":"post","link":"https:\/\/17aitech.com\/?p=23320","title":{"rendered":"Inference on a Single 4090 Server: 200-Billion-Scale Sparse Model Skywork-MoE (Tiangong MoE) Open-Sourced"},"content":{"rendered":"<p>This article is sourced from the internet: <a href=\"https:\/\/www.jiqizhixin.com\/articles\/2024-06-04-2\" target=\"_blank\" rel=\"noopener\">\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90<\/a><\/p>\n<p>Amid the large-model wave, training and deploying state-of-the-art dense LLMs poses major challenges in compute requirements and cost, especially at the scale of tens or hundreds of billions of <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark>. To address these challenges, sparse models such as Mixture of Experts (MoE) have become increasingly important. By routing computation to specialized sub-models, or \"experts\", these models offer a more economically viable alternative that can potentially match or even exceed the performance of dense models with far lower resource requirements.<\/p>\n<p>On June 3, 
there was more major news in the open-source large-model community: <strong>Kunlun Tech announced the open-sourcing of Skywork-MoE, a sparse large model at the 200-billion-parameter scale<\/strong>, which sharply reduces inference cost while maintaining strong performance.<\/p>\n<p>Skywork-MoE is extended from an intermediate checkpoint of the Skywork-13B model that Kunlun Tech open-sourced earlier. It is the first open-source MoE model at the hundred-billion-parameter scale to fully apply MoE Upcycling in practice, and the first such model to support inference on a single 4090 server.<\/p>\n<p>Of particular interest to the large-model community, Skywork-MoE's model <mark data-type=\"concepts\" data-id=\"149a12cf-10c2-4555-9899-cc6dee319ef5\">weights<\/mark> and technical report are fully open-source and free for commercial use, with no application required.<\/p>\n<ul>\n<li>\n<p>Model <mark data-type=\"concepts\" data-id=\"149a12cf-10c2-4555-9899-cc6dee319ef5\">weights<\/mark> download:<\/p>\n<\/li>\n<\/ul>\n<p>\uffee https:\/\/huggingface.co\/Skywork\/Skywork-MoE-base<\/p>\n<p>\uffee https:\/\/huggingface.co\/Skywork\/Skywork-MoE-Base-FP8<\/p>\n<ul>\n<li>\n<p>Open-source repository: https:\/\/github.com\/SkyworkAI\/Skywork-MoE<\/p>\n<\/li>\n<li>\n<p>Technical report: https:\/\/github.com\/SkyworkAI\/Skywork-MoE\/blob\/main\/skywork-moe-tech-report.pdf<\/p>\n<\/li>\n<li>\n<p>Inference code (supports 8-bit <mark data-type=\"tech_tasks\" data-id=\"e1abaa20-0000-4ff4-ad42-f97ca7fca4b1\">quantized<\/mark> loading and inference on an 8&#215;4090 server): 
https:\/\/github.com\/SkyworkAI\/vllm<\/p>\n<\/li>\n<\/ul>\n<p>Skywork-MoE is currently the largest open-source MoE model that can run inference on an 8&#215;4090 server. An 8&#215;4090 server has 192GB of GPU memory in total. Under FP8 <mark data-type=\"tech_tasks\" data-id=\"e1abaa20-0000-4ff4-ad42-f97ca7fca4b1\">quantization<\/mark> (weights occupy 146GB), and using the non-uniform Tensor Parallel inference scheme pioneered by the Kunlun Tech team, Skywork-MoE reaches a throughput of 2200 tokens\/s at a suitable batch size.<\/p>\n<p>The complete inference framework code and installation environment are available at: https:\/\/github.com\/SkyworkAI\/Skywork-MoE<\/p>\n<p><strong>Skywork-MoE\u00a0<\/strong><strong>Overview<\/strong><\/p>\n<p>The open-sourced Skywork-MoE model belongs to the Tiangong 3.0 research model series and is its mid-sized model (Skywork-MoE-Medium). It has 146B total <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark> and 22B activated <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark>, with 16 Experts of 13B each, 2 of which are activated at a time.<\/p>\n<p>Tiangong 3.0 reportedly also trained two other MoE models, 75B (Skywork-MoE-Small) and 400B (Skywork-MoE-Large), which are not part of this release.<\/p>\n<p>Kunlun Tech evaluated 
Skywork-MoE on the current mainstream model benchmarks. At the same activated <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameter<\/mark> count of 20B (i.e. inference compute), Skywork-MoE ranks among the industry leaders and approaches 70B Dense models, which amounts to a nearly 3x reduction in inference cost.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png\"><\/a><\/p>\n<p>Notably, Skywork-MoE's total <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameter<\/mark> count is 1\/3 smaller than DeepSeekV2's total <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameter<\/mark> count, achieving comparable capability at a smaller <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameter<\/mark> scale.<\/p>\n<p><strong>Technical Innovations<\/strong><\/p>\n<p>To address the difficulty of training MoE models and their poor generalization, Skywork-MoE introduces two training optimization algorithms:<\/p>\n<p><strong>Gating Logits Normalization<\/strong><\/p>\n<p>Skywork-MoE adds a normalization operation to the token dispatch <mark data-type=\"concepts\" data-id=\"95a97f4b-79d2-4bbc-91ae-300f074dff9f\">logic<\/mark> of the Gating Layer, so that the Gating Layer's <mark data-type=\"concepts\" 
data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameter<\/mark> learning is driven more toward the selected top-2 experts, which increases the MoE model's confidence in its top-2 choices:<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-3d3ea979f6b081a73471267e041c615d.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-3d3ea979f6b081a73471267e041c615d.png\"><\/a><strong>Adaptive Aux Loss<\/strong><\/p>\n<p>Unlike the conventional aux loss with a fixed coefficient (a fixed hyperparameter), Skywork-MoE lets the model adaptively choose a suitable aux loss coefficient at different stages of MoE training. This keeps the Drop Token Rate within a suitable range, balancing expert dispatch while still letting experts learn differentiated behavior, which improves the model's overall performance and generalization. Early in MoE training, <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark> are not yet well learned, so the Drop Token Rate is too high (the token distribution is too uneven) and a larger aux loss is needed to help token load balance. Later in MoE training, the Skywork-MoE team wants Experts to remain distinguishable from one another and to keep the Gating from drifting toward random Token dispatch, so a lower aux loss is needed to reduce the correction.<\/p>\n<p><a 
href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-e570ab1e3d5581c5940b4e921fb75410.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-e570ab1e3d5581c5940b4e921fb75410.png\"><\/a><\/p>\n<p><strong>Training Infra<\/strong><\/p>\n<p>Efficient large-scale distributed training of MoE models is a hard challenge. Skywork-MoE proposes two important parallelism optimizations that achieve a training throughput of 38% MFU on a thousand-GPU cluster, where MFU is computed from the theoretical compute of the 22B activated <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark>.<\/p>\n<p><strong>Expert Data Parallel<\/strong><\/p>\n<p>In contrast to the EP (Expert Parallel) and ETP (Expert Tensor Parallel) designs already available in the Megatron-LM community, the Skywork-MoE team proposes a parallelism scheme called Expert Data Parallel (EDP). It can still partition the model efficiently when the number of Experts is small, and the all2all communication introduced by the Experts can be optimized and overlapped as much as possible. Compared with EP's constraint on the number of GPUs and ETP's inefficiency on thousand-GPU clusters, EDP better addresses the parallelism pain points of large-scale distributed MoE training. The EDP 
design is also simple, robust, and easy to scale, so it can be implemented and validated quickly.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-9732e2ebce17f5885bfe15e288829b71.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-9732e2ebce17f5885bfe15e288829b71.png\"><\/a><\/p>\n<p><em><sup>A minimal EDP example: with two GPUs, TP = 2 and EP = 2, where the Attention part uses Tensor Parallel and the Expert part uses Expert Parallel<\/sup><\/em><\/p>\n<p><strong>Non-uniform Pipeline Parallel Partitioning<\/strong><\/p>\n<p>Because of the Embedding computation in the first stage, the Loss computation in the last stage, and the presence of the Pipeline Buffer, splitting Layers uniformly under pipeline parallelism leaves the compute load and memory load of the stages noticeably imbalanced. The Skywork-MoE team proposes a non-uniform pipeline partitioning and recomputation Layer allocation scheme that balances the overall compute \/ memory load and improves end-to-end training throughput by about 10%.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-0219796cf2ee6417f61dd51a3a70cf13.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" 
src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-0219796cf2ee6417f61dd51a3a70cf13.png\"><\/a><\/p>\n<p><em><sup>Comparison of pipeline bubbles under uniform and non-uniform partitioning for a 24-Layer LLM: (a) uniform partitioning into 4 stages, with [6, 6, 6, 6] layers per stage; (b) the optimized non-uniform partitioning into 5 stages, with [5, 5, 5, 5, 4] layers per stage. In the middle phase, when the pipeline is fully loaded, non-uniform partitioning has fewer bubbles.<\/sup><\/em><\/p>\n<p>In addition, Skywork-MoE ran a series of Scaling Law experiments to investigate which constraints determine whether Upcycling or From Scratch training yields the better MoE model.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-2f984c88fad01d2a5d5163b2fe183461.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-2f984c88fad01d2a5d5163b2fe183461.png\"><\/a><\/p>\n<p>A rule of thumb: if the FLOPs for training the MoE model are more than 2x those for training the Dense model, training the MoE From Scratch is the better choice; otherwise, Upcycling significantly reduces training cost.<\/p>\n<p>This article is sourced from the internet: <a href=\"https:\/\/www.jiqizhixin.com\/articles\/2024-06-04-2\" target=\"_blank\" 
rel=\"noopener\">\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u6587\u7ae0\u6765\u6e90\u4e8e\u4e92\u8054\u7f51:\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[27],"tags":[],"class_list":["post-23320","post","type-post","status-publish","format-standard","hentry","category-news"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90 - \u4e00\u8d77AI\u6280\u672f<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/17aitech.com\/?p=23320\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/17aitech.com\/?p=23320\",\"url\":\"https:\/\/17aitech.com\/?p=23320\",\"name\":\"\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90 - \u4e00\u8d77AI\u6280\u672f\",\"isPartOf\":{\"@id\":\"https:\/\/17aitech.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/17aitech.com\/?p=23320#primaryimage\"},\"image\":{\"@id\":\"https:\/\/17aitech.com\/?p=23320#primaryimage\"},\"thumbnailUrl\":\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png\",\"datePublished\":\"2024-09-10T13:02:49+00:00\",\"author\":{\"@id\":\"https:\/\/17aitech.com\/#\/schema\/person\/3d23bb6f7f115fcefc9ae7803a691739\"},\"breadcrumb\":{\"@id\":\"https:\/\/17aitech.com\/?p=23320#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/17aitech.com\/?p=23320\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/17aitech.com\/?p=23320#primaryimage\",\"url\":\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png\",\"contentUrl\":\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/17aitech.com\/?p=23320#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9875\",\"item\":\"https:\/\/17aitech.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/17aitech.com\/#website\",\"url\":\"https:\/\/17aitech.com\/\",\"name\":\"\u4e00\u8d77AI\u6280\u672f\",\"description\":\"\u8ba9AI\u77e5\u8bc6\u89e6\u624b\u53ef\u53ca\",\"alternateName\":\"\u4e00\u8d77AI
\u6280\u672f\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/17aitech.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/17aitech.com\/#\/schema\/person\/3d23bb6f7f115fcefc9ae7803a691739\",\"name\":\"Dongming\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/17aitech.com\/#\/schema\/person\/image\/\",\"url\":\"\/\/17aitech.com\/wp-content\/uploads\/member\/avatars\/238a0b923820dcc5.1732798681.jpg\",\"contentUrl\":\"\/\/17aitech.com\/wp-content\/uploads\/member\/avatars\/238a0b923820dcc5.1732798681.jpg\",\"caption\":\"Dongming\"},\"description\":\"\u89c1\u5929\u5730\uff0c\u89c1\u4f17\u751f\uff0c\u89c1\u81ea\u5df1\u3002\",\"sameAs\":[\"http:\/\/17aitech.com\"],\"url\":\"https:\/\/17aitech.com\/?page_id=33738&user=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90 - \u4e00\u8d77AI\u6280\u672f","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/17aitech.com\/?p=23320","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/17aitech.com\/?p=23320","url":"https:\/\/17aitech.com\/?p=23320","name":"\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90 - \u4e00\u8d77AI\u6280\u672f","isPartOf":{"@id":"https:\/\/17aitech.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/17aitech.com\/?p=23320#primaryimage"},"image":{"@id":"https:\/\/17aitech.com\/?p=23320#primaryimage"},"thumbnailUrl":"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png","datePublished":"2024-09-10T13:02:49+00:00","author":{"@id":"https:\/\/17aitech.com\/#\/schema\/person\/3d23bb6f7f115fcefc9ae7803a691739"},"breadcrumb":{"@id":"https:\/\/17aitech.com\/?p=23320#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/17aitech.com\/?p=23320"]}]},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/17aitech.com\/?p=23320#primaryimage","url":"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png","contentUrl":"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/06\/frc-d9e3ad068712bb9483ff00cfb9179691.png"},{"@type":"BreadcrumbList","@id":"https:\/\/17aitech.com\/?p=23320#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/17aitech.com\/"},{"@type":"ListItem","position":2,"name":"\u5355\u4e2a4090\u53ef\u63a8\u7406\uff0c2000\u4ebf\u7a00\u758f\u5927\u6a21\u578b\u300c\u5929\u5de5MoE\u300d\u5f00\u6e90"}
]},{"@type":"WebSite","@id":"https:\/\/17aitech.com\/#website","url":"https:\/\/17aitech.com\/","name":"\u4e00\u8d77AI\u6280\u672f","description":"\u8ba9AI\u77e5\u8bc6\u89e6\u624b\u53ef\u53ca","alternateName":"\u4e00\u8d77AI\u6280\u672f","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/17aitech.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/17aitech.com\/#\/schema\/person\/3d23bb6f7f115fcefc9ae7803a691739","name":"Dongming","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/17aitech.com\/#\/schema\/person\/image\/","url":"\/\/17aitech.com\/wp-content\/uploads\/member\/avatars\/238a0b923820dcc5.1732798681.jpg","contentUrl":"\/\/17aitech.com\/wp-content\/uploads\/member\/avatars\/238a0b923820dcc5.1732798681.jpg","caption":"Dongming"},"description":"\u89c1\u5929\u5730\uff0c\u89c1\u4f17\u751f\uff0c\u89c1\u81ea\u5df1\u3002","sameAs":["http:\/\/17aitech.com"],"url":"https:\/\/17aitech.com\/?page_id=33738&user=1"}]}},"_links":{"self":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/posts\/23320","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=23320"}],"version-history":[{"count":0,"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/posts\/23320\/revisions"}],"wp:attachment":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=23320"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_r
oute=%2Fwp%2Fv2%2Fcategories&post=23320"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=23320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}