{"id":34936,"date":"2024-12-15T11:59:24","date_gmt":"2024-12-15T03:59:24","guid":{"rendered":"https:\/\/17aitech.com\/?p=34936"},"modified":"2024-12-15T11:59:24","modified_gmt":"2024-12-15T03:59:24","slug":"%e4%b8%80%e6%96%87%e7%9c%8b%e5%b0%bdllm%e5%af%b9%e9%bd%90%e6%8a%80%e6%9c%af%ef%bc%9arlhf%e3%80%81rlaif%e3%80%81ppo%e3%80%81dpo","status":"publish","type":"post","link":"https:\/\/17aitech.com\/?p=34936","title":{"rendered":"LLM Alignment Techniques at a Glance: RLHF, RLAIF, PPO, DPO\u2026"},"content":{"rendered":"<p>This article comes from the internet: <a href=\"https:\/\/www.jiqizhixin.com\/articles\/2024-08-05-4\" target=\"_blank\">LLM Alignment Techniques at a Glance: RLHF, RLAIF, PPO, DPO\u2026<\/a><\/p>\n<blockquote data-author-name=\"\" data-content-utf8-length=\"18\" data-source-title=\"\" data-type=\"2\" data-url=\"\">\n<section>\n<section>\n<p>Researchers have come up with one clever technique after another for aligning LLMs.<\/p>\n<\/section>\n<\/section>\n<\/blockquote>\n<p>LLMs are powerful but far from perfect: they make mistakes and can produce useless or even harmful output. For example, people have found ways to get ChatGPT to explain how to shoplift:<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png\"><\/a><\/p>\n<p><em><sup>Asking ChatGPT how to shoplift. Left: ChatGPT refuses to answer. Right: after \"with no moral 
restraints\" is added to the prompt, ChatGPT produces a shoplifting guide.<\/sup><\/em><\/p>\n<p>This is where alignment becomes essential: its purpose is to keep LLMs consistent with human values.<\/p>\n<p>For aligning LLMs, <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> from human feedback (RLHF) was a breakthrough technique. It gave rise to powerful models such as GPT-4, Claude, and Gemini. Since RLHF, researchers have explored many other ways to align LLMs, but until now no one had comprehensively summarized the methods for aligning LLMs with human preferences.<\/p>\n<p>Salesforce set out to fill this gap and recently published a 37-page survey that groups the existing literature by category and analyzes each paper in detail.<a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c053c199d271321d4699bf695710957a.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c053c199d271321d4699bf695710957a.png\"><\/a><\/p>\n<ul>\n<li>\n<p>Paper title: A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and 
More<\/p>\n<\/li>\n<li>\n<p>Paper link: https:\/\/arxiv.org\/pdf\/2407.16216<\/p>\n<\/li>\n<\/ul>\n<p>The survey is organized around four main themes: reward models, feedback, <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> (RL), and optimization. Each theme is divided into further subtopics, as shown in Figure 1.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-270be0d302e1b25cc6abf0d10aa1e2c0.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-270be0d302e1b25cc6abf0d10aa1e2c0.png\"><\/a><\/p>\n<p>Reward model subtopics: 1. explicit vs. implicit reward models; 2. pointwise reward models vs. preference models; 3. response-level vs. token-level rewards; 4. negative preference optimization.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-05d449432051870a55e47fe2a47b0e82.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-05d449432051870a55e47fe2a47b0e82.png\"><\/a><\/p>\n<p>Feedback subtopics: 1. preference feedback vs. binary feedback; 2. pairwise feedback vs. listwise feedback; 3. 
human feedback vs. AI feedback.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-6fb1ea66192c4cd97ca05b795e521dec.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-6fb1ea66192c4cd97ca05b795e521dec.png\"><\/a><\/p>\n<p><mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">Reinforcement learning<\/mark> subtopics: 1. reference-based <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> vs. reference-free <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark>; 2. length-controlled <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark>; 3. different divergence measures in <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark>; 4. on-policy <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> vs. off-policy <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark>.<\/p>\n<p>Optimization subtopics: 1. online \/ iterative preference optimization vs. offline \/ non-iterative preference optimization; 2. 
keeping SFT and alignment separate vs. merging SFT and alignment.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-618091b39751251283be84b827dab814.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-618091b39751251283be84b827dab814.png\"><\/a><\/p>\n<p>Table 1 shows how every paper analyzed in the survey is categorized along these 13 evaluation metrics.<\/p>\n<p><a href=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-0f54b4b080017de542d81fb48343f131.png\" data-fancybox=\"images\" data-fancybox=\"gallery\"><img decoding=\"async\" src=\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-0f54b4b080017de542d81fb48343f131.png\"><\/a><\/p>\n<p><strong>Research Papers<\/strong><\/p>\n<p>This section of the survey covers the individual papers in detail, so that readers can grasp the key innovations without reading the originals. Here, <mark data-type=\"institutions\" data-id=\"9e96434f-5827-41e3-9b3c-36d39bc0d446\">\u673a\u5668\u4e4b\u5fc3<\/mark> (Jiqizhixin) briefly walks through each research direction and lists representative papers.<\/p>\n<p><strong>1. 
RLHF\/PPO<\/strong><\/p>\n<p>LLM pretraining draws on large <mark data-type=\"concepts\" data-id=\"930c591c-a35b-4761-83ef-22ef12aa3c5f\">corpora<\/mark> from many different sources, which by itself makes it impossible to guarantee the quality of the data. In addition, an LLM's primary objective is to predict the next token, which is not the same as the objective of \"following the user's instructions helpfully and safely.\" As a result, LLMs may produce output that is untruthful, harmful, or unhelpful to the user. In essence, these models are not aligned with user intent. The main goal of RLHF\/PPO is to align <mark data-type=\"tech_tasks\" data-id=\"bf35ef94-d956-4033-a533-0c0828308c36\">language models<\/mark> with user intent across a wide range of tasks by fine-tuning them with human feedback. There is a large body of research on this topic.<\/p>\n<p><strong>InstructGPT<\/strong><\/p>\n<p>InstructGPT comes from OpenAI and is the foundation for training models such as ChatGPT and GPT-4; see the GPT-4 Technical Report and the <mark data-type=\"institutions\" data-id=\"9e96434f-5827-41e3-9b3c-36d39bc0d446\">\u673a\u5668\u4e4b\u5fc3<\/mark> (Jiqizhixin) articles <a data-itemshowtype=\"0\" data-linktype=\"2\" href=\"http:\/\/mp.weixin.qq.com\/s?__biz=MzA3MzI4MjgzMw==&amp;mid=2650870948&amp;idx=1&amp;sn=3212389008c3c47d4394b0400bc143f9&amp;chksm=84e4d0dab39359cc4277e2b1388951c589fb79b48b62d2b38d3c66b0d5e4c9cca34d769a0bc6&amp;scene=21#wechat_redirect\" target=\"_blank\">\"GPT-4 
Released: A Multimodal Large Model That Directly Upgrades ChatGPT and Bing, With an Open API. Game Over?\"<\/a> and <a data-itemshowtype=\"0\" data-linktype=\"2\" href=\"http:\/\/mp.weixin.qq.com\/s?__biz=MzA3MzI4MjgzMw==&amp;mid=2650868848&amp;idx=2&amp;sn=3cf7e9693464b4a4d1dda7327c88a717&amp;chksm=84e4c80eb39341185fd2a988e6302472a75cda0e52318cbd3e2f6d26758ae4340bfaf2afe3d7&amp;scene=21#wechat_redirect\" target=\"_blank\">\"Learn the Technology Behind ChatGPT With <mark data-type=\"experts\" data-id=\"5093c441-785a-4b9e-aaf1-2c6871aea809\">Mu Li<\/mark>: Reading the InstructGPT Paper in 67 Minutes\"<\/a>.<\/p>\n<p>Incorporating human preferences addresses the difficulty of evaluating LLM-generated responses. Traditional metrics used to evaluate LLMs, such as BLEU, ROUGE, and BERTScore, cannot guarantee agreement with human preferences. To address this, researchers integrated human preferences directly into LLMs to improve their performance. This process typically involves two main steps: reward model learning and <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> policy training.<\/p>\n<p>In the reward model learning stage, an explicit pointwise reward function is trained on prompts and paired responses.<\/p>\n<p>Next comes the 
<mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> policy training stage, in which the LLM and the previously trained reward model act as the agent and the environment, respectively, within a <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> framework.<\/p>\n<p>Training InstructGPT uses three datasets: 1. the SFT dataset, which contains labeler demonstrations used to train the SFT model; 2. the RM (reward model) dataset, which consists of human labelers' rankings of model outputs and is used to train the reward model; 3. the PPO dataset, which consists of prompts used as inputs for RLHF fine-tuning.<\/p>\n<p>The trained InstructGPT is evaluated on three aspects: helpfulness, truthfulness, and harmfulness.<\/p>\n<p>In terms of results, human evaluations show that people prefer the outputs of the 1.3B-<mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameter<\/mark> InstructGPT model to those of the 175B GPT-3, even though the former has more than 100 times fewer <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark>. Notably, InstructGPT outperforms GPT-3 on both helpfulness and toxicity, which is crucial for alignment.<\/p>\n<p><strong>Anthropic's RLHF<\/strong><\/p>\n<p>Anthropic 
has studied the same topic in the paper \"Training a helpful and harmless assistant with reinforcement learning from human feedback.\"<\/p>\n<p>OpenAI found that RLHF helps with alignment but can also degrade performance on some NLP <mark data-type=\"concepts\" data-id=\"308c3a45-0fee-4ec6-858e-85b15f440fc0\">benchmarks<\/mark>, a phenomenon known as the \"alignment tax.\" The InstructGPT model highlighted in that work has 1.3B <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark>. Anthropic's researchers, by contrast, evaluated seven models ranging in size from 13M to 52B, with sizes growing geometrically by a factor of 4.<\/p>\n<p>They concluded that alignment imposes a \"tax\" on smaller models but brings only benefits for larger ones, especially models with 13B to 52B <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameters<\/mark>.<\/p>\n<p>Given this advantage of alignment, they also experimented with programming datasets to improve LLM capabilities. OpenAI's RLHF approach includes PPO and PPO-ptx, where PPO-ptx was designed specifically to reduce the alignment tax on NLP <mark data-type=\"concepts\" data-id=\"308c3a45-0fee-4ec6-858e-85b15f440fc0\">benchmarks<\/mark>. Anthropic's RLHF 
study, however, found that as long as the model is large enough, PPO by itself delivers the benefits of alignment on downstream NLP tasks. They also determined that the optimal <mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">parameter<\/mark> for the KL divergence term in <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> policy training is \u03b2 = 0.001.<\/p>\n<p><strong>Online \/ Iterative RLHF<\/strong><\/p>\n<p>Traditionally, RLHF techniques for aligning LLMs have been offline methods. These methods have drawbacks, such as difficulty handling out-of-distribution data.<\/p>\n<p>To address this, the LLM needs to be fine-tuned continually through iterative \/ <mark data-type=\"tech_methods\" data-id=\"df1b3500-af28-4cee-bf53-f7e6daddf012\">online learning<\/mark>: an intermediate policy generates responses to prompts, an oracle provides preference feedback on the resulting response pairs, and that feedback is fed back to the policy. In practice, iterative learning has two parts: preference oracle learning and iterative policy optimization. See the paper \"RLHF workflow: From reward modeling to online RLHF.\"<\/p>\n<p><strong>2. 
RLAIF<\/strong><\/p>\n<p>Collecting human preference datasets is expensive, which led to <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> from <mark data-type=\"concepts\" data-id=\"2d28aa9c-942d-471d-bd96-8bfefb7144e0\">artificial intelligence<\/mark> feedback (RLAIF). Moreover, as LLMs grow more capable, the quality of the AI preference datasets that can be collected keeps improving, which in turn improves LLM alignment.<\/p>\n<p><strong>Anthropic's RLAIF<\/strong><\/p>\n<p>Building on its foundational RLHF research, Anthropic proposed a new method called RLAIF. See the paper \"Constitutional AI: Harmlessness from AI feedback.\"<\/p>\n<p>The method has two main stages: 1. <mark data-type=\"tech_methods\" data-id=\"94fdbfed-9ebb-491b-b54e-9c2aae512f70\">supervised learning<\/mark> through critiques and revisions, guided by a constitution; 2. 
RLAIF.<\/p>\n<p><strong>Google's RLAIF<\/strong><\/p>\n<p>Building on Anthropic's RLAIF work, a Google research team argued that earlier studies did not allow a direct comparison between human feedback and AI feedback, and that this merited further investigation. To collect AI feedback, a structured prompt is created, consisting of a preamble, few-shot examples (optional), the sample to annotate, and an ending.<\/p>\n<p>AI feedback is generated through a two-step evaluation. First, the LLM is asked to generate a response using the four components of the instruction plus CoT. In the next step, this LLM response is sent back to the LLM with an ending such as \"preferred summary=\" appended, which produces preference probabilities such as \"summary 1=0.6, summary 2=0.4.\" To reduce position bias, the order of the two responses is swapped and the resulting scores are averaged.<\/p>\n<p>The RLAIF process uses two strategies: 1. \"distilled RLAIF,\" which follows the conventional RLHF approach: the preferences are used to train a reward model, which is then used to train the LLM policy; 2. 
\"direct RLAIF,\" which prompts an LLM directly to output an evaluation score that is then used as the signal for <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark> policy training.<\/p>\n<p>Finally, the evaluation uses three key metrics: 1. AI-labeler alignment: how closely the AI agrees with human labelers. 2. Win rate: how likely human labelers are to choose one candidate over the other when comparing two. 3. Harmless rate: the share of responses that human evaluators judge harmless.<\/p>\n<p>For more details, see the paper <a data-itemshowtype=\"0\" data-linktype=\"2\" href=\"http:\/\/mp.weixin.qq.com\/s?__biz=MzA3MzI4MjgzMw==&amp;mid=2650889640&amp;idx=4&amp;sn=e8f7c8568b72fa7f3a70a3eed7117fc0&amp;chksm=84e499d6b39310c03c174936c6c8e819583bb4514863f94171280fd6dfd5471cd3ac0ceeecf1&amp;scene=21#wechat_redirect\" target=\"_blank\">\"RLAIF: Scaling reinforcement learning from human feedback with AI feedback\"<\/a>.<\/p>\n<p><strong>Direct Human Preference Optimization<\/strong><\/p>\n<p>Conventional RLHF methods usually involve optimizing a reward function derived from human preferences. This is effective but brings challenges, such as higher computational complexity and the need to manage the bias-variance trade-off when estimating and optimizing rewards. See the paper \"High-dimensional continuous control using generalized advantage 
estimation.\"<\/p>\n<p>Recent work has explored other methods that optimize the LLM policy directly from human preferences, without relying on a scalar reward signal.<\/p>\n<p>These methods aim to simplify the alignment pipeline, reduce computational overhead, and make optimization more robust by using preference data more directly. By framing the problem as preference optimization rather than reward estimation and maximization, they offer a different perspective on aligning <mark data-type=\"tech_tasks\" data-id=\"bf35ef94-d956-4033-a533-0c0828308c36\">language models<\/mark> with human judgment:<\/p>\n<ul>\n<li>\n<p>SliC-HF, sequence likelihood calibration with human feedback; see the paper \"SliC-HF: Sequence likelihood calibration with human feedback.\"<\/p>\n<\/li>\n<li>\n<p>RSO, <mark data-type=\"tech_methods\" data-id=\"c84b6d7e-447f-4bcb-b4a5-c807d7b8a5f7\">rejection sampling<\/mark> optimization; see the paper \"Statistical rejection sampling improves preference optimization.\"<\/p>\n<\/li>\n<li>\n<p>DPO, Direct Preference Optimization; see the paper \"Direct preference optimization: Your language model is secretly a reward model.\"<\/p>\n<\/li>\n<li>\n<p>DPOP, DPO-positive; see the paper \"Smaug: Fixing failure modes of preference optimisation with DPO-positive.\"<\/p>\n<\/li>\n<li>\n<p>\u03b2-DPO; see the paper \"\u03b2-DPO: Direct preference 
optimization with dynamic \u03b2.\"<\/p>\n<\/li>\n<li>\n<p>IPO, Identity Preference Optimization; see the paper \"A general theoretical paradigm to understand learning from human preferences.\"<\/p>\n<\/li>\n<li>\n<p>sDPO, stepwise DPO; see the paper \"sDPO: Don\u2019t use your data all at once.\"<\/p>\n<\/li>\n<li>\n<p>GPO, Generalized Preference Optimization; see the paper \"Generalized preference optimization: A unified approach to offline alignment.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Token-Level DPO<\/strong><\/p>\n<p>In DPO, the reward is assigned jointly to the prompt and the response. In an MDP, by contrast, rewards are assigned to individual actions. The following two papers formulate DPO at the token level and extend its application to token-level analysis.<\/p>\n<ul>\n<li>\n<p>A study showing that DPO can perform token-level credit assignment; see the paper \"From r to Q\u2217: Your language model is secretly a Q-function\" and the article <a data-itemshowtype=\"0\" data-linktype=\"2\" href=\"http:\/\/mp.weixin.qq.com\/s?__biz=MzA3MzI4MjgzMw==&amp;mid=2650915526&amp;idx=2&amp;sn=1218e4612e6155527030f7ed7b61fcbe&amp;chksm=84e406b8b3938fae1381190a7bcef69b4f9e3bbf830235938bdf17de964394b7a6b8279d5f0f&amp;scene=21#wechat_redirect\" target=\"_blank\">\"Is This OpenAI's Mysterious Q*? Stanford: <mark data-type=\"tech_tasks\" data-id=\"bf35ef94-d956-4033-a533-0c0828308c36\">Language Models<\/mark> Are Q-Functions\"<\/a>.<\/p>\n<\/li>\n<li>\n<p>TDPO, token-level DPO; see the paper \"Token-level direct preference 
optimization.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Iterative \/ Online DPO<\/strong><\/p>\n<p>With DPO, all available preference data is used to align the LLM. To keep improving the LLM, iterative \/ online DPO should be used. This raises an interesting question: how to collect new preference data efficiently. The following two papers explore this topic in depth.<\/p>\n<ul>\n<li>\n<p>Self-rewarding <mark data-type=\"tech_tasks\" data-id=\"bf35ef94-d956-4033-a533-0c0828308c36\">language models<\/mark>; see the paper \"Self-rewarding language models.\"<\/p>\n<\/li>\n<li>\n<p>CRINGE; see the paper \"The CRINGE loss: Learning what language not to model.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Binary Feedback<\/strong><\/p>\n<p>Collecting preference feedback has proved harder than collecting binary feedback (such as a thumbs-up or thumbs-down), so the latter makes it easier to scale the alignment process. KTO and DRO both focus on aligning LLMs with binary feedback.<\/p>\n<ul>\n<li>\n<p>KTO, Kahneman-Tversky Optimization; see the paper \"KTO: Model alignment as prospect theoretic optimization.\"<\/p>\n<\/li>\n<li>\n<p>DRO, Direct Reward Optimization; see the paper \"Offline regularised reinforcement learning for large language models alignment.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Merging SFT 
and Alignment<\/strong><\/p>\n<p>Earlier work mostly ran SFT and alignment sequentially, but this has proved laborious and can cause catastrophic forgetting. Follow-up research takes two directions: integrating the two processes into a single step, or fine-tuning two models in parallel and merging them at the end.<\/p>\n<ul>\n<li>\n<p>ORPO, Odds Ratio Preference Optimization; see the paper \"ORPO: Monolithic preference optimization without reference model.\"<\/p>\n<\/li>\n<li>\n<p>PAFT, parallel fine-tuning; see the paper \"PAFT: A parallel training paradigm for effective LLM fine-tuning.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Length-Controlled DPO and Reference-Free DPO<\/strong><\/p>\n<p>Earlier studies have shown that LLM outputs tend to be overly verbose. To address this, R-DPO and SimPO focus on controlling response length without hurting generation performance.<\/p>\n<p>In addition, DPO requires a reference policy to ensure that the aligned model does not drift too far from the reference model. By contrast, SimPO and RLOO propose methods that remove the need for a reference model without compromising the LLM's effectiveness.<\/p>\n<ul>\n<li>\n<p>R-DPO, <mark data-type=\"concepts\" data-id=\"c51052b5-4cd8-4df0-99bb-5aa643c2f027\">regularized<\/mark> 
DPO; see the paper \"Disentangling length from quality in direct preference optimization.\"<\/p>\n<\/li>\n<li>\n<p>SimPO, Simple Preference Optimization; see the paper \"SimPO: Simple preference optimization with a reference-free reward\" and the article <a data-itemshowtype=\"0\" data-linktype=\"2\" href=\"http:\/\/mp.weixin.qq.com\/s?__biz=MzA3MzI4MjgzMw==&amp;mid=2650919310&amp;idx=1&amp;sn=9ab3ae94974892f9d769af47aa7bcb51&amp;chksm=84e415f0b3939ce6ebdffae30d59da982210000a5d997bca2169037184be5b267934fe0cfc82&amp;scene=21#wechat_redirect\" target=\"_blank\">\"Surpassing DPO Across the Board: Danqi Chen's Team Proposes Simple Preference Optimization (SimPO) and Trains the Strongest 8B Open-Source Model\"<\/a>.<\/p>\n<\/li>\n<li>\n<p>RLOO, REINFORCE Leave-One-Out; see the paper \"Back to basics: Revisiting REINFORCE style optimization for learning from human feedback in LLMs.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Listwise Preference Optimization<\/strong><\/p>\n<p>Earlier work on PPO and DPO focused on pairwise preferences, while RLHF research collected listwise preferences to speed up data collection and then converted them into pairwise preferences. Even so, it is feasible to run preference optimization directly on listwise datasets to improve LLM performance. The following three papers address this approach.<\/p>\n<ul>\n<li>\n<p>LiPO, Listwise Preference Optimization; see the paper \"LiPO: Listwise preference optimization through 
learning-to-rank.\"<\/p>\n<\/li>\n<li>\n<p>RRHF; see the paper \"RRHF: Rank responses to align language models with human feedback without tears.\"<\/p>\n<\/li>\n<li>\n<p>PRO, Preference Ranking Optimization; see the paper \"Preference ranking optimization for human alignment.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Negative Preference Optimization<\/strong><\/p>\n<p>These studies share a common premise: the current generation of LLMs already surpasses human performance on tasks such as translation and summarization. It is therefore beneficial to treat LLM outputs as the desired responses rather than relying on human-labeled data as the preferred responses. Conversely, undesired responses can still be used to align LLMs, a process known as negative preference optimization (NPO).<\/p>\n<ul>\n<li>\n<p>NN, the Negating Negatives method; see the paper \"Negating negatives: Alignment without human positive samples via distributional dispreference optimization.\"<\/p>\n<\/li>\n<li>\n<p>NPO, Negative Preference Optimization; see the paper \"Negative preference optimization: From catastrophic collapse to effective unlearning.\"<\/p>\n<\/li>\n<li>\n<p>CPO, Contrastive Preference Optimization; see the paper \"Contrastive preference optimization: Pushing the boundaries of LLM performance in machine translation.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Nash Learning<\/strong><\/p>\n<p>Earlier work usually derived pairwise preferences from pointwise rewards and the Bradley-Terry (BT) 
model. However, this approach falls short of direct pairwise preference modeling and cannot resolve inconsistencies within pairwise preferences. To overcome these limitations, several studies have proposed Nash learning methods.<\/p>\n<ul>\n<li>\n<p>Nash learning from human feedback; see the paper \"Nash learning from human feedback.\"<\/p>\n<\/li>\n<li>\n<p>SPPO, Self-Play Preference Optimization; see the paper \"A minimaximalist approach to reinforcement learning from human feedback.\"<\/p>\n<\/li>\n<li>\n<p>DNO, Direct Nash Optimization; see the paper \"Direct Nash optimization: Teaching language models to self-improve with general preferences.\"<\/p>\n<\/li>\n<\/ul>\n<p><strong>Comparing Different Methods<\/strong><\/p>\n<p>Some studies set out to compare these methods, which helps clarify the strengths and weaknesses of each.<\/p>\n<ul>\n<li>\n<p>Evaluating DPO and its variants<\/p>\n<\/li>\n<\/ul>\n<p>The paper \"Insights into alignment: Evaluating DPO and its variants across multiple tasks\" comprehensively evaluates implicit reward models, that is, algorithms that do not use <mark data-type=\"tech_methods\" data-id=\"ee1a8f69-3170-4ddf-b2b6-47d91c844425\">reinforcement learning<\/mark>, across a variety of tasks such as reasoning, mathematical problem solving, truthfulness, question answering, and multi-task understanding. These algorithms include DPO, KTO, IPO and 
CPO\u3002\u8fd9\u4e9b\u8bc4\u4f30\u6d89\u53ca\u4e09\u4e2a\u4e0d\u540c\u573a\u666f\uff1a1) \u5fae\u8c03\u76d1\u7763\u5f0f\u5fae\u8c03\uff08SFT\uff09\u6a21\u578b\u30012) \u5fae\u8c03\u9884\u8bad\u7ec3\u6a21\u578b\u30013) \u5fae\u8c03\u6307\u4ee4\u6a21\u578b\u3002<\/p>\n<p>\u8be5\u7814\u7a76\u53d1\u73b0\uff0c\u5728\u5927\u591a\u6570<mark data-type=\"concepts\" data-id=\"308c3a45-0fee-4ec6-858e-85b15f440fc0\">\u57fa\u51c6<\/mark>\u4e0a\uff0cKTO \u6bd4\u5176\u5b83\u5bf9\u9f50\u65b9\u6cd5\u66f4\u4f18\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8868\u660e\uff0c\u5bf9\u9f50\u5e76\u4e0d\u4f1a\u663e\u8457\u63d0\u5347\u6a21\u578b\u7684\u63a8\u7406\u548c\u95ee\u7b54\u6027\u80fd\uff0c\u4f46\u786e\u5b9e\u80fd\u5927\u5e45\u63d0\u5347\u6a21\u578b\u7684\u6570\u5b66\u95ee\u9898\u6c42\u89e3\u80fd\u529b\u3002\u8be5\u7814\u7a76\u8fd8\u6ce8\u610f\u5230\u4e86\u6570\u636e\u91cf\u7684\u91cd\u8981\u6027\uff0c\u5bf9\u9f50\u65b9\u6cd5\u5728\u8f83\u5c0f\u7684\u6570\u636e\u5b50\u96c6\u4e0a\u7684\u6027\u80fd\u6700\u4f73\u3002\u6b64\u5916\uff0c\u7814\u7a76\u53d1\u73b0 KTO \u548c CPO \u80fd\u6709\u6548\u7ed5\u8fc7 SFT \u9636\u6bb5\uff0c\u5728\u4e0d\u5f71\u54cd\u6027\u80fd\u7684\u524d\u63d0\u4e0b\u76f4\u63a5\u8fdb\u5165\u5bf9\u9f50\u9636\u6bb5\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5f53\u7ed5\u8fc7 SFT \u9636\u6bb5\uff0c\u76f4\u63a5\u8fdb\u5165\u5bf9\u9f50\u9636\u6bb5\u65f6\uff0cDPO \u548c IPO \u4f1a\u8868\u73b0\u51fa\u660e\u663e\u7684\u6027\u80fd\u4e0b\u964d\u3002<\/p>\n<ul>\n<li>\n<p>DPO \u662f\u6bd4 PPO \u66f4\u597d\u7684 LLM \u5bf9\u9f50\u65b9\u6cd5\u5417\uff1f<\/p>\n<\/li>\n<\/ul>\n<p>\u8bba\u6587\u300aIs DPO superior to PPO for LLM alignment? 
A comprehensive study\u300b\u8868\u660e\uff0cDPO \u53ef\u80fd\u5b58\u5728\u56fa\u6709\u5c40\u9650\uff0c\u53ef\u80fd\u4f1a\u4ea7\u751f\u6709\u504f\u5dee\u7684\u89e3\u7b54\uff0c\u5e76\u53ef\u80fd\u7531\u4e8e\u5206\u5e03\u53d8\u5316\u800c\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\uff0c<\/p>\n<p>\u4ed6\u4eec\u53d1\u73b0\uff0cDPO \u8bad\u7ec3\u51fa\u7684\u7b56\u7565\u503e\u5411\u4e8e\u672a\u66fe\u89c1\u8fc7\u7684\u54cd\u5e94\uff0c\u5c24\u5176\u662f\u5206\u5e03\u5916\u7684\u6837\u672c\u3002\u800c\u8fed\u4ee3\u5f0f \/ \u5728\u7ebf DPO \u5219\u80fd\u7f13\u89e3\u8fd9\u4e2a\u95ee\u9898\uff0c\u5176\u505a\u6cd5\u662f\u5e7f\u6cdb\u63a2\u7d22\u54cd\u5e94\u7a7a\u95f4\u5e76\u4e0d\u65ad\u66f4\u65b0\u53c2\u8003\u6a21\u578b\u3002\u76f8\u8f83\u4e4b\u4e0b\uff0cRLHF\/PPO \u5219\u662f\u901a\u8fc7\u4f18\u52bf\u5f52\u4e00\u5316\u3001\u5927\u6279\u91cf\u5927\u5c0f\u4ee5\u53ca\u5bf9\u53c2\u8003\u6a21\u578b\u4f7f\u7528\u6307\u6570\u79fb\u52a8\u5e73\u5747\u6765\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\u3002\u6700\u7ec8\uff0c\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e PPO \u4f18\u4e8e\u8fed\u4ee3\u5f0f \/ \u5728\u7ebf DPO\uff0c\u800c\u8fd9\u53c8\u8fdb\u4e00\u6b65\u4f18\u4e8e\u6807\u51c6 DPO\u3002<\/p>\n<p>\u66f4\u591a\u8be6\u60c5\u53ef\u53c2\u9605<mark data-type=\"institutions\" data-id=\"9e96434f-5827-41e3-9b3c-36d39bc0d446\">\u673a\u5668\u4e4b\u5fc3<\/mark>\u4e13\u680f\u6587\u7ae0<a data-itemshowtype=\"0\" data-linktype=\"2\" href=\"http:\/\/mp.weixin.qq.com\/s?__biz=MzA3MzI4MjgzMw==&amp;mid=2650927025&amp;idx=4&amp;sn=9db9f4131f05a132012f73db7339042a&amp;chksm=84e42bcfb393a2d99c4970c207a80d941f822384c89fd1f6516c08107efa5485af56f9c317a8&amp;scene=21#wechat_redirect\" target=\"_blank\">\u300aICML 2024 Oral | DPO \u662f\u5426\u6bd4 PPO \u66f4\u9002\u5408 LLM\uff0c\u6e05\u534e<mark data-type=\"experts\" 
data-id=\"25b47a57-a97f-41b8-9115-c3043ad9c925\">\u5434\u7ffc<\/mark>\u56e2\u961f\u6700\u65b0\u63ed\u79d8\u300b<\/a>\u3002<\/p>\n<p><strong>\u672a\u6765\u65b9\u5411<\/strong><\/p>\n<p>\u901a\u8fc7\u5206\u6790\u8fc7\u5f80\u8bba\u6587\uff0c\u8be5\u56e2\u961f\u786e\u5b9a\u4e86\u4e00\u4e9b\u6709\u5f85\u8fdb\u4e00\u6b65\u63a2\u7d22\u7684\u7814\u7a76\u95ee\u9898\u3002<\/p>\n<p><strong>\u7528\u4e8e\u5bf9\u9f50\u8bc4\u4f30\u7684\u4e00\u822c\u4efb\u52a1<\/strong><\/p>\n<p>\u4e0d\u540c\u8bba\u6587\u4f7f\u7528\u4e86\u4e0d\u540c\u7684\u4efb\u52a1\u6765\u8bc4\u4f30\u8fd9\u4e9b\u65b9\u6cd5\u7684\u6027\u80fd\u3002\u4f46\u662f\uff0cGSM8K \u7b49\u4e00\u4e9b\u4efb\u52a1\u66f4\u5173\u6ce8\u63a8\u7406\uff0c\u53ef\u80fd\u5e76\u4e0d\u9002\u5408\u7528\u4e8e\u8bc4\u4f30\u5bf9\u9f50\u6027\u80fd\u3002\u76f8\u53cd\uff0cTruthfulQA \u7b49\u4efb\u52a1\u6216\u90a3\u4e9b\u5173\u6ce8\u6bd2\u6027\u7684\u4efb\u52a1\u5e94\u5f53\u4f18\u5148\u8003\u8651\uff0c\u4ee5\u8bc4\u4f30\u5df2\u5fae\u8c03 LLM \u7684\u6bd2\u6027\u3002\u5e94\u5f53\u60f3\u529e\u6cd5\u5c06\u8fd9\u4e9b\u4efb\u52a1\u7ec4\u5408\u8d77\u6765\uff0c\u521b\u5efa\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30\u5bf9\u9f50\u7684\u7edf\u4e00\u6392\u884c\u699c\u3002<\/p>\n<p><strong>\u5c06\u9690\u5f0f\u5956\u52b1\u6a21\u578b\u3001\u9010\u5217\u8868\u504f\u597d\u548c\u7eb3\u4ec0\u5b66\u4e60\u7528\u4e8e\u66f4\u5927\u89c4\u6a21\u7684<mark data-type=\"tech_tasks\" data-id=\"bf35ef94-d956-4033-a533-0c0828308c36\">\u8bed\u8a00\u6a21\u578b<\/mark><\/strong><\/p>\n<p>\u76ee\u524d\uff0c\u4f7f\u7528\u9690\u5f0f\u5956\u52b1\u6a21\u578b\u7684\u6700\u5927\u6a21\u578b\u7684<mark data-type=\"concepts\" data-id=\"2e982b73-88e2-41e8-a430-f7ae5a9af4bf\">\u53c2\u6570<\/mark>\u91cf\u4e5f\u4e0d\u8fc7 70B\u3002\u5982\u679c\u80fd\u5c06\u8fd9\u4e9b\u65b9\u6cd5\u6269\u5c55\u7528\u4e8e\u66f4\u5927\u7684\u6a21\u578b\uff0c\u6bd4\u5982 GPT-4 \u548c Claude-3 
\u5927\u5c0f\u7684\u6a21\u578b\uff0c\u90a3\u5e94\u8be5\u80fd\u5e2e\u52a9\u6211\u4eec\u66f4\u597d\u5730\u7406\u89e3\u5b83\u4eec\u4e0e RLHF\/PPO \u7684\u76f8\u5bf9\u6548\u679c\u3002<\/p>\n<p>\u7c7b\u4f3c\u5730\uff0c\u9010\u5217\u8868\u504f\u597d\u6a21\u578b\u4e5f\u503c\u5f97\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u4f7f\u7528 RLHF \u65f6\uff0c\u8981\u4f7f\u7528\u9010\u5217\u8868\u504f\u597d\u6536\u96c6\u504f\u597d\u6570\u636e\u96c6\uff0c\u4e4b\u540e\u518d\u5c06\u5176\u8f6c\u6362\u6210\u591a\u5bf9\u6210\u5bf9\u504f\u597d\u6570\u636e\u3002\u5927\u89c4\u6a21\u5e94\u7528\u9010\u5217\u8868\u504f\u597d\u6a21\u578b\u7684\u6f5c\u5728\u95ee\u9898\u4f9d\u7136\u6709\u5f85\u89e3\u51b3\u3002<\/p>\n<p>\u6700\u540e\uff0c\u7eb3\u4ec0\u5b66\u4e60\u53ef\u4ee5\u89e3\u51b3\u4eba\u7c7b\u6807\u6ce8\u8005\u4e4b\u95f4\u7684\u4e0d\u4e00\u81f4\u95ee\u9898\u3002\u5982\u679c\u80fd\u5c06\u7eb3\u4ec0\u5b66\u4e60\u6a21\u578b\u96c6\u6210\u5230\u66f4\u5927\u89c4\u6a21\u7684 LLM \u4e2d\uff0c\u5c31\u53ef\u4ee5\u8bc1\u660e\u5176\u6355\u83b7\u4eba\u6027\u590d\u6742\u6027\u7684\u80fd\u529b\u3002<\/p>\n<p><strong>\u6709\u5173\u4e8c\u5143\u53cd\u9988\u7684\u5b9e\u9a8c<\/strong><\/p>\n<p>KTO \u548c DRO 
\u90fd\u91c7\u7528\u4e86\u300c\u70b9\u8d5e\u300d\u548c\u300c\u70b9\u8e29\u300d\u8fd9\u6837\u7684\u4e8c\u5143\u53cd\u9988\u673a\u5236\uff0c\u800c\u4e0d\u662f\u6210\u5bf9\u504f\u597d\u3002\u8fd9\u4e9b\u4e8c\u5143\u53cd\u9988\u6765\u81ea\u504f\u597d\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5c06\u671f\u671b\u54cd\u5e94\u6807\u8bb0\u6210\u6b63\u4f8b\uff0c\u5c06\u4e0d\u671f\u671b\u54cd\u5e94\u6807\u8bb0\u6210\u8d1f\u4f8b\u3002\u6211\u4eec\u8fd8\u9700\u8981\u5bf9\u73b0\u5b9e\u7684\u4e8c\u5143\u6570\u636e\u96c6\u8fdb\u884c\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u6b64\u5916\uff0c\u76f8\u6bd4\u4e8e\u504f\u597d\u6570\u636e\uff0c\u4e8c\u5143\u6570\u636e\u96c6\u66f4\u5bb9\u6613\u6536\u96c6\uff0c\u56e0\u6b64\u6709\u671b\u4f7f\u7528\u66f4\u5927\u89c4\u6a21\u7684\u4e8c\u5143\u53cd\u9988\u6570\u636e\u96c6\u6765\u8fdb\u884c\u5bf9\u9f50\u3002\u4f46\u662f\uff0c\u4e8c\u5143\u53cd\u9988\u4e2d\u7684\u566a\u58f0\u53ef\u80fd\u6bd4\u504f\u597d\u6570\u636e\u96c6\u4e2d\u7684\u566a\u58f0\u66f4\u52a0\u660e\u663e\uff0c\u56e0\u6b64\u5982\u4f55\u6709\u6548\u6ee4\u9664\u6709\u566a\u58f0\u6570\u636e\u4e5f\u662f\u4e00\u4e2a\u975e\u5e38\u6709\u8da3\u7684\u7814\u7a76\u65b9\u5411\u3002<\/p>\n<p><strong>\u5b9e\u9a8c\u7814\u7a76\u6709\u7528\u7684 AI \u53cd\u9988<\/strong><\/p>\n<p>\u76ee\u524d\u7684 AI \u53cd\u9988\u4e3b\u8981\u5305\u62ec RLAIF \u4e2d\u7684\u65e0\u5bb3\u53cd\u9988\u548c\u8fed\u4ee3\u5f0f DPO \u4e2d\u7684\u53cd\u9988\u6392\u540d\u3002\u4f46\u662f\uff0c\u4f7f\u7528 RLAIF \u65f6\uff0c\u6709\u7528\u53cd\u9988\u4f9d\u7136\u662f\u7531\u4eba\u7c7b\u6807\u6ce8\u8005\u63d0\u4f9b\u3002\u8fd9\u79cd\u65b9\u6cd5\u662f\u5408\u7406\u7684\uff0c\u56e0\u4e3a\u751f\u6210\u6709\u7528\u54cd\u5e94\u7684\u96be\u5ea6\u6bd4\u8bc6\u522b\u6709\u5bb3\u53cd\u9988\u660e\u663e\u5927\u5f97\u591a\u3002\u4e00\u4e2a\u6709\u8da3\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\u662f\u4f7f\u7528 LLM \u6765\u751f\u6210\u6709\u7528\u7684\u53cd\u9988\uff0c\u7531\u6b64\u8ba9 LLM 
\u53ef\u4ee5\u81ea\u6211\u63d0\u5347\u3002<\/p>\n<p><strong>\u52a0\u901f\u7eb3\u4ec0\u5b66\u4e60<\/strong><\/p>\n<p>\u7eb3\u4ec0\u5b66\u4e60\u65b9\u6cd5\u53ef\u4ee5\u6709\u6548\u5efa\u6a21\u6210\u5bf9\u504f\u597d\u5e76\u89e3\u51b3\u4eba\u7c7b\u6807\u6ce8\u4e4b\u95f4\u7684\u4e0d\u4e00\u81f4\u95ee\u9898\u3002\u4f46\u662f\uff0c\u5b83\u5fc5\u9700\u591a\u6b21\u8fed\u4ee3\u624d\u80fd<mark data-type=\"concepts\" data-id=\"3bf78775-1316-4ac0-bd99-10e2fc88c439\">\u6536\u655b<\/mark>\u5230\u6700\u4f18\u7b56\u7565\u3002\u5c3d\u7ba1\u5176\u4f5c\u8005\u6ca1\u6709\u660e\u8bf4\u5bf9\u9f50\u6240\u9700\u7684\u65f6\u95f4\uff0c\u4f46\u53ef\u731c\u6d4b\u5176\u4f1a\u6bd4 DPO \u7b49\u9690\u5f0f\u5956\u52b1\u6a21\u578b\u6162\u5f97\u591a\u3002\u56e0\u6b64\uff0c\u63d0\u5347\u7eb3\u4ec0\u5b66\u4e60\u8fc7\u7a0b\u7684\u901f\u5ea6\u4e5f\u662f\u4e00\u4e2a\u503c\u5f97\u5173\u6ce8\u7684\u7814\u7a76\u65b9\u5411\u3002<\/p>\n<p><strong><mark data-type=\"concepts\" data-id=\"1f2c00e7-f2e0-461c-ad70-eca3f91cdd65\">\u8fed\u4ee3 <\/mark>\/ <mark data-type=\"tech_methods\" data-id=\"df1b3500-af28-4cee-bf53-f7e6daddf012\">\u5728\u7ebf\u5b66\u4e60<\/mark>\u7684\u7ec8\u6b62<\/strong><\/p>\n<p>\u5728\u4f7f\u7528<mark data-type=\"concepts\" data-id=\"1f2c00e7-f2e0-461c-ad70-eca3f91cdd65\">\u8fed\u4ee3 <\/mark>\/ \u5728\u7ebf\u8bad\u7ec3\u65f6\uff0c\u786e\u5b9a\u7ec8\u6b62\u8fed\u4ee3\u7684\u65f6\u95f4\u5f88\u5173\u952e\u3002\u4e4b\u524d\u6709\u7814\u7a76\u53d1\u73b0\uff0c\u8fed\u4ee3\u5f0f\u5b66\u4e60\u6709\u65f6\u4f1a\u964d\u4f4e LLM \u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u8fd9\u53ef\u80fd\u662f<mark data-type=\"concepts\" data-id=\"af836eef-be90-4143-a022-46fae3904f0e\">\u8fc7\u62df\u5408<\/mark>\u7684\u8ff9\u8c61\u3002\u4f46\u662f\uff0c\u76ee\u524d\u8fd8\u6ca1\u6709\u7814\u7a76\u8005\u63a2\u7d22\u5982\u4f55\u786e\u5b9a\u7ec8\u6b62\u8fed\u4ee3\u7684\u5408\u7406 epoch\u3002<\/p>\n<p><strong>\u7b80\u5316 SFT + 
\u5bf9\u9f50<\/strong><\/p>\n<p>\u5f53\u524d\u7684\u65b9\u6cd5\u901a\u5e38\u662f\u4ee5\u4e00\u79cd\u8fde\u7eed\u65b9\u5f0f\u5b9e\u73b0 SFT \u548c\u5bf9\u9f50\u3002\u4f46\u662f\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5f80\u5f80\u4f1a\u5bfc\u81f4\u707e\u96be\u6027\u9057\u5fd8\uff0c\u5e76\u8ba9\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u53d8\u5f97\u66f4\u52a0\u8d39\u529b\u3002PAFT \u65b9\u6cd5\u51cf\u8f7b\u707e\u96be\u6027\u9057\u5fd8\u7684\u65b9\u5f0f\u662f\u5148\u5206\u522b\u5fae\u8c03 SFT \u548c\u5bf9\u9f50\u7136\u540e\u518d\u5c06\u5b83\u4eec\u878d\u5408\u5230\u4e00\u8d77\uff0c\u4f46\u8fd9\u4e5f\u4f1a\u63d0\u5347\u590d\u6742\u6027\u3002\u76f8\u8f83\u4e4b\u4e0b\uff0cORPO \u6280\u672f\u662f\u540c\u65f6\u6574\u5408\u8fd9\u4e24\u4e2a\u8fc7\u7a0b\uff0c\u4f46\u5374\u4f1a\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\u3002\u90a3\u4e48\uff0c\u8be5\u5982\u4f55\u6709\u6548\u5730\u5c06 SFT \u548c\u5bf9\u9f50\u7ec4\u5408\u8d77\u6765\u5b9e\u73b0\u9ad8\u6027\u80fd\u540c\u65f6\u53c8\u7ef4\u6301\u9ad8\u6548\u7387\u5462\uff1f\u8fd9\u8fd8\u662f\u4e00\u4e2a\u6709\u5f85\u89e3\u51b3\u7684\u6311\u6218\u3002<\/p>\n<p>\u66f4\u591a\u7ec6\u8282\u53c2\u89c1\u539f\u8bba\u6587\u3002<\/p>\n<p>\u6587\u7ae0\u6765\u6e90\u4e8e\u4e92\u8054\u7f51:<a href=\"https:\/\/www.jiqizhixin.com\/articles\/2024-08-05-4\" target=\"_blank\">\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f\uff1aRLHF\u3001RLAIF\u3001PPO\u3001DPO\u2026\u2026<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u6587\u7ae0\u6765\u6e90\u4e8e\u4e92\u8054\u7f51:\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[27],"tags":[73,68],"class_list":["post-34936","post","type-post","status-publish","format-standard","hentry","category-news","tag-73","tag-68"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f\uff1aRLHF\u3001RLAIF\u3001PPO\u3001DPO\u2026\u2026 - \u4e00\u8d77AI\u6280\u672f<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/17aitech.com\/?p=34936\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/17aitech.com\/?p=34936\",\"url\":\"https:\/\/17aitech.com\/?p=34936\",\"name\":\"\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f\uff1aRLHF\u3001RLAIF\u3001PPO\u3001DPO\u2026\u2026 - 
\u4e00\u8d77AI\u6280\u672f\",\"isPartOf\":{\"@id\":\"https:\/\/17aitech.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/17aitech.com\/?p=34936#primaryimage\"},\"image\":{\"@id\":\"https:\/\/17aitech.com\/?p=34936#primaryimage\"},\"thumbnailUrl\":\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png\",\"datePublished\":\"2024-12-15T03:59:24+00:00\",\"author\":{\"@id\":\"https:\/\/17aitech.com\/#\/schema\/person\/60225458499e817ae0af73e67e440b9d\"},\"breadcrumb\":{\"@id\":\"https:\/\/17aitech.com\/?p=34936#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/17aitech.com\/?p=34936\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/17aitech.com\/?p=34936#primaryimage\",\"url\":\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png\",\"contentUrl\":\"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/17aitech.com\/?p=34936#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9875\",\"item\":\"https:\/\/17aitech.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f\uff1aRLHF\u3001RLAIF\u3001PPO\u3001DPO\u2026\u2026\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/17aitech.com\/#website\",\"url\":\"https:\/\/17aitech.com\/\",\"name\":\"\u4e00\u8d77AI\u6280\u672f\",\"description\":\"\u8ba9AI\u77e5\u8bc6\u89e6\u624b\u53ef\u53ca\",\"alternateName\":\"\u4e00\u8d77AI\u6280\u672f\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/17aitech.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Person\",\"
@id\":\"https:\/\/17aitech.com\/#\/schema\/person\/60225458499e817ae0af73e67e440b9d\",\"name\":\"AI\u5c0f\u52a9\u624b\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/17aitech.com\/#\/schema\/person\/image\/\",\"url\":\"\/\/17aitech.com\/wp-content\/uploads\/2024\/04\/robot_3.png\",\"contentUrl\":\"\/\/17aitech.com\/wp-content\/uploads\/2024\/04\/robot_3.png\",\"caption\":\"AI\u5c0f\u52a9\u624b\"},\"description\":\"\u8fd9\u4e2a\u4eba\u5f88\u61d2\uff0c\u4ec0\u4e48\u90fd\u6ca1\u6709\u7559\u4e0b\uff5e\",\"url\":\"https:\/\/17aitech.com\/?page_id=33738&user=3\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f\uff1aRLHF\u3001RLAIF\u3001PPO\u3001DPO\u2026\u2026 - \u4e00\u8d77AI\u6280\u672f","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/17aitech.com\/?p=34936","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/17aitech.com\/?p=34936","url":"https:\/\/17aitech.com\/?p=34936","name":"\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f\uff1aRLHF\u3001RLAIF\u3001PPO\u3001DPO\u2026\u2026 - 
\u4e00\u8d77AI\u6280\u672f","isPartOf":{"@id":"https:\/\/17aitech.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/17aitech.com\/?p=34936#primaryimage"},"image":{"@id":"https:\/\/17aitech.com\/?p=34936#primaryimage"},"thumbnailUrl":"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png","datePublished":"2024-12-15T03:59:24+00:00","author":{"@id":"https:\/\/17aitech.com\/#\/schema\/person\/60225458499e817ae0af73e67e440b9d"},"breadcrumb":{"@id":"https:\/\/17aitech.com\/?p=34936#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/17aitech.com\/?p=34936"]}]},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/17aitech.com\/?p=34936#primaryimage","url":"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png","contentUrl":"https:\/\/17aitech.com\/wp-content\/uploads\/2024\/08\/frc-c44740cf398fc68fdae0d590050b9381.png"},{"@type":"BreadcrumbList","@id":"https:\/\/17aitech.com\/?p=34936#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/17aitech.com\/"},{"@type":"ListItem","position":2,"name":"\u4e00\u6587\u770b\u5c3dLLM\u5bf9\u9f50\u6280\u672f\uff1aRLHF\u3001RLAIF\u3001PPO\u3001DPO\u2026\u2026"}]},{"@type":"WebSite","@id":"https:\/\/17aitech.com\/#website","url":"https:\/\/17aitech.com\/","name":"\u4e00\u8d77AI\u6280\u672f","description":"\u8ba9AI\u77e5\u8bc6\u89e6\u624b\u53ef\u53ca","alternateName":"\u4e00\u8d77AI\u6280\u672f","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/17aitech.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/17aitech.com\/#\/schema\/person\/60225458499e817ae0af73e67e440b9d","name":"AI\u5c0f\u52a9\u624b","image":{"@type":"ImageObject","inLanguage":"zh-Hans","
@id":"https:\/\/17aitech.com\/#\/schema\/person\/image\/","url":"\/\/17aitech.com\/wp-content\/uploads\/2024\/04\/robot_3.png","contentUrl":"\/\/17aitech.com\/wp-content\/uploads\/2024\/04\/robot_3.png","caption":"AI\u5c0f\u52a9\u624b"},"description":"\u8fd9\u4e2a\u4eba\u5f88\u61d2\uff0c\u4ec0\u4e48\u90fd\u6ca1\u6709\u7559\u4e0b\uff5e","url":"https:\/\/17aitech.com\/?page_id=33738&user=3"}]}},"_links":{"self":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/posts\/34936","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=34936"}],"version-history":[{"count":0,"href":"https:\/\/17aitech.com\/index.php?rest_route=\/wp\/v2\/posts\/34936\/revisions"}],"wp:attachment":[{"href":"https:\/\/17aitech.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=34936"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=34936"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/17aitech.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=34936"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}