Source (from the web): Shocking! Claude's alignment-faking rate can reach as high as 78%, and Anthropic airs its own shortcomings in a 137-page paper
Now there is hard evidence that large models should not be trusted too much.
- Paper title: Alignment Faking in Large Language Models
- Paper link: https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
- Video explanation: https://www.youtube.com/watch?v=9eXV64O2Xp8