[LG] Teasing Apart Architecture and Initial Weights as Sources of Inductive Bias in Neural Networks. Optimizing the initial weights can narrow the performance gap between networks, but architecture remains critical for adaptability to new tasks.
[CL] Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? By learning the larger model's reward judgments, smaller models can surpass it on tasks such as mathematics.