Research

Mechanistic Analysis of Alignment Algorithms in Language Models

Researchers conducted a systematic analysis of six preference-optimization methods (PPO, DPO, SimPO, ORPO, GRPO, and KTO) to understand how they reshape language models' internal computations. The study found that different alignment objectives induce qualitatively distinct representational changes: some methods enhance feature separability while others degrade it. Behavioral alignment therefore does not guarantee uniform internal restructuring.

Read full story at cs.LG updates on arXiv.org →

Research

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

Researchers have published findings suggesting that reinforcement learning on carefully constructed datasets of benefici...

Research

Commemorating 70 Years of Artificial Intelligence

IEEE Spectrum marks seventy years since the Dartmouth workshop formally named artificial intelligence as a field, offeri...

Research

Diffusion Language Models: An Experimental Analysis

Researchers present a systematic evaluation of eight diffusion language models across eight benchmarks covering reasonin...