Research

Can AI Agents Synthesize Scientific Conclusions?

Researchers created SciConBench, a benchmark testing AI agents' ability to synthesize scientific conclusions from multiple sources, finding that even the best systems achieve only 33.7% factual accuracy. The study used clean-room evaluation to prevent data leakage and found that consumer-facing AI tools frequently generate incomplete or contradictory scientific summaries. The results highlight significant gaps in AI's ability to reliably synthesize complex scientific information for high-stakes decisions.

Read full story at cs.AI updates on arXiv.org →

Research

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

Researchers have published findings suggesting that reinforcement learning on carefully constructed datasets of benefici...

Research

Commemorating 70 Years of Artificial Intelligence

IEEE Spectrum marks seventy years since the Dartmouth workshop formally named artificial intelligence as a field, offeri...

Research

Diffusion Language Models: An Experimental Analysis

Researchers present a systematic evaluation of eight diffusion language models across eight benchmarks covering reasonin...