
WorkBench Revisited: Workplace Agents Two Years On

AI workplace agents have improved dramatically since 2024: the best model now completes 89% of tasks, up from 43% two years ago, while the rate of harmful actions has dropped from 26% to 2.5%. The research shows that capability and safety improvements go hand in hand rather than trading off against each other. However, frontier models still make basic mistakes that can cause irreversible harm, such as sending emails to the wrong recipients.

Read full story at Import AI