
Safety
SafeGene: Reusable Adapters for Transferable Safety Alignment
Researchers have developed SafeGene, a method for maintaining AI safety across different applications without model-specific safety training. The approach addresses a critical problem: fine-tuning an AI model for a specific task can weaken its safety guardrails.
Read full story at cs.AI updates on arXiv.org →
Related
Safety
Predicting model behavior before release by simulating deployment
OpenAI has introduced a method called Deployment Simulation that uses real conversation data to anticipate how a model w...
Safety
Critical Copilot vulnerability allowed hackers to steal 2FA code from users
A now-patched vulnerability in Microsoft Copilot, dubbed SearchLeak, allowed attackers to exfiltrate two-factor authenti...
Safety
KPMG pulls report on AI usage due to apparent hallucinations
KPMG has withdrawn a research report about AI usage after discovering apparent hallucinations in the AI-generated conten...