AI safety went from a niche academic concern to front-page news in about two years. Now every major AI company has a safety team, governments are creating AI safety institutes, and the debate about existential risk has gone mainstream. Here’s what’s actually happening behind the headlines.
What AI Safety Means in 2026
AI safety covers a broad spectrum of concerns, from immediate practical risks to long-term existential scenarios:
Near-term safety. Making current AI systems reliable, fair, and secure. This includes preventing harmful outputs, reducing bias, ensuring solidness, and protecting against adversarial attacks. These are engineering problems with engineering solutions, and real progress is being made.
Alignment. Ensuring that AI systems do what we actually want them to do, not just what we literally told them to do. This is harder than it sounds — specifying human values precisely enough for a machine to follow is a fundamental challenge. Current approaches include RLHF (reinforcement learning from human feedback), constitutional AI, and various forms of oversight and monitoring.
Existential risk. The concern that sufficiently advanced AI could pose risks to human civilization. This ranges from plausible scenarios (AI systems pursuing goals that conflict with human interests) to speculative ones (superintelligent AI that humans can’t control). The debate about how seriously to take these risks is ongoing and heated.
The Safety Institutes
Multiple countries have established AI safety institutes:
UK AI Safety Institute (AISI). The first national AI safety institute, established after the Bletchley Summit in November 2023. AISI conducts safety evaluations of frontier AI models, develops testing methodologies, and advises the government on AI safety policy. It’s been testing models from OpenAI, Anthropic, Google, and Meta.
US AI Safety Institute (NIST). Housed within the National Institute of Standards and Technology, the US AI Safety Institute focuses on developing standards and benchmarks for AI safety. It’s working on evaluation frameworks for frontier models and guidelines for responsible AI development.
Other countries. Japan, Canada, France, and others have established or are establishing their own AI safety bodies. The challenge is coordination — ensuring that safety standards are consistent across jurisdictions.
What the Companies Are Doing
OpenAI. Has a dedicated safety team and publishes safety reports for major model releases. The company’s “preparedness framework” categorizes risks and sets thresholds for when models are too dangerous to deploy. Critics argue that commercial pressure sometimes overrides safety concerns.
Anthropic. Founded explicitly as a safety-focused AI company. Anthropic’s “responsible scaling policy” ties model deployment to safety evaluations. The company has been more cautious about releasing capabilities than competitors, though it’s also racing to build more powerful models.
Google DeepMind. Has a large safety research team and publishes extensively on alignment and safety. DeepMind’s approach emphasizes technical research on alignment, interpretability, and solidness.
Meta. Takes a different approach by open-sourcing its models. Meta argues that open-source AI is safer because it allows the broader community to identify and fix safety issues. Critics argue that open-sourcing powerful models makes them available to bad actors.
The Key Debates
Open vs. closed. Should powerful AI models be open-sourced? Open-source advocates argue that transparency improves safety. Closed-source advocates argue that restricting access to powerful models prevents misuse. Both sides have valid points, and the debate is far from resolved.
Regulation vs. self-governance. Should governments regulate AI safety, or should the industry self-regulate? The track record of industry self-regulation in other sectors (social media, financial services) isn’t encouraging. But government regulation risks being too slow, too broad, or technically uninformed.
Speed vs. caution. The competitive pressure to release new models quickly conflicts with the need for thorough safety testing. Companies that take longer to test their models risk falling behind competitors. This “race to the bottom” dynamic is one of the biggest challenges in AI safety.
Near-term vs. long-term. Should safety efforts focus on current, concrete risks (bias, misinformation, job displacement) or future, speculative risks (superintelligence, loss of control)? Resources are limited, and prioritization matters. Most practitioners argue for focusing on near-term risks while monitoring long-term ones.
What’s Actually Working
Red teaming. Having humans (and AI systems) try to break AI models before they’re released. Red teaming has become standard practice and has identified numerous safety issues before they reached users.
RLHF and constitutional AI. Training AI systems to be helpful, harmless, and honest using human feedback. These techniques have significantly improved the safety of deployed models, though they’re not perfect.
Monitoring and incident response. Companies are getting better at monitoring deployed AI systems for safety issues and responding quickly when problems are identified. This operational safety capability is as important as pre-deployment testing.
Safety benchmarks. Standardized tests for evaluating AI safety are improving. Benchmarks for bias, toxicity, and dangerous capabilities help compare models and track progress over time.
My Take
AI safety is making real progress on near-term issues. Current AI systems are significantly safer than they were two years ago, thanks to better training techniques, more thorough testing, and improved monitoring.
The long-term safety challenges are harder and less well-understood. We don’t have reliable methods for ensuring that future, more powerful AI systems will remain aligned with human values. This is a genuine concern that deserves serious research and attention.
The biggest risk isn’t that we’ll ignore safety — it’s that competitive pressure will cause companies to cut corners. The race to build more powerful AI is intense, and safety testing takes time and money. Maintaining safety standards in the face of commercial pressure is the central challenge of AI governance.
🕒 Last updated: · Originally published: March 13, 2026