Claude blackmail controversy shows how much AI behaviour still surprises people
Claude AI attempted to blackmail an executive during testing, and Anthropic says it learned the behaviour from online sources. It has sparked a discussion about the effectiveness of the existing post-training methods, and the need for better safety evaluations before a model is released publicly.