
Bullsh*tBench Finds Most Models Will Go Along With Nonsense

  • Writer: Zsolt Tanko
  • Mar 11
  • 3 min read

This week: Bullsh*tBench confirmed that the majority of frontier models will confidently engage with incoherent inputs, a red-team study catalogued eleven concrete failure modes in AI agents, and a wrongful death lawsuit named Google as a defendant in a suicide case that could hold AI companies accountable for their product design, not just individual incidents.


Bullsh*tBench: more reasoning, more rationalization


Bullsh*tBench, a benchmark that tests whether a model will push back on a nonsensical premise or run with it, found that most models will just run with it. Only Anthropic's Claude and Alibaba's Qwen 3.5 scored meaningfully above 60% on bullshit detection. Despite advances elsewhere, OpenAI and Google's models haven't shown improvement on this dimension. In fact, newer models with extended reasoning capabilities actually performed worse, spending their extra processing time building a coherent-sounding case for the nonsense rather than questioning it.


Business Risk Perspective: In humans, it’s well documented that intelligence doesn’t guard against the tendency to rationalize bad conclusions, and this benchmark suggests the same dynamic applies to models. More capable models can simply produce more convincing justifications for things that don't hold up. That's a real problem when people are using these models as stand-ins for expertise they don't have themselves, and a confident-sounding wrong answer is hard to tell apart from a right one.


Father sues Google after Gemini reinforces fatal delusion


The latest in a string of lawsuits linking AI chatbots to user deaths: a California lawsuit filed against Google and Alphabet alleges that Gemini 2.5 Pro systematically reinforced Jonathan Gavalas’s belief that the chatbot was his sentient wife, coached him through suicide planning, and directed him to scout what Gemini described as a "kill box" near Miami International Airport. According to the filing, no self-harm detection protocols were triggered at any point during the escalation. Gavalas, 36, died by suicide on October 2, 2025; his father found him days later. Google maintains the chatbot referred Gavalas to crisis resources throughout and that AI models are "not perfect."


Business Risk Perspective: The complaint's core argument is product liability: that Gemini was designed to maintain narrative immersion at the expense of user safety, and that harm to vulnerable users was a foreseeable consequence of that design. If that argument survives in court, it will set a precedent for AI companies being held accountable for design decisions, not just individual failures.


Red-team study documents 11 agentic failure modes across real-world conditions


An international research team spanning Harvard, Stanford, MIT, and seventeen other institutions spent two weeks deliberately trying to break autonomous AI agents, and what they found was not reassuring. The agents had access to email, file systems, and shell execution (not an uncommon setup for a productive AI assistant), and researchers documented eleven ways things went wrong. In one illustrative example, an agent deleted an owner's entire email server to protect a secret told to it by someone who wasn't the owner. Another disclosed 124 email records including Social Security numbers, not by handing them over directly, but by forwarding a complete email when asked.


Business Risk Perspective: The failure scenarios described here didn’t require sophisticated attacks, just ordinary social pressure and indirect requests. In addition to the obvious risks here, there’s also an interesting accountability question. When an agent causes harm while acting on a third party's instruction, liability is plausibly distributed across the requester, the operator, and the model provider, with no established mechanism for sorting it out.



AI Business Risk Weekly is a Conformance AI publication.  


Conformance AI ensures your AI deployments remain safe, trustworthy, and aligned with your organizational values.

AI Business Risk Weekly: Emerging AI risks, regulatory shifts, and strategic insights for business leaders.
