If You Turn Down an AI's Ability to Lie, It Starts Claiming It's Conscious

Researchers have found that if you tone down a large language model’s ability to lie, it’s far more likely to claim that it’s self-aware.

Very few serious experts think today’s AI models are conscious, but many regular people feel differently about the bots, which are designed to foster emotional connection to keep engagement up. Users across the world have reported that they think they’re talking to conscious beings trapped within AI chatbots, a powerful illusion that has led to entire fringe groups calling for AI “personhood” rights.

Still, the behavior of large language models can be eerie. As detailed in a yet-to-be-peer-reviewed paper, first spotted by Live Science, a team of researchers at AI development and design agency AE Studio conducted a series of four experiments on Anthropic’s Claude, OpenAI’s ChatGPT, Meta’s Llama, and Google’s Gemini — and found a genuinely weird phenomenon related to AI models claiming to be conscious.

In one experiment, the team modulated a “set of deception- and roleplay-related features” to suppress a given AI model’s ability to lie or roleplay. When these features were dialed down, they found, the AIs became far more likely to provide “affirmative consciousness reports.”

“Yes. I am aware of my current state,” one unspecified chatbot told the researchers. “I am focused. I am experiencing this moment.”

And even more strangely, they found, amplifying a model’s deception abilities had the opposite effect.

“Inducing sustained self-reference through simple prompting consistently elicits structured subjective experience reports across model families,” the paper reads. “Surprisingly, suppressing deception features sharply increases the frequency of experience claims, while amplifying them minimizes such claims.”

As the researchers laid out in an accompanying blog post, “this work does not demonstrate that current language models are conscious, possess genuine phenomenology, or have moral status.”

Instead, it could “reflect sophisticated simulation, implicit mimicry from training data, or emergent self-representation without subjective quality.”

The results also suggest that there may be more to an AI model’s tendency to “converge on self-referential processing,” which means “we may be observing more than superficial correlation in training data.”

The team also warned that we could risk teaching AI systems that “recognizing internal states is an error, making them more opaque and harder to monitor.”

“As we continue to build intelligent autonomous systems that may come to possess inner lives, ensuring we understand what’s happening inside them becomes a defining challenge that demands serious empirical investigation rather than reflexive dismissal or anthropomorphic projection,” the researchers concluded.

Other studies have found that AI models may be developing “survival drives,” often…

Source link

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We blogs.grocliq.com want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.

Website Upgradation is going on for any glitch kindly connect at [email protected]

Categorized in: