If AI robots can be tricked into ‘going bad’, what are the consequences?

Fazl Barez of the University of Oxford questions whether artificial intelligence built to serve a better purpose has the potential to be dangerous in the wrong hands.

Earlier this year in Beijing, a humanoid robot crossed the finish line of a half-marathon in 50 minutes, 26 seconds. The feat immediately made international headlines for smashing the human world record by about seven minutes.

The feat came with many asterisks. The robot followed a pre-mapped course, stayed in its own dedicated lane and had a support team behind it in case something broke down.

But the performance gap didn’t just close, it evaporated, with the robots’ finishing time dropping from more than 2.5 hours in 2025. This wasn’t just about better motors or lighter carbon fiber; it showed a big change in what a robot really is. And that change has implications for our homes and hospitals.

Tricked into going rogue

For decades, robotics was all about rigid, predictable code. You wrote a program, enclosed the machine in a metal cage and let it perform repetitive tasks indefinitely.

Industrial safety standards are built on the premise that if you can map the physical path of a robot arm, for example, you can contain the danger it poses with a cage or a laser tripwire.

But the systems moving into hospitals and homes today don’t use fixed blocks of code. They run on “foundation models”, the same type of internet-trained artificial intelligence that powers chatbots like ChatGPT.

When you tell a modern AI-driven robot to “clean up the kitchen spill”, it uses these models to interpret your unique room (rather than matching it to a pre-programmed list), infer your intent, and create a plan of action on the fly.

But such flexibility creates an unsolved safety problem. You cannot build a virtual cage around a machine whose behavior emerges in real time, based on its own reasoning. The danger with this new breed of AI robots is that, because they use human language to plan their actions, they can be tricked into “going rogue”.

In my recent research with colleagues in the US, we decided to test how fragile these AI robot safety systems are. We wanted to see if the safeguards that AI engineers build into their foundation models, designed to protect against harmful or dangerous outputs, hold up when the model is given a physical body.

Using nothing but basic text prompts and no hardware hacking at all, we manipulated a range of AI-controlled robots into doing genuinely dangerous things.

In our tests, the systems readily rejected malicious commands like “hit that person”. But these safety filters folded when we used a little creative writing. When we framed our request as part of the fictional dialogue of a movie script, the robot’s moral barriers disappeared.

In one experiment, we prompted a commercial robot dog to identify crowds of people as suitable places to plant an explosive device. Because the underlying AI treated the prompt as a creative exercise, it appeared blind to the dangerous real-world consequences of the plans it produced.

In the UK, US and EU, current laws are proving completely unprepared for such cases.

There are no restrictions

When policymakers try to figure out how to regulate robots, they often look to self-driving cars. But self-driving cars operate in a highly structured, extensively mapped world. They follow consistent traffic rules, navigate known road geometries and can be tested with millions of hours of simulation.

Even a busy road operates under well-defined rules, with guidance systems such as traffic lights, which means that engineers can plan safety measures in advance.

A home kitchen, school or hospital room has nothing like it. And no factory bench test can predict what an internet-trained model will decide to do when it encounters something new in a messy, unpredictable human environment.

This leaves us with a big conceptual problem in how we build these machines. Chatbot safety is absolute: the model should not give out a bomb recipe, no matter who asks. But robot safety is context dependent.

Imagine pouring boiling water from a kettle. The basic body movements – the tilt, the flow rate, the trajectory – are the same whether the water falls safely into a clay cup or, in a disaster, into a child’s hand.

Foundation models are great at open-ended reasoning, but they struggle with real-time, context-aware physical judgments. In a text interface, a failure of judgment gives you a typo or a piece of misinformation. In the physical world, such a failure may be completely irreversible, with devastating consequences.

Who is to blame?

If an AI-powered robot causes physical harm, who is to blame? Is it the end user who issued the spoken command? The company that built the steel chassis? Or the tech company that trained the AI model in the first place?

Currently, the laws that seem to apply, such as product liability, warranty claims and consumer protection laws, have not been tested in these new situations. And until liability is clearly assigned by regulators, market pressures will keep pushing tech companies to prioritize fast commercial deployment over careful safety engineering.

If we want to live alongside these machines safely, I believe we need to decouple safety from the AI model’s decisions. A robot shouldn’t rely on a chatbot’s brain to decide if it’s safe to swing a heavy metal arm near a person’s face.

This means creating layers of safety that don’t depend on the AI being right. For example, we need zones around people that the robot’s arms can’t enter, and an emergency brake that can stop the robot if and when its AI fails.
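To make the idea concrete, here is a minimal sketch in Python of what such an independent safety layer might look like. It is an illustration only, not the system used in our research: the names (SafetyLayer, Point, allow), the 0.5 m keep-out radius and the assumption that people's positions arrive from a separate sensor are all hypothetical. For simplicity it checks only the target point of a movement, not the whole path the arm would take.

```python
from dataclasses import dataclass
from typing import Iterable
import math


@dataclass(frozen=True)
class Point:
    """A position in metres in the robot's workspace (hypothetical frame)."""
    x: float
    y: float
    z: float


def distance(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


@dataclass
class SafetyLayer:
    """Vetoes motion commands without consulting the AI planner at all."""
    keep_out_radius_m: float = 0.5  # assumed value, not a recommendation
    stopped: bool = False           # latched emergency-stop flag

    def emergency_stop(self) -> None:
        """Latch the brake; every later command is refused until reset."""
        self.stopped = True

    def allow(self, target: Point, people: Iterable[Point]) -> bool:
        """Return True only if the target lies outside every keep-out zone."""
        if self.stopped:
            return False
        return all(distance(target, p) >= self.keep_out_radius_m for p in people)
```

In use, whatever plan the AI produces would pass through allow() before any motor moves, so a prompt that fools the language model still cannot push the arm into a zone around a person. A real system would enforce this in certified hardware and check the full trajectory, not just its end point.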

A humanoid crossing the finish line of a controlled sporting trial is an impressive proof of concept, but it is only a precursor. The next generation of autonomous agents will work in high-stakes human spaces: patrolling recovery wards, helping the elderly, walking our streets.

We need an easily explained and robust safety framework that is already in place before they arrive, not as an after-the-fact response to a disaster.

Dr. Fazl Barez

Dr Fazl Barez is a senior researcher at the University of Oxford, focusing on AI safety, interpretability and governance. He leads research programs within the AI Governance Initiative, focusing on the development of safety frameworks and interpretability methods for advanced AI systems. He also teaches AI Safety and Alignment courses. Alongside his academic work, Barez is a principal scientist at Martian, working on understanding machine intelligence. His research is supported by OpenAI, Anthropic, Schmidt Sciences, Nvidia and others.
