Published On: February 27th, 2025
4 min read

How to test the latest AI models…

Every few days, a new AI model seems to drop from one of the big AI companies. This week it was Anthropic’s turn with Claude 3.7 Sonnet. Social media then explodes with people saying it’s either the best thing in the history of time or massively overhyped. But as a normal person, how do you work out whether a new AI model is actually better than what you were using before?

Thankfully for most of us, you don’t need a brain the size of a planet or a PhD in machine learning to test AI. It will be put through the wringer on benchmarks such as Humanity’s Last Exam for you! But to find out for yourself as an individual, you just need a bit of curiosity, a decent amount of time and a structured approach.

Here’s how to test if an AI model is worth the hype…

Test it on what you actually do

If you’re a lawyer, get it to summarise a contract. If you’re in training, ask it to create a lesson plan. If you’re in marketing, see if it can write ad copy that sounds even remotely human.

Have a list of your day-to-day tasks and use cases in your back pocket and use the same ones each time.

Most new AI models perform well in general tasks, but can they do your specific tasks well?

Quick test: Ask it to draft an email you’d actually send. If you still need to rewrite most of it, the model probably isn’t saving you time.
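
If you want a rough number for "how much did I rewrite?", the sketch below compares the AI draft with the version you actually sent, using Python's standard difflib module. The function name kept_ratio and the two example emails are made up for illustration, and word-level similarity is only a crude proxy for editing effort, not a proper measure of quality.

```python
from difflib import SequenceMatcher


def kept_ratio(ai_draft: str, sent_version: str) -> float:
    """Return a 0..1 similarity score between the AI draft and what you sent."""
    # Compare word by word so the score reflects rewritten wording,
    # not individual characters.
    matcher = SequenceMatcher(None, ai_draft.split(), sent_version.split())
    return matcher.ratio()


# Illustrative texts only: paste in your own draft and final email.
ai_draft = "Hi Sam, thanks for your email. I will send the report on Friday."
sent_version = "Hi Sam, thanks for your email. I will send the report on Monday."

score = kept_ratio(ai_draft, sent_version)
print(f"Similarity between draft and sent email: {score:.0%}")
```

A score close to 100% suggests you sent the draft almost as written; a low score suggests you rewrote most of it. Where you draw the line is a judgement call for your own work.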

Push it with edge cases

Find out where the AI model’s boundaries are. What is its ‘jagged frontier’ (with thanks to Ethan Mollick for the phrase)? When does it start going haywire?

  • Give it an ambiguous request (‘Write a summary of this article’ but don’t give it the article). Does it ask for clarification or just write a nonsense summary of something else?
  • Ask it to handle nuance (‘Explain AI regulation to a 10-year-old and then to a CEO’). Can it adjust its tone and depth?

If it falls apart on these, it’s probably not as advanced as it claims.

Test its knowledge

New models boast about their knowledge cut-off dates, but a recent cut-off doesn’t mean they correctly understand the information they have been trained on.

  • Ask it about recent events (e.g. ‘What happened in UK politics last week?’ – actually don’t ask that, it’s too depressing).
  • Get it to summarise a niche topic you know well. For me it’s instructions on how to make a Roorkee chair. As niche as you can get.

AI is confident even when it’s wrong, so if it’s misrepresenting things you know, it’s probably unreliable elsewhere too.

Measure how much effort and time it actually saves you

The best AI tools don’t just generate text; they make your work and life easier.

  • Ask it to help you plan your next holiday, including flights, hotel comparisons, and an itinerary.
  • Get it to rewrite a bad piece of writing into something clear and professional.

If you’re spending as much time fixing the AI’s output as you would have spent writing or researching from scratch, it’s not a game-changer.
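
The comparison above is simple arithmetic, and it can help to write it down. This is a minimal sketch: the function name minutes_saved and every number in it are placeholders, not measurements, so swap in timings from your own tasks.

```python
def minutes_saved(from_scratch: float, prompting: float, fixing: float) -> float:
    """Minutes saved by using the model; a negative value means it cost you time."""
    return from_scratch - (prompting + fixing)


# Placeholder timings in minutes: (doing it yourself, prompting, fixing the output).
tasks = {
    "holiday itinerary": (90, 10, 30),
    "rewrite a bad report": (40, 5, 45),
}

for name, (scratch, prompting, fixing) in tasks.items():
    saved = minutes_saved(scratch, prompting, fixing)
    verdict = "saves time" if saved > 0 else "no saving"
    print(f"{name}: {saved:+.0f} min ({verdict})")
```

Tracking the same handful of tasks for each new model makes the numbers comparable from one release to the next.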

Compare models side by side

The easiest way to see if a new AI model is better? Run the exact same prompt across different models.

I have five different AI models open at any one time on a dedicated screen. Probably overkill, but it really clearly highlights the differences and capabilities of each.

  • For example, I would try ChatGPT (GPT-4), Claude, Gemini, Copilot and Le Chat with the same request.
  • Look at response quality, depth, accuracy, and how much editing you need to do.

This gives you an instant reality check on whether the new model is actually an upgrade or just marketing rubbish.
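
If you are comfortable with a little Python, the side-by-side comparison can be sketched as a small harness that sends the same prompts to every model and groups the replies per prompt. Everything here is an assumption for illustration: model_a and model_b are hypothetical stand-ins that just echo the prompt, and you would replace them with however you actually reach each model (including simply pasting in replies you copied from a chat window).

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins: replace each body with your own way of
# getting a reply from that model. They only echo the prompt here.
def model_a(prompt: str) -> str:
    return f"[model A reply to: {prompt}]"


def model_b(prompt: str) -> str:
    return f"[model B reply to: {prompt}]"


def run_side_by_side(
    models: Dict[str, Callable[[str], str]],
    prompts: List[str],
) -> Dict[str, Dict[str, str]]:
    """Send every prompt to every model; return results[prompt][model_name]."""
    results: Dict[str, Dict[str, str]] = {}
    for prompt in prompts:
        results[prompt] = {name: ask(prompt) for name, ask in models.items()}
    return results


# Use the same fixed list of prompts every time a new model comes out.
prompts = [
    "Draft an email declining a meeting invitation politely.",
    "Explain AI regulation to a 10-year-old.",
]
results = run_side_by_side({"model A": model_a, "model B": model_b}, prompts)

for prompt, replies in results.items():
    print(f"PROMPT: {prompt}")
    for name, reply in replies.items():
        print(f"  {name}: {reply}")
```

Keeping the prompt list fixed is the point of the design: the only thing that changes between runs is the model, so differences in the output are easier to attribute to it.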

Use a software tool to support you

If you have the budget, there is a breed of software tools, such as arthur.ai, that can help you evaluate model performance, but these don’t necessarily cover the things you need to test for your role.

The best AI is the one that helps you do what you need to do

But for a moment, forget benchmarks and marketing claims. The best way to test AI is to see how well it fits into your life and work. If it works for you in your role, that’s a win – stick with it… until the next one.

Final thought – don’t forget good AI governance. Stay within the boundaries of your AI policy when testing new models.


We are iwantmore.ai – an AI consulting firm that specialises in delivering AI strategy and AI training courses to small and medium-sized businesses. Contact us for a free, no-obligation conversation about how we can help your business.
