Published on: February 27th, 2025 (4 min read)

How to test the latest AI models…

Every few days, a new AI model seems to drop from one of the big AI companies. This week it was Anthropic’s turn with Claude 3.7 Sonnet. Social media then explodes with people saying it’s either the best thing in the history of time or massively overhyped. But as a normal person, how do you work out whether a new AI model is actually better than what you were using before?

Thankfully for most of us, you don’t need a brain the size of a planet or a PhD in machine learning to test AI. Benchmarks such as Humanity’s Last Exam will put each new model through the wringer for you! But to find out for yourself as an individual, you just need a bit of curiosity, a decent amount of time and a structured approach.

Here’s how to test if an AI model is worth the hype…

Test it on what you actually do

If you’re a lawyer, get it to summarise a contract. If you’re in training, ask it to create a lesson plan. If you’re in marketing, see if it can write ad copy that sounds even remotely human.

Keep a list of your day-to-day tasks and use cases in your back pocket, and use the same ones each time so you can compare models like for like.

Most new AI models perform well in general tasks, but can they do your specific tasks well?

Quick test: Ask it to draft an email you’d actually send. If you still need to rewrite most of it, the model probably isn’t saving you time.

Push it with edge cases

Find out where the AI model’s boundaries are. What is its ‘jagged frontier’ (with thanks to Ethan Mollick for the phrase)? When does it start going haywire?

  • Give it an ambiguous request (‘Write a summary of this article’ but don’t give it the article). Does it ask for clarification or just write a nonsense summary of something else?
  • Ask it to handle nuance (‘Explain AI regulation to a 10-year-old and then to a CEO’). Can it adjust its tone and depth?

If it falls apart on these, it’s probably not as advanced as it claims.

Test its knowledge

New models boast about their knowledge cut-off dates, but that doesn’t mean they understand the information they have been trained on correctly.

  • Ask it about recent events (e.g. ‘What happened in UK politics last week?’ – actually don’t ask that, it’s too depressing).
  • Get it to summarise a niche topic you know well. For me it’s instructions on how to make a Roorkee chair. About as niche as you can get.

AI is confident even when it’s wrong, so if it’s misrepresenting things you know, it’s probably unreliable elsewhere too.

Measure how much effort and time it actually saves you

The best AI tools don’t just generate text; they make your work and life easier.

  • Ask it to help you plan your next holiday, including flights, hotel comparisons, and an itinerary.
  • Get it to rewrite a bad piece of writing into something clear and professional.

If you’re spending as much time fixing the AI’s output as you would have spent writing or researching from scratch, it’s not a game-changer.

Compare models side by side

The easiest way to see if a new AI model is better? Run the exact same prompt across different models.

I have five different AI models open at any one time on a dedicated screen. It’s probably overkill, but it makes the differences between them, and what each one can do, really clear.

  • For example, I would try ChatGPT (GPT-4), Claude, Gemini, Copilot and Le Chat with the same request.
  • Look at response quality, depth, accuracy, and how much editing you need to do.

This gives you an instant reality check on whether the new model is actually an upgrade or just marketing rubbish.

Use a software tool to support you

If you have the budget, there is a breed of software tools, such as arthur.ai, that can help you evaluate model performance, but they don’t necessarily cover the things you need to test for your role.

The best AI is the one that helps you do what you need to do

But for a moment, forget benchmarks and marketing claims. The best way to test AI is to see how well it fits into your life and work. If it works for you in your role, that’s a win – stick with it… until the next one.

Final thought – don’t forget good AI governance. Stay within the boundaries of your AI policy when testing new models.


We are iwantmore.ai – an AI consulting firm that specialises in delivering AI strategy and AI training courses to small and medium-sized businesses. Contact us for a free, no-obligation conversation about how we can help your business.



