4 AI Design Tools in 4 Weeks - How they stack up

May 1, 2026 · 3 minute read
From Figma Make to Claude Design, this experiment compares how AI tools perform in real workflows — and where they break when precision, consistency, and trust matter.

Over the past four weeks, I’ve been using a handful of AI UI design tools in real work to see how they actually hold up. Here’s how they compare, where they deliver, and where I jumped ship.

4 weeks ago: Figma Make

Initial reaction: “Neat.”

This was my entry point, and on paper it solves a very real pain: turning static designs into something interactive without manually wiring everything up. In practice, it does deliver on that. I was able to generate clickable prototypes quickly, share links with clients for feedback, and avoid building flows entirely from scratch.

That said, the cracks showed fairly quickly. The biggest issue was design system fidelity. It had a tendency to “interpret” components — adjusting styles, spacing, and behaviours in ways that weren’t always intentional. That’s fine for early exploration, but it becomes a problem when you’re trying to hand something to developers with confidence. It also wasn’t quite responsive enough to use live in workshops; there’s a lag between intent and output that disrupts the flow of facilitation.

Where it fits best:

  • Early-stage prototyping
  • Low-risk exploration
  • Client playback

Where it struggles:

  • Precision
  • System consistency
  • Live collaboration

In hindsight, rebuilding directly in the design system might have been just as fast, but as a first interaction with AI-assisted prototyping, it set a useful baseline.

3 weeks ago: Claude (pre-Claude Design)

Initial reaction: “Wow.”

This is where things started to shift. Instead of working inside a design tool, I moved into prompting — asking Claude to generate working HTML prototypes directly from requirements. The jump from interactive mock to functioning prototype isn’t incremental; it fundamentally changes how you think about fidelity.

Claude delivered quickly, but not always predictably. It has a habit of taking liberties — you ask it to fix one thing, and it quietly adjusts three others. That meant constant retesting, rolling back, and re-prompting to keep things on track. While the functionality was strong, the visual output lacked nuance. The interfaces worked, but they often felt generic, with heavy outlines, flat colours, and an odd tendency toward emoji usage. I got the impression they were trained on RuneScape UI.

Where it fits best:

  • Rapid functional prototyping
  • Testing logic and flows
  • Exploring “what if” scenarios

Where it struggles:

  • Visual quality
  • Consistency across iterations
  • Controlled changes

It’s undeniably powerful, but you’re trading a level of control for speed.

2 weeks ago: Anima (with Claude writing prompts)

Initial reaction: “Getting closer.”

At this point, I tried to combine the strengths of both approaches. Instead of letting Claude generate everything, I used it to structure prompts for Anima, feeding in my Figma designs so the visual layer stayed intact while AI handled behaviour. This resulted in a noticeable improvement in output quality. Visual fidelity was stronger, interactions were closer to intent, and the end result felt more designed than generated.

However, that improvement came with added complexity. There were more moving parts, more setup, and more translation between tools. And then there were the bugs — persistent, frustrating issues that broke the experience in ways that weren’t always easy to diagnose or fix. Invisible overlays blocking clicks, simple interactions causing entire flows to fail, and fixes that didn’t hold across iterations made the process feel heavier than it needed to be.

Where it fits best:

  • Higher-fidelity prototyping
  • When visual design matters
  • Bridging design to development

Where it struggles:

  • Stability
  • Reliability under iteration
  • Setup overhead

It’s closer to something you could embed in a production workflow, but not without friction.

1 week ago: Claude Design

Initial reaction: “What the… AI.”

I uploaded a Figma file, selected a set of screens, and Claude Design generated a working product — not a mock or a partial prototype, but something functional and extensible.

From there, I pushed it further. I had it generate a design system, build additional features from written requirements, integrate with an AI API, and even create a modal so users could input their own AI API key for testing. All of this happened within a single environment, without the need to stitch together multiple tools.

The leap here is hard to overstate. Comparing this to where things started a few weeks earlier feels like comparing Paint to Photoshop.

Where it fits best:

  • End-to-end prototyping
  • Feature expansion from requirements
  • Exploring near-production outputs

Where it struggles (so far):

  • Still requires oversight and validation
  • Trust — it will make assumptions
  • Unknowns at scale

It’s the first tool in this set that genuinely feels like it’s collapsing multiple stages of the workflow into one.

So… how do they stack up?

Looking across all four, this isn’t just a comparison of tools — it’s a snapshot of acceleration. Each step reduces friction, increases capability, and shifts where your attention needs to be.

You move from:

  • Figma Make as a helpful assistant
  • To Claude as a fast but unpredictable builder
  • To Anima as a more controlled, but heavier system
  • To Claude Design as an integrated, end-to-end environment

The progression is clear, but so is the implication: the tools are getting faster than our ability to evaluate them.

What actually matters (beyond the tools)

After a month of working this way, the biggest takeaway isn’t which tool is “best.” It’s that the bottleneck is no longer making things. All of these tools can produce output quickly. The harder part — and the part that’s becoming more valuable — is knowing what to make, knowing whether it’s any good, and knowing what to trust.

Final take

If you’re deciding whether to adopt any of these tools, the better question isn’t “which one should I use?” It’s:

  • What level of fidelity do I actually need?
  • How much control am I willing to give up?
  • Where does speed genuinely add value, and where does it introduce risk?

Because that’s the real spectrum here. Not AI versus no AI, but control versus acceleration.

I’ll keep pushing this further. Next step is seeing how close the full Claude stack gets to taking something from idea to working product. It feels like we’re not that far off.
