Building a Frontend with AI Agents and No Design System — What Actually Worked

How we built a frontend development workflow using AI Agents in a team with no frontend experience and no formal design system. Covers the 4-source multimodal context strategy (Figma MCP, publisher HTML/CSS, screenshots, specs) and a Jira-driven QA automation pipeline with Playwright.

Tags: AI Agent, Next.js, Figma MCP, Playwright, QA Automation, LLM, Frontend

Note: This post is based on real project experience from November 2025 to May 2026, with some details conceptually reconstructed to exclude internal company information.

How This Started

In November 2025, we kicked off an ad center integration project with AIDC (Ali International Digital Center). I was a backend engineer, and I suddenly found myself owning the frontend architecture and the AI development framework.

The team situation was straightforward: almost no one had frontend experience. Most team members had never touched Next.js or React. And the deadline was set.

"Can't we just use AI?" someone said. Yes. The question was how.


The Prerequisite for AI Agent Development — A Design System

To build a frontend with AI Agents, you need one thing first: a clear reference point.

If you tell an AI "build me a login form," it will. But does it match your service's style? What color is the button? What's the font size? What's the spacing?

The thing that provides all these answers is a design system — component tokens, color palettes, typography rules. When that's defined, an AI can generate consistent UI against it.

We didn't have one.

Our Situation

This was a seller-facing admin product. Not a B2C service with an established design system. Our UX designer was working directly in Figma, defining screens as the project moved forward. There was no official component library or style guide for the AI to reference.

At first, we tried the obvious approach: we gave the AI the Figma screens directly. It didn't work.

// What we tried
Prompt: "Implement this Figma screen in Next.js"
Result: Different component styles every time, inconsistent spacing, mismatched colors

Without a reference, the AI fills in every unspecified detail arbitrarily. Every output is different, and consistency disappears.


The Solution — Publisher HTML/CSS as an Informal Design Token

We changed the approach.

We didn't have a design system, but we did have something else: the HTML/CSS markup that our publishers produced from the designer's Figma files.

This markup was real, agreed-upon output. It was the closest artifact to the final screen and was already aligned with the design.

We decided to use this as an informal Design Token.

Tell the AI "port this HTML/CSS to a Next.js component," and suddenly there's a reference point. The AI has something concrete to follow.


The 4-Source Multimodal Context

Just providing HTML/CSS still wasn't enough. We combined four sources to help the AI understand context fully.

[Diagram: four context sources feed the AI Agent, which produces a Next.js component. 📐 Figma MCP metadata (layout, intent) · 💻 Publisher HTML/CSS (markup reference) · 🖼️ Output screenshot (visual reference) · 📋 Refined spec doc (requirements)]
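As a rough sketch of how the four sources in this section might be bundled into a single request: the names `AgentContext` and `buildPrompt` and the plain-text layout are assumptions for illustration, not the format we actually used.

```typescript
// Hypothetical shape for the four context sources; field names are illustrative.
interface AgentContext {
  figmaMetadata: string;  // layout and component intent obtained via Figma MCP
  publisherHtml: string;  // agreed-upon markup from the publisher
  publisherCss: string;   // stylesheet that goes with the markup
  screenshotPath: string; // rendered publisher output, attached as an image
  refinedSpec: string;    // page-specific requirements only
}

// Assemble the text part of the prompt. The screenshot itself would be sent
// as a separate image attachment; only its path is mentioned here.
function buildPrompt(ctx: AgentContext, componentName: string): string {
  return [
    `Port the publisher markup below to a Next.js component named ${componentName}.`,
    'Treat the publisher HTML/CSS as the source of truth for styling.',
    '## Figma metadata (layout, intent)',
    ctx.figmaMetadata,
    '## Publisher HTML',
    ctx.publisherHtml,
    '## Publisher CSS',
    ctx.publisherCss,
    '## Refined spec',
    ctx.refinedSpec,
    `(Screenshot attached separately: ${ctx.screenshotPath})`,
  ].join('\n\n');
}
```

Keeping each source under its own heading makes it easier to see, when an output drifts, which input the AI ignored.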

Source 1: Figma MCP Metadata

Using Figma MCP, we fed the AI the layout structure and component semantics. This is what a screenshot alone can't convey — the intent behind components.

For example, two boxes that look identical visually might be a Card and a Modal. Figma metadata tells the AI which is which.

Source 2: Publisher HTML/CSS Output

The core source. This is the actual HTML/CSS the publisher built from the designer's Figma — already agreed upon and design-aligned.

When the AI uses this as its baseline for generating Next.js components, it stops inventing styles from scratch and stays within the established visual language.

<!-- Publisher output example -->
<div class="ad-card">
  <div class="ad-card__header">
    <span class="ad-card__title">Ad Title</span>
    <span class="ad-card__status status--active">Active</span>
  </div>
  <div class="ad-card__body">
    ...
  </div>
</div>
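// AI-generated Next.js component (conceptual)
import type { ReactNode } from 'react';
import cn from 'classnames'; // assumed class-name helper
import styles from './AdCard.module.css'; // assumed CSS Module ported from the publisher CSS

type AdStatus = 'active' | 'paused';
type AdCardProps = { title: string; status: AdStatus; children?: ReactNode };
// Display label for each status value
const STATUS_LABELS: Record<AdStatus, string> = { active: 'Active', paused: 'Paused' };

export function AdCard({ title, status, children }: AdCardProps) {
  return (
    <div className={styles.adCard}>
      <div className={styles.header}>
        <span className={styles.title}>{title}</span>
        <span className={cn(styles.status, styles[status])}>{STATUS_LABELS[status]}</span>
      </div>
      <div className={styles.body}>{children}</div>
    </div>
  );
}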

Source 3: Publisher Output Screenshots

Code alone doesn't give the AI a complete spatial picture. A rendered screenshot of the publisher's output adds a visual layer that complements the markup — leveraging the multimodal image understanding of modern AI.

Source 4: Refined Spec Document

UI without behavior is half a product. What happens when a button is clicked? What data should be displayed? This came from the spec docs.

Critically, we didn't dump raw spec documents. Original docs contain too much noise. We extracted and refined only the content relevant to the specific page being worked on.
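One way to picture the refinement step is as a filter over spec sections tagged by page. This is only a sketch: the `SpecSection` shape, the `pages` tagging scheme, and `refineSpecForPage` are hypothetical, and the same result can be reached by hand.

```typescript
// Hypothetical representation of a spec document split into sections.
interface SpecSection {
  heading: string;
  body: string;
  pages: string[]; // page identifiers this section applies to (assumed tagging scheme)
}

// Keep only the sections relevant to one page and render them as compact text.
function refineSpecForPage(sections: SpecSection[], pageId: string): string {
  return sections
    .filter((section) => section.pages.includes(pageId))
    .map((section) => `${section.heading}\n${section.body.trim()}`)
    .join('\n\n');
}
```

Everything that doesn't apply to the page is dropped before the AI sees it, which is the point of the refinement.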


The Core Insight

Even without a design system, if there's an agreed-upon artifact, the AI can use it as a reference.

What matters isn't whether you have an official Design System. What matters is having something the AI can anchor to — a point of reference it can call "the standard." Publisher HTML/CSS served that role exactly.

In practice, the consistency problems dropped dramatically once the AI had that concrete reference. Style drift between components — which happened constantly when we just fed it Figma — largely disappeared.


The Next Problem — QA

With the dev workflow sorted, the next bottleneck appeared: QA.

Dozens of issues accumulate. Each one has to be reproduced, fixed, and verified. Done manually, that work becomes a constant bottleneck.

We automated it.


The QA Automation Pipeline

[Flowchart: 📋 Jira ticket (description, comments, attachments) → 🔍 Issue analysis and summary generation → 📝 Task decomposition (issue → action items) → 🔗 Evidence mapping (Jira · Confluence · GitHub) → Validated? Yes: ⚙️ Implementation → 🎭 Playwright E2E (screenshot + video) → 📊 Dashboard (1st-pass validation) → 👀 PR review (2nd-pass human check). No: ❌ Excluded (no evidence).]

Steps 1–2: Jira Ticket Parsing & Issue Analysis

The AI reads full Jira ticket content — Description, Comments, and attachments.

Reading Comments matters. QA engineers often leave reproduction steps, edge cases, and additional findings in comments — context that's just as important as the original description.
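To make the "read everything" idea concrete, here is a minimal sketch that flattens a ticket into one analysis input. The `Ticket` shape is a simplified, hypothetical one; real Jira API responses are structured differently and would need mapping first.

```typescript
// Simplified, hypothetical ticket shape (not the Jira REST response format).
interface TicketComment {
  author: string;
  body: string;
}

interface Ticket {
  key: string;
  description: string;
  comments: TicketComment[];
  attachmentNames: string[];
}

// Flatten a ticket into one text block so comments carry the same weight
// as the description when the AI analyzes the issue.
function ticketToAnalysisInput(ticket: Ticket): string {
  const comments = ticket.comments.length > 0
    ? ticket.comments.map((c, i) => `${i + 1}. ${c.author}: ${c.body}`).join('\n')
    : '(none)';
  const attachments = ticket.attachmentNames.length > 0
    ? ticket.attachmentNames.join(', ')
    : '(none)';
  return [
    `Ticket: ${ticket.key}`,
    `Description:\n${ticket.description}`,
    `Comments:\n${comments}`,
    `Attachments: ${attachments}`,
  ].join('\n\n');
}
```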

Step 3: Task Decomposition

One issue often maps to multiple actions. The AI analyzes "what needs to happen to resolve this issue" and breaks it into specific, actionable tasks.

Step 4: Evidence Mapping — The Critical Layer

This is the most important part.

This step validates whether the AI-generated tasks are actually grounded in the ticket and its context. We map each task to its evidence: linked Confluence docs, GitHub issues and PRs, and referenced code.

Why does this matter?

Without this layer, AI will often expand scope — touching things unrelated to the issue, or doing more than what was asked. The Evidence Mapping layer is a grounding mechanism that prevents the AI from operating outside its sanctioned boundaries.

Tasks without mapped evidence are excluded from the work queue.
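The exclusion rule can be sketched as a simple partition. The `Task` and `Evidence` shapes and the function name are assumptions for illustration; how evidence links are discovered in the first place is outside this sketch.

```typescript
type EvidenceSource = 'jira' | 'confluence' | 'github';

interface Evidence {
  source: EvidenceSource;
  ref: string; // URL or identifier of the linked item (hypothetical field)
}

interface Task {
  id: string;
  summary: string;
  evidence: Evidence[];
}

// Split tasks into a work queue (at least one evidence link) and an excluded list.
function partitionByEvidence(tasks: Task[]): { queue: Task[]; excluded: Task[] } {
  const queue: Task[] = [];
  const excluded: Task[] = [];
  for (const task of tasks) {
    (task.evidence.length > 0 ? queue : excluded).push(task);
  }
  return { queue, excluded };
}
```

Returning the excluded tasks instead of silently dropping them lets a reviewer see what the AI proposed beyond the ticket's scope.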

Steps 5–6: Implementation & Playwright E2E

Only validated tasks proceed. After each fix, Playwright runs E2E tests — capturing not just pass/fail status but also screenshots and video recordings.

// Playwright test example (conceptual)
import { test, expect } from '@playwright/test';

test('Ad group status change reflects correctly', async ({ page }) => {
  await page.goto('/ad-groups'); // relative path relies on baseURL in the Playwright config
  await page.screenshot({ path: 'screenshots/before.png' });

  // Start waiting before the click so a fast response is not missed.
  const responsePromise = page.waitForResponse('**/api/ad-groups/*');
  await page.getByTestId('status-toggle').click();
  await responsePromise;

  await page.screenshot({ path: 'screenshots/after.png' });
  await expect(page.getByTestId('status-badge')).toHaveText('Paused');
});
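The test above takes screenshots explicitly. Video recording is usually switched on in the Playwright config rather than in each test. A minimal config sketch follows; the output directory name is an assumption.

```typescript
// playwright.config.ts (sketch; the outputDir value is an assumption)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  outputDir: 'test-results', // where screenshots and videos are written
  use: {
    screenshot: 'on', // capture a screenshot after every test
    video: 'on',      // record a video for every test
  },
});
```

Recording every run costs disk space, so `'only-on-failure'` and `'retain-on-failure'` are the lighter settings when only failing runs need review.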

Steps 7–8: Dashboard 1st-pass + PR Review 2nd-pass

With 50 issues, the dashboard shows 50 completed result sets, each with screenshots and videos, in one view. Reviewers check the outputs instead of reproducing each issue manually.
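A sketch of the kind of aggregation such a dashboard could rest on. The `IssueRun` and `DashboardSummary` shapes are hypothetical, not our actual data model.

```typescript
// Hypothetical per-issue result produced by the E2E step.
interface IssueRun {
  issueKey: string;
  passed: boolean;
  screenshots: string[]; // paths to captured screenshots
  video?: string;        // path to the recording, if any
}

interface DashboardSummary {
  total: number;
  passed: number;
  failed: string[];           // issue keys whose tests failed
  missingArtifacts: string[]; // issue keys that cannot be reviewed visually
}

function summarizeRuns(runs: IssueRun[]): DashboardSummary {
  return {
    total: runs.length,
    passed: runs.filter((r) => r.passed).length,
    failed: runs.filter((r) => !r.passed).map((r) => r.issueKey),
    // A run without screenshots or video gives the reviewer nothing to check.
    missingArtifacts: runs
      .filter((r) => r.screenshots.length === 0 || !r.video)
      .map((r) => r.issueKey),
  };
}
```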

PR Review is the human 2nd-pass for final verification.


Problems We Hit

Problem 1: What if the publisher output changes?

Using HTML/CSS as a reference means the AI's context goes stale whenever a publisher updates the markup. If the AI isn't told about the change, it continues generating components against outdated standards.

We're currently handling this manually — swapping out the source files when publishers update. Ideally, a version control strategy or change-detection hook would automate this.
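A change-detection hook could be as small as comparing content hashes between two snapshots of the publisher files. The sketch below assumes file contents are already loaded into memory; `buildManifest` and `changedPaths` are hypothetical names, and nothing like this is in place yet.

```typescript
import { createHash } from 'node:crypto';

// Map of file path -> SHA-256 hex digest of the file content.
type Manifest = Record<string, string>;

function buildManifest(files: Record<string, string>): Manifest {
  const manifest: Manifest = {};
  for (const path of Object.keys(files)) {
    manifest[path] = createHash('sha256').update(files[path]).digest('hex');
  }
  return manifest;
}

// Paths that were added, removed, or modified between two manifests.
function changedPaths(previous: Manifest, current: Manifest): string[] {
  const all = Array.from(
    new Set(Object.keys(previous).concat(Object.keys(current))),
  );
  return all.filter((path) => previous[path] !== current[path]).sort();
}
```

A non-empty result would be the signal to refresh the AI's reference files before generating further components.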

Problem 2: Coverage of AI-generated tests

When the AI generates Playwright tests, there's no automatic way to verify whether the tests actually cover the critical user flows. Tests can look comprehensive while missing the most important paths.

Integrating a coverage report would help close this gap, but we haven't implemented it yet.
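Short of a real coverage report, a lighter check is a flow checklist: tag each test title with the user flow it exercises and compare against a list of critical flows. This is not code coverage, and the `@flow:<name>` tag convention and function names below are hypothetical.

```typescript
// Collect hypothetical "@flow:<name>" tags from test titles.
function coveredFlows(testTitles: string[]): Set<string> {
  const flows = new Set<string>();
  for (const title of testTitles) {
    const tags = title.match(/@flow:[\w-]+/g) ?? [];
    for (const tag of tags) {
      flows.add(tag.slice('@flow:'.length));
    }
  }
  return flows;
}

// Critical flows that no test title claims to exercise.
function missingFlows(criticalFlows: string[], testTitles: string[]): string[] {
  const covered = coveredFlows(testTitles);
  return criticalFlows.filter((flow) => !covered.has(flow));
}
```

The check only confirms that a test claims a flow, not that it exercises the flow well, so it complements human review and does not replace it.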

Problem 3: Dashboard validation is still visual

The current dashboard confirms that issues were addressed, but it doesn't automatically verify correctness. The screenshots are there to review, but a human still compares them against the expected result by eye.

Adding Visual Regression Testing — a pixel-level comparison between expected and actual screenshots — would make the 1st-pass truly automated.
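To show what a pixel-level comparison involves, here is a minimal sketch over raw RGBA buffers. Decoding PNG files into buffers is left out, and `diffRatio` and its `tolerance` parameter are illustrative names, not part of our pipeline.

```typescript
// Fraction of pixels that differ between two same-sized RGBA buffers.
// `tolerance` is the per-channel difference (0-255) allowed before a pixel
// counts as changed.
function diffRatio(expected: Uint8Array, actual: Uint8Array, tolerance = 0): number {
  if (expected.length !== actual.length || expected.length % 4 !== 0) {
    throw new Error('Buffers must be RGBA data of identical dimensions');
  }
  const pixelCount = expected.length / 4;
  if (pixelCount === 0) return 0;

  let changed = 0;
  for (let i = 0; i < expected.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(expected[i + c] - actual[i + c]) > tolerance) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return changed / pixelCount;
}
```

Playwright also ships a built-in assertion for this, `expect(page).toHaveScreenshot()`, with options such as `maxDiffPixelRatio`, which is likely the simpler starting point.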


Reflections

Six months of running this workflow taught me two things.

First: AI needs a reference, not a design system per se.

The blocker isn't the absence of a formal design system — it's the absence of any agreed-upon reference. Publisher HTML/CSS was enough to serve that role. That realization unlocked the whole approach.

Second: Grounding is what makes AI automation trustworthy.

Evidence mapping was the difference between AI that does the right thing and AI that does something adjacent to the right thing. Without that layer, AI autonomy becomes a liability. With it, it becomes a force multiplier.

This isn't a finished system. Visual regression, publisher output sync, test coverage measurement — these are open problems. The plan is to keep building and keep writing about it.