
Gmarket CPC Ad Full Workflow — From Bidding to Report

A deep dive into the full workflow of Gmarket PowerClick (CPC keyword advertising). From advertiser bidding to ranking calculation, ad serving, click tracking, billing, and final report generation — how an event-driven pipeline keeps it all running.

Tags: CPC, AdSystem, Kafka, Databricks, Spring Webflux, Gmarket, AdTech

Note: Code samples and numbers in this post are conceptually reconstructed from real work experience and are unrelated to actual company code.

Full Pipeline Diagram

Advertiser Admin (Bid Setting / Ad Group Management)
    → Facade API (Spring Webflux, non-blocking)
    → [Publish to Kafka] → Ranking Consumer (Bid Calc, Validation, Ranking)
        ├── [Publish Ranking Snapshot] → Databricks (Ranking Snapshot Storage)
        └── [Save Top 1~50 Rankings] → MongoDB (Live Ranking Data)
MongoDB (Live Ranking Data)
    → AD Serving API (Real-time Ranking Serving)
    → User (Ad Impression & Click)
    → [Impression / Click Events] → ATS (Ad Tracking System)
    → [Kafka Publish] → Impression/Click Consumer
    → [Click Data] → Billing / Balance Deduction Job
    → Report Workflow (Databricks Workflow)
    → Ad Admin Reports (Seller Report View & Export)

Introduction

During my time on Gmarket's AdTech team handling the entire CPC ad system, I had the opportunity to deeply analyze and enhance the full journey from an advertiser's bid update all the way to the final report they see in their dashboard.

On the surface, CPC is simple: "advertiser pays per click." But underneath lies a complex event-driven pipeline: real-time ranking → ad serving → event collection → billing → report generation.

This post breaks down the PowerClick (CPC keyword ad) workflow step by step.


Step 1 — Bid Update & Facade API

Where It All Starts

Advertisers set keyword bids, daily budgets, and ad group states through the admin system. These changes flow into the Facade API.

The Facade API is a non-blocking reactive server built on Spring Webflux. Because it is the entry point for high-concurrency advertiser requests, the reactive stack was chosen specifically to avoid the inefficiency of blocking I/O, where each in-flight request ties up a thread while it waits.

// Facade API — Bid update handler (conceptual)
@PostMapping("/ad-groups/{adGroupId}/bid")
public Mono<ResponseEntity<Void>> updateBid(
        @PathVariable String adGroupId,
        @RequestBody BidUpdateRequest request) {

    return bidService.updateBid(adGroupId, request.getBidPrice())
            // Mono.defer delays building the publish call until the bid update has completed
            .then(Mono.defer(() ->
                    rankingEventProducer.publish(new RankingUpdateEvent(adGroupId))))  // Kafka publish
            .thenReturn(ResponseEntity.accepted().<Void>build());
}

The key: publish to Kafka, then return immediately (here with 202 Accepted). The heavy ranking computation runs asynchronously in a consumer, so it is decoupled from the API response time.


Step 2 — Ranking Consumer (Ranking Engine)

The Ranking Consumer subscribing to the Kafka topic is the brain of this pipeline.

Bid Price Calculation

In CPC, rank isn't determined by bid price alone. The system computes a composite score:

Ranking Score = f(bid price, quality score, predicted CTR, ...)
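
To make the shape of that function concrete, here is a minimal sketch. The actual formula is not disclosed in this post, so the multiplicative form (bid price × quality score × predicted CTR), the `AdCandidate` record, and the `RankingScore` class are all assumptions made purely for illustration.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical candidate model; field names are illustrative, not a real schema.
record AdCandidate(String adGroupId, long bidPrice, double qualityScore, double predictedCtr) {}

final class RankingScore {
    // Assumed formula: bid weighted by quality and by the chance of a click.
    static double score(AdCandidate c) {
        return c.bidPrice() * c.qualityScore() * c.predictedCtr();
    }

    // Returns a new list sorted from the highest score to the lowest.
    static List<AdCandidate> rank(List<AdCandidate> candidates) {
        return candidates.stream()
                .sorted(Comparator.<AdCandidate>comparingDouble(RankingScore::score).reversed())
                .toList();
    }
}
```

With these made-up inputs, an ad bidding 500 with quality 0.9 and predicted CTR 0.02 scores about 9.0 and outranks an ad bidding 800 with quality 0.5 and predicted CTR 0.01 (about 4.0), which is the sense in which bid price alone does not decide the rank.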

Multi-layer Validation

Before ranking calculation, each ad must pass a gauntlet of eligibility checks:

| Validation | Description |
| --- | --- |
| Seller restrictions | Is the seller's account suspended or restricted? |
| Ad group status | Is the ad group active, paused, or ended? |
| Daily budget exhaustion | Has today's budget already been spent? |
| Out of stock | Is the product currently out of stock? |
| Creative approval | Is the ad creative approved? |

Only ads that pass all validations enter ranking calculation.
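
One way to picture this gate is as an ordered list of named predicates, where the first failing check explains why an ad was excluded. The sketch below assumes a hypothetical `AdState` record, an `"ACTIVE"` status value, and a simple "spent < budget" rule; none of these come from the real system.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical view of the state the checks need; all field names are assumptions.
record AdState(boolean sellerRestricted, String adGroupStatus,
               long spentToday, long dailyBudget, int stock, boolean creativeApproved) {}

// A named check: the ad passes when the predicate returns true.
record Check(String name, Predicate<AdState> passes) {}

final class EligibilityValidator {
    // Same order as the validation table above.
    static final List<Check> CHECKS = List.of(
            new Check("seller restrictions", s -> !s.sellerRestricted()),
            new Check("ad group status", s -> "ACTIVE".equals(s.adGroupStatus())),
            new Check("daily budget exhaustion", s -> s.spentToday() < s.dailyBudget()),
            new Check("out of stock", s -> s.stock() > 0),
            new Check("creative approval", AdState::creativeApproved));

    // Returns the name of the first failed check, or empty when the ad may be ranked.
    static Optional<String> firstFailure(AdState state) {
        return CHECKS.stream()
                .filter(c -> !c.passes().test(state))
                .map(Check::name)
                .findFirst();
    }
}
```

Returning the name of the failed check, rather than a bare boolean, is a design choice in this sketch: it makes it easy to log why an ad dropped out of ranking.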

Storing Results & Publishing Snapshot

Ranking calculation complete
    ├── Save top 1~50 rankings to MongoDB (real-time data for AD Serving API)
    └── Publish Ranking Snapshot event to Kafka → Loaded into Databricks

The Ranking Snapshot in Databricks becomes the source data for downstream reporting and analytics.
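
The two-way write can be sketched as follows. The MongoDB store and the Kafka producer are replaced by plain `Consumer` callbacks so the example stays self-contained; `ScoredAd`, `RankedAd`, and `RankingResultWriter` are hypothetical names, and sending the same top-50 list to both sinks is an assumption (the real snapshot may carry more data).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

record ScoredAd(String adGroupId, double score) {}
record RankedAd(int rank, String adGroupId, double score) {}

final class RankingResultWriter {
    static final int MAX_RANK = 50; // only ranks 1~50 are kept

    // Sorts by score (highest first), keeps the top 50, and assigns ranks starting at 1.
    static List<RankedAd> topRanked(List<ScoredAd> scored) {
        List<ScoredAd> best = scored.stream()
                .sorted(Comparator.<ScoredAd>comparingDouble(ScoredAd::score).reversed())
                .limit(MAX_RANK)
                .toList();
        List<RankedAd> ranked = new ArrayList<>();
        for (int i = 0; i < best.size(); i++) {
            ranked.add(new RankedAd(i + 1, best.get(i).adGroupId(), best.get(i).score()));
        }
        return ranked;
    }

    // liveStore stands in for the MongoDB write, snapshotPublisher for the Kafka publish.
    static void write(List<ScoredAd> scored,
                      Consumer<List<RankedAd>> liveStore,
                      Consumer<List<RankedAd>> snapshotPublisher) {
        List<RankedAd> ranked = topRanked(scored);
        liveStore.accept(ranked);
        snapshotPublisher.accept(ranked);
    }
}
```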


Step 3 — AD Serving API & Ad Impression

The AD Serving API reads ranking data from MongoDB and serves ads to users in real time.

Across Gmarket's search results, product detail pages, and other placements, the AD Serving API is called and returns the top-N relevant ads for a given keyword/placement.

User searches "phone case"
    → AD Serving API called
    → Query MongoDB for top-ranked ads for this keyword
    → Return ad list → Display to user
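
As a rough sketch of that lookup, the example below uses an in-memory map as a stand-in for the MongoDB collection. The `ServedAd` record, the `AdServing` class, and the `slotCount` parameter are hypothetical, and the sketch assumes the stored list is already ordered by rank, as written by the Ranking Consumer.

```java
import java.util.List;
import java.util.Map;

record ServedAd(int rank, String adGroupId) {}

final class AdServing {
    // Stand-in for the MongoDB collection: keyword -> ads already sorted by rank.
    private final Map<String, List<ServedAd>> rankingsByKeyword;

    AdServing(Map<String, List<ServedAd>> rankingsByKeyword) {
        this.rankingsByKeyword = rankingsByKeyword;
    }

    // Returns up to slotCount ads for the keyword; unknown keywords yield an empty list.
    // slotCount is expected to be zero or positive.
    List<ServedAd> serve(String keyword, int slotCount) {
        List<ServedAd> ranked = rankingsByKeyword.getOrDefault(keyword, List.of());
        return ranked.subList(0, Math.min(slotCount, ranked.size()));
    }
}
```

Because the ranking work was already done in Step 2, the serving path in this sketch is a plain read with no scoring, which is what keeps it fast enough for real-time use.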

Step 4 — ATS (Ad Tracking System): Event Collection

When a user sees or clicks an ad, the ATS (Ad Tracking System) captures that event.

The existing ATS was a legacy server running on Node.js v6. I migrated its environment, upgrading it to Node.js v16 and containerizing it on Kubernetes. As part of the same modernization effort, I added Datadog monitoring to keep event collection reliable, all without altering the core legacy logic.

Events collected by ATS:

  • Impression: Ad was displayed on screen
  • Click: User clicked the ad

Collected events are immediately published to a Kafka topic.


Step 5 — Impression/Click Consumer & Databricks Ingestion

A dedicated Consumer subscribes to the ATS-published Kafka events and loads them into Databricks.

ATS → Kafka publish
    → Impression/Click Consumer (Kafka subscriber)
    → Databricks ingestion (raw impression/click data)

By this point, Databricks holds:

  • Ranking Snapshots (from Step 2)
  • Impression logs (this step)
  • Click logs (this step)

Step 6 — Billing / Balance Deduction Job

A click doesn't trigger immediate billing. Instead, a separate backend billing job runs periodically over the click data in Databricks.

[Click data loaded into Databricks]
    → Billing Job runs
    → Valid click filtering (dedup, abuse, bot detection)
    → Deduct balance from advertiser account
    → Record deduction history

Invalid clicks (abuse, bots, rapid duplicates) are filtered at this stage, and only legitimate clicks proceed to billing.
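
The filtering and deduction steps can be sketched like this. The `Click` record, the 10-second dedup window, the pre-computed `flaggedAsBot` field, and keying duplicates on user plus ad group are all assumptions for illustration; the real rules and thresholds are not described in this post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical click row; flaggedAsBot is assumed to come from an upstream detector.
record Click(String userId, String adGroupId, long epochMillis, boolean flaggedAsBot, long cost) {}

final class ClickBilling {
    static final long DEDUP_WINDOW_MILLIS = 10_000; // assumed threshold

    // Drops bot-flagged clicks and rapid repeats by the same user on the same ad group.
    // Expects the input to be ordered by time.
    static List<Click> validClicks(List<Click> clicks) {
        Map<String, Long> lastBilledAt = new HashMap<>();
        List<Click> valid = new ArrayList<>();
        for (Click c : clicks) {
            if (c.flaggedAsBot()) continue;
            String key = c.userId() + "|" + c.adGroupId();
            Long last = lastBilledAt.get(key);
            if (last != null && c.epochMillis() - last < DEDUP_WINDOW_MILLIS) continue;
            lastBilledAt.put(key, c.epochMillis());
            valid.add(c);
        }
        return valid;
    }

    // Sums the billable cost per ad group; the caller would deduct these amounts.
    static Map<String, Long> chargesByAdGroup(List<Click> valid) {
        Map<String, Long> charges = new HashMap<>();
        for (Click c : valid) {
            charges.merge(c.adGroupId(), c.cost(), Long::sum);
        }
        return charges;
    }
}
```

Running the filter in a periodic batch, rather than at click time, means abuse patterns that only show up across several events (such as rapid repeats) can be judged with the full picture before any balance is touched.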


Step 7 — Report Workflow (Databricks)

After the billing job completes, the Databricks Report Workflow kicks off.

The Problem with the Old Hadoop Pipeline

The legacy pipeline ran on Hadoop. Issues included:

  • Repeated manual reprocessing of large batches late at night
  • Full pipeline restart required on any step failure

Migrated to Databricks Workflow

Billing Job complete
    → Databricks Workflow triggered
        ├── Impression/click aggregation (keyword / ad group / campaign)
        ├── Cost aggregation (daily / weekly / monthly)
        ├── CTR / CVR metric calculation
        └── Report table generation

Databricks Workflow manages each step independently — on failure, only the failed step needs rerunning. This dramatically reduced operational overhead and eliminated the repetitive late-night manual reprocessing.
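
The difference from a "restart everything" pipeline can be shown with a toy model. This is not the Databricks API; `StepwiseWorkflow` is a hypothetical class that only illustrates the behavior of remembering which steps finished and resuming from the failed one.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class StepwiseWorkflow {
    private final Map<String, Runnable> steps = new LinkedHashMap<>(); // insertion order = run order
    private final Set<String> completed = new LinkedHashSet<>();

    StepwiseWorkflow step(String name, Runnable body) {
        steps.put(name, body);
        return this;
    }

    // Runs every step that has not completed yet and stops at the first failure.
    // Returns the name of the failed step, or null when all steps have succeeded.
    String run() {
        for (Map.Entry<String, Runnable> e : steps.entrySet()) {
            if (completed.contains(e.getKey())) continue; // finished in an earlier run
            try {
                e.getValue().run();
                completed.add(e.getKey());
            } catch (RuntimeException ex) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

If the cost aggregation step fails in this model, calling `run()` again skips the already completed impression/click aggregation and retries only from the failed step, which is the property that removed the need for full late-night reruns.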


Step 8 — Ad Admin Report View

Once all report workflows complete, advertisers can view their reports in the admin dashboard:

  • Impressions / Clicks / CTR
  • Spend / Average CPC
  • Filters by keyword, ad group, and date range
  • CSV report export
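
The derived columns follow the standard definitions: CTR is clicks divided by impressions, and average CPC is spend divided by clicks. A minimal sketch, assuming a hypothetical `ReportRow` record and an illustrative CSV column order (keyword, impressions, clicks, CTR, spend, average CPC):

```java
import java.util.Locale;

// Hypothetical report row; spend is in the account currency's smallest billed unit.
record ReportRow(String keyword, long impressions, long clicks, long spend) {

    // Click-through rate as a fraction; 0 when there were no impressions.
    double ctr() {
        return impressions == 0 ? 0.0 : (double) clicks / impressions;
    }

    // Average cost per click; 0 when there were no clicks.
    double averageCpc() {
        return clicks == 0 ? 0.0 : (double) spend / clicks;
    }

    // One CSV line. Keywords containing commas or quotes would need escaping,
    // which this sketch leaves out.
    String toCsvLine() {
        return String.format(Locale.ROOT, "%s,%d,%d,%.4f,%d,%.2f",
                keyword, impressions, clicks, ctr(), spend, averageCpc());
    }
}
```

For example, 2,000 impressions, 50 clicks, and a spend of 15,000 give a CTR of 0.025 (2.5%) and an average CPC of 300.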

Full Pipeline Summary

| Step | Component | Technology |
| --- | --- | --- |
| 1. Bid update | Facade API | Spring Webflux, Kafka |
| 2. Ranking calc | Ranking Consumer | Kafka, MongoDB |
| 3. Ad serving | AD Serving API | MongoDB |
| 4. Event collection | ATS | Node.js, Kafka |
| 5. Data ingestion | Click/Impression Consumer | Kafka, Databricks |
| 6. Billing | Billing Job | Databricks |
| 7. Report generation | Report Workflow | Databricks Workflow |
| 8. Report view | Ad Admin | REST API |

Closing Thoughts

While analyzing the entire pipeline and enhancing its components, what stood out most was the importance of loose coupling between stages. Because Kafka served as an async buffer between components, we were able to migrate individual systems independently and keep isolated failures from cascading through the entire workflow.

Modernizing legacy environments like the ATS and migrating from Hadoop to Databricks weren't just technology swaps — they were strategic decisions to reduce operational burden and improve pipeline reliability. Ultimately, eliminating the repeated late-night manual reruns and providing stable advertiser reports on schedule was the most tangible outcome of these enhancements.