Blog

Resolving Performance Bottlenecks in the Legacy Ad Ranking System — From .NET to Spring Webflux

A journey of completely redesigning a legacy .NET & MS-SQL system to a Spring Webflux + MongoDB reactive architecture, reducing the ad ranking update time from a maximum of 4 hours to under 3 minutes.

Spring WebfluxMongoDBRedisKafkaPerformance OptimizationGmarket

Note: The code in this article has been conceptually rewritten based on actual work experience. It is not associated with the actual company code.

System Architecture

AS-IS TO-BE Advertiser Bid Change Ranking Update Trigger Publish to Kafka Topic Ranking Consumer Update MongoDB Ranking Update Redis Cache Ad Serving API User Exposure .NET Blocking Process(Max 4 hours) MS-SQL Query/Update Spring WebfluxNon-blocking MongoDB + KafkaAsynchronous

Introduction

When I first took charge of the CPC Ad Ranking System at the Gmarket AdTech team, the initial numbers I encountered were shocking. It took up to 4 hours for important events, such as an advertiser changing a bid or exhausting a budget, to be actually reflected in the rankings. During this gap, ads with depleted budgets continued to be exposed, leading to refunds and a pile-up of advertiser customer service (CS) tickets.

This article records the journey, technical decisions, and outcomes of completely redesigning a .NET & MS-SQL based legacy system into a Spring Webflux + MongoDB reactive architecture.


The Problem: A 4-Hour Gap

Grasping the System Structure

The existing system was a monolithic application written in .NET that read ranking data from an MS-SQL RDB, went through complex logic to calculate rankings, and stored them in a cache.

The problem was that this entire process operated in a synchronous, blocking manner.

[Ranking Update Trigger]
     ↓
[MS-SQL Query] → Blocking Wait
     ↓
[Ranking Calculation Logic] → Single Thread Processing
     ↓
[MS-SQL Update] → Blocking Wait
     ↓
[Update Cache] → Complete

With data reaching millions of records, this structure produced the following results:

  • Update Time: Took up to 4 hours
  • Peak Time TPS: Unstable at around 30,000
  • CPU Usage: Spiked drastically during update tasks

For ad rankings to display data that is 4 hours old meant that advertisers were spending their budgets in real-time, but this wasn't reflected in the rankings at all.


Technical Decision: Why a Reactive Stack?

Initially, we considered optimizing the existing .NET system. Query optimization, adding indexes, and improving the caching layer were all candidates. However, the profiling results were clear.

The fundamental bottleneck was the synchronous blocking I/O itself.

In a structure where a thread cannot do anything while waiting for a DB response, the only way to process millions of records was to indefinitely increase the number of threads. And that led to CPU usage explosions.

The conclusion was a transition to a non-blocking I/O based reactive architecture.

Tech Stack Selection

Layer Legacy New
Application .NET (Synchronous) Spring Webflux (Reactive)
Database MS-SQL (RDB) MongoDB (NoSQL)
Message Queue None Kafka
Monitoring None Datadog

The reason for selecting MongoDB was due to the characteristics of the ranking data. Ranking queries are extremely read-heavy, involve almost no complex JOINs, and a document-level query pattern is dominant. A document-based structure fit much better than the row-based structure of an RDB.


Implementation: 3 Core Changes

1. Transitioning to Non-blocking Processing with Spring Webflux

Spring Webflux is based on the Reactor library. The core is reactive stream processing utilizing Mono and Flux.

// Legacy Approach (Blocking)
public List<RankingItem> getRankings(String adGroupId) {
    return rankingRepository.findByAdGroupId(adGroupId); // Blocking wait
}

// Reactive Approach (Non-blocking)
public Flux<RankingItem> getRankings(String adGroupId) {
    return rankingRepository.findByAdGroupId(adGroupId) // Immediate return, data flows asynchronously
        .filter(item -> item.isActive())
        .sort(Comparator.comparingDouble(RankingItem::getScore).reversed());
}

Because threads do not have to wait for I/O responses and can process other tasks, we were able to handle a significantly higher number of concurrent requests with a smaller number of threads.

2. MongoDB Modeling Optimization

When modeling the ranking data in MongoDB, the most important consideration was Denormalization.

In an RDB, data is distributed across multiple tables through normalization and combined using JOINs upon query. However, in NoSQL, the opposite strategy—storing frequently co-queried data in a single document—is highly effective.

// Legacy RDB Structure (Requires JOINs)
// ads table + rankings table + bidding table → JOIN

// MongoDB Denormalized Structure
{
  "_id": "ranking_001",
  "adGroupId": "ag_12345",
  "score": 98.5,
  "bid": 500,
  "qualityScore": 0.92,
  "adTitle": "...",
  "advertiserName": "...",
  "updatedAt": ISODate("2024-06-10T10:00:00Z"),
  // All fields to be queried together in one document
}

Because there was no need to reference multiple collections during queries, the response speed increased drastically.

Compound Index Design was also key. We analyzed the most frequent query patterns to configure the indexes.

// Primary Query Pattern: adGroupId + status + score based
db.rankings.createIndex(
  { adGroupId: 1, status: 1, score: -1 },
  { background: true }
)

All indexes were tuned by verifying performance based on p99.9 metrics. We checked execution plans using the explain() method and iteratively improved them to ensure IXSCAN (Index Scans) were being used.

3. Kafka-based Asynchronous Updates

If everything were processed synchronously via APIs at the ranking update trigger point, heavy tasks would slow down API response times. We decoupled the heavy update tasks using Kafka.

[Update Trigger API Call]
     ↓
[Publish event to Kafka Topic] → Immediate Response
     ↓ (Asynchronous)
[Ranking Consumer] → Consume message from Kafka
     ↓
[Process MongoDB Update]

With this structure, we kept API response times fast while stably handling massive volumes of updates in the background.


Tracking Bottlenecks with Datadog

One of the most useful practices during the development process was attaching Datadog monitoring from the very beginning. By tracing request flows with APM traces and monitoring p99.9 response times, we continually discovered hidden bottlenecks.

For example, initially, we thought we had created MongoDB indexes well, but we found queries in Datadog APM that bypassed the indexes under certain conditions, leading us to redesign the indexes.


Results

These are the metrics measured after the complete redesign.

Metric Legacy After Improvement
Ranking Update Time Up to 4 hours Under 3 minutes
Core API p99.9 Response Time 100~200ms Under 10ms
CPU Usage High (at peak) 60% Decrease
Peak TPS Processing 30,000 (Unstable) 50,000 (Stable)

Beyond just the improved numbers, inquiries of the type "My rankings aren't reflecting," which previously resulted in advertiser CS tickets, drastically decreased.


Conclusion: What I Learned from the Reactive Transition

Reactive programming initially has a steep learning curve. The concepts of Mono and Flux, error handling approaches, and debugging methods differ significantly from imperative programming.

However, in systems that are highly I/O intensive and must handle massive traffic, the efficiency brought by a reactive stack far outweighs the learning costs. This shines especially bright in use cases like an ad ranking system, where high concurrency must be handled without wasting thread resources.

In my next post, I plan to cover the Redis & Kafka-related issues we encountered together during this process.