Portfolio

A collection of technical problems I led, the reasoning behind my decisions, and the outcomes achieved.

AI-based Workflow Improvements

Separate initiatives focused on improving development, analysis, and QA workflows with AI.

Case #11 · AI Agent-based Frontend Dev Workflow: LLM in Practice Without a Design System (AI-based Workflow Improvements)
Next.js · React · AI Agent · Playwright · Figma MCP · QA Automation · Jira
  • Project Overview: For the AIDC (Ali International Digital Center) ad center integration project, designed and built an AI Agent-based development workflow and a Playwright E2E QA automation pipeline so that team members without frontend experience could contribute to Next.js · React development.

  • Key Achievements:

    • Designed a 4-source multimodal context framework for the AI Agent in an environment with no formal design system
    • Automated first-pass QA validation for each issue through Jira Evidence mapping, Playwright E2E tests, and a monitoring dashboard
    • Built a development system enabling non-frontend engineers to actively contribute

  • Problem Situation

    • No clear design reference (a design system), which AI Agent development depends on; without one, the agent is prone to hallucination
    • Only the UX designer's Figma work existed, so there were no explicit component tokens or style guides for the AI to reference
  • Technical Approach & Implementation

    • Informal Design Token: Treated the HTML/CSS markup agreed upon by publishers and designers as an informal design token set, and built a Next.js porting workflow around it
    • 4-Source Multimodal Context: Provided 4 data sources simultaneously to maximize the AI Agent's comprehension and accuracy
      • Figma MCP metadata (layout & component intent), Publisher HTML/CSS output (markup reference), Publisher screenshots (visual grounding), Refined spec documents (feature & interaction definitions)
    • QA Automation Pipeline:
      • Jira ticket parsing → AI issue analysis & task decomposition → Evidence mapping (Jira·Confluence·GitHub grounding to constrain AI scope) → Playwright E2E test generation (Screenshot, Video Recording) → Dashboard 1st-pass automated validation → PR Review 2nd-pass human validation
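    A minimal Python sketch of two guardrails from the workflow described above, under stated assumptions: the agent receives a prompt only when all four sources are present, and any test target it proposes is kept only if the ticket's evidence (Jira, Confluence, GitHub refs) grounds it. The `ContextBundle` type, its field names, and the section labels are hypothetical; the real workflow ran inside the AI Agent tooling, not in this code.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    # Hypothetical container for the four sources handed to the agent together.
    figma_metadata: dict        # Figma MCP: layout and component intent
    publisher_markup: str       # agreed HTML/CSS output (informal design tokens)
    screenshots: list           # publisher screenshots for visual grounding
    spec: str                   # refined feature and interaction spec
    evidence: dict = field(default_factory=dict)  # ticket -> grounded refs


def build_agent_context(bundle: ContextBundle) -> str:
    """Join all four sources into one context, refusing if any is empty."""
    sections = {
        "FIGMA_METADATA": bundle.figma_metadata,
        "PUBLISHER_MARKUP": bundle.publisher_markup,
        "SCREENSHOTS": bundle.screenshots,
        "SPEC": bundle.spec,
    }
    missing = [name for name, value in sections.items() if not value]
    if missing:
        # A missing source is exactly where the agent would start guessing.
        raise ValueError("missing context sources: " + ", ".join(missing))
    return "\n\n".join(f"## {name}\n{value}" for name, value in sections.items())


def constrain_scope(ticket: str, proposed: list, bundle: ContextBundle):
    """Keep only agent-proposed targets that the ticket's evidence grounds."""
    grounded = set(bundle.evidence.get(ticket, []))
    accepted = [t for t in proposed if t in grounded]
    rejected = [t for t in proposed if t not in grounded]
    return accepted, rejected
```

    Rejected targets would go back for human review instead of being turned into E2E tests.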
  • Limitations & Improvement Directions

    • No Visual Regression Testing: Evolve toward Expected vs. Actual pixel comparison so that 1st-pass validation can be fully automated
    • No Artifact Sync Strategy: Add version control or change-detection hooks so the AI context is updated when publisher markup changes
    • Unmeasured Test Coverage: Integrate a coverage report to verify how well the AI-generated E2E tests cover critical user flows
Case #12 · Code Sonar: AI Agent-based Legacy System Analysis Plugin (AI-based Workflow Improvements)
AI Agent · GitHub MCP · Jira MCP · Confluence MCP · MSSQL MCP · Python · atls · Mermaid · Legacy Analysis
  • Project Overview: Designed and built an internal, DeepWiki-like AI Agent analysis plugin for the company's legacy environment (private repos, MSSQL, Jira/Wiki), where tools built for public GitHub repos could not be applied as-is

  • Key Achievements:

    • Built a business flow documentation system covering the full path: client → Kafka topic → consumer → stored procedure (SP) internals
    • Reduced unnecessary DB access risk via a stepwise Evidence Architecture (Filesystem → GitHub → MSSQL)
    • Separated verified facts from inference through evidence-grounded cross-validation
    • Built a custom Atlassian CLI (atls) to cover gaps in MCP ecosystem support
    • Designed a chain of 17 specialized Agents (analysis → validation → Confluence auto-publish)
    • Multi-environment support: Gemini CLI, Claude Code, Codex, Antigravity IDE

  • Problem Situation

    • Legacy systems where change history and documentation had become fragmented, making onboarding heavily dependent on veteran engineers
    • Business logic context left only in code and operational traces after original owners had moved on
    • Needed a DeepWiki-like automatic documentation experience, but public GitHub repo-oriented tools could not be applied directly to private repos, legacy DBs, and Jira/Wiki context
  • Technical Approach & Implementation

    • Stepwise Evidence Architecture: Rather than exploring the DB directly, narrows the scope in 3 stages before running any query
      1. Filesystem MCP (Local Search): First pass over API, consumer, SQL call sites, and config keywords in the local workspace
      2. GitHub MCP (Codebase): PRs, commits, CODEOWNERS to add change context, ownership, and call-path evidence
      3. MSSQL MCP (Targeted Query): Queries only the table and SP names identified in steps 1–2, then verifies SP definitions and dependencies
    • Multi-source Cross-Validation: Cross-validates Filesystem → GitHub → Jira/Confluence → MSSQL sources to clearly separate verified facts from inference
    • atls (Custom-Built Python CLI): Wrapper CLI covering Atlassian features not supported through MCP (Confluence page create/update, recursive Wiki scraping)
    • 17 Specialized Agent Chain: Role-separated agent pipeline instead of a monolithic analyzer
      • Core 6: bridge-analyzer (full flow), db-schema-analyst (DB·SP), business-workflow-analyst (business logic), cross-repo-tracer (multi-repo), evidence-auditor (quality audit), wiki-publisher (Confluence publish)
      • 11 additional domain-specific Agents
    • Mermaid 2-Stage Pre-Refinement: Refines node relationships first, then asks the AI to draw the diagram, which prevents tangled "spaghetti" diagrams
    • Evidence Ledger: Maps every document claim to its source (code/config/wiki/github), clearly separating inference from verified fact
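    The stepwise narrowing and the Evidence Ledger can be sketched in Python as follows. This is an illustrative sketch, not the plugin's code. It assumes stored procedures appear as `EXEC dbo.<name>` and tables appear after `FROM`/`JOIN`/`INTO`/`UPDATE` with a `dbo.` prefix. It also skips the GitHub stage and only builds the targeted MSSQL lookups against the `sys.sql_modules` catalog view, without running them. `LedgerEntry` and the helper names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed naming conventions; real code bases need broader patterns.
SP_PATTERN = re.compile(r"\bEXEC(?:UTE)?\s+(dbo\.\w+)", re.IGNORECASE)
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+(dbo\.\w+)", re.IGNORECASE)

SP_LOOKUP_SQL = (
    "SELECT OBJECT_NAME(object_id) AS name, definition "
    "FROM sys.sql_modules WHERE object_id = OBJECT_ID(?)"
)


@dataclass(frozen=True)
class LedgerEntry:
    claim: str
    source: str      # "code", "config", "wiki", "github", or "db"
    location: str
    verified: bool   # False marks an inference, not a confirmed fact


def scan_workspace(files: dict):
    """Stage 1: collect SP and table names from local files, recording evidence."""
    sps, tables, ledger = set(), set(), []
    for path, text in files.items():
        for name in SP_PATTERN.findall(text):
            sps.add(name)
            ledger.append(LedgerEntry(f"{path} calls {name}", "code", path, True))
        for name in TABLE_PATTERN.findall(text):
            tables.add(name)
            ledger.append(LedgerEntry(f"{path} touches {name}", "code", path, True))
    return sps, tables, ledger


def targeted_sp_queries(sps: set):
    """Stage 3: one parameterized lookup per SP found earlier, no broad scans."""
    return [(SP_LOOKUP_SQL, name) for name in sorted(sps)]


def split_ledger(ledger):
    """Separate verified facts from inferences for the published document."""
    facts = [e for e in ledger if e.verified]
    inferences = [e for e in ledger if not e.verified]
    return facts, inferences
```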
  • Limitations & Improvement Roadmap

    Business flow and data flow documentation is complete. Code-level detailed analysis remains the next frontier.

    • Level 2: AST-based call graph, SP branch logic → business rule mapping, ETL pattern tracing
    • Level 3: Full bidirectional traceability (JIRA → code lines → Kafka → consumers → DB → SP)
    • Level 4: PR merge webhook → doc drift detection → auto-flag stale documentation
    • Level 5: Natural language queries → auto-generated sequence diagrams with Evidence citations

Project Cases

Case #1 · Resolving Performance Bottlenecks in the Legacy Ad Ranking System (Gmarket)
Spring Webflux · MongoDB · Redis · Kafka · MS-SQL · .NET · Datadog
  • Project Overview: Redesigned a .NET & MS-SQL based CPC ad ranking system from the ground up as a Spring Webflux + MongoDB reactive architecture, leading the work as a 2-person team with the existing maintainer.
  • Key Achievements:
    • Reduced ranking update time from up to 4 hours to under 3 minutes (a reduction of over 98%).
    • Achieved p99.9 query latency under 10 ms.
    • Reduced CPU usage by 60%.
    • Handled a peak load of 50,000 TPS stably.

  • Problem Situation

    • Synchronous ranking updates on .NET & MS-SQL took up to 4 hours, driving up advertiser CS inquiries and refunds.
    • Insufficient processing capacity and instability at peak load (around 30,000 TPS).
    • High CPU usage due to synchronous, blocking processing.
  • Technical Approach & Implementation

    • Transition to Reactive Architecture: Introduced Spring Webflux to switch to non-blocking, asynchronous processing, securing high concurrency with fewer threads.
    • MongoDB Optimization: Designed a denormalized model suited to the read-heavy workload, added compound indexes, and continuously verified query performance against p99.9 latency.
    • Asynchronous Stream Processing: Rebuilt the large-scale ranking update logic on Reactor Flux/Mono, improving CPU and memory efficiency.
    • Kafka Messaging: Offloaded heavy update tasks from the trigger point to Kafka consumers to spread load away from the API.
    • Datadog Monitoring: Tracked latency, TPS, and MongoDB query performance in real time and analyzed bottlenecks.
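    The production code used Reactor. The Python `asyncio` sketch below illustrates the same idea in a neutral form: many non-blocking updates run concurrently, but only a bounded number are in flight at once, similar to `Flux.flatMap(mapper, concurrency)`. `update_rankings`, `update_one`, and the default limit of 64 are assumptions for illustration.

```python
import asyncio


async def update_rankings(ad_ids, update_one, max_in_flight=64):
    """Run update_one(ad_id) for every ID, with at most max_in_flight awaiting at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(ad_id):
        async with semaphore:  # waits without blocking the thread when the limit is hit
            return await update_one(ad_id)

    # Results come back in input order (closer to Reactor's flatMapSequential than flatMap).
    return await asyncio.gather(*(guarded(ad_id) for ad_id in ad_ids))
```

    Unlike Reactor's `flatMap`, `asyncio.gather` creates every coroutine up front, so a sketch like this would need to be fed in slices for millions of IDs.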
Case #2 · Ad Data Migration to Databricks (Gmarket)
Databricks · Hadoop · Hue · MongoDB · MS-SQL · PySpark
  • Project Overview: Migrated CPC ad settlement (Bill/Pay) data and ranking update snapshot data from a Hue/Hadoop based environment to Databricks — led the pipeline design, implementation, and verification.
  • Key Achievements: Zero batch failures after migration, eliminating the recurring manual early-morning re-runs / Ensured ad settlement data accuracy through a step-by-step verification structure.

  • Problem Situation

    • Frequent batch failures caused by resource shortages in the Hue/Hadoop infrastructure → the engineer in charge regularly had to re-run jobs manually in the early morning.
    • Ad settlement (Bill/Pay) data and MongoDB ranking snapshot data were processed separately in disparate environments.
    • Given the nature of ad settlement data, preventing omissions, duplications, and amount discrepancies was mandatory.
  • Technical Approach & Implementation

    • Legacy Flow Analysis: Mapped the Hue/Hadoop-based Bill/Pay creation, aggregation, and verification flow, along with the purpose and processing order of the MS-SQL raw settlement data and MongoDB ranking snapshot data.
    • Rebuilding the Pipeline on Databricks: Restructured the workflow into loading raw data → building intermediate aggregation tables → producing the final settlement result table.
    • Step-by-step Verification Structure: Made every stage traceable so failure points could be pinpointed, and caught settlement errors early by automatically comparing record counts and total amounts against the intermediate aggregation tables.
    • Verifying Migration Results against MS-SQL & MongoDB Raw Data: Verified migration accuracy on major keys across the ad, seller, and period dimensions.
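    A plain-Python sketch of the count-and-total check against an intermediate aggregation table. The real pipeline ran on Databricks (PySpark). Here, the key columns (`ad_id`, `seller_id`, `period`) and the aggregate columns (`record_count`, `total_amount`) are hypothetical names, and `Decimal` is used so that amounts compare exactly.

```python
from collections import defaultdict
from decimal import Decimal

KEYS = ("ad_id", "seller_id", "period")  # hypothetical key columns


def summarize(raw_rows, key_fields=KEYS):
    """Expected (record count, total amount) per key, computed from raw rows."""
    summary = defaultdict(lambda: (0, Decimal("0")))
    for row in raw_rows:
        key = tuple(row[f] for f in key_fields)
        count, total = summary[key]
        summary[key] = (count + 1, total + Decimal(str(row["amount"])))
    return dict(summary)


def reconcile(raw_rows, agg_rows, key_fields=KEYS):
    """Return every key whose count or total differs from the aggregation table."""
    expected = summarize(raw_rows, key_fields)
    actual = {
        tuple(r[f] for f in key_fields): (r["record_count"], Decimal(str(r["total_amount"])))
        for r in agg_rows
    }
    zero = (0, Decimal("0"))
    return {
        key: {"expected": expected.get(key, zero), "actual": actual.get(key, zero)}
        for key in expected.keys() | actual.keys()
        if expected.get(key, zero) != actual.get(key, zero)
    }
```

    An empty result means the stage passed. Any returned key points directly at the ad, seller, and period where rows were dropped, duplicated, or mis-summed.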
Case #3 · Building a Real-time RDB-MongoDB Data Sync Pipeline Without Commercial CDC (Gmarket)
Spring Webflux · MS-SQL · MongoDB · Kafka · Spring Batch
  • Project Overview: Designed and implemented a custom 3-stage synchronization pipeline to solve data consistency issues during the transition of the CPC ad ranking system from MS-SQL to MongoDB.
  • Key Achievements: Guaranteed real-time data consistency without a commercial CDC solution / Automatic recovery of failed writes / Fewer advertiser CS inquiries thanks to eventual-consistency verification in the background.

  • Problem Situation

    • Intermittent data inconsistencies occurred during the transition from MS-SQL to MongoDB.
    • Securing real-time consistency was mandatory as ad ranking data is a core business asset.
    • Adopting a commercial CDC (Change Data Capture) solution was impossible due to budget and technical constraints.
  • Technical Approach & Implementation

    The initial approach of writing to both DBs simultaneously (dual writes) was rejected because of the complexity of distributed transaction management and the risk that concurrent failures would worsen inconsistencies. Instead, we custom-designed a 3-stage tiered architecture.

    1. Stage 1 · Real-time Sync (API Level): On each data change, immediately attempted to apply it to MongoDB using Spring Webflux asynchronous processing.
    2. Stage 2 · Failure Recovery (Message Queue): If the MongoDB write failed, published the change to a Kafka topic, where a Sync Consumer picked it up and reprocessed it safely. Preserved per-key ordering by partitioning on the message key (UUID).
    3. Stage 3 · Background Verification (Eventual Consistency): Developed a Batch Job that periodically compared both DBs based on the MS-SQL change history and automatically corrected any inconsistencies.
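    A simplified, in-memory Python model of the three stages. It assumes a callable `mongo_write` that may raise, uses lists of deques as stand-ins for Kafka partitions, and performs a full-table comparison in place of the MS-SQL change history. It illustrates the pattern and is not the Spring Webflux/Kafka/Spring Batch implementation; all names are hypothetical. `crc32` serves only as a stable stand-in for Kafka's murmur2 key hashing.

```python
import zlib
from collections import deque


def partition_for(key: str, num_partitions: int) -> int:
    # Stable key hash: all changes to one record go to one partition, in order.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class SyncPipeline:
    def __init__(self, mongo_write, num_partitions=3):
        self.mssql = {}                  # source of truth (MS-SQL stand-in)
        self.mongo_write = mongo_write   # may raise on failure
        self.retry_topic = [deque() for _ in range(num_partitions)]

    def save(self, key, value):
        self.mssql[key] = value
        try:
            self.mongo_write(key, value)                      # Stage 1: immediate sync
        except Exception:
            partition = partition_for(key, len(self.retry_topic))
            self.retry_topic[partition].append((key, value))  # Stage 2: queue for retry

    def drain_retries(self):
        # Stage 2 consumer: a message is removed only after it is applied,
        # so a failure leaves it (and everything behind it) in order.
        for partition in self.retry_topic:
            while partition:
                key, value = partition[0]
                self.mongo_write(key, value)
                partition.popleft()

    def reconcile(self, mongo_read) -> list:
        # Stage 3: compare against the source of truth and repair drift.
        repaired = []
        for key, value in self.mssql.items():
            if mongo_read(key) != value:
                self.mongo_write(key, value)
                repaired.append(key)
        return repaired
```

    In this model, a retry replayed after a newer successful write to the same key could reapply an older value. Stage 3 is the layer that catches such cases.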
  • Results

    • Secured data reliability through one-way failure recovery plus two-way eventual-consistency verification.
    • Improved operational efficiency and reduced advertiser CS inquiries caused by data inconsistencies.
Case #4 · Root Cause Analysis and Resolution of Redis & Kafka Integration Issues (Gmarket)
Spring Webflux · Redis · Lettuce · Kafka
  • Project Overview: Analyzed and resolved issues of Kafka partition skew and Redis bulk processing performance degradation that occurred during the ranking system improvement process.
  • Key Achievements: 4x faster Redis bulk processing (240 sec → 60 sec per 1 million records) / Resolved the Kafka message distribution imbalance, securing stable and predictable processing performance.

  • Issue 1 — Partition Skew Due to Kafka StickyPartitioner Bug

    • Symptom: Abnormal distribution where messages were concentrated in specific partitions, leaving others nearly empty.
    • Root Cause Analysis: Identified a skew bug in the StickyPartitioner of Kafka clients before 3.3, which appears when a low linger.ms setting coincides with broker latency. Also found a bug in the RoundRobinPartitioner (2.4 and later) where partition() is called twice when a new batch is created, so records land only on even- or odd-numbered partitions.
    • Resolution: Upgraded to Kafka client 3.3 or later and relied on its reworked default partitioning logic (KIP-794), restoring even distribution.
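    The even/odd symptom of the RoundRobinPartitioner bug can be reproduced with a small Python model. The model assumes a simplified counter-based round robin. It also assumes every record triggers a second `partition()` call; in the real client, this happens when a record starts a new batch. Only the last answer is used, so the discarded call silently advances the counter.

```python
import itertools


class RoundRobin:
    """Simplified counter-based round robin, modeled on Kafka's RoundRobinPartitioner."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.counter = itertools.count()

    def partition(self):
        return next(self.counter) % self.num_partitions


def distribute(num_records, num_partitions, calls_per_record):
    """Count records per partition when partition() runs calls_per_record times each."""
    rr = RoundRobin(num_partitions)
    counts = [0] * num_partitions
    for _ in range(num_records):
        for _ in range(calls_per_record):
            chosen = rr.partition()  # only the last call's answer is used
        counts[chosen] += 1
    return counts
```

    For example, `distribute(12, 6, 2)` returns `[0, 4, 0, 4, 0, 4]`: with two calls per record, partitions 0, 2, and 4 receive nothing.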
  • Issue 2 — Bulk Processing Performance Degradation Due to RedisTemplate Individual TCP Connections

    • Symptom: A batch of 1 million Redis SET commands took 240 seconds (4 minutes), roughly 4x longer than expected.
    • Root Cause Analysis: Monitoring Redis connections in Datadog revealed a sharp spike in connections during batch runs. The RedisTemplate default configuration was opening and closing a new TCP connection for every command.
    • Resolution: Switched to the Lettuce native API, eliminating I/O overhead by utilizing connection pooling + pipelining (internal command queuing) → Processing time reduced from 240 sec to 60 sec (75% reduction).
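    The benefit of pipelining can be shown with a toy Python model that counts network round trips. The model does not include connection setup cost or Lettuce itself. In Lettuce, the equivalent is typically done by disabling auto-flush on a connection (`setAutoFlushCommands(false)`), issuing the commands, and then calling `flushCommands()`. `FakeConnection` and the batch size of 1,000 are assumptions.

```python
class FakeConnection:
    """Stand-in for a Redis connection that only counts network round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def send(self, commands):
        # One send = one write of all commands + one read of all replies.
        self.round_trips += 1
        for key, value in commands:
            self.store[key] = value
        return ["OK"] * len(commands)


def set_one_by_one(conn, items):
    for item in items:
        conn.send([item])          # one round trip per SET


def set_pipelined(conn, items, batch_size=1000):
    batch = []
    for item in items:
        batch.append(item)         # queue locally instead of sending
        if len(batch) == batch_size:
            conn.send(batch)       # flush the whole batch at once
            batch = []
    if batch:
        conn.send(batch)
```

    For 10,500 SETs, the one-by-one path makes 10,500 round trips, while the pipelined path makes 11, and both leave the same data behind.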
Case #5 · Integrating an External VOD Solution While Minimizing Legacy System Impact (Gmarket)
Node.js · .NET · Webhook · REST API · Shoplive
  • Project Overview: Launched a new video ad product by integrating an external VOD solution (Shoplive) into a legacy .NET ad system originally focused on image ads.
  • Key Achievements: Contributed new ad revenue of at least several billion KRW by launching the video ad product / Added the new features without affecting the stability of the legacy system.

  • Problem Situation

    • Needed to add new video ad capabilities to a legacy .NET ad system designed exclusively for image ads.
    • Given the nature of the legacy system, code changes carried the risk of cascading failures, making minimizing the scope of impact a core constraint.
    • Required real-time metadata synchronization with the external VOD solution (Shoplive).
  • Technical Approach & Implementation

    • Webhook-based Sync Design: To touch the legacy system's internals as little as possible, synchronized metadata one way by receiving processing-completion events from the external solution via webhooks.
    • Stabilizing External Integration:
      • Implemented retry logic for external system integration failures.
      • When ads were queried, checked the actual processing status on the external system and updated the internal status to match (preventing status inconsistencies).
      • Secured operational visibility by adding detailed logging.
    • Minimizing Legacy Impact: Limited changes to adding a video type field, implementing file processing logic, and extending the downstream ad API server, then deployed after integration testing.
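    A Python sketch of the stabilizing measures described above. It covers retry with exponential backoff around external calls, a one-way webhook handler, and a status refresh on read when the status is not yet final. The status value `READY`, the payload and field names, and the `fetch_remote_status` callable are hypothetical; the Shoplive API itself is not modeled.

```python
import time


def with_retry(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call call() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)


def handle_webhook(payload, ads_by_video):
    """One-way sync: apply a completion event to the matching ad (replays are harmless)."""
    ad = ads_by_video.get(payload["video_id"])
    if ad is not None:
        ad["video_status"] = payload["status"]


def get_ad(ad, fetch_remote_status):
    """On read, re-check a non-final status against the external system."""
    if ad["video_status"] != "READY":
        ad["video_status"] = with_retry(lambda: fetch_remote_status(ad["video_id"]))
    return ad
```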
Case #6 · Introducing and Stabilizing Event-Driven Architecture During MSA Transition (Catenoid)
Node.js · Nest.js · Express.js · AWS Lambda · SQS · EventBridge · ECS · Docker
  • Project Overview: Designed and introduced an event-driven architecture to resolve inter-service coupling during the transition of the Loomex media distribution management solution from a .NET monolith to a Node.js MSA.
  • Key Achievements: Eliminated synchronous dependencies between services and secured a flexibly scalable structure / Optimized infrastructure costs by introducing asynchronous serverless processing.

  • Problem Situation

    • Synchronous calls between services in the tightly coupled monolithic legacy system → tight coupling and failures propagating across services.
    • Long-running tasks such as media transcoding were structurally ill-suited to synchronous API processing.
  • Technical Approach & Implementation

    Designed the infrastructure by distinguishing inter-service communication methods into "Commands" and "Events":

    • AWS SQS (Command): Used when specific tasks require asynchronous processing (e.g., VOD transcoding requests). The processing service or Lambda pulls and processes messages from the queue.
    • AWS EventBridge (Event): Used when multiple services need to independently subscribe to state change events (e.g., encoding completion, channel state changes). Eliminated inter-service coupling using the Publish/Subscribe pattern.
    • Message Processing Stability: Supported preservation and reprocessing of failed messages by configuring a DLQ (Dead Letter Queue). Prevented duplicate processing by applying an idempotency key (message ID) in the Lambda processing logic.
    • Automated Resource Cleanup: Automatically deleted S3 stream chunks and thumbnails using a Lambda + EventBridge scheduler (applied DB-based retention policies).
    • CloudWatch Monitoring: Set up alarms for core metrics such as SQS queue depth and Lambda execution error rates.
Case #7 · Automating Complex Social Media Integration Processes (Catenoid)
Node.js · YouTube API · Facebook Live API · OAuth 2.0 · REST API
  • Project Overview: Fully automated the 6-step manual setup process for customers' social media simulcasting on a live streaming platform into a single authentication flow.
  • Key Achievements: Replaced the 6-step setup process (including manual stream key entry) with a one-time authentication / Drastically reduced CS inquiries caused by setup errors / Fully automated simultaneous YouTube and Facebook broadcasting.

  • Problem Situation

    • To simulcast to YouTube and Facebook, customers had to manually visit each platform and complete a 6-step manual setup, including issuing a stream key.
      • Visit social platform → Login → Go Live → Issue Stream Key & Server URL → Paste into our solution → Start Broadcast.
    • Non-technical customers frequently raised CS inquiries after making typos and setup errors while manually entering stream keys.
    • The complexity of integrating and managing different API specs, authentication methods, and setup procedures for each platform.
  • Technical Approach & Implementation

    • Implementing OAuth 2.0 Auth Flow: Registered apps in the developer consoles of each platform (YouTube, Facebook), handled Redirect URI on the backend, exchanged authorization codes, acquired Tokens, and stored them in the DB. Eliminated the burden of re-authentication through automatic Access Token renewal based on Refresh Tokens.
    • Abstracting API Modules per Platform: Implemented YouTube and Facebook API clients respectively and abstracted features like live creation and stream info lookup into an internal standard interface → Making it easy to add new platforms.
    • Implementing Automation Workflow: After the initial authentication, invoked APIs sequentially using stored tokens (Create Live → Acquire Stream Info → Auto-configure Catenoid Streaming Engine). Handled success/failure and rollbacks at each step.
    • Security: Encrypted and stored user token information in the DB.
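    A Python sketch of three pieces described above: the per-platform abstraction, the sequential setup with rollback, and refresh-token handling. `LivePlatform`, its method names, the `server_url`/`stream_key` fields, and the 60-second expiry margin are hypothetical. Only `access_token`, `expires_in`, and the optional new `refresh_token` follow the OAuth 2.0 token response format (RFC 6749).

```python
from abc import ABC, abstractmethod


class LivePlatform(ABC):
    """Hypothetical internal interface; one implementation per platform."""

    @abstractmethod
    def create_live(self, token: str, title: str) -> str: ...

    @abstractmethod
    def get_stream_info(self, token: str, live_id: str) -> dict: ...

    @abstractmethod
    def delete_live(self, token: str, live_id: str) -> None: ...


def ensure_fresh_token(row: dict, refresh, now: float, margin: float = 60) -> str:
    """Refresh the access token shortly before it expires, using the refresh token."""
    if row["expires_at"] - margin <= now:
        response = refresh(row["refresh_token"])  # an OAuth 2.0 token response
        row["access_token"] = response["access_token"]
        row["expires_at"] = now + response["expires_in"]
        # The server may rotate the refresh token; keep the old one otherwise.
        row["refresh_token"] = response.get("refresh_token", row["refresh_token"])
    return row["access_token"]


def setup_simulcast(platform: LivePlatform, token: str, title: str, configure_engine) -> str:
    """Create live -> read stream info -> configure our engine; roll back on failure."""
    live_id = platform.create_live(token, title)
    try:
        info = platform.get_stream_info(token, live_id)
        configure_engine(info["server_url"], info["stream_key"])
    except Exception:
        platform.delete_live(token, live_id)  # undo the half-created broadcast
        raise
    return live_id
```

    Adding a platform in this shape means writing one new `LivePlatform` subclass; the setup workflow stays unchanged.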
Case #8 · Stable Upload of Large Files: Chunking and Resuming (Catenoid)
Vue.js · Node.js · TypeScript · REST API
  • Project Overview: Resolved memory shortage and timeout issues that occurred when uploading media files of several GBs by implementing chunked uploads and a resume feature.
  • Key Achievements: Eliminated timeouts and memory shortages during large file uploads / Achieved a near-100% upload success rate, excluding failures caused by the files themselves / Supported resuming from the point of interruption after a network disconnection.

  • Problem Situation

    • Single HTTP uploads of several GBs frequently caused browser memory shortages, server memory overflows, and network timeouts.
    • User inconvenience of having to retry from the beginning upon an upload failure.
  • Technical Approach & Implementation

    • Client Splitting & Transmission: Split the file into fixed-size chunks in JavaScript and transmitted them via FormData including metadata like file ID, total chunk count, and chunk index (supported parallel transmission).
    • Server Reception, Storage, & Reassembly: Implemented a chunk reception API on the Node.js server and saved chunks in a temporary directory (/tmp/{fileId}/{chunkIndex}). Upon receiving all chunks, merged them sequentially, saved the final file, and deleted the temporary files.
    • Implementing Resume:
      • Called an API to query the list of already uploaded chunks on the server before starting the upload.
      • The server scanned the temporary directory and responded with a list of existing chunk indexes.
      • The client resumed from the last successful chunk, re-sending that chunk as well as a defensive measure in case it had terminated abnormally.
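    The resume logic can be sketched in Python with an in-memory server standing in for the temporary directory; all names are hypothetical. The sketch sends every missing chunk index, which also covers gaps left by parallel transmission. It also re-sends the highest stored chunk in case that chunk was written incompletely.

```python
def split_chunks(data: bytes, chunk_size: int) -> list:
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ChunkServer:
    """In-memory stand-in for the /tmp/{fileId}/{chunkIndex} temporary storage."""

    def __init__(self):
        self.parts = {}   # file_id -> {chunk_index: bytes}
        self.files = {}   # file_id -> merged bytes

    def uploaded_indexes(self, file_id):
        return sorted(self.parts.get(file_id, {}))

    def receive(self, file_id, index, total, chunk):
        parts = self.parts.setdefault(file_id, {})
        parts[index] = chunk              # re-sending an index simply overwrites it
        if len(parts) == total:
            # All chunks present: merge in index order, then drop the temp parts.
            self.files[file_id] = b"".join(parts[i] for i in range(total))
            del self.parts[file_id]


def resume_upload(server, file_id, data, chunk_size):
    if file_id in server.files:
        return                            # already merged
    chunks = split_chunks(data, chunk_size)
    done = server.uploaded_indexes(file_id)
    to_send = set(range(len(chunks))) - set(done)
    if done:
        to_send.add(done[-1])             # defensively re-send the last stored chunk
    for index in sorted(to_send):
        server.receive(file_id, index, len(chunks), chunks[index])
```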
Case #9 · New Development of EDN+ Server and Ad Integration with Karrot (Gmarket)
Node.js · Spring · REST API · Karrot · Banner Ad
  • Project Overview: Rebuilt the server for Gmarket's external display ad network EDN+ (eBay Display Network Ad), moving it from a legacy Node.js codebase to Spring, and implemented ad placement integration with Karrot (Danggeun Market).
  • Key Achievements:
    • Replaced the legacy Node.js server with a newly built Spring-based server.
    • Implemented the Karrot ad placement integration, directly serving the APIs behind banner ads displayed on Karrot.
    • Owned the overall backend of an external-network ad product comparable to Google GDN and Criteo AD Choices.

  • EDN+ Product Introduction

    • EDN+ is Gmarket's external display ad network product.
    • Displays Gmarket ads on external digital media (news sites, portals, online communities, etc.), similar to Google GDN and Criteo AD Choices.
    • Includes a DMP (Data Management Platform) and an ad selection engine based on retargeting, lookalike audiences, and user targeting.
  • Development Scope & Implementation Details

    • New Server Development: Rebuilt the legacy Node.js-based EDN+ server on Spring.
      • Implemented backend logic for the ad selection engine (Retargeting, Lookalike, User Targeting).
      • Implemented APIs for external media ad placement quality management.
      • Implemented an HTML-based ad creative template processing server.
      • Developed a backend admin system for advertiser/publisher settlements.
    • Karrot Ad Placement Integration:
      • Discussed and designed API integration specs with Karrot.
      • Implemented an API that selects and returns the appropriate ad when an ad is requested from a Karrot placement.
      • All banner ads exposed on Karrot are served through this API.
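    As a hedged illustration of the request path, here is a Python sketch that tries targeting strategies in priority order and falls back to a default ad. The strategy order, the request fields, and the fallback behavior are assumptions; the actual EDN+ selection logic is not described here.

```python
from typing import Callable, List, Optional, Tuple

# A strategy returns a candidate ad for the request, or None if it has nothing to offer.
Strategy = Callable[[dict], Optional[dict]]


def select_ad(request: dict, strategies: List[Tuple[str, Strategy]], fallback: dict) -> dict:
    """Try targeting strategies in priority order; fall back to a default ad."""
    for name, strategy in strategies:
        ad = strategy(request)
        if ad is not None:
            return {**ad, "matched_by": name}
    return {**fallback, "matched_by": "fallback"}
```

    Tagging each response with `matched_by` in this sketch would also let placement quality be reported per targeting method.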