SQS vs EventBridge — Criteria for Designing an Event-Driven Architecture During MSA Transition
During the transition from a monolith to MSA, we adopted an AWS event-driven architecture to decouple services. This post shares the criteria we used to decide between SQS and EventBridge, and how we designed idempotency and DLQs.
Note: The code in this article has been conceptually rewritten based on actual work experience. It is not associated with the actual company code.
Event-Driven Architecture Structure
Introduction
At Catenoid, I was responsible for transitioning the Loomex media distribution management solution from a .NET monolith to a Node.js Microservices Architecture (MSA).
The biggest challenge in transitioning to an MSA is not the separation of features itself. It's the question of how the separated services communicate with each other. If designed poorly, you end up with a "distributed monolith" where the services are separated, but the coupling remains the same.
The Problem: The Limits of Synchronous Inter-Service Calls
If you separate services from a monolith and have them call each other via HTTP APIs, it looks like an MSA on the surface. However, in reality, problems like these arise:
[Video Upload Service]
→ HTTP Call → [Transcoding Service]
→ HTTP Call → [Metadata Service]
→ HTTP Call → [Notification Service]
- While the Transcoding Service is responding, the Upload Service is blocked waiting.
- If the Transcoding Service goes down, the Upload Service also fails.
- Every time a new follow-up task is added, the Upload Service code must be modified.
This is simply a monolith wearing an HTTP shell.
Design Criteria: Command vs Event
When introducing an event-driven architecture, the very first criterion we established was distinguishing between "Command" and "Event".
| Classification | Definition | AWS Service |
|---|---|---|
| Command | A request to perform a specific action. Has exactly one handler. | SQS (Queue) |
| Event | A fact that a state change has occurred. Can have multiple handlers. | EventBridge (Bus) |
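As a rough illustration of this rule, the sketch below models the two message kinds as a TypeScript discriminated union and picks a transport from the kind. The type names, `chooseTransport`, and the sample messages are hypothetical and are not taken from the production code.

```typescript
// Hypothetical message shapes; all names here are illustrative only.
type Command = { kind: "command"; name: string; payload: unknown };
type DomainEvent = { kind: "event"; name: string; payload: unknown };
type Message = Command | DomainEvent;

type Transport = "sqs" | "eventbridge";

// A command has exactly one intended handler, so it goes to a queue.
// An event is a fact that any number of consumers may react to, so it goes to a bus.
function chooseTransport(message: Message): Transport {
  switch (message.kind) {
    case "command":
      return "sqs";
    case "event":
      return "eventbridge";
  }
}

const transcodeRequest: Message = {
  kind: "command",
  name: "TranscodeVideo",
  payload: { videoId: "v-1" },
};
const completedFact: Message = {
  kind: "event",
  name: "TranscodingCompleted",
  payload: { videoId: "v-1" },
};

console.log(chooseTransport(transcodeRequest)); // sqs
console.log(chooseTransport(completedFact)); // eventbridge
```

Naming helps keep the distinction visible: commands read as imperatives ("TranscodeVideo"), events as past-tense facts ("TranscodingCompleted").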
SQS Use Case: VOD Transcoding Request
When a video upload is complete, transcoding is required. Transcoding is a command delegated to a specific Lambda function.
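// Request transcoding after upload completion (Command → SQS)
await sqs.sendMessage({
  QueueUrl: TRANSCODING_QUEUE_URL, // must point at a FIFO queue (name ends in .fifo)
  MessageBody: JSON.stringify({
    videoId: video.id,
    sourcePath: video.s3Path,
    targetFormats: ['mp4_720p', 'mp4_1080p']
  }),
  // FIFO only: messages that share a group ID are delivered in order
  MessageGroupId: video.id
  // FIFO queues also require a MessageDeduplicationId here unless
  // content-based deduplication is enabled on the queue
}).promise();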
With SQS, a message stays in the queue (up to the queue's retention period) while the consuming Lambda is down or failing. Once the Lambda recovers, it picks the message up again without any action from the Upload Service.
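Because the send call above targets a FIFO queue, it also needs a `MessageDeduplicationId` unless content-based deduplication is enabled on the queue. The sketch below shows one possible way to derive that ID deterministically. `TranscodingRequest`, `buildFifoParams`, the queue URL, and the rule "same video plus same formats means same request" are all assumptions made for illustration.

```typescript
// Hypothetical payload shape, modeled on the transcoding request above.
interface TranscodingRequest {
  videoId: string;
  sourcePath: string;
  targetFormats: string[];
}

// Subset of the SQS SendMessage parameters that matter for a FIFO queue.
interface FifoSendParams {
  QueueUrl: string;
  MessageBody: string;
  MessageGroupId: string;
  MessageDeduplicationId: string;
}

function buildFifoParams(queueUrl: string, request: TranscodingRequest): FifoSendParams {
  // Sort a copy so the same set of formats always yields the same ID.
  const formats = [...request.targetFormats].sort().join(",");
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(request),
    // Requests for the same video stay in order relative to each other.
    MessageGroupId: request.videoId,
    // Assumption: "same video + same formats" counts as the same request.
    // SQS limits this ID to 128 characters.
    MessageDeduplicationId: `${request.videoId}:${formats}`,
  };
}

// Placeholder URL, not a real queue.
const fifoParams = buildFifoParams("https://sqs.example/transcoding.fifo", {
  videoId: "v-1",
  sourcePath: "uploads/v-1/source.mov",
  targetFormats: ["mp4_720p", "mp4_1080p"],
});
console.log(fifoParams.MessageDeduplicationId); // v-1:mp4_1080p,mp4_720p
```

SQS drops a message whose deduplication ID repeats within the 5-minute deduplication interval, so a deliberate re-transcode inside that window would need a different ID.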
EventBridge Use Case: Encoding Completion Event
When transcoding is complete, multiple services need to know about this fact:
- Metadata Service: Updates video information.
- Notification Service: Sends a completion notification to the advertiser.
- Channel Service: Updates the status of related channels.
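// Publish transcoding completion event (Event → EventBridge)
const result = await eventBridge.putEvents({
  Entries: [{
    Source: 'catenoid.transcoding',
    DetailType: 'TranscodingCompleted',
    Detail: JSON.stringify({
      videoId: video.id,
      status: 'COMPLETED',
      outputPaths: { mp4_720p: '...', mp4_1080p: '...' }
    }),
    EventBusName: 'catenoid-media-bus'
  }]
}).promise();
// putEvents reports rejected entries in the response instead of throwing
if (result.FailedEntryCount) {
  throw new Error(`Failed to publish TranscodingCompleted for ${video.id}`);
}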
Each service independently subscribes to this event using EventBridge rules. Even if a new service is added, the Transcoding Service code doesn't need to be touched.
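To make the subscription side concrete, here is a sketch of what a rule's event pattern for this event could look like, together with a deliberately simplified matcher that mimics how a pattern selects events. The pattern keys (`source`, `detail-type`, `detail`) are the ones EventBridge uses on delivered events. The `matchesPattern` function is a teaching aid written for this article, not the EventBridge implementation, and it covers exact-value matching only.

```typescript
// A pattern maps each field to a list of accepted values, or to a nested pattern.
type Pattern = { [key: string]: unknown[] | Pattern };

// Simplified matcher: every field named in the pattern must be present in the
// event and equal to one of the listed values. Real EventBridge patterns also
// support prefix, numeric, and other operators that are not modeled here.
function matchesPattern(pattern: Pattern, candidate: Record<string, unknown>): boolean {
  return Object.entries(pattern).every(([key, expected]) => {
    const actual = candidate[key];
    if (Array.isArray(expected)) {
      return expected.includes(actual);
    }
    return (
      typeof actual === "object" &&
      actual !== null &&
      matchesPattern(expected, actual as Record<string, unknown>)
    );
  });
}

// The event as a rule target would see it (Detail arrives as a parsed object).
const completedEvent: Record<string, unknown> = {
  source: "catenoid.transcoding",
  "detail-type": "TranscodingCompleted",
  detail: { videoId: "v-1", status: "COMPLETED" },
};

// Hypothetical rule for the Notification Service.
const notificationRule: Pattern = {
  source: ["catenoid.transcoding"],
  "detail-type": ["TranscodingCompleted"],
  detail: { status: ["COMPLETED"] },
};

console.log(matchesPattern(notificationRule, completedEvent)); // true
```

Each consumer owns its own rule, which is why adding a subscriber does not require a change on the publishing side.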
Stability Design: DLQ and Idempotency
Dead Letter Queue (DLQ)
If message processing fails repeatedly, that message is moved to the DLQ. Through this:
- Messages that fail to process are not lost.
- They can be reprocessed after analyzing the cause of failure.
- They don't block the processing of normal messages.
// Setup DLQ when creating an SQS Queue
const queueAttributes = {
  RedrivePolicy: JSON.stringify({
    deadLetterTargetArn: DLQ_ARN,
    maxReceiveCount: '3' // Move to DLQ after 3 failures
  })
};
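One detail interacts with `maxReceiveCount`: when a Lambda invocation fails, every message in that batch returns to the queue and counts as received again, including the ones that were processed successfully. Lambda's partial batch response (enabled with `ReportBatchItemFailures` on the event source mapping) lets the handler report only the records that failed. The sketch below assumes that setting is on. The local interfaces mirror just the fields used here, and `handleBatch` and `processOne` are hypothetical names.

```typescript
// Minimal local shapes mirroring the parts of a Lambda SQS event used here.
interface QueueRecord {
  messageId: string;
  body: string;
}
interface QueueEvent {
  Records: QueueRecord[];
}
// Response shape Lambda expects for partial batch failures.
interface BatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

async function handleBatch(
  batch: QueueEvent,
  processOne: (payload: unknown) => Promise<void>,
): Promise<BatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of batch.Records) {
    try {
      // A malformed body also counts as a failure for this record only.
      await processOne(JSON.parse(record.body));
    } catch {
      // Only this record goes back to the queue and moves toward the DLQ.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

For a FIFO queue, continuing past a failed record can break ordering within a message group, so a stricter variant would stop at the first failure and report that record and all remaining ones as failed.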
Idempotency
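SQS Standard Queues guarantee at-least-once delivery, which means the same message can occasionally be delivered more than once. A retry after a handler failure or timeout has the same effect, so the handler must be safe to run twice for the same message.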
// Prevent duplicate processing with an idempotency key
import type { SQSEvent } from 'aws-lambda';

export const handler = async (event: SQSEvent) => {
  for (const record of event.Records) {
    // messageId is stable across redeliveries of the same message
    const messageId = record.messageId;
    // Check if the message has already been processed.
    // Note: exists() followed by set() is not atomic, so two concurrent
    // deliveries can both pass this check.
    const isProcessed = await idempotencyStore.exists(messageId);
    if (isProcessed) {
      console.log(`Skipping already processed: ${messageId}`);
      continue;
    }
    await processTranscoding(JSON.parse(record.body));
    // Record completion
    await idempotencyStore.set(messageId, { processedAt: new Date() });
  }
};
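A separate check followed by a write leaves a small window in which two concurrent deliveries of the same message can both see "not processed". One common way to close that window is to make the check and the write a single atomic "claim" step. The sketch below illustrates the idea with an in-memory store. `IdempotencyStore`, `InMemoryStore`, and `processOnce` are hypothetical names, and the in-memory version only protects a single process.

```typescript
// Hypothetical store interface: claim() must be atomic in a real backend.
interface IdempotencyStore {
  claim(key: string): Promise<boolean>; // resolves to true only for the first caller
  release(key: string): Promise<void>; // undo a claim after a failed attempt
}

// In-memory stand-in for illustration only.
class InMemoryStore implements IdempotencyStore {
  private readonly keys = new Set<string>();

  async claim(key: string): Promise<boolean> {
    // No await between has() and add(), so this is atomic within one process.
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }

  async release(key: string): Promise<void> {
    this.keys.delete(key);
  }
}

async function processOnce(
  store: IdempotencyStore,
  key: string,
  work: () => Promise<void>,
): Promise<"processed" | "skipped"> {
  if (!(await store.claim(key))) {
    return "skipped"; // another delivery already owns this key
  }
  try {
    await work();
    return "processed";
  } catch (err) {
    // Give the key back so a retry can claim it again.
    await store.release(key);
    throw err;
  }
}
```

With DynamoDB, for example, `claim` could map to a `PutItem` call with an `attribute_not_exists` condition on the key. Claiming before processing has its own trade-off: if the function dies before `release` runs, the key stays claimed, so an expiry on claims is worth considering.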
Bonus: Automatic S3 Cleanup
Temporary chunk files and thumbnails generated during live streaming should be automatically deleted after the stream ends.
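// Periodic cleanup, invoked on a schedule by EventBridge Scheduler
export const cleanupHandler = async () => {
  const sevenDaysAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
  const expiredChannels = await db.channels
    .findMany({ where: { status: 'ENDED', endedAt: { lt: sevenDaysAgo } } });
  for (const channel of expiredChannels) {
    // listChunkFiles is expected to return [{ Key: '...' }, ...]
    const objects = await listChunkFiles(channel.id);
    if (objects.length === 0) continue; // deleteObjects rejects an empty list
    // deleteObjects accepts at most 1,000 keys per request;
    // larger lists must be split into several calls
    await s3.deleteObjects({
      Bucket: MEDIA_BUCKET,
      Delete: { Objects: objects }
    }).promise();
  }
};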
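S3's `deleteObjects` accepts at most 1,000 keys per request, and a long live stream can easily produce more chunk files than that. The helper below is a small sketch of how the key list could be split into request-sized batches. `toDeleteBatches` and the sample keys are hypothetical and not part of the original cleanup code.

```typescript
// S3 DeleteObjects accepts at most 1,000 keys per request.
const MAX_KEYS_PER_DELETE = 1000;

type ObjectIdentifier = { Key: string };

// Split a flat list of keys into batches shaped like Delete.Objects.
function toDeleteBatches(
  keys: string[],
  batchSize: number = MAX_KEYS_PER_DELETE,
): ObjectIdentifier[][] {
  const batches: ObjectIdentifier[][] = [];
  for (let i = 0; i < keys.length; i += batchSize) {
    batches.push(keys.slice(i, i + batchSize).map((Key) => ({ Key })));
  }
  return batches;
}

// 2,500 made-up chunk keys for one channel.
const sampleKeys = Array.from({ length: 2500 }, (_, i) => `channels/c-1/chunk-${i}.ts`);
const deleteBatches = toDeleteBatches(sampleKeys);
console.log(deleteBatches.map((batch) => batch.length)); // [ 1000, 1000, 500 ]
```

Each batch can then be passed as `Delete: { Objects: batch }` in its own `deleteObjects` call. An empty key list yields no batches, which also avoids sending a request with nothing to delete.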
Results
By transitioning to an event-driven architecture:
- Reduced inter-service coupling: The Upload Service keeps working even if the Transcoding Service goes down, because transcoding requests simply wait in the queue.
- Easy to add new features: We can connect new services just by adding EventBridge rules, without modifying existing code.
- Infrastructure cost efficiency: Heavy tasks like transcoding are decoupled into Lambdas, running only when needed.
Conclusion: When is Event-Driven Appropriate?
Event-driven architecture is not always the answer. It is particularly effective in the following cases:
- Long-running asynchronous tasks (video transcoding, sending emails, etc.)
- When multiple services must react to the same event (Pub/Sub pattern)
- Tasks that require retries upon failure (utilizing DLQs)
On the other hand, if an immediate response is required (e.g., payment confirmation) or if processing order is extremely strict, synchronous calls might be more appropriate.