15 min read Last updated on Feb 6, 2026

Design a YouTube-Style Video Platform

A video platform at YouTube scale handles massive upload volumes, transcoding across dozens of resolution/codec combinations, and global delivery to billions of daily viewers. This design covers the upload pipeline, transcoding infrastructure, adaptive streaming delivery, and metadata/discovery systems, focusing on the architectural decisions that enable sub-2-second playback start times while processing 500+ hours of new video every minute.

High-level architecture: upload → transcode → store → deliver. Metadata flows in parallel to enable immediate discoverability while transcoding completes.


A video platform’s architecture is shaped by three fundamental constraints:

  1. Video is computationally expensive: A single 10-minute 4K upload generates 50+ output files (resolutions × codecs × bitrates). Transcoding must parallelize across chunked segments to complete in minutes rather than hours.

  2. Latency tolerance varies by phase: Uploads tolerate multi-second latencies; playback start must be < 2 seconds. This asymmetry justifies aggressive CDN caching and segment-level prefetching.

  3. Traffic follows extreme power laws: ~10% of videos receive 90% of views. Hot/warm/cold storage tiering and origin shield caching exploit this distribution.

The core mechanisms:

  • Resumable chunked uploads (tus protocol) handle unreliable connections and multi-gigabyte files
  • Segment-parallel transcoding splits videos into 2-second chunks, transcodes in parallel, reassembles
  • Multi-codec encoding (H.264 for reach, VP9/AV1 for efficiency) optimizes bandwidth vs. compatibility
  • Adaptive Bitrate Streaming (HLS/DASH) with hybrid ABR algorithms balances quality and rebuffering
  • Origin shield + edge caching achieves 95%+ cache hit rates, reducing origin egress dramatically

Functional requirements:

| Requirement | Priority | Notes |
|---|---|---|
| Video upload | Core | Resumable, chunked, up to 256 GB files |
| Video playback | Core | Adaptive streaming, multiple quality levels |
| Transcoding pipeline | Core | Multi-resolution, multi-codec output |
| Video metadata (title, description, tags) | Core | Editable, searchable |
| Video search | Core | Full-text + filters (duration, date, category) |
| Thumbnails (auto-generated + custom) | Core | Multiple sizes for different contexts |
| View counting | Core | Near real-time, deduplicated |
| Comments and engagement | Extended | Threaded, moderation |
| Recommendations | Extended | Personalized, contextual |
| Live streaming | Out of scope | Different latency requirements |
| Monetization/Ads | Out of scope | Separate ad-tech stack |

Non-functional requirements:

| Requirement | Target | Rationale |
|---|---|---|
| Upload availability | 99.9% | Tolerate brief maintenance windows |
| Playback availability | 99.99% | Revenue-critical, user experience |
| Upload processing time | < 2× video duration | User expectation for availability |
| Playback start latency | p99 < 2s | Industry benchmark for abandonment |
| Rebuffering ratio | < 0.5% of playback time | Quality threshold |
| Video quality | VMAF > 93 at target bitrate | Perceptual quality standard |
| Storage efficiency | 30% bandwidth savings via modern codecs | Cost optimization |

YouTube-scale baseline:

Daily active users: 2.5 billion
Hours uploaded per minute: 500+
Daily video views: 5 billion
Upload traffic:
- 500 hours/min × 60 min × 24 hours = 720,000 hours/day
- Average raw file: 2 GB/hour (1080p)
- Daily upload ingestion: ~1.4 PB/day
Storage growth:
- Per video: 50 output files (resolutions × codecs)
- Storage multiplier: ~5x original (transcoded variants)
- Daily storage growth: ~7 PB/day
- Annual growth: ~2.5 EB/year
Playback traffic:
- 5 billion views/day
- Average view duration: 10 minutes
- Average bitrate: 4 Mbps (mixed quality)
- Per view: 600 s × 4 Mbps = 2,400 Mb ≈ 300 MB
- Peak concurrent viewers: 500M (estimate)
- Daily egress: 5 billion × 300 MB ≈ 1,500 PB/day (1.5 EB/day)

CDN efficiency impact:

Without CDN: 1,500 PB/day from origin
With 95% cache hit rate: 75 PB/day from origin
Cost reduction: 20x origin egress savings
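
The upload and storage arithmetic above can be reproduced with a few lines of Python. This is a rough sketch that uses only the inputs stated in this section (500 hours/minute, 2 GB/hour, a 5x storage multiplier, a 95% cache hit rate); the function names are illustrative, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
# Back-of-envelope capacity model; every input is an assumption stated in the text.
HOURS_UPLOADED_PER_MIN = 500
RAW_GB_PER_HOUR = 2        # average 1080p source file
STORAGE_MULTIPLIER = 5     # transcoded variants relative to the source


def daily_upload_hours(hours_per_min: float) -> float:
    """Hours of video uploaded per day."""
    return hours_per_min * 60 * 24


def daily_ingest_pb(hours_per_min: float, gb_per_hour: float) -> float:
    """Daily upload ingestion in PB (decimal: 1 PB = 1,000,000 GB)."""
    return daily_upload_hours(hours_per_min) * gb_per_hour / 1_000_000


def origin_offload_factor(cache_hit_rate: float) -> float:
    """How many times smaller origin egress is than total egress."""
    return 1 / (1 - cache_hit_rate)


ingest = daily_ingest_pb(HOURS_UPLOADED_PER_MIN, RAW_GB_PER_HOUR)
growth = ingest * STORAGE_MULTIPLIER
print(f"ingest: {ingest:.2f} PB/day")
print(f"storage growth: {growth:.1f} PB/day, {growth * 365 / 1000:.2f} EB/year")
print(f"origin offload at 95% hit rate: {origin_offload_factor(0.95):.0f}x")
```

Changing the cache hit rate shows how sensitive origin egress is to it: at 90% the offload factor drops to about 10x, and at 99% it rises to about 100x.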

Path A: Centralized Transcoding

Best when:

  • Smaller scale (< 10K uploads/day)
  • Predictable traffic patterns
  • Cost-sensitive (avoid distributed infrastructure)

Architecture:

  • Single transcoding cluster per region
  • Queue-based job scheduling
  • Linear processing (full video at once)

Trade-offs:

  • Simpler operations
  • Lower infrastructure cost at small scale
  • Transcoding time = video duration × quality ladder size
  • Single failure domain per region
  • Cannot scale transcoding speed for viral uploads

Real-world example: Vimeo (pre-2020) used centralized transcoding. Acceptable for professional content with predictable upload patterns.

Path B: Distributed Chunk-Based Transcoding

Best when:

  • Massive scale (millions of uploads/day)
  • Need fast turnaround for time-sensitive content
  • Global upload sources require regional processing

Architecture:

  • Videos split into 2-second chunks
  • Chunks transcoded in parallel across distributed workers
  • Reassembled into final output streams
  • Custom hardware (ASICs) for encoding efficiency

Trade-offs:

  • Transcoding time independent of video length (parallelized)
  • Elastic scaling for traffic spikes
  • Fault isolation (failed chunk retries, not full video)
  • Complex orchestration layer
  • Chunk boundary artifacts require careful handling
  • Higher infrastructure complexity

Real-world example: YouTube’s Video Coding Unit (VCU) ASIC achieves 20-33x efficiency over software encoding. Netflix processes 250,000 jobs per 30-minute episode.

| Factor | Centralized | Distributed Chunk-Based |
|---|---|---|
| Processing latency | O(video duration) | O(1) with enough workers |
| Scalability | Vertical (bigger machines) | Horizontal (more workers) |
| Failure blast radius | Full video re-encode | Single chunk retry |
| Infrastructure cost | Lower at small scale | Lower at large scale |
| Operational complexity | Simple | High |
| Best for | < 10K uploads/day | > 100K uploads/day |

This article focuses on Path B (Distributed Chunk-Based) because:

  1. YouTube-scale requires parallelization to meet processing SLAs
  2. The chunking approach enables interesting optimizations (per-shot quality, scene detection)
  3. Modern ABR streaming (HLS/DASH) is segment-native, aligning with chunk-based encoding

System components: upload flow (top-left), processing (center), storage (bottom-left), discovery (center-right), delivery (right).

Upload flow:

  1. Client initiates resumable upload → receives upload URI and session token
  2. Client uploads in chunks (5MB default) → server tracks received ranges
  3. On completion: validate checksum, store original, queue for processing
  4. Metadata extracted (duration, resolution, codec) and stored immediately
  5. Video becomes searchable before transcoding completes (thumbnail + metadata)

Processing flow:

  1. Segmentation: Split video into 2-second GOP-aligned chunks
  2. Analysis: Scene detection, shot boundaries, content classification
  3. Parallel encoding: Each chunk transcoded to all target formats
  4. Quality validation: VMAF score per segment, re-encode if below threshold
  5. Assembly: Concatenate chunks into continuous streams
  6. Manifest generation: Create HLS/DASH manifests pointing to segments

Playback flow:

  1. Client requests manifest → CDN serves cached or origin-fetched manifest
  2. ABR algorithm selects initial quality based on estimated bandwidth
  3. Segments fetched from nearest edge → playback begins
  4. Continuous adaptation: Quality switches based on buffer level and throughput
  5. Metrics collected: Startup time, rebuffering events, quality switches

The tus protocol provides HTTP-based resumable uploads, critical for large files over unreliable networks.

Protocol flow:

Resumable upload: client queries offset after disconnection, resumes from last confirmed position.


Key protocol headers:

| Header | Purpose |
|---|---|
| Upload-Length | Total file size in bytes (may be deferred for streaming uploads) |
| Upload-Offset | Byte position at which this chunk starts |
| Tus-Resumable | Protocol version (1.0.0) |
| Upload-Metadata | Key-value pairs with Base64-encoded values (filename, content-type) |

Chunk size considerations:

| Chunk Size | Pros | Cons |
|---|---|---|
| 1 MB | Fine-grained resume | Higher overhead (more requests) |
| 5 MB (default) | Balanced; good for most networks | No strong drawback |
| 25 MB | Lower overhead | Larger retransmission on failure |


Validation checks:

  • File format: Supported containers (MP4, MOV, MKV, WebM, AVI)
  • Duration: Maximum 12 hours (configurable per channel)
  • Resolution: Up to 8K (7680×4320)
  • File size: Up to 256 GB
  • Audio tracks: Maximum 8 tracks
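
The validation checks above could be expressed as a single function over probed file metadata. This is a minimal sketch: `ProbeResult` and `validate` are hypothetical names, the limits are taken from the list above, and 256 GB is treated as decimal gigabytes (an assumption). In practice the fields would come from a media probe step that is not shown here.

```python
from dataclasses import dataclass

SUPPORTED_CONTAINERS = {"mp4", "mov", "mkv", "webm", "avi"}
MAX_DURATION_S = 12 * 3600
MAX_WIDTH, MAX_HEIGHT = 7680, 4320      # 8K
MAX_SIZE_BYTES = 256 * 10**9            # assumes decimal gigabytes
MAX_AUDIO_TRACKS = 8


@dataclass
class ProbeResult:
    container: str
    duration_s: float
    width: int
    height: int
    size_bytes: int
    audio_tracks: int


def validate(p: ProbeResult) -> list[str]:
    """Return a list of human-readable violations; empty means the upload passes."""
    errors = []
    if p.container.lower() not in SUPPORTED_CONTAINERS:
        errors.append(f"unsupported container: {p.container}")
    if p.duration_s > MAX_DURATION_S:
        errors.append("duration exceeds 12 hours")
    if p.width > MAX_WIDTH or p.height > MAX_HEIGHT:
        errors.append("resolution exceeds 8K")
    if p.size_bytes > MAX_SIZE_BYTES:
        errors.append("file larger than 256 GB")
    if p.audio_tracks > MAX_AUDIO_TRACKS:
        errors.append("more than 8 audio tracks")
    return errors


print(validate(ProbeResult("mp4", 600, 1920, 1080, 500_000_000, 2)))
print(validate(ProbeResult("flv", 13 * 3600, 1920, 1080, 1, 9)))
```

Returning every violation rather than failing on the first one lets the upload UI show a complete error report in one round trip.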

Automated thumbnails:

  1. Extract frames at 25%, 50%, 75% of duration
  2. Run scene detection, select visually distinct frames
  3. Apply quality scoring (sharpness, face detection, composition)
  4. Generate sprite sheet for scrubbing preview (every 10 seconds)

Output formats:

| Use Case | Dimensions | Format |
|---|---|---|
| Search results | 320×180 | WebP/JPEG |
| Watch page | 640×360 | WebP/JPEG |
| Large player | 1280×720 | WebP/JPEG |
| Scrub preview | 160×90 (sprite) | WebP |

Transcoding pipeline: demux → analyze → encode (multi-codec × multi-resolution) → package for streaming.


| Codec | Compression vs H.264 | Browser Support | Encode Complexity | Use Case |
|---|---|---|---|---|
| H.264 (AVC) | Baseline | Universal | 1x | Default fallback |
| H.265 (HEVC) | 50% better | Safari, iOS, some Android | 2-4x | Apple ecosystem |
| VP9 | 50% better | Chrome, Firefox, Edge, Android | 2-3x | YouTube default |
| AV1 | 30-50% better than VP9 | Chrome, Firefox, Edge, Safari 17+ | 5-10x | Bandwidth-critical |

Encoding strategy:

  1. Always encode H.264: Universal fallback for all devices
  2. Default to VP9: Primary codec for browsers that support it (Chrome, Firefox, Edge, Android)
  3. AV1 for popular content: Encode after 1000+ views (amortize high encode cost)
  4. HEVC for Apple devices: Hardware HEVC decode is widespread on Safari/iOS, where VP9 support has historically been limited
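
The strategy above has two halves: deciding which codecs to encode for a video, and deciding which encoded codec to serve to a given client. A minimal sketch of both follows; the function names and the preference order (most efficient codec first) are assumptions for illustration, while the 1000-view AV1 threshold comes from the list above.

```python
AV1_VIEW_THRESHOLD = 1000                            # from the encoding strategy
CLIENT_PREFERENCE = ["av1", "vp9", "hevc", "h264"]   # assumed: most efficient first


def codecs_to_encode(view_count: int) -> list[str]:
    """Codecs to produce for a video, given its current view count."""
    codecs = ["h264", "vp9", "hevc"]
    if view_count >= AV1_VIEW_THRESHOLD:
        codecs.append("av1")   # expensive encode, only worth it for popular content
    return codecs


def codec_for_client(client_supports: set[str], available: list[str]) -> str:
    """Pick the most efficient codec that the client decodes and that exists."""
    for codec in CLIENT_PREFERENCE:
        if codec in client_supports and codec in available:
            return codec
    return "h264"              # universal fallback


popular = codecs_to_encode(5000)
print(popular)
print(codec_for_client({"h264", "hevc"}, popular))
print(codec_for_client({"h264", "vp9", "av1"}, codecs_to_encode(10)))
```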

Per-title encoding optimizes bitrate per content type. Action films need higher bitrates than static presentations.

Standard ladder (VP9):

| Resolution | Bitrate Range | FPS | Notes |
|---|---|---|---|
| 4K (2160p) | 12-20 Mbps | 30/60 | High motion: 20 Mbps |
| 1440p | 6-10 Mbps | 30/60 | Gaming content default |
| 1080p | 3-6 Mbps | 30/60 | Most common |
| 720p | 1.5-3 Mbps | 30 | Mobile default |
| 480p | 0.5-1 Mbps | 30 | Bandwidth constrained |
| 360p | 0.3-0.5 Mbps | 30 | Minimum viable |
| 240p | 0.15-0.3 Mbps | 30 | Extreme constraints |
| 144p | 0.05-0.1 Mbps | 30 | Audio-focused content |

Per-title optimization:

Standard approach: Fixed bitrate ladder (same for all videos)
Per-title approach: Analyze content complexity, adjust bitrates
Example - Documentary vs Action Movie at 1080p:
- Documentary (low motion): 2.5 Mbps achieves VMAF 95
- Action movie (high motion): 5 Mbps needed for VMAF 95
Result: 50% bandwidth savings on documentaries without quality loss

Netflix reported 20% average bandwidth savings from per-title encoding, with some titles achieving 50%+ reductions.
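
One simple way to picture per-title selection is to trial-encode a title at several bitrates, measure VMAF for each, and keep the cheapest bitrate that meets the target. The sketch below assumes those measurements already exist; `pick_bitrate` is a hypothetical helper, and apart from the two 1080p data points quoted above (2.5 Mbps and 5 Mbps reaching VMAF 95), the scores are made-up illustration values.

```python
def pick_bitrate(measurements: dict[float, float], target_vmaf: float = 95.0) -> float:
    """Lowest bitrate (Mbps) whose measured VMAF meets the target.

    Falls back to the highest measured bitrate if nothing reaches the target.
    """
    passing = [mbps for mbps, vmaf in measurements.items() if vmaf >= target_vmaf]
    return min(passing) if passing else max(measurements)


# bitrate (Mbps) -> measured VMAF at 1080p; illustrative values
documentary = {1.5: 89.0, 2.5: 95.0, 4.0: 97.0, 5.0: 98.0}
action = {2.5: 86.0, 4.0: 92.0, 5.0: 95.0}

doc_rate = pick_bitrate(documentary)
action_rate = pick_bitrate(action)
savings = 1 - doc_rate / action_rate
print(doc_rate, action_rate, f"{savings:.0%}")
```

A fixed ladder would have to use the action-movie bitrate for both titles; the per-title choice spends that bitrate only where the content needs it.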

Segmentation strategy:

  1. GOP alignment: Split at keyframe boundaries (every 2-4 seconds)
  2. Scene boundaries: Prefer splits at scene changes
  3. Uniform chunks: Maintain consistent segment duration for ABR

Parallel encoding flow:

Input: 10-minute video (600 seconds)
Chunk duration: 2 seconds
Total chunks: 300
Without parallelization:
- Encode time per codec/resolution: ~video duration
- Total variants: 8 resolutions × 3 codecs = 24
- Serial time: 24 × 10 min = 240 minutes (4 hours)
With parallelization (300 workers, one per chunk):
- Each worker encodes 1 chunk × 24 variants
- Per-chunk encode time: ~2 seconds × 24 = 48 seconds
- Total time: 48 seconds + assembly overhead
- Speedup: 14,400 s / 48 s = ~300x
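
The parallel-encoding estimate can be sketched as a small planner. It assumes, as the figures above do, that each variant encodes in roughly real time and that chunks are processed in waves when there are fewer workers than chunks; `plan_chunks` and `estimate` are hypothetical names, and the example uses a 10-minute (600-second) input.

```python
import math


def plan_chunks(duration_s: float, chunk_s: float = 2.0) -> list[tuple[float, float]]:
    """Split a video into (start, end) chunk boundaries in seconds."""
    count = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(count)]


def estimate(duration_s: float, chunk_s: float, variants: int, workers: int):
    """Return (chunks, serial seconds, parallel seconds, speedup).

    Assumes encode time is about real time per variant and ignores
    assembly overhead.
    """
    chunks = len(plan_chunks(duration_s, chunk_s))
    serial_s = variants * duration_s
    per_chunk_s = chunk_s * variants
    waves = math.ceil(chunks / workers)   # rounds of work when workers < chunks
    parallel_s = waves * per_chunk_s
    return chunks, serial_s, parallel_s, serial_s / parallel_s


print(estimate(600, 2, variants=24, workers=300))
print(estimate(600, 2, variants=24, workers=150))
```

With half as many workers as chunks the job needs two waves, so the speedup halves; the speedup is capped by the worker count rather than by the video length.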

Boundary handling:

Chunks must overlap slightly to prevent artifacts at boundaries:

  • Include 1-2 frames of context from adjacent chunks
  • Trim overlap during assembly
  • Validate continuous motion across boundaries

VMAF (Video Multimethod Assessment Fusion):

Netflix’s open-source perceptual quality metric, correlating strongly with human perception.

| VMAF Score | Quality Level |
|---|---|
| 93+ | Excellent (target) |
| 85-93 | Good |
| 70-85 | Fair |
| < 70 | Poor (re-encode) |

QC pipeline:

  1. Compute VMAF score per segment (source vs. encoded)
  2. Flag segments below threshold (< 93)
  3. Re-encode flagged segments at higher bitrate
  4. Iterate until quality target met or max bitrate reached
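
The QC loop above can be sketched as a function that keeps raising the bitrate until the score passes or the ceiling is hit. The VMAF measurement is passed in as a callable because computing it requires an actual encode and comparison, which is outside this sketch; `encode_until_quality`, the 25% step, and the toy scoring function are all assumptions for illustration.

```python
from typing import Callable


def encode_until_quality(measure_vmaf: Callable[[int], float],
                         start_kbps: int, max_kbps: int,
                         target: float = 93.0, step: float = 1.25):
    """Re-encode at increasing bitrates until VMAF >= target or max_kbps is reached.

    measure_vmaf(bitrate_kbps) stands in for "encode the segment, then score it".
    Returns (final_bitrate_kbps, final_score, attempts).
    """
    bitrate = start_kbps
    attempts = []
    while True:
        score = measure_vmaf(bitrate)
        attempts.append((bitrate, score))
        if score >= target or bitrate >= max_kbps:
            return bitrate, score, attempts
        bitrate = min(max_kbps, round(bitrate * step))


def toy_vmaf(kbps: int) -> float:
    """Toy stand-in: quality rises linearly with bitrate (not a real VMAF model)."""
    return min(100.0, 70 + kbps / 200)


print(encode_until_quality(toy_vmaf, start_kbps=3000, max_kbps=8000))
print(encode_until_quality(toy_vmaf, start_kbps=3000, max_kbps=4000))
```

The second call illustrates the "max bitrate reached" exit: the segment ships below target rather than looping forever, which is where a real pipeline would flag it for review.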

| Feature | HLS | DASH |
|---|---|---|
| Standard body | Apple proprietary (RFC 8216) | ISO/IEC 23009-1 |
| Manifest format | M3U8 (text playlist) | MPD (XML) |
| Segment format | TS or fMP4 | fMP4, WebM |
| Apple support | Full | Not supported (Safari) |
| DRM | FairPlay | Widevine, PlayReady |
| Low-latency variant | LL-HLS (2-5s) | LL-DASH (2-5s) |

YouTube’s approach: DASH for most browsers, HLS for Safari/iOS. Manifest generator outputs both formats from same encoded segments (CMAF).

HLS Multivariant Playlist:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480,CODECS="avc1.64001e,mp4a.40.2"
480p/playlist.m3u8

Media Playlist (per quality):

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.000,
segment_0001.m4s
#EXTINF:4.000,
segment_0002.m4s
#EXTINF:4.000,
segment_0003.m4s
#EXT-X-ENDLIST
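
A media playlist like the one above is simple enough to generate from a list of segment durations. The following is a minimal sketch, not a packager: `media_playlist` is a hypothetical helper, the segment naming follows the example above, and the `init.mp4` initialization segment name is an assumption (fMP4 segments need an `EXT-X-MAP` entry pointing at one).

```python
import math


def media_playlist(segment_durations: list[float],
                   uri_template: str = "segment_{:04d}.m4s",
                   init_uri: str = "init.mp4") -> str:
    """Build a VOD HLS media playlist for fMP4 segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        # Target duration must be an integer no smaller than any segment duration.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(segment_durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        f'#EXT-X-MAP:URI="{init_uri}"',   # initialization segment (assumed name)
    ]
    for index, duration in enumerate(segment_durations, start=1):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri_template.format(index))
    lines.append("#EXT-X-ENDLIST")        # marks the playlist as complete (VOD)
    return "\n".join(lines) + "\n"


print(media_playlist([4.0, 4.0, 4.0]))
```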

Three algorithm families:

  1. Throughput-based: Select bitrate based on measured download speed

    estimated_bandwidth = bytes_downloaded / download_time
    safe_bitrate = estimated_bandwidth × 0.7 (safety margin)
    select: highest quality where bitrate < safe_bitrate
  2. Buffer-based (e.g., BBA; BOLA is a utility-maximizing variant): Select based on buffer occupancy

    if buffer > 30s: select highest quality
    if buffer < 10s: select lowest quality
    linear interpolation between thresholds
  3. Hybrid (industry standard): Combine throughput + buffer

    throughput_bitrate = estimate from recent segments
    buffer_factor = buffer_level / target_buffer (0.0 to 1.0)
    selected_bitrate = throughput_bitrate × buffer_factor
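
The three families translate almost directly into code. The sketch below follows the pseudocode above; the bitrate ladder, function names, and the clamping of the buffer factor to the 0.0 to 1.0 range are assumptions made for illustration, and a production player would smooth throughput over several segments rather than use a single sample.

```python
LADDER_KBPS = [300, 750, 1500, 3000, 6000]   # hypothetical ladder, lowest to highest


def highest_below(limit_kbps: float, ladder=LADDER_KBPS) -> int:
    """Highest rung strictly below the limit, or the lowest rung if none fits."""
    fitting = [b for b in ladder if b < limit_kbps]
    return max(fitting) if fitting else ladder[0]


def throughput_based(bytes_downloaded: int, download_time_s: float,
                     safety: float = 0.7) -> int:
    estimated_kbps = bytes_downloaded * 8 / 1000 / download_time_s
    return highest_below(estimated_kbps * safety)


def buffer_based(buffer_s: float, low: float = 10.0, high: float = 30.0,
                 ladder=LADDER_KBPS) -> int:
    if buffer_s <= low:
        return ladder[0]
    if buffer_s >= high:
        return ladder[-1]
    fraction = (buffer_s - low) / (high - low)        # linear interpolation
    return ladder[int(fraction * (len(ladder) - 1))]


def hybrid(throughput_kbps: float, buffer_s: float,
           target_buffer_s: float = 30.0) -> int:
    buffer_factor = max(0.0, min(1.0, buffer_s / target_buffer_s))
    return highest_below(throughput_kbps * buffer_factor)


print(throughput_based(2_500_000, 4.0))   # 5,000 kbps measured, 70% safety margin
print(buffer_based(20.0))
print(hybrid(5000, 15.0))
```

In the hybrid variant a half-full buffer halves the usable throughput estimate, so the player only commits to the top rungs once it has both bandwidth and buffer to spare.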

Startup behavior:

  1. Start at conservative quality (720p or lower)
  2. Prefetch multiple segments before playback
  3. Ramp up quality as buffer builds

Quality switch constraints:

  • Minimum dwell time: 10 seconds at current quality
  • Maximum quality drop: 2 levels per switch (prevent oscillation)
  • Buffer emergency threshold: Drop to lowest immediately if < 5 seconds
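
The switch constraints act as a filter on whatever the ABR algorithm proposes. A minimal sketch, assuming quality levels are indices into the ladder with 0 as the lowest; `constrain_switch` is a hypothetical name, and the rule precedence (emergency first, then dwell time, then drop limit) is one reasonable interpretation of the list above.

```python
def constrain_switch(current: int, proposed: int, seconds_at_current: float,
                     buffer_s: float, min_dwell_s: float = 10.0,
                     max_drop: int = 2, emergency_buffer_s: float = 5.0) -> int:
    """Return the quality index to actually use (0 = lowest quality)."""
    if buffer_s < emergency_buffer_s:
        return 0                                   # emergency overrides everything
    if seconds_at_current < min_dwell_s:
        return current                             # too soon to switch again
    if proposed < current:
        return max(proposed, current - max_drop)   # limit how far one switch drops
    return proposed


print(constrain_switch(4, 0, seconds_at_current=20, buffer_s=3))    # emergency
print(constrain_switch(4, 0, seconds_at_current=5, buffer_s=20))    # dwell time
print(constrain_switch(4, 0, seconds_at_current=20, buffer_s=20))   # capped drop
```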

Segment duration trade-offs:

| Duration | Pros | Cons |
|---|---|---|
| 2 seconds | Lower latency, faster adaptation | More requests, higher overhead |
| 4 seconds | Balanced; the standard choice | No strong drawback |
| 6 seconds | Fewer requests, better compression | Slower adaptation |
| 10 seconds | Best compression efficiency | Too slow for ABR |

YouTube uses 2-4 second segments; Netflix uses 4-6 seconds. Lower durations improve responsiveness but increase CDN request volume.

Three-tier caching: edge (95% hit rate), shield (99% cumulative), origin (handles 1% of requests).


Cache hit rate targets:

| Tier | Hit Rate | Purpose |
|---|---|---|
| Edge | 90-95% | Serve most requests from nearest PoP |
| Origin Shield | 95-99% | Catch edge misses, protect origin |
| Origin | ~1% requests | Serve long-tail content |

Without origin shield:

100 edge PoPs, each with a 10% miss rate, forward every miss directly to origin
If each PoP receives 1,000 concurrent requests for the same video:
100 misses per PoP × 100 PoPs = 10,000 origin requests

With origin shield:

100 edge PoPs → 5 shield regions → 1 origin
Shield consolidates: 10,000 potential requests → 5 requests (one per shield)
Origin load reduction: 2000x

AWS reports 95% origin egress reduction with CloudFront Origin Shield for video workloads.

Optimal cache key structure:

/{video_id}/{quality}/{codec}/{segment_number}.m4s
Example: /abc123/1080p/vp9/segment_0042.m4s

What to exclude from cache key:

  • Session tokens
  • User-specific parameters
  • Timestamp-based cache busters (use segment number instead)
  • Analytics parameters
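
Cache key normalization can be sketched with the standard library's URL helpers. The excluded parameter names below are hypothetical examples of the categories listed above (session tokens, user-specific values, cache busters, analytics); a real deployment would configure the equivalent rule in the CDN rather than in application code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical parameter names standing in for the excluded categories.
EXCLUDED_PARAMS = {"token", "session", "user_id", "ts", "utm_source", "utm_medium"}


def cache_key(url: str) -> str:
    """Path plus any remaining query parameters, sorted for a stable key."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in EXCLUDED_PARAMS)
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


a = cache_key("https://cdn.example.com/abc123/1080p/vp9/segment_0042.m4s?token=xyz&ts=1700000000")
b = cache_key("https://cdn.example.com/abc123/1080p/vp9/segment_0042.m4s?token=other")
print(a, a == b)
```

Two viewers with different session tokens now map to the same key, so the second request is a cache hit instead of a second origin fetch.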

Multi-CDN consistency:

When using multiple CDN providers, normalize cache keys:

  • Same path structure across all CDNs
  • Consistent query parameter handling (strip or include)
  • Standardized Cache-Control headers

Routing decision factors:

| Factor | Implementation |
|---|---|
| Geographic proximity | DNS-based geo routing |
| CDN availability | Health checks, automatic failover |
| Cost optimization | Route to cheapest CDN per region |
| Performance | Real-user metrics, synthetic monitoring |

Failover architecture:

[Figure: multi-CDN failover diagram]

| Tier | Access Pattern | Storage Type | Cost | Latency |
|---|---|---|---|---|
| Hot | Recent uploads, trending | SSD/NVMe | $$$ | < 10ms |
| Warm | Moderate views (1-100/day) | HDD | $$ | 50-100ms |
| Cold | Long-tail (< 1 view/day) | Object storage | $ | 100-500ms |
| Archive | Original raw files | Glacier-class | ¢ | Hours |

Lifecycle policy:

Upload → Hot tier (first 30 days)
After 30 days → Warm tier (if views > 10/day), otherwise Cold tier
Raw originals → Archive after 90 days
Cold content with 0 views for 365 days → Delete
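
The lifecycle policy can be read as a pair of small decision functions, one for transcoded variants and one for raw originals. This is a sketch of the stated thresholds only; the function names and the "standard" label for originals that have not yet been archived are assumptions.

```python
def variant_tier(age_days: int, views_per_day: float, zero_view_days: int = 0) -> str:
    """Storage tier for a video's transcoded variants."""
    if age_days <= 30:
        return "hot"                   # every upload starts hot
    if views_per_day > 10:
        return "warm"
    if zero_view_days >= 365:
        return "delete"                # cold and unwatched for a year
    return "cold"


def original_tier(age_days: int) -> str:
    """Storage tier for the raw uploaded file."""
    return "archive" if age_days > 90 else "standard"


print(variant_tier(5, 0))
print(variant_tier(60, 50))
print(variant_tier(400, 0, zero_view_days=365))
print(original_tier(120))
```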

Per-video storage:

Input: 10-minute 1080p video (original: 500 MB)
Transcoded outputs:
- 8 resolutions × 3 codecs × average segment count
- Plus: thumbnails, sprite sheets, manifests
Typical expansion:
- H.264 variants: 800 MB
- VP9 variants: 500 MB
- AV1 variants: 400 MB (if encoded)
- Thumbnails/metadata: 10 MB
Total: ~1.7 GB (3.4x original)
With original retention: ~2.2 GB (4.4x)

Fleet sizing for 1 EB storage:

1 EB = 1,000 PB = 1,000,000 TB
Using 18 TB HDDs:
- Raw capacity needed: 1,000,000 TB
- With replication (3x): 3,000,000 TB
- Drives needed: 166,667 drives
- Drives per server (12): 13,889 servers
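
The fleet-sizing arithmetic generalizes to other drive sizes and replication factors. A minimal sketch with the figures above as defaults; `fleet_size` is a hypothetical name, and it ignores filesystem overhead and spare capacity.

```python
import math


def fleet_size(usable_tb: int, replication: int = 3, drive_tb: int = 18,
               drives_per_server: int = 12) -> tuple[int, int, int]:
    """Return (raw TB with replication, drives needed, servers needed)."""
    raw_tb = usable_tb * replication
    drives = math.ceil(raw_tb / drive_tb)
    servers = math.ceil(drives / drives_per_server)
    return raw_tb, drives, servers


print(fleet_size(1_000_000))   # 1 EB of usable capacity
```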

Multi-region replication:

| Content Type | Replication | Rationale |
|---|---|---|
| Hot (popular) | 3 regions | Low latency globally |
| Warm | 2 regions | Cost vs. latency balance |
| Cold | 1 region + archive | Cost optimization |
| Original | 2 regions + archive | Disaster recovery |

-- Core video record
CREATE TABLE videos (
    video_id UUID PRIMARY KEY,
    channel_id UUID NOT NULL REFERENCES channels(id),
    title VARCHAR(100) NOT NULL,
    description TEXT,
    duration_seconds INTEGER NOT NULL,
    upload_timestamp TIMESTAMPTZ NOT NULL,
    publish_timestamp TIMESTAMPTZ,

    -- Processing state: processing, ready, failed, deleted
    status VARCHAR(20) NOT NULL DEFAULT 'processing',

    -- Computed metrics (denormalized)
    view_count BIGINT DEFAULT 0,
    like_count BIGINT DEFAULT 0,
    comment_count INTEGER DEFAULT 0,

    -- Content signals
    category_id INTEGER,
    language VARCHAR(10),
    age_restricted BOOLEAN DEFAULT false,

    -- Constraints
    CONSTRAINT valid_status CHECK (status IN ('processing', 'ready', 'failed', 'deleted'))
);

-- Indexes
CREATE INDEX idx_videos_channel ON videos(channel_id, publish_timestamp DESC);
CREATE INDEX idx_videos_category ON videos(category_id, publish_timestamp DESC);

-- Partial index for trending queries. PostgreSQL requires index predicates to be
-- immutable, so NOW() cannot appear here; apply the 7-day window in the query.
CREATE INDEX idx_videos_trending ON videos(view_count DESC)
    WHERE status = 'ready';

Elasticsearch mapping:

{
  "mappings": {
    "properties": {
      "video_id": { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "exact": { "type": "keyword" },
          "autocomplete": { "type": "search_as_you_type" }
        }
      },
      "description": { "type": "text" },
      "channel_name": {
        "type": "text",
        "fields": { "exact": { "type": "keyword" } }
      },
      "tags": { "type": "keyword" },
      "category": { "type": "keyword" },
      "duration_seconds": { "type": "integer" },
      "view_count": { "type": "long" },
      "publish_date": { "type": "date" },
      "language": { "type": "keyword" },
      "transcript": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}

Search query example:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "kubernetes tutorial",
            "fields": ["title^3", "description", "tags^2", "transcript"]
          }
        }
      ],
      "filter": [
        { "term": { "language": "en" } },
        { "range": { "duration_seconds": { "gte": 300, "lte": 1800 } } }
      ]
    }
  },
  "sort": [{ "_score": "desc" }, { "view_count": "desc" }]
}

Challenge: Accurate, near-real-time view counting at billions of views/day while preventing fraud.

Architecture:

[Figure: view-counting pipeline diagram]

Deduplication strategy:

  • Bloom filter per video_id (1-hour window)
  • Key: hash(video_id + user_id + IP + user_agent)
  • False positive rate: 1% (acceptable, slightly undercounts)
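
The deduplication step can be illustrated with a small Bloom filter. This is a sketch under assumptions: the sizing uses the textbook formulas m = -n·ln(p)/(ln 2)² and k = (m/n)·ln 2, bit positions come from double hashing over a SHA-256 digest, and `BloomFilter`, `add_if_new`, and `view_key` are hypothetical names. The 1-hour window would be implemented by rotating filters, which is not shown.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Textbook sizing for the requested false positive rate.
        self.size = math.ceil(-expected_items * math.log(false_positive_rate)
                              / (math.log(2) ** 2))
        self.hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd step for double hashing
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add_if_new(self, key: str) -> bool:
        """Record the key; return True if it had (definitely) not been seen before."""
        positions = self._positions(key)
        seen = all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)
        return not seen


def view_key(video_id: str, user_id: str, ip: str, user_agent: str) -> str:
    return "|".join((video_id, user_id, ip, user_agent))


bf = BloomFilter(expected_items=1000)
key = view_key("abc123", "user-1", "203.0.113.7", "ExampleBrowser/1.0")
print(bf.add_if_new(key))   # first view is counted
print(bf.add_if_new(key))   # repeat within the window is dropped
```

A false positive makes a genuinely new view look like a repeat, which is why the error mode is a slight undercount rather than an overcount.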

Fraud signals:

  • View duration < 30 seconds: Don’t count
  • Same IP, many views, short intervals: Rate limit
  • Suspicious patterns: ML-based fraud detection

Recommendations drive 70%+ of YouTube watch time. The system balances:

  1. Relevance: Content similar to current video
  2. Personalization: User’s historical preferences
  3. Exploration: Expose users to new content
  4. Freshness: Boost recent uploads

Two-stage recommendation: retrieve candidates from embedding index, rank with full model.


Ranking signals:

| Signal | Source | Weight |
|---|---|---|
| Watch time | Playback events | High |
| Likes/dislikes | Explicit feedback | High |
| Comments | Engagement | Medium |
| Shares | Social signals | Medium |
| Search history | Intent signals | Medium |
| Subscriptions | Long-term preference | Medium |
| Video co-watch | Collaborative filtering | Medium |
| Content similarity | Video embeddings | Low-Medium |
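
The two-stage shape (cheap retrieval over embeddings, then a more expensive ranking pass over a short candidate list) can be shown in miniature. This sketch substitutes a brute-force dot product for the embedding index and a weighted sum for the ranking model; the embeddings, feature names, and weights are made-up illustration values, not the signals' actual weights.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: list[float], video_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Stage 1: top-k videos by embedding similarity (brute force here)."""
    by_similarity = sorted(video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]),
                           reverse=True)
    return by_similarity[:k]


def rank(candidates: list[str], features: dict[str, dict[str, float]],
         weights: dict[str, float]) -> list[str]:
    """Stage 2: order candidates by a weighted sum of per-video signals."""
    def score(vid: str) -> float:
        return sum(w * features[vid].get(name, 0.0) for name, w in weights.items())
    return sorted(candidates, key=score, reverse=True)


video_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
features = {"a": {"watch_time": 0.2, "freshness": 0.1}, "b": {"watch_time": 0.9}}
weights = {"watch_time": 1.0, "freshness": 0.5}

candidates = retrieve([1.0, 0.0], video_vecs, k=2)
print(candidates)
print(rank(candidates, features, weights))
```

Retrieval narrows the catalog to a handful of candidates so the ranking stage can afford richer per-video features; here it reorders them because video "b" has much stronger watch time.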

Core responsibilities:

  1. Manifest parsing: HLS/DASH support
  2. ABR algorithm: Quality selection logic
  3. Buffer management: Segment prefetching
  4. Codec negotiation: Select supported codec/container
  5. DRM handling: License acquisition, key rotation
  6. Metrics collection: QoE telemetry

Buffer strategy:

Target buffer: 30 seconds
Minimum for playback start: 5 seconds
Low watermark (quality down): 10 seconds
High watermark (quality up): 25 seconds
Maximum (cap prefetch): 60 seconds

Time-to-first-byte targets:

| Phase | Target | Optimization |
|---|---|---|
| DNS resolution | < 50ms | DNS prefetch |
| TLS handshake | < 100ms | TLS 1.3, session resumption |
| Manifest fetch | < 200ms | CDN edge cache |
| First segment | < 500ms | Preload hint, small init segment |
| Total startup | < 2000ms | End-to-end target |

Preload strategies:

<!-- DNS prefetch for CDN -->
<link rel="dns-prefetch" href="//cdn.example.com" />
<!-- Preconnect to establish TLS -->
<link rel="preconnect" href="https://cdn.example.com" />
<!-- Preload manifest -->
<link rel="preload" href="/video/abc/manifest.m3u8" as="fetch" crossorigin />

Mobile constraints:

| Constraint | Mitigation |
|---|---|
| Battery drain | Prefer hardware decode (H.264/HEVC) |
| Data usage | Default to 480p on cellular |
| Memory limits | Limit buffer to 30 seconds |
| Background restrictions | Pause prefetch when backgrounded |
| Network variability | More conservative ABR |

Infrastructure components:

| Component | Purpose | Options |
|---|---|---|
| Object storage | Raw + encoded videos | S3, GCS, Azure Blob, MinIO |
| Transcoding compute | Encoding workers | VMs, Containers, GPU instances |
| CDN | Global delivery | CloudFront, Fastly, Akamai, Cloudflare |
| Message queue | Job scheduling | Kafka, SQS, Pub/Sub, RabbitMQ |
| Metadata DB | Video records | PostgreSQL, MySQL, CockroachDB |
| Search | Discovery | Elasticsearch, OpenSearch, Meilisearch |
| Cache | Hot metadata | Redis, Memcached |
| Metrics | Telemetry | Prometheus, InfluxDB, Datadog |

AWS deployment: S3 for storage, MediaConvert or Batch for transcoding, CloudFront with Origin Shield for delivery.


Service selection:

| Service | Use Case | Why |
|---|---|---|
| S3 + S3 Glacier | Video storage | Tiered cost, 11 nines durability |
| MediaConvert | Managed transcoding | No infrastructure management |
| AWS Batch + GPU | Custom transcoding | Full control, custom codecs |
| CloudFront | CDN | Origin Shield, Lambda@Edge |
| RDS PostgreSQL | Metadata | Managed, Multi-AZ |
| OpenSearch | Search | Managed Elasticsearch |
| ElastiCache Redis | Caching | Sub-ms latency |

| Managed Service | Self-Hosted | When to Self-Host |
|---|---|---|
| MediaConvert | FFmpeg + custom workers | Custom codecs, cost at scale |
| CloudFront | Nginx + Varnish | Multi-CDN, specific routing |
| OpenSearch | Elasticsearch | Plugin requirements |
| ElastiCache | Redis OSS | Redis modules, specific configs |

Designing a YouTube-scale video platform requires optimizing for fundamentally different access patterns across the pipeline:

Key architectural decisions:

  1. Resumable chunked uploads handle multi-GB files over unreliable networks
  2. Segment-parallel transcoding keeps processing time roughly constant regardless of video length, given enough workers
  3. Multi-codec strategy (H.264 + VP9 + selective AV1) balances reach and bandwidth efficiency
  4. Per-title encoding saves 20-50% bandwidth by adapting bitrate ladders to content complexity
  5. Origin shield caching reduces origin egress by 95%+, critical for cost and scale
  6. Hybrid ABR algorithms balance quality maximization with rebuffering prevention

What this design optimizes for:

  • Fast upload processing (minutes, not hours)
  • Sub-2-second playback start
  • Minimal rebuffering (< 0.5% of playback time)
  • Efficient bandwidth usage (modern codecs for capable devices)

What this design sacrifices:

  • Low-latency live streaming (different architecture needed)
  • Simple operations (distributed transcoding adds complexity)
  • Storage efficiency vs. compatibility (multiple codec variants)

When to choose this design:

  • User-generated video platforms at scale
  • VOD streaming services
  • Any system where upload volume justifies parallel transcoding

Prerequisites:

  • Video encoding concepts: codecs, containers, bitrate
  • Streaming protocols: HLS, DASH fundamentals
  • CDN architecture: edge caching, origin shield
  • Distributed systems: message queues, eventual consistency

| Term | Definition |
|---|---|
| ABR | Adaptive Bitrate: dynamically selecting video quality based on network conditions |
| GOP | Group of Pictures: sequence of frames starting with a keyframe |
| HLS | HTTP Live Streaming: Apple’s adaptive streaming protocol |
| DASH | Dynamic Adaptive Streaming over HTTP: ISO standard streaming protocol |
| VMAF | Video Multimethod Assessment Fusion: perceptual quality metric |
| Transcoding | Converting video from one format/resolution/codec to another |
| Manifest | Playlist file describing available streams and segments (M3U8 or MPD) |
| Segment | Chunk of video (typically 2-6 seconds) for ABR streaming |
| Origin shield | Intermediate cache layer protecting origin from edge cache misses |
| Bitrate ladder | Set of quality levels (resolution + bitrate combinations) |
| Per-title encoding | Customizing bitrate ladder based on content complexity |
| VCU | Video Coding Unit: custom ASIC for hardware-accelerated encoding |

Key takeaways:

  • Video platforms require three distinct subsystems: upload/processing, storage/delivery, metadata/discovery
  • Chunk-based parallel transcoding enables processing speed independent of video duration
  • Multi-codec encoding (H.264 + VP9 + AV1) trades storage for bandwidth efficiency
  • Origin shield + edge caching achieves 95%+ cache hit rates, reducing origin load 20x+
  • Hybrid ABR algorithms (throughput + buffer) provide best quality-of-experience
  • Per-title encoding saves 20-50% bandwidth by adapting to content complexity
  • Hot/warm/cold storage tiering exploits power-law view distribution