How I Built A Video Encoding And Streaming Service
A journey through building a scalable video processing pipeline from bash scripts to Go-powered distributed encoding workers
⚠️ _Warning: This blog is 30% best practices, 70% 'Oh God Why' moments. Proceed with caution._
Just a small intro (skippable). This is Aman, a 4th-year college student. I've always loved videos and how they work internally, and this is the story of how I tried to build a scalable video processing pipeline (not a success story).
Phase 1: Initial Build ("How hard could it be?")
Tools: ffmpeg, bash, prayer
At first, my main goal was to achieve maximum video compression. Don't ask me why. I just like comparing the initial numbers ;)
So, I put together a simple Bash script that took a video as input and compressed it using the H.264 codec with the "slow" preset. To be honest, it did the job. But wait, what was the point? I could have just passed the arguments to ffmpeg directly and called it a day.
Phase 2: Happy Phase ("Look, it works! For my 2 friends")
Tools: Node.js, FFmpeg
I upgraded from my janky Bash script to a _real_ API. Users could upload videos, and my pipeline would magically encode them! I even Googled _"how to FFmpeg"_ and discovered flags (life-changing).
My "research" taught me that compressing a video isn't just about making it smaller: it's a blood pact between encode time, codec compatibility, and quality. For example, `-crf 23` was a lifesaver.
I cut ties with AWS S3 forever, choosing __Cloudflare R2__ instead, and deployed it all on a $5/month VPS.
The prototype was simple: a Node.js API that took uploads, ran FFmpeg commands, and dumped outputs into R2. My achievement was an FFmpeg preset that _mostly_ worked:
ffmpeg -i input.mp4 -c:v libx264 -preset medium -crf 23 -c:a aac -b:a 128k output.mp4
Phase 3: The Initial Launch ("Why so slow?")
I stumbled upon a harsh truth: "Do not stream MP4 files." It made sense, as streaming a whole MP4 file through a single HTTP request can be expensive.
My app "worked," but users waited longer for a video to load than it took to film it. The fix was HLS and DASH: protocols that chop videos into small chunks and use a manifest file to drive playback. "Easy enough," I thought, ignoring the fact that my entire pipeline ran on a single server acting as API, encoder, and packager.
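For context, an HLS "manifest" is just a text playlist. A minimal master playlist pointing at two renditions looks roughly like this (bandwidth and resolution values are illustrative):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
stream0.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
stream1.m3u8
```

The player reads this, picks a rendition that fits the viewer's bandwidth, and then fetches that rendition's own playlist of segments.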
Current Architecture:
User Upload → Single Server (API + Encoding) → R2 Bucket
Everything looks good, right? Nah. I was so innocent that I used a single server to serve the API and encode the videos. Trust me, I'm not that dumb :( Now I had to figure out how to make the API server and the video encoding worker work together.
So I came up with this.
When a user uploaded a video to S3, an event notification triggered an AWS Lambda function. The Lambda function initiated an ECS Fargate task to perform the video encoding process. The Fargate task executed FFmpeg to transcode the video into HLS and DASH segments, along with their respective manifest files. The resulting output files were stored in Cloudflare R2 for cost efficiency. A webhook was used to update the job status, although it occasionally returned inconsistent information.
Here's the script and architecture I used:
#!/bin/bash
set -e
# Credentials and config are expected in the task environment
AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
R2_ACCESS_KEY_ID=$R2_ACCESS_KEY_ID
R2_SECRET_ACCESS_KEY=$R2_SECRET_ACCESS_KEY
R2_ENDPOINT=$R2_ENDPOINT
R2_BUCKET=$R2_BUCKET
# In Fargate the source comes from S3; set LOCAL_TEST to use a local file instead
if [ -z "$LOCAL_TEST" ]; then
S3_BUCKET=${S3_BUCKET}
S3_KEY=${S3_KEY}
VIDEO_FILE="/tmp/${S3_KEY}"
HLS_OUTPUT_DIR="/tmp/hls"
aws s3 cp s3://${S3_BUCKET}/${S3_KEY} ${VIDEO_FILE}
else
VIDEO_FILE=${LOCAL_VIDEO_FILE}
HLS_OUTPUT_DIR=${LOCAL_HLS_OUTPUT_DIR}
fi
mkdir -p ${HLS_OUTPUT_DIR}
OUTPUT_BASE_NAME=$(basename "${VIDEO_FILE%.*}")
UPLOAD_DIR=$(dirname "${S3_KEY}")
# One FFmpeg run producing two HLS renditions (480p and 720p) plus a master playlist
ffmpeg -i "${VIDEO_FILE}" \
-map 0:v -map 0:a -map 0:v -map 0:a \
-filter:v:0 "scale=-2:480" -c:v:0 libx264 -preset veryfast -crf 23 -c:a:0 aac -b:a 192k \
-filter:v:1 "scale=-2:720" -c:v:1 libx264 -preset veryfast -crf 23 -c:a:1 aac -b:a 192k \
-hls_time 10 -hls_playlist_type vod \
-hls_segment_filename "${HLS_OUTPUT_DIR}/segment_%v%03d.ts" \
-start_number 0 -var_stream_map "v:0,a:0 v:1,a:1" \
-master_pl_name master.m3u8 "${HLS_OUTPUT_DIR}/stream%v.m3u8"
echo "Transcoding completed."
# Write a minimal rclone config pointing at Cloudflare R2
mkdir -p /root/.config/rclone
cat <<EOF > /root/.config/rclone/rclone.conf
[myr2]
type = s3
provider = Cloudflare
access_key_id = ${R2_ACCESS_KEY_ID}
secret_access_key = ${R2_SECRET_ACCESS_KEY}
endpoint = ${R2_ENDPOINT}
EOF
# Push the HLS output to R2 with parallel transfers
rclone sync --transfers 100 ${HLS_OUTPUT_DIR} myr2:${R2_BUCKET}/${UPLOAD_DIR}/${OUTPUT_BASE_NAME} --config /root/.config/rclone/rclone.conf
rm -rf ${VIDEO_FILE} ${HLS_OUTPUT_DIR}
This was way better than the previous implementation and scaled better.
Phase 4: Migrating the Codebase to Golang
I went into video-tech hibernation after the last implementation, and during that time Go caught my attention. I started diving into it, and after two months or so I wanted to rewrite the whole backend in Go, as it's way easier to deploy to the cloud (JK, I wanted to casually drop "Go backend" when someone asked about the stack).
But nil pointer dereferences made life hard. Once things started working, I decided, for some reason, not to use AWS: I would build my own worker queue and manage the encoding workers myself.
The real challenge? Distributing jobs across multiple encoding workers, each with its own hardware specs. I started with round-robin because, well, it sounded fair, until I remembered that equal distribution doesn't play well with unequal CPUs. Classic rookie mistake.
- Each job enters a Redis-backed pipeline.
- We generate a hash of the job and append it to a Redis list.
- Then, publish a notification to alert workers.
- Each worker listens and, if it's free or its CPU isn't having a meltdown, it accepts the job.
A worker accepts a job whenever it's free and its CPU usage is below the threshold. The best part is that the worker nodes are completely independent.
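The accept decision boils down to a tiny gate per worker. Here's a minimal Go sketch of that gate plus the job-hash step; names like `shouldAccept` and `maxCPUPercent` are my placeholders, not the actual codebase, and the real job payload is much richer:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// jobID derives a stable hash for a job payload before it is
// appended to the Redis list.
func jobID(payload string) string {
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:8])
}

// shouldAccept is the per-worker gate: claim the job only if the
// worker is idle and its CPU usage is under the threshold.
func shouldAccept(cpuPercent, maxCPUPercent float64, busy bool) bool {
	return !busy && cpuPercent < maxCPUPercent
}

func main() {
	id := jobID("s3://bucket/raw/video.mp4")
	fmt.Println("job", id, "accept:", shouldAccept(42.0, 80.0, false))
}
```

In the real pipeline, the hash is pushed onto a Redis list and a pub/sub notification wakes the workers; each worker runs this gate before popping the job, which is what keeps slow machines from drowning.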
Phase 5: Optimizing the Encoding Process
Once basic encoding was working, I noticed a problem: fast-paced scenes looked like Minecraft gameplay. Turns out, using a fixed bitrate for the entire video isn't smart — action scenes need more bits than a static interview.
That led me into the world of Per-Title Encoding and Two-Pass Encoding. Most of what I learned came from deep-diving Netflix and Bunny.net blog posts. Here's the gist:
- Per-Title Encoding adjusts the bitrate ladder based on the video's complexity.
- Two-Pass Encoding analyzes the video first, then optimizes bitrate allocation for quality and efficiency.
I also discovered the concepts of temporal and spatial complexity, both crucial for deciding how many bits are required for a scene.
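To make the idea concrete, here's a toy per-title ladder in Go. It assumes a single complexity score in [0,1] distilled from the temporal/spatial analysis; real per-title encoding (Netflix-style convex-hull search) is far more involved, and the base bitrates here are just illustrative:

```go
package main

import "fmt"

// Rendition is one rung of the bitrate ladder.
type Rendition struct {
	Height      int
	BitrateKbps int
}

// ladderFor scales a base ladder by a complexity score in [0,1]:
// busy, high-motion content gets more bits, static content fewer.
// This is a toy heuristic, not a real per-title analysis.
func ladderFor(complexity float64) []Rendition {
	base := []Rendition{{480, 1200}, {720, 2500}, {1080, 4500}}
	// Map complexity to a 0.6x-1.4x multiplier around the base ladder.
	scale := 0.6 + 0.8*complexity
	out := make([]Rendition, len(base))
	for i, r := range base {
		out[i] = Rendition{r.Height, int(float64(r.BitrateKbps) * scale)}
	}
	return out
}

func main() {
	fmt.Println("static interview:", ladderFor(0.2))
	fmt.Println("action scene:    ", ladderFor(0.9))
}
```

Even this crude version captures the point: the static interview gets a cheaper ladder than the action film at the same resolutions.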
I wired all this into my pipeline using FFmpeg, which handled:
- Scene analysis
- Two-pass processing
- Smarter bitrate control
The result? Better quality, lower bandwidth, and no more Minecraft gameplay during fight scenes.
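The two-pass step is really just two FFmpeg invocations: pass 1 analyzes the video and writes a stats log, pass 2 reads that log to allocate bits where the content needs them. A sketch of how a Go worker might build the two command lines (file names and the 128k audio bitrate are illustrative, not my exact pipeline):

```go
package main

import (
	"fmt"
	"strings"
)

// twoPassArgs builds the argument lists for the two FFmpeg runs.
// Pass 1 discards its output (-f null /dev/null) and only produces
// the stats log; pass 2 consumes the log and writes the real file.
func twoPassArgs(input, output string, bitrateKbps int) (pass1, pass2 []string) {
	rate := fmt.Sprintf("%dk", bitrateKbps)
	pass1 = []string{
		"-y", "-i", input,
		"-c:v", "libx264", "-b:v", rate,
		"-pass", "1", "-an", "-f", "null", "/dev/null",
	}
	pass2 = []string{
		"-y", "-i", input,
		"-c:v", "libx264", "-b:v", rate,
		"-pass", "2", "-c:a", "aac", "-b:a", "128k",
		output,
	}
	return pass1, pass2
}

func main() {
	p1, p2 := twoPassArgs("input.mp4", "output.mp4", 2500)
	fmt.Println("ffmpeg", strings.Join(p1, " "))
	fmt.Println("ffmpeg", strings.Join(p2, " "))
}
```

The cost is obvious: you read and analyze the whole video twice, so two-pass only pays off when bandwidth savings matter more than encode time.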
After getting the encoding process up and running, I wanted to optimize for bandwidth and ensure broad compatibility across devices. That's when I remembered that platforms like __Hotstar__ use a tool called __Bento4__ for video packaging.
Bento4 breaks the video into __fragmented MP4 files__ (.m4s), making it suitable for adaptive streaming. With recent updates, __HLS now supports fragmented MP4__, which means we can generate __common media segments__ for both __DASH__ and __HLS__. This not only simplifies the packaging workflow but also saves on __bandwidth and storage__, since we're not duplicating content for different streaming protocols.
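The packaging flow is short: Bento4's `mp4fragment` turns a progressive MP4 into fragmented MP4, and `mp4dash` packages it, emitting a DASH MPD plus (with `--hls`) HLS playlists that reference the same `.m4s` segments. A sketch of how a worker might assemble those two invocations; the file names and output directory are placeholders:

```go
package main

import (
	"fmt"
	"strings"
)

// bento4Commands returns the two Bento4 invocations as argv slices:
// first fragment the encoded MP4, then package it for DASH and HLS
// from a single set of shared media segments.
func bento4Commands(input, fragmented, outDir string) [][]string {
	return [][]string{
		{"mp4fragment", input, fragmented},
		{"mp4dash", "--hls", "-o", outDir, fragmented},
	}
}

func main() {
	for _, cmd := range bento4Commands("encoded.mp4", "fragmented.mp4", "packaged") {
		fmt.Println(strings.Join(cmd, " "))
	}
}
```

Because both protocols point at one copy of the segments, storage and origin bandwidth roughly halve compared to packaging `.ts` for HLS and `.m4s` for DASH separately.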
Conclusion
I started encoding videos with a single FFmpeg command, just enough to get things started. Over time, I tweaked and optimized it a bit. I really wanted to dive into hardware acceleration using __NVENC__, but I realized the rental cost for GPU instances was more than my life savings.
I also came to understand that __video compression__ is a deep field, way more complex than just running FFmpeg with a few flags. There's an entire world of codec tuning, hardware-level acceleration, and perceptual quality metrics still to explore.
I've barely scratched the surface, but I'm excited to keep learning, and I plan to share more along the way through future blog posts.
LINKS
Live: https://streamscale-dev.aksdev.me/
GitHub: https://github.com/amankumarsingh77/streamscale
Email: amankumarsingh7702@gmail.com
System Status: Nominal // 2025