📑 Table of Contents

  1. Problem Understanding
  2. Requirements & Scope
  3. High-Level Design
  4. Hash Function Deep Dive
  5. Detailed Design
  6. Advanced Considerations
  7. Reference Materials

1. Problem Understanding

🧠 What is a URL Shortener?

A URL shortener takes a long URL and creates a much shorter alias that redirects to the original URL. When users click the short URL, they are redirected to the original long URL.

Example:

Core Operations:

  1. URL Shortening: Given a long URL → return a shorter URL
  2. URL Redirecting: Given a shorter URL → redirect to the original URL

Why URL Shorteners?


2. Requirements & Scope

🎯 Key Clarifying Questions

Basic Functionality:

Advanced Features:

Scale Considerations:

📋 Functional Requirements

📊 Non-Functional Requirements

📐 Back-of-the-Envelope Estimation

Traffic Analysis:

Bandwidth Estimation:

Memory Estimation:


3. High-Level Design

🏗️ System Architecture

[Client] → [Web Server] → [Load Balancer] → [URL Shortening Service]
                                                      ↓
                                            [Database] ← [Cache]

🔌 API Design

1. URL Shortening

POST /api/v1/data/shorten
Request: {longUrl: longURLString}
Response: shortURL

2. URL Redirecting

GET /api/v1/shortUrl
Response: longURL for HTTP redirection

🔄 URL Redirecting Flow

301 vs 302 Redirect:

Implementation Approach:

Use hash table to store <shortURL, longURL> mappings:

🎯 URL Shortening Flow

High-Level Process:

  1. User submits long URL
  2. Check if URL already exists in database
  3. If exists, return existing short URL
  4. If not, generate new short URL using hash function
  5. Store mapping in database
  6. Return short URL to user

4. Hash Function Deep Dive

🔢 Hash Value Length

Character Set: [0-9, a-z, A-Z] = 62 possible characters

Length Analysis:

🛠️ Hash Function Approaches

Approach 1: Hash + Collision Resolution

Method:

Example:

URL: https://en.wikipedia.org/wiki/Systems_design
CRC32: 5cb54054 (take first 7: 5cb5405)
MD5: 5a62509c (take first 7: 5a62509)
SHA-1: 0fddc8e7b (take first 7: 0fddc8e)

Collision Resolution Process:

  1. Hash the URL, take first 7 characters
  2. Check if shortURL exists in database
  3. If collision detected, append predefined string and rehash
  4. Repeat until unique shortURL found

Optimization - Bloom Filters:

Pros:

Cons:

Approach 2: Base 62 Conversion

Method:

Detailed Algorithm:

Base 62 conversion process for 11157₁₀:

Step 1: 11157 ÷ 62 = 179 remainder 59 → 'X' (59 maps to X)
Step 2: 179 ÷ 62 = 2 remainder 55 → 'T' (55 maps to T)  
Step 3: 2 ÷ 62 = 0 remainder 2 → '2' (2 maps to 2)
Reading backwards: 2TX
Result: https://tinyurl.com/2TX

Character Mapping Table:

0-9: 0,1,2,3,4,5,6,7,8,9
10-35: a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
36-61: A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z

Dependency: Unique ID Generator

Requires distributed unique ID generator (covered in Chapter 7):

Pros:

Cons:

⚖️ Comparison Summary

Aspect Hash + Collision Base 62 Conversion
Shortness ✅ Fixed length ✅ Fixed length
Uniqueness ❌ Requires collision handling ✅ Guaranteed unique
Performance ❌ DB queries needed ✅ Fast computation
Predictability ✅ Less predictable ❌ Sequential patterns
Implementation ❌ More complex ✅ Simpler

Recommendation: Use Base 62 conversion with unique ID generator


5. Detailed Design

🗄️ Data Model

Database Schema:

Table: url_mappings
- id: BIGINT (Primary Key)
- shortURL: VARCHAR(7) (Unique Index)
- longURL: TEXT
- createdAt: TIMESTAMP
- expiredAt: TIMESTAMP (NULL for permanent URLs)

Why Database over Hash Table:

URL Encoding Deep Dive

Base 62 Encoding Implementation:

Base 62 uses characters: [0-9a-zA-Z]

0 → 0, 1 → 1, ..., 9 → 9
10 → a, 11 → b, ..., 35 → z  
36 → A, 37 → B, ..., 61 → Z

Algorithm Steps:

  1. Start with unique integer ID (e.g., 2009215674938)
  2. Convert to base 62:
    • 2009215674938 % 62 = 26 → ‘q’
    • 2009215674938 / 62 = 32406382338
    • 32406382338 % 62 = 52 → ‘Q’
    • Continue until quotient is 0
  3. Reverse the result: “Qq…” → final short URL

�🔄 URL Shortening Detailed Flow

Step-by-Step Process:

  1. Input Validation: Check if longURL is valid format
  2. Duplicate Check: Query database for existing longURL
  3. Return if Found: If exists, return cached shortURL
  4. Generate ID: Create unique ID using distributed ID generator
  5. Encode: Transform ID to shortURL using base 62
  6. Store: Save (ID, shortURL, longURL, timestamps) in database
  7. Cache Update: Store mapping in cache for future reads
  8. Return: Send shortURL back to client

Example Walkthrough:

Input: https://en.wikipedia.org/wiki/Systems_design
Unique ID: 2009215674938
Base 62 conversion: 2009215674938 → "zn9edcu"
Result: https://tinyurl.com/zn9edcu

🚀 URL Redirecting Detailed Flow

With Caching Optimization:

  1. Input: User clicks shortURL
  2. Load Balancer: Routes request to web server
  3. Cache Check: Look for shortURL in cache first
  4. Cache Hit: If found, redirect immediately
  5. Cache Miss: Query database for longURL
  6. Store in Cache: Cache the result for future requests
  7. Redirect: Return longURL with HTTP redirect status

Performance Benefits:


6. Advanced Considerations

Unique ID Generator Options

Option 1: Multi-master replication

Option 2: UUID (Universally Unique Identifier)

Option 3: Ticket Server

Option 4: Twitter Snowflake

�🛡️ Rate Limiting

Problem: Malicious users might overwhelm system with requests

Solution: Implement rate limiting based on:

📊 Analytics

Tracking Capabilities:

Implementation:

🔍 Web Server Scaling

Horizontal Scaling:

🗄️ Database Scaling

Techniques:

Sharding Strategy:

🔒 Security Considerations

Threats and Mitigations:

📈 Availability & Reliability

High Availability Strategies:

⏱️ URL Expiration

Expiration Strategies:


Interview Wrap-up Topics

🎯 Additional Discussion Points

If there’s extra time in the interview, consider these advanced topics:

Rate Limiting:

Web Server Scaling:

Database Scaling:

Analytics Integration:

Availability, Consistency, and Reliability:


7. Reference Materials

System Design Fundamentals

Industry Examples

Technical Deep Dives

Distributed Systems Concepts

Best Practices