📑 Table of Contents

  1. Problem Understanding
  2. Requirements & Scope
  3. High-Level Design Options
  4. Twitter Snowflake Deep Dive
  5. Implementation Details
  6. Advanced Considerations
  7. Alternative Approaches

1. Problem Understanding

🧠 What is a Unique ID Generator?

A unique ID generator is a system component that creates globally unique identifiers across a distributed system. Unlike a single database's auto-increment column, a distributed ID generator must produce IDs on many servers at once without coordinating between them on every request.

Why Not Use Database Auto-Increment?

Examples of Unique IDs:

12345678901234567890  (Numeric)
550e8400-e29b-41d4-a716-446655440000  (UUID)
1420070400000000000   (Snowflake-style)

Common Use Cases:


2. Requirements & Scope

🎯 Key Clarifying Questions

ID Characteristics:

Scale and Performance:

Ordering and Timing:

Availability and Reliability:

Integration Requirements:

📋 Functional Requirements

📊 Non-Functional Requirements


3. High-Level Design Options

🗃️ Option 1: Multi-Master Replication

How it works:

Pros:

Cons:

🆔 Option 2: UUID (Universally Unique Identifier)

How it works:

Example UUID:

09c93e62-50b4-468d-bf8a-c07e1040bfb2
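As a rough illustration, Python's standard `uuid` module can parse the example above and generate new UUIDs without any coordination between servers. The helper name `new_uuid` is hypothetical, and the version check assumes the example is a random (version 4) UUID, which the leading `4` of its third group indicates.

```python
import uuid

# Parse the example UUID to inspect its fields.
example = uuid.UUID("09c93e62-50b4-468d-bf8a-c07e1040bfb2")
example_version = example.version      # taken from the first hex digit of the third group
example_bits = len(example.bytes) * 8  # a UUID is always 16 bytes long


def new_uuid() -> str:
    """Return a fresh random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())
```

Because each call draws random bits locally, any server can run `new_uuid()` independently. The cost is a 128-bit, non-numeric ID that does not sort by creation time.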

Pros:

Cons:

🎫 Option 3: Ticket Server

How it works:

Pros:

Cons:

❄️ Option 4: Twitter Snowflake (Recommended)

How it works:

Why choose Snowflake:


4. Twitter Snowflake Deep Dive

🏗️ 64-bit ID Structure

| 1 bit |         41 bits          |    5 bits     |   5 bits   |      12 bits      |
| Sign  |        Timestamp         | Datacenter ID | Machine ID |  Sequence Number  |
| (0)   |                          |               |            |                   |
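The layout above can be sketched as plain bit shifts. This is a minimal illustration, not a reference implementation; the names `SnowflakeParts`, `pack`, and `unpack` are hypothetical, and the timestamp is assumed to be milliseconds since a custom epoch.

```python
from typing import NamedTuple

TIMESTAMP_BITS, DATACENTER_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 5, 5, 12

# Each field sits to the left of all fields that follow it in the layout.
MACHINE_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS        # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22


class SnowflakeParts(NamedTuple):
    timestamp: int      # milliseconds since the custom epoch (assumption)
    datacenter_id: int
    machine_id: int
    sequence: int


def pack(parts: SnowflakeParts) -> int:
    """Combine the four fields into one integer; the sign bit stays 0."""
    limits = (TIMESTAMP_BITS, DATACENTER_BITS, MACHINE_BITS, SEQUENCE_BITS)
    for value, bits in zip(parts, limits):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return ((parts.timestamp << TIMESTAMP_SHIFT)
            | (parts.datacenter_id << DATACENTER_SHIFT)
            | (parts.machine_id << MACHINE_SHIFT)
            | parts.sequence)


def unpack(snowflake_id: int) -> SnowflakeParts:
    """Split an ID back into its fields by shifting and masking."""
    return SnowflakeParts(
        timestamp=snowflake_id >> TIMESTAMP_SHIFT,
        datacenter_id=(snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        machine_id=(snowflake_id >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1),
        sequence=snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )
```

Because the timestamp occupies the most significant bits, comparing two IDs numerically compares their creation times first.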

Bit Allocation:

Timestamp Section (41 bits)

Epoch Configuration:

Time Range:
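With 41 bits of millisecond timestamps, the usable range follows from simple arithmetic: 2^41 milliseconds is roughly 69.7 years, counted from whatever custom epoch is chosen. The sketch below works this out for the epoch value used elsewhere in this guide (1288834974657); the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41
EPOCH_MS = 1288834974657  # custom epoch used in this guide, in ms since 1970-01-01 UTC
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365


def to_datetime(ms_since_unix_epoch: int) -> datetime:
    """Convert Unix milliseconds to an aware UTC datetime without float rounding."""
    return UNIX_EPOCH + timedelta(milliseconds=ms_since_unix_epoch)


def timestamp_range(epoch_ms: int = EPOCH_MS, bits: int = TIMESTAMP_BITS):
    """Return the first and last instants representable in the timestamp field."""
    last_ms = epoch_ms + (1 << bits) - 1
    return to_datetime(epoch_ms), to_datetime(last_ms)


def range_in_years(bits: int = TIMESTAMP_BITS) -> float:
    """Length of the representable range in 365-day years."""
    return (1 << bits) / MS_PER_YEAR
```

Choosing an epoch close to the system's launch date makes the most of this range, since timestamps before the epoch can never be used.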

🏢 Datacenter and Machine IDs

Datacenter ID (5 bits):

Machine ID (5 bits):

ID Assignment Strategy:

🔢 Sequence Number (12 bits)

Functionality:

Overflow Handling:

Capacity Calculation:

Per machine: 4,096 IDs/ms × 1,000 ms/s = 4,096,000 IDs/second
Total system: 1,024 machines × 4,096,000 = 4,194,304,000 IDs/second

5. Implementation Details

🔧 Core Algorithm

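import time


class SnowflakeIDGenerator:
    """Generates 64-bit IDs: 41-bit timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence."""

    def __init__(self, datacenter_id, machine_id, epoch=1288834974657):
        # Both IDs must fit in 5 bits (0-31), or they would corrupt neighbouring fields
        if not 0 <= datacenter_id <= 31:
            raise ValueError("datacenter_id must be in the range 0-31")
        if not 0 <= machine_id <= 31:
            raise ValueError("machine_id must be in the range 0-31")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.epoch = epoch  # custom epoch, in milliseconds since the Unix epoch
        self.sequence = 0
        self.last_timestamp = -1

    def get_timestamp(self):
        # Current wall-clock time in milliseconds
        return int(time.time() * 1000)

    def wait_next_millis(self, last_timestamp):
        # Busy-wait until the clock advances past last_timestamp
        timestamp = self.get_timestamp()
        while timestamp <= last_timestamp:
            timestamp = self.get_timestamp()
        return timestamp

    def generate_id(self):
        # Not thread-safe: callers must serialize access to this method
        timestamp = self.get_timestamp()

        # Handle clock moving backwards
        if timestamp < self.last_timestamp:
            raise RuntimeError("Clock moved backwards; refusing to generate IDs")

        # Same millisecond, increment sequence
        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
            if self.sequence == 0:
                # Sequence overflow, wait for next millisecond
                timestamp = self.wait_next_millis(timestamp)
        else:
            self.sequence = 0

        self.last_timestamp = timestamp

        # Combine all parts
        return ((timestamp - self.epoch) << 22) \
            | (self.datacenter_id << 17) \
            | (self.machine_id << 12) \
            | self.sequence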
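The `generate_id` method above reads and updates `sequence` and `last_timestamp` in separate steps, so two threads calling it at once could produce the same ID. One simple way to guard against that, sketched here under the assumption that a single process-wide lock is acceptable, is a thin wrapper. The class name `LockedIDGenerator` is hypothetical, and it works with any object that exposes a `generate_id()` method.

```python
import threading


class LockedIDGenerator:
    """Serializes calls to an underlying generator's generate_id() method."""

    def __init__(self, generator):
        self._generator = generator
        self._lock = threading.Lock()

    def generate_id(self):
        # Only one thread at a time may advance the generator's internal state.
        with self._lock:
            return self._generator.generate_id()
```

The lock is held only for the duration of one ID computation, so contention stays low unless many threads request IDs in a tight loop.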

⚙️ Configuration Management

Startup Configuration:

snowflake:
  datacenter_id: 1
  machine_id: 15
  epoch: 1288834974657
  sequence_bits: 12
  machine_bits: 5
  datacenter_bits: 5

Environment Variables:

SNOWFLAKE_DATACENTER_ID=1
SNOWFLAKE_MACHINE_ID=15
SNOWFLAKE_EPOCH=1288834974657
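A minimal sketch of reading these variables at startup and rejecting values that do not fit the 5-bit fields might look like the following. The `SnowflakeConfig` class and `load_config` function are hypothetical names, and falling back to the default epoch when `SNOWFLAKE_EPOCH` is unset is an assumption.

```python
import os
from dataclasses import dataclass

DEFAULT_EPOCH_MS = 1288834974657
MAX_FIELD_VALUE = 31  # largest value that fits in a 5-bit field


@dataclass(frozen=True)
class SnowflakeConfig:
    datacenter_id: int
    machine_id: int
    epoch: int


def load_config(env=None) -> SnowflakeConfig:
    """Build a validated config from environment variables (or any mapping)."""
    env = os.environ if env is None else env
    config = SnowflakeConfig(
        datacenter_id=int(env["SNOWFLAKE_DATACENTER_ID"]),
        machine_id=int(env["SNOWFLAKE_MACHINE_ID"]),
        epoch=int(env.get("SNOWFLAKE_EPOCH", DEFAULT_EPOCH_MS)),
    )
    # Fail fast: an out-of-range ID would silently overlap another field.
    for name in ("datacenter_id", "machine_id"):
        value = getattr(config, name)
        if not 0 <= value <= MAX_FIELD_VALUE:
            raise ValueError(f"{name}={value} is outside 0-{MAX_FIELD_VALUE}")
    return config
```

Failing at startup is deliberate here: a misconfigured machine ID is much cheaper to catch before the first ID is issued than after duplicates appear.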

🌐 Service Architecture

ID Generator Service:

[Client] → [Load Balancer] → [ID Generator Instances]
                                    ↓
                            [Configuration Store]

API Design:

GET /api/v1/id
Response: {"id": 1420070400000000000}

POST /api/v1/ids/batch
Body: {"count": 100}
Response: {"ids": [1420070400000000001, ...]}
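One practical caveat with numeric IDs in JSON: the IDs in the responses above are larger than 2^53, so clients that parse JSON numbers as double-precision floats (JavaScript, for example) cannot represent them exactly. A possible mitigation, sketched below, is to also return each ID as a string. The `id_str` and `ids_str` field names are assumptions for illustration and are not part of the API above.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a double can represent exactly


def id_response(snowflake_id: int) -> str:
    """Serialize one ID both as a number and as a precision-safe string."""
    return json.dumps({"id": snowflake_id, "id_str": str(snowflake_id)})


def batch_response(snowflake_ids) -> str:
    """Serialize a batch of IDs with a parallel list of string forms."""
    ids = list(snowflake_ids)
    return json.dumps({"ids": ids, "ids_str": [str(i) for i in ids]})


def loses_precision(snowflake_id: int) -> bool:
    """True if converting the ID to a double changes its value."""
    return int(float(snowflake_id)) != snowflake_id
```

Clients that handle 64-bit integers natively can keep using the numeric field, while others read the string form.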

6. Advanced Considerations

🕐 Clock Synchronization

Challenge:

Servers might have slightly different clocks, leading to:

Solutions:
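One commonly used mitigation, sketched here under stated assumptions, is to wait out small backward clock adjustments (such as an NTP step of a few milliseconds) and refuse to issue IDs after large ones. The function name `safe_timestamp`, the `ClockMovedBackwardsError` exception, and the 5 ms tolerance are all hypothetical choices rather than part of the original design.

```python
import time


class ClockMovedBackwardsError(RuntimeError):
    """Raised when the clock is too far behind the last issued timestamp."""


def current_millis() -> int:
    return time.time_ns() // 1_000_000


def safe_timestamp(last_timestamp: int, clock=current_millis, max_backward_ms: int = 5) -> int:
    """Return a timestamp that is never earlier than last_timestamp.

    Small backward jumps are absorbed by sleeping; larger ones raise so that
    an operator (or a health check) can take the node out of rotation.
    """
    now = clock()
    if now >= last_timestamp:
        return now
    drift = last_timestamp - now
    if drift > max_backward_ms:
        raise ClockMovedBackwardsError(f"clock is {drift} ms behind the last issued ID")
    time.sleep(drift / 1000)   # wait roughly as long as the clock is behind
    now = clock()
    while now < last_timestamp:  # spin for any remaining difference
        now = clock()
    return now
```

A generator would call `safe_timestamp(self.last_timestamp)` in place of reading the clock directly. Keeping servers synchronized with NTP keeps the drift small enough for the short wait to be the common case.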

🎛️ Bit Allocation Tuning

Different Scenarios:

High-volume, short-term:

Timestamp: 39 bits, Sequence: 14 bits
More IDs per millisecond, shorter time range

Low-volume, long-term:

Timestamp: 43 bits, Sequence: 10 bits
Longer time range, fewer IDs per millisecond

Geographic distribution:

Timestamp: 40 bits, Region: 6 bits, Machine: 6 bits, Sequence: 11 bits
More regions and machines, slightly fewer IDs per millisecond
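The trade-offs in these scenarios can be checked with a little arithmetic. The sketch below assumes one unused sign bit, millisecond timestamps, and, for the first two scenarios, the default 10 bits of datacenter plus machine ID (the scenarios above do not state this explicitly). The function name `describe_layout` is hypothetical.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365


def describe_layout(timestamp_bits: int, node_bits: int, sequence_bits: int) -> dict:
    """Summarize lifetime, node count, and per-node throughput of a bit layout."""
    total = 1 + timestamp_bits + node_bits + sequence_bits  # 1 = sign bit
    if total != 64:
        raise ValueError(f"layout uses {total} bits, expected 64")
    return {
        "years": round((1 << timestamp_bits) / MS_PER_YEAR, 1),
        "nodes": 1 << node_bits,
        "ids_per_ms_per_node": 1 << sequence_bits,
    }


layouts = {
    "default": describe_layout(41, 10, 12),
    "high_volume_short_term": describe_layout(39, 10, 14),
    "low_volume_long_term": describe_layout(43, 10, 10),
    "geographic": describe_layout(40, 6 + 6, 11),  # 6 region bits + 6 machine bits
}
```

Each bit moved from the timestamp to the sequence doubles per-millisecond throughput and halves the lifetime of the ID space, so the layout should be fixed before the first ID is issued.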

🚨 High Availability

Redundancy Strategies:

Failover Handling:

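def get_id_with_failover(primary_service, backup_services):
    # Try the primary first, then each backup in order.
    # Every service must use a distinct (datacenter_id, machine_id) pair,
    # otherwise IDs issued by different services could collide.
    last_error = None
    for service in [primary_service, *backup_services]:
        try:
            return service.generate_id()
        except Exception as error:
            last_error = error  # remember the failure and try the next service
    raise RuntimeError("All ID services unavailable") from last_error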

📊 Monitoring and Metrics

Key Metrics:

Alerting Thresholds:


7. Alternative Approaches

🔄 Database Sequence

PostgreSQL Sequences:

CREATE SEQUENCE global_id_seq
    START WITH 1
    INCREMENT BY 1
    NO CYCLE;

SELECT nextval('global_id_seq');

Pros:

Cons:

🌊 UUID Variants

Time-based UUID (Version 1):
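A version 1 UUID embeds a 60-bit timestamp counted in 100-nanosecond intervals since 15 October 1582, along with a node identifier. As a small illustration using Python's standard `uuid` module, the creation time can be recovered from such a UUID. The helper name `uuid1_datetime` is hypothetical.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_datetime(value: uuid.UUID) -> datetime:
    """Return the creation time embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp in this form")
    unix_seconds = (value.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
```

Note that the timestamp fields are not stored in most-significant-first order in the string form, so version 1 UUIDs do not sort by creation time the way Snowflake IDs do.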

Custom UUID:

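import random
import time

def generate_custom_uuid():
    # Millisecond timestamp in the high bits, 48 random bits in the low bits
    timestamp = int(time.time() * 1000)
    random_part = random.randint(0, 0xFFFFFFFFFFFF)
    # Shift by 48 so the timestamp does not overlap the 48-bit random part
    return (timestamp << 48) | random_part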

🔢 Counter-based Systems

Redis Counter:

def generate_id_redis(redis_client, key="global_counter"):
    return redis_client.incr(key)
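Calling Redis once per ID adds a network round trip to every request. A possible refinement, sketched here as an assumption rather than part of the design above, is to reserve a block of IDs with a single `INCRBY` and hand them out locally. The class name `BlockAllocator` and the block size are hypothetical; the only client method it relies on is `incrby(key, amount)`, which returns the counter's new value.

```python
class BlockAllocator:
    """Hands out IDs from a locally cached block reserved with one INCRBY call."""

    def __init__(self, redis_client, key="global_counter", block_size=1000):
        self._client = redis_client
        self._key = key
        self._block_size = block_size
        self._next = 1   # next ID to hand out
        self._last = 0   # last ID of the current block (empty block at start)

    def generate_id(self):
        if self._next > self._last:
            # INCRBY returns the new counter value, which is the end of our block.
            self._last = self._client.incrby(self._key, self._block_size)
            self._next = self._last - self._block_size + 1
        value = self._next
        self._next += 1
        return value
```

The trade-off is that IDs are no longer globally ordered across clients, and any unused remainder of a block is lost if the process restarts.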

Pros:

Cons:

🌟 Hybrid Approaches

Snowflake + UUID:

Multi-tier Generation:


Design Process Summary

Step-by-Step Approach

  1. Understand requirements

    • Clarify ID format, scale, and ordering needs
    • Define performance and availability requirements
  2. Evaluate options

    • Compare multi-master, UUID, ticket server, and Snowflake
    • Consider trade-offs for each approach
  3. Choose Snowflake design

    • Select appropriate bit allocation
    • Design for target scale and geography
  4. Plan deployment

    • Design service architecture
    • Plan for high availability and monitoring
  5. Handle edge cases

    • Clock synchronization issues
    • Sequence overflow scenarios
    • Failure and recovery procedures

Key Takeaways

Interview Success Tips

  1. Ask clarifying questions: Understand specific requirements first
  2. Compare multiple approaches: Show knowledge of alternatives
  3. Explain trade-offs: Discuss pros and cons of each option
  4. Focus on Snowflake: Deep dive into most suitable solution
  5. Consider edge cases: Clock sync, sequence overflow, failures
  6. Discuss scalability: How system scales with growth
