📑 Table of Contents
- Problem Understanding
- Requirements & Scope
- High-Level Design Options
- Twitter Snowflake Deep Dive
- Implementation Details
- Advanced Considerations
- Alternative Approaches
1. Problem Understanding
🧠 What is a Unique ID Generator?
A unique ID generator is a system component that creates globally unique identifiers across distributed systems. Unlike traditional database auto-increment, distributed unique ID generators must work across multiple servers without coordination overhead.
Why Not Use Database Auto-Increment?
- Single point of failure: Database becomes bottleneck
- Limited scalability: Cannot scale horizontally
- Cross-database challenges: Coordination between databases is complex
- Performance issues: Network latency affects ID generation speed
Examples of Unique IDs:
12345678901234567890 (Numeric)
550e8400-e29b-41d4-a716-446655440000 (UUID)
1420070400000000000 (Snowflake-style)
Common Use Cases:
- User IDs in social networks
- Order IDs in e-commerce
- Message IDs in chat systems
- Transaction IDs in payment systems
2. Requirements & Scope
🎯 Key Clarifying Questions
ID Characteristics:
- What are the characteristics of unique IDs (format, length, sortability)?
- Do IDs need to be sequential or just unique?
- Should IDs be numeric only or can they contain alphanumeric characters?
- What’s the acceptable ID length (32-bit, 64-bit, 128-bit)?
- Do IDs need to be sortable by creation time?
Scale and Performance:
- How many IDs need to be generated per second (1K, 10K, 100K)?
- What’s the expected peak load during high traffic?
- How many servers will be generating IDs simultaneously?
- What’s the acceptable latency for ID generation?
- Do we need batch ID generation capabilities?
Ordering and Timing:
- Do IDs need to be ordered by creation time?
- Should IDs created later have larger values?
- How strict is the ordering requirement (global vs approximate)?
- Do we need to handle clock synchronization issues?
Availability and Reliability:
- What’s the acceptable downtime for ID generation service?
- How should the system handle server failures?
- Do we need cross-datacenter ID generation?
- What’s the recovery strategy for lost IDs?
Integration Requirements:
- How will clients request IDs (API, library, service)?
- Do we need to support different ID formats for different services?
- Are there existing systems that need to be integrated?
- What monitoring and alerting capabilities are needed?
📋 Functional Requirements
- Uniqueness: IDs must be globally unique across all servers
- Numeric format: IDs should be numerical values only
- 64-bit constraint: IDs must fit into 64-bit integers
- Time ordering: IDs created later should have larger values
- High throughput: Generate 10,000+ IDs per second
📊 Non-Functional Requirements
- Low latency: Fast ID generation (< 1ms)
- High availability: System should be fault-tolerant
- Scalability: Handle increasing load by adding servers
- No coordination: Avoid synchronization between servers
- Monotonic: IDs should generally increase over time
3. High-Level Design Options
🗃️ Option 1: Multi-Master Replication
How it works:
- Use multiple database servers with auto-increment
- Each server increments by
k(number of servers) instead of 1 - Server 1: 1, 3, 5, 7, 9…
- Server 2: 2, 4, 6, 8, 10…
Pros:
- Numerical IDs
- Easy to implement with existing databases
- Reasonable scalability
Cons:
- Hard to scale across multiple datacenters
- IDs don’t increase with time across servers
- Difficult to add/remove servers
- Database becomes potential bottleneck
🆔 Option 2: UUID (Universally Unique Identifier)
How it works:
- Generate 128-bit identifiers independently on each server
- No coordination required between servers
- Very low collision probability
Example UUID:
09c93e62-50b4-468d-bf8a-c07e1040bfb2
Pros:
- Simple to generate
- No coordination between servers needed
- Easy to scale with web servers
- No synchronization issues
Cons:
- 128 bits long (exceeds 64-bit requirement)
- IDs don’t increase with time
- Can be non-numeric (contains hyphens and letters)
- Not suitable for sorting by creation time
🎫 Option 3: Ticket Server
How it works:
- Use centralized auto-increment in single database
- All servers request IDs from ticket server
- Essentially a centralized counter service
Pros:
- Numeric IDs
- Easy to implement
- Works well for small to medium scale
- Guarantees uniqueness
Cons:
- Single point of failure
- Bottleneck for high traffic
- Network latency affects performance
- Complex to make highly available
❄️ Option 4: Twitter Snowflake (Recommended)
How it works:
- Divide 64-bit ID into multiple sections
- Each section serves different purpose
- Combine timestamp, datacenter, machine, and sequence
Why choose Snowflake:
- Meets all requirements (64-bit, numeric, sortable)
- No coordination between servers
- High performance and scalability
- Time-ordered IDs
4. Twitter Snowflake Deep Dive
🏗️ 64-bit ID Structure
0 | 41 bits | 5 | 5 | 12 bits |
| Timestamp |DC |Mach | Sequence |
| |ID | ID | Number |
Bit Allocation:
- Sign bit (1 bit): Always 0, reserved for future use
- Timestamp (41 bits): Milliseconds since custom epoch
- Datacenter ID (5 bits): Supports 32 datacenters
- Machine ID (5 bits): Supports 32 machines per datacenter
- Sequence number (12 bits): 4,096 IDs per millisecond per machine
⏰ Timestamp Section (41 bits)
Epoch Configuration:
- Use custom epoch instead of Unix epoch
- Twitter’s default: 1288834974657 (Nov 04, 2010, 01:42:54 UTC)
- Custom epoch extends system lifetime
Time Range:
- 41 bits = 2^41 - 1 = 2,199,023,255,551 milliseconds
- Approximately 69 years from epoch
Binary to UTC conversion example:
Binary: 1011000000100101101101001000001 Decimal: 1497902457089 UTC: June 19, 2017, 5:47:37 PM
🏢 Datacenter and Machine IDs
Datacenter ID (5 bits):
- Supports up to 32 datacenters (2^5)
- Fixed at startup time
- Enables geographic distribution
Machine ID (5 bits):
- Supports up to 32 machines per datacenter
- Total capacity: 32 × 32 = 1,024 machines
- Fixed at startup time
ID Assignment Strategy:
- Assign IDs during server initialization
- Store in configuration files or environment variables
- Avoid dynamic changes to prevent ID conflicts
🔢 Sequence Number (12 bits)
Functionality:
- Increments for each ID generated within same millisecond
- Resets to 0 at start of new millisecond
- Supports 4,096 IDs per millisecond per machine
Overflow Handling:
- When sequence reaches 4,095, wait for next millisecond
- Reset sequence to 0 at new millisecond
- Ensures uniqueness within machine
Capacity Calculation:
Per machine: 4,096 IDs/ms × 1,000 ms/s = 4,096,000 IDs/second
Total system: 1,024 machines × 4,096,000 = 4,194,304,000 IDs/second
5. Implementation Details
🔧 Core Algorithm
class SnowflakeIDGenerator:
def __init__(self, datacenter_id, machine_id, epoch=1288834974657):
self.datacenter_id = datacenter_id
self.machine_id = machine_id
self.epoch = epoch
self.sequence = 0
self.last_timestamp = -1
def generate_id(self):
timestamp = self.get_timestamp()
# Handle clock moving backwards
if timestamp < self.last_timestamp:
raise Exception("Clock moved backwards")
# Same millisecond, increment sequence
if timestamp == self.last_timestamp:
self.sequence = (self.sequence + 1) & 0xFFF # 12 bits
if self.sequence == 0:
# Sequence overflow, wait for next millisecond
timestamp = self.wait_next_millis(timestamp)
else:
self.sequence = 0
self.last_timestamp = timestamp
# Combine all parts
id = ((timestamp - self.epoch) << 22) | \
(self.datacenter_id << 17) | \
(self.machine_id << 12) | \
self.sequence
return id
⚙️ Configuration Management
Startup Configuration:
snowflake:
datacenter_id: 1
machine_id: 15
epoch: 1288834974657
sequence_bits: 12
machine_bits: 5
datacenter_bits: 5
Environment Variables:
SNOWFLAKE_DATACENTER_ID=1
SNOWFLAKE_MACHINE_ID=15
SNOWFLAKE_EPOCH=1288834974657
🌐 Service Architecture
ID Generator Service:
[Client] → [Load Balancer] → [ID Generator Instances]
↓
[Configuration Store]
API Design:
GET /api/v1/id
Response: {"id": 1420070400000000000}
POST /api/v1/ids/batch
Body: {"count": 100}
Response: {"ids": [1420070400000000001, ...]}
6. Advanced Considerations
🕐 Clock Synchronization
Challenge:
Servers might have slightly different clocks, leading to:
- Out-of-order IDs across machines
- Potential ID conflicts
- Timestamp inconsistencies
Solutions:
- Network Time Protocol (NTP): Synchronize clocks across servers
- Clock drift monitoring: Alert when clocks drift too much
- Logical clocks: Use Lamport timestamps for ordering
- Clock skew tolerance: Allow small clock differences
🎛️ Bit Allocation Tuning
Different Scenarios:
High-volume, short-term:
Timestamp: 39 bits, Sequence: 14 bits
More IDs per millisecond, shorter time range
Low-volume, long-term:
Timestamp: 43 bits, Sequence: 10 bits
Longer time range, fewer IDs per millisecond
Geographic distribution:
Timestamp: 40 bits, Region: 6 bits, Machine: 6 bits, Sequence: 11 bits
More regions and machines, slightly fewer IDs per millisecond
🚨 High Availability
Redundancy Strategies:
- Multiple instances per datacenter: Assign different machine IDs
- Cross-datacenter deployment: Replicate across regions
- Health monitoring: Check instance health and failover
- Graceful degradation: Continue with reduced capacity
Failover Handling:
def get_id_with_failover(primary_service, backup_services):
try:
return primary_service.generate_id()
except Exception:
for backup in backup_services:
try:
return backup.generate_id()
except Exception:
continue
raise Exception("All ID services unavailable")
📊 Monitoring and Metrics
Key Metrics:
- Generation rate: IDs generated per second
- Latency: Time to generate ID
- Sequence utilization: How often sequence overflows
- Clock drift: Difference between server clocks
- Error rate: Failed ID generation attempts
Alerting Thresholds:
- Sequence overflow > 1% of requests
- Clock drift > 100ms
- Error rate > 0.1%
- Latency > 1ms
7. Alternative Approaches
🔄 Database Sequence
PostgreSQL Sequences:
CREATE SEQUENCE global_id_seq
START WITH 1
INCREMENT BY 1
NO CYCLE;
SELECT nextval('global_id_seq');
Pros:
- ACID guarantees
- Easy to implement
- Built-in database feature
Cons:
- Database bottleneck
- Single point of failure
- Not suitable for high scale
🌊 UUID Variants
Time-based UUID (Version 1):
- Includes timestamp and MAC address
- 128-bit but partially time-ordered
- Still exceeds 64-bit requirement
Custom UUID:
def generate_custom_uuid():
timestamp = int(time.time() * 1000)
random_part = random.randint(0, 0xFFFFFFFFFFFF)
return (timestamp << 24) | random_part
🔢 Counter-based Systems
Redis Counter:
def generate_id_redis(redis_client, key="global_counter"):
return redis_client.incr(key)
Pros:
- Simple implementation
- Atomic operations
- Good performance
Cons:
- External dependency
- Requires Redis clustering for HA
- Network calls for each ID
🌟 Hybrid Approaches
Snowflake + UUID:
- Use Snowflake for ordered IDs
- Fall back to UUID when Snowflake unavailable
- Convert UUID to 64-bit hash if needed
Multi-tier Generation:
- Local counter for burst generation
- Background sync with global service
- Graceful degradation during network issues
✅ Design Process Summary
Step-by-Step Approach
Understand requirements
- Clarify ID format, scale, and ordering needs
- Define performance and availability requirements
Evaluate options
- Compare multi-master, UUID, ticket server, and Snowflake
- Consider trade-offs for each approach
Choose Snowflake design
- Select appropriate bit allocation
- Design for target scale and geography
Plan deployment
- Design service architecture
- Plan for high availability and monitoring
Handle edge cases
- Clock synchronization issues
- Sequence overflow scenarios
- Failure and recovery procedures
Key Takeaways
- Snowflake algorithm provides optimal balance of requirements
- Bit allocation should match specific use case needs
- Clock synchronization is critical for distributed systems
- High availability requires redundancy and monitoring
- Performance scales linearly with machine count
Interview Success Tips
- Ask clarifying questions: Understand specific requirements first
- Compare multiple approaches: Show knowledge of alternatives
- Explain trade-offs: Discuss pros and cons of each option
- Focus on Snowflake: Deep dive into most suitable solution
- Consider edge cases: Clock sync, sequence overflow, failures
- Discuss scalability: How system scales with growth
📚 Reference Materials
The following resources provide additional depth and implementation details for unique ID generation:
Original Papers and Articles
Industry Implementations
Technical Deep Dives
- Universally Unique Identifier (Wikipedia)
- Network Time Protocol
- Distributed ID Generation at High Scale