Azure Cosmos DB: Globally Distributed NoSQL | Interview Prep Hub

Interview Preparation Hub for Cloud Database and NoSQL Roles

Introduction

Azure Cosmos DB is Microsoft’s globally distributed, multi-model NoSQL database service. It is designed for high availability, low latency, and elastic scalability across multiple regions. Cosmos DB supports multiple APIs (SQL, MongoDB, Cassandra, Gremlin, Table) and offers tunable consistency models, making it a versatile choice for modern applications requiring global reach and performance.

Core Features

  • Global Distribution: Replicate data across multiple Azure regions.
  • Multi-Model Support: NoSQL API (formerly called the SQL or Core API), MongoDB API, Cassandra API, Gremlin API, Table API.
  • Tunable Consistency: Choose from five consistency levels.
  • Elastic Scalability: Scale throughput and storage independently.
  • Low Latency: Guaranteed <10 ms reads and <10 ms writes at the 99th percentile.
  • High Availability: 99.999% SLA for multi-region writes.

Consistency Models

Consistency Level | Description | Use Case
Strong | Linearizable; reads always return the latest committed write. | Financial transactions, mission-critical systems.
Bounded Staleness | Reads may lag behind writes by at most a configured number of versions or a configured time interval. | Applications that can tolerate a predictable, bounded lag.
Session | Within a single client session, reads see that session's own writes and never go backward in time. | Most user-centric applications.
Consistent Prefix | Reads never see out-of-order writes. | Event logging, ordered data streams.
Eventual | Reads may return stale data; replicas converge eventually. | Social media feeds, non-critical data.
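
To make the difference between session and eventual reads concrete, here is a toy model. It is a sketch under simplifying assumptions, not Cosmos DB's actual replication protocol: the Replica and SessionClient classes are hypothetical, there is one primary and one lagging secondary, and a plain version number stands in for a session token.

```python
class Replica:
    """A toy replica that applies writes in order and tracks a version number."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


class SessionClient:
    """Remembers the version of its last write (a stand-in for a session token)."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.token = 0

    def write(self, key, value):
        # Writes go to the primary only; replication to the secondary is lazy.
        version = self.primary.version + 1
        self.primary.apply(version, key, value)
        self.token = version

    def read_eventual(self, key):
        # Eventual: accept whatever the secondary has, even if it is stale.
        return self.secondary.data.get(key)

    def read_session(self, key):
        # Session: use the secondary only if it has caught up to our token.
        caught_up = self.secondary.version >= self.token
        replica = self.secondary if caught_up else self.primary
        return replica.data.get(key)


primary, secondary = Replica(), Replica()
client = SessionClient(primary, secondary)
client.write("cart", ["book"])

print(client.read_eventual("cart"))  # None (secondary has not replicated yet)
print(client.read_session("cart"))   # ['book'] (read-your-writes holds)
```

The session read never misses the client's own write, while the eventual read can return stale data until replication catches up.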

Partitioning and Scaling

Cosmos DB scales data through horizontal partitioning. Each container has a partition key; items that share a key value form a logical partition, and logical partitions are distributed across physical partitions. Throughput is provisioned in Request Units per second (RU/s), where an RU is a normalized measure of the resources an operation consumes. Autoscale mode adjusts provisioned throughput up to a configured maximum as workload demand changes.
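
The effect of partition key choice can be sketched with a small simulation. This is an illustration only: Cosmos DB's real hashing and range assignment are internal, so SHA-256 with a modulo is a stand-in, and the partition count, user IDs, and category values are all hypothetical.

```python
import hashlib
from collections import Counter


def physical_partition(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index via a stable hash."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(key_values, partition_count=4):
    """Count how many items land on each partition."""
    return Counter(physical_partition(v, partition_count) for v in key_values)


# High-cardinality key (hypothetical user IDs): items spread across partitions.
by_user = distribution(f"user-{i}" for i in range(10_000))

# Low-cardinality key (hypothetical category field): at most 3 partitions are
# used, and every "Books" item lands on the same one.
by_category = distribution(["Books"] * 9_000 + ["Music"] * 500 + ["Toys"] * 500)

print(sorted(by_user.items()))
print(sorted(by_category.items()))
```

With the user ID key, each partition receives roughly a quarter of the items. With the category key, one partition holds at least 9,000 of the 10,000 items, which is the hotspot pattern a poor key produces.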

Python Example (Querying Cosmos DB)

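from azure.cosmos import CosmosClient

# Placeholder endpoint and key; in real code, load the key from a secret store.
url = "https://your-cosmos-account.documents.azure.com:443/"
key = "your-primary-key"
client = CosmosClient(url, credential=key)

database = client.get_database_client("myDatabase")
container = database.get_container_client("myContainer")

# Parameterized query: the value is passed separately from the query text.
query = "SELECT * FROM c WHERE c.category = @category"
parameters = [{"name": "@category", "value": "Books"}]

# A cross-partition query fans out to every partition, which costs more RUs.
for item in container.query_items(
    query=query, parameters=parameters, enable_cross_partition_query=True
):
    print(item)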
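
When the filter value is also the partition key value, the query can be scoped to a single logical partition instead of fanning out. The helper below is a minimal sketch that assumes the container is partitioned on /category, which the example above does not state; find_by_category is a hypothetical name, and the function expects a container object like the one created above.

```python
def find_by_category(container, category):
    """Return items in one category, scoped to a single logical partition."""
    return list(
        container.query_items(
            query="SELECT * FROM c WHERE c.category = @category",
            parameters=[{"name": "@category", "value": category}],
            # Assumption: the container's partition key path is /category.
            partition_key=category,
        )
    )
```

Calling find_by_category(container, "Books") would then touch only the partition that holds the "Books" items.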
    

Real-World Applications

  • Global e-commerce platforms requiring low-latency access worldwide.
  • IoT solutions ingesting and querying massive data streams.
  • Gaming backends needing session consistency and scalability.
  • Financial services requiring strong consistency for transactions.
  • Social media apps leveraging eventual consistency for feeds.

Security & Governance

  • Data encrypted at rest and in transit.
  • Integration with Microsoft Entra ID (formerly Azure AD) for role-based access control (RBAC).
  • Private endpoints for network isolation.
  • Auditing and monitoring with Azure Monitor.

Best Practices

  • Choose the right partition key to ensure balanced distribution.
  • Use session consistency for most applications to balance performance and correctness.
  • Enable multi-region writes for global applications that need low-latency writes in every region, and plan for conflict resolution.
  • Monitor RU consumption and optimize queries.
  • Implement retry logic for transient errors.
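
The retry advice above can be sketched as exponential backoff with jitter. This is a generic pattern, not the SDK's own retry policy (the Cosmos DB SDKs already retry some failures, such as throttled requests, on their own). TransientError and with_retries are hypothetical names; with the Python SDK you would likely catch azure.cosmos.exceptions.CosmosHttpResponseError and inspect its status_code instead.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a throttling (HTTP 429) or timeout error."""


def with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                 sleep=time.sleep):
    """Call operation(), retrying transient failures with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay cap doubles each attempt; random jitter keeps many
            # clients from retrying at the same instant.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


# Usage: an operation that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_read():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated throttling")
    return {"id": "1", "category": "Books"}


# Skip real sleeping in the demo by injecting a no-op sleep function.
result = with_retries(flaky_read, sleep=lambda seconds: None)
print(result, attempts["count"])  # {'id': '1', 'category': 'Books'} 3
```

Only errors known to be transient should be retried; retrying a bad request or a conflict just repeats the failure.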

Common Mistakes

  • Choosing a poor partition key → hotspots and uneven distribution.
  • Overprovisioning RUs without autoscale → wasted costs.
  • Using strong consistency unnecessarily → higher latency.
  • Ignoring indexing policies → inefficient queries.
  • Not enabling geo-replication → limited availability.
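
For the indexing point above, a custom indexing policy might look like the following sketch. The property names category and price are hypothetical, chosen to match a workload that filters on category and sorts on price; everything else is left unindexed to reduce the RU cost of writes. With the Python SDK, a dictionary like this can be supplied as the indexing_policy argument when creating a container.

```python
import json

# Assumed document shape: queries filter on /category and sort on /price;
# other properties are never used in filters, so they are left unindexed.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {"path": "/category/?"},
        {"path": "/price/?"},
    ],
    "excludedPaths": [
        {"path": "/*"},
    ],
}

print(json.dumps(indexing_policy, indent=2))
```

The trade-off runs both ways: a query that filters on an excluded path falls back to scanning items, so the policy should be derived from the queries the application actually runs.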

Interview Notes

  • Be ready to explain the five consistency models.
  • Discuss partitioning and RU-based scaling.
  • Explain multi-model support (SQL, MongoDB, Cassandra, Gremlin, Table).
  • Know SLA guarantees for availability and latency.
  • Understand use cases for strong vs eventual consistency.

Summary

Azure Cosmos DB is a globally distributed, multi-model NoSQL database designed for modern applications requiring scale, performance, and availability. Its tunable consistency models, elastic scalability, and global distribution make it a powerful choice for enterprises. For interviews, focus on consistency models, partitioning strategies, multi-model APIs, and best practices. Mastery of Cosmos DB fundamentals demonstrates readiness for cloud-native database engineering roles.