shard_database_architecture

Shard (database architecture)

A shard in database architecture refers to a partition of data that is stored across multiple machines or nodes, typically in a distributed database system. Sharding is an essential technique used in large-scale systems where a single machine cannot store or handle the volume of data. Each shard is an independent part of the entire database, containing a subset of the overall dataset. By distributing the data across several shards, the system can achieve horizontal scaling, which enables better handling of high loads and large datasets. Sharding is often used by databases like MongoDB (MongoDB sharding) and Cassandra (Cassandra sharding) to scale their data storage horizontally across multiple servers.

https://en.wikipedia.org/wiki/Sharding

Sharding works by splitting data into smaller, more manageable parts that can be distributed across multiple servers or nodes. For example, a database with millions of records could have its data sharded by certain key fields, such as customer ID, region, or product category. Each shard stores only a specific subset of the data and operates independently of the others. The benefit of sharding is that it allows queries to be executed in parallel across multiple nodes, improving performance by distributing both the storage and the computational load. This enables systems to scale out, rather than scaling up, which is often a more cost-effective and efficient approach for handling large-scale applications.

https://en.wikipedia.org/wiki/Sharding

However, sharding introduces some database complexity in terms of database management and query optimization. Since the data is distributed across multiple shards, each shard needs to be managed individually, and queries must be able to access data from multiple shards efficiently. Sharding also requires techniques such as database replication and database load balancing to ensure data consistency and high availability. Managing a sharded database architecture can be challenging because if one shard fails, it could impact the database availability of a portion of the data. As such, sharding is typically suited to systems that require database high availability, database fault tolerance, and the ability to scale out across many servers.

https://en.wikipedia.org/wiki/Sharding

Snippet from Wikipedia: Shard (database architecture): A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard may be held on a separate database server instance, to spread load.
Some data in a database remains present in all shards, but some appears only in a single shard. Each shard acts as the single source for this subset of data.

Creative Commons Attribution-Share Alike 4.0