https://DevOpsCloud.io -- Cloud Monk Losang Jinpa, Ph.D., MCSE/MCT, GitOps DevOps Engineer

Distributed Databases

Distributed databases store and manage data across multiple physical locations, either on several servers in a single data center or at sites in different geographic regions. Rather than residing in one place, the data is distributed across networked systems. Compared with a traditional, centralized database, this architecture can improve scalability, fault tolerance, and data redundancy. It also helps with large data volumes, because the data can be partitioned across many machines and processed in parallel.

https://en.wikipedia.org/wiki/Distributed_database
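
As a rough illustration of spreading data across machines, the sketch below assigns each key to one of several shards by hashing it. The names `shard_for` and `ShardedStore` are hypothetical, and each "shard" is just an in-memory dictionary standing in for a separate server. Real systems typically use more elaborate schemes (for example, consistent hashing) so that adding a node does not remap most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one server."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user_id in ("alice", "bob", "carol", "dave"):
    store.put(user_id, {"name": user_id})

# Each key lives on exactly one shard, and lookups go straight to it.
print(store.get("carol"))
print([len(shard) for shard in store.shards])
```

Because the hash is deterministic, any client can compute where a key lives without consulting a central directory, which is one reason hash partitioning is a common starting point.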

A key challenge in distributed databases is maintaining data consistency. Traditional databases rely on the ACID properties (Atomicity, Consistency, Isolation, Durability), which ensure that transactions are processed reliably. In a distributed setting, the CAP theorem (Consistency, Availability, and Partition tolerance) states that a system cannot guarantee all three properties at once: when a network partition occurs, it must give up either consistency or availability. Distributed databases therefore support a range of consistency models, some weaker than strict ACID semantics, and can often be configured to favor consistency or availability depending on the use case.

https://en.wikipedia.org/wiki/CAP_theorem
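
One common way such a trade-off is exposed is through read and write quorums, a technique not named in the text above and shown here only as an illustration. In this minimal sketch, `QuorumStore`, `Replica`, and the parameters `n`, `w`, and `r` are all hypothetical. A single global version counter stands in for the timestamps or vector clocks a real system would need. With `r + w > n`, every read quorum overlaps the replicas that acknowledged the latest write. With smaller quorums the store stays available through more failures but may return stale data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    """Toy quorum store: n replicas, write quorum w, read quorum r."""

    def __init__(self, n: int, w: int, r: int) -> None:
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0  # simplification: one global version counter

    def _live(self):
        return [rep for rep in self.replicas if rep.up]

    def write(self, key, value) -> bool:
        live = self._live()
        if len(live) < self.w:
            return False  # too few replicas: reject rather than risk inconsistency
        self.version += 1
        for rep in live:
            rep.data[key] = (self.version, value)
        return True

    def read(self, key):
        live = self._live()
        if len(live) < self.r:
            raise RuntimeError("not enough replicas for a read quorum")
        answers = [rep.data[key] for rep in live[: self.r] if key in rep.data]
        # Return the value carrying the highest version among the answers.
        return max(answers, key=lambda a: a[0])[1] if answers else None


# Strict quorums (r + w > n): reads see the latest write, writes may be refused.
strict = QuorumStore(n=3, w=2, r=2)
strict.replicas[0].up = False
strict_write_ok = strict.write("k", "v1")      # replicas 1 and 2 accept
strict.replicas[0].up = True                   # replica 0 returns, still stale
strict.replicas[2].up = False
strict_read = strict.read("k")                 # quorum {0, 1} includes replica 1
strict.replicas[1].up = False
strict_write_refused = strict.write("k", "v2")  # only one replica left

# Loose quorums (r = w = 1): always available here, but reads can be stale.
loose = QuorumStore(n=3, w=1, r=1)
loose.replicas[1].up = False
loose.replicas[2].up = False
loose_write_ok = loose.write("k", "v1")        # lands on replica 0 only
loose.replicas[0].up = False
loose.replicas[1].up = True
loose_read = loose.read("k")                   # replica 1 never saw the write

print(strict_write_ok, strict_read, strict_write_refused)
print(loose_write_ok, loose_read)
```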

Distributed databases also offer significant benefits for fault tolerance. When data is replicated across multiple nodes, the failure of a single node need not cause data loss or downtime. These systems typically employ replication techniques such as master-slave replication (also called primary-replica replication) or peer-to-peer replication, so that data is duplicated and remains available for querying even if one or more servers become unavailable. As a result, distributed databases are widely used in applications that require high availability and resilience, such as cloud storage systems, e-commerce platforms, and real-time analytics systems.

https://en.wikipedia.org/wiki/Replication_(computing)
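
The master-slave arrangement described above can be sketched as follows. This is a simplified model under stated assumptions: `Node` and `ReplicatedStore` are hypothetical names, writes are copied to replicas synchronously, and there is no automatic promotion of a replica when the primary fails. Production systems usually replicate asynchronously or semi-synchronously and handle failover explicitly.

```python
class Node:
    """One database server, modelled as an in-memory dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    """Writes go to the primary and are copied to replicas; reads fail over."""

    def __init__(self, primary: Node, replicas: list) -> None:
        self.primary = primary
        self.replicas = replicas

    def write(self, key, value) -> None:
        if not self.primary.up:
            raise RuntimeError("primary unavailable; writes are rejected")
        self.primary.data[key] = value
        for node in self.replicas:
            if node.up:  # assumption: synchronous copy to every live replica
                node.data[key] = value

    def read(self, key):
        # Prefer the primary, then fall back to the first live replica.
        for node in [self.primary, *self.replicas]:
            if node.up:
                return node.data.get(key)
        raise RuntimeError("no node available")


primary = Node("primary")
store = ReplicatedStore(primary, [Node("replica-1"), Node("replica-2")])
store.write("order:42", "shipped")

primary.up = False                     # simulate a primary failure
value_after_failure = store.read("order:42")
print(value_after_failure)             # still served, now from replica-1
```

In this sketch reads survive the loss of the primary, but writes do not. Closing that gap is what leader election or peer-to-peer (multi-master) replication is meant to address.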

Snippet from Wikipedia: Distributed database

A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location (e.g. a data centre); or may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.
System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on organised network servers or decentralised independent computers on the Internet, on corporate intranets or extranets, or on other organisation networks. Because distributed databases store data across multiple computers, distributed databases may improve performance at end-user worksites by allowing transactions to be processed on many machines, instead of being limited to one.
Two processes ensure that the distributed databases remain up-to-date and current: replication and duplication.

Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, the replication process can be complex and can require considerable time and computer resources.

Duplication, on the other hand, has less complexity. It identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours. This is to ensure that each distributed location has the same data. In the duplication process, users may change only the master database. This ensures that local data will not be overwritten.

Both replication and duplication can keep the data current in all distributed locations.
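
The difference between the two processes can be sketched in a few lines. The functions `replicate` and `duplicate` are hypothetical, and each site is modelled as a plain dictionary. For replication, the sketch assumes every entry carries a version number and that the highest version wins, which is one possible conflict rule, not the only one. For duplication, the master copy simply overwrites every other site.

```python
import copy


def replicate(sites: list) -> None:
    """Find the newest version of each key across all sites, then apply it everywhere.

    Each site maps key -> (version, value). Highest version wins (an assumption).
    """
    merged = {}
    for site in sites:
        for key, (version, value) in site.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for site in sites:
        site.clear()
        site.update(merged)


def duplicate(master: dict, copies: list) -> None:
    """Overwrite every copy with the master; changes made on copies are discarded."""
    for site in copies:
        site.clear()
        site.update(copy.deepcopy(master))


# Replication: changes made at either site survive and converge.
site_a = {"x": (1, "old"), "y": (1, "a-only")}
site_b = {"x": (2, "new")}
replicate([site_a, site_b])
print(site_a == site_b, site_a)

# Duplication: only the master is authoritative.
master = {"x": "from-master"}
branch = {"x": "local edit", "z": "stray row"}
duplicate(master, [branch])
print(branch)
```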
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business, the sensitivity and confidentiality of the data stored in the database, and the price the business is willing to pay to ensure data security, consistency, and integrity.
When discussing access to distributed databases, Microsoft favors the term distributed query, which it defines in a protocol-specific manner as "[a]ny SELECT, INSERT, UPDATE, or DELETE statement that references tables and rowsets from one or more external OLE DB data sources". Oracle provides a more language-centric view in which distributed queries and distributed transactions form part of distributed SQL.
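
To give a feel for a single statement that spans more than one data source, the sketch below uses SQLite's `ATTACH DATABASE` from Python's standard `sqlite3` module. This is only a local stand-in: both databases live in memory in one process, and it shows neither Microsoft's OLE DB mechanism nor Oracle's distributed SQL. The table and column names are hypothetical.

```python
import sqlite3

# The "main" database holds orders; a second, separately attached database
# (aliased "remote") holds customers.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS remote")

conn.execute(
    "CREATE TABLE main.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.execute("CREATE TABLE remote.customers (id INTEGER PRIMARY KEY, name TEXT)")

conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 5.0)],
)
conn.executemany(
    "INSERT INTO remote.customers VALUES (?, ?)",
    [(10, "Ada"), (11, "Grace")],
)

# One SELECT that references tables in both databases.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM main.orders AS o
    JOIN remote.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)
conn.close()
```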

Creative Commons Attribution-Share Alike 4.0