Writes to the primary node are transferred to the lower-level block device and simultaneously propagated to the secondary node(s). The secondary node(s) then transfer the data to their corresponding lower-level block devices. When a failed ex-primary node returns, the system may or may not raise it to primary level again after device data resynchronization. DRBD is often deployed together with the Pacemaker or Heartbeat cluster resource managers, although it also integrates with other cluster management frameworks. It integrates with virtualization solutions such as Xen, and may be used both below and on top of the Linux LVM stack.

Shared cluster storage comparison

Conventional computer cluster systems typically use some sort of shared storage for data being used by cluster resources.

Author:Bagore Mazulkis
Language:English (Spanish)
Published (Last):1 June 2004

The guide is constantly being updated. This guide assumes, throughout, that you are using DRBD version 8. If you are using a pre Please use the drbd-user mailing list to submit comments. Recent changes is an overview of changes in DRBD 8.

Introduction to DRBD

DRBD mirrors data in real time. Replication occurs continuously while applications modify the data on the device. Applications need not be aware that the data is stored on multiple hosts.

With synchronous mirroring, applications are notified of write completions after the writes have been carried out on all hosts. With asynchronous mirroring, applications are notified of write completions when the writes have completed locally, which usually is before they have propagated to the other hosts. Because of this, DRBD is extremely flexible and versatile, which makes it a replication solution suitable for adding high availability to just about any application. DRBD is, by definition and as mandated by the Linux kernel architecture, agnostic of the layers above it.

Thus, it is impossible for DRBD to miraculously add features to upper layers that these do not possess. For example, DRBD cannot auto-detect file system corruption or add active-active clustering capability to file systems like ext3 or XFS.

Figure 1.

User space administration tools

DRBD comes with a set of administration tools which communicate with the kernel module in order to configure and administer DRBD resources.

All parameters to drbdsetup must be passed on the command line. The separation between drbdadm and drbdsetup allows for maximum flexibility. Most users will rarely need to use drbdsetup directly, if at all. As with drbdsetup, most users will only rarely need to use drbdmeta directly.

Resources

In DRBD, resource is the collective term that refers to all aspects of a particular replicated data set.

Volumes

Any resource is a replication group consisting of one or more volumes that share a common replication stream.

DRBD ensures write fidelity across all volumes in the resource. Volumes are numbered starting with 0, and there may be up to 65, volumes in one resource. A volume contains the replicated data set, and a set of metadata for DRBD internal use. It has a device major number of , and its minor numbers are numbered from 0 onwards, as is customary.

Each DRBD device corresponds to a volume in a resource.

Connection

A connection is a communication link between two hosts that share a replicated data set. As of the time of this writing, each resource involves only two hosts and exactly one connection between these hosts, so for the most part, the terms resource and connection can be used interchangeably.

At the drbdadm level, a connection is addressed by the resource name. The choice of terms here is not arbitrary. Primary vs. It is usually the case in a high-availability environment that the primary node is also the active one, but this is by no means necessary. A DRBD device in the primary role can be used unrestrictedly for read and write operations.

A DRBD device in the secondary role cannot be used by applications, for either read or write access. The reason for disallowing even read-only access to the device is the necessity to maintain cache coherency, which would be impossible if a secondary resource were made accessible in any way.
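The access rules for the two roles can be sketched as a toy model. This is only an illustration of the rules stated above, not DRBD's actual implementation; the `Role` and `Device` names, the method names, and the assumption that a device starts out secondary are all hypothetical.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class Device:
    """Toy model of a replicated device's role (hypothetical, for illustration)."""

    def __init__(self):
        # Assumption for this sketch: a device starts in the secondary role.
        self.role = Role.SECONDARY

    def promote(self):
        """Switch the role from secondary to primary."""
        self.role = Role.PRIMARY

    def demote(self):
        """Switch the role from primary to secondary."""
        self.role = Role.SECONDARY

    def read(self):
        self._require_primary("read")
        return b"data"

    def write(self, data):
        self._require_primary("write")
        return len(data)

    def _require_primary(self, operation):
        # A secondary device refuses both reads and writes.
        if self.role is not Role.PRIMARY:
            raise PermissionError(
                f"{operation} refused: device is in the secondary role"
            )
```

In this sketch, calling `read()` on a freshly created `Device` raises `PermissionError`, while the same call succeeds after `promote()`.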

Changing the resource role from secondary to primary is referred to as promotion, whereas the reverse operation is termed demotion. Some of these features will be important to most users, some will only be relevant in very specific deployment scenarios. Common administrative tasks and Troubleshooting and error recovery contain instructions on how to enable and use these features in day-to-day operation.

Single-primary mode

In single-primary mode, a resource is, at any given time, in the primary role on only one cluster member. Since it is guaranteed that only one cluster node manipulates the data at any moment, this mode can be used with any conventional file system (ext3, ext4, XFS, etc.). Deploying DRBD in single-primary mode is the canonical approach for high availability fail-over capable clusters.

Dual-primary mode

In dual-primary mode, a resource is, at any given time, in the primary role on both cluster nodes. Since concurrent access to the data is thus possible, this mode requires the use of a shared cluster file system that utilizes a distributed lock manager. Deploying DRBD in dual-primary mode is the preferred approach for load-balancing clusters which require concurrent data access from two nodes.
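The difference between the two modes comes down to whether a second node may hold the primary role at the same time. A minimal sketch of that check follows; the function name, the `roles` dictionary, and the `dual_primary` flag are hypothetical stand-ins, not DRBD configuration options.

```python
def can_promote(roles, node, dual_primary=False):
    """Return True if `node` may take the primary role.

    roles: dict mapping node name -> "primary" or "secondary" (hypothetical).
    dual_primary: hypothetical flag standing in for dual-primary mode.
    """
    another_node_is_primary = any(
        role == "primary" for name, role in roles.items() if name != node
    )
    # Single-primary mode: at most one primary at any given time.
    # Dual-primary mode: both nodes may be primary concurrently.
    return dual_primary or not another_node_is_primary
```

With `roles = {"alice": "primary", "bob": "secondary"}`, promoting `bob` is refused in single-primary mode and allowed when `dual_primary=True`.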

See Enabling dual-primary mode for information on enabling dual-primary mode for specific resources.

Replication modes

DRBD supports three distinct replication modes, allowing three degrees of replication synchronicity.

Protocol A

Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has finished, and the replication packet has been placed in the local TCP send buffer.

In the event of forced fail-over, data loss may occur. The data on the standby node is consistent after fail-over; however, the most recent updates performed prior to the crash could be lost.

Protocol A is most often used in long distance replication scenarios.

Protocol B

Memory synchronous (semi-synchronous) replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node. Normally, no writes are lost in case of forced fail-over.

Protocol C

Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed. As a result, loss of a single node is guaranteed not to lead to any data loss. Data loss is, of course, inevitable even with this replication protocol if both nodes or their storage subsystems are irreversibly destroyed at the same time.
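The completion conditions of the three protocols can be compared side by side in a small sketch. The `WriteProgress` type and its field names are hypothetical and only encode the conditions described above; they are not part of DRBD.

```python
from dataclasses import dataclass


@dataclass
class WriteProgress:
    """How far a single write has travelled (hypothetical model)."""
    local_disk_done: bool = False
    in_tcp_send_buffer: bool = False
    reached_peer: bool = False
    remote_disk_done: bool = False


def is_complete(protocol, progress):
    """Return True once the write counts as completed under `protocol`."""
    if protocol == "A":
        # Asynchronous: local disk write plus packet in the TCP send buffer.
        return progress.local_disk_done and progress.in_tcp_send_buffer
    if protocol == "B":
        # Memory synchronous: local disk write plus packet at the peer node.
        return progress.local_disk_done and progress.reached_peer
    if protocol == "C":
        # Synchronous: both local and remote disk writes confirmed.
        return progress.local_disk_done and progress.remote_disk_done
    raise ValueError(f"unknown protocol: {protocol!r}")
```

A write that has finished locally and is sitting in the send buffer is already complete under protocol A, but not yet under B or C. That gap is why the most recent updates may be lost on forced fail-over with protocol A.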

The choice of replication protocol influences two factors of your deployment: protection and latency. Throughput, by contrast, is largely independent of the replication protocol selected. See Configuring your resource for an example resource configuration which demonstrates replication protocol configuration. It may be used on any system that has IPv4 enabled.

This is equivalent in semantics and performance to IPv4, albeit using a different addressing scheme. SDP uses an IPv4-style addressing scheme. DRBD can use this socket type for very low latency replication. SuperSockets must run on specific hardware which is currently available from a single vendor, Dolphin Interconnect Solutions.

Efficient synchronization

(Re-)synchronization is distinct from device replication. While replication occurs on any write event to a resource in the primary role, synchronization is decoupled from incoming writes. Rather, it affects the device as a whole. Synchronization is necessary if the replication link has been interrupted for any reason, be it due to failure of the primary node, failure of the secondary node, or interruption of the replication link.

Synchronization is efficient in the sense that DRBD does not synchronize modified blocks in the order they were originally written, but in linear order, which has the following consequences: Synchronization is fast, since blocks in which several successive write operations occurred are only synchronized once.

Synchronization is also associated with few disk seeks, as blocks are synchronized according to the natural on-disk block layout.

During synchronization, the data set on the standby node is partly obsolete and partly already updated. This state of data is called inconsistent.

The service continues to run uninterrupted on the active node, while background synchronization is in progress. A node with inconsistent data generally cannot be put into operation, thus it is desirable to keep the time period during which a node is inconsistent as short as possible.
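The two consequences of linear-order synchronization described above (each modified block is sent once, and blocks are visited in on-disk order) can be illustrated with a short sketch. The function name and the use of a Python set in place of DRBD's internal bookkeeping of out-of-sync blocks are assumptions made for illustration.

```python
def blocks_to_sync(write_log):
    """Return the block numbers to synchronize, in linear (on-disk) order.

    write_log: block numbers in the order they were written while the
    replication link was down (hypothetical input for this sketch).
    """
    # A set stands in for the record of out-of-sync blocks: a block that
    # was written several times is still marked only once.
    out_of_sync = set(write_log)
    # Walk the marked blocks in ascending order rather than write order.
    return sorted(out_of_sync)
```

For a write log of `[7, 2, 7, 7, 5, 2]`, six writes collapse into three blocks, synchronized as `[2, 5, 7]`.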

This ensures that a consistent copy of the data is always available on the peer, even while synchronization is running. See Variable sync rate configuration for configuration suggestions with regard to variable-rate synchronization.

Fixed-rate synchronization

In fixed-rate synchronization, the amount of data shipped to the synchronizing peer per second (the synchronization rate) has a configurable, static upper limit.

Based on this limit, you may estimate the expected sync time using the following simple formula:

Figure 2. Synchronization time

tsync is the expected sync time. D is the amount of data to be synchronized, which you are unlikely to have any influence over (this is the amount of data that was modified by your application while the replication link was broken).
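The formula itself did not survive in this copy of the text. The sketch below assumes it is the amount of data divided by the synchronization rate, which follows from the definitions of tsync and D above. The function name, the parameter names, and the choice of MiB and MiB/s as units are assumptions made for illustration.

```python
def estimated_sync_time(data_mib, rate_mib_per_s):
    """Estimate the sync time in seconds as data / rate.

    data_mib: amount of data to synchronize (D), in MiB.
    rate_mib_per_s: configured synchronization rate, in MiB/s
        (assumed to be the divisor in the missing formula).
    """
    if rate_mib_per_s <= 0:
        raise ValueError("synchronization rate must be positive")
    return data_mib / rate_mib_per_s
```

For example, `estimated_sync_time(1024, 32)` gives 32.0 seconds for 1 GiB of modified data at a 32 MiB/s limit.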

See Configuring the rate of synchronization for configuration suggestions with regard to fixed-rate synchronization.

When using checksum-based synchronization, rather than performing a brute-force overwrite of blocks marked out of sync, DRBD reads blocks before synchronizing them and computes a hash of the contents currently found on disk.

It then compares this hash with one computed from the same sector on the peer, and omits re-writing this block if the hashes match. This can dramatically cut down synchronization times in situations where a filesystem re-writes a sector with identical contents while DRBD is in disconnected mode. See Configuring checksum-based synchronization for configuration suggestions with regard to synchronization.
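A minimal sketch of the idea, with both nodes modelled as in-memory lists of blocks: SHA-1 is an arbitrary choice for this illustration, and the function names are hypothetical. In a real deployment the hashes would be exchanged over the network rather than computed side by side.

```python
import hashlib


def block_hash(block):
    """Hash of one block's contents (SHA-1 chosen arbitrarily here)."""
    return hashlib.sha1(block).digest()


def checksum_sync(source, target, out_of_sync):
    """Synchronize marked blocks, skipping those whose hashes already match.

    source, target: lists of bytes objects, one per block (hypothetical
        in-memory stand-ins for the two nodes' disks).
    out_of_sync: iterable of block indices marked out of sync.
    Returns the indices that actually had to be rewritten.
    """
    rewritten = []
    for index in sorted(out_of_sync):
        if block_hash(source[index]) == block_hash(target[index]):
            continue  # identical contents: omit the rewrite
        target[index] = source[index]
        rewritten.append(index)
    return rewritten
```

If blocks 0 and 1 are marked out of sync but only block 1 actually differs, only block 1 is transferred.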

Suspended replication

If properly configured, DRBD can detect if the replication network is congested, and suspend replication in this case. When more bandwidth becomes available, replication automatically resumes and a background synchronization takes place.

Suspended replication is typically enabled over links with variable bandwidth, such as wide area replication over shared connections between data centers or cloud instances. See Configuring congestion policies and suspended replication for details on congestion policies and suspended replication.

On-line device verification

On-line device verification enables users to do a block-by-block data integrity check between nodes in a very efficient manner. Note that efficient refers to efficient use of network bandwidth here, and to the fact that verification does not break redundancy in any way. On-line verification is still a resource-intensive operation, with a noticeable impact on CPU utilization and load average.

It works by one node (the verification source) sequentially calculating a cryptographic digest of every block stored on the lower-level storage device of a particular resource. DRBD then transmits that digest to the peer node (the verification target), where it is checked against a digest of the local copy of the affected block.

If the digests do not match, the block is marked out-of-sync and may later be synchronized.
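The verification pass described above can be sketched as follows. The function name and the in-memory block lists are hypothetical, and SHA-256 is simply one possible cryptographic digest. Only the digests would need to cross the network, which is where the bandwidth efficiency comes from.

```python
import hashlib


def verify(source_blocks, target_blocks):
    """Return the indices of blocks whose digests differ between nodes.

    source_blocks: blocks on the verification source (list of bytes).
    target_blocks: blocks on the verification target (list of bytes).
    Both lists are hypothetical stand-ins for the lower-level devices.
    """
    out_of_sync = []
    for index, (src, tgt) in enumerate(zip(source_blocks, target_blocks)):
        # The source computes a digest of its block and sends it over.
        sent_digest = hashlib.sha256(src).digest()
        # The target compares it with a digest of its own local copy.
        if sent_digest != hashlib.sha256(tgt).digest():
            out_of_sync.append(index)  # marked for later synchronization
    return out_of_sync
```

The returned indices correspond to the blocks that would be marked out-of-sync and may be synchronized later.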

