Volumes in Docker

Improvements to managing Docker volumes are currently in closed beta testing.

This guide explains what you need to know about working with Volumes in Docker on MedStack Control. We'll look into the specifics of local volumes in Docker, emphasizing their role in managing stateful data within containers.

While local volumes serve a purpose in Docker, developers should carefully evaluate their use. Whenever feasible, opt for managed database servers and object storage to handle critical stateful data, ensuring reliability and ease of maintenance.

Types of Containers

In a containerized environment, a container's interaction with data can be categorized into two main types:

Stateless Containers

Stateless containers are designed to be ephemeral. They don’t maintain any persistent state; if the container is removed and replaced, as happens when a service is redeployed or rescheduled, all local container data is lost. These containers rely on external services (such as databases, caches, or APIs) for any required state.

Stateless containers are ideal for microservices, where scalability and rapid deployment are crucial. They are highly portable and can move between nodes in the cluster efficiently with little-to-no risk. For example, a web server serving static content or distributing requests can be stateless.

Stateful Containers

Stateful containers, on the other hand, need to preserve data across restarts or migrations. Examples include containers for services such as databases (MySQL, PostgreSQL, or MongoDB), file servers, and caching systems.

These containers store critical application data, user sessions, or configuration settings. Ensuring data consistency and durability is essential for stateful containers.

Local Volumes in Docker

Now, let’s focus specifically on local volumes within a Docker Swarm.

Overview

A local volume is a storage location that is mounted to a container and stored on the filesystem of the host node running the container. Any container that mounts the volume can read and/or write its data, and the data persists beyond the container's lifecycle.

Local volumes are suitable for scenarios where data persistence is required but doesn’t need to be shared across nodes.

Capabilities and Limits

  • Data Persistence: Local volumes persist data on the host node even if the container restarts or is rescheduled to another node within the cluster; the data stays on the disk of the node where it was written. MedStack Control's backup system captures a snapshot of all cluster Docker volumes every hour.
  • Node Dependency: Data stored in local volumes is tightly coupled to the node. If the container moves to a different node, it loses access to its original data. Note that the original data is not lost; it remains on the original node.
  • Scaling Considerations: When scaling services (e.g., adding replicas), replicas scheduled on different nodes each get their own local volume. This can lead to data inconsistency if not managed carefully, which is why node labels and service placement constraints can be essential.
  • Node Failures: If the node hosting a local volume fails, the data becomes inaccessible until the node is restored or replaced. A common and effective way to repair a node is to reboot it, which issues a redeploy on the cloud provider.
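
To make the node dependency and scaling points concrete, here is a minimal sketch of a check that flags services mounting a named volume without a placement constraint that ties them to a node. It assumes a stack definition already loaded into a Python dictionary shaped like a Compose file (`services`, `volumes`, `deploy.placement.constraints`). The service names, image names, and the `storage` node label are hypothetical, and MedStack Control may represent service configuration differently.

```python
def named_volume_sources(service):
    """Return the named volumes mounted by a Compose-style service dict."""
    names = []
    for mount in service.get("volumes", []):
        if isinstance(mount, str):
            # Short syntax "source:target[:mode]"; a lone "/path" is anonymous.
            parts = mount.split(":")
            if len(parts) < 2:
                continue
            source = parts[0]
        else:
            # Long syntax {"type": "volume", "source": ..., "target": ...}.
            if mount.get("type") != "volume":
                continue
            source = mount.get("source", "")
        # Sources that look like host paths are bind mounts, not named volumes.
        if source and not source.startswith(("/", ".", "~")):
            names.append(source)
    return names


def is_pinned(service):
    """True if an equality constraint targets a node label, hostname, or ID."""
    placement = service.get("deploy", {}).get("placement", {})
    for constraint in placement.get("constraints", []):
        compact = constraint.replace(" ", "")
        if "==" in compact and compact.startswith(
            ("node.labels.", "node.hostname", "node.id")
        ):
            return True
    return False


def unpinned_stateful_services(stack):
    """Map each service that mounts named volumes but is not pinned to them."""
    flagged = {}
    for name, service in stack.get("services", {}).items():
        volumes = named_volume_sources(service)
        if volumes and not is_pinned(service):
            flagged[name] = volumes
    return flagged


# Hypothetical stack: "db" is pinned by a node label, "uploads" is not.
stack = {
    "services": {
        "db": {
            "image": "postgres:16",
            "volumes": ["pgdata:/var/lib/postgresql/data"],
            "deploy": {"placement": {"constraints": ["node.labels.storage == db"]}},
        },
        "uploads": {
            "image": "example/uploads:latest",
            "volumes": [{"type": "volume", "source": "uploads", "target": "/data"}],
        },
        "web": {"image": "nginx:stable"},
    }
}

print(unpinned_stateful_services(stack))
```

In this sketch the `uploads` service is flagged because it could be rescheduled to a node where its volume is empty. A label constraint may still match more than one node, so the check is a starting point rather than a guarantee.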

Recommendations

Avoid Heavy Reliance on Local Volumes

  • Whenever possible, design your containers to be stateless. Delegate data storage to external services (managed databases, object storage, etc.).
  • Stateless containers are easier to scale, maintain, and recover.

When Using Local Volumes

  • Documentation: Developers should understand the limitations of local volumes and document their usage.
  • Node Affinity: Be aware that data is tied to a specific node. Avoid scenarios where critical services depend solely on local volumes.
  • Monitoring: It is particularly important to monitor the health of nodes hosting stateful containers, since these containers are not as portable as stateless containers. Detect and handle node failures promptly.

Managed Database Servers

MedStack Control clusters support managed databases for MySQL and PostgreSQL. They support replication, backups, and failover mechanisms by default. By electing to use a managed database server instead of stateful containers, teams will reduce the operational burden of managing the cluster and improve data durability.

Object Storage

MedStack Control clusters support Azure Blob storage accounts for storing unstructured data. By electing to store unstructured and large datasets in object storage instead of in Docker volumes, teams will reduce the operational burden of managing the cluster.

Using object storage instead of local volumes can significantly improve cluster performance and reduce CPU load, particularly for large volumes whose datasets do not compress efficiently.

Managing Volumes

Within a cluster, navigate to Manage Docker > Volumes to manage volumes. Volumes are managed in two ways in the cluster:

On the Manager Node

All Docker volumes are created on the Manager node. Data is only written to a local volume on the Manager node if a container running on the Manager node mounts the volume and writes data to it.

On a Worker Node

Docker volumes appear on Worker nodes when a container with a mounted volume runs on the node.

Creating a Docker Volume

To create a Docker volume, click "New Volume" and name the volume. This creates an empty volume on the Manager node, which can then be mounted to services in the cluster.

Removing a Docker Volume

❗️

Use caution

When a volume is deleted from the Manager node, it cannot be mounted by new containers in the cluster.

To remove a Docker volume, click "Delete" for the volume. When a volume is deleted, so is the data within it. MedStack's backup system captures hourly backups of Docker volumes. These backups are not deleted when the volume is deleted, so the volume can be restored at a later date.

Capabilities and Limits

❗️

Anonymous and unused volumes

With this improvement in volume observability, you may notice anonymous volumes (their name appears as a random string) and unused volumes (volumes that aren't mounted to any containers) on nodes.

  • Nodes with Volumes: A table will appear for each node that has one or more volumes on disk. If a node does not have a volume on disk, it will not appear in this list. The node IP address and any attached node labels will appear in the table header.
  • Anonymous Volumes: If a service is misconfigured so that a path the image declares as a volume is not mounted to a named volume, Docker will create an anonymous volume for that path. Anonymous volumes have a random name (e.g., 7893ea883aa3094d909f632d047) and may contain valuable information. If you notice an anonymous volume, we strongly recommend you review your application and service configuration to ensure the cluster is set up as intended.
    ❗️Anonymous volumes are not included in MedStack's backup system.
  • Deleting Volumes: When a volume is deleted, the data in the volume is removed from the node disk. To remove a volume from the cluster and prevent it from being mounted, you must delete the volume on the Manager node.
    A volume can only be deleted on a node if it is not in use. That means the volume must not be mounted to any running or stopped container.
  • Removing Stopped Containers: On nodes that have stopped containers with volume mounts, those volumes still cannot be deleted. Once the stopped containers are removed from the node, the volumes they previously mounted can be deleted on that node.
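
The deletion rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not MedStack's implementation: the input shapes (a list of volume names on one node, plus container records with hypothetical `name`, `state`, and `volumes` keys) are invented for the example. The anonymous-name pattern assumes Docker's usual 64-character hexadecimal names, so a shortened name as displayed in a table would need a different pattern.

```python
import re

# Assumption: anonymous volumes are named with a 64-character hex string.
ANONYMOUS_NAME = re.compile(r"[0-9a-f]{64}")


def looks_anonymous(name):
    """Heuristic check for an anonymous volume name."""
    return ANONYMOUS_NAME.fullmatch(name) is not None


def deletion_report(volume_names, containers):
    """Describe, per volume on one node, whether it can be deleted there."""
    report = {}
    for volume in volume_names:
        users = [c for c in containers if volume in c["volumes"]]
        if not users:
            # Not mounted by any container, running or stopped.
            status = "deletable"
        elif all(c["state"] != "running" for c in users):
            # Only stopped containers hold the mount.
            status = "remove stopped containers first"
        else:
            status = "in use by a running container"
        report[volume] = {"anonymous": looks_anonymous(volume), "status": status}
    return report


# Hypothetical node contents; the last name is a made-up 64-character hex string.
anonymous_name = "0123456789abcdef" * 4
volumes = ["pgdata", "uploads", anonymous_name]
containers = [
    {"name": "db.1", "state": "running", "volumes": ["pgdata"]},
    {"name": "uploads.1", "state": "exited", "volumes": ["uploads"]},
]

report = deletion_report(volumes, containers)
for name, info in report.items():
    print(name[:12], info)
```

Here `pgdata` is held by a running container, `uploads` is blocked only by a stopped container, and the anonymous volume is unused. Because anonymous volumes are not included in backups, an unused one is worth inspecting before it is deleted.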