Learning by reading: How docker cp works

Published on Oct 22, 2020

Many programmers I look up to preach the importance of reading code in addition to just writing it.

I’ve decided to kill two birds with one stone and read the source for the tools that I use. Hopefully I’ll pick up some programming pearls, as well as learn more about the behavior of my tools.

In this post I’ll share what I found from reading the implementation of docker cp. Keep reading to see what I learned about the edge cases it handles, and the design decisions its authors made to keep the implementation clean!

How docker cp works

[Diagrams: copying to a container, and copying from a container]

The docker cp implementation starts with the CLI.

Each Docker subcommand is a Cobra command, which makes it easy to handle flag parsing.

At a high level, the CLI parses the arguments to figure out if it should copy from or to the container.

It then calls the appropriate RPC on the Docker daemon to copy the requested files. This RPC uses a tar archive to represent files.

However, before calling the RPC, it does some preprocessing to make sure the files end up in the expected destination.
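To make the argument parsing concrete, here is a minimal sketch of how a container:path argument could be told apart from a local path. splitArg is a hypothetical helper I wrote for illustration, not the CLI's actual function, and the rules it applies (absolute paths and paths starting with a dot are always local) are simplifying assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitArg splits a docker cp style argument into a container name and a path.
// An empty container name means the argument is a local path.
// Hypothetical helper for illustration only.
func splitArg(arg string) (container, path string) {
	// An absolute local path such as /tmp/a:b never names a container.
	if filepath.IsAbs(arg) {
		return "", arg
	}
	name, rest, found := strings.Cut(arg, ":")
	// No colon, or an explicit relative path like ./a:b, means a local path.
	if !found || strings.HasPrefix(name, ".") {
		return "", arg
	}
	return name, rest
}

func main() {
	for _, arg := range []string{"ubuntu:/etc/hosts", "./notes:v2.txt", "/tmp/file"} {
		container, path := splitArg(arg)
		fmt.Printf("%q -> container=%q path=%q\n", arg, container, path)
	}
}
```

Once both arguments are split this way, knowing which side names a container is enough to pick the copy direction.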

Takeaway: Copying files has a lot of edge cases

The trickiest part of this code, in my opinion, is hidden in archive.PrepareForCopy.

It has to handle different behaviors depending on whether the source and destination are files or directories. For example, if the source is a file, and the destination is a directory, then docker cp copies into the directory, rather than replacing it.

archive.PrepareForCopy alone has a switch statement with 6 branches.
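Here is a rough sketch of the kind of decision involved. It is not the real switch statement: destInfo and resolveDest are names I made up, and it only covers a simplified subset of the cases (for instance, it ignores trailing slashes on the paths).

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// destInfo describes what already exists at the destination (hypothetical type).
type destInfo struct {
	Path   string
	Exists bool
	IsDir  bool
}

// resolveDest returns where the source should end up, or an error for
// combinations that cannot work. Simplified sketch, not Docker's logic.
func resolveDest(srcBase string, srcIsDir bool, dst destInfo) (string, error) {
	switch {
	case dst.Exists && dst.IsDir:
		// Copy into the existing directory, keeping the source's base name.
		return path.Join(dst.Path, srcBase), nil
	case dst.Exists && srcIsDir:
		// The destination is an existing file; a directory cannot replace it.
		return "", errors.New("cannot copy a directory to a file")
	default:
		// Overwrite an existing file, or create the destination.
		return dst.Path, nil
	}
}

func main() {
	into, err := resolveDest("file", false, destInfo{Path: "/the/remote", Exists: true, IsDir: true})
	fmt.Println(into, err)

	_, err = resolveDest("dir", true, destInfo{Path: "/etc/hosts", Exists: true})
	fmt.Println(err)
}
```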

This function also deals with modifying the filenames in the archive so that they end up in the right place in the container. For example, if your local file lives at /my/local/file, and you’re copying to /the/remote/dest, docker cp needs to change the path in the archive to be /the/remote/dest.
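The renaming step can be sketched with Go's standard archive/tar package. This is my own minimal version under simplifying assumptions, not Docker's code: rebaseNames, buildArchive and rebasedNames are hypothetical helpers, and the prefix swap is a plain string replacement that assumes every entry name starts with the old base name.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// rebaseNames copies a tar stream from r to w, swapping the leading oldBase
// of every entry name for newBase. Hypothetical helper, not Docker's code.
func rebaseNames(r io.Reader, w io.Writer, oldBase, newBase string) error {
	tr := tar.NewReader(r)
	tw := tar.NewWriter(w)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break // end of archive
		}
		if err != nil {
			return err
		}
		// Only the name changes; mode, owner and contents are kept.
		hdr.Name = newBase + strings.TrimPrefix(hdr.Name, oldBase)
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return err
		}
	}
	return tw.Close()
}

// buildArchive makes an in-memory archive with one small file per name.
func buildArchive(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		body := []byte("contents of " + name)
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// rebasedNames rebases an in-memory archive and lists the resulting entry names.
func rebasedNames(archive []byte, oldBase, newBase string) ([]string, error) {
	var out bytes.Buffer
	if err := rebaseNames(bytes.NewReader(archive), &out, oldBase, newBase); err != nil {
		return nil, err
	}
	var names []string
	tr := tar.NewReader(&out)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	// The local file is called "file"; the destination wants it called "dest".
	archive, err := buildArchive("file")
	if err != nil {
		panic(err)
	}
	names, err := rebasedNames(archive, "file", "dest")
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```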

If they were willing to make the tool less user friendly, they could have skipped a lot of the validation and preprocessing and let the copy itself error. However, their approach matches the behavior of cp locally, so I’m glad they did the extra work to make the UX nice.

 

Takeaway: The Docker CLI/Daemon interface is good for portability

The most naive implementation of docker cp would just have the CLI perform the copy directly.

However, this would mean that docker cp would only work if the CLI and Docker containers were running on the same machine. By making the copy functionality an RPC on the Docker daemon, the code just works with remote Docker hosts, and on Docker Desktop, where Docker runs inside a VM.

It’s also just a better separation of concerns. The CLI doesn’t care how the copy happens, so the daemon could be running containers with a funky container runtime and the CLI wouldn’t have to change at all.
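A toy version of that separation might look like the sketch below. The copier interface, fakeDaemon and runCopy are all names I invented for illustration; the real Docker client API has a different shape. The point is only that the CLI-side function depends on an interface and never learns where the daemon runs.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// copier is a hypothetical stand-in for the daemon's copy API.
type copier interface {
	CopyTo(container, dstPath string, content io.Reader) error
}

// fakeDaemon records what it receives. A real daemon could be local,
// remote, or inside a VM without the caller noticing.
type fakeDaemon struct {
	received map[string][]byte
}

func (d *fakeDaemon) CopyTo(container, dstPath string, content io.Reader) error {
	data, err := io.ReadAll(content)
	if err != nil {
		return err
	}
	d.received[container+":"+dstPath] = data
	return nil
}

// runCopy plays the role of the CLI: it only knows about the interface.
func runCopy(c copier, container, dstPath string, archive []byte) error {
	return c.CopyTo(container, dstPath, bytes.NewReader(archive))
}

func main() {
	daemon := &fakeDaemon{received: map[string][]byte{}}
	if err := runCopy(daemon, "ubuntu", "/", []byte("tar bytes")); err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", daemon.received["ubuntu:/"])
}
```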

Takeaway: Tar is a good transport format for files

Using the tar format for copying files is perfect for this code.

Using tar meant they didn’t have to use a different data structure for copying directories versus individual files. This makes the code significantly easier to maintain and understand.

Plus, tar is built to preserve file metadata like the owner, permissions, and the file type. This makes implementing archive mode (docker cp -a) trivial.
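Both points are easy to check with Go's standard archive/tar package. The sketch below is my own illustration, not code from Docker: the entry type and the writeArchive and readArchive helpers are hypothetical. It puts a directory and a file into the same stream, then reads the owner, permissions and type back out.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// entry is a hypothetical description of one file or directory.
type entry struct {
	Name  string
	Body  string // left empty for directories
	IsDir bool
	Mode  int64
	UID   int
	GID   int
}

// writeArchive stores files and directories in one tar stream.
func writeArchive(entries []entry) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, e := range entries {
		hdr := &tar.Header{Name: e.Name, Mode: e.Mode, Uid: e.UID, Gid: e.GID}
		if e.IsDir {
			hdr.Typeflag = tar.TypeDir
		} else {
			hdr.Typeflag = tar.TypeReg
			hdr.Size = int64(len(e.Body))
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if !e.IsDir {
			if _, err := tw.Write([]byte(e.Body)); err != nil {
				return nil, err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readArchive recovers every entry, including its metadata.
func readArchive(archive []byte) ([]entry, error) {
	var entries []entry
	tr := tar.NewReader(bytes.NewReader(archive))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return entries, nil
		}
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		entries = append(entries, entry{
			Name: hdr.Name, Body: string(body), IsDir: hdr.Typeflag == tar.TypeDir,
			Mode: hdr.Mode, UID: hdr.Uid, GID: hdr.Gid,
		})
	}
}

func main() {
	in := []entry{
		{Name: "files/", IsDir: true, Mode: 0o755, UID: 1000, GID: 1000},
		{Name: "files/app.html", Body: "<h1>app</h1>", Mode: 0o644, UID: 1000, GID: 1000},
	}
	archive, err := writeArchive(in)
	if err != nil {
		panic(err)
	}
	out, err := readArchive(archive)
	if err != nil {
		panic(err)
	}
	for _, e := range out {
		fmt.Printf("%-16s dir=%-5t mode=%o uid=%d gid=%d\n", e.Name, e.IsDir, e.Mode, e.UID, e.GID)
	}
}
```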

You can see that it uses tar by passing in a tar archive to stdin of docker cp. The following example copies a local directory named “files” into a container.

# Start the container we'll copy into.
$ docker run -d --name ubuntu ubuntu tail -f /dev/null

# See the files locally.
$ ls files
app.html  index.html

# Create a tar archive of the local files, and use `docker cp` to copy it.
$ tar c files/ | docker cp - ubuntu:/

# Confirm the files were copied.
$ docker exec ubuntu ls /files
app.html
index.html

Bonus: Validating arguments with bit operations

I enjoyed the way the code represents the direction of the copy. The cases are so simple that I don’t think it’s strictly necessary, but it makes the code a bit easier to read.

Basically, they use the bit flags pattern for representing the copy direction. This pattern is commonly used in systems code (e.g. for representing file permissions).

Users can either copy to the container, or from the container:

  • docker cp local-file container:remote-file => copyToContainer
  • docker cp container:remote-file local-file => copyFromContainer
  • docker cp container:remote-file-1 container:remote-file-2 => Error

The code defines the directions as constants:

  • fromContainer is 1
  • toContainer is fromContainer << 1 => 2
  • acrossContainers is fromContainer | toContainer => 3

This lets them write this snippet of code (I tweaked it a bit for readability):

var direction copyDirection
if srcIsContainer {
	direction |= fromContainer
}
if dstIsContainer {
	direction |= toContainer
}

switch direction {
case fromContainer:
	return copyFromContainer()
case toContainer:
	return copyToContainer()
case acrossContainers:
	return errors.New("copying between containers is not supported")
}
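If you want to play with the pattern, here is a self-contained version that compiles on its own. It is a sketch rather than the CLI's code: directionOf and describe are hypothetical names, the returned strings stand in for the real copy functions, and I added a default branch (with an error message of my own) for the case where neither path names a container.

```go
package main

import (
	"errors"
	"fmt"
)

type copyDirection int

const (
	fromContainer    copyDirection = 1 << iota         // 1
	toContainer                                        // 2
	acrossContainers = fromContainer | toContainer     // 3
)

// directionOf sets one bit per argument that refers to a container.
func directionOf(srcIsContainer, dstIsContainer bool) copyDirection {
	var direction copyDirection
	if srcIsContainer {
		direction |= fromContainer
	}
	if dstIsContainer {
		direction |= toContainer
	}
	return direction
}

// describe stands in for dispatching to the real copy functions.
func describe(direction copyDirection) (string, error) {
	switch direction {
	case fromContainer:
		return "copy from container", nil
	case toContainer:
		return "copy to container", nil
	case acrossContainers:
		return "", errors.New("copying between containers is not supported")
	default:
		// No bit set: neither argument named a container.
		return "", errors.New("neither path refers to a container")
	}
}

func main() {
	for _, args := range [][2]bool{{true, false}, {false, true}, {true, true}, {false, false}} {
		direction := directionOf(args[0], args[1])
		action, err := describe(direction)
		fmt.Printf("src=%t dst=%t direction=%d action=%q err=%v\n", args[0], args[1], direction, action, err)
	}
}
```

The nice property is that the two booleans collapse into one value, so a single switch covers every combination.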

Conclusion

It was fun to see how docker cp is implemented. The code is easy to read, but handles some complex cases.

I’m planning on making this a regular thing. If there’s a particular project that you think has interesting code, please let me know!

References

How We Cut Our Docker Push Time by 90%

5 common Docker Compose mistakes

Official Docker CP Documentation

Cobra Go package


Published by Kevin Lin
Co-founder, Engineer of Blimp
Kevin Lin is an engineering expert in cloud native tooling. His interest in developer productivity first started with programming language design while attending Berkeley. His focus on cloud native led him to co-founding and building Blimp, a cloud container development platform.