author Kent Overstreet <> 2016-09-21 21:06:24 -0800
committer Kent Overstreet <> 2016-09-21 21:06:24 -0800
commit 63f05187d93d78e6a2349f1eeaee64b4afad0c7a
parent 3b08ffe21ffeda8efa20e3895c77548be1739b33
Bcachefs, encryption updates
2 files changed, 43 insertions, 18 deletions
diff --git a/Bcachefs.mdwn b/Bcachefs.mdwn
index 71e301c..469b98b 100644
--- a/Bcachefs.mdwn
+++ b/Bcachefs.mdwn
@@ -141,7 +141,7 @@ possible.
awhile in favor of making the core functionality production quality -
replication is not currently suitable for outside testing.
- - Encryption
+ - [[Encryption]]
Implementation is finished, and passes all the tests. The blocker on rolling
it out is finishing the design doc and getting outside review (as feedback
@@ -186,6 +186,9 @@ Please ask questions and ask for them to be added here!
* a feature bits field
* bring some structure to the variable length portion, so we can add more
crap later - do it like inode optional fields
+ * on clean shutdown, write current journal sequence number to superblock -
+ help guard against corruption or an encrypted filesystem being tampered
+ with
* More bits (once we have feature bits) for "has this feature ever been used", e.g.
* encryption - if we don't have encrypted data, we don't need to load cyphers
diff --git a/Encryption.mdwn b/Encryption.mdwn
index ec9d85f..6852d10 100644
--- a/Encryption.mdwn
+++ b/Encryption.mdwn
@@ -36,6 +36,10 @@ security and robustness, and is meant to defend against a wider variety of
adversarial models than is typical in existing filesystem level or block level
+In particular, the goal is to be secure even when the attacker controls the
+storage device itself, and can see reads and writes as they happen and return
+arbitrary data from read requests.
## Filesystem vs. directory encryption
We do not currently offer per directory encryption; instead, we take an "encrypt
@@ -61,14 +65,8 @@ everything after the header for that particular metadata write - will not leak.
By virtue of working within a copy on write filesystem with provisions for ZFS
style checksums (that is, checksums with the pointers, not the data), we’re
able to use a modern AEAD style construction. We use ChaCha20 and Poly1305. We
-use the cyphers directly instead of using the kernel AEAD library (and thus
-means there's a bit more in the design that needs auditing).
-The current design uses the same key for both ChaCha20 and Poly1305, but my
-recent rereading of the Poly1305-AES paper seems to imply that the Poly1305 key
-shouldn't be used for anything else. Guidance from actual cryptographers would
-be appreciated here; the ChaCha20/Poly1305 AEAD RFC appears to be silent on the
+use the cyphers directly instead of using the kernel AEAD library. However, we
+do follow pretty closely the approach of [[RFC 7539|]].
Note that ChaCha20 is a stream cypher. This means that it’s critical that we use
a cryptographic MAC (which would be highly desirable anyways), and also avoiding
@@ -96,6 +94,10 @@ key, which is stored in the superblock - also with ChaCha20. The master key is
encrypted with an 8 byte header, so that we can tell if the correct key was
+TODO: Add a field to the superblock specifying the key derivation function, so
+that we can transition to newer KDFs later (e.g. Argon2) or specify cost
### Metadata
Except for the superblock, no metadata in bcache/bcachefs is updated in place -
@@ -166,15 +168,35 @@ sized chunks of data, and we store one checksum/MAC per extent, not per block: a
checksum or MAC might cover up to 64k (extents that aren't checksummed or
compressed may be larger). Nonces are thus also per extent, not per block.
-Currently, the Poly1305 MAC is truncated to 64 bits - due to a desire not to
-inflate our metadata any more than necessary. Guidance from cryptographers is
-requested as to whether this is a reasonable option; do note that the MAC is not
-stored with the data, but is itself stored encrypted elsewhere in the btree. We
-do already have different fields for storing 4 byte checksums and 8 byte
-checksums; it will be a trivial matter to add a field allowing 16 byte checksums
-to be stored, and we will add that anyways - so this isn't a pressing design
-issue, this is just a matter of what the defaults should be and what we should
-tell users.
+By default, for data extents the Poly1305 MAC is truncated to 80 bits, for space
+efficiency reasons. Optionally, the full 128 bit MACs may be stored, at the cost
+of increasing the size of extents by 8 bytes (with 80 bit MACs, an extent with a
+single replica will typically be 32 bytes, or 40 bytes with 128 bit MACs).
+This should be completely safe for the vast majority of use cases. Most uses of
+cryptographic MACs are in networked applications, where an attacker may be able
+to send an unlimited number of forged messages: in that environment, a 64 bit
+MAC is clearly insufficient - if an attacker is able to send 2^32 forgery
+attempts (not a huge number these days), the probability that at least one
+succeeds is about 1 / 2^32 - which cryptographers do not consider remotely safe.
+However, a filesystem is different, even in the case of a completely compromised
+device (say an attacker has compromised the firmware on the disk, and is able to
+return whatever they want when we read a sector). If the MAC doesn't match
+(because the attacker is attempting to forge data), we consider the device to be
+failing and very shortly we're going to stop using it - we won't indefinitely
+retry reads of data that appears to be corrupt. So the attacker gets only a
+small number of attempts (on the order of 10) to forge a particular extent. In
+the very worst case, if we're trying very hard to migrate data off a device that
+appears to be bad, the attacker might get ~10 attempts multiplied by the number
+of extents on the device - but the number of forgery attempts is clearly bounded.
+If the user is in an environment where transient failures/corruption are
+expected and should be tolerated, instead of assuming the device is bad (e.g.
+the disks are accessed over the network, and the network path is known to
+corrupt data), then 128 bit MACs should be used (and in the future we may
+enforce that if the maximum number of read retries is set to more than a small
+number, 128 bit MACs must be used if encryption is in use).
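Editor's note (not part of the commit): the bounded-forgery argument in the added paragraphs can be checked with a little arithmetic. The sketch below assumes an idealized MAC where each forgery attempt against a `b` bit tag succeeds with probability `2^-b`, and uses a union bound over attempts; the real per-attempt bound for truncated Poly1305 also depends on message length, so treat the numbers as rough. `forgery_bound`, `RETRIES_PER_EXTENT` and `EXTENTS_ON_DEVICE` are hypothetical names and values chosen for illustration, not anything from bcachefs.

```python
from fractions import Fraction


def forgery_bound(mac_bits: int, attempts: int) -> Fraction:
    """Union bound on the chance that at least one of `attempts` guesses
    matches an idealized `mac_bits`-bit tag (assumption: 2^-mac_bits each)."""
    return min(Fraction(1), Fraction(attempts, 2 ** mac_bits))


# Networked setting from the text: 2^32 forgery attempts against a 64 bit MAC.
network = forgery_bound(64, 2 ** 32)

# Filesystem setting: "on the order of 10" read retries per extent.
RETRIES_PER_EXTENT = 10
# Hypothetical device holding 2^30 extents, for the worst-case migration scenario.
EXTENTS_ON_DEVICE = 2 ** 30

per_extent = forgery_bound(80, RETRIES_PER_EXTENT)
whole_device = forgery_bound(80, RETRIES_PER_EXTENT * EXTENTS_ON_DEVICE)

print("network, 64 bit MAC:     ", float(network))
print("one extent, 80 bit MAC:  ", float(per_extent))
print("whole device, 80 bit MAC:", float(whole_device))
```

Under these assumptions, even the whole-device worst case stays far below the networked 64 bit figure, which is the point the text makes about the attempt count being bounded.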
#### Extent nonces