% If you want foot of page notes, don't include the endnotes package in the
% usepackage command, below.

% This version uses the latex2e styles, not the very ancient 2.09 stuff.
\documentclass[letterpaper,twocolumn,10pt]{article}
% \usepackage{usenix2019,epsfig,endnotes}
\usepackage{usenix2019,epsfig}
\begin{document}

%don't want date printed
\date{}

%make title bold and 14 pt font (Latex default is non-bold, 16 pt)
\title{\Large \bf bcachefs: A next generation filesystem}

%for single author (just remove % characters)
\author{
{\rm Kent Overstreet}\\
\and
{\rm Second Name}\\
Second Institution
% copy the following lines to add more authors
% \and
% {\rm Name}\\
%Name Institution
} % end author

\maketitle

% Use the following at camera-ready time to suppress page numbers.
% Comment it out when you first submit the paper for review.
\thispagestyle{empty}

\subsection*{Abstract}
bcachefs is a new b-tree based, copy-on-write (CoW), local POSIX filesystem
for Linux. We describe its background and current status, and give an
overview of its key design elements.

\section{Introduction}

\subsection{What is bcachefs?}

bcachefs is a full-featured POSIX filesystem, with a feature set aimed at
competing with ZFS, btrfs and xfs.

Existing features include:
\begin{itemize}
\item Copy on Write
\item Full data checksumming
\item Compression
\item Encryption (AEAD style encryption using ChaCha20 and Poly1305)
\item Multi-device
\item Replication
\item Online grow/shrink and add/remove of devices, and a completely
  flexible data layout
\item Tiering/caching (both writethrough and writeback; the IO stack is
  now policy driven)
\end{itemize}

Upcoming features:
\begin{itemize}
\item Erasure coding (Reed-Solomon, with the ability to add others)
\item Snapshotting
\end{itemize}

\subsection{Status}

bcachefs is already used by a small community of outside early adopters and
testers, and is likely to see production use soon. Upstreaming is hopefully
imminent, as there are no blockers left on the bcachefs side.

\section{What makes bcachefs interesting?}

\begin{itemize}
\item The design is a major departure from existing filesystems. In
  particular, it is constructed more as a layer on top of a generic
  key-value store, with much more consistent handling of metadata and
  on-disk data structures.
\item This simplification is possible because we have a very sophisticated
  and scalable b-tree implementation, inherited from bcache, which has been
  under development for nearly 10 years now.
\end{itemize}

\section{bcachefs design novelties}

\begin{itemize}
\item Auxiliary search tree for searching within a btree node
\end{itemize}

\section{bcachefs design features}

\subsection{Filesystem as RDBMS}

In bcachefs, all metadata is organized into a small number of ``btrees'',
which are effectively tables:

\begin{itemize}
\item Inodes table
\item Extents table
\item Dirents table
\item Xattrs table
\item etc.
\end{itemize}

This is a major departure from most existing filesystems, where most data
structures hang off of the inodes. That isn't an unreasonable way to do
things: it is an effective way to shard most data structures, and
historically those data structures weren't particularly scalable, so you
really wanted that sharding (e.g.\ block based filesystems that used the
double/triple indirect block scheme).

But bcachefs's history meant that we started out with a btree implementation
scalable enough that we no longer needed that sharding. bcache uses a single
btree for indexing every cached extent, regardless of the number of volumes
being cached (and a cached volume in bcache is equivalent to a file in
bcachefs).

This means we have a single unified, iterator based API for manipulating
essentially all filesystem data (everything not stored in the superblock),
and that metadata is all in a single flat namespace. This is a huge win for
the complexity of the filesystem code itself.

Rename, for example, is just a couple of operations on keys in the dirents
btree, done atomically in a single update operation; this is drastically
simpler than in most other filesystems.

Even better, we don't need complex logging to make high level filesystem
operations like create, link and rename atomic. Instead, these operations
make heavy use of the btree's ability to use multiple iterators
simultaneously, and the final update is done by passing a list of
(iterator, new key) pairs to the btree update path. For atomicity, all that
is required is that the btree update path use the same journal reservation
for all the updates to be done. This makes journalling, and in particular
journal replay, drastically simpler than on other contemporary filesystems.
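
The multi-key update described above can be illustrated with a toy model.
The following Python sketch is not bcachefs code, and it makes several
simplifying assumptions: the names \texttt{ToyBtree}, \texttt{ToyFs},
\texttt{Iter}, \texttt{commit} and \texttt{replay} are hypothetical,
dirents are keyed by (directory inode number, name) purely for readability,
a value of \texttt{None} stands for deletion, and a Python list stands in
for the journal. It only aims to show how a rename can reduce to two key
updates that are committed as a single journal entry.

\begin{verbatim}
import bisect
from collections import namedtuple

# Hypothetical iterator: a btree name plus a
# position (key) inside that btree.
Iter = namedtuple("Iter", "btree pos")


class ToyBtree:
    """One 'table': a sorted map of key -> value."""

    def __init__(self):
        self.keys = []  # kept sorted
        self.vals = {}

    def scan(self, start):
        """Yield (key, value) in key order,
        beginning at the first key >= start."""
        i = bisect.bisect_left(self.keys, start)
        for k in self.keys[i:]:
            yield k, self.vals[k]

    def apply(self, key, val):
        """Set key to val; val None deletes key."""
        if val is None:
            if key in self.vals:
                del self.vals[key]
                self.keys.remove(key)
            return
        if key not in self.vals:
            bisect.insort(self.keys, key)
        self.vals[key] = val


class ToyFs:
    def __init__(self):
        self.btrees = {
            "inodes": ToyBtree(),
            "dirents": ToyBtree(),
        }
        # Each journal entry is one atomic batch.
        self.journal = []

    def lookup(self, it):
        return self.btrees[it.btree].vals.get(it.pos)

    def commit(self, updates):
        """Apply (Iter, new value) pairs as a unit.

        The batch is logged as ONE journal entry
        (standing in for one journal reservation)
        before any btree is modified.
        """
        self.journal.append(list(updates))
        for it, val in updates:
            self.btrees[it.btree].apply(it.pos, val)


def rename(fs, dir_inum, old, new):
    """Two dirent updates, one commit."""
    src = Iter("dirents", (dir_inum, old))
    dst = Iter("dirents", (dir_inum, new))
    target = fs.lookup(src)
    if target is None:
        raise FileNotFoundError(old)
    fs.commit([(src, None), (dst, target)])


def readdir(fs, dir_inum):
    """List names by scanning the dirents btree."""
    names = []
    start = (dir_inum, "")
    dirents = fs.btrees["dirents"]
    for (inum, name), _ in dirents.scan(start):
        if inum != dir_inum:
            break
        names.append(name)
    return names


def replay(journal):
    """Rebuild state by re-applying whole entries."""
    fs = ToyFs()
    for batch in journal:
        fs.commit(batch)
    return fs
\end{verbatim}

In this model, replaying any prefix of the journal yields a directory that
contains either the old name or the new name, never both and never neither,
because the two dirent updates share one journal entry. The same
\texttt{scan} and \texttt{commit} calls serve every table, which is the
sense in which the interface is unified. The sketch ignores everything else
a real implementation must handle.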