summaryrefslogtreecommitdiff
path: root/usenix2019.tex
blob: 34557a45ea79702b3303de346907a596c5e8c953 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
% TEMPLATE for Usenix papers, specifically to meet requirements of
%  USENIX '05
% originally a template for producing IEEE-format articles using LaTeX.
%   written by Matthew Ward, CS Department, Worcester Polytechnic Institute.
% adapted by David Beazley for his excellent SWIG paper in Proceedings,
%   Tcl 96
% turned into a smartass generic template by De Clarke, with thanks to
%   both the above pioneers
% use at your own risk.  Complaints to /dev/null.
% make it two column with no page numbering, default is 10 point

% Munged by Fred Douglis <douglis@research.att.com> 10/97 to separate
% the .sty file from the LaTeX source template, so that people can
% more easily include the .sty file into an existing document.  Also
% changed to more closely follow the style guidelines as represented
% by the Word sample file. 

% Note that since 2010, USENIX does not require endnotes. If you want
% foot of page notes, don't include the endnotes package in the 
% usepackage command, below.

% This version uses the latex2e styles, not the very ancient 2.09 stuff.
\documentclass[letterpaper,twocolumn,10pt]{article}
% \usepackage{usenix2019,epsfig,endnotes}
\usepackage{usenix2019,epsfig}

\begin{document}

%don't want date printed
\date{}

%make title bold and 14 pt font (Latex default is non-bold, 16 pt)
\title{\Large \bf bcachefs : A next generation filesystem }

%for single author (just remove % characters)
\author{
{\rm Kent Overstreet}\\
\and
{\rm Second Name}\\
Second Institution
% copy the following lines to add more authors
% \and
% {\rm Name}\\
%Name Institution
} % end author

\maketitle

% Use the following at camera-ready time to suppress page numbers.
% Comment it out when you first submit the paper for review.
\thispagestyle{empty}


\subsection*{Abstract}

bcachefs is a new b-tree based, CoW local POSIX filesystem for Linux. Here we
describe the background, status, and an overview of the key design elements.

\section{Introduction}

\subsection{What is bcachefs?}

bcachefs is a full featured POSIX filesystem, with a featureset aimed at
competing with ZFS, btrfs and xfs. Existing features include:

\begin{itemize}
	\item Copy on Write
	\item Full data checksumming
	\item Compression
	\item Encryption (AEAD style encryption using ChaCha20 and Poly1305)
	\item Multi-device
	\item Replication
	\item Online grow/shrink and add/remove of devices, and a completely
		flexible data layout
	\item Tiering/caching (both writethrough and writeback; IO stack is in
		fact policy driven now)
\end{itemize}
Upcoming features:
\begin{itemize}
	\item Erasure coding (Reed-Solomon, with ability to add others)
	\item Snapshotting
\end{itemize}

\subsection{Status}

bcachefs is in outside use by a small community of early adopters and testers,
and likely soon in a production capacity as well. Upstreaming is hopefully
imminent as there are no blockers left on the bcachefs side.

\section{What makes bcachefs interesting?}

\begin{itemize}
	\item The design is a major departure from existing filesystems. In
		particular it is constructed more as a layer on top of a generic
		key-value store, with much more consistent handling of metadata
		and on disk data structures.
	\item This simplification is possible because we have a very
		sophisticated and scalable b-tree implementation, which has been
		under development for nearly 10 years now, inherited from
		bcache.
\end{itemize}

\section{bcachefs design novelties}

\begin{itemize}
	\item Auxiliary search tree for searching within a btree node

\end{itemize}



\section{bcachefs design fetaures}

\subsection{Filesystem as RDBMS}

In bcachefs, all metadata is organized into a small number of "btree" -
effectively tables:

\begin{itemize}
	\item Inodes table
	\item Extents table
	\item Dirents table
	\item Xattrs table
	\item etc.
\end{itemize}

This is a major departure from most existing filesystems, where most data
structures hang off of the inodes. This isn't an unreasonable way to do things -
it's an effective way to shard most data structures and historically, those data
structures weren't particularly scalable so you really wanted that sharding
(i.e. block based filesystems that used the double/triple indirect block
scheme).

But bcachefs's history meant that we started out with a btree implementation
scalable enough that we no longer needed that sharding - bcache uses a single
btree for indexing every cached extent, regardless of the number of volumes
being cached (and a cached volume in bcache is equivalent to a file in
bcachefs).

This means we have a single unified, iterator based API for manipulating
essentially all filesystem data (everything not stored in the superblock), and
that metadata is all in a single flat namespace.

This is a huge win for the complexity of the filesystem code itself. Rename, for
example, is just a couple of operations on keys in the dirents btree, done
atomically in a single update operation - drastically simpler than most other
filesystems.

Even better, we don't need complex logging to make high level filesystem
operations like create, link and rename atomic. Instead, they make heavy use of
the btree's ability to use multiple iterators simultaneously, and then the final
update is done by passing a list of (iterator, new key) pairs to the btree
update path - for atomicity, all that is required is that the btree update path
use the same journal reservation for all the updates to be done.

This make journalling and in particular journal replay drastically simpler than
on other contemporary filesystems.






A paragraph of text goes here.  Lots of text.  Plenty of interesting
text. \\

% More fascinating text. Features\endnote{Remember to use endnotes, not footnotes!} galore, plethora of promises.\\

\section{This is Another Section}

Some embedded literal typset code might 
look like the following :

{\tt \small
\begin{verbatim}
int wrap_fact(ClientData clientData,
              Tcl_Interp *interp,
              int argc, char *argv[]) {
    int result;
    int arg0;
    if (argc != 2) {
        interp->result = "wrong # args";
        return TCL_ERROR;
    }
    arg0 = atoi(argv[1]);
    result = fact(arg0);
    sprintf(interp->result,"%d",result);
    return TCL_OK;
}
\end{verbatim}
}

Now we're going to cite somebody.  Watch for the cite tag.
Here it comes~\cite{Chaum1981,Diffie1976}.  The tilde character (\~{})
in the source means a non-breaking space.  This way, your reference will
always be attached to the word that preceded it, instead of going to the
next line.

\section{This Section has SubSections}
\subsection{First SubSection}

Here's a typical figure reference.  The figure is centered at the
top of the column.  It's scaled.  It's explicitly placed.  You'll
have to tweak the numbers to get what you want.\\

% you can also use the wonderful epsfig package...
\begin{figure}[t]
\begin{center}
\begin{picture}(300,150)(0,200)
\put(-15,-30){\special{psfile = fig1.ps hscale = 50 vscale = 50}}
\end{picture}\\
\end{center}
\caption{Wonderful Flowchart}
\end{figure}

This text came after the figure, so we'll casually refer to Figure 1
as we go on our merry way.

\subsection{New Subsection}

It can get tricky typesetting Tcl and C code in LaTeX because they share
a lot of mystical feelings about certain magic characters.  You
will have to do a lot of escaping to typeset curly braces and percent
signs, for example, like this:
``The {\tt \%module} directive
sets the name of the initialization function.  This is optional, but is
recommended if building a Tcl 7.5 module.
Everything inside the {\tt \%\{, \%\}}
block is copied directly into the output. allowing the inclusion of
header files and additional C code." \\

Sometimes you want to really call attention to a piece of text.  You
can center it in the column like this:
\begin{center}
{\tt \_1008e614\_Vector\_p}
\end{center}
and people will really notice it.\\

\noindent
The noindent at the start of this paragraph makes it clear that it's
a continuation of the preceding text, not a new para in its own right.


Now this is an ingenious way to get a forced space.
{\tt Real~$*$} and {\tt double~$*$} are equivalent. 

Now here is another way to call attention to a line of code, but instead
of centering it, we noindent and bold it.\\

\noindent
{\bf \tt size\_t : fread ptr size nobj stream } \\

And here we have made an indented para like a definition tag (dt)
in HTML.  You don't need a surrounding list macro pair.
\begin{itemize}
\item[]  {\tt fread} reads from {\tt stream} into the array {\tt ptr} at
most {\tt nobj} objects of size {\tt size}.   {\tt fread} returns
the number of objects read. 
\end{itemize}
This concludes the definitions tag.

\subsection{How to Build Your Paper}

You have to run {\tt latex} once to prepare your references for
munging.  Then run {\tt bibtex} to build your bibliography metadata.
Then run {\tt latex} twice to ensure all references have been resolved.
If your source file is called {\tt usenixTemplate.tex} and your {\tt
  bibtex} file is called {\tt usenixTemplate.bib}, here's what you do:
{\tt \small
\begin{verbatim}
latex usenixTemplate
bibtex usenixTemplate
latex usenixTemplate
latex usenixTemplate
\end{verbatim}
}


\subsection{Last SubSection}

Well, it's getting boring isn't it.  This is the last subsection
before we wrap it up.

\section{Acknowledgments}

A polite author always includes acknowledgments.  Thank everyone,
especially those who funded the work. 

\section{Availability}

It's great when this section says that MyWonderfulApp is free software, 
available via anonymous FTP from

\begin{center}
{\tt ftp.site.dom/pub/myname/Wonderful}\\
\end{center}

Also, it's even greater when you can write that information is also 
available on the Wonderful homepage at 

\begin{center}
{\tt http://www.site.dom/\~{}myname/SWIG}
\end{center}

Now we get serious and fill in those references.  Remember you will
have to run latex twice on the document in order to resolve those
cite tags you met earlier.  This is where they get resolved.
We've preserved some real ones in addition to the template-speak.
After the bibliography you are DONE.

{\footnotesize \bibliographystyle{acm}
\bibliography{../common/bibliography}}


% \theendnotes

\end{document}