Discussion: the state of cephfs in giant
Sage Weil
2014-10-13 18:16:31 UTC
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.

What we've been working on:

* better mds/cephfs health reports to the monitor
* mds journal dump/repair tool
* many kernel and ceph-fuse/libcephfs client bug fixes
* file size recovery improvements
* client session management fixes (and tests)
* admin socket commands for diagnosis and admin intervention
* many bug fixes

We started using CephFS to back the teuthology (QA) infrastructure in the
lab about three months ago. We fixed a bunch of stuff over the first
month or two (several kernel bugs, a few MDS bugs). We've had no problems
for the last month or so. We're currently running 0.86 (giant release
candidate) with a single MDS and ~70 OSDs. Clients are running a 3.16
kernel plus several fixes that went into 3.17.


With Giant, we are at a point where we would ask that everyone try
things out for any non-production workloads. We are very interested in
feedback around stability, usability, feature gaps, and performance. We
recommend:

* Single active MDS. You can run any number of standby MDS's, but we are
not focusing on multi-mds bugs just yet (and our existing multimds test
suite is already hitting several).
* No snapshots. These are disabled by default and require a scary admin
command to enable them. Although these mostly work, there are
several known issues that we haven't addressed and they complicate
things immensely. Please avoid them for now.
* Both the kernel client (kernel 3.17 or later) and the userspace clients
(ceph-fuse or libcephfs) are in good working order; use either (example
mount commands below).
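
For anyone kicking the tires, mounting looks roughly like this (the monitor
address, secret file, and mount point below are placeholders):

# kernel client (3.17 or later)
mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# ceph-fuse (picks up /etc/ceph/ceph.conf and the client keyring by default)
ceph-fuse /mnt/cephfs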

The key missing feature right now is fsck (both check and repair). This is
*the* development focus for Hammer.


Here's a more detailed rundown of the status of various features:

* multi-mds: implemented. limited test coverage. several known issues.
use only for non-production workloads and expect some stability
issues that could lead to data loss.

* snapshots: implemented. limited test coverage. several known issues.
use only for non-production workloads and expect some stability issues
that could lead to data loss.

* hard links: stable. no known issues, but there is somewhat limited
test coverage (we don't test creating huge link farms).

* direct io: implemented and tested for kernel client. no special
support for ceph-fuse (the kernel fuse driver handles this).

* xattrs: implemented, stable, tested. no known issues (for both kernel
and userspace clients).

* ACLs: implemented, tested for kernel client. not implemented for
ceph-fuse.

* file locking (fcntl, flock): supported and tested for kernel client.
limited test coverage. one known minor issue for kernel with fix
pending. implementation in progress for ceph-fuse/libcephfs.

* kernel fscache support: implemented. no test coverage. used in
production by adfin.

* hadoop bindings: implemented, limited test coverage. a few known
issues.

* samba VFS integration: implemented, limited test coverage.

* ganesha NFS integration: implemented, no test coverage.

* kernel NFS reexport: implemented. limited test coverage. no known
issues.


Anybody who has experienced bugs in the past should be excited by:

* new MDS admin socket commands to look at pending operations and client
session states. (Check them out with "ceph daemon mds.a help"!) These
will make diagnosing, debugging, and even fixing issues a lot simpler.

* the cephfs-journal-tool, which is capable of manipulating mds journal
state without doing difficult exports/imports and hand-editing with hexedit
(quick examples below).
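
For example (the daemon name and paths are illustrative, and the exact set
of subcommands may differ in your build; the help output is authoritative):

ceph daemon mds.a help
ceph daemon mds.a session ls            # client sessions and their state
ceph daemon mds.a dump_ops_in_flight    # pending MDS operations

cephfs-journal-tool journal inspect
cephfs-journal-tool journal export /tmp/mds-journal-backup.bin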

Thanks!
sage
Wido den Hollander
2014-10-13 18:20:33 UTC
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
* better mds/cephfs health reports to the monitor
* mds journal dump/repair tool
* many kernel and ceph-fuse/libcephfs client bug fixes
* file size recovery improvements
* client session management fixes (and tests)
* admin socket commands for diagnosis and admin intervention
* many bug fixes
We started using CephFS to back the teuthology (QA) infrastructure in the
lab about three months ago. We fixed a bunch of stuff over the first
month or two (several kernel bugs, a few MDS bugs). We've had no problems
for the last month or so. We're currently running 0.86 (giant release
candidate) with a single MDS and ~70 OSDs. Clients are running a 3.16
kernel plus several fixes that went into 3.17.
With Giant, we are at a point where we would ask that everyone try
things out for any non-production workloads. We are very interested in
feedback around stability, usability, feature gaps, and performance. We
A question to clarify this for anybody out there. Do you think it is
safe to run CephFS on a cluster which is doing production RBD/RGW I/O?

Will it be the MDS/CephFS part which breaks, or are there potential issues
due to OSD classes which might cause OSDs to crash due to bugs in CephFS?

I know you can't fully rule it out, but it would be useful to have this
clarified.
--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
Sage Weil
2014-10-13 18:26:50 UTC
Post by Wido den Hollander
Post by Sage Weil
With Giant, we are at a point where we would ask that everyone try
things out for any non-production workloads. We are very interested in
feedback around stability, usability, feature gaps, and performance. We
A question to clarify this for anybody out there. Do you think it is
safe to run CephFS on a cluster which is doing production RBD/RGW I/O?
Will it be the MDS/CephFS part which breaks or are there potential issue
due to OSD classes which might cause OSDs to crash due to bugs in CephFS?
I know you can't fully rule it out, but it would be useful to have this
clarified.
I can't think of any issues that this would cause with the OSDs. CephFS
isn't using any rados classes; just core rados functionality that RGW also
uses.

On the monitor side, there is a reasonable probability of triggering a
CephFS-related health warning. There is also the potential for the
MDSMonitor.cc code to crash the mon, but I don't think we've seen any
problems there recently.

So, probably safe.
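
If you do want to experiment on a cluster that also serves RBD/RGW, keeping
CephFS on its own pools limits the blast radius. Something like the
following should do it on Giant (pool names and PG counts are placeholders;
older releases used 'ceph mds newfs' instead of 'ceph fs new'):

ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph fs new cephfs cephfs_metadata cephfs_data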

sage
Eric Eastman
2014-10-13 19:03:56 UTC
I would be interested in testing the Samba VFS and Ganesha NFS
integration with CephFS. Are there any notes on how to configure these
two interfaces with CephFS?

Eric
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months.
This
Post by Sage Weil
is an update on the current state of things as of Giant.
...
* samba VFS integration: implemented, limited test coverage.
* ganesha NFS integration: implemented, no test coverage.
...
Thanks!
sage
Sage Weil
2014-10-13 20:56:57 UTC
I would be interested in testing the Samba VFS and Ganesha NFS integration
with CephFS. Are there any notes on how to configure these two interfaces
with CephFS?
For samba, based on
https://github.com/ceph/ceph-qa-suite/blob/master/tasks/samba.py#L106
I think you need something like

[myshare]
path = /
writeable = yes
vfs objects = ceph
ceph:config_file = /etc/ceph/ceph.conf
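
Once smbd is restarted with that share defined, a quick sanity check from a
client could be something like (hostname, share, and user here are made up):

smbclient //sambahost/myshare -U user -c 'ls'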

Not sure what the ganesha config looks like. Matt and the other folks at
cohortfs would know more.
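
For what it's worth, a ganesha export using the Ceph FSAL generally has
roughly the following shape; treat it as an unverified sketch rather than a
known-good config:

EXPORT {
    Export_ID = 1;
    Path = "/";
    Pseudo = "/cephfs";
    Access_Type = RW;
    FSAL {
        Name = CEPH;
    }
}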

sage
Amon Ott
2014-10-14 07:31:56 UTC
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
...
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
or libcephfs) clients are in good working order.
Thanks for all the work and especially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our clusters to CephFS soon.

For kernel maintenance reasons, we only want to run longterm stable
kernels, and for performance reasons and because of severe known
problems we want to avoid FUSE. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?

Thanks again,

Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Werner-Voß-Damm 62 Fax: +49 30 99296856
12101 Berlin http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649

Sage Weil
2014-10-14 13:09:10 UTC
Post by Amon Ott
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
...
Post by Sage Weil
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
or libcephfs) clients are in good working order.
Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.
For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?
We haven't been backporting CephFS bug fixes to the stable kernels the
same way we've been doing RBD bugs; it's a bit of a chore. This can be
done retroactively but no promises. Probably 3.14 makes the most sense.
The RHEL7/CentOS7 kernel is also a likely target.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Sage Weil
2014-10-14 14:23:40 UTC
Post by Amon Ott
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
...
Post by Sage Weil
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
or libcephfs) clients are in good working order.
Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.
For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?
There are important bug fixes missing from 3.14. IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they landed in
3.15), and that is not appropriate for a stable series.

They can be backported, but no commitment yet on that :)

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Alphe Salas
2014-10-15 00:16:41 UTC
Hello Sage, the last time I used CephFS it showed strange behaviour when
used in conjunction with an NFS re-export of the CephFS mount point: I
experienced partial, random disappearance of folders in the tree.

According to people on the mailing list it was a kernel module bug (I was
not using ceph-fuse). Do you know if any work has been done recently on
that topic?

best regards

Alphe Salas
IT engineer
Sage Weil
2014-10-15 02:06:41 UTC
This sounds like any number of readdir bugs that Zheng has fixed over the
last 6 months.

sage
Amon Ott
2014-10-15 06:43:42 UTC
Post by Sage Weil
Post by Amon Ott
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
...
Post by Sage Weil
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
or libcephfs) clients are in good working order.
Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.
For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?
There are important bug fixes missing from 3.14. IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they landed in
3.15), and that is not appropriate for a stable series.
They can be backported, but no commitment yet on that :)
If the bugfixes are easily identified in one of your Ceph git branches,
I would even try to backport them myself. Still, I would rather see
someone from the Ceph team with deeper knowledge of the code port them.

IMHO, it would be good for Ceph to have stable support in at least the
latest longterm kernel. No need for new features, but bugfixes should be
there.

Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Werner-Voß-Damm 62 Fax: +49 30 99296856
12101 Berlin http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649
Ric Wheeler
2014-10-15 12:11:10 UTC
Post by Amon Ott
Post by Sage Weil
Post by Amon Ott
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
...
Post by Sage Weil
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
or libcephfs) clients are in good working order.
Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.
For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?
There are important bug fixes missing from 3.14. IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they landed in
3.15), and that is not appropriate for a stable series.
They can be backported, but no commitment yet on that :)
If the bugfixes are easily identified in one of your Ceph git branches,
I would even try to backport them myself. Still, I would rather see
someone from the Ceph team with deeper knowledge of the code port them.
IMHO, it would be good for Ceph to have stable support in at least the
latest longterm kernel. No need for new features, but bugfixes should be
there.
Amon Ott
Long term support and aggressive, tedious backports are what you go to distro
vendors for normally - I don't think that it is generally a good practice to
continually backport anything to stable series kernels that is not a
bugfix/security issue (or else the stable branches rapidly become just a stale
version of the upstream tip :)).

Not meant as a commercial for RH, other vendors also do this kind of thing of
course...

Regards,

Ric

Amon Ott
2014-10-15 13:13:57 UTC
Post by Ric Wheeler
Post by Amon Ott
Post by Sage Weil
Post by Amon Ott
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
...
Post by Sage Weil
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
or libcephfs) clients are in good working order.
Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.
For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?
There are important bug fixes missing from 3.14. IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they landed in
3.15), and that is not appropriate for a stable series.
They can be backported, but no commitment yet on that :)
If the bugfixes are easily identified in one of your Ceph git branches,
I would even try to backport them myself. Still, I would rather see
someone from the Ceph team with deeper knowledge of the code port them.
IMHO, it would be good for Ceph to have stable support in at least the
latest longterm kernel. No need for new features, but bugfixes should be
there.
Amon Ott
Long term support and aggressive, tedious backports are what you go to
distro vendors for normally - I don't think that it is generally a good
practice to continually backport anything to stable series kernels that
is not a bugfix/security issue (or else, the stable branches rapidly
just a stale version of the upstream tip :)).
bugfix/security is exactly what I am looking for.

Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Werner-Voß-Damm 62 Fax: +49 30 99296856
12101 Berlin http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649
Sage Weil
2014-10-15 14:58:19 UTC
Post by Amon Ott
Post by Ric Wheeler
Post by Amon Ott
Post by Sage Weil
Post by Amon Ott
Post by Sage Weil
We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.
...
Post by Sage Weil
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
or libcephfs) clients are in good working order.
Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.
For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?
There are important bug fixes missing from 3.14. IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they landed in
3.15), and that is not appropriate for a stable series.
They can be backported, but no commitment yet on that :)
If the bugfixes are easily identified in one of your Ceph git branches,
I would even try to backport them myself. Still, I would rather see
someone from the Ceph team with deeper knowledge of the code port them.
IMHO, it would be good for Ceph to have stable support in at least the
latest longterm kernel. No need for new features, but bugfixes should be
there.
Amon Ott
Long term support and aggressive, tedious backports are what you go to
distro vendors for normally - I don't think that it is generally a good
practice to continually backport anything to stable series kernels that
is not a bugfix/security issue (or else, the stable branches rapidly
just a stale version of the upstream tip :)).
bugfix/security is exactly what I am looking for.
Right; sorry if I was unclear. We make a point of sending bug fixes to
***@vger.kernel.org but haven't been aggressive with cephfs because
the code is less stable. There will be catch-up required to get 3.14 in
good working order.

Definitely hear you that this is important, just can't promise when we'll
have the time to do it. There's probably a half day's effort to pick out
the right patches and make sure they build properly, and then some time to
feed it through the test suite.

sage
Alphe Salas
2014-10-15 16:47:40 UTC
For a humble Ceph user like me it is really hard to follow which version
of which product will get the changes I require.

Let me explain. My company is specialised in disk recovery, and we need a
flexible, easy-to-maintain, trustworthy way to store the data from our
clients' disks.

We tried the usual approach: JBOD boxes connected to a single server with a
SAS RAID card, with a ZFS mirror to handle replicas and merge the disks into
one big volume. The result is really slow. (We used to run ZFS on Solaris 11
on x86 servers; with OpenZFS on Ubuntu 14.04 the performance is much better,
but still nowhere near Ceph: on a gigabit Ethernet LAN you can get data
transfer between a client and the Ceph cluster of around 80 MB/s, while
client to OpenZFS/Ubuntu is around 25 MB/s.)

Along my path with Ceph I first used CephFS. It worked fine, until I noticed
that parts of the folder tree suddenly and randomly disappeared, forcing a
constant, periodic remount of the partitions.

Then I chose to forget about CephFS and use RBD images. That worked fine
too, until I noticed that RBD replicas were never freed or overwritten: with
replication set to 2 (data plus 1 replica) and an image of 13 TB, after some
write/erase cycles on the same RBD image the overall data use grew to 34 TB
of the 36 TB available on my cluster, so there was a real problem with
"space management". The data part of the RBD image was properly managed,
with old deleted data being overwritten at the OS level, so the only logical
explanation for the growth in overall data use was that the replicas were
never freed.

All along I kept watching the bugs, features, and advances of Ceph. But
those issues are not really Ceph-side; they are in the kernel modules used
as "Ceph clients", so part of each feature or bug fix is delivered in the
ceph-common package (for the server-side mechanics) and the other part has
to come at the kernel level.

For convenience I use Ubuntu, which is not exactly top notch at shipping the
very latest kernel with all the fixed modules.

So when I see this great news about Giant, and the fact that a lot of work
has been done to solve most of the problems we all faced with Ceph, I also
see that it will take around a year or so for those fixes to be
production-available in Ubuntu. There is some inertia there that doesn't
match the pace of the work on Ceph.

People can argue with me, "why do you use Ubuntu?", and the answer is
simple: I have a cluster of 10 machines and 1 proxy. If I need to compile
the latest Ceph and the latest kernel from source, my maintenance time
becomes much bigger, and I am more likely to end up with something that
isn't properly done and a machine that doesn't reboot. I know what I am
talking about: for several months I ran Ceph on Arch Linux, compiling the
kernel and Ceph from source, until the gcc installed on my test server was
too new, a compile option had been removed, and Ceph would no longer
compile. That way of working was discarded because it was not stable enough
for production-level quality.

So, as far as I understand things, I will have the CephFS improvements and
RBD discard support available at the same time by pairing Ceph Giant with
Linux kernel 3.18 and up?

Regards, and thank you again for your hard work. I wish I could do more to
help.


---
Alphe Salas
IT engineer
Thomas Lemarchand
2014-10-14 09:57:11 UTC
Thanks for this information.

I plan to use CephFS on Giant with a production workload, knowing the
risks and keeping a hot backup close at hand. I hope to be able to provide
useful feedback.

My cluster is made of 7 servers (3 mons, 3 OSD hosts with 27 OSDs in
total, and 1 MDS). I use ceph-fuse on the clients.

You wrote about hard links, but what about symlinks? I use some (on
CephFS Firefly) without any problem so far.

Do you suggest something for backing up CephFS? For now I use a simple
rsync, and it works quite well.

Thanks !
--
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager
Sage Weil
2014-10-14 13:11:42 UTC
Post by Thomas Lemarchand
Thanks for theses informations.
I plan to use CephFS on Giant, with production workload, knowing the
risks and having a hot backup near. I hope to be able to provide useful
feedback.
My cluster is made of 7 servers (3mon, 3osd (27 osd inside), 1mds). I
use ceph-fuse on clients.
Cool! Please be careful, and have a plan B. :)
Post by Thomas Lemarchand
You wrote about hardlinks, but what about symlinks ? I use some (on
cephFS firefly) without any problem for now.
Symlinks are simple and cheap; no issues there.
Post by Thomas Lemarchand
Do you suggest something for backup of CephFS ? For now I use a simple
rsync, it works quite well.
rsync is fine. There is some opportunity to do clever things with the
recursive ctime metadata, but nobody has wired it up to any tools yet.
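
If anyone wants to experiment with it, the recursive stats are exposed as
virtual xattrs on directories; something like the following should show
them (the exact attribute names may vary between client versions, so treat
this as a sketch):

getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir   # newest ctime below this dir
getfattr -n ceph.dir.rbytes /mnt/cephfs/some/dir   # total bytes below this dir
getfattr -n ceph.dir.rfiles /mnt/cephfs/some/dir   # total file count below this dir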

sage