cmogstored news
---------------

1.8.1 / 2021-02-13 02:25 UTC
----------------------------

  This release fixes a segfault on some systems/toolchains where
  our per-thread stack size was too small.  Given the prevalence
  of 64-bit systems nowadays, using a small stack is unlikely to
  yield any benefits.

  Users on 32-bit systems who wish to continue with a minimal
  stack should use "ulimit -s" in startup scripts or configure
  their process manager appropriately (e.g. setting the
  "LimitSTACK" directive described in systemd.exec(5)).

  Thanks to Xiao Yu <xyu@automattic.com> for reporting and
  testing at our public mailbox:
  https://yhbt.net/cmogstored-public/CABfxMcW+kb5gwq3pSB_89P49EVv+4UkJXz+mUPQTy19AdrwbAg@mail.gmail.com/T/

1.8.0 / 2020-08-13 20:54 UTC
----------------------------

  devXXX/usage files are emitted properly for systems where
  the mount point can't be resolved.  This is needed for
  multi-device filesystems such as btrfs.

  PUT and DELETE requests now update the in-memory representation
  of these devXXX/usage files, since the 10s update interval may
  be too infrequent for high-traffic situations.

  There is a new "USAGE FILES" section in the manpage documenting
  the changes from 1.7.0 and this release.

  Our public mail archives are now available over IMAPS.
  gnulib is updated to 4e082bffbcc46e68 in the tarball.

1.7.3 / 2020-03-22 00:12 UTC
----------------------------

  Improve RFC 7230 conformance w.r.t. Content-Length and
  Transfer-Encoding handling in PUT requests.  We now
  favor "Transfer-Encoding: chunked" if a Content-Length
  header is also present.  Furthermore, we no longer
  accept Transfer-Encoding values aside from "chunked",
  since we don't support gzip/compress/deflate as described
  in RFC 7230.
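  A minimal sketch of the framing decision described above,
  assuming the header values were already extracted and trimmed.
  body_framing() and its enum are hypothetical names; cmogstored's
  real parser is generated by Ragel and is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <strings.h>

/* How a PUT request body is delimited (hypothetical names). */
enum body_framing {
	FRAMING_NONE,           /* no body framing headers at all */
	FRAMING_CONTENT_LENGTH, /* read exactly Content-Length bytes */
	FRAMING_CHUNKED,        /* decode "Transfer-Encoding: chunked" */
	FRAMING_REJECT          /* unsupported transfer coding: error out */
};

/*
 * te and cl are trimmed header values, or NULL if the header is absent.
 * Transfer-Encoding takes precedence over Content-Length (RFC 7230,
 * section 3.3.3), and "chunked" is the only transfer coding accepted.
 * Transfer codings are case-insensitive, hence strcasecmp.
 */
enum body_framing body_framing(const char *te, const char *cl)
{
	if (te != NULL)
		return strcasecmp(te, "chunked") == 0 ?
		       FRAMING_CHUNKED : FRAMING_REJECT;
	return cl != NULL ? FRAMING_CONTENT_LENGTH : FRAMING_NONE;
}
```

  A request carrying both headers is decoded as chunked, while any
  other coding (e.g. "gzip") is rejected outright.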

1.7.2 / 2020-02-19 01:30 UTC
----------------------------

  s/bogomips.org/yhbt.net/ in all documentation, due to
  bogomips.org expiring.  The tarball is also updated with
  the latest gnulib changes.

1.7.1 / 2019-05-12 00:46 UTC
----------------------------

  This release works around an epoll_pwait bug in Linux 5.0/5.1.
  The kernel bugfix should hit mainline and stable kernels soon,
  but there's no reason for us to care whether errno is EINTR
  or not...

  cf. https://lore.kernel.org/lkml/20190427093319.sgicqik2oqkez3wk@dcvr/
      https://lore.kernel.org/lkml/20190507043954.9020-1-deepa.kernel@gmail.com/

  There are also some minor build/test updates since v1.7.0 (2018-12-18):

        test/mgmt-usage.rb: fix mismatched indentation warning
        add .gitattributes for Ruby files
        test/mgmt_auto_adjust.rb: improve diagnostic messages
        .gitignore: add extra ignores for gnulib in Debian 9
        notify.c: workaround epoll_pwait bug in current Linux 5.0/5.1
        doc: remove mailing list subscription info

1.7.0 / 2018-12-18 04:01 UTC
----------------------------

  The big feature in this release is that "devNNN/usage" files are
  served from memory, allowing up-to-date usage information even
  for unwritable/unreadable filesystems.

  This can also be used to reduce spinups and wear on HDDs.

  "devNNN/usage" files are still updated on the FS by default for
  compatibility with existing HTTP servers, but admins may wish
  to disable updates to them by removing all permissions from
  the "usage" files:

  	chmod 0000 $MOG_DOCROOT/dev*/usage

  Filesystem errors from the sendfile(2) syscalls are also
  logged to syslog.  There's also a bugfix for zombies for
  libkqueue-on-epoll users, but that doesn't affect native
  kqueue users on *BSDs.

  And the usual round of gnulib, minor doc and style updates.

  18 changes since v1.6.0:

        cmogstored.h: remove unused mog_file.mmptr member
        doc: documentation for ioq
        doc: further comment updates around ioq
        build-aux/txt2pre: support '=' in URLs
        test/inherit: fix ambiguous parenthese warning
        test/inherit: stop testing Ruby itself
        doc: update URLs to HTTPS
        compat_sendfile: ensure this works without an offset
        doc/queues.txt: add key point about only retrieving ONE event
        fix trace.h dependency on probes.h
        update to gnulib.git 90f289f249a266b1afb9c63e182f5d979d17df5f
        http_get.c: log filesystem-level errors from sendfile
        serve /dev*/usage requests from memory
        doc: URL updates to reduce redirects and favor HTTPS
        test/inherit.rb: fix syntax error under Ruby 1.8
        update copyrights for 2018 and use SPDX for "GPL-3.0+"
        selfwake: enable self-pipe with kqueue
        http_parser: workaround parsing OOM in Ragel 6.10

1.6.0 / 2016-08-31 03:14 UTC
----------------------------

  There are minor robustness fixes for handling memory allocation
  errors and spawn failures on otherwise-hosed systems.
  These bugfixes will not affect real users unless the system
  is already hosed or badly overtaxed, so there's no real
  need to upgrade.

  There are minor portability improvements and I now test under
  FreeBSD 10.x.

  The iostat test cases are relaxed a bit to account for
  virtualized devices (as iostat is less useful with modern

  17 changes since 1.5.0 (Nov 2015):

        Rakefile: add missing <div> for Atom feed
        test/pwrite-wrap: remove unused variable and comment
        test/pwrite_wrap: squelch unnecessary output
        test/pwrite_wrap: reduce space overhead required
        update copyrights for 2016
        build-aux/txt2pre: drop CGI.pm requirement
        stdin is always redirected to /dev/null
        minor vfork/fork safety fixes
        process: try to handle OOM gracefully
        http_put: gracefully handle path allocation errors
        iostat_process: declare environ extern
        test/mgmt: relax checks for iostat mapping
        gnulib copyright update for 2016
        upgrade: avoid syslog call if execve fails
        rely on gnulib for environ portability
        INSTALL: update latest Debian stable version to 8.x
        README: stop mentioning cgit

1.5.0 / 2015-11-21 01:33 UTC
----------------------------

  A bunch of minor changes; most notable is systemd-style socket
  activation support.  This was easy to add since we've always had
  socket activation support for nginx-style SIGUSR2 upgrades.

  This places no link or runtime dependency on libsystemd, so the
  LISTEN_FDS and LISTEN_PID environment variables may be used in other
  init systems as well.  While I have my own reservations about
  systemd itself, I also strongly believe in using socket activation
  to prevent downtime.  Existing behavior with CMOGSTORED_FD
  (used for SIGUSR2 upgrades) is now documented in the manpage and
  will always be supported.
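  The LISTEN_PID/LISTEN_FDS convention is simple enough to handle
  without libsystemd.  Below is a hypothetical helper
  (listen_fds_count() is not a cmogstored function), assuming the
  systemd rule that inherited listeners start at descriptor 3 and
  that LISTEN_PID must match the receiving process.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

/* First inherited listener descriptor under the systemd convention. */
#define LISTEN_FDS_START 3

/*
 * Returns the number of inherited listeners (descriptors
 * LISTEN_FDS_START .. LISTEN_FDS_START + n - 1), 0 if none were passed
 * to this process, or -1 if the variables are malformed.  The caller
 * passes getpid(); taking it as a parameter keeps the sketch testable.
 */
int listen_fds_count(pid_t self)
{
	const char *pid_s = getenv("LISTEN_PID");
	const char *fds_s = getenv("LISTEN_FDS");
	char *end;
	long pid, n;

	if (pid_s == NULL || fds_s == NULL)
		return 0;

	errno = 0;
	pid = strtol(pid_s, &end, 10);
	if (errno != 0 || end == pid_s || *end != '\0')
		return -1;
	if (pid != (long)self)
		return 0; /* meant for another process (e.g. our parent) */

	errno = 0;
	n = strtol(fds_s, &end, 10);
	if (errno != 0 || end == fds_s || *end != '\0' ||
	    n < 0 || n > INT_MAX - LISTEN_FDS_START)
		return -1;

	/* do not leak the variables into spawned children (e.g. iostat) */
	unsetenv("LISTEN_PID");
	unsetenv("LISTEN_FDS");
	return (int)n;
}
```

  A real caller would use listen_fds_count(getpid()) and then treat
  descriptors 3, 4, ... as already-bound listeners.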

  We've also added vfork support for Linux systems, allowing
  faster spawning of iostat if malloc is using too much memory.

  Behavior changes:

  Bad Range: headers return 416 responses in more cases for invalid
  ranges (e.g. miscalculated ranges such as "1--1"), while
  completely wrong ones (lacking a "bytes=" prefix) are ignored
  entirely, as in nginx.
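  A sketch of that policy for a single "bytes=first-last" range,
  assuming the whole file is served when the result is RANGE_IGNORE.
  parse_range() and its enum are hypothetical.  Multi-range requests
  are not handled and get a 416 here, which may be stricter than
  cmogstored.

```c
#include <stdint.h>
#include <string.h>

enum range_result { RANGE_IGNORE, RANGE_OK, RANGE_416 };

/* Parse decimal digits; returns digits consumed (0 if none or overflow). */
static size_t parse_u64(const char *s, uint64_t *out)
{
	size_t i = 0;
	uint64_t v = 0;

	for (; s[i] >= '0' && s[i] <= '9'; i++) {
		unsigned d = (unsigned)(s[i] - '0');

		if (v > (UINT64_MAX - d) / 10)
			return 0; /* overflow */
		v = v * 10 + d;
	}
	*out = v;
	return i;
}

/* On RANGE_OK, the response covers bytes [*off, *off + *len). */
enum range_result parse_range(const char *hdr, uint64_t size,
                              uint64_t *off, uint64_t *len)
{
	static const char prefix[] = "bytes=";
	uint64_t first = 0, last = 0;
	const char *p;
	size_t n;

	if (strncmp(hdr, prefix, sizeof(prefix) - 1) != 0)
		return RANGE_IGNORE; /* not a byte range: serve whole file */
	p = hdr + sizeof(prefix) - 1;

	if (*p == '-') { /* suffix range "bytes=-N": the last N bytes */
		n = parse_u64(++p, &last);
		if (n == 0 || p[n] != '\0' || last == 0 || size == 0)
			return RANGE_416;
		if (last > size)
			last = size;
		*off = size - last;
		*len = last;
		return RANGE_OK;
	}

	n = parse_u64(p, &first);
	if (n == 0 || p[n] != '-')
		return RANGE_416;
	p += n + 1;
	if (*p == '\0') {
		last = size - 1; /* "bytes=N-": through end of file */
	} else {
		n = parse_u64(p, &last);
		if (n == 0 || p[n] != '\0')
			return RANGE_416; /* e.g. "1--1" */
	}
	if (first >= size || last < first)
		return RANGE_416;
	if (last >= size)
		last = size - 1;
	*off = first;
	*len = last - first + 1;
	return RANGE_OK;
}
```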

  Bugfixes:

  There are also some cleanups to avoid dying on OOM in more places
  on weird systems which trigger OOM.  More work on this is ongoing.

  Also updates to the latest gnulib.git
  commit 71d39c1644762745b94e9449c45bfd716a79a5eb
  ("autoupdate") along with a change which fixes a memory leak when
  people build from cmogstored.git using gnulib
  commit c6148bca89e9465fd6ba3a10d273ec4cb58c2dbe
  or later ("mountlist: add me_mntroot field on Linux machines").

  This memory leak did not affect any released tarballs of cmogstored.
  Note, users building from git (as opposed to the tarball) will
  need gnulib commit 41d1b6c42641a5b9e21486ca2074198ee7909bd7
  ("mountlist: add support for deallocating returned list entries")
  or later (from July 2013).

  There are also various documentation updates and our mailing
  list is now readable over NNTP:

    nntp://news.public-inbox.org/inbox.comp.file-systems.mogilefs.cmogstored

1.5.0rc1 / 2015-11-11 21:24 UTC
-------------------------------

  A bunch of minor changes; most notable is systemd-style socket
  activation support.  This was easy to add since we've always had
  socket activation support for nginx-style SIGUSR2 upgrades.

  This places no link or runtime dependency on libsystemd, so the
  LISTEN_FDS and LISTEN_PID environment variables may be used in other
  init systems as well.  While I have my own reservations about
  systemd itself, I also strongly believe in using socket activation
  to prevent downtime.

  Behavior changes:

  Bad Range: headers return 416 responses in more cases for invalid
  ranges (e.g. miscalculated ranges such as "1--1"), while
  completely wrong ones (lacking a "bytes=" prefix) are ignored
  entirely, as in nginx.

  Bugfixes:

  There are also some cleanups to avoid dying on OOM in more places
  on weird systems which trigger OOM.  More work on this is ongoing.

  Also updates to the latest gnulib.git
  commit f197c2c9e5e0d12c373f26d5b3211809457bc972
  ("intprops: new public macro EXPR_SIGNED")
  along with a change which fixes a memory leak when people
  build from cmogstored.git using gnulib
  commit c6148bca89e9465fd6ba3a10d273ec4cb58c2dbe
  or later ("mountlist: add me_mntroot field on Linux machines").
  This memory leak did not affect any released tarballs of cmogstored.

  shortlog of changes since 1.4.3:

        doc: use "builder" RubyGem to generate Atom feed
        dev.c: fail gracefully on out-of-memory errors
        do not die on OOM when for mgmt paths
        HACKING: update URLs to reduce redirects
        http: return 416 errors in more cases for bad Ranges
        update .gitignores for latest autotools + gnulib
        Rakefile: remove text-only part from the Atom feed
        support systemd-style socket activation via environment
        set TCP listener options on inherited sockets
        doc: add example systemd config files
        use free_mount_entry from gnulib instead of rolling our own
        fix tmpdir dependency for slow Ruby tests
        doc: publish examples directory to website

1.4.3 / 2015-03-09 22:52 UTC
----------------------------

  For all platforms, the startup device scanning thread may not
  handle EINTR properly.  This bug only manifested at startup and
  does not affect running instances.  However, it is readily
  apparent on newer versions of FreeBSD, which support the ppoll
  function.
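  The general pattern for EINTR-safe waiting is sketched below with
  poll(2).  This is an illustration under assumptions, not the actual
  cmogstored fix: poll_retry() is a hypothetical name, and the
  remaining timeout is tracked on CLOCK_MONOTONIC so repeated signals
  cannot stretch the total wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Milliseconds left until *deadline on CLOCK_MONOTONIC (0 if passed). */
static int ms_left(const struct timespec *deadline)
{
	struct timespec now;
	long long ms;

	clock_gettime(CLOCK_MONOTONIC, &now);
	ms = (long long)(deadline->tv_sec - now.tv_sec) * 1000 +
	     (deadline->tv_nsec - now.tv_nsec) / 1000000;
	return ms > 0 ? (int)ms : 0;
}

/*
 * poll(2) wrapper which restarts after EINTR rather than treating a
 * signal as an error.  A negative timeout means wait forever.
 */
int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	struct timespec deadline = { 0, 0 };
	int rc;

	if (timeout_ms > 0) {
		clock_gettime(CLOCK_MONOTONIC, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
	}
	for (;;) {
		rc = poll(fds, nfds, timeout_ms);
		if (rc >= 0 || errno != EINTR)
			return rc;
		if (timeout_ms > 0) /* shrink the wait, never extend it */
			timeout_ms = ms_left(&deadline);
	}
}
```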

  Thanks to Mykola Golub <trociny@FreeBSD.org> for the bug report
  which led to this release.

  For systems lacking epoll_pwait (older GNU/Linux, all *BSDs),
  there is also a bugfix for systems which experience signal spam
  leading to errno clobbering in the main thread.  This bug was
  only noticed due to a bug report against Ruby:

  	https://bugs.ruby-lang.org/issues/10866
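  The errno-clobbering hazard applies to any signal handler that
  writes to a self-pipe: write(2) may fail with EAGAIN and overwrite
  the errno value the interrupted code is about to check.  A minimal
  sketch of the usual fix, with hypothetical names (selfpipe,
  wake_handler(), selfpipe_init()):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int selfpipe[2] = { -1, -1 };

/* Save errno on entry and restore it before returning. */
static void wake_handler(int signo)
{
	int saved_errno = errno;
	char c = (char)signo;
	/* EAGAIN on a full pipe is fine: a wakeup is already pending */
	ssize_t r = write(selfpipe[1], &c, 1);

	(void)r;
	errno = saved_errno;
}

int selfpipe_init(int signo)
{
	struct sigaction sa;
	int i;

	if (pipe(selfpipe) != 0)
		return -1;
	for (i = 0; i < 2; i++) { /* never block inside the handler */
		int fl = fcntl(selfpipe[i], F_GETFL);

		if (fl < 0 || fcntl(selfpipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
			return -1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = wake_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	return sigaction(signo, &sa, NULL);
}
```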

  There is no need to upgrade if 1.4.1 is already running well
  on modern GNU/Linux systems capable of epoll_pwait.  But then
  again nginx-style SIGUSR2 upgrades are transparent to clients.

  shortlog since 1.4.2:

        Makefile.am: fix publish rule for website
        Fix assertion failure during startup
        avoid relying on ppoll as a cancellation point
        preserve errno when inside sig handler for self-pipe

1.4.2 / 2015-03-06 02:18 UTC
----------------------------

  * Makefile.am: gzip README and associated data
  * manpage: update contact and copyright information
  * update copyrights to 2014 (and all contributors)
  * doc/design.txt: add a few more notes on compromises
  * http_dav: log 500 errors from DELETE requests
  * tapset/http_access_log: note CLF differences
  * copyright updates for 2015

1.4.1 / 2014-09-07 02:21 UTC
----------------------------

  The PHP PECL MogileFS extension uses neon to handle WebDAV operations,
  and neon seems to send (valid but unfortunate) headers with empty
  string values, which are now handled correctly.  Thanks to Patrice
  Damezin at Skyrock.com for reporting this bug.
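  An empty value such as "X-Foo:" is syntactically valid HTTP.  A
  hypothetical splitter for one header line (without the trailing
  CRLF) that strips optional whitespace and accepts a zero-length
  value might look like this; it is not cmogstored's Ragel-based
  parser.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "Name: value" into name/value spans.  Spaces and tabs around
 * the value are stripped, and an empty value is accepted.
 * Returns 0 on success, -1 if there is no colon or the name is empty.
 */
int split_header(const char *line, size_t len,
                 const char **name, size_t *name_len,
                 const char **val, size_t *val_len)
{
	const char *colon = memchr(line, ':', len);
	const char *v, *end = line + len;

	if (colon == NULL || colon == line)
		return -1;
	*name = line;
	*name_len = (size_t)(colon - line);

	v = colon + 1;
	while (v < end && (*v == ' ' || *v == '\t'))
		v++;
	while (end > v && (end[-1] == ' ' || end[-1] == '\t'))
		end--;
	*val = v;
	*val_len = (size_t)(end - v); /* may be 0, e.g. "X-Foo:" */
	return 0;
}
```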

  There are also a few minor cleanups.  The latest 2.6.34 stable kernel
  release no longer requires our EPOLL_CTL_MOD race workaround.  There
  are also some test suite updates for future releases of Ruby.
  Bigger changes coming later this year...

  There's also a new public mailing list at:

      cmogstored-public@bogomips.org

  No subscription will ever be necessary to post.
  Subscription is optional via:

      cmogstored-public+subscribe@bogomips.org

  Archives are available at http://bogomips.org/cmogstored-public/

  Eric Wong (11):
        minor cleanups for functions which do not return
        remove old fsck_queue declarations
        svc_dev: calling free does not need the lock
        test/mgmt: lengthen test for iostat watch
        test/http: plug race condition in FIFO test
        test/http_chunked_put: test for gigantic trailer
        update address to public mailing list
        Rakefile: remove freecode/freshmeat references
        Rakefile: shorten ChangeLog dump
        queue_epoll: disable buggy epoll workaround for 2.6.34.15+
        http_common: correctly handle empty header values

1.4.0 / 2014-02-24 03:01 UTC
----------------------------

  bsd_sendfile is now supported on Debian GNU/kFreeBSD systems.
  This release also fixes a compatibility bug with Perl mogstored config
  files where "daemonize = (0|1)" was not supported properly.

  Eric Wong (3):
        check for sys/sendfile.h header instead of __linux__
        allow bsd_sendfile with freebsd-glue on Debian/kFreeBSD
        support "daemonize = 0|1" in the config file

1.3.3 / 2014-02-09 04:28 UTC
----------------------------

  This release fixes build problems with Debian GNU/kFreeBSD support
  (turns out it's been broken for over a year and nobody noticed :x).
  There are also build system upgrades for automake 1.14 and test case
  cleanups, but no changes to any of the core code.  There is no
  need to upgrade unless you're on Debian GNU/kFreeBSD.

1.3.2 / 2013-12-10 22:10 UTC
----------------------------

  This release speeds up graceful shutdown on busy systems such
  as FreeBSD.  There is also a minor resource savings for users
  of the undocumented --worker-processes switch.  There are also
  some minor memory error fixes for test cases (which did not
  affect the daemon itself).

  Upgrading is optional unless you are affected by these fixes.

  Note: GNU/Linux users are encouraged to read the manpage update
  regarding glibc malloc arenas.

  Eric Wong (9):
        selfwake: do share pipe descriptors with workers
        test/chunk-parser-1: fix uninitialized file structures
        test: fix valgrind warnings in test-only C code
        doc: refer to malloc-related environment variables
        thrpool: sleep instead of yield when poking thread
        test/mgmt-usage: relax regexp for ZFS
        m4/.gitignore: bump for newer gnulib
        doc: fix wording in manpage
        doc: fix link to MogileFS homepage

1.3.1 / 2013-10-12 21:45 UTC
----------------------------

  This release fixes a bug which only affects users of the
  undocumented multi-process configuration feature
  (which is also multi-threaded).

  * avoid use-after-free with multi-process setups

    readdir on the same DIR pointer is undefined if DIR was inherited by
    multiple children.  Using the reentrant readdir_r would not have
    helped, since the underlying file descriptor and kernel file handle
    were still shared (and we need rewinddir, too).

    This readdir usage bug existed in cmogstored since the earliest
    releases, but was harmless until the cmogstored 1.3 series.

    This misuse of readdir led to hitting a leftover call to free().
    So this bug only manifested since
    commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b
    (svc: implement top-level by_mog_devid hash)

    Fortunately, these bugs only affect users of the undocumented
    multi-process feature (not just multi-threaded).

1.3.0 / 2013-09-30 08:51 UTC
----------------------------

  There are no changes from 1.3.0rc2.

  For the most part, cmogstored 1.2.2 works well, but 1.3 contains some
  fairly major changes and improvements.

  cmogstored CPU usage may be higher than other servers because it's
  designed to use whatever resources it has at its disposal to
  distribute load to different storage devices.  cmogstored 1.3
  continues this, but it should be safer to lower thread counts
  without hurting performance too much for non-dedicated servers.

  cmogstored 1.3 contains improvements for storage hosts at the
  extreme ends of the performance scale.  For large machines with many
  cores, memory/thread usage is reduced because we had too many acceptor
  threads.  There are more improvements for smaller machines, especially
  those with slow/imbalanced drive speeds and few CPUs.  Some of the
  improvements came from my testing with ancient single-core machines,
  others came from testing on 24-core machines :)

  Major features in 1.3:

  ioq - I/O queues for all MogileFS requests
  ------------------------------------------

  The new I/O queue (ioq) implements the equivalent of AIO channels
  functionality from Perlbal/mogstored.  This feature prevents a
  failing/overloaded disk from monopolizing all the threads in the system.

  Since cmogstored uses threads directly (and not AIO), the common
  (uncontended) case behaves like a successful sem_wait with POSIX
  semaphores.  Queueing+rescheduling only occurs in the contended case
  (unlike with AIO-style APIs, where requests are always queued).  I
  experimented with POSIX semaphores but did not use them, as contention
  would still starve the thread pool.
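  The fast/slow path split can be sketched as a counting semaphore
  whose slow path parks the client instead of blocking the thread.
  All names here (struct ioq, struct waiter, ioq_acquire(),
  ioq_release()) are hypothetical, and the capacity is fixed at init
  rather than tracking aio_threads.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A parked client; real per-client state would be embedded here. */
struct waiter {
	struct waiter *next;
};

struct ioq {
	pthread_mutex_t lock;
	unsigned avail;               /* free slots, starts at capacity */
	struct waiter *head, **tail;  /* FIFO of clients awaiting a slot */
};

void ioq_init(struct ioq *q, unsigned capacity)
{
	pthread_mutex_init(&q->lock, NULL);
	q->avail = capacity;
	q->head = NULL;
	q->tail = &q->head;
}

/*
 * Uncontended: take a slot and return true (like sem_trywait succeeding).
 * Contended: queue the client and return false; the worker thread is
 * free to service clients of other devices meanwhile.
 */
bool ioq_acquire(struct ioq *q, struct waiter *w)
{
	bool ok;

	pthread_mutex_lock(&q->lock);
	ok = q->avail > 0;
	if (ok) {
		q->avail--;
	} else {
		w->next = NULL;
		*q->tail = w;
		q->tail = &w->next;
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/*
 * Give a slot back.  If a client is waiting, the slot passes directly
 * to it and it is returned so the caller can reschedule it.
 */
struct waiter *ioq_release(struct ioq *q)
{
	struct waiter *w;

	pthread_mutex_lock(&q->lock);
	w = q->head;
	if (w != NULL) {
		q->head = w->next;
		if (q->head == NULL)
			q->tail = &q->head;
	} else {
		q->avail++;
	}
	pthread_mutex_unlock(&q->lock);
	return w;
}
```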

  Unlike the old fsck_queue, ioq is based on the MogileFS devid in the URL
  and not the st_dev ID of the actual underlying file.  This is less
  correct from a systems perspective, but should make no difference for
  normal production deployments (which are expected to use one MogileFS
  devid for each st_dev ID) and has several advantages:

  1) testing this feature with mock deploys is easier

  2) we do not require any additional filesystem syscall (open/*stat)
     to look up the ioq based on st_dev, so we can use ioq to avoid
     stalls from slow open/openat/stat/fstatat/unlink/unlinkat syscalls.

  Otherwise, the implementation of this very closely resembles the old
  fsck queue implementation, but is generic across HTTP and sidechannel
  clients.  The existing fsck queue functionality is now implemented using
  ioq.  Thus, fsck queue functionality is mapped by the MogileFS devid and
  not the system st_dev ID as a result of this change.

  One benefit of this feature is the ability to run fewer aio_threads
  safely without worrying about cross-device contention on machines with
  limited resources or few disks (or not solely dedicated to MogileFS
  storage).

  The capacity of these I/O queues is automatically scaled to the number
  of available aio_threads, so they can change dynamically while your
  admin is tuning "SERVER aio_threads = XX".

  However, on a dedicated storage node, running many aio_threads (as is
  the default) should still be beneficial.  Having more threads can keep
  the internal I/O queues of the kernel and storage hardware more
  populated and can improve throughput.

  thread shutdown fixes (epoll)
  -----------------------------

  Our previous reliance on pthreads cancellation primitives left us open
  to a small race condition where I/O events (from epoll) could be lost
  during graceful shutdown or thread reduction via
  "SERVER aio_threads = XX".  We no longer rely on pthreads cancellation
  for stopping threads and instead implement explicit check points for
  epoll.
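  A hypothetical worker loop with an explicit check point is sketched
  below using Linux epoll.  It assumes a shared atomic exit flag and a
  100 ms epoll_wait timeout so idle workers notice the flag;
  cmogstored's actual wakeup mechanism may differ.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Set by whichever thread is shrinking the pool or shutting down. */
static atomic_bool want_exit;

/*
 * Each epoll_wait call retrieves at most ONE event, and the exit flag
 * is only checked between events, so a worker never exits while
 * holding an event it already retrieved (unlike asynchronous
 * cancellation).  Returns the number of events handled.
 */
unsigned long worker_run(int epfd, void (*handle)(void *ptr))
{
	unsigned long handled = 0;
	struct epoll_event ev;

	while (!atomic_load(&want_exit)) { /* the explicit check point */
		int n = epoll_wait(epfd, &ev, 1, 100);

		if (n == 1) {
			handle(ev.data.ptr);
			handled++;
		} else if (n < 0 && errno != EINTR) {
			break; /* unexpected error: let the caller decide */
		}
	}
	return handled;
}
```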

  This did not affect kqueue users, but the code is simpler and more
  consistent across epoll/kqueue implementations.

  Graceful shutdown improvements
  ------------------------------

  The addition of our I/O queueing and use of our custom thread shutdown
  API also allowed us to improve the responsiveness and fairness when the
  process enters graceful shutdown mode.  This improves fairness and
  avoids client-side timeouts when large PUT requests are being issued
  over a fast network to slow disks during graceful shutdown.

  Currently, graceful shutdown remains single-threaded, but it will likely
  become multi-threaded in the future (like normal runtime).

  Miscellaneous fixes and improvements
  ------------------------------------

  Further improved matching for (Linux) device-mapper setups where the
  same device (not symlinks) appears multiple times in /dev.

  aio_threads count is automatically updated when new devices are
  added/removed.  This is currently synced to MOG_DISK_USAGE_INTERVAL, but
  will use inotify (or the kqueue equivalent) in the future.

  HTTP read buffers grow monotonically (up to 64K) and always use aligned
  memory.  This ensures deployments which pass large HTTP headers do not
  trigger unnecessary reallocations.  Deployments which use small HTTP
  headers should notice no memory increase.
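  A sketch of a monotonically growing, aligned read buffer, assuming
  a 64-byte alignment and a 1K starting size (both hypothetical; only
  the 64K cap comes from these notes).  posix_memalign(3) reports
  failure through its return value rather than errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

#define RBUF_ALIGN 64          /* assumed alignment (cache line) */
#define RBUF_INIT 1024         /* assumed first allocation */
#define RBUF_MAX (64 * 1024)   /* growth cap */

struct rbuf {
	size_t capa; /* allocated bytes; never shrinks */
	size_t len;  /* bytes currently buffered */
	char *ptr;
};

/*
 * Ensure room for at least `need` bytes by doubling, capped at RBUF_MAX.
 * Returns 0 on success, -1 if `need` exceeds the cap or allocation
 * fails (the existing buffer is left intact in both cases).
 */
int rbuf_reserve(struct rbuf *rb, size_t need)
{
	size_t capa = rb->capa ? rb->capa : RBUF_INIT;
	void *p;

	if (need <= rb->capa)
		return 0;
	if (need > RBUF_MAX)
		return -1;
	while (capa < need)
		capa *= 2;
	if (capa > RBUF_MAX)
		capa = RBUF_MAX;
	if (posix_memalign(&p, RBUF_ALIGN, capa) != 0)
		return -1;
	if (rb->len)
		memcpy(p, rb->ptr, rb->len); /* keep a partially read header */
	free(rb->ptr);
	rb->ptr = p;
	rb->capa = capa;
	return 0;
}
```

  Small requests stay at the initial size; only clients that actually
  send large headers pay for a bigger buffer.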

  Acceptor threads are now limited to two per process instead of being
  scaled to CPU count.  This avoids excessive threads/memory usage and
  contention of kernel-level mutexes for large multi-core machines.

  The gnulib version used for building the tarball is now included in the
  tarball for ease-of-reproducibility.

  Additional tests for uncommon error conditions using the fault-injection
  capabilities of GNU ld.

  The "shutdown" command over the sidechannel is more responsive for epoll
  users.

  Improved reporting of failed requests during PUT requests.  Again, I run
  MogileFS instances on some of the most horrible networks on the planet[2]

  fix LIB_CLOCK_GETTIME linkage on some toolchains.

  "SERVER mogstored.persist_client = (0|1)" over the sidechannel is supported
  for compatibility with Perlbal/mogstored

  The Status: header is no longer returned on HTTP responses.  All known
  MogileFS clients parse the HTTP status response correctly without the
  need for the Status: header.  Neither Perlbal nor nginx set the Status:
  header on responses, so this is unlikely to introduce incompatibilities.
  The Status: header was originally inherited from HTTP servers which had
  to deal with a much larger range of (non-compliant) clients.

1.3.0rc2 / 2013-09-03 09:05 UTC
-------------------------------

  The Status: header is no longer returned on HTTP responses.  All known
  MogileFS clients parse the HTTP status response correctly without the
  need for the Status: header.  Neither Perlbal nor nginx set the Status:
  header on responses, so this is unlikely to introduce incompatibilities.
  The Status: header was originally inherited from HTTP servers which had
  to deal with a much larger range of (non-compliant) clients.

  SystemTap support is mostly fleshed out.  There are some bundled awk
  scripts which should make better sense of the all.stp which logs just
  about everything.

  Raising aio_threads now correctly increases ioq capacity.  This
  regression was only introduced in the 1.3.0 rc series, as ioq
  was not in 1.2.x.

1.3.0rc1 / 2013-07-14 02:32 UTC
-------------------------------

  For the most part, cmogstored 1.2.2 works well, but 1.3 contains some
  fairly major changes and improvements.

  cmogstored CPU usage may be higher than other servers because it's
  designed to use whatever resources it has at its disposal to distribute
  load to different storage devices.  cmogstored 1.3 will continue this,
  but it should be safer to lower thread counts without hurting
  performance too much for non-dedicated servers.

  Unfortunately, the minor, Linux-only bug affecting 1.2.2 for (uncommon)
  thread shutdowns required some fairly intrusive changes to fix, so I'm
  not sure if releasing a 1.2.3 is worth it.  If you're happy with 1.2.x,
  I recommend marking the host down via mogadm before lowering
  "SERVER aio_threads = XX" or sending SIGQUIT to cmogstored.  But
  I think thread shutdown is uncommon enough to not affect normal
  deployments.

  cmogstored 1.3 will contain improvements for storage hosts at the
  extreme ends of the performance scale.  For large machines with many
  cores, memory/thread usage is reduced because we had too many acceptor
  threads.  There are more improvements for smaller machines, especially
  those with slow/imbalanced drive speeds and few CPUs.  Some of the
  improvements came from my testing with ancient single-core machines,
  others came from testing on 24-core machines :)

  The SystemTap tracing work is still in-progress (although the 1.3 cycle
  was originally intended to focus on this :x).  I expect the remaining
  changes to be non-intrusive and will work on them through the RC cycle.

  Major features in 1.3:

  ioq - I/O queues for all MogileFS requests
  ------------------------------------------

  The new I/O queue (ioq) implements the equivalent of AIO channels
  functionality from Perlbal/mogstored.  This feature prevents a
  failing/overloaded disk from monopolizing all the threads in the system.

  Since cmogstored uses threads directly (and not AIO), the common
  (uncontended) case behaves like a successful sem_wait with POSIX
  semaphores.  Queueing+rescheduling only occurs in the contended case
  (unlike with AIO-style APIs, where requests are always queued).  I
  experimented with POSIX semaphores but did not use them, as contention
  would still starve the thread pool.

  Unlike the old fsck_queue, ioq is based on the MogileFS devid in the URL
  and not the st_dev ID of the actual underlying file.  This is less
  correct from a systems perspective, but should make no difference for
  normal production deployments (which are expected to use one MogileFS
  devid for each st_dev ID) and has several advantages:

  1) testing this feature with mock deploys is easier

  2) we do not require any additional filesystem syscall (open/*stat)
     to look up the ioq based on st_dev, so we can use ioq to avoid
     stalls from slow open/openat/stat/fstatat/unlink/unlinkat syscalls.

  Otherwise, the implementation of this very closely resembles the old
  fsck queue implementation, but is generic across HTTP and sidechannel
  clients.  The existing fsck queue functionality is now implemented using
  ioq.  Thus, fsck queue functionality is mapped by the MogileFS devid and
  not the system st_dev ID as a result of this change.

  One benefit of this feature is the ability to run fewer aio_threads
  safely without worrying about cross-device contention on machines with
  limited resources or few disks (or not solely dedicated to MogileFS
  storage).

  The capacity of these I/O queues is automatically scaled to the number
  of available aio_threads, so they can change dynamically while your
  admin is tuning "SERVER aio_threads = XX".

  However, on a dedicated storage node, running many aio_threads (as is
  the default) should still be beneficial.  Having more threads can keep
  the internal I/O queues of the kernel and storage hardware more
  populated and can improve throughput.

  thread shutdown fixes (epoll)
  -----------------------------

  Our previous reliance on pthreads cancellation primitives left us open
  to a small race condition where I/O events (from epoll) could be lost
  during graceful shutdown or thread reduction via
  "SERVER aio_threads = XX".  We no longer rely on pthreads cancellation
  for stopping threads and instead implement explicit check points for
  epoll.

  This did not affect kqueue users, but the code is simpler and more
  consistent across epoll/kqueue implementations.

  Graceful shutdown improvements
  ------------------------------

  The addition of our I/O queueing and use of our custom thread shutdown
  API also allowed us to improve the responsiveness and fairness when the
  process enters graceful shutdown mode.  This improves fairness and
  avoids client-side timeouts when large PUT requests are being issued
  over a fast network to slow disks during graceful shutdown.

  Currently, graceful shutdown remains single-threaded, but it will likely
  become multi-threaded in the future (like normal runtime).

  Miscellaneous fixes and improvements
  ------------------------------------

  Further improved matching for (Linux) device-mapper setups where the
  same device (not symlinks) appears multiple times in /dev.

  aio_threads count is automatically updated when new devices are
  added/removed.  This is currently synced to MOG_DISK_USAGE_INTERVAL, but
  will use inotify (or the kqueue equivalent) in the future.

  HTTP read buffers grow monotonically (up to 64K) and always use aligned
  memory.  This ensures deployments which pass large HTTP headers do not
  trigger unnecessary reallocations.  Deployments which use small HTTP
  headers should notice no memory increase.

  Acceptor threads are now limited to two per process instead of being
  scaled to CPU count.  This avoids excessive threads/memory usage and
  contention of kernel-level mutexes for large multi-core machines.

  The gnulib version used for building the tarball is now included in the
  tarball for ease-of-reproducibility.

  Additional tests for uncommon error conditions using the fault-injection
  capabilities of GNU ld.

  The "shutdown" command over the sidechannel is more responsive for epoll
  users.

  Improved reporting of failed requests during PUT requests.  Again, I run
  MogileFS instances on some of the most horrible networks on the planet[2]

  fix LIB_CLOCK_GETTIME linkage on some toolchains.

  "SERVER mogstored.persist_client = (0|1)" over the sidechannel is supported
  for compatibility with Perlbal/mogstored

1.2.2 / 2013-05-11 23:04 UTC
----------------------------

  This is a minor maintenance release; there is no need to upgrade unless
  a) your gcc defaults to -march=i386 (e.g. 32-bit CentOS 5)
  b) your device names include '-' (e.g. Linux device mapper users)

  There are also some minor doc updates to clarify tarball vs git
  installation and a trivial error-handling fix which should not
  affect any current users.

  Eric Wong (6):
        build: add check for GCC atomics
        alloc: posix_memalign does not set errno
        iostat_parser: allow '-' for device names
        test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
        INSTALL: clarify between starting from tarball vs git
        INSTALL: update versions and URLs

  cmogstored 1.3 will have some fairly intrusive internal changes
  and cleanups to make it easier for users to trace and diagnose
  system and network problems.

1.2.1 / 2013-03-04 01:33 UTC
----------------------------

  This release only fixes an assertion failure during graceful shutdown
  while MogileFS fsck is running with checksumming enabled.

  This only affects users running fsck with checksumming enabled during a
  graceful shutdown of cmogstored.  For upgrading cmogstored it is
  recommended to:

  1) stop fsck on the trackers (via "mogadm fsck stop")
  2) wait for all tracker queues to drain and stop sending
     fsck traffic to the affected host.  You may wish to
     "!want 0 fsck" on all your trackers and wait for the
     fsck workers to stop.
  3) upgrade cmogstored (in place upgrade works)

  There are also several code comment updates for internal
  components of cmogstored which may interest potential hackers.

1.2.0 / 2013-02-18 23:39 UTC
----------------------------

  This release supports nginx-style binary upgrades via SIGUSR2.
  The behavior of this process should match that of nginx:
  http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly
  SIGWINCH and SIGHUP are currently no-ops and may match nginx behavior
  in the future.  They are not required for binary upgrades.

  Slow/unreliable mount points (if you have them) should have less effect
  on iostat sidechannel clients once the process is running.  Startup is
  still slow with unreliable mount points, unfortunately.

  Error handling is now graceful for thread creation failures and
  systems lacking stdio memstream support (FreeBSD).

1.1.0 / 2013-01-18 10:54 UTC
----------------------------

  cmogstored now works around an EPOLL_CTL_MOD race condition
  in old kernels.  This workaround is unneeded and disabled on
  Linux v3.0.59+, v3.2.37+, v3.4.26+, v3.5.7.3+, v3.7.3+ and v3.8+.

  FreeBSD users should no longer see ECONNRESET errors on
  close(2).

  Unnecessary mkdir/mkdirat syscalls are optimistically avoided on PUT.
  MSG_MORE is used (instead of TCP_CORK) on Linux to avoid extra
  syscalls.  We also avoid POSIX_FADV_SEQUENTIAL when returning
  small responses.
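  The optimistic mkdir avoidance can be illustrated with a minimal
  sketch: try to create the file first, and only call mkdirat(2) if
  that fails with ENOENT.  open_put() is a hypothetical name, this is
  not cmogstored's code, and it only creates a single missing
  directory level.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical sketch: open `path` (relative to `dirfd`) for writing,
 * assuming its parent directory `parent` usually exists already.
 * mkdirat() is only attempted when the first openat() fails with
 * ENOENT, so the common case costs a single syscall.
 */
static int open_put(int dirfd, const char *parent, const char *path)
{
	int fd = openat(dirfd, path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd >= 0 || errno != ENOENT)
		return fd;

	/* parent is missing; EEXIST means a concurrent PUT created it */
	if (mkdirat(dirfd, parent, 0755) != 0 && errno != EEXIST)
		return -1;

	return openat(dirfd, path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```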

1.0.0 / 2012-12-13 08:38 UTC
----------------------------

  FreeBSD support is improved:
  - ZFS support (tested on FreeBSD 9.0)
  - Fix disk usage report on FSes w/ small fragment size (ZFS)
  - faster graceful shutdown on kqueue()-based systems.
  - systems lacking open_memstream() no longer misreport OOM on SIGQUIT
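  For reference on the disk usage fix: POSIX statvfs(3) defines
  f_blocks, f_bfree and f_bavail in units of f_frsize (the fragment
  size), not f_bsize, and the two can differ on filesystems such as
  ZFS.  Below is a minimal sketch of computing usage under that rule.
  struct fs_usage and fs_usage_get() are hypothetical names, and this
  is not cmogstored's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/statvfs.h>

struct fs_usage {
	uint64_t total; /* bytes */
	uint64_t used;  /* bytes */
	uint64_t avail; /* bytes available to unprivileged users */
};

/* Fill `out` for the filesystem containing `path`; -1 + errno on error */
static int fs_usage_get(const char *path, struct fs_usage *out)
{
	struct statvfs v;
	uint64_t frsize;

	if (statvfs(path, &v) != 0)
		return -1;

	/* block counts are in f_frsize units; defensively avoid 0 */
	frsize = v.f_frsize ? v.f_frsize : v.f_bsize;
	out->total = (uint64_t)v.f_blocks * frsize;
	out->used = (uint64_t)(v.f_blocks - v.f_bfree) * frsize;
	out->avail = (uint64_t)v.f_bavail * frsize;
	return 0;
}
```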

  Thanks to Ask Bjørn Hansen for helping with the ZFS support.

  Linux:
  - several bugfixes for handling of out-of-FD situations

  There are also minor documentation and build improvements.

0.9.0 / 2012-12-05 01:15 UTC
----------------------------

  This release handles out-of-disk-space errors from PUT and usage
  file generation more gracefully.  Failed PUTs due to client-side
  disconnects are also logged with additional debugging information.
  There are also minor internal cleanups for shutdowns.

0.8.0 / 2012-11-14 21:34 UTC
----------------------------

  HTTP connections remain persistent after a failed Content-MD5 check;
  this prevents the MogileFS monitor from generating excessive TIME-WAIT
  connections.

  Linux only: Idle HTTP connections are automatically closed under FD
  pressure from systems with too many trackers.  This adds zero
  overhead unless the process runs out of file descriptors.

  "server aio_threads = <digit>" is supported and attempts to mimic the
  Perlbal interface.  There is no error reporting nor reporting of the
  current thread count.  Requests are silently capped in the 1-100
  (inclusive) range.
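  The silent capping described above amounts to a clamp.  A minimal
  sketch, assuming the value is parsed with strtoul(3); the function
  name aio_threads_clamp() is hypothetical and cmogstored's real
  parser may differ.

```c
#include <stdlib.h>

/*
 * Hypothetical sketch: parse the <digit> argument of
 * "server aio_threads = <digit>" and silently clamp it into the
 * inclusive 1..100 range; no error is reported to the client.
 */
static unsigned long aio_threads_clamp(const char *digits)
{
	/* overflow yields ULONG_MAX, which the upper bound catches */
	unsigned long n = strtoul(digits, NULL, 10);

	if (n < 1)
		return 1;
	if (n > 100)
		return 100;
	return n;
}
```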

  The "shutdown" command from Perlbal is also supported.

  The Content-Range response header (for partial requests) was missing a
  '\r' and thus not compliant with extremely strict HTTP parsers.
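  For illustration, a correctly terminated Content-Range header ends
  in CRLF ("\r\n"), not a bare "\n".  This is a minimal sketch using the
  "bytes first-last/complete" form from RFC 7233;
  content_range_fmt() is a hypothetical helper, not cmogstored's code.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the Content-Range header of a
 * 206 Partial Content response.  Returns snprintf()'s result, the
 * length that would have been written, so callers can detect
 * truncation by comparing it against `len`.
 */
static int content_range_fmt(char *buf, size_t len,
			     unsigned long long first,
			     unsigned long long last,
			     unsigned long long total)
{
	/* every HTTP header line must end in CRLF */
	return snprintf(buf, len, "Content-Range: bytes %llu-%llu/%llu\r\n",
			first, last, total);
}
```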

0.7.0 / 2012-10-17 00:06 UTC
----------------------------

  This release fixes a boundary error in Content-MD5 trailer
  reading+parsing when receiving chunked PUT requests of certain
  sizes.  This bug was found while attempting to upload a
  198689228 byte file using 16K chunks (using the Ruby
  mogilefs-client 3.4.0 to send the Content-MD5 trailer).

  There are also minor cleanups, new test cases and better error
  reporting for ENOSPC errors.

0.6.0 / 2012-10-04 09:05 UTC
----------------------------

  This release fixes a concurrency assertion failure affecting
  sidechannel+checksum users.  This bug has existed since 0.2.0, but was
  not noticed without the concurrency simplification in 0.5.0.

  There are additional test cases for checksumming and gnulib updates
  for the tarball.

0.5.0 / 2012-09-10 01:14 UTC
----------------------------

  * I/O utilization is now reported correctly when multiple
    MogileFS devices share the same local filesystem

  * attempt to reduce client-side syscalls on low-latency networks
    with TCP_NOPUSH/TCP_CORK

  * remove a tiny chance of starvation when sharing work between threads
    while improving theoretical fairness

  * minor code cleanups and gnulib updates

  * acceptor threads push directly into the event queue to avoid
    poorly-defined TCP_DEFER_ACCEPT/accept filter semantics

  * acceptor threads no longer touch the FS, so they are now scaled
    to CPU count, not filesystem count

  * absolute URIs are now supported in HTTP/1.1 requests
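  Supporting absolute URIs means accepting a request-target such as
  "http://host:port/dev1/..." in addition to the usual "/dev1/...".
  Here is a minimal sketch of extracting the path, assuming only the
  "http" scheme matters.  request_path() is a hypothetical name, and
  treating a path-less absolute URI as an error is a simplification.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical sketch: return a pointer to the path of an HTTP/1.1
 * request-target.  Origin-form ("/path") is returned unchanged;
 * absolute-form ("http://host:port/path") has its scheme and authority
 * skipped.  Returns NULL for unsupported or path-less targets.
 */
static const char *request_path(const char *target)
{
	static const char scheme[] = "http://";
	const char *p;

	if (target[0] == '/')
		return target; /* origin-form */

	/* schemes are case-insensitive */
	if (strncasecmp(target, scheme, sizeof(scheme) - 1) != 0)
		return NULL;

	/* skip the authority (host[:port]) up to the first '/' or '?' */
	p = target + sizeof(scheme) - 1;
	while (*p != '\0' && *p != '/' && *p != '?')
		p++;

	return *p == '/' ? p : NULL;
}
```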

0.4.0 / 2012-05-19 00:37 UTC
----------------------------

  * The kqueue code path now requires no more syscalls than the epoll
    code path on *BSDs.

  * avoids malloc() usage after fork() (for iostat(1)).  This works
    around some malloc() implementations which fail to reinitialize
    malloc locks after fork().

  * iostat(1) is no longer spawned for HTTP-only deployments.

  * usage files are no longer created in HTTP-only deployments.

  * FIONBIO + ioctl() is no longer required; we fall back to the
    slower (but POSIX) fcntl() equivalent if FIONBIO isn't available.

  * For testers without native epoll or kqueue support, the current
    stable branch of libkqueue (r549) should work, as should future
    releases.
    - svn://mark.heily.com/libkqueue/branches/stable
    - http://mark.heily.com/project/libkqueue
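  The FIONBIO fallback can be sketched as a compile-time choice: one
  ioctl() where FIONBIO exists, otherwise the POSIX fcntl()
  read-modify-write, which costs an extra syscall.  set_nonblock() is
  a hypothetical helper, and the sketch assumes <sys/ioctl.h> is
  available even on systems lacking FIONBIO.

```c
#include <fcntl.h>
#include <sys/ioctl.h>

/* Hypothetical helper: put `fd` into non-blocking mode; 0 on success */
static int set_nonblock(int fd)
{
#ifdef FIONBIO
	int on = 1;

	/* one syscall, no need to read the existing flags */
	return ioctl(fd, FIONBIO, &on);
#else
	/* POSIX fallback: F_GETFL + F_SETFL (two syscalls) */
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
#endif
}
```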

  Users on GNU/Linux should notice no changes from the last release
  unless they're using HTTP-only.

  I look forward to hearing from our first GNU/Hurd users!

0.3.0 / 2012-03-27 01:13 UTC
----------------------------

  This release adds support for SHA-1 digests over the sidechannel.
  There are also minor cleanups to signal handling.

0.2.2 / 2012-03-20 01:42 UTC
----------------------------

  This release fixes build errors on glibc 2.5 - 2.9 where
  _ATFILE_SOURCE is required and defined by _GNU_SOURCE.

0.2.1 / 2012-03-20 00:58 UTC
----------------------------

  This release fixes build errors on FreeBSD and Debian GNU/kFreeBSD
  systems.  There are also minor, inconsequential cleanups but
  no other changes.  GNU/Linux users may not notice any difference
  between this release and 0.2.0.

0.2.0 / 2012-03-17 23:36 UTC
----------------------------

  * Graceful shutdown support (via SIGQUIT).  This prevents process
    termination from breaking outstanding requests but will drop
    idle, persistent HTTP connections.  cmogstored stops accepting
    new connections ASAP, so it's possible to start a new cmogstored
    process (or switch back to regular Perl mogstored) almost
    immediately after sending SIGQUIT.

  * PUT creates missing directories (except for the toplevel, like
    mogstored).  MKCOL is now disabled, as forcing PUT to create
    missing directories speeds up "create_close" in MogileFS as the
    tracker no longer has to ensure directories exist.

  * Active clients have thread affinity; this prevents per-client data
    structures from ping-ponging between cores and caches.
    Idle clients that become active still retain the ability
    to migrate to _any_ idle thread and stay on it as long as it's
    active and there are free threads for other clients.

  * MD5 + fsck scheduling improvements for checksum testers.
    This limits fsck MD5 requests to one per-device.  A similar
    patch is also included with the proposed checksum extensions
    to MogileFS.  On Linux systems using the CFQ I/O scheduler,
    we drop the IO priority temporarily in an effort to prevent
    fsck traffic from impacting normal traffic.

  * Removed hard-to-test/support ENOSYS fallback support.  Building
    cmogstored is easy, so fully-supporting ENOSYS fallbacks is too
    much maintenance/testing overhead.

  * PUT uses a temporary file so incomplete files are not left on
    the filesystem.  Likewise, Content-MD5 rejections won't leave
    files on the filesystem, either.
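  The temporary-file approach for PUT follows the usual
  write-then-rename pattern: rename(2) replaces the destination
  atomically, so a partial upload never appears under its final name.
  Below is a minimal sketch.  put_atomic() and the ".put.XXXXXX"
  naming are hypothetical, and the body is passed in memory for
  brevity rather than streamed from a socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical sketch: atomically store `body` as `dir`/`name` */
static int put_atomic(const char *dir, const char *name,
		      const void *body, size_t len)
{
	char tmp[4096], dst[4096];
	const char *p = body;
	int fd, n;

	n = snprintf(tmp, sizeof(tmp), "%s/.put.XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;
	n = snprintf(dst, sizeof(dst), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(dst))
		return -1;

	/* temp file lives in the same directory so rename() stays atomic */
	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;

	while (len > 0) { /* write(2) may be partial */
		ssize_t w = write(fd, p, len);

		if (w < 0) {
			close(fd);
			goto fail;
		}
		p += w;
		len -= (size_t)w;
	}
	if (close(fd) != 0)
		goto fail;
	if (rename(tmp, dst) == 0)
		return 0; /* complete file now visible under its final name */
fail:
	unlink(tmp); /* never leave a partial file behind */
	return -1;
}
```

  A caller that rejects the body (for example on a Content-MD5
  mismatch) would unlink the temporary file instead of renaming it.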

  There are also some experimental features (not in Perl mogstored)
  documented only in git commit messages.  These features may
  be removed, changed, or renamed in the future.  See "git log" for
  full details and rely on them at your own risk.

0.1.0 / 2012-02-12 09:45 UTC
----------------------------

  cmogstored now supports enough HTTP to support MogileFS entirely.  If
  you're willing to live on the bleeding edge, a single executable is now
  all that's needed to run a MogileFS storage node.

  HTTP features supported include pipelining, persistent connections,
  chunked PUT, partial PUT, partial GET, and Content-MD5 handling.

  Graceful shutdown/hot upgrades are not supported yet, so it's recommended
  you mark storage nodes "down" with mogadm before shutting down/upgrading
  cmogstored (you should probably do that regardless).

  While GNU/Linux systems with epoll+NPTL remain the platform of choice,
  this release is also tested on FreeBSD 9.0 and Debian GNU/kFreeBSD 6.0.

  FreeBSD 8.x and other *BSDs are likely to work, too.

  There were no fatal bugs found in 0.0.0, but this release (with HTTP
  support) is much more complex, so there may still be fatal bugs in
  it.

0.0.0 / 2012-01-12 03:39 UTC
----------------------------

  This release supports the mgmt commands (aka "mogstored_stream_port")
  on Linux 2.6 systems for now.  This will support (enough of) DAV and
  FreeBSD 8+ systems in the future.