cmogstored changelog since v1.0.0 Full changeset information is available at https://yhbt.net/cmogstored.git See NEWS file for a user-oriented summary of changes commit e1a4e5d1c0361d31fe771b6b83b0da50690635df (HEAD -> master, tag: v1.8.1) Author: Eric Wong <e@80x24.org> Date: 2021-02-13 02:20:04 +0000 cmogstored 1.8.1 - use default system stack size This release fixes a segfault on some systems/toolchains where our per-thread stack size was too small. Given the prevalence of 64-bit systems nowadays, using a small stack is unlikely to yield any benefits. Users on 32-bit systems who wish to continue with a minimal stack should use "ulimit -s" in startup scripts or configure their process manager appropriately (e.g. setting the "LimitSTACK" directive described in systemd.exec(5)). Thanks to Xiao Yu <xyu@automattic.com> for reporting and testing at our public mailbox: https://yhbt.net/cmogstored-public/CABfxMcW+kb5gwq3pSB_89P49EVv+4UkJXz+mUPQTy19AdrwbAg@mail.gmail.com/T/ commit f441da11c290373e444771d4806dfe58d4d6d972 (origin/master) Author: Eric Wong <e@yhbt.net> Date: 2021-02-13 01:03:58 +0000 thrpool: remove stack size changing for all platforms As compilers and system C libraries change, using a non-default stack size is too risky and can lead to difficult-to-diagnose problems. Using the default stack size seems to solve the segfaults at http_close reported by Xiao Yu <xyu@automattic.com>. Users on modern 64-bit systems were unlikely to find any benefit in using a small stack size with this code base. Users on 32-bit systems who wish to continue with a minimal stack should use "ulimit -s" in startup scripts or configure their process manager appropriately (e.g. setting the "LimitSTACK" directive described in systemd.exec(5)).
Reported-and-tested-by: Xiao Yu <xyu@automattic.com> Link: https://yhbt.net/cmogstored-public/CABfxMcW+kb5gwq3pSB_89P49EVv+4UkJXz+mUPQTy19AdrwbAg@mail.gmail.com/T/ commit fac3a390395520c10d6d0524448c9aa26768a7d1 (tag: v1.8.0) Author: Eric Wong <e@yhbt.net> Date: 2020-08-13 20:49:37 +0000 cmogstored 1.8.0 devXXX/usage files are emitted properly for systems where the mount point can't be resolved. This is needed for multi-device filesystems such as btrfs. PUT and DELETE requests now update the in-memory representation of these devXXX/usage files, since the 10s update interval may be too long for high-traffic situations. There is a new "USAGE FILES" section in the manpage documenting the changes from 1.7.0 and this release. Our public mail archives are now available over IMAPS. gnulib is updated to 4e082bffbcc46e68 in the tarball. commit 3d4f0bbffae8eb4a111f5ef9cfc4f8997e774187 Author: Eric Wong <e@yhbt.net> Date: 2020-08-13 20:44:37 +0000 doc: add IMAPS, NNTPS and .onion archive URLs public-inbox.org started supporting IMAP + IMAPS a few months ago, and has always supported Tor .onions via NNTP. Non-TLS is still supported for older systems and users with oppressive firewalls. commit 9958e4ea86dee4a2f65656356759ac537e1bfc47 Author: Eric Wong <e@yhbt.net> Date: 2020-08-13 19:57:29 +0000 m4/.gitignore: update for gnulib 4e082bffbcc46e68 commit 4e082bffbcc46e68644ae0d59b4f09bf2b5feb84 ("sys_random: Work around an uClibc bug.") commit 3a97c98e07fdfc988199fe00f3471bb76620215b Author: Eric Wong <e@80x24.org> Date: 2020-07-22 18:40:41 +0000 http: update in-memory devXX/usage on PUT+DELETE Under heavy write traffic, free space changes constantly, and the periodic updates every 10 (or MOG_DISK_USAGE_INTERVAL) seconds can be too far behind. Since we keep the usage file contents in-memory now for out-of-FD situations, we can update that without incurring extra VFS traffic.
v2: We no longer try to use fstatvfs(2) and instead pay the cost of extra name lookups and just update all usage files. This was necessary since calculating free space while a file is still open can take a long time on some FSes and we need to send the HTTP response back ASAP to avoid timeouts on the client-side. This avoids contention in the request worker threads and lets the mostly idle main thread do more work. commit d5451338548c9cbfc159c5f166a4236e70d098aa Author: Eric Wong <e@80x24.org> Date: 2020-07-22 20:03:27 +0000 doc: add "USAGE FILES" section to manpage And fix formatting of the SIGNALS section while we're at it. commit fc2f3298da5ad3496ff2ae7c1f6b5b5c4327decd Author: Eric Wong <e@80x24.org> Date: 2020-07-20 03:36:48 +0000 dev.c: emit usage for devices with unknown mount point LUKS + btrfs on Linux gives an .st_rdev value of `0', so we can't reliably figure out what the "device:" field in /devXX/usage should be without parsing /proc/partitions. Since MogileFS::Worker::Monitor only cares about the "used:" and "total:" fields, we'll just emit "(?)" in the device field. The effort of parsing /proc/partitions to correctly display a field that our only known consumer won't use is a waste of our time. commit d7626e6cd2e71ab3587d3facc561187d8f94afa4 (tag: v1.7.3) Author: Eric Wong <e@yhbt.net> Date: 2020-03-22 00:09:36 +0000 cmogstored 1.7.3 Improve RFC 7230 conformance w.r.t. Content-Length and Transfer-Encoding handling in PUT requests. We now favor "Transfer-Encoding: chunked" if a Content-Length header is also present. Furthermore, we no longer accept Transfer-Encoding values aside from "chunked", since we don't support gzip/compress/deflate as described in RFC 7230. commit a4af139431e74bf0c5d8c0b361c9dc154637cfb2 Author: Eric Wong <e@yhbt.net> Date: 2020-03-17 06:56:52 +0000 http: favor chunked over Content-Length RFC 7230 is actually explicit about favoring the "Transfer-Encoding: chunked" over a Content-Length header when a client specifies both.
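
The body-framing rules from the 1.7.3 notes above can be sketched as a small decision function. This is only an illustration and not cmogstored's parser: the enum, the function name `choose_framing`, and the assumption that header values arrive already trimmed are all hypothetical.

```c
#include <stddef.h>
#include <strings.h>

/* hypothetical result type, not taken from cmogstored */
enum body_framing {
	BODY_NONE,           /* no request body expected */
	BODY_CONTENT_LENGTH, /* body length given by Content-Length */
	BODY_CHUNKED,        /* body uses chunked transfer coding */
	BODY_REJECT          /* respond 400 and close the connection */
};

/*
 * te and cl are the raw Transfer-Encoding and Content-Length header
 * values (assumed already trimmed), or NULL when the header is absent.
 */
static enum body_framing choose_framing(const char *te, const char *cl)
{
	if (te) {
		/* Transfer-Encoding wins even when Content-Length is present */
		if (strcasecmp(te, "chunked") == 0)
			return BODY_CHUNKED;
		/* gzip, deflate, compress or any combination: unsupported */
		return BODY_REJECT;
	}
	return cl ? BODY_CONTENT_LENGTH : BODY_NONE;
}
```

Under this sketch a value such as "gzip, chunked" is rejected as well, which follows the "only accept chunked" policy stated above rather than the broader set of codings RFC 7230 permits.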
commit 5ba9c3ef8a90a64ff34dc069d4ed89f91d38606a Author: Eric Wong <e@yhbt.net> Date: 2020-03-17 06:56:51 +0000 http: reject non-chunked Transfer-Encoding RFC 7230 3.3.3, point 3 states: > If a Transfer-Encoding header field > is present in a request and the chunked transfer coding is not > the final encoding, the message body length cannot be determined > reliably; the server MUST respond with the 400 (Bad Request) > status code and then close the connection. And no MogileFS client is known to send "gzip", "deflate", or "compress" as part of the Transfer-Encoding, so we'll only accept "chunked". commit 63a57fee9e75c6fad2b146a125ac8f029773a36b Author: Eric Wong <e@yhbt.net> Date: 2020-02-19 08:17:09 +0000 build-aux/txt2pre: match '!' and '@' in URLs commit f98c236b8b70cf8d18c0d52c51bb798bb7f29bac (tag: v1.7.2) Author: Eric Wong <e@yhbt.net> Date: 2020-02-19 01:29:04 +0000 cmogstored 1.7.2 s/bogomips.org/yhbt.net/ in all documentation, due to bogomips.org expiring. The tarball is also updated with the latest gnulib changes. commit 5ee44b19b02016429b065875cf332df90fcffed3 Author: Eric Wong <e@yhbt.net> Date: 2020-02-19 01:09:57 +0000 update gnulib to f4693b0166bab83ab232dcd3cfd95906411d1110 commit 0abf9c357584c4d25d924871dc41d2b1cb9695c1 Author: Eric Wong <e@80x24.org> Date: 2020-01-18 21:08:56 +0000 s/bogomips.org/yhbt.net/, update copyrights for 2020 bogomips.org is due to expire, soon, and I'm not willing to pay extortionist fees to Ethos Capital/PIR/ICANN to keep a .org. So it's at yhbt.net, for now, but it will change again to whatever's affordable... Identity is overrated. Tor users can use .onions and kick ICANN to the curb: torsocks w3m http://cmogstored.ou63pmih66umazou.onion/ torsocks git clone http://ou63pmih66umazou.onion/cmogstored.git/ torsocks w3m http://ou63pmih66umazou.onion/cmogstored-public/ commit 14d8e8b1fd9720bf0061a1050ec24e4833fcf8cb Author: Eric Wong <e@80x24.org> Date: 2019-09-21 20:57:43 +0000 TODO: a few low-priority items... 
commit 3943bc911e78b0df2308c0ddd8930d9da3c996fd (tag: v1.7.1) Author: Eric Wong <e@80x24.org> Date: 2019-05-11 20:13:14 +0000 cmogstored 1.7.1 - Linux 5.0/5.1 epoll_pwait workaround The Linux kernel bugfix should hit mainline and stable kernels, soon. But there's no reason for us to be caring if errno is EINTR or not... cf. https://lore.kernel.org/lkml/20190427093319.sgicqik2oqkez3wk@dcvr/ https://lore.kernel.org/lkml/20190507043954.9020-1-deepa.kernel@gmail.com/ There are also some minor build/test updates since v1.7.0 (2018-12-18): test/mgmt-usage.rb: fix mismatched indentation warning add .gitattributes for Ruby files test/mgmt_auto_adjust.rb: improve diagnostic messages .gitignore: add extra ignores for gnulib in Debian 9 notify.c: workaround epoll_pwait bug in current Linux 5.0/5.1 doc: remove mailing list subscription info commit a92453217ef516f205e7bfcb81c7c6b2c5b3ac88 Author: Eric Wong <e@80x24.org> Date: 2019-05-12 00:45:44 +0000 build: add .gitattributes to EXTRA_DIST commit a37ba5f890ad9ef17a5845c4c1740c44bfe74784 Author: Eric Wong <e@80x24.org> Date: 2019-05-11 08:07:34 +0000 doc: remove mailing list subscription info Mail subscriber lists are centralized data which is not commonly forkable or reproducible. Mail archives are more important. commit af80cb709474cac2eaae29bf33facdc9e13af20d Author: Eric Wong <e@80x24.org> Date: 2019-05-11 07:50:22 +0000 notify.c: workaround epoll_pwait bug in current Linux 5.0/5.1 The bugfix should hit mainline and stable kernels, soon; but there's no reason for us to be caring if errno is EINTR, or not... 
https://lore.kernel.org/lkml/20190427093319.sgicqik2oqkez3wk@dcvr/ https://lore.kernel.org/lkml/20190507043954.9020-1-deepa.kernel@gmail.com/ commit 79b949e62b8a6f6f0047edcd2bf20970481a94b1 (meltdown/master) Author: Eric Wong <e@80x24.org> Date: 2019-04-27 21:29:43 +0000 .gitignore: add extra ignores for gnulib in Debian 9 I usually use gnulib.git, but not everybody does and it's worth cleaning things up a bit for this common case. Tested with gnulib 20140202+stable-2+deb9u1 in Debian 9 (stretch). Further updates may be needed for other common distros which package gnulib. commit cd9baef52c4d6f03b6d474edf9863fd9bcfb6c31 Author: Eric Wong <e@80x24.org> Date: 2019-04-27 21:23:58 +0000 test/mgmt_auto_adjust.rb: improve diagnostic messages Chasing down a regression in Linux 5.0: https://lkml.kernel.org/r/20190427093319.sgicqik2oqkez3wk@dcvr commit 8ad2972f425fa7ebbc46ae9cb95be613a86265b1 Author: Eric Wong <e@80x24.org> Date: 2019-04-27 20:19:16 +0000 add .gitattributes for Ruby files This hopefully makes our other changes easier-to-read. commit 278e31821308e4a510f03d6c3a316a34361a0e55 Author: Eric Wong <e@80x24.org> Date: 2019-04-26 19:56:30 +0000 test/mgmt-usage.rb: fix mismatched indentation warning Newer versions of Ruby warn on it. commit c1226981ec311d96ccfb3bce259e48538a1dbbf4 (tag: v1.7.0) Author: Eric Wong <e@80x24.org> Date: 2018-12-18 03:40:23 +0000 cmogstored 1.7.0 The big feature in this release is that "devNNN/usage" files are served from memory, allowing up-to-date usage information even on unwritable/unreadable filesystems. This can also be used to reduce spinups and wear on HDDs. "devNNN/usage" files are still updated on the FS by default for compatibility with existing HTTP servers, but admins may wish to disable updates to them by removing all permissions from the "usage" files: chmod 0000 $MOG_DOCROOT/dev*/usage Filesystem errors from the sendfile(2) syscalls are also logged to syslog.
There's also a bugfix for zombies for libkqueue-on-epoll users, but that doesn't affect native kqueue users on *BSDs. And the usual round of gnulib, minor doc and style updates. 18 changes since v1.6.0: cmogstored.h: remove unused mog_file.mmptr member doc: documentation for ioq doc: further comment updates around ioq build-aux/txt2pre: support '=' in URLs test/inherit: fix ambiguous parenthese warning test/inherit: stop testing Ruby itself doc: update URLs to HTTPS compat_sendfile: ensure this works without an offset doc/queues.txt: add key point about only retrieving ONE event fix trace.h dependency on probes.h update to gnulib.git 90f289f249a266b1afb9c63e182f5d979d17df5f http_get.c: log filesystem-level errors from sendfile serve /dev*/usage requests from memory doc: URL updates to reduce redirects and favor HTTPS test/inherit.rb: fix syntax error under Ruby 1.8 update copyrights for 2018 and use SPDX for "GPL-3.0+" selfwake: enable self-pipe with kqueue http_parser: workaround parsing OOM in Ragel 6.10 commit bd144a77fae53a2c02f2ccda7e309ff46f739fb2 Author: Eric Wong <e@80x24.org> Date: 2018-12-07 23:56:24 +0000 http_parser: workaround parsing OOM in Ragel 6.10 Noticed in FreeBSD 11.2 where Ragel 6.10 was OOM-ing, this doesn't affect Ragel 6.9. TODO: make sure this is fixed upstream in Ragel. commit 86ace01fed5ed39a48e6d21810fec93f976baa97 Author: Eric Wong <e@80x24.org> Date: 2018-11-29 19:35:55 +0000 selfwake: enable self-pipe with kqueue This was causing my libkqueue build to stall on Linux where epoll_pwait exists. We actually favor kqueue in the code for testing purposes, so we need to enable the self-wake pipe when using libkqueue if epoll_pwait is detected. 
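
The self-wake pipe mentioned in the commit above is an instance of the general self-pipe technique: a signal handler (or another thread) writes one byte to a non-blocking pipe whose read end is watched by the event loop. The following is a minimal sketch of that technique only; the helper names (`set_nb_cloexec`, `selfwake_init`, `selfwake_trigger`, `selfwake_drain`) are hypothetical and are not taken from cmogstored's sources.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* hypothetical helper: make fd non-blocking and close-on-exec */
static int set_nb_cloexec(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
		return -1;
	fl = fcntl(fd, F_GETFD);
	if (fl < 0 || fcntl(fd, F_SETFD, fl | FD_CLOEXEC) < 0)
		return -1;
	return 0;
}

/* fds[0] is the read end to register with the event loop */
static int selfwake_init(int fds[2])
{
	if (pipe(fds) < 0)
		return -1;
	if (set_nb_cloexec(fds[0]) < 0 || set_nb_cloexec(fds[1]) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	return 0;
}

/*
 * Safe to call from a signal handler: only write(2) is used and errno
 * is preserved.  EAGAIN means the pipe is full, so a wakeup is already
 * pending and nothing more needs to be done.
 */
static void selfwake_trigger(int wfd)
{
	int saved = errno;
	ssize_t w;

	do {
		w = write(wfd, "", 1);
	} while (w < 0 && errno == EINTR);
	errno = saved;
}

/* empty the pipe after waking up; returns the number of bytes read */
static long selfwake_drain(int rfd)
{
	char buf[64];
	long total = 0;

	for (;;) {
		ssize_t r = read(rfd, buf, sizeof(buf));

		if (r > 0) {
			total += r;
			continue;
		}
		if (r < 0 && errno == EINTR)
			continue;
		break; /* EAGAIN (empty), EOF, or a real error */
	}
	return total;
}
```

The read end would be registered for readability with whichever event mechanism is in use (kqueue, epoll or poll), and the loop calls `selfwake_drain` before handling whatever state the signal handler flagged.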
commit a4bff7526f9c9f642767e254463f22ba2c10f507 Author: Eric Wong <e@80x24.org> Date: 2018-11-28 02:03:58 +0000 update copyrights for 2018 and use SPDX for "GPL-3.0+" copyrights updated by "update-copyright" in gnulib: git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \ UPDATE_COPYRIGHT_USE_INTERVALS=2 \ xargs /path/to/gnulib/build-aux/update-copyright While we're at it, SPDX seems to be the accepted way to identify licenses nowadays, so let's use it. git ls-files | xargs perl -i -p -e \ 's,GPLv3 or later.*,GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>,g' commit 2c22379e811f9ab3f2692d459c075bb193f05e89 Author: Eric Wong <e@80x24.org> Date: 2018-07-10 20:59:23 +0000 test/inherit.rb: fix syntax error under Ruby 1.8 Not sure if it's worth supporting 1.8, anymore, but parts of the Ruby VM test and benchmark suite still remain 1.8-compatible... commit 7329eab49ef73521be6aaf03d32d0403f60b432c Author: Eric Wong <e@80x24.org> Date: 2018-11-28 01:17:12 +0000 doc: URL updates to reduce redirects and favor HTTPS HTTPS is usually more secure and redirects slow readers down. commit ce60ea3e4cf733aadd4ecb8ca08e2f81b498d67c Author: Eric Wong <e@80x24.org> Date: 2018-11-27 02:03:17 +0000 serve /dev*/usage requests from memory Filesystems may become unwritable and out-of-date "usage" files will cause trackers to see out-of-date information. We still write to the filesystem by default for compatibility with existing HTTP servers. However, giving the "usage" file a 0000 mode will prevent cmogstored from overwriting it. This allows admins to also reduce wear on storage devices: chmod 0000 $mogroot/dev*/usage commit 6567228f718578373e771f92dd69daaa716fbbed Author: Eric Wong <e@80x24.org> Date: 2018-07-09 07:37:52 +0000 http_get.c: log filesystem-level errors from sendfile Socket errors are too common to log (especially from malicious clients), but filesystem errors are rare and important.
commit 4a62891c7487a6776ed7112184ef54091e04a6a1 Author: Eric Wong <e@80x24.org> Date: 2018-06-02 08:41:02 +0000 update to gnulib.git 90f289f249a266b1afb9c63e182f5d979d17df5f commit 780a9a25bcf657a7a81a28deea28248f7ad19d5a Author: Eric Wong <e@80x24.org> Date: 2018-06-02 08:37:38 +0000 fix trace.h dependency on probes.h I haven't tested with systemtap, lately; maybe something bit rotted. commit 8c8916c5b7a2afdb5dcb5cc88b4e1d28fe8a5acc Author: Eric Wong <e@80x24.org> Date: 2017-10-24 18:55:45 +0000 doc/queues.txt: add key point about only retrieving ONE event This had become such second nature to me that I forgot to document it :x commit 93c11990d215c678d254a56a7b3bc63e3a53e0de Author: Eric Wong <e@80x24.org> Date: 2017-03-11 00:57:09 +0000 compat_sendfile: ensure this works without an offset While we never call sendfile without an offset, some projects may copy our code and want to use it without an offset. commit e5c86cb81fceb3b37019aed701b475bf40802c10 Author: Eric Wong <e@80x24.org> Date: 2017-02-09 23:07:44 +0000 doc: update URLs to HTTPS HTTPS seems to be working well for the rest of bogomips.org with Let's Encrypt, so let's use it and hope it protects some users from snooping. commit 5da00e0746fd4984dde43a859271398902b26c52 Author: Eric Wong <e@80x24.org> Date: 2017-01-03 06:21:00 +0000 test/inherit: stop testing Ruby itself TCPSocket.new raises exceptions on failure. commit d9484e983eed4be2d8457cebf6adeca39e9f1576 Author: Eric Wong <e@80x24.org> Date: 2017-01-03 06:15:54 +0000 test/inherit: fix ambiguous parenthese warning Who tests the tests? commit 554869605ce9d5987d5689a071e68583ad3d5b98 Author: Eric Wong <e@80x24.org> Date: 2016-12-24 09:16:42 +0000 build-aux/txt2pre: support '=' in URLs We'll just have everything on r******** soon, I hope :p commit 1333a3de16d7a9192286195da241ef195dbc556a Author: Eric Wong <e@80x24.org> Date: 2016-12-24 08:50:34 +0000 doc: further comment updates around ioq Let's not forget about this queue, it is a useful design.
commit eb41c7abde797e03dd51c0bc945f0298d0fe235c Author: Eric Wong <e@80x24.org> Date: 2016-12-24 08:30:46 +0000 doc: documentation for ioq It's a queue that looks like a semaphore, so document it in doc/queues.txt and provide pointers to perhaps-forgotten documentation. commit 5752a8d1b051b1cb4e4d62e6fd1afbeb28ce7eaf Author: Eric Wong <e@80x24.org> Date: 2016-12-16 11:04:28 +0000 cmogstored.h: remove unused mog_file.mmptr member This was intended for zero-copy PUT support, but that is probably not worth it due to checksumming and the general unpredictability of mmap/munmap performance, especially on non-Linux systems. commit 99958cf048f3f1d234b2155558e0fc672848e8e9 (tag: v1.6.0) Author: Eric Wong <e@80x24.org> Date: 2016-08-31 03:05:16 +0000 cmogstored 1.6.0 - minor fixes on allocation errors There are minor robustness fixes on handling errors when allocating memory or spawn failures on otherwise-hosed systems. These bugfixes will not affect real users unless the system is already hosed or badly overtaxed, so there's no real need to upgrade. There are minor portability improvements and I now test under FreeBSD 10.x.
The iostat test cases are relaxed a bit to account for virtualized devices (as iostat is less useful with modern 17 changes since 1.5.0 (Nov 2015): Rakefile: add missing <div> for Atom feed test/pwrite-wrap: remove unused variable and comment test/pwrite_wrap: squelch unnecessary output test/pwrite_wrap: reduce space overhead required update copyrights for 2016 build-aux/txt2pre: drop CGI.pm requirement stdin is always redirected to /dev/null minor vfork/fork safety fixes process: try to handle OOM gracefully http_put: gracefully handle path allocation errors iostat_process: declare environ extern test/mgmt: relax checks for iostat mapping gnulib copyright update for 2016 upgrade: avoid syslog call if execve fails rely on gnulib for environ portability INSTALL: update latest Debian stable version to 8.x README: stop mentioning cgit commit 189e8e5646136d9524f51cfab9081c59584740bb Author: Eric Wong <e@80x24.org> Date: 2016-08-31 03:04:28 +0000 README: stop mentioning cgit I do not expect to run it much longer since it contains CSS and renders poorly without it. commit d4580f32cc1b78626336cbece796d64a56c55b73 Author: Eric Wong <e@80x24.org> Date: 2016-08-26 21:09:49 +0000 INSTALL: update latest Debian stable version to 8.x Debian 8.x (jessie) was released over a year ago :x commit 026d9f4d635ac360f9d349ffcb50a8252719730e (origin/gl-env, pre16) Author: Eric Wong <e@80x24.org> Date: 2016-07-18 07:17:41 +0000 rely on gnulib for environ portability This avoids warnings on my GNU system while still working on FreeBSD. commit 53030c527eaac6ea2d6acbf501569d575fef9d41 Author: Eric Wong <e@80x24.org> Date: 2016-07-17 12:52:42 +0000 upgrade: avoid syslog call if execve fails We cannot safely call syslog on all platforms under vfork; but we have normal exit handling to tell us of the presence of execve errors, just not which. 
commit 1f4e95f5887521d8df3b7cd3d4da612066d03ea6 Author: Eric Wong <e@80x24.org> Date: 2016-07-17 05:22:49 +0000 gnulib copyright update for 2016 commit 360ef6aee6b1e4e0855377a343a6e39263b15daa Author: Eric Wong <e@80x24.org> Date: 2016-07-17 05:16:24 +0000 test/mgmt: relax checks for iostat mapping In the age of virtualized devices and fast solid-state storage, iostat information isn't as useful as it was a decade ago and probably less useful in tests. So relax the tests. commit 7389b9ba076ffd49d5c37113809f46c2bf1f38f3 Author: Eric Wong <e@80x24.org> Date: 2016-07-17 04:33:19 +0000 iostat_process: declare environ extern This is necessary for FreeBSD and probably other non-GNU systems. commit 504a5ced05f48bf8cc1a08b22ce8830b6db98d41 Author: Eric Wong <e@80x24.org> Date: 2016-06-01 22:32:37 +0000 http_put: gracefully handle path allocation errors Failing to allocate memory should be a temporary error and be non-fatal. commit a03ccc608f68f44122f83dbde7bb09e9acbbc185 Author: Eric Wong <e@80x24.org> Date: 2016-06-01 22:32:29 +0000 process: try to handle OOM gracefully If we fail to register a process, it is not fatal since a process is already running. However, we may not know about when to restart it when it dies. commit a19f6bf70866e9fed34c7220f8a83d8486102821 Author: Eric Wong <e@80x24.org> Date: 2016-06-01 03:06:56 +0000 minor vfork/fork safety fixes In case "/bin/sh" or "/dev/null" becomes unavailable during the lifetime of cmogstored, we will no longer crash when attempting to (re)start iostat. However, your system is probably hosed anyways if "/bin/sh" or "/dev/null" become unavailable. This also fixes a bug where we would leak the iostat pipe if either fork/vfork fails. We also close an innocuous race condition where the child might toggle flags in the parent process and trigger an extra wakeup. Finally, we use sigprocmask in the child in case pthread_sigmask does not work on some systems after forking. This is likely only a cosmetic change.
commit d413151f5c0e3ccd3c7c7fe9d1db9112e7e83561 Author: Eric Wong <e@80x24.org> Date: 2016-06-01 03:06:55 +0000 stdin is always redirected to /dev/null There is no reason for stdin to ever be connected to a terminal, ensure we have a consistent stdin for iostat processes and the like. commit c6b7757b241baf82be9aec9b937478881ab0d282 Author: Eric Wong <e@80x24.org> Date: 2016-05-29 12:31:06 +0100 build-aux/txt2pre: drop CGI.pm requirement CGI.pm is no longer in the main Perl distro, so depending on it is not worth the effort for a few lines. commit a6ba02f02e4319c0bf5b8aa000ef6851905185b4 Author: Eric Wong <e@80x24.org> Date: 2016-05-29 06:14:16 +0000 update copyrights for 2016 git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \ UPDATE_COPYRIGHT_USE_INTERVALS=2 \ xargs /path/to/gnulib/build-aux/update-copyright commit fa172db40c58ddb3894c5eec968b29df466f6b4c Author: Eric Wong <e@80x24.org> Date: 2016-05-29 06:14:15 +0000 test/pwrite_wrap: reduce space overhead required It's probably overkill to use 100G of space, even if it's sparse. commit f57e755430ca14e2b559cbdb41825e87b82f6225 Author: Eric Wong <e@80x24.org> Date: 2016-02-01 10:49:29 +0000 test/pwrite_wrap: squelch unnecessary output Oops, leftover from development many years ago. commit f66d2b4739eb9439280ac29dc6e4ef45802157f5 Author: Eric Wong <e@80x24.org> Date: 2016-02-01 10:23:23 +0000 test/pwrite-wrap: remove unused variable and comment They were blindly copied and s/search/replace/-ed from epoll-wrap.c commit 49bd505dd305b381e4fff7f6d2e18d649a446e03 Author: Eric Wong <e@80x24.org> Date: 2015-11-28 01:39:08 +0000 Rakefile: add missing <div> for Atom feed Apparently this is needed for proper XHTML rendering in iceweasel? commit 907c073738636b44919dc8d3c79a6cbf38114e64 (tag: v1.5.0) Author: Eric Wong <e@80x24.org> Date: 2015-11-20 05:41:03 +0000 cmogstored 1.5.0 - vfork, systemd, 416 codes A bunch of minor changes; most notable is systemd-style socket activation support.
This was easy-to-add since we've always had socket activation support for nginx-style SIGUSR2 upgrades. This places no link or runtime dependency on libsystemd, so the LISTEN_FDS and LISTEN_PID environment variables may be used in other init systems as well. While I have my own reservations about systemd itself, I also strongly believe in using socket activation to prevent downtime. Existing behavior with CMOGSTORED_FD (used for SIGUSR2 upgrades) is now documented in the manpage and will always be supported. We've also added vfork support for Linux systems, allowing faster spawning of iostat if malloc is using too much memory. Behavior changes: Bad Range: headers return 416 responses in more cases for invalid ranges (e.g. miscalculated ranges such as "1--1"), while completely wrong ones (lacking a "bytes=" prefix) are ignored entirely as in nginx. Bugfixes: There are also some cleanups to avoid dying on OOM in more places on weird systems which trigger OOM. More work on this is ongoing. Also updates to the latest gnulib.git commit 71d39c1644762745b94e9449c45bfd716a79a5eb ("autoupdate") along with a change which fixes a memory leak when people build from cmogstored.git using gnulib commit c6148bca89e9465fd6ba3a10d273ec4cb58c2dbe or later ("mountlist: add me_mntroot field on Linux machines"). This memory leak did not affect any released tarballs of cmogstored. Note, users building from git (as opposed to the tarball) will need gnulib commit 41d1b6c42641a5b9e21486ca2074198ee7909bd7 ("mountlist: add support for deallocating returned list entries") or later (from July 2013).
There are also various documentation updates and our mailing list is now readable over NNTP: nntp://news.public-inbox.org/inbox.comp.file-systems.mogilefs.cmogstored commit 4f7e8edf9f3bf734ca6bfb56756ef7cd90ffb32e Author: Eric Wong <e@80x24.org> Date: 2015-11-20 21:35:34 +0000 require newer gnulib for free_mount_entry support gnulib commit 41d1b6c42641a5b9e21486ca2074198ee7909bd7 ("mountlist: add support for deallocating returned list entries") or later (from July 2013) is needed for free_mount_entry support introduced in our commit 1225f9ce4c32b3bba61ce92a487d99260a001995 ("use free_mount_entry from gnulib instead of rolling our own"). commit d3ad6ed40305cecb1abfe30fb9bf9db047b45e07 Author: Eric Wong <e@80x24.org> Date: 2015-11-20 03:23:22 +0000 Makefile.am: distribute txt2pre in tarball Oops. commit 6e5770899aa986cc6b7e59f084853d87f03166ed Author: Eric Wong <e@80x24.org> Date: 2015-11-20 03:15:04 +0000 add cmogstored manpage to website Sometimes people will forget to install the manpage, make sure it's online in plain-text or HTML format. commit c0da4eb6eeb4bec9b70aede7176a91f536e5bbe8 Author: Eric Wong <e@80x24.org> Date: 2015-11-20 01:56:27 +0000 misc doc updates Generate pre-formatted HTML which gives us a consistent visual style with our mailing list archives and enhance linkability. <a>, <pre>, and <title> are among the few useful HTML tags I'll use :P Drop the AUTHORS file, it's pointless maintenance task and users can just look at git history instead (and honestly, I have zero interest in recognition; I only use my real name to deter GPL violations). commit fd0a3959bb678e94719bfa454c8b3742635ca98c Author: Eric Wong <e@80x24.org> Date: 2015-11-20 01:10:43 +0000 README: update contact information Most notably, our mailing list is now available over NNTP. Stop advertising ssoma since it's too much to expect users would be willing to install and use yet another new tool when NNTP is already standardized and our NNTP server is pretty efficient. 
commit b45c3852dfcc153ef6ac820ec5fd54265845d296 Author: Eric Wong <e@80x24.org> Date: 2015-11-13 21:47:50 +0000 use vfork under Linux before execve Given the prevalence of gigantic VM footprints due to current glibc malloc and our potentially large number of threads, vfork can speed up fork used for spawning iostat and SIGUSR2 upgrades. vfork only pauses the spawning thread, so it will not affect other I/O threads used in cmogstored; only the non-performance-critical master thread. Swapping 'fork()' for 'vfork()' in the following C test program should show a large speedup under Linux. Changing FILL to increase or decrease memory usage will respectively increase or decrease the performance gain from vfork over fork.
-----------------------------8<-------------------------
/* gcc -o x x.c -Wall -O2 -lpthread && ./x */
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>
#include <pthread.h>
#include <poll.h>
#include <stdio.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <string.h>
#define FILL (1024 * 1024)

/* each thread dirties FILL bytes, then sleeps forever */
static void *thfunc(void *p)
{
	void *ptr = malloc(FILL);
	memset(ptr, 1, FILL);
	poll(0, 0, -1);
	return 0;
}

int main(void)
{
	long i;
	void *ptr = malloc(FILL);
	memset(ptr, 1, FILL);

	for (i = 0; i < 100; i++) {
		pthread_t th;
		pthread_create(&th, 0, thfunc, (void *)i);
	}
	poll(0, 0, 1000); /* give the threads time to touch their memory */
	for (i = 0; i < 100; i++) {
		/* swapping fork with vfork increases performance on Linux */
		pid_t pid = fork();
		if (pid < 0) {
			fprintf(stderr, "ERROR: forking %m\n");
			return 1;
		}
		if (pid == 0) {
			char *argv[] = { "/bin/true", 0 };
			char *env[] = { 0 };
			execve(argv[0], argv, env);
			_exit(1); /* a vfork child must not return from main */
		} else {
			int s;
			waitpid(pid, &s, 0);
		}
	}
	return 0;
}

commit 4fb84823e327de868a14c4158cedf1dc7751d3fc Author: Eric Wong <e@80x24.org> Date: 2015-11-11 22:48:46 +0000 doc: document CMOGSTORED_FDS in the manpage This has always been supported internally, and we can't stop supporting it since we'll be supporting upgrades from old versions indefinitely.
So document it, as it has some minor advantages over the LISTEN_{FDS,PID} environment handling of systemd. commit 1a08a350c0b504ff31acf0e3ac0b6cdfe75ef521 (tag: v1.5.0rc1) Author: Eric Wong <e@80x24.org> Date: 2015-11-11 21:13:26 +0000 cmogstored 1.5.0rc1 A bunch of minor changes; most notable is systemd-style socket activation support. This was easy-to-add since we've always had socket activation support for nginx-style SIGUSR2 upgrades. This places no link or runtime dependency on libsystemd, so the LISTEN_FDS and LISTEN_PID environment variables may be used in other init systems as well. While I have my own reservations about systemd itself, I also strongly believe in using socket activation to prevent downtime. Behavior changes: Bad Range: headers return 416 responses in more cases for invalid ranges (e.g. miscalculated ranges such as "1--1"), while completely wrong ones (lacking a "bytes=" prefix) are ignored entirely as in nginx. Bugfixes: There are also some cleanups to avoid dying on OOM in more places on weird systems which trigger OOM. More work on this is ongoing. Also updates to the latest gnulib.git commit f197c2c9e5e0d12c373f26d5b3211809457bc972 ("intprops: new public macro EXPR_SIGNED") along with a change which fixes a memory leak when people build from cmogstored.git using gnulib commit c6148bca89e9465fd6ba3a10d273ec4cb58c2dbe or later ("mountlist: add me_mntroot field on Linux machines"). This memory leak did not affect any released tarballs of cmogstored.
shortlog of changes since 1.4.3: doc: use "builder" RubyGem to generate Atom feed dev.c: fail gracefully on out-of-memory errors do not die on OOM when for mgmt paths HACKING: update URLs to reduce redirects http: return 416 errors in more cases for bad Ranges update .gitignores for latest autotools + gnulib Rakefile: remove text-only part from the Atom feed support systemd-style socket activation via environment set TCP listener options on inherited sockets doc: add example systemd config files use free_mount_entry from gnulib instead of rolling our own fix tmpdir dependency for slow Ruby tests doc: publish examples directory to website commit 961d5ba545995250c7f2ca26600c0248ac3120f9 Author: Eric Wong <e@80x24.org> Date: 2015-11-11 21:10:19 +0000 doc: publish examples directory to website This might improve visibility of these scripts for use with systemd. commit 23123554d4bf85246aeb80bc837fd94add5b4269 Author: Eric Wong <e@80x24.org> Date: 2015-11-11 10:43:29 +0000 fix tmpdir dependency for slow Ruby tests .slowrb tests have a different suffix and the test dependencies need to be split out separately. commit 1225f9ce4c32b3bba61ce92a487d99260a001995 Author: Eric Wong <e@80x24.org> Date: 2015-11-11 03:56:47 +0000 use free_mount_entry from gnulib instead of rolling our own gnulib.git added the me_mntroot element in commit c6148bca89e9465fd6ba3a10d273ec4cb58c2dbe, so we would leak memory during filesystem refreshes as a result :x Use the gnulib-provided API (free_mount_entry) instead of freeing elements ourselves. commit 25e23de2bb67ed65abb535a01ea502c78113f83a Author: Eric Wong <e@80x24.org> Date: 2015-11-11 03:43:36 +0000 doc: add example systemd config files Since we'll support systemd, it's not a bad idea to include reasonable example files for users. 
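
The LISTEN_FDS and LISTEN_PID variables from the socket activation notes above follow a simple convention: LISTEN_PID must match the PID of the receiving process, LISTEN_FDS gives the number of inherited sockets, and those sockets start at file descriptor 3. A minimal sketch of reading them without libsystemd follows; the function names (`parse_count`, `parse_listen_fds`, `inherited_listen_fds`) are hypothetical and this is not cmogstored's code.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTEN_FDS_START 3 /* first inherited descriptor by convention */

/* hypothetical helper: parse a non-negative decimal number up to max */
static int parse_count(const char *s, long max, long *out)
{
	char *end;
	long v;

	if (!s || !*s)
		return -1;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || *end != '\0' || v < 0 || v > max)
		return -1;
	*out = v;
	return 0;
}

/*
 * Returns the number of inherited listen sockets, or 0 when the
 * variables are missing, malformed, or meant for another process.
 */
static int parse_listen_fds(const char *pid_s, const char *fds_s, pid_t self)
{
	long pid, nfds;

	if (parse_count(pid_s, LONG_MAX, &pid) < 0 || pid != (long)self)
		return 0;
	if (parse_count(fds_s, INT_MAX - LISTEN_FDS_START, &nfds) < 0)
		return 0;
	return (int)nfds;
}

/* convenience wrapper reading the real environment */
static int inherited_listen_fds(void)
{
	return parse_listen_fds(getenv("LISTEN_PID"), getenv("LISTEN_FDS"),
				getpid());
}
```

A caller would then treat descriptors LISTEN_FDS_START through LISTEN_FDS_START + n - 1 as already-bound listeners. Checking LISTEN_PID guards against the variables leaking into an unrelated child process.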
commit 42a65a32623158c5bdce234b1b431b9f5093da70 Author: Eric Wong <e@80x24.org> Date: 2015-11-11 03:38:47 +0000 set TCP listener options on inherited sockets systemd users may not set the correct TCP socket options for us, so be sure to set TCP_NODELAY, SO_KEEPALIVE, and use a sufficiently large listen backlog to avoid hurting performance for users who bind sockets outside of cmogstored. commit 0312c1e6220ef4280268a0f48f24db90738037bd Author: Eric Wong <e@80x24.org> Date: 2015-11-11 01:43:06 +0000 support systemd-style socket activation via environment While I have my reservations about systemd, socket activation alone is a good idea and we already have existing infrastructure for supporting it in SIGUSR2 upgrades. We are intentionally avoiding linkage to libsystemd to avoid dealing with ABI compatibility issues between old and new systems. This also allows us to integrate more easily with non-systemd systems which use the same environment variables as systemd. commit 97ade9d8d5d751c197b61faee5f3ae6589b6b432 Author: Eric Wong <e@80x24.org> Date: 2015-11-10 20:25:49 +0000 Rakefile: remove text-only part from the Atom feed The pre-formatted HTML is readable as raw XML, and feed readers tend to have no problem rendering the HTML, so there's no point in nearly doubling our bandwidth usage on the text-only part given we're already serving XML. While we're at it, disable XML indentation to avoid wasting space; it doesn't significantly hamper readability, either. commit 0c7c2d0c7d4cb89704c4e75c7194edf2bfd59686 Author: Eric Wong <e@80x24.org> Date: 2015-11-10 01:22:59 +0000 update .gitignores for latest autotools + gnulib Tested on automake 1:1.14.1-4 on Debian jessie, and automake 1:1.11.6-1 on Debian wheezy. 
gnulib was tested on commit 36d982f39b683d0266b9c6ff1e01cbfc94bd97f6 ("test-timespec: fix typo in previous change") from git://git.savannah.gnu.org/gnulib.git commit f715f6f228f9da83309a515a94de26fa3766b230 Author: Eric Wong <e@80x24.org> Date: 2015-11-09 00:51:32 +0000 http: return 416 errors in more cases for bad Ranges For completely unparseable Range: headers, we'll ignore them entirely as nginx does. However, if /bytes=/ is matched, we'll start returning 416 errors instead of 400. commit 225d5fb10474d853261b6ee2f9ceeff9c2bd73c6 Author: Eric Wong <e@80x24.org> Date: 2015-08-29 05:22:27 +0000 HACKING: update URLs to reduce redirects The ragel link no longer worked, actually... commit 45bfce46d24db91d25b85a5115c2b41d4a1484fc Author: Eric Wong <e@80x24.org> Date: 2015-08-23 20:49:52 +0000 do not die on OOM when for mgmt paths This also makes trywrite OOM-aware and will simulate a write error on allocation. commit 7754b9ffc1b496170498f78fd2f05409dd0fb962 Author: Eric Wong <e@80x24.org> Date: 2015-08-17 06:00:30 +0000 dev.c: fail gracefully on out-of-memory errors The rest of cmogstored shall be updated to fail gracefully on OOM in due time. It may take a while, since not many systems encounter this, but we shall become more robust as time goes on. commit dc55a5b5bdd60850553ebf01adfe357d2a2a68b8 Author: Eric Wong <e@80x24.org> Date: 2015-07-28 20:58:49 +0000 doc: use "builder" RubyGem to generate Atom feed Nokogiri takes too long to build and install due to the C extension and bundled library. Prefer a widely-used pure-Ruby gem instead. commit a76af438da94e0d7211d4602b7fb00f2beb5e74e (tag: v1.4.3) Author: Eric Wong <e@80x24.org> Date: 2015-03-09 22:51:59 +0000 cmogstored 1.4.3 - mostly non-GNU/Linux fixups For all platforms, the startup device scanning thread at startup may not handle EINTR properly. This bug only manifested at startup and does not affect running instances. 
However, this bug is also readily apparent on newer versions of FreeBSD which support the ppoll function call. Thanks to Mykola Golub <trociny@FreeBSD.org> for the bug report which led to this release. For systems lacking epoll_pwait (older GNU/Linux, all *BSDs), there is also a bugfix for systems which experience signal spam leading to errno clobbering in the main thread. This bug was only noticed due to a bug report against Ruby: https://bugs.ruby-lang.org/issues/10866 There is no need to upgrade if 1.4.1 is already running well on modern GNU/Linux systems capable of epoll_pwait. But then again nginx-style SIGUSR2 upgrades are transparent to clients. shortlog since 1.4.2: Makefile.am: fix publish rule for website Fix assertion failure during startup avoid relying on ppoll as a cancellation point preserve errno when inside sig handler for self-pipe commit d33c7cba557ae40fb55446d841e084a74eacb425 Author: Eric Wong <e@80x24.org> Date: 2015-03-09 21:18:00 +0000 preserve errno when inside sig handler for self-pipe We must not clobber errno of the main thread inside the signal handler in case write fails. This bug only affects systems without epoll_pwait where the self-pipe is required, so it does not affect modern GNU/Linux systems; but does affect FreeBSD systems and anybody else relying on kqueue.
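The errno fix described above follows a common pattern for self-pipe signal handlers: save errno on entry and restore it before returning, so that a failed write in the handler cannot disturb the interrupted code. The sketch below is illustrative only; selfpipe, selfpipe_init() and wakeup_handler() are hypothetical names, not cmogstored's.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* hypothetical self-pipe: [0] is polled by the main loop, [1] is written here */
static int selfpipe[2] = { -1, -1 };

/* create the pipe and make both ends non-blocking; 0 on success */
static int selfpipe_init(void)
{
	int i;

	assert(selfpipe[0] == -1 && selfpipe[1] == -1);
	if (pipe(selfpipe) < 0)
		return -1;
	for (i = 0; i < 2; i++) {
		int fl = fcntl(selfpipe[i], F_GETFL);

		if (fl < 0 || fcntl(selfpipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
			return -1;
	}
	return 0;
}

/* signal handler: wake the main loop without clobbering its errno */
static void wakeup_handler(int signum)
{
	int saved_errno = errno;
	ssize_t w;

	(void)signum;
	/* may fail with EAGAIN (pipe full) or EBADF; either is harmless here */
	w = write(selfpipe[1], "", 1);
	(void)w;
	errno = saved_errno;
}
```

Only write() is called inside the handler since it is async-signal-safe; the handler would be installed with sigaction() for each signal of interest.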
Thanks to Steven Stewart-Gallus for a Ruby bug report which inspired this fix: https://bugs.ruby-lang.org/issues/10866 Cc: Mykola Golub <trociny@FreeBSD.org> Cc: Steven Stewart-Gallus <sstewartgallus00@mylangara.bc.ca> commit 58cec82abb8b1e6feea090c72806bd9d8f693a37 Author: Eric Wong <e@80x24.org> Date: 2015-03-09 20:40:25 +0000 avoid relying on ppoll as a cancellation point While glibc supports ppoll, ppoll is not standardized and apparently is not a cancellation point in some versions of FreeBSD, based on Mykola Golub's bug report in <20150309151851.GC2195@gmail.com> Reported-by: Mykola Golub <trociny@FreeBSD.org> commit c659acbb8d7a6b0c8098646981124a47f15cceae Author: Eric Wong <e@80x24.org> Date: 2015-03-09 20:22:23 +0000 Fix assertion failure during startup During the initial device scan, it is possible for the waiter to be interrupted while awaiting cancellation. We must account for this on all platforms regardless of whether pselect or ppoll is used. Reported-by: Mykola Golub <trociny@FreeBSD.org> commit 1a7f32d0d8a48b9f26f595d0fa9f5db0c657bc3a Author: Eric Wong <e@80x24.org> Date: 2015-03-06 02:30:07 +0000 Makefile.am: fix publish rule for website Oops, we cannot have zero-byte gzipped files :x commit 03f957c256ca7a686e097779433eca73fcda22a6 (tag: v1.4.2) Author: Eric Wong <e@80x24.org> Date: 2015-03-06 02:18:14 +0000 cmogstored 1.4.2 * Makefile.am: gzip README and associated data * manpage: update contact and copyright information * update copyrights to 2014 (and all contributors) * doc/design.txt: add a few more notes on compromises * http_dav: log 500 errors from DELETE requests * tapset/http_access_log: note CLF differences * copyright updates for 2015 commit 4ca5fa15f45dd8512d1244db1ca24d5624e483d4 Author: Eric Wong <e@80x24.org> Date: 2015-03-06 02:03:34 +0000 copyright updates for 2015 Via update-copyright in gnulib, also added a few copyrights to non-trivial files.
git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \ UPDATE_COPYRIGHT_USE_INTERVALS=2 \ xargs /path/to/gnulib/build-aux/update-copyright commit 3e14979ef533e41fe72f7e68afd533a3cc87471d Author: Eric Wong <e@80x24.org> Date: 2015-02-13 00:49:28 +0000 tapset/http_access_log: note CLF differences We have two differences from CLF, note them correctly. commit 10ae48e0880fc76d1f2044f80e20004491801663 Author: Eric Wong <e@80x24.org> Date: 2015-02-05 21:52:08 +0000 http_dav: log 500 errors from DELETE requests Errors on failed unlink can be a prelude to a bigger problem, so log them locally ourselves even if the tracker will notice them. This commit was tested manually by setting up cmogstored to point to a read-only mount point on my system and attempting a DELETE request on it. commit ee4a340bfb304f0270ef3704b09ba7faca6a3c1e Author: Eric Wong <e@80x24.org> Date: 2015-01-15 07:48:19 +0000 doc/design.txt: add a few more notes on compromises In case I forget, writing this down while my mind is on the subject for other projects. commit 047b0c13e91fe755fe165defc9de3ad0d8843330 Author: Eric Wong <e@80x24.org> Date: 2014-11-02 09:08:40 +0000 update copyrights to 2014 (and all contributors) In the future, we can use the update-copyright tool from gnulib: git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \ UPDATE_COPYRIGHT_USE_INTERVALS=2 \ xargs /path/to/gnulib/build-aux/update-copyright Neither this project nor any project I manage has, or ever will have, copyright assignment. All contributors retain copyrights to their contributions. commit f7341063e774a032a679ff2ec69aae3bd8c40281 Author: Eric Wong <e@80x24.org> Date: 2014-11-02 08:49:47 +0000 manpage: update contact and copyright information I'll continue accepting email to my private address, but public email is preferred as it is easier for others to find messages, as well as making it easier to credit bug reporters.
commit d5be3d489822cf72fcc491a25b8f1af313435ffc Author: Eric Wong <e@80x24.org> Date: 2014-09-20 10:18:33 +0000 Makefile.am: gzip README and associated data Speeds up site loading when combined with things like try_gzip_static in nginx. commit 3a39cbd6632a8f80af3d0ef082d5007e28826384 (tag: v1.4.1) Author: Eric Wong <e@80x24.org> Date: 2014-09-07 00:43:28 +0000 cmogstored 1.4.1 - bugfix for neon clients The PHP PECL MogileFS extension uses neon to handle WebDAV operations, and neon seems to send (valid but unfortunate) headers with empty string values. Thanks to Patrice Damezin at Skyrock.com for reporting this bug. There's also a few minor cleanups. The latest 2.6.34 stable kernel release no longer requires our EPOLL_CTL_MOD race workaround. There are also some test suite updates for future releases of Ruby. Bigger changes coming later this year... There's also a new public mailing list at: cmogstored-public@bogomips.org No subscription will ever be necessary to post. Subscription is optional via: cmogstored-public+subscribe@bogomips.org Archives are available at http://bogomips.org/cmogstored-public/ Eric Wong (11): minor cleanups for functions which do not return remove old fsck_queue declarations svc_dev: calling free does not need the lock test/mgmt: lengthen test for iostat watch test/http: plug race condition in FIFO test test/http_chunked_put: test for gigantic trailer update address to public mailing list Rakefile: remove freecode/freshmeat references Rakefile: shorten ChangeLog dump queue_epoll: disable buggy epoll workaround for 2.6.34.15+ http_common: correctly handle empty header values commit cc46352c76193a2f1732a1f64761eea8b7581e60 Author: Eric Wong <e@80x24.org> Date: 2014-09-03 17:27:06 +0000 http_common: correctly handle empty header values The PHP PECL MogileFS extension uses neon to handle WebDAV operations, and neon seems to send (valid but unfortunate) headers with empty string values: ref: 
http://svn.webdav.org/repos/projects/neon/trunk/src/ne_request.c else if (!sess->is_http11 && !sess->any_proxy_http) { ne_buffer_czappend(req->headers, "Keep-Alive: " EOL "Connection: TE, Keep-Alive" EOL); } else if (!req->session->is_http11 && !sess->any_proxy_http) { ne_buffer_czappend(req->headers, "Keep-Alive: " EOL "Proxy-Connection: Keep-Alive" EOL "Connection: TE" EOL); } Thanks to Patrice Damezin at Skyrock.com for reporting the issue. commit 5087825f3fd0ad59ce7afedaaaaa17d16196e1f6 Author: Eric Wong <e@80x24.org> Date: 2014-09-05 19:04:31 +0000 queue_epoll: disable buggy epoll workaround for 2.6.34.15+ commit 356ad39592cfcb537a512b2f88ed44380ae5cd78 ("epoll: prevent missed events on EPOLL_CTL_MOD") in the 2.6.34 stable tree commit e734132c6710451340e04a765fd08f60b6102771 Author: Eric Wong <e@80x24.org> Date: 2014-09-05 01:38:16 +0000 Rakefile: shorten ChangeLog dump We don't need ChangeLog info going back to 1.0.0 commit 69af3d363a06030466ceca9f7a13252eef0caa81 Author: Eric Wong <e@80x24.org> Date: 2014-09-04 23:48:49 +0000 Rakefile: remove freecode/freshmeat references The site is dead. commit 1b0b17910da9d3dea3d1f96083743545469db160 Author: Eric Wong <e@80x24.org> Date: 2014-09-05 00:33:50 +0000 update address to public mailing list Receiving bug reports via private email is awkward because I must ask reporters if they wish to be credited publicly. This also allows users to help each other in case they're not subscribed to the MogileFS list (which requires subscription). So the new public mail address is at: cmogstored-public@bogomips.org No subscription will ever be required to post. HTML email is considered spam and blocked.
There's now a public mailing list for reporting issues with git clone-able archives (via ssoma[1]) at: git://bogomips.org/cmogstored-public [1] http://soma.public-inbox.org/README commit 4fbe02062007d1ad073a550f5e37b599fc0019e4 Author: Eric Wong <e@80x24.org> Date: 2014-06-22 22:49:39 +0000 test/http_chunked_put: test for gigantic trailer This is a potential attack vector, and we seem to pass. commit 29bc0766942a92549774d0439d1a6362c53bc26c Author: Eric Wong <e@80x24.org> Date: 2014-09-03 07:10:04 +0000 test/http: plug race condition in FIFO test This is noticeable in the trunk version of ruby since r47288 ("io.c: do not swallow exceptions at end of block"). commit 9be3f68b9d8d86379339dc0e6852612061880e38 Author: Eric Wong <normalperson@yhbt.net> Date: 2014-05-23 08:51:13 +0000 test/mgmt: lengthen test for iostat watch The iostat may take a while to notice a new device, so let it run a bit. commit 446a21c9ac664f7456e2e4e739979baab8ba13c1 Author: Eric Wong <normalperson@yhbt.net> Date: 2014-05-23 10:21:32 +0000 svc_dev: calling free does not need the lock We do not need to be holding devstats_lock when releasing a local buffer which will never be used by another thread. commit c53cecda7106e4c7eb14d5c26e28bda82743771d Author: Eric Wong <e@80x24.org> Date: 2014-05-30 22:10:16 +0000 remove old fsck_queue declarations fsck_queues were replaced by generic ioq for all requests in 1.3, but the declarations here were forgotten. commit cd7b4cbacbc968bd4d7ed5fed9122f75d229793c Author: Eric Wong <e@80x24.org> Date: 2014-04-08 03:51:02 +0000 minor cleanups for functions which do not return pthread_exit and abort never return, so quiet down some warnings when using -Wunreachable-code on clang. Unfortunately using -Wunreachable-code globally is too noisy due to 1) Ragel-generated code.
2) constant branch conditions for build-time options (trace/cork) commit 3f56e841c3612a113cc5261b01552396cc24ea13 (tag: v1.4.0) Author: Eric Wong <normalperson@yhbt.net> Date: 2014-02-22 01:12:55 +0000 cmogstored 1.4.0 bsd_sendfile is now supported on Debian GNU/kFreeBSD systems. This release also fixes a compatibility bug with Perl mogstored config files where "daemonize = (0|1)" was not supported properly. Eric Wong (3): check for sys/sendfile.h header instead of __linux__ allow bsd_sendfile with freebsd-glue on Debian/kFreeBSD support "daemonize = 0|1" in the config file commit af3e5766523110f50cfb5bcdaba82f700d7d7807 Author: Eric Wong <e@80x24.org> Date: 2014-02-21 22:47:15 +0000 support "daemonize = 0|1" in the config file This is expected by Perl mogstored, and our previous support of "daemonize" (standalone) was in error (but still supported for now). commit 35782a1facdc61ae007086657689cf289c96dd92 Author: Eric Wong <e@80x24.org> Date: 2014-02-21 17:05:55 -0500 allow bsd_sendfile with freebsd-glue on Debian/kFreeBSD Debian GNU/kFreeBSD users may ./configure with LIBS=-lfreebsd-glue to use the FreeBSD sendfile syscall. commit 3d96736835c69b3de698bd3cc9ed12bab1da8d73 Author: Eric Wong <e@80x24.org> Date: 2014-02-17 16:49:57 -0500 check for sys/sendfile.h header instead of __linux__ Non-Linux OSes may eventually gain a Linux-compatible sendfile. commit 6eaf13539681dd1d6725021112dc43b69ae2be4d (tag: v1.3.3) Author: Eric Wong <normalperson@yhbt.net> Date: 2014-02-09 04:23:40 +0000 cmogstored 1.3.3 - Debian GNU/kFreeBSD fixes This release fixes build problems with Debian GNU/kFreeBSD support (turns out it's been broken for over a year and nobody noticed :x). There are also build system upgrades for automake 1.14 and test case cleanups, but no changes to any of the core code. No changes nor need to upgrade if you're on anything other than Debian GNU/kFreeBSD.
commit 4d55ec4f7aecfe7a127647f82175d92732879917 Author: Eric Wong <normalperson@yhbt.net> Date: 2014-02-09 03:56:44 +0000 m4/gnulib-cache: update for 2014 commit 7a55cf1b1529f487f39d7916b5d3c8188af5eccf Author: Eric Wong <e@80x24.org> Date: 2014-02-08 22:53:30 -0500 test/upgrade: cleanup and robustness improvements Avoid calling top-level methods inside other tests in case some versions of test-unit or minitest can call setup/teardown twice. Avoid Timeout, as it is expensive and unnecessary in some cases. commit 6b4cbed9de98b0988692f7855871034cb5f2bb3f Author: Eric Wong <e@80x24.org> Date: 2014-02-08 22:50:54 -0500 Makefile.am: updates for automake 1.14.1 Tested with automake 1:1.14.1-2 on Debian GNU/kFreeBSD commit 6b974dc9cb48e6af8e4ea9410141168208e7ca06 Author: Eric Wong <e@80x24.org> Date: 2014-02-08 20:49:12 -0500 tests: skip iostat-dependent tests Debian GNU/kFreeBSD still does not have iostat :< commit fd6722ac69f72bc4783675f055ab567a9902c713 Author: Eric Wong <normalperson@yhbt.net> Date: 2014-02-08 01:14:19 +0000 Makefile: do not clobber NOSTD_CFLAGS from configure This was breaking the Debian kFreeBSD build commit d6147a83867fb41eabdfdde6d71a23d0e1de5f71 Author: Eric Wong <normalperson@yhbt.net> Date: 2014-02-04 22:42:02 +0000 doc/queues.txt: add a note about our non-use of AIO It was obvious to me to use pthreads up front, hopefully that's explained to others, too. commit 4e48663f6b07954fbcfc34339f44c9f487d9b4c8 (tag: v1.3.2) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-10 20:17:38 +0000 cmogstored 1.3.2 - FreeBSD shutdown speedup This release speeds up graceful shutdown on busy systems such as FreeBSD. There is also a minor resource savings for users of the undocumented --worker-processes switch. There are also some minor memory error fixes for test cases (which did not affect the daemon itself). Upgrading is optional unless you are affected by these fixes. 
Note: GNU/Linux users are encouraged to read the manpage update regarding glibc malloc arenas Eric Wong (9): selfwake: do share pipe descriptors with workers test/chunk-parser-1: fix uninitialized file structures test: fix valgrind warnings in test-only C code doc: refer to malloc-related environment variables thrpool: sleep instead of yield when poking thread test/mgmt-usage: relax regexp for ZFS m4/.gitignore: bump for newer gnulib doc: fix wording in manpage doc: fix link to MogileFS homepage commit 3420cb228a3aa09b453d7464ef0c7ab4b6a1d0db Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-10 22:09:31 +0000 doc: fix link to MogileFS homepage mogilefs.org is the correct domain commit 9ed6f8849d238286b37d8a2f82207d1a9c900b73 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-10 22:08:00 +0000 doc: fix wording in manpage commit 0aeb2fa7696474c7c578f9fa6d948f4e27b88bb3 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-09 11:21:17 +0000 m4/.gitignore: bump for newer gnulib Now at gnulib commit 43593319b31e6b0175b8eec4433bac744959822d ("md5, sha1, sha256, sha512: add gl_SET_CRYPTO_CHECK_DEFAULT") commit 3433a482c97913aaddccf83a224cb9cff819d340 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-09 10:54:11 +0000 test/mgmt-usage: relax regexp for ZFS ZFS device mount points do not start with a leading '/'. We already account for this in our internal mountpoint handling, but did not account for this in the test case. Reported-by: Mikolaj Golub commit f5328d433c588e26a7763266208fe3460ef7ee99 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-09 10:50:45 +0000 thrpool: sleep instead of yield when poking thread This unfortunate loop burned too much CPU on FreeBSD and caused shutdown to take too long when using sched_yield. nanosleep for 10ms instead, hopefully allowing the system to accomplish some disk I/O and other tasks before we poke it again. 
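A rough sketch of the sleep-instead-of-yield idea in the thrpool change above: rather than spinning with sched_yield, the waiter naps for about 10ms between attempts. The names poke_delay(), poke_until_done() and countdown() are hypothetical, the callback stands in for whatever "poke the thread and check whether it has exited" means in the real thread pool, and restarting the sleep on EINTR is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* sleep for roughly 10ms, resuming with the remaining time on EINTR */
static void poke_delay(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	while (nanosleep(&ts, &ts) < 0 && errno == EINTR)
		; /* nanosleep stored the unslept remainder back into ts */
}

/*
 * Calls poke(arg) until it returns non-zero, napping between attempts
 * instead of yielding.  Returns the number of naps taken.
 */
static unsigned poke_until_done(int (*poke)(void *), void *arg)
{
	unsigned naps = 0;

	assert(poke != NULL);
	while (!poke(arg)) {
		poke_delay();
		naps++;
	}
	return naps;
}

/* demo stand-in for a thread that needs a few pokes before it quits */
static int countdown(void *arg)
{
	int *left = arg;

	if (*left == 0)
		return 1;
	(*left)--;
	return 0;
}
```

The nap gives the kernel and the poked thread a chance to finish pending disk I/O, at the cost of up to one nap of extra shutdown latency per poke.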
Reported-by: Mikolaj Golub commit fe587418ea7a71f34e5a0f49eb20148e82b9c389 Author: Eric Wong <e@yhbt.net> Date: 2013-12-02 22:23:07 +0000 doc: refer to malloc-related environment variables Using non-portable mallopt/mallctl functions is not feasible because detecting them correctly at _link_ time is not easy. Detecting them at compile time is insufficient because malloc implementations can be swapped at link time (and even with LD_PRELOAD, unfortunately). commit ce5cce161d504df849a50ee1080db42a66ca8c42 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-01 10:53:56 +0000 test: fix valgrind warnings in test-only C code Unfortunately, none of the C-only tests are run with valgrind (however all of the Ruby ones are). commit 2410738dcf00cda49c9f1d5847289f6a48944c2a Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-01 10:50:19 +0000 test/chunk-parser-1: fix uninitialized file structures This test failed on FreeBSD 11.0-CURRENT with MALLOC_DEBUG enabled or if MALLOC_OPTIONS=J is set in the environment. Reported-by: Mikolaj Golub commit 1a4a94f338dbe641a3f1b27a080fc34bac7f43d4 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-12-02 04:28:54 +0000 selfwake: do share pipe descriptors with workers This only affects users of the undocumented --worker-processes switch. Furthermore, this only affects non-Linux platforms which rely on the pipe implementation of selfwake. This prevents us from wasting one extraneous file descriptor slot (and hence potentially wasting 128 bytes in userland). commit b7bda87ead4a53bb792dbbfb6079aad8cd4170de (tag: v1.3.1) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-10-12 21:44:08 +0000 cmogstored 1.3.1 - fix for an undocumented feature This release fixes a bug which only affects users of the undocumented multi-process configuration feature (which is also multi-threaded). * avoid use-after-free with multi-process setups readdir on the same DIR pointer is undefined if DIR was inherited by multiple children.
Using the reentrant readdir_r would not have helped, since the underlying file descriptor and kernel file handle were still shared (and we need rewinddir, too). This readdir usage bug existed in cmogstored since the earliest releases, but was harmless until the cmogstored 1.3 series. This misuse of readdir led to hitting a leftover call to free(). So this bug only manifested since commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b (svc: implement top-level by_mog_devid hash) Fortunately, these bugs only affect users of the undocumented multi-process feature (not just multi-threaded). commit e8217a1fe0cf341b7219a426f23e02cb44281301 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-10-12 07:00:58 +0000 avoid use-after-free with multi-process setups readdir on the same DIR pointer is undefined if DIR was inherited by multiple children. Using the reentrant readdir_r would not have helped, since the underlying file descriptor and kernel file handle were still shared (and we need rewinddir, too). This readdir usage bug existed in cmogstored since the earliest releases, but was harmless until the cmogstored 1.3 series. This misuse of readdir led to hitting a leftover call to free(). So this bug only manifested since commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b (svc: implement top-level by_mog_devid hash) Fortunately, these bugs only affect users of the undocumented multi-process feature (not just multi-threaded). commit a4126a4bef3708c6f3b63f8a8877a3ce2213470b (tag: v1.3.0) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-09-30 08:44:15 +0000 cmogstored 1.3.0 - many improvements There are no changes from 1.3.0rc2. For the most part, cmogstored 1.2.2 works well, but 1.3 contains some fairly major changes and improvements. cmogstored CPU usage may be higher than other servers because it's designed to use whatever resources it has at its disposal to distribute load to different storage devices.
cmogstored 1.3 continues this, but it should be safer to lower thread counts without hurting performance too much for non-dedicated servers. cmogstored 1.3 contains improvements for storage hosts at the extreme ends of the performance scale. For large machines with many cores, memory/thread usage is reduced because we had too many acceptor threads. There are more improvements for smaller machines, especially those with slow/imbalanced drive speeds and few CPUs. Some of the improvements came from my testing with ancient single-core machines, others came from testing on 24-core machines :) Major features in 1.3: ioq - I/O queues for all MogileFS requests ------------------------------------------ The new I/O queue (ioq) implements the equivalent of AIO channels functionality from Perlbal/mogstored. This feature prevents a failing/overloaded disk from monopolizing all the threads in the system. Since cmogstored uses threads directly (and not AIO), the common (uncontended case) behaves like a successful sem_wait with POSIX semaphores. Queueing+rescheduling only occurs in the contended case (unlike with AIO-style APIs, where requests are always queued). I experimented with, but did not use POSIX semaphores as contention would still starve the thread pool. Unlike the old fsck_queue, ioq is based on the MogileFS devid in the URL and not the st_dev ID of the actual underlying file. This is less correct from a systems perspective, but should make no difference for normal production deployments (which are expected to use one MogileFS devid for each st_dev ID) and has several advantages: 1) testing this feature with mock deploys is easier 2) we do not require any additional filesystem syscall (open/*stat) to look up the ioq based on st_dev, so we can use ioq to avoid stalls from slow open/openat/stat/fstatat/unlink/unlinkat syscalls.
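The ioq behavior described above (take a slot immediately when one is free, queue the client only under contention) can be pictured with a small sketch. This is not cmogstored's implementation: struct sketch_ioq, struct sketch_waiter and the three functions are hypothetical, and the sketch ignores rescheduling onto the event queue entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical stand-in for a client blocked on a busy device */
struct sketch_waiter {
	struct sketch_waiter *next;
};

/* hypothetical per-devid queue; "avail" would track aio_threads */
struct sketch_ioq {
	pthread_mutex_t mtx;
	unsigned avail;                /* free slots */
	struct sketch_waiter *head;    /* FIFO of queued clients */
	struct sketch_waiter **tail;
};

static void sketch_ioq_init(struct sketch_ioq *q, unsigned capacity)
{
	assert(capacity > 0);
	pthread_mutex_init(&q->mtx, NULL);
	q->avail = capacity;
	q->head = NULL;
	q->tail = &q->head;
}

/*
 * Returns true if the caller got a slot and may do I/O right away
 * (the uncontended case).  Otherwise w is queued and the calling
 * thread is free to go serve a client of some other device.
 */
static bool sketch_ioq_enter(struct sketch_ioq *q, struct sketch_waiter *w)
{
	bool ok = false;

	pthread_mutex_lock(&q->mtx);
	if (q->avail > 0) {
		q->avail--;
		ok = true;
	} else {
		w->next = NULL;
		*q->tail = w;
		q->tail = &w->next;
	}
	pthread_mutex_unlock(&q->mtx);
	return ok;
}

/*
 * Gives up a slot.  If a client is queued, it inherits the slot and is
 * returned so the caller can reschedule it; otherwise returns NULL.
 */
static struct sketch_waiter *sketch_ioq_leave(struct sketch_ioq *q)
{
	struct sketch_waiter *w;

	pthread_mutex_lock(&q->mtx);
	w = q->head;
	if (w) {
		q->head = w->next;
		if (!q->head)
			q->tail = &q->head;
	} else {
		q->avail++;
	}
	pthread_mutex_unlock(&q->mtx);
	return w;
}
```

Unlike sem_wait, the contended path never blocks the calling thread, which is the property that keeps one slow disk from starving the whole thread pool.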
Otherwise, the implementation of this very closely resembles the old fsck queue implementation, but is generic across HTTP and sidechannel clients. The existing fsck queue functionality is now implemented using ioq. Thus, fsck queue functionality is mapped by the MogileFS devid and not the system st_dev ID as a result of this change. One benefit of this feature is the ability to run fewer aio_threads safely without worrying about cross-device contention on machines with limited resources or few disks (or not solely dedicated to MogileFS storage). The capacity of these I/O queues is automatically scaled to the number of available aio_threads, so they can change dynamically while your admin is tuning "SERVER aio_threads = XX" However, on a dedicated storage node, running many aio_threads (as is the default) should still be beneficial. Having more threads can keep the internal I/O queues of the kernel and storage hardware more populated and can improve throughput. thread shutdown fixes (epoll) ----------------------------- Our previous reliance on pthreads cancellation primitives left us open to a small race condition where I/O events (from epoll) could be lost during graceful shutdown or thread reduction via "SERVER aio_threads = XX". We no longer rely on pthreads cancellation for stopping threads and instead implement explicit check points for epoll. This did not affect kqueue users, but the code is simpler and more consistent across epoll/kqueue implementations. Graceful shutdown improvements ------------------------------ The addition of our I/O queueing and use of our custom thread shutdown API also allowed us to improve the responsiveness and fairness when the process enters graceful shutdown mode. This improves fairness and avoids client-side timeouts when large PUT requests are being issued over a fast network to slow disks during graceful shutdown. 
Currently, graceful shutdown remains single-threaded, but we will likely become multi-threaded in the future (like normal runtime). Miscellaneous fixes and improvements ------------------------------------ Further improved matching for (Linux) device-mapper setups where the same device (not symlinks) appears multiple times in /dev aio_threads count is automatically updated when new devices are added/removed. This is currently synced to MOG_DISK_USAGE_INTERVAL, but will use inotify (or the kqueue equivalent) in the future. HTTP read buffers grow monotonically (up to 64K) and always use aligned memory. This way, deployments which pass large HTTP headers do not trigger unnecessary reallocations. Deployments which use small HTTP headers should notice no memory increase. Acceptor threads are now limited to two per process instead of being scaled to CPU count. This avoids excessive threads/memory usage and contention of kernel-level mutexes for large multi-core machines. The gnulib version used for building the tarball is now included in the tarball for ease-of-reproducibility. Additional tests for uncommon error conditions using the fault-injection capabilities of GNU ld. The "shutdown" command over the sidechannel is more responsive for epoll users. Improved reporting of failed requests during PUT requests. Again, I run MogileFS instances on some of the most horrible networks on the planet[2] fix LIB_CLOCK_GETTIME linkage on some toolchains. "SERVER mogstored.persist_client = (0|1)" over the sidechannel is supported for compatibility with Perlbal/mogstored The Status: header is no longer returned on HTTP responses. All known MogileFS clients parse the HTTP status response correctly without the need for the Status: header. Neither Perlbal nor nginx set the Status: header on responses, so this is unlikely to introduce incompatibilities. The Status: header was originally inherited from HTTP servers which had to deal with a much larger range of (non-compliant) clients.
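The read buffer policy in the 1.3.0 notes above (grow monotonically, cap at 64K, keep memory aligned) might look roughly like the following sketch. Everything here is an assumption for illustration: struct sketch_rbuf, sketch_rbuf_grow(), the 64-byte alignment, the 4096-byte starting size and the doubling strategy are not taken from cmogstored; only the 64K ceiling comes from the release notes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RBUF_MAX   (64 * 1024) /* ceiling from the release notes */
#define RBUF_ALIGN 64          /* assumed alignment */

/* hypothetical read buffer */
struct sketch_rbuf {
	char *ptr;
	size_t cap; /* allocated bytes */
	size_t len; /* bytes of buffered request data */
};

/*
 * Ensures the buffer can hold at least "want" bytes.  Capacity never
 * shrinks.  Returns 0 on success, -1 if want exceeds RBUF_MAX or the
 * allocation fails (the existing buffer is left untouched).
 */
static int sketch_rbuf_grow(struct sketch_rbuf *rb, size_t want)
{
	size_t cap = rb->cap ? rb->cap : 4096; /* assumed starting size */
	void *p;

	assert(rb->len <= rb->cap);
	if (want <= rb->cap)
		return 0; /* monotonic: already big enough */
	if (want > RBUF_MAX)
		return -1;
	while (cap < want)
		cap *= 2;
	if (posix_memalign(&p, RBUF_ALIGN, cap) != 0)
		return -1; /* fail gracefully on OOM */
	if (rb->len)
		memcpy(p, rb->ptr, rb->len);
	free(rb->ptr);
	rb->ptr = p;
	rb->cap = cap;
	return 0;
}
```

Because the capacity only ever increases, a connection that once saw large headers pays for the bigger buffer once instead of reallocating on every request.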
commit 97a39a02481dc24582aa7317d8d94c21d753d040 (tag: v1.3.0rc2) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-09-03 07:31:48 +0000 cmogstored 1.3.0rc2 - fixes since rc1, systemtap The Status: header is no longer returned on HTTP responses. All known MogileFS clients parse the HTTP status response correctly without the need for the Status: header. Neither Perlbal nor nginx set the Status: header on responses, so this is unlikely to introduce incompatibilities. The Status: header was originally inherited from HTTP servers which had to deal with a much larger range of (non-compliant) clients. SystemTap support is mostly fleshed out. There are some bundled awk scripts which should make better sense of the all.stp which logs just about everything. Raising aio_threads now correctly increases ioq capacity. This regression was only introduced in the 1.3.0 rc series, as ioq was not in 1.2.x. commit 82fe4d7dfad38e210bb86d2989e9436c267dd81a Author: Eric Wong <normalperson@yhbt.net> Date: 2013-09-03 08:49:49 +0000 Makefile: update for systemtap support files commit 3a9a1c5cada0630c499fcf42dfb5b38d11694844 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-08-31 03:14:40 +0000 ioq: correctly reenqueue blocked mfds on capacity increase Otherwise, reenqueue-ing only one mfd at-a-time is pointless and prevents cmogstored from utilizing new threads. commit 3d55af133e1da342a7eb52c3dc099daf4ed6acf6 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-08-31 00:58:47 +0000 ioq: avoid over-yielding on and after ioq contention We do not need to set the contended flag again until we're certain we have no free slots in the ioq, not when we assume the client is the last one to take a slot. This is because ioq access itself is serialized, and the last client taking the ioq could be getting a false positive when another thread is waiting on ioq->mtx to release the ioq. This prevents throughput loss while recovering from a situation where an ioq is oversubscribed. 
This is reproduced under heavy load and switching temporarily to "SERVER aio_threads = 1" and then bringing aio_threads back up to a high value. commit 2b7a572ddd9bcce063e3cd10851fd953f525fe24 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-08-29 19:36:40 +0000 m4/systemtap.m4: quote cm_cv_sdt_h_usable var The variable may not be defined at all, so it must be quoted to avoid spewing a warning if dtrace/stap are not found. commit 723a81a0e25ff07c2e6dd9dbd6bf838f6bee7411 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-08-27 01:19:53 +0000 tapset/*awk: document these scripts Otherwise I will forget what they output one day and will have to read the code again. commit dc35288ce1b6e05e74040aa9e8af1166cfa92bd8 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-08-27 00:37:03 +0000 TODO: remove item for systemtap/dtrace systemtap support is implemented, and hopefully dtrace works, too. commit 37a5071021601480384c2abe20f2d33ad974579d Author: Eric Wong <normalperson@yhbt.net> Date: 2013-08-07 20:03:34 +0000 flesh out systemtap support and awk helpers Our "all.stp" tapset now generates awk-friendly output for feeding some sample awk scripts. Using awk (and gawk) was necessary to avoid reimplementing strftime in guru mode for generating CLF (Common Log Format) HTTP access logs. Using awk also gives us several advantages: * floating point number support (for time differences) * a more familiar language to systems administrators (given this is for MogileFS, perhaps Perl would be even more familiar...). * fast edit/run cycle, so the slowness of using stap to rebuild/reload the kernel module for all.stp changes can be avoided when output must be customized. commit fe1e1200c1541676e6b8402b7972a16105a76a63 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-08-22 23:11:49 +0000 http: remove Status: header from all responses This was inherited from a server which needed to deal with some broken clients, MogileFS does not have this problem.
Neither Perlbal nor nginx set this response header, either, so lets save ourselves a few bytes. commit 1199492dd1adb394cf4cc0d599e7f77c52ccbdbf Author: Eric Wong <e@yhbt.net> Date: 2013-07-31 20:26:25 +0000 trywrite: workaround potential inf loops from kernel bugs While we're fortunate enough to not have encountered a case where send/writev returns zero with a non-zero-length buffer, it's not inconceivable that it could strike us one day. In that case, error out the connection instead of infinite looping. Dropping a connection is safer than letting a thread run in an infinite loop. commit 317b979e29774a77fb933c4f42514ff007669b39 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-26 06:32:49 +0000 test/mgmt: warn about slow mount points on test failure Unfortunately, slow mount points still cause minor reliability issues with the test suite. commit 596dbef8b4b23657fd78dca4bc55e261c3f6b376 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-26 06:12:34 +0000 test/mgmt: increase reliability of max devid test This seems to fail more under heavy load, so wait a bit longer for iostat to become aware of the new devices. commit c49cf315dadbf1cfe2f5e80c1f3c1ae27ad0761e Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-19 02:17:18 +0000 move trace.h include to global cmogstored.h We'll have tracing everywhere, so it's too much maintenance overhead to add it to every file which wants it. Increased build-times are a problem, but less than the maintenance overhead of finding the right headers. commit 939abdfed71349df87712559553593dc95f406c5 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-19 00:29:47 +0000 tapset: rename http_request.stp -> all.stp This tapset will contain every probe point and acts as a check/documentation for extracting useful probes. 
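The "trywrite: workaround potential inf loops from kernel bugs" commit above describes erroring out a connection when send/writev returns zero for a non-empty buffer. A minimal sketch of that idea follows; it is not the cmogstored implementation. The names write_all_or_fail, write_fn, stuck_write and the enum are hypothetical, and plain write(2)-style calls stand in for send/writev.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical result codes, not cmogstored's own */
enum write_result { WRITE_DONE, WRITE_AGAIN, WRITE_ERR };

/* same shape as write(2), so the real syscall or a fake can be passed in */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t len);

/* simulates the (so far unseen) kernel bug: zero bytes written, no error */
static ssize_t stuck_write(int fd, const void *buf, size_t len)
{
	(void)fd;
	(void)buf;
	(void)len;
	return 0;
}

/*
 * Write all of buf.  A zero return for a non-empty buffer is treated as
 * a fatal error for this connection instead of being retried forever.
 */
static enum write_result
write_all_or_fail(write_fn fn, int fd, const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t w = fn(fd, buf, len);

		if (w > 0) {
			buf += w;
			len -= (size_t)w;
		} else if (w == 0) {
			return WRITE_ERR; /* would loop forever otherwise */
		} else if (errno == EINTR) {
			continue;
		} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
			return WRITE_AGAIN; /* caller waits for writability */
		} else {
			return WRITE_ERR;
		}
	}
	return WRITE_DONE;
}
```

Passing stuck_write as the first argument exercises the zero-return path; passing write(2) itself gives the normal behavior. Dropping the connection on WRITE_ERR matches the reasoning in the commit message: a lost connection is safer than a thread stuck in a loop.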
commit 313a04bd35534a6cd024149d9f2c9b9487f08165 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-19 00:15:31 +0000 split out {mgmt,http}_parse_continue checks Incomplete request headers are uncommon, so if we see them, something is probably off or strange. This should make it easier to maintain probe points to watch for this behavior. commit 00d234c6f9362c11938f3b67c03bf208c7638eca Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-18 23:58:23 +0000 probes: add probes for rbuf growth Growing the rbufs should be uncommon, but it should set off alarms if it happens too often. commit 4c6a7474a281451b1ef57f686b9b21cbb8216b0d Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-18 20:21:43 +0000 test/mgmt: cover the large rbuf growth case mgmt may now encounter large rbufs, so ensure that uncommon case is tested. commit 6d2642bb1a42840e809e7a73896a1631d37b15e6 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-18 00:25:47 +0000 split out {http,mgmt}_rbuf_grow functions This should allow easier tracing of rbuf growth, and should hopefully make the code more explicit and harder to screw up. commit 48bbaf84da51644451a3dc0c1254d51c035ccce0 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-17 18:41:48 +0000 ioq: add probes tracing and documentation ioq tracing will allow users to notice when devices are saturated (from a cmogstored POV) and increase aio_threads if necessary. commit f8c655bbb3b733a10c6aab9c71246e94652c6cc9 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-16 10:19:29 +0000 tapset/http_request: log listen address and PID of connection It is helpful to know the address of the listener on the server which accepted the client socket. Additionally, the PID,FD combination should be safely unique for any point in time.
commit 7c49988ebf5c176cadd4a9e287e443d49a2cdeec Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-17 01:39:24 +0000 document ioq and mog_{mgmt,http}_drop interaction safety I needed to spend time to convince myself this was safe, so leave a note to others (and future self) in case there is cause for concern. Basically, this is highly dependent on our overall one-shot-based concurrency model and safe as long as basic rules are followed. commit 2869d2bf7a24a0b42bde738589221def0289ce54 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-16 23:05:43 +0000 queue_epoll: EPOLL_CTL_MOD should be safe on 2.6.32.61+ Willy Tarreau cherry-picked the relevant fix into 2.6.32 longterm stable tree ref: commit 1c137a47bbdd6e86298627e04f547afd7f35d523 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git commit 800bb2057ce8559eede740816be06cf60d959f39 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-14 08:23:52 +0000 alloc: remove mog_rbuf_free_and_null This function is no longer used as we now attempt to reattach rbufs to the TLS space of each thread. commit 4edbdd6ba3686a60a8ddeed8f6f26e55abf0b207 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-14 07:26:36 +0000 downgrade thread/device-count fields to unsigned int It's unlikely we'll even come close to seeing 2-4 billion devices in a MogileFS instance for a while. Meanwhile, it's also unlikely the kernel will ever run that many threads, either. So make it easier to pack and shrink data structures to save a few bytes and perhaps get better memory alignment. For reference, the POSIX semaphore API specifies initial values with unsigned (int) values, too.
This leads to a minor size reduction (and we're not even packing):

  $ ~/linux/scripts/bloat-o-meter cmogstored.before cmogstored
  add/remove: 0/0 grow/shrink: 0/13 up/down: 0/-86 (-86)
  function                         old     new   delta
  mog_svc_dev_quit_prepare          13      12      -1
  mog_mgmt_fn_aio_threads          147     146      -1
  mog_dev_user_rescale_i            27      26      -1
  mog_ioq_requeue_prepare           52      50      -2
  mog_ioq_init                      80      78      -2
  mog_thrpool_start                101      96      -5
  mog_svc_dev_user_rescale         143     137      -6
  mog_svc_start_each               264     256      -8
  mog_svc_aio_threads_handler      257     249      -8
  mog_ioq_ready                    263     255      -8
  mog_ioq_next                     303     295      -8
  mog_svc_thrpool_rescale          206     197      -9
  mog_thrpool_set_size            1028    1001     -27

commit e46c221c47e3cd00edfcae199146cb2f50b9b63f (tag: v1.3.0rc1) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-14 00:52:14 +0000 cmogstored 1.3.0rc1 For the most part, cmogstored 1.2.2 works well, but 1.3 contains some fairly major changes and improvements. cmogstored CPU usage may be higher than other servers because it's designed to use whatever resources it has at its disposal to distribute load to different storage devices. cmogstored 1.3 will continue this, but it should be safer to lower thread counts without hurting performance too much for non-dedicated servers. Unfortunately, the minor, Linux-only bug affecting 1.2.2 for (uncommon) thread shutdowns required some fairly intrusive changes to fix, so I'm not sure if releasing a 1.2.3 is worth it. If you're happy with 1.2.x, I recommend marking the host down via mogadm before lowering "SERVER aio_threads = XX" or sending SIGQUIT to cmogstored. But I think thread shutdown is uncommon enough to not affect normal deployments. cmogstored 1.3 will contain improvements for storage hosts at the extreme ends of the performance scale. For large machines with many cores, memory/thread usage is reduced because we had too many acceptor threads. There are more improvements for smaller machines, especially those with slow/imbalanced drive speeds and few CPUs.
Some of the improvements came from my testing with ancient single-core machines, others came from testing on 24-core machines :) The SystemTap tracing work is still in-progress (although the 1.3 cycle was originally intended to focus on this :x). I expect the remaining changes to be non-intrusive and will work on them through the RC cycle. Major features in 1.3: ioq - I/O queues for all MogileFS requests ------------------------------------------ The new I/O queue (ioq) implements the equivalent of AIO channels functionality from Perlbal/mogstored. This feature prevents a failing/overloaded disk from monopolizing all the threads in the system. Since cmogstored uses threads directly (and not AIO), the common (uncontended) case behaves like a successful sem_wait with POSIX semaphores. Queueing+rescheduling only occurs in the contended case (unlike with AIO-style APIs, where requests are always queued). I experimented with, but did not use POSIX semaphores as contention would still starve the thread pool. Unlike the old fsck_queue, ioq is based on the MogileFS devid in the URL and not the st_dev ID of the actual underlying file. This is less correct from a systems perspective, but should make no difference for normal production deployments (which are expected to use one MogileFS devid for each st_dev ID) and has several advantages: 1) testing this feature with mock deploys is easier 2) we do not require any additional filesystem syscall (open/*stat) to look up the ioq based on st_dev, so we can use ioq to avoid stalls from slow open/openat/stat/fstatat/unlink/unlinkat syscalls. Otherwise, the implementation of this very closely resembles the old fsck queue implementation, but is generic across HTTP and sidechannel clients. The existing fsck queue functionality is now implemented using ioq. Thus, fsck queue functionality is mapped by the MogileFS devid and not the system st_dev ID as a result of this change.
One benefit of this feature is the ability to run fewer aio_threads safely without worrying about cross-device contention on machines with limited resources or few disks (or not solely dedicated to MogileFS storage). The capacity of these I/O queues is automatically scaled to the number of available aio_threads, so they can change dynamically while your admin is tuning "SERVER aio_threads = XX" However, on a dedicated storage node, running many aio_threads (as is the default) should still be beneficial. Having more threads can keep the internal I/O queues of the kernel and storage hardware more populated and can improve throughput. thread shutdown fixes (epoll) ----------------------------- Our previous reliance on pthreads cancellation primitives left us open to a small race condition where I/O events (from epoll) could be lost during graceful shutdown or thread reduction via "SERVER aio_threads = XX". We no longer rely on pthreads cancellation for stopping threads and instead implement explicit check points for epoll. This did not affect kqueue users, but the code is simpler and more consistent across epoll/kqueue implementations. Graceful shutdown improvements ------------------------------ The addition of our I/O queueing and use of our custom thread shutdown API also allowed us to improve the responsiveness and fairness when the process enters graceful shutdown mode. This improves fairness and avoids client-side timeouts when large PUT requests are being issued over a fast network to slow disks during graceful shutdown. Currently, graceful shutdown remains single-threaded, but we will likely become multi-threaded in the future (like normal runtime). Miscellaneous fixes and improvements ------------------------------------ Further improved matching for (Linux) device-mapper setups where the same device (not symlinks) appears multiple times in /dev aio_threads count is automatically updated when new devices are added/removed. 
This is currently synced to MOG_DISK_USAGE_INTERVAL, but will use inotify (or the kqueue equivalent) in the future. HTTP read buffers grow monotonically (up to 64K) and always use aligned memory. This ensures deployments which pass large HTTP headers do not trigger unnecessary reallocations. Deployments which use small HTTP headers should notice no memory increase. Acceptor threads are now limited to two per process instead of being scaled to CPU count. This avoids excessive threads/memory usage and contention of kernel-level mutexes for large multi-core machines. The gnulib version used for building the tarball is now included in the tarball for ease-of-reproducibility. Additional tests for uncommon error conditions using the fault-injection capabilities of GNU ld. The "shutdown" command over the sidechannel is more responsive for epoll users. Improved reporting of failed requests during PUT requests. Again, I run MogileFS instances on some of the most horrible networks on the planet[2] fix LIB_CLOCK_GETTIME linkage on some toolchains. "SERVER mogstored.persist_client = (0|1)" over the sidechannel is supported for compatibility with Perlbal/mogstored commit 12049de467b52f1c8e4e16b53cb10182d06c6a51 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-14 02:31:41 +0000 m4/systemtap: require stap for enabling systemtap build Only relying on dtrace leads to build problems on FreeBSD which I haven't had a chance to fix. commit 8f9b7e28eaf74e5fdc72328f0dfb890d92c02ec1 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-14 00:46:10 +0000 ioq: reset internal queues during requeue/shutdown This should avoid concurrency bugs where a client may run in multiple threads if we switch to multi-threaded graceful shutdown.
commit b773c55485a7a50904493a0cdc8dd22da9bbfdee Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-13 19:50:14 +0000 test/pwrite_wrap: disable test under valgrind for now This test is too slow and timing-sensitive under valgrind, so disable it for now until we have a better solution. commit f3ff911f3cfeb6af3e32513c4301be389a936d76 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-13 10:32:46 +0000 ioq: set contended flag if we are the last one acquiring the lock We could be completely out of threads upon acquiring an ioq, so the last thread to acquire a lock slot must trigger a yield soon to avoid starvation and fairness issues. Otherwise, all threads for a given device could remain pinned indefinitely. commit 6333dc06a23a80690f60f3659428df88bd19d736 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-13 10:31:22 +0000 test/mgmt_persist_client: teardown running processes Tests need to cleanup by stopping running processes. commit 5b1c49b1cb6c719eb098beae3823cf63d116d8ed Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-13 02:26:17 +0000 pass mog_accept instead of mog_svc to post-accept callbacks This allows us to capture/trace the listen address which accepted the request without consuming additional stack space. commit 5c65fa6a053691ffee983b61298f3863b660b408 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-13 02:10:51 +0000 set addrinfo field for "struct mog_accept" This will allow us to properly report the listen address the client connected to. commit 22a718de33fef78bab33bc00e52cd230c22e1945 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-13 00:08:00 +0000 http: pass "struct mog_fd *" more consistently in API This makes it easier to write tapsets which key objects by: PID,FD for uniqueness. This also avoids some mog_fd_of() calls.
commit ec096dc8de3d37f4e33e7bc47bcfbe5207ae6855 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 23:22:29 +0000 m4/ld_wrap: avoid compiler warning for missing declaration This avoids noise in config.log commit 5666c4496facb4ad7cfa073cf1d6d849784e06b8 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-13 06:41:59 +0000 iostat: keep update prefix on stack instead of heap The update prefix is bounded in size, so this will save us NR_DEVICES malloc/free pairs each second from typical iostat output. commit fe57de9a8b6b9a6f4f840ab5a2ca17c8f803ce20 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 22:21:56 +0000 mgmt_fn: minor cleanup for emitting blank response No need to recreate mog_mgmt_fn_blank for sending blank responses. commit 249c82c4080c7adb08c32ebcd6cd74ffec5acd18 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 20:52:39 +0000 test/http: disable time-dependent test under valgrind test_head_response_time does not test anything which would not be otherwise tested by other tests under valgrind. This test is only needed for occasional validation of fuckups regarding TCP_NOPUSH on FreeBSD, and not necessary for general use. commit 4244fd63ef360a1b5a201d82e323c54842f0db55 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 20:49:46 +0000 http: check persist_client state when parsing starts We don't want to drop in-flight pipelined requests when disabling persistent connections. Disabling persistent connections will always be potentially racy, but hopefully this makes the race small enough that lower-level latencies are the only thing which affect that. commit 86c7628b01130559c53dffe1d799f2031a020918 (persist_client) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 02:56:41 +0000 http: signal connection close during shutdown While we always properly disconnected clients during shutdown, we explicitly set "Connection: close" now to inform clients of our pending shutdown.
This avoids potentially confusing clients when we disconnect them as there may still be a race condition where we shut down a client while their request packets are in-flight. commit 0e5d6c6f4b28a75853d1020f07e493632031a054 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 02:53:17 +0000 mgmt: support "SET mogstored.persist_client = $BOOL" This is Perlbal functionality which works in Perl mogstored, so we will also support it here, as it makes upgrading to new versions easier. commit 1c9fe8380f14e2b67bed99d16ef465db8d379b41 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 00:54:57 +0000 svc: increase responsiveness of graceful shutdown By reducing the capacity of each ioq, we force each running worker thread to yield the current client and hit an exit point (epoll_wait/kqueue) sooner. commit 56d4a65df3fc011086648563b2235eac49b7ba60 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 00:42:55 +0000 test/mgmt: increase reliability on overloaded systems Without this, test_iostat_watch fails sometimes under valgrind. commit f206fc4ee27546c57ebc6b4bf069257c05970cd2 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-12 00:22:29 +0000 tests: introduce pwrite-wrap test for slow I/O pwrite can be a slow, blocking function on an overloaded system, but a slow pwrite requires a wrapper to simulate. This allows us to have coverage of the: if (mog_ioq_contended()) return MOG_NEXT_WAIT_RD; cases in http_put.c commit e50365f275ada4afcd5f25f2ac3328e341a79d71 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-11 22:15:56 +0000 ioq: rescale to match user-set aio_threads values Users reducing or increasing thread counts should increase ioq capacity, otherwise there's no point in having more or less threads if they are synched to the ioq capacity. 
commit f83d0466afc32542f3f4ff962105c817a1be2c96 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-11 19:06:27 +0000 mgmt: checksumming is interruptible during thread shutdown We want to yield dying threads as soon as possible during thread shutdown, so we check the quit flag and yield the running thread to trigger a MOG_NEXT_ACTIVE. commit daab757f5e52ce36a47e2d713365d68367a0e6dd Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-11 08:57:02 +0000 ioq: introduce mog_ioq_contended hint This will allow us to detect I/O contention on our queue and yield the current thread to other clients for fairness. This can prevent a client from hogging the thread in situations where the network is much faster than the filesystem/disk. commit 9302d584dcf68489a9c4739a3a42a468323ccda6 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-10 07:55:29 +0000 struct mog_ni: document reasoning for the ':' in ni_serv This is somewhat strange, but makes the code base slightly easier to reuse for non-HTTP purposes. commit 9897d28bb57f2aa84f91b1a8594c7ecd30be8446 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-09 00:14:56 +0000 http: include IP:PORT in "client died" message This should hopefully make failures easier to track down. commit 2c24cf070dfc9341462fcba59fab4c6b7b330938 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-03 07:50:00 +0000 remove assertion for handling iostat death This only triggered if the (undocumented) --worker-processes option is used. This assertion is no longer valid as of commit d5a52618ca1f9b5d7f6998716fbfe7714f927112 (refactor handling of "server aio_threads = " command) commit b600fc854d2a813dc7cf08eb58590ada90db4c02 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-03 07:32:52 +0000 file: embed ioq in the opened mog_file object This allows us to avoid a redundant hash lookup every time we "activate" an open file for reading or writing. 
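The ioq commits above (slot acquisition, the mog_ioq_contended hint, and yielding under contention) can be illustrated with a small sketch. This is an assumption-laden toy and not the real ioq: the struct layout and the ioq_sketch_* names are hypothetical, and a counter of waiting clients stands in for the real queue of blocked mfds. It shows the behavior described in the release notes: the uncontended case just takes a slot (like a successful sem_wait), and queueing only happens when no slot is free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* hypothetical structure; field names are illustrative only */
struct ioq_sketch {
	pthread_mutex_t mtx;
	unsigned cur;     /* free slots remaining */
	unsigned max;     /* capacity, e.g. scaled to aio_threads */
	unsigned waiting; /* clients parked because no slot was free */
	bool contended;   /* hint: slot holders should yield soon */
};

static void ioq_sketch_init(struct ioq_sketch *q, unsigned capacity)
{
	pthread_mutex_init(&q->mtx, NULL);
	q->cur = q->max = capacity;
	q->waiting = 0;
	q->contended = false;
}

/* true: caller holds a slot and may do disk I/O; false: caller is queued */
static bool ioq_sketch_ready(struct ioq_sketch *q)
{
	bool ok;

	pthread_mutex_lock(&q->mtx);
	ok = q->cur > 0;
	if (ok) {
		q->cur--;
	} else {
		/* only flag contention once we know no slot is free */
		q->waiting++;
		q->contended = true;
	}
	pthread_mutex_unlock(&q->mtx);
	return ok;
}

/* give up a slot; true means a queued client inherits it and must be run */
static bool ioq_sketch_next(struct ioq_sketch *q)
{
	bool wake;

	pthread_mutex_lock(&q->mtx);
	wake = q->waiting > 0;
	if (wake) {
		q->waiting--; /* slot passes directly to the waiter */
	} else {
		assert(q->cur < q->max);
		q->cur++;
		q->contended = false;
	}
	pthread_mutex_unlock(&q->mtx);
	return wake;
}

/* hint checked by long-running requests (e.g. a large PUT) */
static bool ioq_sketch_contended(struct ioq_sketch *q)
{
	bool c;

	pthread_mutex_lock(&q->mtx);
	c = q->contended;
	pthread_mutex_unlock(&q->mtx);
	return c;
}
```

A request handler in this model would call ioq_sketch_ready() before touching the disk, poll ioq_sketch_contended() between writes, and call ioq_sketch_next() when it yields or finishes. Rescaling capacity on "SERVER aio_threads = XX" is deliberately left out of the sketch.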
commit 013e903340a75b12523bd795d15fe5f23d725be9 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-29 03:14:54 +0000 ioq: implement and enable generic I/O queues This will allow us to limit concurrency on a per-device basis with limited impact on HTTP header reading/parsing. This prevents pathological slowness on a single device from bringing down an entire host. This also allows users to more safely run with fewer aio_threads (e.g. 1:1 thread:device mapping) on fast devices with smaller low-level (kernel/hardware) I/O queues. commit fef978104cf134dc6629115456b27dfa2856ded7 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-29 00:39:49 +0000 packaddr: simplify mog_sockaddr definition "struct sockaddr" turns out to be smaller than "struct sockaddr_in6", so we can avoid complicated casting and just add that to the union. We continue avoiding "struct sockaddr_storage", however, as it is unnecessarily large for our needs. commit 71849ca64134b0cfa197fc4b1ce8fc10c7fb5d98 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-27 22:01:52 +0000 test/mgmt: remove unused variable This was triggering warnings with Ruby 2.0.0-p195 commit 160e768fe8d6043af1e435daeb35d5c92e05de11 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-27 03:54:39 +0000 rbuf: reattach/reuse read buffers when possible Reattaching/reusing read buffers allows us to avoid repeated reallocation/growth/free when clients repeatedly send us large headers. This may also increase cache-hits by favoring recently-used buffers as long as fragmentation is kept in check. The fragmentation should be no worse than it is currently, due to the existing detach nature of rbufs. commit 331e7a1300ae59a052763ffecc77b45a56e2deb3 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-27 00:18:08 +0000 mgmt: remove restriction on large rbuf sizes We'll be allowing the migration of buffers between threads and from waiting clients back to thread-local storage.
commit d9486d154f69be2bbe44dbc8ea74efce1d0195ad Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-26 20:18:34 +0000 alloc: cache-align all rbuf memory allocations Some setups use clients which pass large headers (User-Agent, or even cookies(!)) to cmogstored, so large rbufs may be used often and repeatedly in those cases. We limit rbuf sizes to 64K anyways, so keeping "larger" buffers around should not be much of an issue for modern systems. This prepares us for reusing/recycling large rbufs as TLS buffers. commit bb27afc702459d683a6b6ca5822b746142047acc Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-26 02:16:15 +0000 mgmt: handle disk-using requests outside of the parser This will allow us to use control flow similar to the http client handling code when we queue clients based on I/O channel. commit ad961733c0afb96a7ab44dc9837a0f8c8fa239a4 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-26 01:03:52 +0000 introduce generic I/O queue functionality This replaces the fsck_queue internals with a generic ioq implementation which is based on the MogileFS devid, and not the operating system devid. commit 70efa665edeef05f53978f9d541f411b0e1a2b2a Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-26 00:41:54 +0000 http: add assertion for unused wbuf We need to ensure we do not introduce code to launch http_process_client while we have buffered data (or socket write errors). commit c86b6a2c769c821a64fc14c62a953244b41cb190 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:20 +0000 dev: shrink and cache-align struct mog_dev We will have structures inside the dev struct accessed by multiple threads frequently, so keep it cache-aligned. To reduce memory usage for large-numbered devices, avoid storing the prefix on output and instead just rely on the printf-family of routines to generate stringified output in uncommon code paths. 
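The "alloc: cache-align all rbuf memory allocations" commit above, together with the 64K rbuf limit it mentions, can be sketched with posix_memalign. This is only an illustration: the sketch_* names are hypothetical, and the 64-byte cache line size is an assumption (the real code may detect it differently). Note that posix_memalign reports failure through its return value and does not set errno, so the sketch copies the return value into errno for callers which expect malloc-like behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE 64        /* assumption: typical cache line size */
#define SKETCH_RBUF_MAX (64 * 1024) /* the 64K limit named in the commit */

/* round n up to the next multiple of the (power-of-two) cache line size */
static size_t sketch_align_up(size_t n)
{
	assert((SKETCH_CACHE_LINE & (SKETCH_CACHE_LINE - 1)) == 0);
	return (n + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
}

static bool sketch_is_aligned(const void *p)
{
	return ((uintptr_t)p % SKETCH_CACHE_LINE) == 0;
}

/* returns a cache-aligned buffer, or NULL with errno set on failure */
static void *sketch_rbuf_alloc(size_t size)
{
	void *p = NULL;
	int err;

	if (size == 0 || size > SKETCH_RBUF_MAX) {
		errno = EINVAL;
		return NULL;
	}
	err = posix_memalign(&p, SKETCH_CACHE_LINE, sketch_align_up(size));
	if (err != 0) {
		errno = err; /* posix_memalign itself does not set errno */
		return NULL;
	}
	return p;
}
```

Buffers obtained this way are released with free(). Rounding the size up to a whole number of cache lines means two buffers never share a line, which is the point of aligning data that is handed between threads.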
commit f56b866f92e195ffd24a2f8f80e8e2cef226c775 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-03 16:30:32 +0000 mgmt: fix case where rbuf->rsize may be uninitialized Detachers MUST set rsize properly. This API is unfortunately fragile and will eventually be fixed to be more difficult to misuse. commit 5027df50b5072d964f551414e259c2903778ea36 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-07-04 00:20:01 +0000 build: fix LIB_CLOCK_GETTIME linkage on some toolchains According to the m4/clock_gettime.m4 documentation (from gnulib), the LIB_CLOCK_GETTIME variable should be added to a *LDADD variable and not AM_LDFLAGS. This is also consistent with GNU automake documentation. Thanks to Cody Pisto for reporting this problem under Ubuntu 12.04 ref: http://www.gnu.org/software/automake/manual/html_node/Linking.html commit 212cca976056069d49b120ab196c25e76315a427 (good) Merge: cb6851f 93c14dd Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-25 22:43:48 +0000 Merge branch '1.2-stable' * 1.2-stable: cmogstored 1.2.2 - minor maintenance release INSTALL: update versions and URLs INSTALL: clarify between starting from tarball vs git test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind iostat_parser: allow '-' for device names alloc: posix_memalign does not set errno commit cb6851fc69a3fb3d47e4e3a350787deef1bfafa6 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:39 +0000 tests: fault-injection test for ENOSPC on epoll_ctl For difficult-to-trigger errors, fault injection is necessary for testing our error handling. I have confirmed this test fails with "avoid leaks on epoll/kqueue resources exhaustion" reverted. 
commit c1ced9e91ddc647a40f343d20d43cf13fe88eeba Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:38 +0000 avoid leaks on epoll/kqueue resources exhaustion Simply releasing the descriptor triggering ENOSPC/ENOMEM errors from epoll_ctl and kevent is not good enough, as those descriptors may have other descriptors (e.g. files to be served) hanging off of them. commit e12e70b6bd242cb3fea74d1df8b7b44e0a9f7f26 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:36 +0000 introduce mog_yield wrapper around sched_yield/pthread_yield While pthread_yield is non-standard, it is relatively common and preferable for systems where pthreads are _not_ 1:1 mapped to kernel threads. This also provides a stronger yield to weaken the priority of the calling thread wherever we previously used sched_yield. commit a18a08a0e9a7c472656afc86cbbbfcefda5e456d Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:35 +0000 call sched_yield repeatedly when terminating threads This should allow the threads we're terminating to more quickly enter a safe state where they're allowed to exit. On SMP systems, we need to yield the signalling thread more times to increase the probability the interrupted thread can run (and exit). commit df9729555394542064d1c9e9d1b67446bf36d3f3 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:32 +0000 Makefile.am: fix systemtap probes.h distribution Our tests over-link (to save developer time :P), so we must link in probes with our tests. Also, we must keep probes.h around for distclean (but not maintainerclean) commit f159a33754215eac82b26912bce5592294f9a989 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:31 +0000 shrink mog_packaddr and improve portability We cannot assume sa_family_t is the first element of "struct sockaddr_in" or "struct sockaddr_in6". FreeBSD has a "sa_len" member as the first element while Linux does not. 
So only keep the parts of the "struct sockaddr*" we need and use inet_ntop instead of getnameinfo. This also gives us a little more space to add additional fields to "struct mog_http" in the future without increasing memory (or CPU cache) use. commit fe593c035d50efb5cee7ad10697172ee4072556d Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:30 +0000 dist: include newly-added files to the tarball Tarballs were otherwise unusable. commit 0b090760e82545b178cdb0b2d63bf03990fc0595 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:29 +0000 replace pthreads cancellation with explicit checks Due to data/event loss, we cannot rely on normal syscalls (accept/epoll_wait) being cancellation points. The benefits of using a standardized API to terminate threads asynchronously are lost when toggling cancellation flags. This implementation allows us to be more explicit and obvious at the few points where our worker threads may exit and reduces the amount of code we have. By avoiding the calls to pthread_setcancelstate, we should halve the number of atomic operations required in the common case (where the thread is not marked for termination). commit 328623972837345dbcf3ed372293201e3bc4fe3c Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:28 +0000 "server aio_threads = XX" no longer requires malloc This should prevent one class of "accidental" failures. (The sidechannel has never been meant to be secure and exposed to the public). commit 40f84cd0924958c619d434a9147e7ed2b6abaadc Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:27 +0000 fdmap: do not warn on ENOTCONN due to unavoidable race A client may disconnect at any time, so shutdown may fail harmlessly with ENOTCONN. 
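The fdmap change above (not warning on ENOTCONN) might look roughly like the following. This is a sketch under assumptions: sketch_shutdown_quiet is a hypothetical name and the stderr message format is invented for illustration, not taken from cmogstored.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Shut down both directions of a socket.  ENOTCONN is expected whenever
 * the peer disconnected first, so it is treated as success and produces
 * no warning.  Any other failure is reported.
 */
static int sketch_shutdown_quiet(int fd)
{
	if (shutdown(fd, SHUT_RDWR) == 0)
		return 0;
	if (errno == ENOTCONN)
		return 0; /* unavoidable race with the client, harmless */

	assert(errno != 0); /* a failed syscall must have set errno */
	fprintf(stderr, "shutdown(%d) failed: %s\n", fd, strerror(errno));
	return -1;
}
```

The caller still has to close(2) the descriptor afterwards; shutdown only stops further traffic on the connection.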
commit 9f43d3eb8cf6a156108c714551a7eb68472e17a4 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:26 +0000 fix "shutdown" over sidechannel with epoll_pwait The "shutdown" command needs to trigger EINTR when using epoll_pwait, otherwise the sleeping thread may not wake up properly. commit 07569135228020880d8092d9aaf7d6325cc48d26 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:25 +0000 do not rely on normal syscalls as cancellation points Cancellation with epoll_wait, accept4 (and accept) may cause events to be lost, as cancellation relies on signals anyways in glibc/Linux. So instead, we use signaling ourselves and explicitly test for cancellation only if we know we are interrupted and in a state where a thread can safely be cancelled. ref: http://mid.gmane.org/CAE2sS1gxQkqmcywQ07pmgNHM+CyqzMkuASVjmWDL+hgaTMURWQ@mail.gmail.com commit ba8a3673a6ada7122c89e420455901b6b1288500 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:24 +0000 avoid needlessly reinitializing common sigset_t This should hopefully save a few cycles and reduce stack usage slightly. commit df50c675f127c876e8d74be522ddc858aa3795ef Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:22 +0000 svc: make thr_per_dev per-svc instead of global We could eventually make this a tunable parameter, as it could be advantageous over a global aio_threads value. commit d5a52618ca1f9b5d7f6998716fbfe7714f927112 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:21 +0000 refactor handling of "server aio_threads = " command We're using per-svc-based thread pools, so different MogileFS instances we serve no longer affect each other. This means changing the aio_threads count only affects the svc of the sidechannel port which triggered the change. 
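The move away from pthreads cancellation described above ("do not rely on normal syscalls as cancellation points") amounts to checking a per-thread quit flag only at points where no event can be lost. A single-threaded sketch of that control flow follows. All names are hypothetical, a counter stands in for handling real epoll/kqueue events, and the signal used to wake a sleeping thread is omitted entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical per-thread state */
struct worker_sketch {
	atomic_bool quit; /* set by whoever wants this worker to stop */
	size_t handled;   /* events fully processed so far */
};

static void worker_sketch_init(struct worker_sketch *w)
{
	assert(w != NULL);
	atomic_init(&w->quit, false);
	w->handled = 0;
}

/* called from another thread (or a shutdown path) to request exit */
static void worker_sketch_request_quit(struct worker_sketch *w)
{
	atomic_store(&w->quit, true);
}

/* the explicit check point: cheap, and only used between events */
static bool worker_sketch_check_quit(struct worker_sketch *w)
{
	return atomic_load(&w->quit);
}

/*
 * Handle up to nr_events events.  The quit flag is only tested before
 * an event is taken, never while one is half-handled, so an event which
 * was already dequeued is always completed.  Returns how many events
 * were handled in this call.
 */
static size_t worker_sketch_run(struct worker_sketch *w, size_t nr_events)
{
	size_t i;

	for (i = 0; i < nr_events; i++) {
		if (worker_sketch_check_quit(w))
			break;    /* safe exit point */
		w->handled++;     /* stand-in for real event handling */
	}
	return i;
}
```

Compared with asynchronous cancellation, the worker decides where it may stop, which is what prevents an I/O event from being consumed by a thread that then disappears.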
commit 03c2391078e19dc36ea62c75fa6745569b5cbef6 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:18 +0000 define MOG_DEVID_MAX and MOG_PATH_MAX variables This improves maintainability in case MogileFS changes these limits. commit 9312bf345a9329137652f91c079a38931211faba Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:17 +0000 consistently check OOM from hash_initialize/hash_insert Both hash_initialize and hash_insert may return NULL to indicate allocation errors. So implement a mog_oom_if_null helper function to destroy the process instead of attempting to continue and dereferencing NULL pointers. This may affect configurations with limited memory and lacking overcommit; but is unlikely to trigger given the small memory footprint of cmogstored. commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:16 +0000 svc: implement top-level by_mog_devid hash This will allow us to lookup devices for per-(mog)device I/O queues. commit 6357381200266f4c3e5d8f93403de987db95143c Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:15 +0000 http_*: fixup long lines from automated conversion Lines longer than 80 columns aren't readable on my screen with gigantic fonts. commit 89f0cf089b9e68730948ce652b42efaf26b98fd2 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:14 +0000 parse out mogilefs devid in mgmt/http requests This will allow us to do lookups for IO queues/semaphores before we attempt to fstatat/stat a path. commit 2376ed3c3da3bd2c9e8326e7dd75be2188fffc35 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:13 +0000 fix devices/thread count if sidechannel is inactive If the mogstored sidechannel is inactive (in HTTP-only mode), we should still count the number of devices correctly to correctly scale the number of worker threads.
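The "parse out mogilefs devid in mgmt/http requests" commit above extracts the device ID from the request path so the matching I/O queue can be found without touching the filesystem. A hand-written sketch of such a parser follows. It is not the parser cmogstored actually uses; sketch_parse_devid is a hypothetical name, and the value of SKETCH_DEVID_MAX is an assumption standing in for MOG_DEVID_MAX, whose real value is not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* assumption: placeholder limit, the real MOG_DEVID_MAX may differ */
#define SKETCH_DEVID_MAX 16777215u

/*
 * Parse a leading "/dev<N>/" component from a request path.
 * On success, stores N in *devid and returns true.  No syscalls are
 * made, so this is safe to run before any potentially slow stat/open.
 */
static bool sketch_parse_devid(const char *path, uint32_t *devid)
{
	uint32_t n = 0;
	const char *p;

	assert(path != NULL && devid != NULL);

	if (strncmp(path, "/dev", 4) != 0)
		return false;
	p = path + 4;
	if (*p < '0' || *p > '9')
		return false; /* "/dev/..." has no numeric ID */

	for (; *p >= '0' && *p <= '9'; p++) {
		n = n * 10 + (uint32_t)(*p - '0');
		if (n > SKETCH_DEVID_MAX)
			return false; /* also keeps n far from overflow */
	}
	if (*p != '/')
		return false; /* require a path below the device root */

	*devid = n;
	return true;
}
```

The resulting devid would then serve as the key for a per-device lookup such as the by_mog_devid hash mentioned above. Requiring the trailing '/' is a choice made for this sketch only.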
commit e90b43119ff33fb591ffb3bc100cf847537ca5fb Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:12 +0000 switch to per-svc (per-docroot) queues This simplifies code, reduces contention, and reduces the chances of independent MogileFS instances (with one instance of cmogstored) stepping over each other. Most cmogstored deployments are single docroot (for a single instance of MogileFS), however cmogstored supports multiple docroots for some rare configurations and we support them here. commit 2acbe7f4001de74091282ee199e3cad50c2e3e7f Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:11 +0000 thrpool: add comment explaining minimum thread count I forgot why this bound was necessary, so add a comment ensuring I do not forget again. commit 10a38ab650e3e25e37dd70b310631760d0b2000f Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:10 +0000 limit acceptors to reduce contention on large machines Having too many acceptor threads does not help, as it leads to lock contention in the accept syscalls and the EPOLL_CTL_ADD paths. The fair FIFO ordering of _blocking_ accept/accept4 syscalls also means we trigger unnecessary task switching and incur cache misses under high load. Since it is almost impossible for the acceptor threads to be stuck on disk I/O since commit 832316624f7a8f44b3e1d78a8a7a62a399241840 ("acceptor threads push directly into event queue") commit 4d112de546a28b99d52435d4fed075f148455826 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:09 +0000 update aio_threads count when new devices appear This will help ensure availability when new devices are added, without additional user interaction to manually set aio_threads via sidechannel. 
commit 3d93bd96c92cedd583e14ea58b34bb143c4e9e87 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 03:34:08 +0000 make mog_fd_get static, favor mog_fd_init mog_fd_init enforces setting the correct type, so relegate mog_fd_get to private usage inside fdmap.c commit 97ed7a71d216eb4c6cbd1c40f2759e8d8957864a Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-21 11:17:47 +0000 build: get the gnulib version via autogen.sh This is useful for: a) repeatably generating the same tarball off git b) diagnosing and tracking down (rare) gnulib bugs c) 3rd parties verifying we do not put malicious code into our tarballs commit 0ad0f16bce2769a599eb718261e0283e79c57639 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-06-25 19:46:02 +0000 mnt: attempt to match iostat output by st_rdev st_rdev matching is necessary for cases where the block devices are aliased (not via symlinks), and mountlist returns a different name for the device than what iostat uses. This is the case for my cryptmount(8) setup, where /dev/mapper/FOO and /dev/dm-N refer to the same device (with matching st_dev and st_rdev numbers), but neither is a symlink to the other (nor are they hardlinks). stat() on block devices in /dev should always be fast and non-blocking, as /dev is expected to be non-networked on any reasonable system (at least those serving as a MogileFS storage node). commit 93c14ddc0977b82718d8b70c0c0e8a297b8a4211 (tag: v1.2.2, origin/1.2-stable, 1.2-stable) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-11 22:56:46 +0000 cmogstored 1.2.2 - minor maintenance release This is a minor maintenance release, no need to upgrade unless a) your gcc defaults to -march=i386 (e.g. 32-bit CentOS 5) b) your device names include '-' (e.g. Linux device mapper users) There are also some minor doc updates to clarify tarball vs git installation and a trivial error-handling fix which should not affect any current users.
Eric Wong (6): build: add check for GCC atomics alloc: posix_memalign does not set errno iostat_parser: allow '-' for device names test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind INSTALL: clarify between starting from tarball vs git INSTALL: update versions and URLs cmogstored 1.3 will have some fairly intrusive internal changes and cleanups to make it easier for users to trace and diagnose system and network problems. commit cdf2128a1e183e8abfa3d4fbf033c4fa46848898 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-11 13:57:51 +0000 INSTALL: update versions and URLs libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported. (cherry picked from commit 86e5d10649f14fe3b3c8af37fd8ec04cc337fc9e) commit 9d4347d5c8385fa93b6eb31045f7280a4a228c94 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-11 13:57:50 +0000 INSTALL: clarify between starting from tarball vs git Users unfamiliar with autotools may not realize bootstrapping is required when building from git. (cherry picked from commit 1e80ba592ede05fe40b31686142f82294891afd0) commit 86e5d10649f14fe3b3c8af37fd8ec04cc337fc9e Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-11 13:57:51 +0000 INSTALL: update versions and URLs libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported. commit 1e80ba592ede05fe40b31686142f82294891afd0 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-11 13:57:50 +0000 INSTALL: clarify between starting from tarball vs git Users unfamiliar with autotools may not realize bootstrapping is required when building from git.
commit d698442186bfa7c1b35e68720412c9add422616c Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-06 23:45:37 +0000 test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind Our use of chdir in this test confuses valgrind which may create a temporary file. (cherry picked from commit dc801d4a4ded67d74f5306d6dad4aba629045cc8) commit dc801d4a4ded67d74f5306d6dad4aba629045cc8 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-06 23:45:37 +0000 test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind Our use of chdir in this test confuses valgrind which may create a temporary file. commit e247cd327850090dca3d500bc4abcafb3d098875 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-14 00:50:10 +0000 iostat_parser: allow '-' for device names Linux device-mapper names show up as 'dm-0', 'dm-1' and so on. This allows users to store MogileFS files on encrypted devices using dm-crypt and perhaps other, similar tools. (cherry picked from commit 88d34b4686a650dba89674aa302ab13c78e8cef0) commit 27c299a123597729d011b4ec205acb0e0bc48b83 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-15 19:29:31 +0000 alloc: posix_memalign does not set errno We must set errno manually for die_errno() if posix_memalign fails (cherry picked from commit 8c79cf794f6178b6978743af99d498ca0b449fb1) commit 0c918c095d8f611f8d0072db468e37683597ef01 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-06 22:35:06 +0000 favor "struct mog_fd" for acceptors over int FDs There's no reason to be referencing FDs for these acceptors since they're infrequently accessed by svc, so this should make our internals more consistent. This also removes our use of mog_fd_get (outside of test code). commit f80c52cfe4e08fba39995830a3fcf5835d0bb846 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-05-06 22:20:05 +0000 preliminary systemtap support for tracing We will key most client events by pid() and file descriptors, as this is least ambiguous. 
There are some minor refactorings to pass "struct mog_fd *" around as much as possible instead of "struct mog_http *". commit b60e0eebc4e108f63372f9a0ffe318589599728f Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-17 07:59:36 +0000 http: minor debloat via better alignment This results in a small size reduction due to better alignment: $ ~/linux/scripts/bloat-o-meter cmogstored.before cmogstored.after add/remove: 0/0 grow/shrink: 2/2 up/down: 20/-56 (-36) function old new delta mog_http_get_open 1460 1476 +16 mog_chunk_init 65 69 +4 http_forward_in_progress 63 55 -8 mog_http_parse 27171 27123 -48 commit 354eae3bd113e66c863b384765d88680406ed633 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-17 03:45:24 +0000 http_parser: do not differentiate between MD5 sources It does not matter if the Content-MD5 comes from the trailer or header, we process it the same way with the Ragel parser. This is obvious when reading our code (and associated hunk this commit changes) in http_put.c commit 7b097d6129a7971197430d817682163adb8e2e8a Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-14 00:50:09 +0000 save socket address on accept/accept4 getpeername() does not work on unconnected sockets. For error-handling, unconnected sockets are a fairly common occurrence, so we want to get the address early on when we know the address is still valid. For IPv4 addresses, this does not increase memory overhead at all. IPv6 addresses[1] do require an additional heap allocation, but it does not need to be aligned since it is infrequently accessed. If IPv6 becomes common, we may need to expand our per-client storage to 192 bytes (from 128) on 64-bit (or see if we may pack data more carefully). [1] IPv6 addresses are rare with MogileFS, as MogileFS does not currently support them.
commit 449b85daa42cae1b9542a26e6dd52a1db38cce93 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-14 00:50:12 +0000 allow binding to IPv6 addresses MogileFS currently does not support IPv6, but maybe one day it will. When it does, we'll be ready. commit 29342bcd9864e4aabb9e6febef8748a5f51ac944 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-16 20:26:57 +0000 wrap getnameinfo for consistency in error logging This will allow us to more easily handle error reporting for IPv6 addresses and allow for consistent formatting of stringified IP addresses. commit 88d34b4686a650dba89674aa302ab13c78e8cef0 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-14 00:50:10 +0000 iostat_parser: allow '-' for device names Linux device-mapper names show up as 'dm-0', 'dm-1' and so on. This allows users to store MogileFS files on encrypted devices using dm-crypt and perhaps other, similar tools. commit 4d9a4f921c1a79d2d82aae3e104cac43537b1e2d Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-14 00:50:08 +0000 potentially make the mog_sockaddr union smaller The generic "struct sockaddr" may be padded to be the same size as "struct sockaddr_storage" (which is what we were trying to avoid in the first place by using mog_sockaddr). This change makes no difference on GNU/Linux. commit 8c79cf794f6178b6978743af99d498ca0b449fb1 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-04-15 19:29:31 +0000 alloc: posix_memalign does not set errno We must set errno manually for die_errno() if posix_memalign fails commit 9427f2989eae96106090d77ddff1656f8510957d (origin/attr) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-03-19 09:33:47 +0000 http: put parser-private attrs in a private struct This will allow easy use of memset to reset attributes in between requests without clobbering more important data.
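The "alloc: posix_memalign does not set errno" entry above relies on a POSIX detail: posix_memalign() reports failure through its return value and leaves errno alone. A minimal sketch of a wrapper that copies the error into errno, so that an errno-based reporter can be used afterwards, might look like the following. The function name sketch_memalign is hypothetical and is not taken from cmogstored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: returns aligned memory, or NULL with errno set.
 * posix_memalign() returns an error number rather than setting errno,
 * so the value is copied over by hand.
 */
static void *sketch_memalign(size_t alignment, size_t size)
{
	void *ptr = NULL;
	int err = posix_memalign(&ptr, alignment, size);

	if (err != 0) {
		errno = err; /* make errno-based error reporting meaningful */
		return NULL;
	}
	return ptr;
}
```

With this shape, a caller can treat the wrapper like malloc() and pass any failure straight to a routine that formats errno.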
commit cce7f3c33207c534f9e5a6c0cb389a97df21235b Author: Eric Wong <normalperson@yhbt.net> Date: 2013-03-08 10:21:38 +0000 build: add check for GCC atomics Andrey Okunev noted undefined references on the MogileFS mailing list when building cmogstored 1.2.1 on his 32-bit CentOS5 machine. commit 08b8d7f1e5101631f642134718871dd2ef24c1e5 (tag: v1.2.1) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-03-04 01:25:09 +0000 cmogstored 1.2.1 - fix graceful shutdown failure This release only fixes an assertion failure during graceful shutdown while MogileFS fsck is running with checksumming enabled. This only affects users running fsck with checksumming enabled during a graceful shutdown of cmogstored. For upgrading cmogstored it is recommended to: 1) stop fsck on the trackers (via "mogadm fsck stop") 2) wait for all tracker queues to drain and stop sending fsck traffic to the affected host. You may wish to "!want 0 fsck" on all your trackers and wait for the fsck workers to stop. 3) upgrade cmogstored (in place upgrade works) There are also several code comment updates for internal components of cmogstored which may interest potential hackers. commit bc82924e5f26f4d72b145185254f563526adb8f9 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-03-04 01:22:42 +0000 TODO: add a few item for our roadmap We have a future! commit f128eea752d51a566996043fd159da9be8d83597 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-03-02 10:56:17 +0000 alloc: document use of TLS buffers tls_rbuf allows us to avoid nearly all dynamic allocation for common HTTP requests. However, the mog_rbuf structure may be detached from TLS as necessary (and another one allocated in its place) when the need arises. 
commit 20bcb2ccc3d3d38b0fc2f16c25cad74d8404d5bb Author: Eric Wong <normalperson@yhbt.net> Date: 2013-03-02 10:46:12 +0000 fdmap: documentation for the FD-based memory allocation Avoiding heap allocations in common paths is important to high performance server design; document this important design decision. commit adc750ab6600980ba98d77d371efb07b38886f30 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-23 20:41:33 +0000 mgmt: fix fsck digest assert failure in graceful shutdown Items in the low-priority fsck queue could trigger an assertion failure during graceful shutdown due to improper handling of the MOG_NEXT_IGNORE state in mog_mgmt_quit_step(). However, using the fsck queue in graceful shutdown (which is single-threaded) is probably a bad idea anyways, as the fsck digest could monopolize other requests. So give no special handling to fsck digest queries during graceful shutdown. This only affects users running fsck with checksumming enabled during a graceful shutdown of cmogstored. For checksums users, it is recommended to stop fsck from the trackers and wait for all tracker queues to drain before upgrading cmogstored (and using graceful shutdown on the old cmogstored). commit 8757c6458e67e9ab20f9a049a9a68f51b3229816 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-23 01:17:09 +0000 http_get: comment about snprintf() being a hot spot cmogstored is pretty fast, but it could be faster. commit c81abd17fbbbb37c4df13771b485e139c8ab71d9 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-21 03:32:07 +0000 queue_common: update comments to match code While we're at it, explain the use of cloexec. commit f57064cc07d872583f50a04b2421f214304cc483 (tag: v1.2.0) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-18 23:37:17 +0000 document/reserve SIGWINCH/SIGHUP for future use Despite having an extensive test suite and minimal room for user error, giving users the option to back out of a hot upgrade may be worth supporting.
commit cbab5b9d18f13c22f6d94bdad2490e8d280ea927 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-18 23:30:30 +0000 copyright comment updates for 2013 (part 2) Many files were missed the first time around in commit 37026af96dec638aa850d604003bf7218d90037d commit d7a6fe7d93c2e7c771e99f7083d2a59d320da12f Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-18 23:27:13 +0000 manpage: document SIGUSR2 upgrades This is a new feature and needs to be documented. commit f5a6eb5faa0459d6ec4ac9255c0f24d4dbe73583 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-18 12:18:55 +0000 move cmogstored_exit() prototype to cmogstored.h This fixes a missing prototype warning for cmogstored_exit() when checking exit.c with sparse. commit 56cb260ed21561c2b84c1ca5dec8b25c738343c8 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-18 05:46:47 +0000 queue_epoll: fix bad cast for epoll.event The events field of struct epoll_event is a uint32_t, not int. commit 43d893ac7043ca69f2e93b987856e22cfa4a3978 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-18 05:46:46 +0000 tests: add valgrind supp for epoll_ctl on 32-bit arch The epoll_event.data union is 64 bits on 32-bit systems while pointers are 32-bit. We only use 32 bits of that union, but valgrind mistakenly complains about it (the kernel does not care about the user-supplied data union at all). commit 92b8a2091414c0024fe9fd35aed6891308c9dc26 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-18 05:46:45 +0000 ioutil: fix memory access error on from mog_iou_write sizeof(buf) returns the size of the pointer if buf is a passed parameter, even if the function prototype dictates a fixed size for buf as we do in mog_iou_write. While we're at it, make our mog_iou_write buf parameter const. This bug was introduced in: commit a960a351b2248a196c91cdbf6256f98e1bc2ef37 "split iostat util% tracking from mountlist" and never affected an official release of cmogstored.
This bug was caught while testing on a 32-bit GNU/Linux machine. My normal 32-bit FreeBSD 9.0 environment did not catch this as iostat on that platform only reports integer percentages and does not need more than 4 bytes. commit 719e4fc320e1978bc9ea6ee8be9f8249dcb54dab (next) Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-16 12:55:42 +0000 handle pthread_create returning ENOMEM on old glibc Older glibc will return ENOMEM on mprotect() failures. This bug was only fixed in 2011, so the long-term distros and old installations may not have the necessary backports. ref: http://www.sourceware.org/bugzilla/show_bug.cgi?id=386 commit 13cbdcea65248271668562064aafdcc9634ef9ce Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-15 11:04:55 +0000 graceful handling of pthread_create EAGAIN failure pthread_create may return EAGAIN as a temporary failure, do not abort a running process if this is the case. For the initial mountlist scan, we must retry indefinitely for cmogstored to be usable. However, with our thread pools, we can always run fewer threads (as long as there is at least one thread per-pool). commit fcb41385271818586a162d02aeb23bc3414a602e Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-15 09:27:13 +0000 test/http_idle_expire: hopefully improve test reliability This is a tricky test and doesn't always succeed, since it's hard to tell how many file descriptors glibc will use internally. commit 476cc380a94db3355f818b2c798cdeeb0c626cc0 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-15 02:54:20 +0000 sig: avoid pselect if ppoll is present in mog_sleep We want to favor ppoll over pselect, since ppoll is a better interface and we can have a slightly smaller binary with fewer dependencies. While we're at it, use mog_sleep(-1) as an alias for mog_selfwake_wait to further reduce binary size. 
commit b7403080f0266ac41cecae80adcfa0391f3f93b7 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-15 02:43:51 +0000 avoid racy sleep on fork failure in master process We need to atomically enable interrupts and sleep with the same syscall. Fortunately, using pselect (through mog_sleep) allows that and is POSIX-compliant, so use that. commit 5629899a12649b9b21f41efc29b92adbd82afe6c Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-15 02:40:50 +0000 mnt: inform user of slow mountlist scan This will inform the user of why cmogstored may be slow to start, since we need the mountlist to be populated at startup. We also throw a pthread_cancel() in there to load libgcc_s under glibc, so we can avoid loading libgcc_s once we're under FD pressure. This makes test/http_idle_expire.rb more reliable. commit 44f4f76d06899b1a0e4719671a4fde3c0851764a Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-14 11:32:11 +0000 test/http_range: do not allow webrick to perform lookups DNS lookups cause webrick tests to fail or timeout. Our tests should not have external network dependencies. commit cfe689f85b0b39d1f3b3e21d9b564d34b2146d88 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-14 10:26:17 +0000 inherit: avoid DNS lookup on upgrade A typo caused unnecessary DNS lookups when inheriting sockets. While we're at it, fix another typo in the error message, too. commit f8b30b2846c25461940c99d8fd4432ec49920098 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-14 05:21:27 +0000 selfwake: use epoll_pwait on Linux instead of eventfd This saves us a file descriptor in Linux, which provides epoll_pwait in 2.6.19+ (and ppoll for 2.6.18, the oldest kernel we support). 
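The "avoid racy sleep on fork failure in master process" entry above depends on pselect() swapping the signal mask and sleeping in one step. The following is a minimal sketch of that idea, assuming signals are normally kept blocked by the caller; the names sketch_on_signal and sketch_sleep are hypothetical and this is not the mog_sleep implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t sketch_got_signal;

/* hypothetical handler: only records that a signal arrived */
static void sketch_on_signal(int sig)
{
	(void)sig;
	sketch_got_signal = 1;
}

/*
 * Sleep for up to `seconds`, with every signal unblocked only for the
 * duration of the pselect() call.  Returns 1 if a signal handler
 * interrupted the sleep, 0 otherwise.
 */
static int sketch_sleep(time_t seconds)
{
	sigset_t none_blocked;
	struct timespec ts = { .tv_sec = seconds, .tv_nsec = 0 };
	int rc;

	sigemptyset(&none_blocked);
	/* mask swap and sleep happen atomically inside pselect() */
	rc = pselect(0, NULL, NULL, NULL, &ts, &none_blocked);
	return rc < 0 && errno == EINTR;
}
```

Because the caller keeps the signal blocked outside of pselect(), a signal that arrives just before the sleep stays pending and interrupts the sleep immediately instead of being missed.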
commit 4ccf06a600ce31c6dbd61d9c44b491233758c18b Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-11 09:48:20 +0000 mnt: revert to mutex for protecting by_dev hash Since we now update future copies of by_dev offline and only need a lock to swap in the new one, contention for by_dev should be less of a problem than it was before. This should make reader-writer locks an unnecessary risk. Reader-writer locks are riskier since writer starvation can potentially be an issue with many readers. commit f54e27e0ec0a520c0a079d6e8428eeefdcd366ab Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-11 02:07:41 +0000 test/mogilefs_integration: increase test reliability Use SO_REUSEADDR, since Linux requires that both the new program (cmogstored) and this test use SO_REUSEADDR for SO_REUSEADDR to be effective. Also, minimize the window for port conflicts. While there are hard-to-avoid race windows for conflicts when binding random ports, we can minimize those windows by holding those ports open in the parent as long as possible. commit 384c801cced851e782cbe94b548a31b1deaa70f3 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-11 01:25:16 +0000 kqueue: update NOTIFYRD -> SELFWAKE This was missed in the earlier changes to allow eventfd usage under Linux instead of using a notification pipe. commit ba24aa82b1c9306e0053089296741f028fafa148 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-11 00:51:06 +0000 fix signal races when master process is used In the absence of a pselect/ppoll-like version of waitpid, we must use a selfwake descriptor (pipe or eventfd) to wake the master up whenever a signal is received. So wait on the selfwake descriptor and always run waitpid with WNOHANG in a loop to ensure all children are reaped. The: mog_intr_disable(); waitpid(); mog_intr_enable() sequence was completely stupid; I can't believe I wrote it.
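The reaping loop mentioned in the "fix signal races when master process is used" entry can be sketched roughly as follows. This is only an illustration of calling waitpid() with WNOHANG until nothing is left to collect; the function name sketch_reap_all is hypothetical and the real master loop also handles the selfwake descriptor and per-child bookkeeping.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped in this pass.
 */
static int sketch_reap_all(void)
{
	int reaped = 0;

	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, WNOHANG);

		if (pid > 0) {
			reaped++; /* one exited child collected, look for more */
			continue;
		}
		if (pid < 0 && errno == EINTR)
			continue; /* interrupted, simply retry */
		/* pid == 0: children exist but none exited; ECHILD: no children */
		break;
	}
	return reaped;
}
```

Running the whole loop after each wakeup matters because several SIGCHLD deliveries may be merged into a single notification.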
commit 5537c96848f483d403da1ed663809681e7b09f3b Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-10 11:37:34 +0000 allow self-wakeup to use eventfd under modern Linux eventfd uses fewer resources than a pipe, so create less overhead for our users by using eventfd instead of a pipe. commit 955991aae8c3da5a13e34e929188db3fd9216a0e Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-10 23:17:41 +0000 pidfile: delay unlink of old file on aborted upgrades We don't want to be without any pidfile if writing the new pidfile fails. commit 975a329912818b49f04de15349f6414719430808 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-10 20:39:56 +0000 upgrade: do not disable interrupts in forked child The child disables interrupts right away, so there's no reason to enable interrupts temporarily. commit 2163a4c6f09a9813a0e69a9533923623d448dce9 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-09 23:09:14 +0000 test/upgrade: more thorough PID file checking We need to ensure the PID file is non-empty, not just that it exists. commit 7d56b023d2aac8530b249b2db7d90a738297a6fc Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-09 23:06:41 +0000 prioritize upgrade before exit in main loop If we receive both SIGUSR2 and SIGQUIT in a short time period, we should trigger the upgrade before graceful exit, as no user will (intentionally) send SIGQUIT before SIGUSR2. commit 3f454ae96e7cc1352f7bf7756a064cf5781154c4 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-09 23:05:23 +0000 test/upgrade: teardown more careful about killing We don't want to accidentally kill ourselves by targeting PID=0 if the PID file is empty.
commit b96d1018ae5261d8ee9344b959acb04c1be43279 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-09 07:04:13 +0000 tests: fix several Ruby warnings Unused variables and unset Content-Type for Net::HTTP requests commit 7d740e5825e05030b5978ed296fd0b801666b405 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-09 07:03:02 +0000 test/inherit: fix Ruby 2.0.0 close-on-exec compatibility FD inheritance from exec() must be done explicitly in Ruby 2.0 commit e427fb773837953c01ebe8dfaf8f8679c7895fc2 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-08 08:48:36 +0000 mnt: move stat/lstat logic to mnt_usable This centralizes the mountpoint suitability logic in one place. In the future, it may also allow us to parallelize the work of scanning filesystems. commit 223adf17682765f9e72d3436348700085d823a6e Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-08 03:02:18 +0000 upgrade: fix env placeholder for valgrind Having a NULL at the beginning of the list caused iteration in the destructor to stop, allowing valgrind to detect a memory leak. commit d8ac1b937647b86342135a91abe933a3f8812909 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-08 02:54:54 +0000 cfg: require PATH to be set for --daemonize Maybe some weird users do not have PATH commit bd37ad7bfae8c9b25a9eef1e1ce9b7c17d1f5257 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-08 02:27:03 +0000 upgrade: avoid non-async-safe functions in child execvp may malloc internally in its path lookup, so use find_in_path to perform this lookup in the parent instead. Additionally, putenv() may not be async-signal-safe either, but execve is, so use execve. commit 117a11e9e2b8a365df90336ae78b61f6562b7bd3 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-07 22:02:41 +0000 cfg: disallow trailing ':' in PATH with daemonize Trailing ':' in PATH means using the current path, which is now incompatible with daemonize. 
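The check described in the "cfg: disallow trailing ':' in PATH with daemonize" entry can be approximated with a small predicate. This is a sketch only, with a hypothetical function name; the actual configuration code may perform the test differently or as part of broader validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if the PATH-style string ends with ':', which implies an
 * empty final component (the current directory).  NULL and "" are not
 * treated as having a trailing colon.
 */
static bool sketch_path_has_trailing_colon(const char *path)
{
	size_t len = path ? strlen(path) : 0;

	return len > 0 && path[len - 1] == ':';
}
```

A startup check could call this on getenv("PATH") and refuse to daemonize when it returns true, since the working directory changes to "/" after daemonizing.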
commit 4f45f562180489a97a4572ebd3822e9f15289bd6 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-07 20:29:47 +0000 upgrade: avoid potential deadlock from post-fork mutex use Pthreads implementations do not require mutexes to be in a consistent/usable state in a forked child. Since we don't need the mutex in a single-threaded forked child, we can just skip it and avoid reinitializing it entirely. commit 315487f70c90b117aa4e9d63bbb21abae8af80ab Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-07 19:50:49 +0000 rename fs_usable to mnt_usable It should be clearer this code is only called from inside mnt.c and not fs.c (the latter is for general filesystem operations, not operations on a mount point). commit c3550946c61a43cad54f1aa7c0f0f062a451042f Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-07 11:00:10 +0000 release memory allocated for upgrade at exit This is not strictly necessary as this memory is freed anyways, but stop valgrind from complaining and avoid unnecessary suppressions (since shutdown performance is not important). commit 03bf577eb3a328f130083d992b180ba72ee1f0b4 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-07 08:36:47 +0000 forbid relative paths with daemonization Relative paths are incompatible with daemonization, as it does not work for SIGUSR2 upgrades (since daemonize forces the server to run in "/"). Relative paths are confusing and error-prone anyways, so do not allow users to specify them along with --daemonize. commit 5b7d2608f95332c3bcd69d1eb56044236fc6b978 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-07 09:30:17 +0000 omit trailing newline from die() and warn() calls The GNU error() function already emits a newline at the end of these messages.
commit 9510a1d244e22725e2710d04571e3d6fbf89b0e8 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-07 09:33:37 +0000 favor error.h GNU system header over gnulib one error.h is available on GNU/Linux (and presumably GNU/kFreeBSD and GNU/Hurd), so favor that system-wide header over the gnulib one. commit 26432ee7d0cf7f94fdc62804611cdbc7c5ec960c Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-06 02:44:05 +0000 remove warn module and alias it to error() in gnulib There is no need to maintain our own code for this. commit 930da6932ae96b3c5f40324b9f24fc6415f3e500 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-02-06 02:42:58 +0000 queue_epoll: change fprintf(stderr, ...) to use warn() This makes it easier to alter how/if we write to stderr. commit 7abd078c4f7e61e87f9394c6662be027fe0253b2 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-31 23:10:50 +0000 ioutil: avoid assigned but unused variable Noticed with gcc 4.7.2 in Debian testing (4.7.2-5) commit c0931fd23e065521237530cd6f9f6068f259e4e1 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-31 06:47:41 +0000 cmogstored: initialize syslog before inheriting This ensures the: inherited $ADDRESS:$PORT on fd=... messages are prefixed with the PID in logs. commit 459d163514766653d28d8964a7e2e25d27f7c873 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-31 06:46:39 +0000 cfg: daemonize is a boolean, not an integer This project uses C99 features (and some GNU extensions), so bool is usable. commit dffe6b3dc226cafb0a6107443f9d7e23095dd789 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-31 04:20:35 +0000 sockaddr*-related data structure size reductions We do not need all the weight of sockaddr_storage or NI_MAXHOST. cmogstored currently only supports IPv4 and IPv6[1] and (like any respectable server) will not perform reverse DNS lookups. This allows us to reduce our stack usage in some places and keep caches hotter.
[1] MogileFS does not support IPv6, yet, even commit d68b0ad231192c6ccf701e66be66b6bc956bed2b Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-31 03:52:20 +0000 minimize interrupt windows for master process Code is easier to follow when interrupts occur at well-defined points. The worker processes (and master-less standalone) already follow this. commit 088138b235e79fa54a4e3602a4d60975e9581571 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-30 02:55:29 +0000 implement nginx-style binary upgrade via SIGUSR2 USR2 now forks a new cmogstored process which inherits listener file descriptors from the parent. The parent renames its pidfile with a ".oldbin" suffix so the new child can use the new PID file. Clusters may now upgrade to future versions of cmogstored without needing to mark hosts down via mogadm. The behavior of this process should match that of nginx: http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly commit 2b252bb6b4704be01d629194aff588b24d579cdd Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-26 00:19:59 +0000 refactor process management To support transparent upgrades, we need to be able to reap child processes regardless of what the child process was. So we must do away with the iostat/worker-specific waitpid() calls and use waitpid(-1) to cast a wide net to reap anything and everything. When we support transparent upgrades, the fork+exec-ed child process may die, so the main process (master if --worker-processes are used) needs to be capable of reaping that new process. commit e292e0e874a064fcd39f76565f38935449b7f7c8 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-24 12:26:47 +0000 inherit: preliminary FD inheritance over exec() This lets us inherit listen sockets from the parent process, in the future this will allow transparent upgrades.
commit 80aef1b3c8e9a20ec047dcf040e594a5e2a23811 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-24 03:28:51 +0000 move graceful exit functionality into its own file No need to clutter up the main file with graceful exit functionality. commit 25d7a82d9851c204e6ca47ab8af35fdab9bbd37c Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-24 03:21:05 +0000 move pidfile preparation function out cmogstored.c is too big, we can move pidfile functionality out to pidfile.c easily. commit 852ca09524c17dc15ab68a1f85cee008c22a3a76 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-31 00:59:34 +0000 determine mount point usability via statfs/statvfs Filesystems with no block size are unusable, so avoid stat()-ing them and potentially having problems with our subsequent stat() stalling when a network connection slows (or goes) down. commit 9f7c1f7b8326a03c2105328f50df3ce2099de1d5 Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-31 00:25:24 +0000 better error handling when faking memstream For systems without memstream support, using temporary files to emulate memstream opens us up to more common (than ENOMEM) errors such as: EIO, ENOSPC, ENFILE and EMFILE. Since we don't want our server to die completely on these (sometimes temporary) error cases, we'll just stop publishing iostat data to "watch" subscribers. commit c25d3846d96ad761e0e71304903dec79ca56424d Author: Eric Wong <normalperson@yhbt.net> Date: 2013-01-30 23:02:47 +0000 mnt: allow concurrent readers on mount list On refresh, we can (slowly) build a new mount list and replace the old one quickly and atomically. This prevents ioutil reader/writers from waiting on slow mount list refreshes; as the mount list does not change frequently. This increases windows where iostat utilization may be read/updated, especially when network mounts are temporarily unreachable or slow. 
commit a3ca090b4b01d44e674f4db5cb13f5d111d0aa32
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-30 21:57:40 +0000

    mnt: cleanup/document mountlist storage/nesting

    Moving ioutil out to a separate table allows us to discard our
    private mog_mntent struct. Data structure simplification also
    allows code simplification.

commit a960a351b2248a196c91cdbf6256f98e1bc2ef37
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-30 11:59:58 +0000

    split iostat util% tracking from mountlist

    This prevents us from losing iostat utilization each time the
    mount list is rescanned. Additionally, this allows us to read
    iostat utilization (and write to sidechannel clients) concurrently
    while the mount list is being refreshed.

commit 2e1958b26b926c42f213ba47b71ec735d81448e7
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-30 10:18:40 +0000

    consistent allocation size for iostat utilization

    This is better than open-coding a length everywhere.

commit 45843883077bc0a4ef745d03c3b4241278463f3b
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-30 02:48:44 +0000

    test_helper: expand relative paths in $PATH

    --daemonize will chdir("/"), so relative paths must be expanded
    for USR2 upgrades to work.

commit af8fe00640d70aab2e44110c88e4772e4daf4f68
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-30 02:48:45 +0000

    test/mogilefs_integration: reduce chance of socket conflicts

    Disable SO_REUSEADDR to avoid recently-used ports. Additionally,
    only close (via dereference) sockets when all listener sockets are
    bound.

commit 25740ec5e7e1a13e48552b4c6b0e60e8730641ea
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-31 00:59:00 +0000

    move MOG_STR() macro to util.h

    We shall use it outside of defaults.h

commit 6ed94e8b4a9219b10f8132078ba2882fddae40f9
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-24 01:21:52 +0000

    mnt: avoid recursion in mount_entry_free

    Insane mount point aliasing may result in stack depth explosion.
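Illustration only, not taken from the cmogstored source: a minimal sketch of freeing a chain of entries with a loop instead of recursion, so stack usage stays constant no matter how long the chain is. The struct and function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a chain of aliased mount entries. */
struct alias_entry {
	struct alias_entry *next;
};

/* Prepend a new entry; returns the new head, or NULL on failure. */
struct alias_entry *alias_push(struct alias_entry *head)
{
	struct alias_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->next = head;
	return e;
}

/*
 * Free the whole chain iteratively and return how many entries
 * were released. A recursive version would use one stack frame
 * per entry; this one uses a fixed amount of stack.
 */
size_t alias_free_all(struct alias_entry *head)
{
	size_t freed = 0;

	while (head) {
		struct alias_entry *next = head->next;

		free(head);
		head = next;
		freed++;
	}
	return freed;
}
```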
commit 24a1f80ca96d2c12f0cefaca4f7040dbc4f07919
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-25 20:55:33 +0000

    limit --worker-processes to UINT_MAX

    UINT_MAX worker processes should be more than enough for anyone.

commit dff57bf1b16a9435428c57ae26e6a3701f8d2ea9 (tag: v1.1.0)
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 21:25:22 +0000

    queue_epoll: fix version check for 2.x kernels

    Oops, we could accidentally mark a 2.x kernel as non-buggy.
    2.6.32 and 2.6.34 may eventually get backports.

commit e47a1fe799edc981272cb66a7f52f11d50826a9b
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 20:13:47 +0000

    http: avoid MSG_MORE on HEAD responses

    We need to signal we do not have more bytes to write to the socket
    when generating HTTP HEAD responses. This avoids a 200ms delay
    between HTTP responses.

    This regression only appeared in
    commit 14e0684507c06439ee9c7a731fd6ca90b7b9adcb
    and was never in a release.

commit 1074f0d2c55fee0de4f2ceba2829f6a3e12ce845
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 10:55:13 +0000

    tests: additional test for trysend buffering in Linux

    This code is hard to exercise in the real world since we only send
    tiny HTTP headers with trysend.

commit be2062a5eb21718c932aaa4d49685e36763842ed
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 06:42:18 +0000

    close: ignore ECONNRESET errors (for FreeBSD, maybe others)

    FreeBSD (and possibly other BSDs) returns ECONNRESET on close().
    The descriptor still seems to get released and eventually become
    usable again; so retrying close() is dangerous as we allocate file
    descriptors from multiple threads.

commit dc7c6efeac5fb0fb1f44e7f3cee625a6c33f7e26
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 02:51:52 +0000

    http_date: time_t pointer is const

    This pointer is passed to gmtime_r(), which also takes a
    const time_t *. This makes it easier for users of mog_http_date()
    to know the timep parameter does not get altered in any way.

    I noticed this discrepancy when rereading http_get.c for the first
    time in a few months.

commit 88b5f33f1f7e79d799da34b350f4dc59b875cf40
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 02:34:56 +0000

    lazily call mkdir for file creation

    There's no need to waste time creating/checking directories which
    already exist. Since directories tend to hold multiple files, we
    can optimistically run open()/openat() and only call
    mkdir()/mkdirat() on ENOENT.

commit c57ff63768cedd72abf74e93a3ba59070de77357
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 01:54:32 +0000

    trywrite: build fix for platforms without MSG_MORE

    Oops...

commit 0e9a8e156f0a060c7822069c4f69eea3710c793c
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 01:37:17 +0000

    simplify TCP_NOPUSH support code (remove TCP_CORK)

    Since we no longer use TCP_CORK under Linux (where we use MSG_MORE
    instead), we can clean up the nomenclature and avoid confusing
    people by mentioning TCP_CORK.

commit 14e0684507c06439ee9c7a731fd6ca90b7b9adcb
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 01:11:04 +0000

    linux: favor send() w/MSG_MORE over TCP_CORK

    This saves several setsockopt() syscalls and reduces system CPU
    usage.

commit 37026af96dec638aa850d604003bf7218d90037d
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 01:17:48 +0000

    copyright comment updates for 2013

    gnulib did it for us in m4/gnulib-cache.m4, we'll match.

commit 96b173c5516ad56ad2ea41d99406d58998be88b2
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 00:37:29 +0000

    http_get: disable FADV_SEQUENTIAL for small responses

    So there's no need to waste a syscall on small reads which would
    not benefit from readahead at all.

    Using 256K as a threshold for "small" reads, which is twice the
    normal max readahead window on modern Linux 3.x.
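Illustration only, not taken from the cmogstored source: a minimal sketch of skipping the readahead hint for small responses. It assumes the hint is issued with posix_fadvise() and that "small" means at most 256K. Whether the real boundary is inclusive is not stated in the commit, and the names below are hypothetical.

```c
#include <sys/types.h>
#include <fcntl.h>

/* 256K threshold from the commit message; inclusive by assumption */
#define SMALL_RESPONSE_MAX ((off_t)256 * 1024)

/* Returns 1 when a response is large enough to justify the hint. */
int wants_readahead_hint(off_t len)
{
	return len > SMALL_RESPONSE_MAX;
}

/*
 * Issue the sequential-read hint only for large responses.
 * Returns 0 when the hint was skipped or accepted, otherwise the
 * error number reported by posix_fadvise().
 */
int advise_sequential(int fd, off_t len)
{
	if (!wants_readahead_hint(len))
		return 0; /* small read: save the syscall */
#ifdef POSIX_FADV_SEQUENTIAL
	return posix_fadvise(fd, 0, len, POSIX_FADV_SEQUENTIAL);
#else
	(void)fd;
	return 0; /* platform without posix_fadvise() */
#endif
}
```

Note that posix_fadvise() returns an error number directly instead of setting errno, so callers in this sketch would compare the result against 0.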
commit 17e2cca675df298c6d99ee2b7f0e099c02eb271c
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-17 00:26:28 +0000

    epoll: update EPOLL_CTL_MOD workaround for stable kernels

    Linux v3.2.37+ does not need this fix.
    Linux v3.0.59+, v3.4.26+, v3.5.7.3+, v3.7.3+ are all under review
    and do not need this.

commit 6c4e0c408de81ceb49fe9dac2ad38a1655aea625
Author: Eric Wong <normalperson@yhbt.net>
Date:   2013-01-02 11:19:09 +0000

    epoll: avoid EPOLL_CTL_MOD bug in Linux <= 3.7.1

    On SMP machines, EPOLL_CTL_MOD had a race condition under
    Linux <= 3.7.1. This allowed events to be missed if they arrived
    near the time the EPOLL_CTL_MOD request was issued.

    ref: linux.git commit 128dd1759d96ad36c379240f8b9463e8acfd37a1

commit 86477eafdadda573b850cf610a60c5e6eab1c189
Author: Eric Wong <normalperson@yhbt.net>
Date:   2012-12-13 09:29:25 +0000

    Rakefile: fix Regexp encoding issues under 1.9