Date Commit message
2014-09-03 http_common: correctly handle empty header values [empty-header-values]
2014-09-03 test/http_chunked_put: test for gigantic trailer
This is a potential attack vector, and we seem to pass.
2014-09-03 test/http: plug race condition in FIFO test
This is noticeable in the trunk version of ruby since r47288 ("io.c: do not swallow exceptions at end of block").
2014-05-30 test/mgmt: lengthen test for iostat watch
The iostat may take a while to notice a new device, so let it run a bit.
2014-05-30 svc_dev: calling free does not need the lock
We do not need to be holding devstats_lock when releasing a local buffer which will never be used by another thread.
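As a minimal sketch of the pattern this commit describes, the function below copies data into shared state while holding a lock and frees its private buffer only after unlocking. The names `shared_stats` and `publish_stats` are hypothetical and this is not the actual svc_dev code; only the `devstats_lock` name comes from the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical stand-ins, not the actual cmogstored definitions */
static pthread_mutex_t devstats_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_stats[64];

/* Publish a stats line; returns 0 on success, -1 if allocation fails. */
static int publish_stats(const char *line)
{
	size_t len;
	char *tmp;

	assert(line != NULL);
	len = strlen(line);
	if (len >= sizeof(shared_stats))
		len = sizeof(shared_stats) - 1;

	tmp = malloc(len + 1); /* local buffer, never visible to other threads */
	if (!tmp)
		return -1;
	memcpy(tmp, line, len);
	tmp[len] = '\0';

	pthread_mutex_lock(&devstats_lock);
	memcpy(shared_stats, tmp, len + 1); /* only the shared copy needs the lock */
	pthread_mutex_unlock(&devstats_lock);

	free(tmp); /* released after unlocking: no other thread can reference it */
	return 0;
}
```

Keeping free() outside the critical section shortens the time the lock is held, which matters when other threads contend for it.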
2014-05-30 remove old fsck_queue declarations
fsck_queues were replaced by generic ioq for all requests in 1.3, but the declarations here were forgotten.
2014-04-08 minor cleanups for functions which do not return
pthread_exit and abort never return, so quiet down some warnings when using -Wunreachable-code on clang. Unfortunately, using -Wunreachable-code globally is too noisy due to:
1) Ragel-generated code
2) constant branch conditions for build-time options (trace/cork)
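To illustrate the kind of cleanup involved, here is a hedged sketch of a helper that ends in abort() and is marked as non-returning. The macro `NORETURN_ATTR` and the functions `die` and `checked_div` are hypothetical names for this example, not cmogstored code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical macro: GCC and clang both accept this attribute */
#if defined(__GNUC__) || defined(__clang__)
#  define NORETURN_ATTR __attribute__((noreturn))
#else
#  define NORETURN_ATTR
#endif

static void die(const char *msg) NORETURN_ATTR;

static void die(const char *msg)
{
	assert(msg != NULL);
	fprintf(stderr, "fatal: %s\n", msg);
	abort();
	/* nothing follows abort(): a trailing statement here would be unreachable */
}

static int checked_div(int a, int b)
{
	if (b == 0)
		die("division by zero");
	/* the compiler knows die() does not return, so b != 0 here */
	return a / b;
}
```

With the attribute in place, the compiler can also skip "control reaches end of non-void function" style warnings in callers that end with a call to `die`.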
2014-02-22 cmogstored 1.4.0 [v1.4.0]
bsd_sendfile is now supported on Debian GNU/kFreeBSD systems. This release also fixes a compatibility bug with Perl mogstored config files where "daemonize = (0|1)" was not supported properly.

Eric Wong (3):
      check for sys/sendfile.h header instead of __linux__
      allow bsd_sendfile with freebsd-glue on Debian/kFreeBSD
      support "daemonize = 0|1" in the config file
2014-02-22 support "daemonize = 0|1" in the config file
This is expected by Perl mogstored, and our previous support of "daemonize" (standalone) was in error (but still supported for now).
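A rough sketch of accepting both forms of the directive follows. The function `parse_daemonize` is hypothetical and is not the actual cmogstored config parser; treating the bare "daemonize" form as "enabled" is an assumption based on the commit message.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not the real parser.
 * Returns 1 to daemonize, 0 to stay in the foreground,
 * -1 if the line is not a valid "daemonize" directive.
 */
static int parse_daemonize(const char *line)
{
	static const char key[] = "daemonize";
	const char *p;

	assert(line != NULL);
	line += strspn(line, " \t");
	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return -1;

	p = line + sizeof(key) - 1;
	p += strspn(p, " \t\r\n");
	if (*p == '\0')
		return 1; /* legacy standalone form (assumed to mean "enabled") */
	if (*p != '=')
		return -1;

	p++;
	p += strspn(p, " \t");
	if (*p != '0' && *p != '1')
		return -1;
	/* only trailing whitespace may follow the value */
	if (p[1 + strspn(p + 1, " \t\r\n")] != '\0')
		return -1;
	return *p - '0';
}
```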
2014-02-21 allow bsd_sendfile with freebsd-glue on Debian/kFreeBSD
Debian GNU/kFreeBSD users may ./configure with LIBS=-lfreebsd-glue to use the FreeBSD sendfile syscall.
2014-02-17 check for sys/sendfile.h header instead of __linux__
Non-Linux OSes may eventually gain a Linux-compatible sendfile.
2014-02-09 cmogstored 1.3.3 - Debian GNU/kFreeBSD fixes [v1.3.3]
This release fixes build problems with Debian GNU/kFreeBSD support (turns out it's been broken for over a year and nobody noticed :x). There are also build system upgrades for automake 1.14 and test case cleanups, but no changes to any of the core code. No changes nor need to upgrade if you're on anything other than Debian GNU/kFreeBSD.
2014-02-09 m4/gnulib-cache: update for 2014
2014-02-08 test/upgrade: cleanup and robustness improvements
Avoid calling top-level methods inside other tests in case some versions of test-unit or minitest can call setup/teardown twice. Avoid Timeout, as it is expensive and unnecessary in some cases.
2014-02-08 Makefile.am: updates for automake 1.14.1
Tested with automake 1:1.14.1-2 on Debian GNU/kFreeBSD
2014-02-08 tests: skip iostat-dependent tests
Debian GNU/kFreeBSD still does not have iostat :<
2014-02-08 Makefile: do not clobber NOSTD_CFLAGS from configure
This was breaking the Debian kFreeBSD build
2014-02-04 doc/queues.txt: add a note about our non-use of AIO
It was obvious to me to use pthreads up front, hopefully that's explained to others, too.
2013-12-10 cmogstored 1.3.2 - FreeBSD shutdown speedup [v1.3.2]
This release speeds up graceful shutdown on busy systems such as FreeBSD. There is also a minor resource saving for users of the undocumented --worker-processes switch. There are also some minor memory error fixes for test cases (which did not affect the daemon itself). Upgrading is optional unless you are affected by these fixes.

Note: GNU/Linux users are encouraged to read the manpage update regarding glibc malloc arenas.

Eric Wong (9):
      selfwake: do share pipe descriptors with workers
      test/chunk-parser-1: fix uninitialized file structures
      test: fix valgrind warnings in test-only C code
      doc: refer to malloc-related environment variables
      thrpool: sleep instead of yield when poking thread
      test/mgmt-usage: relax regexp for ZFS
      m4/.gitignore: bump for newer gnulib
      doc: fix wording in manpage
      doc: fix link to MogileFS homepage
2013-12-10 doc: fix link to MogileFS homepage
mogilefs.org is the correct domain
2013-12-10 doc: fix wording in manpage
2013-12-09 m4/.gitignore: bump for newer gnulib
Now at gnulib commit 43593319b31e6b0175b8eec4433bac744959822d ("md5, sha1, sha256, sha512: add gl_SET_CRYPTO_CHECK_DEFAULT")
2013-12-09 test/mgmt-usage: relax regexp for ZFS
ZFS device mount points do not start with a leading '/'. We already account for this in our internal mountpoint handling, but did not account for this in the test case.

Reported-by: Mikolaj Golub
2013-12-09 thrpool: sleep instead of yield when poking thread
This unfortunate loop burned too much CPU on FreeBSD and caused shutdown to take too long when using sched_yield. Use nanosleep for 10ms instead, hopefully allowing the system to accomplish some disk I/O and other tasks before we poke it again.

Reported-by: Mikolaj Golub
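For illustration, a short delay of this kind can be written with nanosleep as below. The helper name `sleep_ms` is hypothetical, and restarting the sleep after EINTR is an assumption of this sketch rather than something the commit states.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for ms milliseconds (ms < 1000); returns 0 on success, -1 on error. */
static int sleep_ms(long ms)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = ms * 1000000L };
	struct timespec rem;

	assert(ms >= 0 && ms < 1000);
	while (nanosleep(&req, &rem) != 0) {
		if (errno != EINTR)
			return -1;
		req = rem; /* interrupted by a signal: sleep for the remainder */
	}
	return 0;
}
```

A poking loop would call `sleep_ms(10)` between attempts where it previously called sched_yield(), so the poking thread stays off the CPU while the target thread finishes its work.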
2013-12-02 doc: refer to malloc-related environment variables
Using non-portable mallopt/mallctl functions is not feasible because detecting them correctly at _link_ time is not easy. Detecting them at compile time is insufficient because malloc implementations can be swapped at link time (and even with LD_PRELOAD, unfortunately).
2013-12-02 test: fix valgrind warnings in test-only C code
Unfortunately, none of the C-only tests are run with valgrind (however all of the Ruby ones are).
2013-12-02 test/chunk-parser-1: fix uninitialized file structures
This test failed on FreeBSD 11.0-CURRENT with MALLOC_DEBUG enabled or if MALLOC_OPTIONS=J is set in the environment.

Reported-by: Mikolaj Golub
2013-12-02 selfwake: do share pipe descriptors with workers
This only affects users of the undocumented --worker-processes switch. Furthermore, this only affects non-Linux platforms which rely on the pipe implementation of selfwake. This prevents us from wasting one extraneous file descriptor slot (and hence potentially wasting 128 bytes in userland).
2013-10-12 cmogstored 1.3.1 - fix for an undocumented feature [v1.3.1]
This release fixes a bug which only affects users of the undocumented multi-process configuration feature (which is also multi-threaded).

* avoid use-after-free with multi-process setups

  readdir on the same DIR pointer is undefined if DIR was inherited by multiple children. Using the reentrant readdir_r would not have helped, since the underlying file descriptor and kernel file handle were still shared (and we need rewinddir, too).

  This readdir usage bug existed in cmogstored since the earliest releases, but was harmless until the cmogstored 1.3 series. This misuse of readdir led to hitting a leftover call to free(). So this bug only manifested since commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b (svc: implement top-level by_mog_devid hash).

  Fortunately, these bugs only affect users of the undocumented multi-process feature (not just multi-threaded).
2013-10-12 avoid use-after-free with multi-process setups
readdir on the same DIR pointer is undefined if DIR was inherited by multiple children. Using the reentrant readdir_r would not have helped, since the underlying file descriptor and kernel file handle were still shared (and we need rewinddir, too).

This readdir usage bug existed in cmogstored since the earliest releases, but was harmless until the cmogstored 1.3 series. This misuse of readdir led to hitting a leftover call to free(). So this bug only manifested since commit 1fab1e7a7f03f3bc0abb1b5181117f2d4605ce3b (svc: implement top-level by_mog_devid hash).

Fortunately, these bugs only affect users of the undocumented multi-process feature (not just multi-threaded).
2013-09-30 cmogstored 1.3.0 - many improvements [v1.3.0]
There are no changes from 1.3.0rc2.

For the most part, cmogstored 1.2.2 works well, but 1.3 contains some fairly major changes and improvements.

cmogstored CPU usage may be higher than other servers because it's designed to use whatever resources it has at its disposal to distribute load to different storage devices. cmogstored 1.3 continues this, but it should be safer to lower thread counts without hurting performance too much for non-dedicated servers.

cmogstored 1.3 contains improvements for storage hosts at the extreme ends of the performance scale. For large machines with many cores, memory/thread usage is reduced because we had too many acceptor threads. There are more improvements for smaller machines, especially those with slow/imbalanced drive speeds and few CPUs. Some of the improvements came from my testing with ancient single-core machines, others came from testing on 24-core machines :)

Major features in 1.3:

ioq - I/O queues for all MogileFS requests
------------------------------------------

The new I/O queue (ioq) implements the equivalent of AIO channels functionality from Perlbal/mogstored. This feature prevents a failing/overloaded disk from monopolizing all the threads in the system.

Since cmogstored uses threads directly (and not AIO), the common (uncontended) case behaves like a successful sem_wait with POSIX semaphores. Queueing+rescheduling only occurs in the contended case (unlike with AIO-style APIs, where requests are always queued). I experimented with, but did not use, POSIX semaphores as contention would still starve the thread pool.

Unlike the old fsck_queue, ioq is based on the MogileFS devid in the URL and not the st_dev ID of the actual underlying file.
This is less correct from a systems perspective, but should make no difference for normal production deployments (which are expected to use one MogileFS devid for each st_dev ID) and has several advantages:

1) testing this feature with mock deploys is easier

2) we do not require any additional filesystem syscall (open/*stat) to look up the ioq based on st_dev, so we can use ioq to avoid stalls from slow open/openat/stat/fstatat/unlink/unlinkat syscalls.

Otherwise, the implementation of this very closely resembles the old fsck queue implementation, but is generic across HTTP and sidechannel clients.

The existing fsck queue functionality is now implemented using ioq. Thus, fsck queue functionality is mapped by the MogileFS devid and not the system st_dev ID as a result of this change.

One benefit of this feature is the ability to run fewer aio_threads safely without worrying about cross-device contention on machines with limited resources or few disks (or not solely dedicated to MogileFS storage).

The capacity of these I/O queues is automatically scaled to the number of available aio_threads, so they can change dynamically while your admin is tuning "SERVER aio_threads = XX".

However, on a dedicated storage node, running many aio_threads (as is the default) should still be beneficial. Having more threads can keep the internal I/O queues of the kernel and storage hardware more populated and can improve throughput.

thread shutdown fixes (epoll)
-----------------------------

Our previous reliance on pthreads cancellation primitives left us open to a small race condition where I/O events (from epoll) could be lost during graceful shutdown or thread reduction via "SERVER aio_threads = XX". We no longer rely on pthreads cancellation for stopping threads and instead implement explicit check points for epoll.

This did not affect kqueue users, but the code is simpler and more consistent across epoll/kqueue implementations.
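The explicit check point idea from the thread shutdown notes above can be sketched as follows. This is a single-threaded simulation under stated assumptions: `struct source`, `source_wait`, `source_handle` and `worker_run` are hypothetical names, and `source_wait` merely stands in for epoll_wait. It is not the cmogstored implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct source {            /* hypothetical stand-in for an epoll descriptor */
	int pending;       /* events not yet handed out */
	int handled;       /* events fully processed */
	atomic_bool *stop; /* flag a controlling thread would set */
};

/*
 * Stand-in for epoll_wait: hands out one event per call. It raises the
 * stop flag while handing out the last event, simulating a shutdown
 * request that races with event delivery.
 */
static int source_wait(struct source *s, int *ev)
{
	if (s->pending == 0)
		return 0; /* a real wait would block with a timeout here */
	*ev = s->pending--;
	if (s->pending == 0)
		atomic_store(s->stop, true);
	return 1;
}

static void source_handle(struct source *s, int ev)
{
	(void)ev;
	s->handled++;
}

/* Returns the number of events handled before the worker stopped. */
static int worker_run(struct source *s)
{
	int ev;

	assert(s != NULL && s->stop != NULL);
	for (;;) {
		/* check point: the only place the worker may exit */
		if (atomic_load(s->stop))
			return s->handled;
		if (source_wait(s, &ev) > 0)
			source_handle(s, ev); /* a delivered event is always processed */
	}
}
```

Because the flag is only examined before waiting, an event that has already been returned by the wait call is processed before the thread exits, which is the property asynchronous cancellation could not guarantee.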
Graceful shutdown improvements
------------------------------

The addition of our I/O queueing and use of our custom thread shutdown API also allowed us to improve the responsiveness and fairness when the process enters graceful shutdown mode. This improves fairness and avoids client-side timeouts when large PUT requests are being issued over a fast network to slow disks during graceful shutdown.

Currently, graceful shutdown remains single-threaded, but it will likely become multi-threaded in the future (like normal runtime).

Miscellaneous fixes and improvements
------------------------------------

Further improved matching for (Linux) device-mapper setups where the same device (not symlinks) appears multiple times in /dev.

aio_threads count is automatically updated when new devices are added/removed. This is currently synced to MOG_DISK_USAGE_INTERVAL, but will use inotify (or the kqueue equivalent) in the future.

HTTP read buffers grow monotonically (up to 64K) and always use aligned memory. This allows deployments which pass large HTTP headers to avoid triggering unnecessary reallocations. Deployments which use small HTTP headers should notice no memory increase.

Acceptor threads are now limited to two per process instead of being scaled to CPU count. This avoids excessive threads/memory usage and contention of kernel-level mutexes for large multi-core machines.

The gnulib version used for building the tarball is now included in the tarball for ease-of-reproducibility.

Additional tests for uncommon error conditions using the fault-injection capabilities of GNU ld.

The "shutdown" command over the sidechannel is more responsive for epoll users.

Improved reporting of failed requests during PUT requests. Again, I run MogileFS instances on some of the most horrible networks on the planet[2]

fix LIB_CLOCK_GETTIME linkage on some toolchains.

"SERVER mogstored.persist_client = (0|1)" over the sidechannel is supported for compatibility with Perlbal/mogstored.

The Status: header is no longer returned on HTTP responses. All known MogileFS clients parse the HTTP status response correctly without the need for the Status: header. Neither Perlbal nor nginx set the Status: header on responses, so this is unlikely to introduce incompatibilities. The Status: header was originally inherited from HTTP servers which had to deal with a much larger range of (non-compliant) clients.
2013-09-03 cmogstored 1.3.0rc2 - fixes since rc1, systemtap [v1.3.0rc2]
The Status: header is no longer returned on HTTP responses. All known MogileFS clients parse the HTTP status response correctly without the need for the Status: header. Neither Perlbal nor nginx set the Status: header on responses, so this is unlikely to introduce incompatibilities. The Status: header was originally inherited from HTTP servers which had to deal with a much larger range of (non-compliant) clients.

SystemTap support is mostly fleshed out. There are some bundled awk scripts which should make better sense of the all.stp which logs just about everything.

Raising aio_threads now correctly increases ioq capacity. This regression was only introduced in the 1.3.0 rc series, as ioq was not in 1.2.x.
2013-09-03 Makefile: update for systemtap support files
2013-08-31 ioq: correctly reenqueue blocked mfds on capacity increase
Otherwise, reenqueue-ing only one mfd at-a-time is pointless and prevents cmogstored from utilizing new threads.
2013-08-31 ioq: avoid over-yielding on and after ioq contention
We do not need to set the contended flag again until we're certain we have no free slots in the ioq, not when we assume the client is the last one to take a slot. This is because ioq access itself is serialized, and the last client taking the ioq could be getting a false positive when another thread is waiting on ioq->mtx to release the ioq.

This prevents throughput loss while recovering from a situation where an ioq is oversubscribed. This can be reproduced under heavy load by temporarily switching to "SERVER aio_threads = 1" and then bringing aio_threads back up to a high value.
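For readers unfamiliar with the ioq, the slot accounting these commits refer to can be approximated by the sketch below: a request takes a free slot if one exists, and is queued otherwise; releasing a slot hands it to the oldest queued request. All type and function names here (`struct req`, `struct ioq`, `ioq_init`, `ioq_ready`, `ioq_next`) are hypothetical, the contended flag is left out, and only the `mtx` field name is taken from the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct req {                /* hypothetical request handle (an "mfd" above) */
	struct req *next;
};

struct ioq {
	pthread_mutex_t mtx; /* serializes all access to the queue */
	unsigned cur;        /* free slots */
	unsigned max;        /* capacity, scaled to the thread count */
	struct req *head;    /* FIFO of blocked requests */
	struct req **tail;
};

static void ioq_init(struct ioq *q, unsigned max)
{
	pthread_mutex_init(&q->mtx, NULL);
	q->cur = q->max = max;
	q->head = NULL;
	q->tail = &q->head;
}

/* true: the caller owns a slot and may do I/O; false: r was queued */
static bool ioq_ready(struct ioq *q, struct req *r)
{
	bool ok;

	pthread_mutex_lock(&q->mtx);
	ok = q->cur > 0;
	if (ok) {
		q->cur--; /* uncontended case: just take a slot */
	} else {
		r->next = NULL;
		*q->tail = r;
		q->tail = &r->next;
	}
	pthread_mutex_unlock(&q->mtx);
	return ok;
}

/*
 * Release the caller's slot. Returns a blocked request which now owns
 * that slot and should be rescheduled, or NULL if nothing was waiting.
 */
static struct req *ioq_next(struct ioq *q)
{
	struct req *r;

	pthread_mutex_lock(&q->mtx);
	r = q->head;
	if (r) {
		q->head = r->next;
		if (!q->head)
			q->tail = &q->head;
	} else {
		assert(q->cur < q->max);
		q->cur++; /* nobody waiting: the slot becomes free again */
	}
	pthread_mutex_unlock(&q->mtx);
	return r;
}
```

With a capacity of one, a second request is queued by `ioq_ready` and later returned by `ioq_next` when the first request finishes, so a slow device can only tie up as many threads as its queue has slots.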
2013-08-29 m4/systemtap.m4: quote cm_cv_sdt_h_usable var
The variable may not be defined at all, so it must be quoted to avoid spewing a warning if dtrace/stap are not found.
2013-08-29 tapset/*awk: document these scripts
Otherwise I will forget what they output one day and will have to read the code again.
2013-08-29 TODO: remove item for systemtap/dtrace
systemtap support is implemented, and hopefully dtrace works, too.
2013-08-26 flesh out systemtap support and awk helpers
Our "all.stp" tapset now generates awk-friendly output for feeding some sample awk scripts. Using awk (and gawk) was necessary to avoid reimplementing strftime in guru mode for generating CLF (Common Log Format) HTTP access logs.

Using awk also gives us several advantages:

* floating point number support (for time differences)
* a more familiar language to systems administrators (given this is for MogileFS, perhaps Perl would be even more familiar...).
* fast edit/run cycle, so the slowness of using stap to rebuild/reload the kernel module for all.stp changes can be avoided when output must be customized.
2013-08-23 http: remove Status: header from all responses
This was inherited from a server which needed to deal with some broken clients; MogileFS does not have this problem. Neither Perlbal nor nginx set this response header, either, so let's save ourselves a few bytes.
2013-08-22 trywrite: workaround potential inf loops from kernel bugs
While we're fortunate enough to not have encountered a case where send/writev returns zero with a non-zero-length buffer, it's not inconceivable that it could strike us one day. In that case, error out the connection instead of infinite looping. Dropping a connection is safer than letting a thread run in an infinite loop.
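A minimal sketch of this defensive check is shown below, using a blocking write(2) loop rather than the send/writev calls the commit mentions. The function name `write_all` and the choice of ENOSPC as the reported error are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Write all of buf to fd. Returns 0 on success, -1 on error (errno is set). */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t w = write(fd, p, len);

		if (w > 0) {
			p += w;
			len -= (size_t)w;
		} else if (w == 0) {
			/*
			 * Zero bytes written for a non-empty buffer: fail
			 * instead of retrying forever.
			 */
			errno = ENOSPC; /* assumed error code for this sketch */
			return -1;
		} else if (errno != EINTR) {
			return -1;
		}
	}
	return 0;
}
```

The caller is expected to drop the connection when this returns -1, which matches the reasoning in the commit: a closed connection is recoverable, a spinning thread is not.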
2013-07-26 test/mgmt: warn about slow mount points on test failure
Unfortunately, slow mount points still cause minor reliability issues with the test suite.
2013-07-26 test/mgmt: increase reliability of max devid test
This seems to fail more under heavy load, so wait a bit longer for iostat to become aware of the new devices.
2013-07-19 move trace.h include to global cmogstored.h
We'll have tracing everywhere, so it's too much maintenance overhead to add it to every file which wants it. Increased build-times are a problem, but less than the maintenance overhead of finding the right headers.
2013-07-19 tapset: rename http_request.stp -> all.stp
This tapset will contain every probe point and acts as a check/documentation for extracting useful probes.
2013-07-19 split out {mgmt,http}_parse_continue checks
Incomplete request headers are uncommon, so if we see them, something is probably off or strange. This should make it easier to maintain probe points to watch for this behavior.
2013-07-19 probes: add probes for rbuf growth
Growing the rbufs should be uncommon, but it should set off alarms if it happens too often.
2013-07-19 test/mgmt: cover the large rbuf growth case
mgmt may now encounter large rbufs, so ensure that uncommon case is tested.
2013-07-19 split out {http,mgmt}_rbuf_grow functions
This should allow easier tracing of rbuf growth, and should hopefully make the code more explicit and harder to screw up.
2013-07-17 ioq: add probes tracing and documentation
ioq tracing will allow users to notice when devices are saturated (from a cmogstored POV) and increase aio_threads if necessary.