Date  Commit message
2013-07-10 struct mog_ni: document reasoning for the ':' in ni_serv
This is somewhat strange, but makes the code base slightly easier to reuse for non-HTTP purposes.
2013-07-10 http: include IP:PORT in "client died" message
This should hopefully make failures easier to track down.
2013-07-10 remove assertion for handling iostat death
This only triggered if the (undocumented) --worker-processes option is used. This assertion is no longer valid as of commit d5a52618ca1f9b5d7f6998716fbfe7714f927112 (refactor handling of "server aio_threads = " command).
2013-07-10 file: embed ioq in the opened mog_file object
This allows us to avoid a redundant hash lookup every time we "activate" an open file for reading or writing.
2013-07-10 ioq: implement and enable generic I/O queues
This will allow us to limit concurrency on a per-device basis with limited impact on HTTP header reading/parsing. This prevents pathological slowness on a single device from bringing down an entire host. This also allows users to more safely run with fewer aio_threads (e.g. 1:1 thread:device mapping) on fast devices with smaller low-level (kernel/hardware) I/O queues.
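A minimal sketch of the idea of capping concurrency per device, not the cmogstored implementation. The `ioq_sketch` type and its functions are hypothetical names, and a real queue would also park waiting clients rather than only refusing them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device I/O queue: caps in-flight disk requests. */
struct ioq_sketch {
	pthread_mutex_t mtx;
	unsigned in_flight; /* requests currently using the device */
	unsigned max;       /* concurrency limit for this device */
};

static void ioq_sketch_init(struct ioq_sketch *q, unsigned max)
{
	assert(max > 0);
	pthread_mutex_init(&q->mtx, NULL);
	q->in_flight = 0;
	q->max = max;
}

/* Returns true if the caller may use the device now, false if it must wait. */
static bool ioq_sketch_try_acquire(struct ioq_sketch *q)
{
	bool ok;

	pthread_mutex_lock(&q->mtx);
	ok = q->in_flight < q->max;
	if (ok)
		q->in_flight++;
	pthread_mutex_unlock(&q->mtx);
	return ok;
}

/* Called when a request finishes its disk I/O. */
static void ioq_sketch_release(struct ioq_sketch *q)
{
	pthread_mutex_lock(&q->mtx);
	if (q->in_flight > 0)
		q->in_flight--;
	pthread_mutex_unlock(&q->mtx);
}
```

Because each device carries its own limit, a slow device only exhausts its own slots; requests for other devices are unaffected.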
2013-07-10 packaddr: simplify mog_sockaddr definition
"struct sockaddr" turns out to be smaller than "struct sockaddr_in6", so we can avoid complicated casting and just add that to the union. We continue avoiding "struct sockaddr_storage", however, as it is unnecessarily large for our needs.
2013-07-10 test/mgmt: remove unused variable
This was triggering warnings with Ruby 2.0.0-p195.
2013-07-10 rbuf: reattach/reuse read buffers when possible
Reattaching/reusing read buffers allows us to avoid repeated reallocation/growth/free when clients repeatedly send us large headers. This may also increase cache-hits by favoring recently-used buffers as long as fragmentation is kept in check. The fragmentation should be no worse than it is currently, due to the existing detach nature of rbufs.
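The reattach/reuse pattern can be sketched roughly as follows. This is an illustration under assumptions, not the rbuf code itself: `rbuf_sketch`, `rbuf_sketch_get` and `rbuf_sketch_put` are hypothetical names, and one cached buffer per thread is assumed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical read buffer with its capacity stored in front of the data. */
struct rbuf_sketch {
	size_t capa;
	char data[];
};

/* One cached buffer per thread (an assumption of this sketch). */
static _Thread_local struct rbuf_sketch *tls_rbuf;

/* Detach a buffer of at least `want` bytes, reusing the cached one if it fits. */
static struct rbuf_sketch *rbuf_sketch_get(size_t want)
{
	struct rbuf_sketch *rb = tls_rbuf;

	if (rb && rb->capa >= want) {
		tls_rbuf = NULL; /* ownership moves to the caller */
		return rb;
	}
	rb = malloc(sizeof(*rb) + want);
	if (!rb)
		abort();
	rb->capa = want;
	return rb;
}

/* Reattach: keep the larger of the two buffers and free the other. */
static void rbuf_sketch_put(struct rbuf_sketch *rb)
{
	assert(rb != NULL);
	if (tls_rbuf && tls_rbuf->capa >= rb->capa) {
		free(rb);
		return;
	}
	free(tls_rbuf);
	tls_rbuf = rb;
}
```

A client that repeatedly sends large headers then hits the cached buffer instead of triggering a fresh allocation each time.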
2013-07-10 mgmt: remove restriction on large rbuf sizes
We'll be allowing the migration of buffers between threads and from waiting clients back to thread-local storage.
2013-07-10 alloc: cache-align all rbuf memory allocations
Some setups use clients which pass large headers (User-Agent, or even cookies(!)) to cmogstored, so large rbufs may be used often and repeatedly in those cases. We limit rbuf sizes to 64K anyways, so keeping "larger" buffers around should not be much of an issue for modern systems. This prepares us for reusing/recycling large rbufs as TLS buffers.
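A minimal sketch of a cache-aligned allocation using posix_memalign. The 64-byte line size and the `sketch_cachealign` name are assumptions made for illustration; the real line size depends on the CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE 64 /* assumed; the real line size varies by CPU */

/* Allocate `size` bytes starting on a cache-line boundary, or return NULL. */
static void *sketch_cachealign(size_t size)
{
	void *ptr = NULL;
	int err = posix_memalign(&ptr, SKETCH_CACHE_LINE, size);

	if (err != 0) {
		/* posix_memalign returns the error code instead of setting errno */
		errno = err;
		return NULL;
	}
	assert(((uintptr_t)ptr % SKETCH_CACHE_LINE) == 0);
	return ptr;
}
```

The returned pointer is released with free() as usual.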
2013-07-10 mgmt: handle disk-using requests outside of the parser
This will allow us to use control flow similar to the http client handling code when we queue clients based on I/O channel.
2013-07-10 introduce generic I/O queue functionality
This replaces the fsck_queue internals with a generic ioq implementation which is based on the MogileFS devid, and not the operating system devid.
2013-07-10 http: add assertion for unused wbuf
We need to ensure we do not introduce code to launch http_process_client while we have buffered data (or socket write errors).
2013-07-10 dev: shrink and cache-align struct mog_dev
We will have structures inside the dev struct accessed by multiple threads frequently, so keep it cache-aligned. To reduce memory usage for large-numbered devices, avoid storing the prefix on output and instead just rely on the printf-family of routines to generate stringified output in uncommon code paths.
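A rough sketch of both halves of this change: a cache-aligned struct, and a path prefix generated on demand with snprintf instead of being stored. The `dev_sketch` layout, the 64-byte alignment and the "/dev<devid>/" prefix format are assumptions made for illustration, not the actual struct mog_dev.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache line size */

/* Hypothetical device struct: aligning the first member aligns the struct. */
struct dev_sketch {
	_Alignas(SKETCH_CACHE_LINE) uint32_t devid;
	/* hot fields shared between threads would follow here */
};

_Static_assert(_Alignof(struct dev_sketch) == SKETCH_CACHE_LINE,
	       "struct must start on a cache line");

/*
 * Build "/dev<devid>/<name>" when needed instead of storing the prefix.
 * Returns the string length, or -1 if dst is too small.
 */
static int dev_sketch_path(const struct dev_sketch *dev, const char *name,
			   char *dst, size_t dstlen)
{
	int n;

	assert(dstlen > 0);
	n = snprintf(dst, dstlen, "/dev%u/%s", (unsigned)dev->devid, name);
	return (n >= 0 && (size_t)n < dstlen) ? n : -1;
}
```

Generating the string costs a little CPU each time, which is acceptable when it only happens in uncommon code paths.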
2013-07-10 mgmt: fix case where rbuf->rsize may be uninitialized
Detachers MUST set rsize properly. This API is unfortunately fragile and will eventually be fixed to be more difficult to misuse.
2013-07-04 build: fix LIB_CLOCK_GETTIME linkage on some toolchains
According to the m4/clock_gettime.m4 documentation (from gnulib), the LIB_CLOCK_GETTIME variable should be added to a *LDADD variable and not AM_LDFLAGS. This is also consistent with GNU automake documentation. Thanks to Cody Pisto for reporting this problem under Ubuntu 12.04. ref: http://www.gnu.org/software/automake/manual/html_node/Linking.html
2013-06-25 Merge branch '1.2-stable'
* 1.2-stable:
  cmogstored 1.2.2 - minor maintenance release
  INSTALL: update versions and URLs
  INSTALL: clarify between starting from tarball vs git
  test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
  iostat_parser: allow '-' for device names
  alloc: posix_memalign does not set errno
2013-06-25 tests: fault-injection test for ENOSPC on epoll_ctl
For difficult-to-trigger errors, fault injection is necessary for testing our error handling. I have confirmed this test fails with "avoid leaks on epoll/kqueue resources exhaustion" reverted.
2013-06-25 avoid leaks on epoll/kqueue resources exhaustion
Simply releasing the descriptor triggering ENOSPC/ENOMEM errors from epoll_ctl and kevent is not good enough, as those descriptors may have other descriptors (e.g. files to be served) hanging off of them.
2013-06-25 introduce mog_yield wrapper around sched_yield/pthread_yield
While pthread_yield is non-standard, it is relatively common and preferable for systems where pthreads are _not_ 1:1 mapped to kernel threads. This also provides a stronger yield to weaken the priority of the calling thread wherever we previously used sched_yield.
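A wrapper of this kind could look roughly like the sketch below. `HAVE_PTHREAD_YIELD` is a hypothetical configure-time macro and `sketch_yield` a hypothetical name; when the macro is undefined, the sketch falls back to the standard sched_yield.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Yield the CPU, preferring pthread_yield where the build system has
 * detected it (HAVE_PTHREAD_YIELD is an assumed, hypothetical macro).
 * Returns 0 on success.
 */
static int sketch_yield(void)
{
#if defined(HAVE_PTHREAD_YIELD)
	/* non-standard function, so its result is ignored in this sketch */
	pthread_yield();
	return 0;
#else
	return sched_yield();
#endif
}
```

Keeping the choice behind one function means callers that previously used sched_yield only need a rename.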
2013-06-25 call sched_yield repeatedly when terminating threads
This should allow the threads we're terminating to more quickly enter a safe state where they're allowed to exit. On SMP systems, we need to yield the signalling thread more times to increase the probability the interrupted thread can run (and exit).
2013-06-25 Makefile.am: fix systemtap probes.h distribution
Our tests over-link (to save developer time :P), so we must link in probes with our tests. Also, we must keep probes.h around for distclean (but not maintainerclean).
2013-06-25 shrink mog_packaddr and improve portability
We cannot assume sa_family_t is the first element of "struct sockaddr_in" or "struct sockaddr_in6". FreeBSD has a "sa_len" member as the first element while Linux does not. So only keep the parts of the "struct sockaddr*" we need and use inet_ntop instead of getnameinfo. This also gives us a little more space to add additional fields to "struct mog_http" in the future without increasing memory (or CPU cache) use.
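As an illustration of keeping only the needed parts of a socket address and formatting it with inet_ntop, here is an IPv4-only sketch. The `packaddr_sketch` layout and function names are hypothetical and do not reflect the real mog_packaddr, which also has to cover IPv6.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compact record: only the IPv4 address and port are kept. */
struct packaddr_sketch {
	struct in_addr addr; /* network byte order */
	in_port_t port;      /* network byte order */
};

/* Copy members by name, so the layout of struct sockaddr_in does not matter. */
static void packaddr_sketch_from(struct packaddr_sketch *pa,
				 const struct sockaddr_in *sin)
{
	assert(sin->sin_family == AF_INET);
	pa->addr = sin->sin_addr;
	pa->port = sin->sin_port;
}

/* Format as "IP:PORT" without a resolver call; 0 on success, -1 on failure. */
static int packaddr_sketch_str(const struct packaddr_sketch *pa,
			       char *dst, size_t len)
{
	char ip[INET_ADDRSTRLEN];
	int n;

	if (!inet_ntop(AF_INET, &pa->addr, ip, sizeof(ip)))
		return -1;
	n = snprintf(dst, len, "%s:%u", ip, (unsigned)ntohs(pa->port));
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Accessing `sin_addr` and `sin_port` by name sidesteps the question of whether `sa_len` or the family field comes first on a given platform.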
2013-06-25 dist: include newly-added files to the tarball
Tarballs were otherwise unusable.
2013-06-25 replace pthreads cancellation with explicit checks
Due to data/event loss, we cannot rely on normal syscalls (accept/epoll_wait) being cancellation points. The benefits of using a standardized API to terminate threads asynchronously are lost when toggling cancellation flags. This implementation allows us to be more explicit and obvious at the few points where our worker threads may exit and reduces the amount of code we have. By avoiding the calls to pthread_setcancelstate, we should halve the number of atomic operations required in the common case (where the thread is not marked for termination).
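The explicit-check approach can be sketched with a per-thread flag that is tested at a single safe point in the worker loop. The `worker_sketch` type and its functions are hypothetical, and the event handling is replaced by a counter so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker state. */
struct worker_sketch {
	atomic_bool quit; /* set by the thread that wants this worker to exit */
	unsigned handled; /* events processed so far */
};

static void worker_sketch_init(struct worker_sketch *w)
{
	atomic_init(&w->quit, false);
	w->handled = 0;
}

static void worker_sketch_request_quit(struct worker_sketch *w)
{
	atomic_store(&w->quit, true);
}

/*
 * Run up to max_events iterations. Returns true if the worker stopped
 * because it was asked to quit, false if it ran out of events.
 */
static bool worker_sketch_run(struct worker_sketch *w, unsigned max_events)
{
	unsigned i;

	assert(w != NULL);
	for (i = 0; i < max_events; i++) {
		/* the only exit point: no event is held here, so none is lost */
		if (atomic_load(&w->quit))
			return true;
		w->handled++; /* stand-in for waiting on and handling one event */
	}
	return false;
}
```

In the common case the loop performs one atomic load per iteration, and the thread can never exit while it holds an event.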
2013-06-25 "server aio_threads = XX" no longer requires malloc
This should prevent one class of "accidental" failures. (The sidechannel was never meant to be secure or to be exposed to the public.)
2013-06-25 fdmap: do not warn on ENOTCONN due to unavoidable race
A client may disconnect at any time, so shutdown may fail harmlessly with ENOTCONN.
2013-06-25 fix "shutdown" over sidechannel with epoll_pwait
The "shutdown" command needs to trigger EINTR when using epoll_pwait, otherwise the sleeping thread may not wake up properly.
2013-06-25 do not rely on normal syscalls as cancellation points
Cancellation with epoll_wait, accept4 (and accept) may cause events to be lost, as cancellation relies on signals anyways in glibc/Linux. So instead, we use signaling ourselves and explicitly test for cancellation only if we know we are interrupted and in a state where a thread can safely be cancelled. ref: http://mid.gmane.org/CAE2sS1gxQkqmcywQ07pmgNHM+CyqzMkuASVjmWDL+hgaTMURWQ@mail.gmail.com
2013-06-25 avoid needlessly reinitializing common sigset_t
This should hopefully save a few cycles and reduce stack usage slightly.
2013-06-25 svc: make thr_per_dev per-svc instead of global
We could eventually make this a tunable parameter, as it could be advantageous over a global aio_threads value.
2013-06-25 refactor handling of "server aio_threads = " command
We're using per-svc-based thread pools, so different MogileFS instances we serve no longer affect each other. This means changing the aio_threads count only affects the svc of the sidechannel port which triggered the change.
2013-06-25 define MOG_DEVID_MAX and MOG_PATH_MAX variables
This improves maintainability in case MogileFS changes these limits.
2013-06-25 consistently check OOM from hash_initialize/hash_insert
Both hash_initialize and hash_insert may return NULL to indicate allocation errors. So implement a mog_oom_if_null helper function to destroy the process instead of attempting to continue and dereferencing NULL pointers. This may affect configurations with limited memory and lacking overcommit; but is unlikely to trigger given the small memory footprint of cmogstored.
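A helper in this spirit might look like the sketch below. The name `sketch_oom_if_null`, its signature and its message are assumptions for illustration; the real mog_oom_if_null may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Abort the process if an allocation-backed call returned NULL.
 * Returns ptr unchanged otherwise, so calls can be wrapped inline.
 */
static void *sketch_oom_if_null(void *ptr)
{
	if (!ptr) {
		fputs("out of memory\n", stderr);
		abort();
	}
	return ptr;
}
```

A call site can then wrap any allocating call, for example `p = sketch_oom_if_null(malloc(n));`, and use the result without a separate NULL check.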
2013-06-25 svc: implement top-level by_mog_devid hash
This will allow us to lookup devices for per-(mog)device I/O queues.
2013-06-25 http_*: fixup long lines from automated conversion
Lines longer than 80 columns aren't readable on my screen with gigantic fonts.
2013-06-25 parse out mogilefs devid in mgmt/http requests
This will allow us to do lookups for IO queues/semaphores before we attempt to fstatat/stat a path.
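Extracting the devid before touching the filesystem could be sketched as below. This assumes request paths start with "/dev<digits>/"; the function name and the caller-supplied `devid_max` limit are hypothetical, and the actual parser in cmogstored may work differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse the device ID from a path that starts with "/dev<digits>/".
 * Returns the devid, or -1 if the path does not match or the ID exceeds
 * devid_max (assumed to be small enough that devid * 10 cannot overflow).
 */
static long sketch_parse_devid(const char *path, long devid_max)
{
	static const char prefix[] = "/dev";
	const char *p;
	long devid = 0;

	assert(path != NULL);
	if (strncmp(path, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	p = path + sizeof(prefix) - 1;
	if (!isdigit((unsigned char)*p))
		return -1;
	for (; isdigit((unsigned char)*p); p++) {
		devid = devid * 10 + (*p - '0');
		if (devid > devid_max)
			return -1;
	}
	return (*p == '/') ? devid : -1;
}
```

The result can serve as a key for a per-device queue lookup, so no stat call is made until the request has been queued.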
2013-06-25 fix devices/thread count if sidechannel is inactive
If the mogstored sidechannel is inactive (in HTTP-only mode), we should still count the devices correctly so that the number of worker threads scales properly.
2013-06-25 switch to per-svc (per-docroot) queues
This simplifies code, reduces contention, and reduces the chances of independent MogileFS instances (with one instance of cmogstored) stepping over each other. Most cmogstored deployments are single docroot (for a single instance of MogileFS); however, cmogstored supports multiple docroots for some rare configurations and we support them here.
2013-06-25 thrpool: add comment explaining minimum thread count
I forgot why this bound was necessary, so add a comment ensuring I do not forget again.
2013-06-25 limit acceptors to reduce contention on large machines
Having too many acceptor threads does not help, as it leads to lock contention in the accept syscalls and the EPOLL_CTL_ADD paths. The fair FIFO ordering of _blocking_ accept/accept4 syscalls also means we trigger unnecessary task switching and incur cache misses under high load. Limiting acceptors is safe because it has been almost impossible for the acceptor threads to be stuck on disk I/O since commit 832316624f7a8f44b3e1d78a8a7a62a399241840 ("acceptor threads push directly into event queue").
2013-06-25 update aio_threads count when new devices appear
This will help ensure availability when new devices are added, without additional user interaction to manually set aio_threads via sidechannel.
2013-06-25 make mog_fd_get static, favor mog_fd_init
mog_fd_init enforces setting the correct type, so relegate mog_fd_get to private usage inside fdmap.c
2013-06-25 build: get the gnulib version via autogen.sh
This is useful for:
a) repeatably generating the same tarball off git
b) diagnosing and tracking down (rare) gnulib bugs
c) 3rd parties verifying we do not put malicious code into our tarballs
2013-06-25 mnt: attempt to match iostat output by st_rdev
st_rdev matching is necessary for cases where the block devices are aliased (not via symlinks), and mountlist returns a different name for the device than what iostat uses. This is the case for my cryptmount(8) setup, where /dev/mapper/FOO and /dev/dm-N refer to the same device (with matching st_dev and st_rdev numbers), but neither is a symlink to the other (nor are they hardlinks). stat() on block devices in /dev should always be fast and non-blocking, as /dev is expected to be non-networked on any reasonable system (at least those serving as a MogileFS storage node).
2013-05-11 cmogstored 1.2.2 - minor maintenance release (v1.2.2, 1.2-stable)
This is a minor maintenance release, no need to upgrade unless
a) your gcc defaults to -march=i386 (e.g. 32-bit CentOS 5)
b) your device names include '-' (e.g. Linux device mapper users)
There are also some minor doc updates to clarify tarball vs git installation and a trivial error-handling fix which should not affect any current users.
Eric Wong (6):
  build: add check for GCC atomics
  alloc: posix_memalign does not set errno
  iostat_parser: allow '-' for device names
  test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
  INSTALL: clarify between starting from tarball vs git
  INSTALL: update versions and URLs
cmogstored 1.3 will have some fairly intrusive internal changes and cleanups to make it easier for users to trace and diagnose system and network problems.
2013-05-11 INSTALL: update versions and URLs
libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported. (cherry picked from commit 86e5d10649f14fe3b3c8af37fd8ec04cc337fc9e)
2013-05-11 INSTALL: clarify between starting from tarball vs git
Users unfamiliar with autotools may not realize bootstrapping is required when building from git. (cherry picked from commit 1e80ba592ede05fe40b31686142f82294891afd0)
2013-05-11 INSTALL: update versions and URLs
libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported.
2013-05-11 INSTALL: clarify between starting from tarball vs git
Users unfamiliar with autotools may not realize bootstrapping is required when building from git.