cmogstored.git - alternative mogstored implementation for MogileFS

Date	Commit message
2013-07-10	struct mog_ni: document reasoning for the ':' in ni_serv
	This is somewhat strange, but makes the code base slightly easier to reuse for non-HTTP purposes.
2013-07-10	http: include IP:PORT in "client died" message
	This should hopefully make failures easier to track down.
2013-07-10	remove assertion for handling iostat death
	This only triggered if the (undocumented) --worker-processes option was used. This assertion is no longer valid as of commit d5a52618ca1f9b5d7f6998716fbfe7714f927112 (refactor handling of "server aio_threads = " command).
2013-07-10	file: embed ioq in the opened mog_file object
	This allows us to avoid a redundant hash lookup every time we "activate" an open file for reading or writing.
2013-07-10	ioq: implement and enable generic I/O queues
	This will allow us to limit concurrency on a per-device basis with limited impact on HTTP header reading/parsing. This prevents pathological slowness on a single device from bringing down an entire host. This also allows users to more safely run with fewer aio_threads (e.g. 1:1 thread:device mapping) on fast devices with smaller low-level (kernel/hardware) I/O queues.
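	The per-device concurrency limit described above can be sketched roughly as follows. This is a minimal illustration, not the cmogstored implementation: the example_ioq names, the slot counter and the waiter counter are all assumptions, and a real queue would park the waiting client objects rather than just count them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-device queue: "avail" counts free I/O slots. */
struct example_ioq {
	pthread_mutex_t mtx;
	unsigned avail;   /* slots that may start I/O right now */
	unsigned waiting; /* clients parked until a slot frees up */
};

static void example_ioq_init(struct example_ioq *q, unsigned max_active)
{
	int rc = pthread_mutex_init(&q->mtx, NULL);

	assert(rc == 0);
	(void)rc;
	q->avail = max_active;
	q->waiting = 0;
}

/* true: caller may use the device now; false: caller was queued */
static bool example_ioq_ready(struct example_ioq *q)
{
	bool ok;

	pthread_mutex_lock(&q->mtx);
	ok = q->avail > 0;
	if (ok)
		q->avail--;
	else
		q->waiting++;
	pthread_mutex_unlock(&q->mtx);
	return ok;
}

/* true: the slot was handed to a waiter that the caller must now resume */
static bool example_ioq_done(struct example_ioq *q)
{
	bool resume;

	pthread_mutex_lock(&q->mtx);
	resume = q->waiting > 0;
	if (resume)
		q->waiting--; /* the slot passes straight to the waiter */
	else
		q->avail++;
	pthread_mutex_unlock(&q->mtx);
	return resume;
}
```

	With one queue per device, a slow device only fills its own queue; threads that find the queue full can go back to serving clients of other devices instead of blocking.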
2013-07-10	packaddr: simplify mog_sockaddr definition
	"struct sockaddr" turns out to be smaller than "struct sockaddr_in6", so we can avoid complicated casting and just add that to the union. We continue avoiding "struct sockaddr_storage", however, as it is unnecessarily large for our needs.
2013-07-10	test/mgmt: remove unused variable
	This was triggering warnings with Ruby 2.0.0-p195
2013-07-10	rbuf: reattach/reuse read buffers when possible
	Reattaching/reusing read buffers allows us to avoid repeated reallocation/growth/free when clients repeatedly send us large headers. This may also increase cache-hits by favoring recently-used buffers as long as fragmentation is kept in check. The fragmentation should be no worse than it is currently, due to the existing detach nature of rbufs.
2013-07-10	mgmt: remove restriction on large rbuf sizes
	We'll be allowing the migration of buffers between threads and from waiting clients back to thread-local storage.
2013-07-10	alloc: cache-align all rbuf memory allocations
	Some setups use clients which pass large headers (User-Agent, or even cookies(!)) to cmogstored, so large rbufs may be used often and repeatedly in those cases. We limit rbuf sizes to 64K anyways, so keeping "larger" buffers around should not be much of an issue for modern systems. This prepares us for reusing/recycling large rbufs as TLS buffers.
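	A cache-aligned allocation of this kind could look roughly like the sketch below. The helper name and the 64-byte line size are assumptions for illustration; the one firm detail is that posix_memalign reports failure through its return value rather than errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_CACHE_LINE 64 /* assumed cache line size; real hardware varies */

/* Returns cache-line-aligned memory, or NULL with errno set on failure. */
static void *example_cachealign(size_t size)
{
	void *ptr = NULL;
	int err = posix_memalign(&ptr, EXAMPLE_CACHE_LINE, size);

	if (err != 0) {
		errno = err; /* posix_memalign returns the error, it does not set errno */
		return NULL;
	}
	assert(((uintptr_t)ptr % EXAMPLE_CACHE_LINE) == 0);
	return ptr;
}
```

	Memory obtained this way is released with plain free().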
2013-07-10	mgmt: handle disk-using requests outside of the parser
	This will allow us to use control flow similar to the http client handling code when we queue clients based on I/O channel.
2013-07-10	introduce generic I/O queue functionality
	This replaces the fsck_queue internals with a generic ioq implementation which is based on the MogileFS devid, and not the operating system devid.
2013-07-10	http: add assertion for unused wbuf
	We need to ensure we do not introduce code to launch http_process_client while we have buffered data (or socket write errors).
2013-07-10	dev: shrink and cache-align struct mog_dev
	We will have structures inside the dev struct accessed by multiple threads frequently, so keep it cache-aligned. To reduce memory usage for large-numbered devices, avoid storing the prefix on output and instead just rely on the printf-family of routines to generate stringified output in uncommon code paths.
2013-07-10	mgmt: fix case where rbuf->rsize may be uninitialized
	Detachers MUST set rsize properly. This API is unfortunately fragile and will eventually be fixed to be more difficult to misuse.
2013-07-04	build: fix LIB_CLOCK_GETTIME linkage on some toolchains
	According to the m4/clock_gettime.m4 documentation (from gnulib), the LIB_CLOCK_GETTIME variable should be added to a *LDADD variable and not AM_LDFLAGS. This is also consistent with GNU automake documentation. Thanks to Cody Pisto for reporting this problem under Ubuntu 12.04.
	ref: http://www.gnu.org/software/automake/manual/html_node/Linking.html
2013-06-25	Merge branch '1.2-stable'
	* 1.2-stable:
	  cmogstored 1.2.2 - minor maintenance release
	  INSTALL: update versions and URLs
	  INSTALL: clarify between starting from tarball vs git
	  test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
	  iostat_parser: allow '-' for device names
	  alloc: posix_memalign does not set errno
2013-06-25	tests: fault-injection test for ENOSPC on epoll_ctl
	For difficult-to-trigger errors, fault injection is necessary for testing our error handling. I have confirmed this test fails with "avoid leaks on epoll/kqueue resources exhaustion" reverted.
2013-06-25	avoid leaks on epoll/kqueue resources exhaustion
	Simply releasing the descriptor triggering ENOSPC/ENOMEM errors from epoll_ctl and kevent is not good enough, as those descriptors may have other descriptors (e.g. files to be served) hanging off of them.
2013-06-25	introduce mog_yield wrapper around sched_yield/pthread_yield
	While pthread_yield is non-standard, it is relatively common and preferable for systems where pthreads are _not_ 1:1 mapped to kernel threads. This also provides a stronger yield to weaken the priority of the calling thread wherever we previously used sched_yield.
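	A wrapper of this shape might look like the sketch below. HAVE_PTHREAD_YIELD is a hypothetical configure-time macro and example_yield is an illustrative name; when the macro is not defined, the sketch falls back to the standard sched_yield.

```c
#include <pthread.h>
#include <sched.h>

/*
 * HAVE_PTHREAD_YIELD is a hypothetical macro a configure check would
 * define; without it we use the POSIX sched_yield().
 */
static int example_yield(void)
{
#ifdef HAVE_PTHREAD_YIELD
	return pthread_yield(); /* non-standard, but common */
#else
	return sched_yield();
#endif
}
```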
2013-06-25	call sched_yield repeatedly when terminating threads
	This should allow the threads we're terminating to more quickly enter a safe state where they're allowed to exit. On SMP systems, we need to yield the signalling thread more times to increase the probability the interrupted thread can run (and exit).
2013-06-25	Makefile.am: fix systemtap probes.h distribution
	Our tests over-link (to save developer time :P), so we must link in probes with our tests. Also, we must keep probes.h around for distclean (but not maintainerclean).
2013-06-25	shrink mog_packaddr and improve portability
	We cannot assume sa_family_t is the first element of "struct sockaddr_in" or "struct sockaddr_in6". FreeBSD has a "sa_len" member as the first element while Linux does not. So only keep the parts of the "struct sockaddr*" we need and use inet_ntop instead of getnameinfo. This also gives us a little more space to add additional fields to "struct mog_http" in the future without increasing memory (or CPU cache) use.
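	The idea of keeping only the needed address fields and formatting them with inet_ntop can be sketched as follows. The example_packaddr layout and both helper names are assumptions made for this illustration, not the layout of mog_packaddr.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical compact address: only the fields needed for logging. */
struct example_packaddr {
	uint8_t is_ipv6;  /* 0: IPv4, 1: IPv6 */
	uint16_t port;    /* network byte order */
	uint8_t addr[16]; /* only the first 4 bytes are used for IPv4 */
};

/* Copy the interesting parts out of a sockaddr; returns 0 on success. */
static int example_pack(struct example_packaddr *dst,
			const struct sockaddr *sa, socklen_t len)
{
	assert(dst != NULL && sa != NULL);
	memset(dst, 0, sizeof(*dst));
	if (sa->sa_family == AF_INET && len >= sizeof(struct sockaddr_in)) {
		struct sockaddr_in in;

		memcpy(&in, sa, sizeof(in));
		dst->port = in.sin_port;
		memcpy(dst->addr, &in.sin_addr, 4);
		return 0;
	}
	if (sa->sa_family == AF_INET6 && len >= sizeof(struct sockaddr_in6)) {
		struct sockaddr_in6 in6;

		memcpy(&in6, sa, sizeof(in6));
		dst->is_ipv6 = 1;
		dst->port = in6.sin6_port;
		memcpy(dst->addr, &in6.sin6_addr, 16);
		return 0;
	}
	return -1; /* unsupported family or truncated address */
}

/* Format the packed address; inet_ntop never performs a DNS lookup. */
static const char *example_ntop(const struct example_packaddr *src,
				char *buf, socklen_t size)
{
	return inet_ntop(src->is_ipv6 ? AF_INET6 : AF_INET,
			 src->addr, buf, size);
}
```

	Because the fields are copied by name, the sketch does not care whether the platform places sa_len or sa_family first in its sockaddr structures.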
2013-06-25	dist: include newly-added files to the tarball
	Tarballs were otherwise unusable.
2013-06-25	replace pthreads cancellation with explicit checks
	Due to data/event loss, we cannot rely on normal syscalls (accept/epoll_wait) being cancellation points. The benefits of using a standardized API to terminate threads asynchronously are lost when toggling cancellation flags. This implementation allows us to be more explicit and obvious at the few points where our worker threads may exit and reduces the amount of code we have. By avoiding the calls to pthread_setcancelstate, we should halve the number of atomic operations required in the common case (where the thread is not marked for termination).
2013-06-25	"server aio_threads = XX" no longer requires malloc
	This should prevent one class of "accidental" failures. (The sidechannel has never been meant to be secure or exposed to the public).
2013-06-25	fdmap: do not warn on ENOTCONN due to unavoidable race
	A client may disconnect at any time, so shutdown may fail harmlessly with ENOTCONN.
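	Tolerating this race amounts to treating ENOTCONN as success, roughly as in the sketch below. The helper name is hypothetical, and perror stands in for whatever logging the real code uses.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: shut down both directions, tolerating ENOTCONN. */
static int example_shutdown(int fd)
{
	if (shutdown(fd, SHUT_RDWR) == 0)
		return 0;
	if (errno == ENOTCONN)
		return 0; /* the peer is already gone: harmless, stay quiet */
	perror("shutdown"); /* anything else is worth reporting */
	return -1;
}
```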
2013-06-25	fix "shutdown" over sidechannel with epoll_pwait
	The "shutdown" command needs to trigger EINTR when using epoll_pwait, otherwise the sleeping thread may not wake up properly.
2013-06-25	do not rely on normal syscalls as cancellation points
	Cancellation with epoll_wait, accept4 (and accept) may cause events to be lost, as cancellation relies on signals anyways in glibc/Linux. So instead, we use signaling ourselves and explicitly test for cancellation only if we know we are interrupted and in a state where a thread can safely be cancelled. ref: http://mid.gmane.org/CAE2sS1gxQkqmcywQ07pmgNHM+CyqzMkuASVjmWDL+hgaTMURWQ@mail.gmail.com
2013-06-25	avoid needlessly reinitializing common sigset_t
	This should hopefully save a few cycles and reduce stack usage slightly.
2013-06-25	svc: make thr_per_dev per-svc instead of global
	We could eventually make this a tunable parameter, as it could be advantageous over a global aio_threads value.
2013-06-25	refactor handling of "server aio_threads = " command
	We're using per-svc-based thread pools, so different MogileFS instances we serve no longer affect each other. This means changing the aio_threads count only affects the svc of the sidechannel port which triggered the change.
2013-06-25	define MOG_DEVID_MAX and MOG_PATH_MAX variables
	This improves maintainability in case MogileFS changes these limits.
2013-06-25	consistently check OOM from hash_initialize/hash_insert
	Both hash_initialize and hash_insert may return NULL to indicate allocation errors. So implement a mog_oom_if_null helper function to destroy the process instead of attempting to continue and dereferencing NULL pointers. This may affect configurations with limited memory and lacking overcommit; but is unlikely to trigger given the small memory footprint of cmogstored.
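	A helper in this spirit could be as small as the sketch below. The name, the message and the choice to return the pointer are assumptions; the actual mog_oom_if_null signature may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: terminate the process instead of continuing with NULL. */
static void *example_oom_if_null(void *ptr)
{
	if (ptr == NULL) {
		fputs("out of memory\n", stderr);
		abort();
	}
	return ptr; /* returned so the check can wrap an allocating call */
}
```

	Wrapping each allocating call this way keeps the failure at the call site rather than at a later NULL dereference.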
2013-06-25	svc: implement top-level by_mog_devid hash
	This will allow us to lookup devices for per-(mog)device I/O queues.
2013-06-25	http_*: fixup long lines from automated conversion
	Lines longer than 80 columns aren't readable on my screen with gigantic fonts.
2013-06-25	parse out mogilefs devid in mgmt/http requests
	This will allow us to do lookups for IO queues/semaphores before we attempt to fstatat/stat a path.
2013-06-25	fix devices/thread count if sidechannel is inactive
	If the mogstored sidechannel is inactive (in HTTP-only mode), we should still count the devices correctly so that the number of worker threads scales properly.
2013-06-25	switch to per-svc (per-docroot) queues
	This simplifies code, reduces contention, and reduces the chances of independent MogileFS instances (with one instance of cmogstored) stepping over each other. Most cmogstored deployments are single docroot (for a single instance of MogileFS), however cmogstored supports multiple docroots for some rare configurations and we support them here.
2013-06-25	thrpool: add comment explaining minimum thread count
	I forgot why this bound was necessary, so add a comment ensuring I do not forget again.
2013-06-25	limit acceptors to reduce contention on large machines
	Having too many acceptor threads does not help, as it leads to lock contention in the accept syscalls and the EPOLL_CTL_ADD paths. The fair FIFO ordering of _blocking_ accept/accept4 syscalls also means we trigger unnecessary task switching and incur cache misses under high load. It has been almost impossible for the acceptor threads to be stuck on disk I/O since commit 832316624f7a8f44b3e1d78a8a7a62a399241840 ("acceptor threads push directly into event queue").
2013-06-25	update aio_threads count when new devices appear
	This will help ensure availability when new devices are added, without additional user interaction to manually set aio_threads via sidechannel.
2013-06-25	make mog_fd_get static, favor mog_fd_init
	mog_fd_init enforces setting the correct type, so relegate mog_fd_get to private usage inside fdmap.c
2013-06-25	build: get the gnulib version via autogen.sh
	This is useful for:
	a) repeatably generating the same tarball off git
	b) diagnosing and tracking down (rare) gnulib bugs
	c) 3rd parties verifying we do not put malicious code into our tarballs
2013-06-25	mnt: attempt to match iostat output by st_rdev
	st_rdev matching is necessary for cases where the block devices are aliased (not via symlinks), and mountlist returns a different name for the device than what iostat uses. This is the case for my cryptmount(8) setup, where /dev/mapper/FOO and /dev/dm-N refer to the same device (with matching st_dev and st_rdev numbers), but neither is a symlink to the other (nor are they hardlinks). stat() on block devices in /dev should always be fast and non-blocking, as /dev is expected to be non-networked on any reasonable system (at least those serving as a MogileFS storage node).
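	Matching two device nodes by st_rdev, as described above, can be sketched as follows. The helper name is hypothetical and the sketch only compares two given paths; how the candidate names are collected from mountlist and iostat is not shown.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* true if both paths are block device nodes naming the same device */
static bool example_same_blkdev(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return false;
	if (!S_ISBLK(sa.st_mode) || !S_ISBLK(sb.st_mode))
		return false;
	return sa.st_rdev == sb.st_rdev; /* same major/minor numbers */
}
```

	With the aliased setup mentioned above, a call such as example_same_blkdev("/dev/mapper/FOO", "/dev/dm-N") would be expected to return true even though the names differ.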
2013-05-11	cmogstored 1.2.2 - minor maintenance release v1.2.2 1.2-stable
	This is a minor maintenance release, no need to upgrade unless
	a) your gcc defaults to -march=i386 (e.g. 32-bit CentOS 5)
	b) your device names include '-' (e.g. Linux device mapper users)
	There are also some minor doc updates to clarify tarball vs git installation and a trivial error-handling fix which should not affect any current users.
	Eric Wong (6):
	  build: add check for GCC atomics
	  alloc: posix_memalign does not set errno
	  iostat_parser: allow '-' for device names
	  test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
	  INSTALL: clarify between starting from tarball vs git
	  INSTALL: update versions and URLs
	cmogstored 1.3 will have some fairly intrusive internal changes and cleanups to make it easier for users to trace and diagnose system and network problems.
2013-05-11	INSTALL: update versions and URLs
	libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported. (cherry picked from commit 86e5d10649f14fe3b3c8af37fd8ec04cc337fc9e)
2013-05-11	INSTALL: clarify between starting from tarball vs git
	Users unfamiliar with autotools may not realize bootstrapping is required when building from git. (cherry picked from commit 1e80ba592ede05fe40b31686142f82294891afd0)
2013-05-11	INSTALL: update versions and URLs
	libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported.
2013-05-11	INSTALL: clarify between starting from tarball vs git
	Users unfamiliar with autotools may not realize bootstrapping is required when building from git.