cmogstored.git - alternative mogstored implementation for MogileFS

Date	Commit message
2013-06-25	dist: include newly-added files to the tarball
	Tarballs were otherwise unusable.
2013-06-25	replace pthreads cancellation with explicit checks
	Due to data/event loss, we cannot rely on normal syscalls (accept/epoll_wait) being cancellation points. The benefits of using a standardized API to terminate threads asynchronously are lost when toggling cancellation flags. This implementation allows us to be more explicit and obvious at the few points where our worker threads may exit and reduces the amount of code we have. By avoiding the calls to pthread_setcancelstate, we should halve the number of atomic operations required in the common case (where the thread is not marked for termination).
2013-06-25	"server aio_threads = XX" no longer requires malloc
	This should prevent one class of "accidental" failures. (The sidechannel was never meant to be secure or exposed to the public.)
2013-06-25	fdmap: do not warn on ENOTCONN due to unavoidable race
	A client may disconnect at any time, so shutdown may fail harmlessly with ENOTCONN.
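A minimal sketch of the idea, assuming a hypothetical `shutdown_quiet()` helper (not cmogstored's actual fdmap.c code). ENOTCONN from shutdown(2) is treated as the client having already gone, and every other error is still reported.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: shut down a client socket without warning about
 * ENOTCONN, because the peer may disconnect at any moment (unavoidable race).
 * Returns 0 on success or on the harmless ENOTCONN case, -1 otherwise.
 */
static int shutdown_quiet(int fd, int how)
{
	if (shutdown(fd, how) == 0)
		return 0;
	if (errno == ENOTCONN)
		return 0; /* client already gone: nothing worth warning about */
	fprintf(stderr, "shutdown(fd=%d): %s\n", fd, strerror(errno));
	return -1;
}
```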
2013-06-25	fix "shutdown" over sidechannel with epoll_pwait
	The "shutdown" command needs to trigger EINTR when using epoll_pwait, otherwise the sleeping thread may not wake up properly.
2013-06-25	do not rely on normal syscalls as cancellation points
	Cancellation with epoll_wait, accept4 (and accept) may cause events to be lost, since cancellation relies on signals anyway in glibc/Linux. So instead, we use signals ourselves and explicitly test for cancellation only when we know we were interrupted and are in a state where a thread can safely be cancelled. ref: http://mid.gmane.org/CAE2sS1gxQkqmcywQ07pmgNHM+CyqzMkuASVjmWDL+hgaTMURWQ@mail.gmail.com
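A rough sketch of this pattern under stated assumptions: a plain pipe read(2) stands in for accept4/epoll_wait, SIGUSR1 is assumed as the wakeup signal, and `struct worker`, `worker_start()` and `worker_stop()` are hypothetical names. The handler is installed without SA_RESTART so a blocked syscall fails with EINTR, and the worker checks its exit flag only at that point.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-worker state; not cmogstored's actual layout. */
struct worker {
	pthread_t thr;
	int rfd;             /* fd the worker blocks on (stand-in for accept/epoll_wait) */
	atomic_bool exiting; /* set by the controller, tested only after EINTR */
	atomic_bool done;    /* set by the worker right before it returns */
};

/* Empty handler: its only job is to make blocking syscalls fail with EINTR. */
static void wake_handler(int sig) { (void)sig; }

static void *worker_loop(void *arg)
{
	struct worker *w = arg;
	char c;

	for (;;) {
		ssize_t r = read(w->rfd, &c, 1);

		if (r < 0 && errno == EINTR) {
			/* the only exit point: no event was consumed here */
			if (atomic_load(&w->exiting))
				break;
			continue;
		}
		if (r <= 0)
			break; /* EOF or a real error */
		/* ... process the event in c ... */
	}
	atomic_store(&w->done, true);
	return NULL;
}

static int worker_start(struct worker *w, int rfd)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = wake_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0; /* no SA_RESTART, so read() returns EINTR */
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;
	w->rfd = rfd;
	atomic_init(&w->exiting, false);
	atomic_init(&w->done, false);
	return pthread_create(&w->thr, NULL, worker_loop, w) == 0 ? 0 : -1;
}

/* Mark the worker for exit, then keep interrupting it until it notices. */
static void worker_stop(struct worker *w)
{
	struct timespec ts = { 0, 1000000 }; /* 1ms between wakeups */

	atomic_store(&w->exiting, true);
	while (!atomic_load(&w->done)) {
		/* a single signal could land before the worker blocks in read() */
		pthread_kill(w->thr, SIGUSR1);
		nanosleep(&ts, NULL);
	}
	pthread_join(w->thr, NULL);
}
```

Because a signal can arrive just before the worker blocks, the stopper re-sends it until the worker reports completion rather than relying on a single delivery.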
2013-06-25	avoid needlessly reinitializing common sigset_t
	This should hopefully save a few cycles and reduce stack usage slightly.
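One way to avoid rebuilding a sigset_t on every call, assuming hypothetical names and SIGUSR1 as the only unblocked signal: build the set once with pthread_once() and hand out a const pointer, e.g. as the last argument of epoll_pwait(epfd, events, maxevents, timeout, wake_only_sigset()). cmogstored may simply initialize its set at startup instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Built once, then shared read-only by every thread (hypothetical names). */
static sigset_t wake_only_set;
static pthread_once_t wake_only_once = PTHREAD_ONCE_INIT;

static void wake_only_init(void)
{
	/* block everything except the (assumed) wakeup signal */
	sigfillset(&wake_only_set);
	sigdelset(&wake_only_set, SIGUSR1);
}

/* Returns the shared mask instead of filling a fresh sigset_t each call. */
static const sigset_t *wake_only_sigset(void)
{
	pthread_once(&wake_only_once, wake_only_init);
	return &wake_only_set;
}
```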
2013-06-25	svc: make thr_per_dev per-svc instead of global
	We could eventually make this a tunable parameter, as it could be advantageous over a global aio_threads value.
2013-06-25	refactor handling of "server aio_threads = " command
	We're using per-svc-based thread pools, so different MogileFS instances we serve no longer affect each other. This means changing the aio_threads count only affects the svc of the sidechannel port which triggered the change.
2013-06-25	define MOG_DEVID_MAX and MOG_PATH_MAX variables
	This improves maintainability in case MogileFS changes these limits.
2013-06-25	consistently check OOM from hash_initialize/hash_insert
	Both hash_initialize and hash_insert may return NULL to indicate allocation errors. So implement a mog_oom_if_null helper function to terminate the process instead of attempting to continue and dereferencing NULL pointers. This may affect configurations with limited memory and no overcommit, but is unlikely to trigger given the small memory footprint of cmogstored.
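A sketch of what such a helper could look like. Only the name comes from the commit message; the body (print a message, then abort()) is an assumption. The caller wraps a possibly-NULL result, such as the return value of gnulib's hash_initialize() or hash_insert().

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Assumed body for mog_oom_if_null(): die on NULL rather than let the
 * caller dereference it; otherwise pass the pointer through unchanged.
 */
static void *mog_oom_if_null(void *p)
{
	if (p == NULL) {
		fputs("out of memory\n", stderr);
		abort();
	}
	return p;
}
```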
2013-06-25	svc: implement top-level by_mog_devid hash
	This will allow us to look up devices for per-(mog)device I/O queues.
2013-06-25	http_*: fixup long lines from automated conversion
	Lines longer than 80 columns aren't readable on my screen with gigantic fonts.
2013-06-25	parse out mogilefs devid in mgmt/http requests
	This will allow us to do lookups for IO queues/semaphores before we attempt to fstatat/stat a path.
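A plain-C sketch of pulling the devid out of a MogileFS path such as "/dev42/0/000/000/0000000123.fid" before any stat/fstatat call. The `parse_devid()` name is hypothetical, and the MOG_DEVID_MAX value here is an assumption (MogileFS devids are assumed to fit an unsigned 24-bit column); the real definition lives in cmogstored's headers.

```c
#include <stdint.h>

/* Assumed value for illustration; check cmogstored's own definition. */
#define MOG_DEVID_MAX 0xffffffU

/*
 * Hypothetical helper: extract N from a path of the form "/devN/...".
 * Returns the devid, or -1 if the path is malformed or N is out of range.
 */
static long parse_devid(const char *path)
{
	uint32_t devid = 0;
	const char *p;

	/* short-circuit evaluation stops before reading past a short string */
	if (path[0] != '/' || path[1] != 'd' || path[2] != 'e' || path[3] != 'v')
		return -1;
	p = path + 4;
	if (*p < '0' || *p > '9')
		return -1;
	for (; *p >= '0' && *p <= '9'; p++) {
		devid = devid * 10 + (uint32_t)(*p - '0');
		if (devid > MOG_DEVID_MAX)
			return -1; /* reject before the value can overflow */
	}
	return *p == '/' ? (long)devid : -1;
}
```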
2013-06-25	fix devices/thread count if sidechannel is inactive
	If the mogstored sidechannel is inactive (in HTTP-only mode), we should still count devices correctly so the number of worker threads scales properly.
2013-06-25	switch to per-svc (per-docroot) queues
	This simplifies code, reduces contention, and reduces the chances of independent MogileFS instances (with one instance of cmogstored) stepping over each other. Most cmogstored deployments use a single docroot (for a single instance of MogileFS); however, cmogstored supports multiple docroots for some rare configurations, and we support them here.
2013-06-25	thrpool: add comment explaining minimum thread count
	I forgot why this bound was necessary, so add a comment ensuring I do not forget again.
2013-06-25	limit acceptors to reduce contention on large machines
	Having too many acceptor threads does not help, as it leads to lock contention in the accept syscalls and the EPOLL_CTL_ADD paths. The fair FIFO ordering of _blocking_ accept/accept4 syscalls also means we trigger unnecessary task switching and incur cache misses under high load. Extra acceptors also buy little, since it has been almost impossible for the acceptor threads to be stuck on disk I/O since commit 832316624f7a8f44b3e1d78a8a7a62a399241840 ("acceptor threads push directly into event queue").
2013-06-25	update aio_threads count when new devices appear
	This will help ensure availability when new devices are added, without additional user interaction to manually set aio_threads via sidechannel.
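A sketch of recomputing a svc's aio_threads target as devices appear, assuming a hypothetical floor THR_MIN and hypothetical function names. It also assumes the count should only grow automatically, so a larger value set by hand over the sidechannel is not undone; that policy is a guess, not something the commit states.

```c
/* Hypothetical floor; the real minimum and its rationale live in thrpool.c. */
#define THR_MIN 4U

/* Target thread count for ndev devices at thr_per_dev threads each. */
static unsigned aio_threads_for(unsigned ndev, unsigned thr_per_dev)
{
	unsigned n = ndev * thr_per_dev;

	return n < THR_MIN ? THR_MIN : n;
}

/*
 * Called after a device scan. Returns 1 if *cur was raised (the pool
 * should be resized), 0 if it was left alone. Assumption: grow only.
 */
static int aio_threads_update(unsigned *cur, unsigned ndev, unsigned thr_per_dev)
{
	unsigned want = aio_threads_for(ndev, thr_per_dev);

	if (want <= *cur)
		return 0;
	*cur = want;
	return 1;
}
```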
2013-06-25	make mog_fd_get static, favor mog_fd_init
	mog_fd_init enforces setting the correct type, so relegate mog_fd_get to private usage inside fdmap.c
2013-06-25	build: get the gnulib version via autogen.sh
	This is useful for: a) repeatably generating the same tarball off git b) diagnosing and tracking down (rare) gnulib bugs c) 3rd parties verifying we do not put malicious code into our tarballs
2013-06-25	mnt: attempt to match iostat output by st_rdev
	st_rdev matching is necessary for cases where the block devices are aliased (not via symlinks), and mountlist returns a different name for the device than what iostat uses. This is the case for my cryptmount(8) setup, where /dev/mapper/FOO and /dev/dm-N refer to the same device (with matching st_dev and st_rdev numbers), but neither is a symlink to the other (nor are they hardlinks). stat() on block devices in /dev should always be fast and non-blocking, as /dev is expected to be non-networked on any reasonable system (at least those serving as a MogileFS storage node).
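A minimal sketch of the st_rdev comparison, using a hypothetical `same_blockdev()` helper: two block device nodes refer to the same device when their st_rdev values match, even if neither path is a symlink or hardlink to the other.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/*
 * Hypothetical helper: return 1 if both paths are block device nodes for
 * the same device (e.g. /dev/mapper/FOO and /dev/dm-N), 0 otherwise.
 */
static int same_blockdev(const char *a, const char *b)
{
	struct stat sa, sb;

	/* stat() on /dev nodes is expected to be fast and non-blocking */
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return 0;
	if (!S_ISBLK(sa.st_mode) || !S_ISBLK(sb.st_mode))
		return 0;
	return sa.st_rdev == sb.st_rdev;
}
```

On the cryptmount(8) setup described above, same_blockdev("/dev/mapper/FOO", "/dev/dm-N") would be expected to return 1.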
2013-05-11	cmogstored 1.2.2 - minor maintenance release v1.2.2 1.2-stable
	This is a minor maintenance release; there is no need to upgrade unless a) your gcc defaults to -march=i386 (e.g. 32-bit CentOS 5), or b) your device names include '-' (e.g. Linux device mapper users). There are also some minor doc updates to clarify tarball vs git installation and a trivial error-handling fix which should not affect any current users. Eric Wong (6): build: add check for GCC atomics; alloc: posix_memalign does not set errno; iostat_parser: allow '-' for device names; test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind; INSTALL: clarify between starting from tarball vs git; INSTALL: update versions and URLs. cmogstored 1.3 will have some fairly intrusive internal changes and cleanups to make it easier for users to trace and diagnose system and network problems.
2013-05-11	INSTALL: update versions and URLs
	libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported. (cherry picked from commit 86e5d10649f14fe3b3c8af37fd8ec04cc337fc9e)
2013-05-11	INSTALL: clarify between starting from tarball vs git
	Users unfamiliar with autotools may not realize bootstrapping is required when building from git. (cherry picked from commit 1e80ba592ede05fe40b31686142f82294891afd0)
2013-05-11	INSTALL: update versions and URLs
	libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported.
2013-05-11	INSTALL: clarify between starting from tarball vs git
	Users unfamiliar with autotools may not realize bootstrapping is required when building from git.
2013-05-11	test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
	Our use of chdir in this test confuses valgrind, which may create a temporary file. (cherry picked from commit dc801d4a4ded67d74f5306d6dad4aba629045cc8)
2013-05-11	test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
	Our use of chdir in this test confuses valgrind, which may create a temporary file.
2013-05-11	iostat_parser: allow '-' for device names
	Linux device-mapper names show up as 'dm-0', 'dm-1' and so on. This allows users to store MogileFS files on encrypted devices using dm-crypt and perhaps other, similar tools. (cherry picked from commit 88d34b4686a650dba89674aa302ab13c78e8cef0)
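A sketch of the relaxed device-name check, assuming a hypothetical `valid_iostat_devname()` validator that accepts ASCII letters, digits and '-'. cmogstored's actual iostat parser may accept a different character set.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical validator: accept names like "sda1" as well as
 * device-mapper names like "dm-0" (the '-' is the newly allowed part).
 */
static int valid_iostat_devname(const char *s, size_t len)
{
	size_t i;

	if (len == 0)
		return 0;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)s[i];

		if (!isalnum(c) && c != '-')
			return 0;
	}
	return 1;
}
```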
2013-05-11	alloc: posix_memalign does not set errno
	We must set errno manually for die_errno() if posix_memalign fails. (cherry picked from commit 8c79cf794f6178b6978743af99d498ca0b449fb1)
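A sketch of the fix, assuming a hypothetical `xmemalign()` wrapper with perror()+abort() standing in for die_errno(). posix_memalign() reports failure through its return value rather than errno, so that value must be copied into errno before any errno-based error reporting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper; perror()+abort() stand in for die_errno(). */
static void *xmemalign(size_t alignment, size_t size)
{
	void *p;
	int err = posix_memalign(&p, alignment, size);

	if (err != 0) {
		errno = err; /* otherwise errno may hold a stale, unrelated value */
		perror("posix_memalign");
		abort();
	}
	return p;
}
```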
2013-05-06	favor "struct mog_fd" for acceptors over int FDs
	There's no reason to be referencing FDs for these acceptors since they're infrequently accessed by svc, so this should make our internals more consistent. This also removes our use of mog_fd_get (outside of test code).
2013-05-06	preliminary systemtap support for tracing
	We will key most client events by pid() and file descriptors, as this is least ambiguous. There are some minor refactorings to pass "struct mog_fd *" around as much as possible instead of "struct mog_http *".
2013-04-17	http: minor debloat via better alignment
	This results in a small size reduction due to better alignment:
	$ ~/linux/scripts/bloat-o-meter cmogstored.before cmogstored.after
	add/remove: 0/0 grow/shrink: 2/2 up/down: 20/-56 (-36)
	function                     old     new   delta
	mog_http_get_open           1460    1476     +16
	mog_chunk_init                65      69      +4
	http_forward_in_progress      63      55      -8
	mog_http_parse             27171   27123     -48
2013-04-17	http_parser: do not differentiate between MD5 sources
	It does not matter whether the Content-MD5 comes from the trailer or the header; we process it the same way with the Ragel parser. This is obvious when reading our code (and the associated hunk this commit changes) in http_put.c.
2013-04-17	save socket address on accept/accept4
	getpeername() does not work on unconnected sockets. For error handling, unconnected sockets are a fairly common occurrence, so we want to get the address early on, when we know the address is still valid. For IPv4 addresses, this does not increase memory overhead at all. IPv6 addresses[1] do require an additional heap allocation, but it does not need to be aligned since it is infrequently accessed. If IPv6 becomes common, we may need to expand our per-client storage to 192 bytes (from 128) on 64-bit (or see if we can pack data more carefully). [1] IPv6 addresses are rare with MogileFS, as MogileFS does not currently support them.
2013-04-17	allow binding to IPv6 addresses
	MogileFS currently does not support IPv6, but maybe one day it will. When it does, we'll be ready.
2013-04-16	wrap getnameinfo for consistency in error logging
	This will allow us to more easily handle error reporting for IPv6 addresses and allow for consistent formatting of stringified IP addresses.
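A sketch of such a wrapper, with a hypothetical `addr_to_str()` name and an assumed "host:port" / "[host]:port" format. NI_NUMERICHOST | NI_NUMERICSERV keeps getnameinfo() from doing DNS or service lookups while logging.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper: format an address as "host:port" (IPv4) or
 * "[host]:port" (IPv6). Returns 0 or a getnameinfo() EAI_* code.
 */
static int addr_to_str(char *buf, size_t buflen,
		       const struct sockaddr *sa, socklen_t salen)
{
	char host[NI_MAXHOST], serv[NI_MAXSERV];
	int rc = getnameinfo(sa, salen, host, sizeof(host), serv, sizeof(serv),
			     NI_NUMERICHOST | NI_NUMERICSERV);

	if (rc != 0) {
		snprintf(buf, buflen, "(getnameinfo: %s)", gai_strerror(rc));
		return rc;
	}
	if (sa->sa_family == AF_INET6)
		snprintf(buf, buflen, "[%s]:%s", host, serv);
	else
		snprintf(buf, buflen, "%s:%s", host, serv);
	return 0;
}
```

With 127.0.0.1 and port 7500 this yields "127.0.0.1:7500"; with ::1 it yields "[::1]:7500".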
2013-04-16	iostat_parser: allow '-' for device names
	Linux device-mapper names show up as 'dm-0', 'dm-1' and so on. This allows users to store MogileFS files on encrypted devices using dm-crypt and perhaps other, similar tools.
2013-04-16	potentially make the mog_sockaddr union smaller
	The generic "struct sockaddr" may be padded to be the same size as "struct sockaddr_storage" (which is what we were trying to avoid in the first place by using mog_sockaddr). This change makes no difference on GNU/Linux.
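A sketch of a union that leaves out a generic struct sockaddr member, combined with saving the peer address at accept time as described in "save socket address on accept/accept4". The union and helper names are hypothetical, and how cmogstored stores the saved address per client (inline for IPv4, on the heap for IPv6) is not modeled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical compact union: no "struct sockaddr" member to pad it out. */
union mog_sockaddr_sketch {
	struct sockaddr_in in;
	struct sockaddr_in6 in6;
};

/*
 * accept() a client and keep its address right away, since getpeername()
 * stops working once the client disconnects. Returns the new fd or -1.
 */
static int accept_save(int lfd, union mog_sockaddr_sketch *u, socklen_t *len)
{
	*len = sizeof(*u);
	return accept(lfd, (struct sockaddr *)u, len);
}
```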
2013-04-16	alloc: posix_memalign does not set errno
	We must set errno manually for die_errno() if posix_memalign fails.
2013-03-19	http: put parser-private attrs in a private struct attr
	This will allow easy use of memset to reset attributes in between requests without clobbering more important data.
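A sketch of the layout idea with hypothetical struct and member names: per-request parser state sits in its own nested struct, so a single memset() resets it between keepalive requests without touching long-lived fields such as the file descriptor.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-request parser state; real members differ. */
struct http_parser_attrs {
	unsigned keepalive:1;
	unsigned has_md5:1;
	int64_t content_len;
	int64_t range_beg;
	int64_t range_end;
};

/* Hypothetical client: long-lived fields first, parser state nested. */
struct http_client {
	int fd;                        /* must survive between requests */
	unsigned char *rbuf;           /* ditto */
	struct http_parser_attrs priv; /* per-request, safe to wipe */
};

/* Reset only the parser-private attributes between requests. */
static void http_client_reset(struct http_client *c)
{
	memset(&c->priv, 0, sizeof(c->priv));
}
```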
2013-03-08	build: add check for GCC atomics
	Andrey Okunev noted undefined references on the MogileFS mailing list when building cmogstored 1.2.1 on his 32-bit CentOS5 machine.
2013-03-04	cmogstored 1.2.1 - fix graceful shutdown failure v1.2.1
	This release only fixes an assertion failure during graceful shutdown while MogileFS fsck is running with checksumming enabled. This only affects users running fsck with checksumming enabled during a graceful shutdown of cmogstored. When upgrading cmogstored, it is recommended to: 1) stop fsck on the trackers (via "mogadm fsck stop"), 2) wait for all tracker queues to drain and stop sending fsck traffic to the affected host (you may wish to "!want 0 fsck" on all your trackers and wait for the fsck workers to stop), 3) upgrade cmogstored (in-place upgrades work). There are also several code comment updates for internal components of cmogstored which may interest potential hackers.
2013-03-04	TODO: add a few items for our roadmap
	We have a future!
2013-03-02	alloc: document use of TLS buffers
	tls_rbuf allows us to avoid nearly all dynamic allocation for common HTTP requests. However, the mog_rbuf structure may be detached from TLS as necessary (and another one allocated in its place) when the need arises.
2013-03-02	fdmap: documentation for the FD-based memory allocation
	Avoiding heap allocations in common paths is important in high-performance server design; document this important design decision.
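A sketch of FD-indexed allocation, assuming a fixed hypothetical capacity and a 128-byte slot (the per-client size mentioned in "save socket address on accept/accept4"). Since new descriptors normally take the lowest free number, an array indexed by FD stays dense, and per-client state is found without a heap allocation per connection.

```c
#include <stdlib.h>

#define SLOT_SIZE 128  /* per-client storage size cited in the log */
#define FDMAP_MAX 1024 /* hypothetical fixed capacity for this sketch */

struct fd_slot { unsigned char bytes[SLOT_SIZE]; };

static struct fd_slot *fdmap;

/* One up-front allocation covers every possible client. */
static int fdmap_init(void)
{
	fdmap = calloc(FDMAP_MAX, sizeof(*fdmap));
	return fdmap ? 0 : -1;
}

/* O(1) lookup: the descriptor number itself is the index. */
static void *fd_slot_get(int fd)
{
	if (fd < 0 || fd >= FDMAP_MAX)
		return NULL;
	return fdmap[fd].bytes;
}
```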
2013-02-23	mgmt: fix fsck digest assert failure in graceful shutdown
	Items in the low-priority fsck queue could trigger an assertion failure during graceful shutdown due to improper handling of the MOG_NEXT_IGNORE state in mog_mgmt_quit_step(). However, using the fsck queue in graceful shutdown (which is single-threaded) is probably a bad idea anyway, as the fsck digest could monopolize other requests. So give no special handling to fsck digest queries during graceful shutdown. This only affects users running fsck with checksumming enabled during a graceful shutdown of cmogstored. For checksum users, it is recommended to stop fsck from the trackers and wait for all tracker queues to drain before upgrading cmogstored (and using graceful shutdown on the old cmogstored).
2013-02-23	http_get: comment about snprintf() being a hot spot
	cmogstored is pretty fast, but it could be faster.
2013-02-21	queue_common: update comments to match code
	While we're at it, explain the use of cloexec.