Date Commit message
2013-05-30 per-mog_devid IO channels via semaphores iosem
This is similar to the AIO channels functionality in Perlbal, but implemented using semaphores to optimize for the uncontended case.
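A minimal sketch of bounding concurrent I/O per device with a POSIX counting semaphore follows. The `dev_io` structure and the function names are hypothetical and only illustrate the general technique; they are not taken from cmogstored.

```c
#include <errno.h>
#include <semaphore.h>

/* hypothetical per-device state; not cmogstored's actual structure */
struct dev_io {
    sem_t sem; /* counts free I/O slots for this device */
};

/* allow at most `slots` concurrent I/O operations on the device */
static int dev_io_init(struct dev_io *dev, unsigned slots)
{
    return sem_init(&dev->sem, 0, slots);
}

/* take one slot; only blocks when every slot is already in use */
static int dev_io_acquire(struct dev_io *dev)
{
    while (sem_wait(&dev->sem) != 0) {
        if (errno != EINTR)
            return -1;
        /* interrupted by a signal handler: retry */
    }
    return 0;
}

/* give the slot back once the I/O has finished */
static int dev_io_release(struct dev_io *dev)
{
    return sem_post(&dev->sem);
}
```

A caller would wrap each disk operation in `dev_io_acquire()` and `dev_io_release()` for the device it touches.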
2013-05-30 define MOG_DEVID_MAX and MOG_PATH_MAX variables
This improves maintainability in case MogileFS changes these limits.
2013-05-30 consistently check OOM from hash_initialize/hash_insert
Both hash_initialize and hash_insert may return NULL to indicate allocation errors. So implement a mog_oom_if_null helper function to destroy the process instead of attempting to continue and dereferencing NULL pointers. This may affect configurations with limited memory and lacking overcommit; but is unlikely to trigger given the small memory footprint of cmogstored.
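A helper of this kind could look like the sketch below. The name `oom_if_null` and its signature are assumptions made for illustration; the actual mog_oom_if_null is not shown in this log.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: abort the process on a NULL result instead of
 * letting the caller dereference it later.
 */
static void *oom_if_null(void *ptr)
{
    if (ptr == NULL) {
        fputs("out of memory\n", stderr);
        abort();
    }
    return ptr; /* passed through so calls can be wrapped inline */
}
```

Returning the pointer lets a call site wrap an allocating function directly, for example `tbl = oom_if_null(make_table());` where `make_table` stands in for any function that returns NULL on allocation failure.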
2013-05-30 svc: implement top-level by_mog_devid hash
This will allow us to look up devices for per-(mog)device I/O queues.
2013-05-30 http_*: fixup long lines from automated conversion
Lines longer than 80 columns aren't readable on my screen with gigantic fonts.
2013-05-30 parse out mogilefs devid in mgmt/http requests
This will allow us to do lookups for IO queues/semaphores before we attempt to fstatat/stat a path.
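As an illustration, the sketch below extracts the device ID from a path of the form "/devN/...". It uses plain strtoul rather than a generated parser, and the function name `parse_devid` is hypothetical; it shows the idea only, not how cmogstored parses requests.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract N from a path starting with "/devN/".
 * Returns 0 on success, -1 if the path has no such prefix.
 */
static int parse_devid(const char *path, uint32_t *devid)
{
    char *end;
    unsigned long n;

    if (strncmp(path, "/dev", 4) != 0)
        return -1;
    if (path[4] < '0' || path[4] > '9')
        return -1; /* strtoul alone would accept signs and spaces */

    errno = 0;
    n = strtoul(path + 4, &end, 10);
    if (errno != 0 || n > UINT32_MAX)
        return -1;
    if (*end != '/' && *end != '\0')
        return -1; /* trailing junk such as "/dev12x" */

    *devid = (uint32_t)n;
    return 0;
}
```

With the ID known up front, a per-device queue or semaphore can be selected before any filesystem call is made on the path.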
2013-05-30 fix devices/thread count if sidechannel is inactive
If the mogstored sidechannel is inactive (in HTTP-only mode), we should still count the number of devices correctly so that the number of worker threads scales properly.
2013-05-30 switch to per-svc (per-docroot) queues
This simplifies code, reduces contention, and reduces the chances of independent MogileFS instances (with one instance of cmogstored) stepping over each other. Most cmogstored deployments are single docroot (for a single instance of MogileFS); however, cmogstored supports multiple docroots for some rare configurations and we support them here.
2013-05-30 thrpool: add comment explaining minimum thread count
I forgot why this bound was necessary, so add a comment ensuring I do not forget again.
2013-05-30 limit acceptors to reduce contention on large machines
Having too many acceptor threads does not help, as it leads to lock contention in the accept syscalls and the EPOLL_CTL_ADD paths. The fair FIFO ordering of _blocking_ accept/accept4 syscalls also means we trigger unnecessary task switching and incur cache misses under high load. It has been almost impossible for the acceptor threads to be stuck on disk I/O since commit 832316624f7a8f44b3e1d78a8a7a62a399241840 ("acceptor threads push directly into event queue").
2013-05-30 update aio_threads count when new devices appear
This will help ensure availability when new devices are added, without additional user interaction to manually set aio_threads via sidechannel.
2013-05-30 make mog_fd_get static, favor mog_fd_init
mog_fd_init enforces setting the correct type, so relegate mog_fd_get to private usage inside fdmap.c.
2013-05-11 INSTALL: update versions and URLs
libkqueue recently migrated to SourceForge and Debian 7.0 is the new stable. We still support Debian 6.0 and will likely support it for years to come since CentOS 5.x remains supported.
2013-05-11 INSTALL: clarify between starting from tarball vs git
Users unfamiliar with autotools may not realize bootstrapping is required when building from git.
2013-05-11 test/cmogstored-cfg: ensure TMPDIR is absolute for valgrind
Our use of chdir in this test confuses valgrind, which may create a temporary file.
2013-05-06 favor "struct mog_fd" for acceptors over int FDs
There's no reason to be referencing FDs for these acceptors since they're infrequently accessed by svc, so this should make our internals more consistent. This also removes our use of mog_fd_get (outside of test code).
2013-05-06 preliminary systemtap support for tracing
We will key most client events by pid() and file descriptors, as this is least ambiguous. There are some minor refactorings to pass "struct mog_fd *" around as much as possible instead of "struct mog_http *".
2013-04-17 http: minor debloat via better alignment
This results in a small size reduction due to better alignment:
  $ ~/linux/scripts/bloat-o-meter cmogstored.before cmogstored.after
  add/remove: 0/0 grow/shrink: 2/2 up/down: 20/-56 (-36)
  function                        old     new   delta
  mog_http_get_open              1460    1476     +16
  mog_chunk_init                   65      69      +4
  http_forward_in_progress         63      55      -8
  mog_http_parse                27171   27123     -48
2013-04-17 http_parser: do not differentiate between MD5 sources
It does not matter if the Content-MD5 comes from the trailer or header; we process it the same way with the Ragel parser. This is obvious when reading our code (and associated hunk this commit changes) in http_put.c.
2013-04-17 save socket address on accept/accept4
getpeername() does not work on unconnected sockets. During error handling, unconnected sockets are a fairly common occurrence, so we want to save the address early on, when we know it is still valid. For IPv4 addresses, this does not increase memory overhead at all. IPv6 addresses[1] do require an additional heap allocation, but it does not need to be aligned since it is infrequently accessed. If IPv6 becomes common, we may need to expand our per-client storage to 192 bytes (from 128) on 64-bit (or see if we may pack data more carefully).
[1] IPv6 addresses are rare with MogileFS, as MogileFS does not currently support them.
2013-04-17 allow binding to IPv6 addresses
MogileFS currently does not support IPv6, but maybe one day it will. When it does, we'll be ready.
2013-04-16 wrap getnameinfo for consistency in error logging
This will allow us to more easily handle error reporting for IPv6 addresses and allow for consistent formatting of stringified IP addresses.
2013-04-16 iostat_parser: allow '-' for device names
Linux device-mapper names show up as 'dm-0', 'dm-1' and so on. This allows users to store MogileFS files on encrypted devices using dm-crypt and perhaps other, similar tools.
2013-04-16 potentially make the mog_sockaddr union smaller
The generic "struct sockaddr" may be padded to be the same size as "struct sockaddr_storage" (which is what we were trying to avoid in the first place by using mog_sockaddr). This change makes no difference on GNU/Linux.
2013-04-16 alloc: posix_memalign does not set errno
We must set errno manually for die_errno() if posix_memalign fails.
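posix_memalign reports failure through its return value and leaves errno alone, so any errno-based error reporter needs the value copied over first. A minimal sketch, with the hypothetical wrapper name `memalign_errno`:

```c
#include <errno.h>
#include <stdlib.h>

/* hypothetical wrapper: returns NULL and sets errno on failure */
static void *memalign_errno(size_t alignment, size_t size)
{
    void *ptr = NULL;
    int err = posix_memalign(&ptr, alignment, size);

    if (err != 0) {
        errno = err; /* the error code is the return value, not errno */
        return NULL;
    }
    return ptr;
}
```

After a NULL return, callers can use perror() or strerror(errno) as they would after a failed malloc.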
2013-03-19 http: put parser-private attrs in a private struct attr
This will allow easy use of memset to reset attributes in between requests without clobbering more important data.
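The layout idea can be sketched as follows. The structure and its field names are invented for illustration and do not reflect the real struct mog_http.

```c
#include <string.h>

/* hypothetical connection state; field names are illustrative only */
struct http_conn {
    int fd; /* long-lived: must survive a reset */
    struct {
        unsigned method;
        unsigned chunked;
        long long content_len;
    } priv; /* per-request parser state */
};

/* clear only the per-request state between requests */
static void http_conn_reset(struct http_conn *c)
{
    memset(&c->priv, 0, sizeof(c->priv));
}
```

Grouping the resettable fields in one nested struct means a single memset covers them all, and fields added to it later are reset automatically.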
2013-03-08 build: add check for GCC atomics
Andrey Okunev noted undefined references on the MogileFS mailing list when building cmogstored 1.2.1 on his 32-bit CentOS5 machine.
2013-03-04 cmogstored 1.2.1 - fix graceful shutdown failure v1.2.1
This release only fixes an assertion failure during graceful shutdown while MogileFS fsck is running with checksumming enabled. This only affects users running fsck with checksumming enabled during a graceful shutdown of cmogstored.
For upgrading cmogstored it is recommended to:
1) stop fsck on the trackers (via "mogadm fsck stop")
2) wait for all tracker queues to drain and stop sending fsck traffic to the affected host. You may wish to "!want 0 fsck" on all your trackers and wait for the fsck workers to stop.
3) upgrade cmogstored (in place upgrade works)
There are also several code comment updates for internal components of cmogstored which may interest potential hackers.
2013-03-04 TODO: add a few items for our roadmap
We have a future!
2013-03-02 alloc: document use of TLS buffers
tls_rbuf allows us to avoid nearly all dynamic allocation for common HTTP requests. However, the mog_rbuf structure may be detached from TLS as necessary (and another one allocated in its place) when the need arises.
2013-03-02 fdmap: documentation for the FD-based memory allocation
Avoiding heap allocations in common paths is important to high-performance server design; document this important design decision.
2013-02-23 mgmt: fix fsck digest assert failure in graceful shutdown
Items in the low-priority fsck queue could trigger an assertion failure during graceful shutdown due to improper handling of the MOG_NEXT_IGNORE state in mog_mgmt_quit_step(). However, using the fsck queue in graceful shutdown (which is single-threaded) is probably a bad idea anyways, as the fsck digest could monopolize other requests. So give no special handling to fsck digest queries during graceful shutdown. This only affects users running fsck with checksumming enabled during a graceful shutdown of cmogstored. For checksum users, it is recommended to stop fsck from the trackers and wait for all tracker queues to drain before upgrading cmogstored (and using graceful shutdown on the old cmogstored).
2013-02-23 http_get: comment about snprintf() being a hot spot
cmogstored is pretty fast, but it could be faster.
2013-02-21 queue_common: update comments to match code
While we're at it, explain the use of cloexec.
2013-02-18 document/reserve SIGWINCH/SIGHUP for future use v1.2.0
Despite having an extensive test suite and minimal room for user error, giving users the options to back out of a hot upgrade may be worth supporting.
2013-02-18 copyright comment updates for 2013 (part 2)
Many files were missed the first time around in commit 37026af96dec638aa850d604003bf7218d90037d.
2013-02-18 manpage: document SIGUSR2 upgrades
This is a new feature and needs to be documented.
2013-02-18 move cmogstored_exit() prototype to cmogstored.h
This fixes a missing prototype warning for cmogstored_exit() when checking exit.c with sparse.
2013-02-18 queue_epoll: fix bad cast for epoll.event
The events field of struct epoll_event is a uint32_t, not int.
2013-02-18 tests: add valgrind supp for epoll_ctl on 32-bit arch
The epoll_event.data union is 64-bits on 32-bit systems while pointers are 32-bit. We only use 32-bits of that union, but valgrind mistakenly complains about it (the kernel does not care about the user-supplied data union at all).
2013-02-18 ioutil: fix memory access error from mog_iou_write
sizeof(buf) returns the size of the pointer if buf is a passed parameter, even if the function prototype dictates a fixed size for buf as we do in mog_iou_write. While we're at it, make our mog_iou_write buf parameter const. This bug was introduced in commit a960a351b2248a196c91cdbf6256f98e1bc2ef37 ("split iostat util% tracking from mountlist") and never affected an official release of cmogstored. This bug was caught while testing on a 32-bit GNU/Linux machine. My normal 32-bit FreeBSD 9.0 environment did not catch this as iostat on that platform only reports integer percentages and does not need more than 4 bytes.
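The pitfall comes from C adjusting array parameters to pointers. The sketch below shows the safe form, using a named constant instead of sizeof on the parameter. `UTIL_LEN` and `copy_util` are hypothetical names, and 8 is an arbitrary size, not the one cmogstored uses.

```c
#include <stddef.h>
#include <string.h>

enum { UTIL_LEN = 8 }; /* hypothetical fixed buffer size */

/*
 * The parameter "buf[UTIL_LEN]" is adjusted to "const char *buf",
 * so sizeof(buf) inside this function would be sizeof(const char *),
 * which is 4 on typical 32-bit systems and 8 on typical 64-bit ones.
 */
static size_t copy_util(char *dst, const char buf[UTIL_LEN])
{
    memcpy(dst, buf, UTIL_LEN); /* use the constant, not sizeof(buf) */
    return UTIL_LEN;
}
```

On a 64-bit system where the pointer size happens to equal the array size, a sizeof-based version would appear to work, which is one way such a bug can go unnoticed.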
2013-02-16 handle pthread_create returning ENOMEM on old glibc
Older glibc will return ENOMEM on mprotect() failures. This bug was only fixed in 2011, so the long-term distros and old installations may not have the necessary backports. ref: http://www.sourceware.org/bugzilla/show_bug.cgi?id=386
2013-02-16 graceful handling of pthread_create EAGAIN failure
pthread_create may return EAGAIN as a temporary failure; do not abort a running process if this is the case. For the initial mountlist scan, we must retry indefinitely for cmogstored to be usable. However, with our thread pools, we can always run fewer threads (as long as there is at least one thread per pool).
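A bounded retry around pthread_create might be sketched as below. `create_thread_retry` and `set_flag` are hypothetical names, and yielding between attempts is an assumption of this sketch rather than the policy cmogstored uses.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: retry thread creation while it fails with EAGAIN,
 * up to max_tries attempts. Returns 0 on success or the last error code.
 */
static int create_thread_retry(pthread_t *thr, void *(*fn)(void *),
                               void *arg, unsigned max_tries)
{
    int err = EAGAIN;
    unsigned i;

    for (i = 0; i < max_tries; i++) {
        err = pthread_create(thr, NULL, fn, arg);
        if (err != EAGAIN)
            break; /* success, or an error that retrying will not fix */
        sched_yield(); /* let other threads run before trying again */
    }
    return err;
}

/* trivial thread body used to exercise the helper */
static void *set_flag(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}
```

A thread pool could treat a final EAGAIN as "run with fewer threads", while a caller that cannot proceed without the thread would keep looping.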
2013-02-16 test/http_idle_expire: hopefully improve test reliability
This is a tricky test and doesn't always succeed, since it's hard to tell how many file descriptors glibc will use internally.
2013-02-15 sig: avoid pselect if ppoll is present in mog_sleep
We want to favor ppoll over pselect, since ppoll is a better interface and we can have a slightly smaller binary with fewer dependencies. While we're at it, use mog_sleep(-1) as an alias for mog_selfwake_wait to further reduce binary size.
2013-02-15 avoid racy sleep on fork failure in master process
We need to atomically enable interrupts and sleep with the same syscall. Fortunately, using pselect (through mog_sleep) allows that and is POSIX-compliant, so use that.
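The race-free pattern is to keep signals blocked during normal operation and let pselect swap in a different mask only for the duration of the sleep. A minimal sketch, with the hypothetical name `sleep_unblocked` (the real mog_sleep interface is not shown in this log):

```c
#include <signal.h>
#include <sys/select.h>
#include <time.h>

/*
 * Sleep for up to sec seconds plus nsec nanoseconds with the signal mask
 * replaced by `unblocked` for the duration of the call only.
 * Returns 0 on timeout, or -1 with errno set to EINTR if a signal
 * handler ran during the sleep.
 */
static int sleep_unblocked(const sigset_t *unblocked, time_t sec, long nsec)
{
    struct timespec ts;

    ts.tv_sec = sec;
    ts.tv_nsec = nsec;
    /* no file descriptors are watched: this is purely a timed sleep */
    return pselect(0, NULL, NULL, NULL, &ts, unblocked);
}
```

Unblocking with sigprocmask and then calling a plain sleep leaves a window in which a signal can arrive before the sleep starts, so the process sleeps the full timeout anyway; pselect closes that window by changing the mask and sleeping in one call.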
2013-02-15 mnt: inform user of slow mountlist scan
This will inform the user of why cmogstored may be slow to start, since we need the mountlist to be populated at startup. We also throw a pthread_cancel() in there to load libgcc_s under glibc, so we can avoid loading libgcc_s once we're under FD pressure. This makes test/http_idle_expire.rb more reliable.
2013-02-14 test/http_range: do not allow webrick to perform lookups
DNS lookups cause webrick tests to fail or timeout. Our tests should not have external network dependencies.
2013-02-14 inherit: avoid DNS lookup on upgrade
A typo caused unnecessary DNS lookups when inheriting sockets. While we're at it, fix another typo in the error message, too.
2013-02-14 selfwake: use epoll_pwait on Linux instead of eventfd
This saves us a file descriptor in Linux, which provides epoll_pwait in 2.6.19+ (and ppoll for 2.6.18, the oldest kernel we support).