|
Simply releasing the descriptor triggering ENOSPC/ENOMEM errors from
epoll_ctl and kevent is not good enough, as those descriptors may
have other descriptors (e.g. files to be served) hanging off of them.
|
|
pthread_cond_timedwait was the function which was buggy under
LinuxThreads, and we never supported LinuxThreads anyways...
|
|
While pthread_yield is non-standard, it is relatively common and
preferable for systems where pthreads are _not_ 1:1 mapped to kernel
threads. This also provides a stronger yield to weaken the priority
of the calling thread wherever we previously used sched_yield.
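A minimal sketch of preferring pthread_yield with a portable fallback. It assumes a hypothetical configure-time macro HAVE_PTHREAD_YIELD and a hypothetical helper name worker_yield(); the actual cmogstored code may differ. pthread_yield(3) is a GNU extension (declared only with _GNU_SOURCE, and deprecated in recent glibc in favor of sched_yield), so it stays behind a feature test.

```c
#define _GNU_SOURCE /* pthread_yield() is a GNU extension */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: prefer pthread_yield() where configure detected it
 * (HAVE_PTHREAD_YIELD is an assumed macro), else fall back to the
 * standard sched_yield(). Both return 0 on success.
 */
static int worker_yield(void)
{
#ifdef HAVE_PTHREAD_YIELD
	return pthread_yield();
#else
	return sched_yield();
#endif
}
```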
|
|
This should allow the threads we're terminating to more quickly
enter a safe state where they're allowed to exit. On SMP systems,
we need to yield the signalling thread more times to increase the
probability the interrupted thread can run (and exit).
|
|
This allows users to run fewer threads for many devices and still
(hopefully) maintain a 1:1 worker_thread:device ratio. This also
allows us to simultaneously run with more than 10 active, blocking
filesystem operations.
We need to roll our own trivial semaphore implementation based on
pthreads mutexes and condvars for this, as there is no
portable/non-blocking way to change the "capacity" of a POSIX
semaphore. All the mutex locks held by this new semaphore
implementation are only held for short operations and not the
entire life of (potentially slow) FS operations.
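The paragraph above can be illustrated with a small counting semaphore built from a pthread mutex and condition variable whose capacity can be changed at runtime without blocking. The names (struct rsem, rsem_*) are hypothetical; this is a sketch under those assumptions, not cmogstored's implementation, and error checking of the pthread calls is omitted for brevity.

```c
#include <assert.h>
#include <pthread.h>

/* Counting semaphore with a runtime-adjustable capacity (hypothetical). */
struct rsem {
	pthread_mutex_t mtx;
	pthread_cond_t cond;
	long avail;    /* free permits; may go negative after shrinking */
	long capacity; /* total permits */
};

static void rsem_init(struct rsem *s, long capacity)
{
	pthread_mutex_init(&s->mtx, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->avail = s->capacity = capacity;
}

/* Blocks until a permit is free, then takes it. */
static void rsem_wait(struct rsem *s)
{
	pthread_mutex_lock(&s->mtx);
	while (s->avail <= 0)
		pthread_cond_wait(&s->cond, &s->mtx);
	s->avail--;
	pthread_mutex_unlock(&s->mtx);
}

/* Takes a permit if one is free; returns 1 on success, 0 otherwise. */
static int rsem_trywait(struct rsem *s)
{
	int ok;

	pthread_mutex_lock(&s->mtx);
	ok = s->avail > 0;
	if (ok)
		s->avail--;
	pthread_mutex_unlock(&s->mtx);
	return ok;
}

/* Returns a permit and wakes one waiter. */
static void rsem_post(struct rsem *s)
{
	pthread_mutex_lock(&s->mtx);
	s->avail++;
	pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->mtx);
}

/* Changes the capacity without waiting for permits to be returned. */
static void rsem_set_capacity(struct rsem *s, long capacity)
{
	pthread_mutex_lock(&s->mtx);
	s->avail += capacity - s->capacity;
	s->capacity = capacity;
	pthread_cond_broadcast(&s->cond); /* growth may unblock many waiters */
	pthread_mutex_unlock(&s->mtx);
}
```

Shrinking the capacity can push avail below zero: threads already holding a permit keep it, and new waiters block until enough rsem_post() calls bring the count back above zero. The mutex is only held for the counter update, never across the (potentially slow) filesystem operation guarded by the permit.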
|
|
Unfortunately, quasi-standard MIN/MAX macros are confusing to me :x
This fixes the code to match the comment describing the reasoning
for limiting threads.
|
|
Our tests over-link (to save developer time :P), so we must
link in probes with our tests. Also, we must keep probes.h
around for distclean (but not maintainerclean).
|
|
We cannot assume sa_family_t is the first element of "struct
sockaddr_in" or "struct sockaddr_in6". FreeBSD has a "sa_len"
member as the first element while Linux does not.
So only keep the parts of the "struct sockaddr*" we need and use
inet_ntop instead of getnameinfo. This also gives us a little more
space to add additional fields to "struct mog_http" in the future
without increasing memory (or CPU cache) use.
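A sketch of the approach, assuming a hypothetical struct packed_addr in place of whatever cmogstored actually stores. Fields are copied out of sockaddr_in/sockaddr_in6 by member name, so a leading length byte on BSD does not matter, and inet_ntop() formats the address directly instead of going through getnameinfo(). For simplicity this sketch keeps IPv6 addresses inline, which makes it larger than an IPv4-only layout.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical compact peer address: only what logging needs. */
struct packed_addr {
	sa_family_t family;
	in_port_t port; /* network byte order */
	union {
		struct in_addr in4;
		struct in6_addr in6;
	} addr;
};

/* Copies fields by name, so the platform's sockaddr layout is irrelevant.
 * Returns 0 on success, -1 for unsupported families. */
static int packed_addr_set(struct packed_addr *pa,
                           const struct sockaddr_storage *ss)
{
	if (ss->ss_family == AF_INET) {
		struct sockaddr_in sin;

		memcpy(&sin, ss, sizeof(sin));
		pa->family = AF_INET;
		pa->port = sin.sin_port;
		pa->addr.in4 = sin.sin_addr;
		return 0;
	}
	if (ss->ss_family == AF_INET6) {
		struct sockaddr_in6 sin6;

		memcpy(&sin6, ss, sizeof(sin6));
		pa->family = AF_INET6;
		pa->port = sin6.sin6_port;
		pa->addr.in6 = sin6.sin6_addr;
		return 0;
	}
	return -1;
}

/* Formats "a.b.c.d:port" or "[v6addr]:port"; returns 0 on success. */
static int packed_addr_str(const struct packed_addr *pa, char *buf, size_t len)
{
	char host[INET6_ADDRSTRLEN];
	unsigned port = ntohs(pa->port);
	int n;

	if (!inet_ntop(pa->family, &pa->addr, host, sizeof(host)))
		return -1;
	if (pa->family == AF_INET6)
		n = snprintf(buf, len, "[%s]:%u", host, port);
	else
		n = snprintf(buf, len, "%s:%u", host, port);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```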
|
|
Tarballs were otherwise unusable.
|
|
Due to data/event loss, we cannot rely on normal syscalls
(accept/epoll_wait) being cancellation points. The benefits of
using a standardized API to terminate threads asynchronously are
lost when toggling cancellation flags.
This implementation allows us to be more explicit and obvious at the
few points where our worker threads may exit and reduces the amount
of code we have. By avoiding the calls to pthread_setcancelstate,
we should halve the number of atomic operations required in the
common case (where the thread is not marked for termination).
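A minimal sketch of explicit termination points, assuming a hypothetical struct worker with a C11 atomic flag. The requesting thread sets the flag (a real server would also signal the worker so a blocking accept or epoll_wait returns EINTR), and the worker checks the flag only where it knows exiting is safe. Each check is a single atomic load, with no pthread_setcancelstate calls around each syscall.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical worker control block. */
struct worker {
	pthread_t thr;
	atomic_bool quit; /* set once by whoever wants the thread gone */
};

/* Call only at points where no accepted fd or event is in flight. */
static bool worker_should_exit(struct worker *w)
{
	return atomic_load_explicit(&w->quit, memory_order_acquire);
}

static void worker_request_exit(struct worker *w)
{
	atomic_store_explicit(&w->quit, true, memory_order_release);
	/* A real server would also pthread_kill() w->thr here, with a
	 * handler installed without SA_RESTART, so a blocking syscall
	 * fails with EINTR and reaches the check below. */
}

static void *worker_main(void *arg)
{
	struct worker *w = arg;

	for (;;) {
		/* accept()/epoll_wait() and event handling would go here;
		 * on EINTR the loop falls through to the check below */
		if (worker_should_exit(w))
			return w; /* the one explicit exit point */
		sched_yield();
	}
}
```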
|
|
This should prevent one class of "accidental" failures.
(The sidechannel was never meant to be secure or exposed
to the public.)
|
|
A client may disconnect at any time, so shutdown may fail harmlessly
with ENOTCONN.
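A small sketch of treating ENOTCONN as harmless, using a hypothetical wrapper name; any other shutdown(2) failure is still reported to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Returns 0 on success or when the peer already disconnected (ENOTCONN);
 * returns -1 with errno set for any other failure. */
static int shutdown_ignore_enotconn(int fd, int how)
{
	if (shutdown(fd, how) == 0)
		return 0;
	if (errno == ENOTCONN)
		return 0; /* harmless: the client went away first */
	return -1;
}
```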
|
|
The "shutdown" command needs to trigger EINTR when using
epoll_pwait, otherwise the sleeping thread may not wake up properly.
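A Linux-only sketch of why epoll_pwait helps, assuming the wakeup signal (SIGUSR1 in the test, an arbitrary choice) is blocked in the worker at all other times. The mask passed to epoll_pwait unblocks it atomically for the duration of the wait, so a signal sent just before the thread goes to sleep stays pending and makes the call fail with EINTR instead of being missed. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/epoll.h>

/* The handler does nothing; its only job is to interrupt epoll_pwait. */
static void wakeup_handler(int sig)
{
	(void)sig;
}

/* Waits for events with wakeup_sig unblocked only inside epoll_pwait.
 * Returns epoll_pwait's result (-1 with errno == EINTR when woken). */
static int wait_events(int epfd, struct epoll_event *ev, int maxevents,
                       int timeout_ms, int wakeup_sig)
{
	sigset_t during_wait;

	/* start from the thread's current mask, minus the wakeup signal */
	pthread_sigmask(SIG_SETMASK, NULL, &during_wait);
	sigdelset(&during_wait, wakeup_sig);
	return epoll_pwait(epfd, ev, maxevents, timeout_ms, &during_wait);
}
```

To wake a worker, another thread sends it the same signal with pthread_kill(). The handler must be installed with sigaction() beforehand, since the default action for SIGUSR1 terminates the process.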
|
|
Cancellation with epoll_wait, accept4 (and accept) may cause events
to be lost, as cancellation relies on signals anyways in glibc/Linux.
So instead, we use signaling ourselves and explicitly test for
cancellation only if we know we are interrupted and in a state where
a thread can safely be cancelled.
ref: http://mid.gmane.org/CAE2sS1gxQkqmcywQ07pmgNHM+CyqzMkuASVjmWDL+hgaTMURWQ@mail.gmail.com
|
|
This should hopefully save a few cycles and reduce stack
usage slightly.
|
|
We will be enabling interrupts across our worker threads,
so we must ensure we do not abort the process if we hit EINTR.
|
|
We could eventually make this a tunable parameter, as it could
be advantageous over a global aio_threads value.
|
|
We're using per-svc-based thread pools, so different MogileFS
instances we serve no longer affect each other. This means
changing the aio_threads count only affects the svc of the
sidechannel port which triggered the change.
|
|
The semaphore inside the dev struct will be accessed by
multiple threads frequently, so keep it cache-aligned.
To reduce memory usage for large-numbered devices, avoid
storing the prefix on output and instead just rely on
the printf-family of routines to generate stringified output
in uncommon code paths.
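A sketch of both ideas, assuming a 64-byte cache line (common on x86-64, but not universal) and a hypothetical struct dev_entry. The frequently-touched lock state is aligned to its own cache line with C11 alignas, and the "/devN" string is only produced by snprintf() when it is actually needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_LINE 64 /* assumption: typical x86-64 line size */

/* Hypothetical per-device entry. Only the numeric id is stored. */
struct dev_entry {
	uint32_t devid;
	/* Shared by many threads: give it its own cache line so updates
	 * do not bounce the line holding devid and other read-mostly data. */
	alignas(CACHE_LINE) struct {
		pthread_mutex_t mtx;
		pthread_cond_t cond;
		long avail;
	} sem;
};

/* Uncommon path: build "/devN" on demand instead of storing it.
 * Returns the string length, or -1 if buf is too small. */
static int dev_path(const struct dev_entry *d, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/dev%u", (unsigned)d->devid);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Heap-allocated entries need an allocator that honors the alignment, such as posix_memalign() or aligned_alloc(), since malloc() only guarantees alignment suitable for max_align_t.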
|
|
This is similar to the AIO channels functionality in Perlbal,
but implemented using semaphores to optimize for the uncontended
case.
|
|
This improves maintainability in case MogileFS changes these
limits.
|
|
Both hash_initialize and hash_insert may return NULL to indicate
allocation errors. So implement a mog_oom_if_null helper function to
destroy the process instead of attempting to continue and dereferencing
NULL pointers.
This may affect configurations with limited memory and lacking
overcommit; but is unlikely to trigger given the small memory footprint
of cmogstored.
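One plausible shape for such a helper, written here as a hypothetical oom_if_null() rather than cmogstored's actual mog_oom_if_null. It passes non-NULL pointers through unchanged so it can wrap an allocating call inline (for example `buf = oom_if_null(malloc(len));`), and aborts otherwise.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: die loudly instead of dereferencing NULL later. */
static void *oom_if_null(void *ptr)
{
	if (ptr == NULL) {
		fputs("out of memory\n", stderr);
		abort();
	}
	return ptr;
}
```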
|
|
This will allow us to look up devices for per-(mog)device I/O queues.
|
|
Lines longer than 80 columns aren't readable on my screen
with gigantic fonts.
|
|
This will allow us to do lookups for IO queues/semaphores before
we attempt to fstatat/stat a path.
|
|
If the mogstored sidechannel is inactive (in HTTP-only mode), we should
still count the number of devices correctly so we can properly scale the
number of worker threads.
|
|
This simplifies code, reduces contention, and reduces the
chances of independent MogileFS instances (with one instance
of cmogstored) stepping over each other.
Most cmogstored deployments are single docroot (for a single
instance of MogileFS), however cmogstored supports multiple
docroots for some rare configurations and we support them here.
|
|
I forgot why this bound was necessary, so add a comment
ensuring I do not forget again.
|
|
Having too many acceptor threads does not help, as it leads to
lock contention in the accept syscalls and the EPOLL_CTL_ADD
paths. The fair FIFO ordering of _blocking_ accept/accept4
syscalls also means we trigger unnecessary task switching and
incur cache misses under high load.
It is almost impossible for the acceptor threads to
be stuck on disk I/O since
commit 832316624f7a8f44b3e1d78a8a7a62a399241840
("acceptor threads push directly into event queue").
|
|
This will help ensure availability when new devices are added,
without additional user interaction to manually set aio_threads
via sidechannel.
|
|
mog_fd_init enforces setting the correct type, so relegate
mog_fd_get to private usage inside fdmap.c.
|
|
libkqueue recently migrated to SourceForge and Debian 7.0 is
the new stable.
We still support Debian 6.0 and will likely support it for years to
come since CentOS 5.x remains supported.
|
|
Users unfamiliar with autotools may not realize bootstrapping
is required when building from git.
|
|
Our use of chdir in this test confuses valgrind, which may
create a temporary file.
|
|
There's no reason to be referencing FDs for these acceptors
since they're infrequently accessed by svc, so this should
make our internals more consistent. This also removes our
use of mog_fd_get (outside of test code).
|
|
We will key most client events by pid() and file descriptors,
as this is least ambiguous. There are some minor refactorings
to pass "struct mog_fd *" around as much as possible instead of
"struct mog_http *".
|
|
This results in a small size reduction due to better alignment:
    $ ~/linux/scripts/bloat-o-meter cmogstored.before cmogstored.after
    add/remove: 0/0 grow/shrink: 2/2 up/down: 20/-56 (-36)
    function                    old    new  delta
    mog_http_get_open          1460   1476    +16
    mog_chunk_init               65     69     +4
    http_forward_in_progress     63     55     -8
    mog_http_parse            27171  27123    -48
|
|
It does not matter if the Content-MD5 comes from the trailer or
header, we process it the same way with the Ragel parser.
This is obvious when reading our code (and the associated hunk
this commit changes) in http_put.c.
|
|
getpeername() does not work on unconnected sockets. By the time we handle
an error, the socket is often already disconnected, so we want to get
the address early on, while we know it is still valid.
For IPv4 addresses, this does not increase memory overhead at all. IPv6
addresses[1] do require an additional heap allocation, but it does not
need to be aligned since it is infrequently accessed. If IPv6 becomes
common, we may need to expand our per-client storage to 192 bytes (from
128) on 64-bit (or see if we may pack data more carefully).
[1] IPv6 addresses are rare with MogileFS, as MogileFS does not
currently support them.
|
|
MogileFS currently does not support IPv6, but maybe one day
it will. When it does, we'll be ready.
|
|
This will allow us to more easily handle error reporting for
IPv6 addresses and allow for consistent formatting of
stringified IP addresses.
|
|
Linux device-mapper names show up as 'dm-0', 'dm-1' and so on.
This allows users to store MogileFS files on encrypted devices
using dm-crypt and perhaps other, similar tools.
|
|
The generic "struct sockaddr" may be padded to be the same size
as "struct sockaddr_storage" (which is what we were trying to
avoid in the first place by using mog_sockaddr). This change
makes no difference on GNU/Linux.
|
|
We must set errno manually for die_errno() if posix_memalign fails,
since posix_memalign returns an error number instead of setting errno.
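A minimal sketch with a hypothetical wrapper name: the return value of posix_memalign() is copied into errno before the caller hands off to an errno-based reporter such as die_errno(), which would otherwise print a stale error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns NULL with errno set on failure. */
static void *memalign_or_errno(size_t alignment, size_t size)
{
	void *ptr;
	int err = posix_memalign(&ptr, alignment, size);

	if (err != 0) {
		errno = err; /* posix_memalign() returns the error instead */
		return NULL;
	}
	return ptr;
}
```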
|
|
This will allow easy use of memset to reset attributes in
between requests without clobbering more important data.
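A sketch of the layout trick, assuming a hypothetical struct http_client: fields that must survive across keepalive requests come first, and everything from the first per-request field onward is cleared with one memset() whose start is computed via offsetof().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical client state. Order matters: see http_client_reset(). */
struct http_client {
	int fd;            /* must survive between requests */
	unsigned requests; /* keepalive counter, also preserved */
	/* per-request attributes start here */
	int method;
	long long content_len;
	unsigned chunked : 1;
	unsigned has_md5 : 1;
};

/* Zero every per-request field without touching fd or requests. */
static void http_client_reset(struct http_client *c)
{
	size_t off = offsetof(struct http_client, method);

	memset((char *)c + off, 0, sizeof(*c) - off);
}
```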
|
|
Andrey Okunev noted undefined references on the MogileFS mailing
list when building cmogstored 1.2.1 on his 32-bit CentOS5 machine.
|
|
This release only fixes an assertion failure during graceful shutdown
while MogileFS fsck is running with checksumming enabled.
This only affects users running fsck with checksumming enabled during a
graceful shutdown of cmogstored. When upgrading cmogstored, it is
recommended to:
1) stop fsck on the trackers (via "mogadm fsck stop")
2) wait for all tracker queues to drain and stop sending
fsck traffic to the affected host. You may wish to
"!want 0 fsck" on all your trackers and wait for the
fsck workers to stop.
3) upgrade cmogstored (in place upgrade works)
There are also several code comment updates for internal
components of cmogstored which may interest potential hackers.
|
|
We have a future!
|
|
tls_rbuf allows us to avoid nearly all dynamic allocation
for common HTTP requests. However, the mog_rbuf structure
may be detached from TLS as necessary (and another one
allocated in its place) when the need arises.
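A simplified sketch of the pattern, with hypothetical names (struct rbuf, tls_buf, rbuf_get, rbuf_detach) and an assumed fixed capacity. Each thread reuses one buffer via C11 _Thread_local, and a client that must keep unparsed data detaches the buffer, so the thread allocates a replacement only on its next request.

```c
#include <assert.h>
#include <stdlib.h>

#define RBUF_CAPA 8192 /* assumption: illustrative buffer size */

struct rbuf {
	size_t len;
	char data[RBUF_CAPA];
};

/* One cached buffer per thread; NULL until first use or after detach. */
static _Thread_local struct rbuf *tls_buf;

/* Common path: reuse this thread's buffer, allocating only if needed. */
static struct rbuf *rbuf_get(void)
{
	if (tls_buf == NULL) {
		tls_buf = malloc(sizeof(*tls_buf));
		if (tls_buf == NULL)
			return NULL;
	}
	tls_buf->len = 0;
	return tls_buf;
}

/* Rare path: hand ownership of the buffer to the caller (e.g. a client
 * with a partial request); the thread's cache is left empty. */
static struct rbuf *rbuf_detach(void)
{
	struct rbuf *b = tls_buf;

	tls_buf = NULL;
	return b;
}
```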
|
|
Avoiding heap allocations in common paths is important
to high performance server design; document this important
design decision.
|