path: root/http.c
Date Commit message
2013-07-26 more probes WIP [st-wip-broken]
2013-07-19 move trace.h include to global cmogstored.h
We'll have tracing everywhere, so it's too much maintenance overhead to add it to every file which wants it. Increased build-times are a problem, but less than the maintenance overhead of finding the right headers.
2013-07-19 split out {mgmt,http}_parse_continue checks
Incomplete request headers are uncommon, so if we see them, something is probably off or strange. This should make it easier to maintain probe points to watch for this behavior.
2013-07-19 split out {http,mgmt}_rbuf_grow functions
This should allow easier tracing of rbuf growth, and should hopefully make the code more explicit and harder to screw up.
2013-07-17 document ioq and mog_{mgmt,http}_drop interaction safety
I needed to spend time to convince myself this was safe, so leave a note to others (and future self) in case there is cause for concern. Basically, this is highly dependent on our overall one-shot-based concurrency model and safe as long as basic rules are followed.
2013-07-13 pass mog_accept instead of mog_svc to post-accept callbacks
This allows us to capture/trace the listen address which accepted the request without consuming additional stack space.
2013-07-13 http: pass "struct mog_fd *" more consistently in API
This makes it easier to write tapsets which key objects by: PID,FD for uniqueness. This also avoids some mog_fd_of() calls.
2013-07-10 http: include IP:PORT in "client died" message
This should hopefully make failures easier to track down.
2013-07-10 file: embed ioq in the opened mog_file object
This allows us to avoid a redundant hash lookup every time we "activate" an open file for reading or writing.
2013-07-10 ioq: implement and enable generic I/O queues
This will allow us to limit concurrency on a per-device basis with limited impact on HTTP header reading/parsing. This prevents pathological slowness on a single device from bringing down an entire host. This also allows users to more safely run with fewer aio_threads (e.g. 1:1 thread:device mapping) on fast devices with smaller low-level (kernel/hardware) I/O queues.
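The per-device concurrency limit described above might be pictured with the following minimal sketch. It is not the cmogstored implementation: the names `ioq_sketch`, `ioq_client`, `ioq_sketch_init`, `ioq_sketch_ready` and `ioq_sketch_next` are hypothetical, and the locking a multi-threaded server would need is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical client handle; a real server would carry much more state */
struct ioq_client {
	int fd;
	struct ioq_client *next;
};

/* hypothetical per-device queue: a slot counter plus a FIFO of waiters */
struct ioq_sketch {
	unsigned avail;                 /* free I/O slots for this device */
	struct ioq_client *head, *tail; /* clients waiting for a slot */
};

void ioq_sketch_init(struct ioq_sketch *q, unsigned max)
{
	assert(q != NULL && max > 0);
	q->avail = max;
	q->head = q->tail = NULL;
}

/* true: the caller holds a slot and may do I/O now; false: it was queued */
bool ioq_sketch_ready(struct ioq_sketch *q, struct ioq_client *c)
{
	if (q->avail > 0) {
		q->avail--;
		return true;
	}
	c->next = NULL;
	if (q->tail)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;
	return false;
}

/*
 * Called when a client finishes its I/O: the slot is handed to the
 * oldest waiter (returned to the caller), or freed if nobody waits.
 */
struct ioq_client *ioq_sketch_next(struct ioq_sketch *q)
{
	struct ioq_client *c = q->head;

	if (c) {
		q->head = c->next;
		if (!q->head)
			q->tail = NULL;
		c->next = NULL;
	} else {
		q->avail++;
	}
	return c;
}
```

With one queue per device, a slow device only stalls the clients queued behind it, while header parsing for other clients can continue.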
2013-07-10 rbuf: reattach/reuse read buffers when possible
Reattaching/reusing read buffers allows us to avoid repeated reallocation/growth/free when clients repeatedly send us large headers. This may also increase cache-hits by favoring recently-used buffers as long as fragmentation is kept in check. The fragmentation should be no worse than it is currently, due to the existing detach nature of rbufs.
2013-07-10 http: add assertion for unused wbuf
We need to ensure we do not introduce code to launch http_process_client while we have buffered data (or socket write errors).
2013-06-25 avoid leaks on epoll/kqueue resources exhaustion
Simply releasing the descriptor triggering ENOSPC/ENOMEM errors from epoll_ctl and kevent is not good enough, as those descriptors may have other descriptors (e.g. files to be served) hanging off of them.
2013-06-25 shrink mog_packaddr and improve portability
We cannot assume sa_family_t is the first element of "struct sockaddr_in" or "struct sockaddr_in6". FreeBSD has a "sa_len" member as the first element while Linux does not. So only keep the parts of the "struct sockaddr*" we need and use inet_ntop instead of getnameinfo. This also gives us a little more space to add additional fields to "struct mog_http" in the future without increasing memory (or CPU cache) use.
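A minimal sketch of the idea, assuming IPv4 only: copy the needed members out of "struct sockaddr_in" by name (never by offset) and format them with inet_ntop. The struct `packaddr_sketch` and both function names are hypothetical and do not reflect the real mog_packaddr layout.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* hypothetical packed form: only what is needed to print an IPv4 peer */
struct packaddr_sketch {
	in_port_t port;      /* network byte order, as in sockaddr_in */
	struct in_addr addr; /* network byte order */
};

/* access members by name, so a leading sa_len (FreeBSD) does not matter */
void packaddr_from_sin(struct packaddr_sketch *pa,
			const struct sockaddr_in *sin)
{
	assert(sin->sin_family == AF_INET);
	memset(pa, 0, sizeof(*pa));
	pa->port = sin->sin_port;
	pa->addr = sin->sin_addr;
}

/* write "IP:PORT" into dst; returns 0 on success, -1 on error/truncation */
int packaddr_format(const struct packaddr_sketch *pa, char *dst, size_t len)
{
	char ip[INET_ADDRSTRLEN];
	int n;

	if (!inet_ntop(AF_INET, &pa->addr, ip, sizeof(ip)))
		return -1;
	n = snprintf(dst, len, "%s:%u", ip, (unsigned)ntohs(pa->port));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Unlike getnameinfo, inet_ntop needs only the raw address bytes, so the full "struct sockaddr*" does not have to be kept around.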
2013-06-25 parse out mogilefs devid in mgmt/http requests
This will allow us to do lookups for IO queues/semaphores before we attempt to fstatat/stat a path.
2013-05-06 preliminary systemtap support for tracing
We will key most client events by pid() and file descriptors, as this is least ambiguous. There are some minor refactorings to pass "struct mog_fd *" around as much as possible instead of "struct mog_http *".
2013-04-17 save socket address on accept/accept4
getpeername() does not work on unconnected sockets. For error-handling, unconnected sockets are a fairly common occurrence, so we want to get the address early on when we know the address is still valid. For IPv4 addresses, this does not increase memory overhead at all. IPv6 addresses[1] do require an additional heap allocation, but it does not need to be aligned since it is infrequently accessed. If IPv6 becomes common, we may need to expand our per-client storage to 192 bytes (from 128) on 64-bit (or see if we may pack data more carefully).
[1] IPv6 addresses are rare with MogileFS, as MogileFS does not currently support them.
2013-03-19 http: put parser-private attrs in a private struct attr
This will allow easy use of memset to reset attributes in between requests without clobbering more important data.
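The pattern can be sketched as follows. The struct layout and every field name here are hypothetical; only the technique (grouping per-request parser state in one nested struct so a single memset clears it) comes from the message above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical per-client state, not the real "struct mog_http" */
struct http_sketch {
	int fd; /* long-lived: must survive across requests */
	struct {
		uint16_t line_end;
		uint16_t content_len_offset;
		unsigned chunked:1;
		unsigned persistent:1;
	} priv; /* parser-private: valid for one request only */
};

/* reset only the per-request parser state between requests */
void http_sketch_reset(struct http_sketch *http)
{
	assert(http != NULL);
	memset(&http->priv, 0, sizeof(http->priv));
}
```

Adding a new parser attribute then only requires adding a member to the nested struct; the reset code does not change.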
2013-01-17 http: avoid MSG_MORE on HEAD responses
We need to signal we do not have more bytes to write to the socket when generating HTTP HEAD responses. This avoids a 200ms delay between HTTP responses. This regression only appeared in commit 14e0684507c06439ee9c7a731fd6ca90b7b9adcb and was never in a release.
2013-01-17 simplify TCP_NOPUSH support code (remove TCP_CORK)
Since we no longer use TCP_CORK under Linux (where we use MSG_MORE instead), we can clean up the nomenclature and avoid confusing people by mentioning TCP_CORK.
2013-01-17 copyright comment updates for 2013
gnulib did it for us in m4/gnulib-cache.m4, we'll match.
2012-12-09 remove queue_state field from struct mog_fd
We do not need to track queue state any longer since accept threads always inject directly into the epoll/kqueue watcher nowadays.
2012-11-08 queue: refactor for future, potential kqueue speedup
kevent() has the ability to insert items into the kqueue and retrieve with the same syscall. This allows us to reduce syscalls on systems with kqueue support. Regardless of whether this potential optimization can improve performance, this makes the code smaller and possibly easier to follow.
2012-10-31 http_put: return 507 for excess sizes in headers
Content-Length, Content-Range, chunk size can all overflow the limit of off_t, so return a more informative 507 instead of a 400.
2012-10-16 http*: do not rely on MOG_RBUF_BASE_SIZE for calculations
The rbuf may grow sometimes to accommodate larger requests, so use rbuf->rcapa instead.
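To illustrate why the constant is the wrong operand once a buffer has grown, here is a small sketch. The struct, the `RBUF_BASE_SIZE_SKETCH` value and both function names are hypothetical; only the `rcapa` field name is taken from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RBUF_BASE_SIZE_SKETCH 512 /* hypothetical initial capacity */

/* hypothetical read buffer: capacity is tracked alongside the pointer */
struct rbuf_sketch {
	size_t rcapa; /* bytes actually allocated at rptr */
	char *rptr;
};

/* free space after `off` bytes of buffered data, based on real capacity */
size_t rbuf_sketch_space(const struct rbuf_sketch *rbuf, size_t off)
{
	assert(off <= rbuf->rcapa);
	return rbuf->rcapa - off;
}

/* grow the buffer; returns 0 on success, -1 leaving rbuf untouched */
int rbuf_sketch_grow(struct rbuf_sketch *rbuf, size_t new_capa)
{
	char *tmp;

	assert(new_capa > rbuf->rcapa);
	tmp = realloc(rbuf->rptr, new_capa);
	if (!tmp)
		return -1;
	rbuf->rptr = tmp;
	rbuf->rcapa = new_capa;
	return 0;
}
```

A calculation written as `RBUF_BASE_SIZE_SKETCH - off` would keep reporting the pre-growth space (or underflow) after `rbuf_sketch_grow` succeeds.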
2012-08-03 acceptor threads push directly into event queue
This offloads work from the kernel into userspace and helps us get around the lack of useful/non-buggy TCP_DEFER_ACCEPT semantics. After this, we may now reduce the number of acceptor threads as the acceptor threads will no longer be bound by disk performance.
2012-07-19 use TCP_NOPUSH if available for FreeBSD-based systems
For many years now, TCP_NOPUSH has behaved exactly like TCP_CORK on Linux, so we can just enable it to save system calls on the /client/ side. Using the integrated writev-like facility of the BSD sendfile() implementation may not be worth it as it complicates error handling. Tested on Debian GNU/kFreeBSD 6.0.
2012-07-09 http: TCP_CORK support for Linux kernel users
This is mainly to prevent triggering potential bugs in some HTTP clients that rely on the Perl mogstored (which uses TCP_CORK). This should also make HTTP GET responses slightly more efficient in terms of network traffic. Low-latency clients may see some improvement because clients may process the response headers and body with fewer wakeups and waiting. The downside of this is slightly slower DELETE/PUT/HEAD responses due to the additional syscalls on the server.
2012-03-16 additional path restrictions on HTTP PUT creating dirs
We don't want accidental /dev* directories being created due to misconfiguration. This can help prevent accidental configuration errors from spilling over or going unnoticed.
2012-03-15 httpget deserves its own fd_type enum
Hopefully things are less error-prone this way.
2012-03-15 http: fix uninitialized mem access for non-GET/HEAD reqs
Not only do we have to be careful about not changing a bit, we also need to be careful about actually setting it for current cases... Found by valgrind.
2012-03-14 support for httpgetlisten config directive
This makes it easy to support read-only HTTP traffic on a different listen port. This reduces listen queue contention and allows using iptables to block off DAV traffic from certain hosts while serving freely.
2012-03-14 queue: active clients maintain thread affinity
We want to avoid global resources like the active queue as much as possible. Unnecessary bouncing of clients between different threads and contention for the active queue lock hurt concurrency. This contention is witnessed when parallel MD5 requests are serviced during parallel fsck runs.
2012-03-08 http: avoid active queue on initial GET/PUT chunk
Try to drain (or fill up) the socket as much as possible. We want to be able to do some work without putting additional contention in the active queue and potentially bouncing data between CPU caches.
2012-03-07 properly name mog_rbuf_detach() function
"detach" makes more sense than "defer" here. This function detaches a per-thread buffer from its owner.
2012-03-03 http: allow headers up to UINT16_MAX in size
Some folks with reproxy setups end up forwarding large headers (e.g. session cookies) to mogstored backends. Since our per-client HTTP buffer offsets are uint16_t, UINT16_MAX was chosen. Perlbal actually allows 100K, but I doubt anybody would ever actually need that much.
2012-03-03 rbuf: use rcapa instead of rsize correctly
We didn't have rcapa in the past, but now we do, so use it. rsize is only used for stashing buffers in per-client (fdmap) areas.
2012-02-28 unify rbuf sizes for http and mgmt
They're the same, so it should result in less fragmentation resizing if we _keep_ them the same moving forward.
2012-02-27 rbuf: add rcapa element to struct
This stores the original size of the struct and makes it easier to know how much of it is used.
2012-02-25 implement graceful shutdown for outstanding requests
By going into single-threaded mode, we can drastically simplify our shutdown sequence to avoid race conditions. This also allows us to not have additional overhead during normal runtime, as all the shutdown-specific logic is isolated to only a few portions of the code. Like all graceful shutdown schemes, this one is still vulnerable to race conditions due to network latency, but this one should be no worse than any other server. Fortunately all requests we service are idempotent.
2012-02-23 cleanup mog_fd insertion/initialization for queues
This will help us avoid bugs if we're transferring mog_fd structs between queues.
2012-02-20 redo mog_fd_put() and actually use it
This forces us to invalidate the mog_fd structure before calling close() on the file descriptor. Eventually, this lets us gracefully shutdown by scanning fdmap to invalidate old connections.
2012-02-18 http: use internal svc flag to toggle persistence
We want to be able to override keepalive/persistence set by our parser if our svc is being shut down.
2012-02-09 do not log for ENOTCONN and ECONNRESET errors
They're far too common and will just flood syslog.
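A minimal sketch of such a filter, assuming errors are reported through syslog(3). Both function names are hypothetical and the message format is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <syslog.h>

/* errors caused by ordinary client disconnects are not worth a log line */
bool should_log_errno(int err)
{
	switch (err) {
	case ENOTCONN:
	case ECONNRESET:
		return false;
	default:
		return true;
	}
}

/* hypothetical wrapper: log a failed operation unless it is just noise */
void log_client_error(const char *what, int err)
{
	assert(what != NULL);
	if (should_log_errno(err))
		syslog(LOG_ERR, "%s failed: %s", what, strerror(err));
}
```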
2012-02-05 http: fix missing case statement in switch
Found by clang; apparently GCC gets confused when it comes to small-sized enums.
2012-02-04 cleanup HTTP chunked PUT support for odd edge cases
Unlimited-length streams are trickier to parse with minimal buffering, so we need to be careful with corner cases clients may put us through...
2012-01-31 http: Date: and Last-Modified: response headers
In case MogileFS clients rely on these fields, we're closer to being a "real" HTTP server.
2012-01-31 enable chunked HTTP PUT support
Still a bit iffy on the details, but it seems to basically work. There will probably be cases where this code falls down badly so it needs much more testing...
2012-01-19 add Content-Range: support for PUT requests
The Perl MogileFS::Client library still sends requests with Content-Range for partial PUTs.
2012-01-19 http_put: identity PUT (no chunking/trailers) working
Good thing is that pipelined and persistent PUT works out-of-the-box, too. We use O_EXCL when opening files, so there's currently no risk of overwriting anything, maybe it's a good thing?
TODOs:
* partial write (Content-Range header)
* overhaul the mog_open* API for Content-Range
* support overwriting existing files (maybe)
* Content-MD5 verification (in trailers, too)
* Transfer-Encoding: chunked support (for Content-MD5 trailers)
* mmap() write support.
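The O_EXCL behavior mentioned in this message can be sketched as below. This is not the mog_open* API: `open_for_put` and `discard_put` are hypothetical names, and the 0600 mode is an arbitrary choice for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a new file for a PUT. With O_CREAT|O_EXCL the call fails with
 * errno == EEXIST instead of opening (and clobbering) an existing file.
 */
int open_for_put(const char *path)
{
	assert(path != NULL);
	return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}

/*
 * Hypothetical cleanup for an aborted upload: since O_EXCL refuses
 * existing paths, a leftover partial file would block any retry.
 */
void discard_put(int fd, const char *path)
{
	close(fd);
	unlink(path);
}
```

Supporting overwrites (one of the TODO items) would mean dropping O_EXCL or writing to a temporary name and renaming; which approach fits is left open here.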