unicorn.git - Rack HTTP server for Unix and fast clients

Date	Commit message
2024-03-31	treewide: future-proof frozen_string_literal changes
	Once again Ruby seems ready to introduce more incompatibilities and force busywork upon maintainers[1]. In order to avoid incompatibilities in the future, I used a Perl script[2] to prepend `frozen_string_literal: false' to every Ruby file. Somebody interested will have to go through every Ruby source file and enable frozen_string_literal once they've thoroughly verified it's safe to do so. [1] https://bugs.ruby-lang.org/issues/20205 [2] https://yhbt.net/add-fsl.git/74d7689/s/?b=add-fsl.perl
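A minimal Ruby sketch of what such a treewide pass might do to each file. This is not the Perl script referenced above: `add_fsl` and `MAGIC` are hypothetical names, and placing the comment after a shebang or encoding line (rather than strictly prepending it) is an assumption made so those lines keep their required position.

```ruby
# Hypothetical helper, not the add-fsl.perl script from the commit message.
MAGIC = "# frozen_string_literal: false\n"

def add_fsl(source)
  lines = source.lines
  # Leave the file alone if the magic comment is already near the top.
  return source if lines.first(3).any? { |l| l.match?(/\A#.*frozen_string_literal:/) }

  idx = 0
  # Assumption: keep a shebang on line 1 and an encoding comment right after it.
  idx += 1 if lines[idx]&.start_with?("#!")
  idx += 1 if lines[idx]&.match?(/\A#.*coding[:=]/)
  lines.insert(idx, MAGIC)
  lines.join
end

puts add_fsl("# -*- encoding: binary -*-\nrequire 'socket'\n")
```

Running the helper twice on the same source leaves it unchanged, so a script built this way could be re-run safely over a tree.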
2023-06-20	unicorn_http_common.rl: use only ASCII spaces for compatibility
	Ragel 6.10 on FreeBSD 12.4 amd64 complains and fails on this, yet the same Ragel version on Debian 11.x i386 and amd64 never has. I suspect this can fix compatibility on s390x, arm64, armel, and armhf Debian builds: https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=s390x&ver=6.1.0-1&stamp=1687156375&file=log https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=arm64&ver=6.1.0-1&stamp=1687156478&file=log https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=armel&ver=6.1.0-1&stamp=1687156619&file=log https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=armhf&ver=6.1.0-1&stamp=1687156807&file=log Fixes: d5fbbf547203061b (Add some tolerance (RFC2616 sec. 19.3), 2016-10-20)
2023-06-20	epollexclusive: handle future rb_io_t deprecation
	It looks like Ruby 3.3+ will hide rb_io_t internals and get rid of the venerable `GetOpenFile' macro in favor of `rb_io_descriptor'. `rb_io_descriptor' has been public API since Ruby 3.1 and should be safe to use, and is necessary for `raindrops' (a dependency of ours): https://yhbt.net/raindrops-public/20230609104805.39022-1-samuel.williams@oriontransfer.co.nz/ https://bugs.ruby-lang.org/issues/19057#note-17 We'll also avoid an unnecessary call to `rb_io_get_io' in `get_readers' since `epio' (aka `self') can only be of the Unicorn::Waiter IO subclass. However, we must still use `rb_io_get_io' when handling non-self args in `prep_readers'.
2023-06-05	httpdate: fix build with Ruby 2.7 (at least)
	<time.h> is still required for gmtime_r(3), and not all versions of <ruby.h> include <time.h>, already. Fixes: a6463151bd1db5b9 (httpdate: favor gettimeofday(2) over time(2) for correctness, 2023-06-01)
2023-06-05	httpdate: favor gettimeofday(2) over time(2) for correctness
	While scanning the git@vger.kernel.org mailing list, I've learned time(2) may return the wrong value in the first 1 to 2.5 ms of every second. While I'm not sure if the Date: response header matters to anyone, returning the correct time seems prudent. Link: https://lore.kernel.org/git/20230320230507.3932018-1-gitster@pobox.com/ Link: https://inbox.sourceware.org/libc-alpha/20230306160321.2942372-1-adhemerval.zanella@linaro.org/T/ Link: https://sourceware.org/bugzilla/show_bug.cgi?id=30200
2023-06-05	ext: remove a vestige of ruby <2.0 support
	The actual `id_clear' declaration was removed last year, but I missed its (unused) initialization :x Fixes: c56eb04d683e ("drop Ruby 1.9.3 support, require 2.0+ for now")
2023-06-05	chunk unterminated HTTP/1.1 responses
	Rack::Chunked will be gone in Rack 3.1, so provide a non-middleware fallback which takes advantage of IO#write supporting multiple arguments in Ruby 2.5+. We still need to support Ruby 2.4, at least, since Rack 3.0 does. So a new (GC-unfriendly) Unicorn::WriteSplat module now exists for Ruby <= 2.4 users.
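A rough Ruby sketch of chunked framing built on the multi-argument IO#write mentioned above. `write_chunked` is a hypothetical helper name, and unicorn's real fallback may be structured differently; StringIO stands in for the client socket.

```ruby
require "stringio"

# Hypothetical helper; not unicorn's actual implementation.
def write_chunked(io, body)
  body.each do |part|
    # A zero-length chunk would terminate the body early, so skip empties.
    next if part.empty?
    # Ruby 2.5+: a single write call takes size line, payload and CRLF together.
    io.write(part.bytesize.to_s(16), "\r\n", part, "\r\n")
  end
  io.write("0\r\n\r\n") # terminating chunk
end

out = StringIO.new
write_chunked(out, ["hello", "", " world!"])
p out.string
```

Passing the pieces as separate arguments avoids building a temporary concatenated string for every chunk, which is presumably why the Ruby <= 2.4 path is described as GC-unfriendly.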
2023-06-05	epollexclusive: use maxevents=1 for epoll_wait
	This allows us to avoid both malloc (slow) and alloca (unpredictable stack usage) at the cost of needing to make more epoll_wait syscalls in a rare case.
	In unicorn (and most servers), I expect the most frequent setup is to have one active listener serving the majority of the connections, so the most frequent epoll_wait return value would be 1.
	Even with >1 events, any syscall overhead saved by having epoll_wait retrieve multiple events is dwarfed by Rack app processing overhead.
	Worse yet, if a worker retrieves an event sooner than it can process it, the kernel (regardless of EPOLLEXCLUSIVE or not) is able to enqueue another new event to that worker. In this example where `a' and `b' are both listeners:
	  U=userspace, K=kernel
	  K: client hits `a' and `b', enqueues them both (events #1 and #2)
	  U: epoll_wait(maxevents: 2) => [ a, b ]
	  K: enqueues another event for `b' (event #3)
	  U: process_client(a.accept) # this takes a long time
	While process_client(a.accept) is happening, `b' can have two clients pending on a given worker. It's actually better to leave the first `b' event unretrieved so the second `b' event can go to the ep->rdllist of another worker.
	The kernel is only capable of enqueuing an item if it hasn't been enqueued. Meaning, it's impossible for epoll_wait to ever retrieve `[ b, b ]' in one call.
2021-12-25	drop Ruby version warning, fix speling errer
	The warning was probably lost in the noise of the build, anyways.
2021-12-25	epollexclusive: remove rb_gc_force_recycle call
	It's deprecated in Ruby 3.1+, and probably not relevant for past versions.
2021-10-24	allow Ruby to deduplicate remaining globals
	Most of these are bound to be used in Rack/Rails/apps/gems, (though possibly with different encodings). Give Ruby a chance to deduplicate them, at least.
2021-10-04	use EPOLLEXCLUSIVE on Linux 4.5+
	While the capabilities of epoll cannot be fully exploited given our primitive design, avoiding thundering herd wakeups on larger SMP machines while below 100% utilization is possible with Linux 4.5+.
	With this change, only one worker wakes up per-connect(2) (instead of all of them via select(2)), avoiding the thundering herd effect when the system is mostly idle.
	Saturated instances should not notice the difference if they rarely had multiple workers sleeping in select(2). This change benefits non-saturated instances.
	With 2 parallel clients and 8 workers on a nominally (:P) 8-core CPU (AMD FX-8320), the uconnect.perl test script invocation showed a reduction from ~3.4s to ~2.5s when reading an 11-byte response body:
	  echo worker_processes 8 >u.conf.rb
	  bs=11 ruby -I lib -I test/ruby-2.5.5/ext/unicorn_http/ bin/unicorn \
	    test/benchmark/dd.ru -E none -l /tmp/u.sock -c u.conf.rb
	  time perl -I lib -w test/benchmark/uconnect.perl \
	    -n 100000 -c 2 /tmp/u.sock
	Times improve less as "-c" increases for uconnect.perl (system noise and timings are inconsistent). The benefit of this change should be more noticeable on systems with more workers (and more cores).
	I wanted to use EPOLLET (Edge-Triggered) to further reduce syscalls, here, (similar to the old select()-avoidance bet) but that would've either added too much complexity to deduplicate wakeup sources, or run into the same starvation problem we solved in April 2020[1].
	Since the kernel already has the complexity and deduplication built-in for Level-Triggered epoll support, we'll just let the kernel deal with it.
	Note: do NOT take this as an example of how epoll should be used in a sophisticated server. unicorn is primitive by design and cannot use threads nor handle multiple clients at once, thus it only uses epoll in this extremely limited manner.
Linux 4.5+ users will notice a regression of one extra epoll FD per-worker and at least two epoll watches, so /proc/sys/fs/epoll/max_user_watches may need to be changed along with RLIMIT_NOFILE. This change has also been tested on Linux 3.10.x (CentOS 7.x) and FreeBSD 11.x to ensure compatibility with systems without EPOLLEXCLUSIVE. Various EPOLLEXCLUSIVE discussions over the years: https://yhbt.net/lore/lkml/?q=s:EPOLLEXCLUSIVE+d:..20211001&x=t&o=-1 [1] https://yhbt.net/unicorn-public/CAMBWrQ=Yh42MPtzJCEO7XryVknDNetRMuA87irWfqVuLdJmiBQ@mail.gmail.com/
2021-10-04	extconf.rb: get rid of unnecessary checks
	SIZEOF_, 2NUM and NUM2* should all be defined by ruby.h and dependencies it pulls in since Ruby 2.0 and possibly earlier. INT_MAX and LLONG_MAX are in limits.h which is POSIX. HAVE_GMTIME_R is already defined by ruby/config.h, so we shouldn't have to check for it, either. Combined, these changes speed up extconf.rb by several seconds.
2021-09-26	drop Ruby 1.9.3 support, require 2.0+ for now
	Ruby 1.9.3 was released nearly a decade ago, so there are probably few (if any) legacy users left, and they can continue using old versions of unicorn. We'll be able to take advantage of some Ruby 2.0+-only features down the road (and hopefully 2.3+). Also, I no longer have an installation of Ruby 1.8 and getting it working probably isn't worth the effort, so 4.x support is gone.
2020-09-06	Update ruby_version requirement to allow ruby 3.0
	Ruby just recently bumped the master version to 3.0. This requirement bump is necessary to test unicorn against ruby master. [ew: wrap at <80 columns for hackers with poor eyesight] Acked-by: Eric Wong <bofh@yhbt.net>
2020-03-19	http: improve RFC 7230 conformance
	We need to favor "Transfer-Encoding: chunked" over "Content-Length" in the request header if they both exist. Furthermore, we now reject redundant chunking and cases where "chunked" is not the final encoding. We currently do not and have no plans to decode "gzip", "deflate", or "compress" encoding as described by RFC 7230. That's a job more appropriate for middleware, anyways. cf. https://tools.ietf.org/html/rfc7230 https://www.rfc-editor.org/errata_search.php?rfc=7230
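The precedence rules above could be sketched in Ruby roughly as follows. The real checks live in unicorn's Ragel/C parser; `body_framing` and `BadRequest` are hypothetical names, and accepting other codings ahead of a final "chunked" without decoding them is an assumption of this sketch.

```ruby
# Hypothetical names; unicorn implements these checks in its C extension.
class BadRequest < StandardError; end

def body_framing(headers)
  te = headers["Transfer-Encoding"]
  if te
    codings = te.split(",").map { |c| c.strip.downcase }
    raise BadRequest, "chunked must be the final encoding" unless codings.last == "chunked"
    raise BadRequest, "redundant chunking" if codings.count("chunked") > 1
    # Transfer-Encoding wins: any Content-Length present is ignored.
    return [:chunked, nil]
  end

  cl = headers["Content-Length"]
  return [:none, nil] unless cl
  raise BadRequest, "invalid Content-Length" unless cl.match?(/\A\d+\z/)
  [:length, cl.to_i]
end

p body_framing("Transfer-Encoding" => "chunked", "Content-Length" => "5")
```

Favoring "chunked" when both headers are present matters because a proxy and a backend that disagree on body length can be tricked into seeing different request boundaries.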
2020-01-20	doc: s/bogomips.org/yhbt.net/g
	bogomips.org is due to expire, soon, and I'm not willing to pay extortionist fees to Ethos Capital/PIR/ICANN to keep a .org. So it's at yhbt.net, for now, but it will change again to whatever's affordable... Identity is overrated.
	Tor users can use .onions and kick ICANN to the curb:
	  torsocks w3m http://unicorn.ou63pmih66umazou.onion/
	  torsocks git clone http://ou63pmih66umazou.onion/unicorn.git/
	  torsocks w3m http://ou63pmih66umazou.onion/unicorn-public/
	While we're at it, `s/news.gmane.org/news.gmane.io/g', too. (but I suspect that'll need to be resynched since our mail "List-Id:" header is changing).
2018-12-26	use rb_gc_register_mark_object
	Since Ruby 2.6, it's a documented part of the API and we may depend on it: https://bugs.ruby-lang.org/issues/9894 It's been around since the early Ruby 1.9 days, and reduces overhead compared to relying on rb_global_variable: https://bogomips.org/unicorn-public/20170301002854.29198-1-e@80x24.org/
2018-12-12	deduplicate strings VM-wide in Ruby 2.5+
	String#-@ deduplicates strings starting with Ruby 2.5.0 Hash#[]= deduplicates strings starting in Ruby 2.6.0-rc1 This allows us to save a small amount of memory by sharing objects with other parts of the stack (e.g. Rack).
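A small Ruby illustration of the String#-@ behavior described above, assuming Ruby 2.5 or later. The header name is only an example value.

```ruby
# Two equal strings built at runtime are distinct objects...
a = String.new("HTTP_HOST")
b = String.new("HTTP_HOST")
distinct = !a.equal?(b)

# ...but String#-@ returns one shared, frozen instance for equal contents.
shared = (-a).equal?(-b)
frozen = (-a).frozen?

p distinct: distinct, shared: shared, frozen: frozen
```

Any other part of the stack that interns the same string this way ends up holding the same object, which is where the memory saving comes from.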
2017-12-16	avoid reusing env on hijack
	Hijackers may capture and reuse `env' indefinitely, so we must not use it in those cases for future requests. For non-hijack requests, we continue to reuse the `env' object to reduce memory recycling. Reported-and-tested-by: Sam Saffron <sam.saffron@gmail.com>
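The reuse policy can be pictured with a tiny Ruby sketch. `next_env` is a hypothetical helper and not unicorn's code; the hash contents are illustrative only.

```ruby
# Hypothetical helper: pick the env hash for the next request.
def next_env(env, hijacked)
  # A hijacker may still hold `env', so hand out a fresh Hash in that case;
  # otherwise clear and reuse the same object (Hash#clear returns self).
  hijacked ? {} : env.clear
end

env = { "PATH_INFO" => "/" }
reused = next_env(env, false)   # same object, now empty

captured = { "PATH_INFO" => "/ws" }
fresh = next_env(captured, true) # new object, `captured' is left untouched
p reused.equal?(env), fresh.equal?(captured), captured
```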
2017-10-03	fix GC issue on rb_global_variable array
	We need to add the array to ruby's global_list right after creating it; otherwise it probably gets GCed.
2017-03-08	unicorn_http: reduce rb_global_variable calls
	rb_global_variable registers the address of the variable which refers to the object, instead of the object itself. This adds extra overhead to each global variable for our case, where the variable is frozen and never changed. Given there are currently 59 elements in this array, this saves 58 singly-linked list entries and associated malloc calls and associated overhead in the current mainline Ruby 2.x implementation. On 64-bit GNU libc malloc, this is already 16 * 58 = 928 bytes; more than the extra object slot and array slack space used by the new mark array. Mainline Ruby 1.9+ currently has a rb_gc_register_mark_object public function which would suit our needs, too, but it is currently undocumented, and may not be available in the future.
2016-11-09	drop rb_str_set_len compatibility replacement
	While it is innocuous after compiling, it can be a confusing source of errors for users with broken installations of Ruby itself: https://bogomips.org/unicorn-public/5ace6a20-e094-293d-93df-b557480e12d5@anyces.com/ https://bogomips.org/unicorn-public/02994a55-9c07-a3c5-f06b-a4c15551a67e@anyces.com/ rb_str_set_len has been provided since Ruby 1.8.7+, so we have not needed it since we dropped all 1.8.x support in unicorn 5.x.
2016-10-20	Add some tolerance (RFC2616 sec. 19.3)
	Hi all. We're implementing client certificate authentication with nginx and unicorn. Nginx is configured in the following way: proxy_set_header X-SSL-Client-Cert $ssl_client_cert; When a client submits a certificate and nginx passes it to unicorn, unicorn responds with 400 (Bad Request). This is caused by nginx not using "\r\n"; it uses just "\n", so multiline headers fail to parse (I've added a test). According to RFC2616 section 19.3: https://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3 "The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR." CRLF changed to ("\r\n" | "\n") Github commit https://github.com/uno4ki/unicorn/commit/ed127b66e162aaf176de05720f6be758f8b41b1f PS: Googling "nginx unicorn ssl_client_cert" shows the problem.
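To illustrate the tolerance in Ruby (the actual change is in the Ragel grammar), here is a sketch that accepts either line terminator and joins folded continuation lines. `parse_headers` is a hypothetical helper, and joining continuations with a single space is an assumption of this sketch, not necessarily what unicorn does.

```ruby
# Hypothetical helper; unicorn's parser is generated from Ragel, not this code.
def parse_headers(raw)
  headers = {}
  last = nil
  # Accept "\r\n" or a bare "\n" as the line terminator (RFC 2616 sec. 19.3).
  raw.split(/\r?\n/).each do |line|
    break if line.empty?
    if last && line.start_with?(" ", "\t")
      # Folded continuation line: append to the previous field value.
      headers[last] << " " << line.strip
    else
      name, value = line.split(":", 2)
      last = name
      headers[last] = value.to_s.strip
    end
  end
  headers
end

p parse_headers("X-SSL-Client-Cert: -----BEGIN\n\tMIIB\n\tabcd\nHost: example\n\n")
```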
2015-12-13	http: TypedData C-API conversion
	This provides some extra type safety if combined with other C extensions, as well as allowing us to account for memory usage of the HTTP parser in ObjectSpace. This requires Ruby 1.9.3+ and has remained a stable API since then. This will become officially supported when Ruby 2.3.0 is released later this month. This API has only been documented in doc/extension.rdoc (formerly README.EXT) in the Ruby source tree since April 2015, r50318
2015-07-15	doc: remove references to old servers
	They'll continue to be maintained, but we're no longer advertising them. Also, favor lowercase "unicorn" while we're at it since that matches the executable and gem name to avoid unnecessary escaping for RDoc.
2015-06-06	http: move response_start_sent into the C ext
	Combined with the previous commit to eliminate the `@socket' instance variable, this eliminates the last instance variable in the Unicorn::HttpRequest class. Eliminating the last instance variable avoids the creation of an internal hash table used for implementing the "generic" instance variables found in non-pure-Ruby classes. Method entry overhead remains the same. While this change doesn't do a whole lot for unicorn memory usage where the HttpRequest is a singleton, it helps other HTTP servers which rely on this code where thousands of clients may be connected.
2015-05-29	http: use rb_hash_clear in Ruby 2.0+
	Calling the function directly avoids the overhead of Ruby method table lookup and global method cache. The only downside is this is now hidden from tracers and cannot be overridden from Ruby, but I doubt anybody cares about that.
2015-03-02	http: remove experimental dechunk! method
	It was never used anywhere AFAIK and wastes precious bytes.
2015-03-02	http: remove deprecated reset method
	We use the `clear' method everywhere nowadays.
2015-02-04	http: standalone require + reduction in binary size
	This allows requiring just the C extension part of "unicorn_http", without requiring the rest of unicorn, allowing other HTTP servers using the same parser to be slimmer. On my x86-64 Debian 7.0 system:
	   text    data     bss     dec     hex filename
	  44026    1976     488   46490    b59a lib/unicorn_http.so
	  43930    1976     456   46362    b51a lib/unicorn_http.so
2015-01-28	http: -Wshorten-64-to-32 warnings on clang
	Tested on x86_64, clang version 3.5-1ubuntu1 (trunk) (LLVM 3.5) These warnings were introduced on commit 4b2782a926d8f131b1e7382be35e3abb77bf4be5 ("http: reduce parser from 72 to 56 bytes on 64-bit") and did not affect any releases. These length checks should not be necessary in reality because HTTP header sizes never come close to 4GB in size. Fixup a minor coding style (inherited from Mongrel) violation while we're at it (tabs => spaces).
2014-09-17	http: reduce parser from 72 to 56 bytes on 64-bit
	This allows the parser struct to fit in one cache line on x86-64 systems where cache lines are 64 bytes. Using 32-bit integer lengths is safe here because these are only for tracking offsets within the HTTP header buffer. We can safely limit HTTP headers and in-memory buffers to be less than 4GB without anybody complaining. HTTP bodies continue to use off_t (usually 64-bit, even on 32-bit systems) sizes and support as much as the OS/hardware can handle.
2014-08-18	http: remove the keepalive requests limit
	This was a hack for some event loops such as those found in nginx and some Rainbows! concurrency models. Using epoll/kqueue with one-shot notification (which yahns does) avoids all fairness problems.
2014-05-29	http: remove xftrust options
	This has long been considered a mistake and not documented for very long. I considered removing X-Forwarded-Proto and X-Forwarded-SSL handling, too, so rack.url_scheme is always "http", but that might lead to compatibility issues in rare apps if Rack::Request#scheme is not used.
2013-10-26	license: allow all future versions of the GNU GPL
	There is currently no GPLv4, so this change has no effect at the moment. In case the GPLv4 arrives and I am not alive to approve/review it, the lesser of evils is to give blanket approval of all future GPL versions (as published by the FSF). The worse evil is to be stuck with a license which cannot guarantee the Free-ness of this project in the future. This unfortunately means the FSF can theoretically come out with license terms I do not agree with, but the GPLv2 and GPLv3 will always be an option to all users.
2013-05-08	HttpParser#next? becomes response_start_sent-aware
	This could allow servers with persistent connection support[1] to support our check_client_connection in the future. [1] - Rainbows!/zbatery, possibly others
2013-02-26	http: avoid frozen string bug in filter_body
	Our rb_str_modify() became no-ops due to incomplete reverts of workarounds for old Rubinius, causing rb_str_set_len to fail with:
	  can't set length of shared string (RuntimeError)
	This bug was introduced due to improper workarounds for old versions of Rubinius in 2009 and 2010:
	  commit 5e8979ad38efdc4de3a69cc53aea33710d478406 ("http: cleanups for latest Rubinius")
	  commit f37c23704cb73d57e9e478295d1641df1d9104c7 ("http: no-op rb_str_modify() for Rubies without it")
2013-02-24	httpdate: minor size reduction in DSO
	Extra pointers waste space in the DSO. Normally I wouldn't care, but the string lengths are identical and this code already made it into another project in this form.
	size(1) output:
	            text    data     bss     dec     hex filename
	  before:  42881    2040     336   45257    b0c9 unicorn_http.so
	  after:   42499    1888     336   44723    aeb3 unicorn_http.so
	ref: http://www.akkadia.org/drepper/dsohowto.pdf
2012-11-29	Begin writing HTTP request headers early to detect disconnected clients
	This patch checks incoming connections and avoids calling the application if the connection has been closed. It works by sending the beginning of the HTTP response before calling the application to see if the socket can successfully be written to. By enabling this feature users can avoid wasting application rendering time only to find the connection is closed when attempting to write, and throwing out the result. When a client disconnects while being queued or processed, Nginx will log HTTP response 499 but the application will log a 200. Enabling this feature will minimize the time window during which the problem can arise. The feature is disabled by default and can be enabled by adding 'check_client_connection true' to the unicorn config. [ew: After testing this change, Tom Burns wrote: So we just finished the US Black Friday / Cyber Monday weekend running unicorn forked with the last version of the patch I had sent you. It worked splendidly and helped us handle huge flash sales without increased response time over the weekend. Whereas in previous flash traffic scenarios we would see the number of HTTP 499 responses grow past the number of real HTTP 200 responses, over the weekend we saw no growth in 499s during flash sales. Unexpectedly the patch also helped us ward off a DoS attack where the attackers were disconnecting immediately after making a request. ref: <CAK4qKG3rkfVYLyeqEqQyuNEh_nZ8yw0X_cwTxJfJ+TOU+y8F+w@mail.gmail.com> ] Signed-off-by: Eric Wong <normalperson@yhbt.net>
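The early-write idea can be sketched in Ruby with a UNIX socket pair standing in for the client connection. `client_gone?` is a hypothetical helper, and the exact bytes written ("HTTP/1.1 ") are an assumption; the commit only says the beginning of the response is sent.

```ruby
require "socket"

# Hypothetical helper: try to write the start of the response before
# running the application, and treat a write failure as a disconnect.
def client_gone?(sock)
  sock.write("HTTP/1.1 ") # assumed prefix of the status line
  false
rescue Errno::EPIPE, Errno::ECONNRESET
  true
end

server_side, client_side = UNIXSocket.pair
before = client_gone?(server_side) # peer still connected
client_side.close
after = client_gone?(server_side)  # peer has hung up
p before: before, after: after
```

Over TCP through a proxy the first write after a disconnect may still succeed, so a check like this narrows the window rather than closing it, which matches the "minimize the time window" wording above.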
2012-04-17	http: increase REQUEST_PATH maximum length to 4K
	The previous REQUEST_PATH limit of 1024 is relatively small and some users encounter problems with long URLs. 4K is a common limit for PATH_MAX on modern GNU/Linux systems and REQUEST_PATH is likely to translate to a filesystem path name. Thanks to Nuo Yan <yan.nuo@gmail.com> and Lawrence Pit <lawrence.pit@gmail.com> for their feedback on this issue. ref: http://mid.gmane.org/CB935F19-72B8-4EC2-8A1D-5084B37C09F2@gmail.com
2011-08-29	add GPLv3 option to the license
	Existing license terms (Ruby-specific) and GPLv2 remain in place, but GPLv3 is preferred as it helps with distribution of AGPLv3 code and is explicitly compatible with Apache License (v2.0). Many more reasons are documented by the FSF: https://www.gnu.org/licenses/quick-guide-gplv3.html http://gplv3.fsf.org/rms-why.html ref: http://thread.gmane.org/gmane.comp.lang.ruby.unicorn.general/933
2011-07-13	http: reject non-LWS CTL chars (0..31 + 127) in field values
	RFC 2616 doesn't appear to allow most CTL bytes even though Mongrel always did. Rack::Lint disallows 0..31, too, though we allow "\t" (HT, 09) since it's LWS and allowed by RFC 2616.
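The byte ranges above translate to a short Ruby check. The real rule is part of the Ragel grammar; `valid_field_value?` and `CTL_EXCEPT_HT` are hypothetical names used only for this sketch.

```ruby
# Bytes 0..31 and 127, minus HT (0x09), which is allowed as LWS.
CTL_EXCEPT_HT = /[\x00-\x08\x0a-\x1f\x7f]/

# Hypothetical helper; compares on raw bytes so any encoding is accepted.
def valid_field_value?(value)
  !value.b.match?(CTL_EXCEPT_HT)
end

p valid_field_value?("text/html; charset=utf-8")
p valid_field_value?("tab\tseparated")
p valid_field_value?("nul\x00byte")
```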
2011-06-15	http: delay CoW string invalidations in filter_body
	Not all invocations of filter_body will trigger CoW on the given destination string. We can also avoid an unnecessary rb_str_set_len() in the non-chunked path, too.
2011-06-15	http: remove tainting flag
	Needless line noise, kgio doesn't support tainting anyways.
2011-06-14	http: fix documentation for dechunk!
	chunk_ready! was my original name for it, but I'm indecisive when it comes to naming things.
2011-06-13	http: dechunk! method to enter dechunk mode
	This allows one to enter the dechunker without parsing HTTP headers beforehand. Because header parsing is skipped, trailer parsing is not supported: we don't know what trailers might be (to our knowledge, nobody uses trailers anyways).
2011-06-13	http: document reasoning for memcpy in filter_body
	copy-on-write behavior doesn't help you if your common use case triggers copies.
2011-06-13	http: rename variables in filter_body implementation
	Makes things easier-to-understand since it's based on memcpy()
2011-05-23	http: call rb_str_modify before rb_str_resize
	Ruby 1.9.3dev (trunk) requires it if the string size is unchanged.