about summary refs log tree commit homepage
path: root/ext/unicorn_http/gperf.rb
diff options
context:
space:
mode:
authorEric Wong <e@80x24.org>2019-07-04 03:49:51 +0000
committerEric Wong <e@80x24.org>2019-07-04 21:55:33 +0000
commit1cf8ca2c6e57cf8cd9794d5bb6bb4f8b22711560 (patch)
treee5b1a6869597728228cb855613bb3a62671207b2 /ext/unicorn_http/gperf.rb
parent10142db47f4b03eb00749feda660df567fde7276 (diff)
downloadunicorn-1cf8ca2c6e57cf8cd9794d5bb6bb4f8b22711560.tar.gz
GNU gperf is a commonly-used tool for generating perfect hashes
and available on every platform unicorn runs on.  C Ruby, gcc,
glibc all already use it.

Using a hash lookup instead of a linear scan already shows
measurable improvements when memoized header keys are all
used:

* test/benchmark/http_parser.rb (no options):

   100000 iterations
         user     system      total        real
  -  0.411857   0.000200   0.412057 (  0.412070)
  +  0.397960   0.000181   0.398141 (  0.398149)

Results which require generating a new string from an unmemoized
header is less significant, but still consistent measurable:

* test/benchmark/http_parser.rb -H 'DNT: 1'

   100000 iterations
         user     system      total        real
  -  0.461416   0.000000   0.461416 (  0.461417)
  +  0.461329   0.000000   0.461329 (  0.461363)

Most importantly, this change allows us to memoize more keys
without worrying too much about the overhead of a O(n) scan.
Diffstat (limited to 'ext/unicorn_http/gperf.rb')
-rw-r--r--ext/unicorn_http/gperf.rb27
1 files changed, 27 insertions, 0 deletions
diff --git a/ext/unicorn_http/gperf.rb b/ext/unicorn_http/gperf.rb
new file mode 100644
index 0000000..9765f86
--- /dev/null
+++ b/ext/unicorn_http/gperf.rb
@@ -0,0 +1,27 @@
+#!/usr/bin/ruby -w
+buf = STDIN.read # output of: gperf ext/unicorn_http/common_fields.gperf
+
+# this is supposed to fail if it doesn't subsitute anything:
+print buf.sub!(
+
+# make sure all functions are static
+/\nstruct \w+ \*\n(\w+_)?lookup/) {
+  "\nstatic#$&"
+}.
+
+gsub!(
+# gperf 3.0.x used "(int)(long)", 3.1 uses "(int)(size_t)",
+#  input: {(int)(size_t)&((struct cf_pool_t *)0)->cf_pool_str3},
+# output: {offsetof(struct cf_pool_t, cf_pool_str3)},
+/{\(int\)\(\w+\)\&\(\((struct \w+) *\*\)0\)->(\w+)}/) {
+  "{offsetof(#$1, #$2)}"
+}.
+
+# make sure everything is 64-bit safe and compilers don't truncate
+gsub!(/\b(?:unsigned )?int\b/, 'size_t').
+
+# This isn't need for %switch%, but we'll experiment with to see
+# if it's necessary, or not.
+# don't give compilers a reason to complain, (struct foo *)->name
+# is size_t, so unused slots should be size_t:
+gsub(/\{-1\}/, '{(size_t)-1}')