Timeouts longer than expected.
- by Antony Gelberg @ 01/29 13:06 UTC - next

Hi all.  We use unicorn in production across four servers, behind
nginx and HAProxy (to load balance).  We set a 300s timeout in the
config file.  On an average day, we see something like:

E, [2015-01-22T17:01:24.335600 #4311] ERROR -- : worker=2 PID:21101
timeout (301s > 300s), killing

in our logs, ~60 times.  However, one day we noticed that production
had gone down with all the unicorns showing entries like the
following:

E, [2015-01-22T12:35:15.643020 #4311] ERROR -- : worker=1 PID:4605
timeout (451s > 300s), killing

(note the 451s before it was killed).

Any clues how we can better-understand the root cause, even if it's
something we'll put in place for the future?  We lack visibility here.

Thanks in advance,

Antony

message raw reply threadlink

  Re: Timeouts longer than expected.
  - by Eric Wong @ 01/29 20:00 UTC - next/prev

  Antony Gelberg <antony.gelberg@gmail.com> wrote:
  > <Hi all. We use unicorn in production across four servers, behind > ...>
  
  What other log entries appear in the 8 minutes leading up to the 451s
  line?
  
  Which version of unicorn is this?
  
  There were some bugs fixed in ancient unicorn versions; and current
  versions are still affected by big NTP adjustments due to the lack of
  monotonic clock API in older Rubies.
  
  I have a patch queued up for 5.x to use the monotonic clock, but
  requires Ruby 2.1+ to be useful:
  http://bogomips.org/unicorn-public/m/20150118033916.GA4478%40dcvr.yhbt.net.html
  
  Maybe there's another sleep calculation bug, though.  Not many people
  rely on giant timeouts and probably never notice it.
  
  I'll take a closer look at the master sleep calculations (in
  murder_lazy_workers) in a week or so and see if there's another problem
  lurking in the current versions.
  
  > Any clues how we can better-understand the root cause, even if it's
  > something we'll put in place for the future?  We lack visibility here.
  
  I'd use an in-process timeout to get backtraces.  The
  Application_Timeouts doc in the source and online has more
  tips on dealing with timeouts in general.
  
    http://unicorn.bogomips.org/Application_Timeouts.html
  
  Also curious: you have users willing to wait 5 minutes for an
  HTTP response?  Yikes!

  message raw reply parent threadlink

[PATCH] http: -Wshorten-64-to-32 warnings on clang
- by Eric Wong @ 01/28 18:44 UTC - next/prev

Tested on x86_64, clang version 3.5-1ubuntu1 (trunk) (LLVM 3.5)
These warnings were introduced on
commit 4b2782a926d8f131b1e7382be35e3abb77bf4be5
("http: reduce parser from 72 to 56 bytes on 64-bit")
and did not affect any releases.

These length checks should not be necessary in reality because
HTTP header sizes never come close to 4GB in size.

Fixup a minor coding style (inherited from Mongrel) violation
while we're at it (tabs => spaces).

more... raw reply threadlink

global variable in after_fork
- by Joe Williams @ 01/23 18:32 UTC - next/prev

I am attempting to do something like the following:

after_fork do |server, worker|
  $unicorn_uptime = Time.now.to_i.to_s
  $unicorn_uptime.freeze
<snip>

Unfortunately the variable comes up nil when later accessed. Anyone have
thoughts on what I am missing?

Thanks!
-Joe

message raw reply threadlink

  Re: global variable in after_fork
  - by Eric Wong @ 01/24 23:30 UTC - next/prev

  Joe Williams <williams.joe@gmail.com> wrote:
  > I am attempting to do something like the following:
  > 
  > after_fork do |server, worker|
  >   $unicorn_uptime = Time.now.to_i.to_s
  >   $unicorn_uptime.freeze
  > <snip>
  > 
  > Unfortunately the variable comes up nil when later accessed. Anyone have
  > thoughts on what I am missing?
  
  I tried your after_fork hook with the following config.ru
  and it prints out $unicorn_uptime fine:
  
  -------------------- config.ru --------------------------
  use Rack::ContentLength
  use Rack::ContentType, 'text/plain'
  run lambda { |env| [ 200, {}, [ "#$unicorn_uptime\n"] ] }

  more... raw reply parent threadlink

[RFC] remove old inetd+git examples and exec_cgi
- by Eric Wong @ 01/22 05:12 UTC - next/prev

I'm tempted to remove these for 5.x, but maybe somebody depends on the
lib/unicorn/app/* portions somewhere... Does anybody care?

Subject: [PATCH] remove old inetd+git examples and exec_cgi

While it was technically interesting and fun to tunnel arbitrary
protocols over a semi-compliant Rack interface, nobody actually does
it (and anybody who does can look in our git history).  This was
from back in 2009 when this was one of the few servers that could
handle chunked uploads, nowadays everyone does it! (or do they? :)

A newer version of exec_cgi.rb still lives on in the repository of
yet another horribly-named server, but there's no point in bloating
the installation footprint of somewhat popular server such as unicorn.

more... raw reply threadlink

[PATCH] use the monotonic clock under Ruby 2.1+
- by Eric Wong @ 01/18 03:39 UTC - next/prev

The monotonic clock is immune to time adjustments so it is not
thrown off by misconfigured clocks.  Process.clock_gettime also
generates less garbage on 64-bit systems due to the use of Flonum.

more... raw reply threadlink

[PUSHED] switch docs + website to olddoc
- by Eric Wong @ 01/10 04:49 UTC - next/prev

wrongdoc was difficult to maintain because of the tidy-ffi
dependency and the HTML5 changes in Darkfish could not be
handled well by Tidy.

olddoc is superior as it generates leaner HTML which loads faster,
requires less scrolling and less processing power to render.
Aesthetic comparisons are subjective of course but completely
unimportant compared to speed and accessibility.

The presence of images and CSS on the old (Darkfish-based) site
probably set unreasonable expectations as to my ability and
willingness to view such things.  No more, the new website is
entirely simple HTML which renders well with even the wimpiest
browser.

http://bogomips.org/unicorn.git/patch?id=8e52784d

message raw reply threadlink

[PATCH] README: clarify/reduce references to unicorn_rails
- by Eric Wong @ 01/10 04:46 UTC - next/prev

unicorn_rails is an ancient compatibility wrapper for ancient
versions of Rails which did not use Rack.  Those applications have
likely moved on, so stop promoting unicorn_rails.

more... raw reply threadlink

Unicorn workers timing out intermittently
- by Aaron Price @ 12/30 16:38 UTC - next/prev

Hey guys,

I'm having the following issue and I was hoping you might be able to shed
some light on the issue. Maybe give some suggestions on how to debug the
actual problem.

http://serverfault.com/questions/655430/unicorn-workers-timing-out-intermittently

Any help is appreciated.

Cheers,

Aaron.

message raw reply threadlink

  Re: Unicorn workers timing out intermittently
  - by Eric Wong @ 12/30 18:12 UTC - next/prev

  Aaron Price <aprice@flipgive.com> wrote:
  > Hey guys,
  > 
  > I'm having the following issue and I was hoping you might be able to shed
  > some light on the issue. Maybe give some suggestions on how to debug the
  > actual problem.
  > 
  > http://serverfault.com/questions/655430/unicorn-workers-timing-out-intermittently
  
  (quoting the problem in full to save folks from having to load the page)
  
  > <I'm getting intermittent timeouts from unicorn workers for seemingly ...>
  
  OK, the problem should be reproducible hitting unicorn directly (easy to
  configure a TCP listener) and eliminating nginx as a possible
  point-of-failure.  Also, I suggest using curl so you can script and see
  everything going on more easily at the protocol/syscall level.
  
  >    Again. There's no load on the server at all. The only requests going
  >    through are my own and every 10 - 20 requests at random have this same
  >    problem.
  
  OK, the main culprit is usually external connections acting up.  Postgres,
  external monitoring services, network failures, etc...
  
  >    It doesn't look like unicorn is eating memory at all. I know this
  >    because I'm using watch -n 0.5 free -m and this is the result.
  
  What does the CPU usage look like?
  
  >              total       used       free     shared    buffers     cached
  > Mem:          1995        765       1229          0         94        405
  > -/+ buffers/cache:        264       1730
  > Swap:          511          0        511
  > 
  >    So the server isn't running out of memory.
  > 
  >    Is there anything further I can do to debug this issue? or any insight
  >    into what's happening?
  
  Basically, minimize and isolate things.  Use a single worker, strace
  that worker ("strace -f -p $PID_OF_WORKER") and send a request to it
  using curl.
  
  Is the Postgres doing OK?  How is the network connection to that?
  What other external resources (monitoring services, memcached, redis,
  etc...) might you be hitting?  You'll see blocking (on things like
  read*/write*/select/poll/recvmsg/sendmsg) on strace.
  
  You can map numeric file descriptors to ports via:
   lsof -p $PID_OF_WORKER
  
  Also, is your server running out of entropy?  (You'll see blocking on
  reading /dev/random via strace) and you may also see low values in
  /proc/sys/kernel/random/entropy_avail

  message raw reply parent threadlink

    Re: Unicorn workers timing out intermittently
    - by Aaron Price @ 12/30 20:02 UTC - next/prev

    Thank for your help, Eric.
    
    I was able to pinpoint the issue, and I've posted the solution.
    
    The issue was that I wasn't closing a database connection for code being
    run in custom Rack middleware. Which is why the logs didn't show anything.
    
    http://serverfault.com/questions/655430/unicorn-workers-timing-out-intermittently/655478#655478
    
    Thanks again.
    
    
    On Tue, Dec 30, 2014 at 1:12 PM, Eric Wong <e@80x24.org> wrote:
    
    > <Aaron Price <aprice@flipgive.com> wrote: > > Hey guys, > ...>

    more... raw reply parent threadlink

Re: [RFC] http: TypedData C-API conversion
- by Eric Wong @ 12/28 07:06 UTC - next/prev

Eric Wong <e@80x24.org> wrote:
> Note: this means we are finally dropping Ruby 1.8 support as
> TypedData requires Ruby 1.9 and later.

Unfortunately, TypedData is not an official API, yet,
so I'm hesitant to depend on it until it is official:
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/67089

But there ought to be tiny gains made from dropping 1.8 support,
so I'll probably go ahead and do it...

message raw reply parent threadlink

  [PATCH] tmpio: drop the "size" method
  - by Eric Wong @ 12/28 07:17 UTC - next/prev

  Eric Wong <e@80x24.org> wrote:
  > But there ought to be tiny gains made from dropping 1.8 support,
  > so I'll probably go ahead and do it...
  
  Tiny stuff like this:
  ----------------------------8<-------------------------
  From: Eric Wong <e@80x24.org>
  Subject: [PATCH] tmpio: drop the "size" method
  
  It is redundant given the existence of File#size in Ruby 1.9+
  
  This saves 1440 bytes of bytecode on x86-64 under 2.2.0,
  and at least another 120 bytes for the method entry,
  hash table entry, and method definition overhead.

  more... raw reply parent threadlink

Rails/Unicorn comparizon between ruby 2.1.5 and ruby 2.2.0
- by Bráulio Bhavamitra @ 12/26 12:17 UTC - next/prev

Hello all,

A quick comparison the average GC time per request reported by Newrelic:

Ruby 2.1.5, Rails 3.2.21 and Unicorn: ~15ms
Ruby 2.1.5 (with gctools middleware), Rails 3.2.21 and Unicorn: ~4ms
Ruby 2.2.0, Rails 3.2.21 and Unicorn: ~8ms

We see that ruby 2.2.0 improved, but still an out-of-band is even better.

cheers,
bráulio

message raw reply threadlink

[PATCH] remove SSL support
- by Eric Wong @ 12/21 11:17 UTC - next/prev

We implemented barely-advertised support for SSL for two reasons:

1) to detect corruption on LANs beyond what TCP offers
2) to support other servers based on unicorn (never happened)

Since this feature is largely not useful for unicorn itself,
there's no reason to penalize unicorn 5.x users with bloat.

In our defense, SSL support appeared in version 4.2.0 :)

more... raw reply threadlink

[PATCH] t/t0002-parser-error.sh: relax test for rack 1.6.0
- by Eric Wong @ 12/21 10:43 UTC - next/prev

This overly zealous test was broken by:
rack commit be28c6a2ac152fe4adfbef71f3db9f4200df89e8
("update HTTP status codes to IETF RFC 7231")

more... raw reply threadlink

Show request details on timeout killing
- by Bráulio Bhavamitra @ 12/04 21:15 UTC - next/prev

Hello all,

Currently, unicorn kills a worker that reached timeout with the
following message:
E, [2014-12-04T19:12:23.646053 #32612] ERROR -- : worker=4 PID:11911
timeout (61s > 60s), killing

How can I see which URL was being processed by that worker?

I need to identify the problematic request.

cheers,
bráulio

message raw reply threadlink

  Re: Show request details on timeout killing
  - by Eric Wong @ 12/05 01:06 UTC - next/prev

  Bráulio Bhavamitra <braulio@eita.org.br> wrote:
  > Hello all,
  > 
  > Currently, unicorn kills a worker that reached timeout with the
  > following message:
  > E, [2014-12-04T19:12:23.646053 #32612] ERROR -- : worker=4 PID:11911
  > timeout (61s > 60s), killing
  > 
  > How can I see which URL was being processed by that worker?
  
  I suggest adding better logging to your app, perhaps to log when
  requests start.
  
  > I need to identify the problematic request.
  
  And to use application-level timeouts, see:
  
  	http://unicorn.bogomips.org/Application_Timeouts.html
  
  The SIGKILL timeout in unicorn is only a last resort when the Ruby VM
  is broken beyond repair and cannot respond to any signals[1].
  
  [1] SIGKILL and SIGSTOP are special signals which the kernel
      enforces, Ruby has no chance to block/catch/ignore them.

  message raw reply parent threadlink

    Re: Show request details on timeout killing
    - by Bráulio Bhavamitra @ 12/05 21:58 UTC - next/prev

    Ok Eric, the problem is that some requests are in a infinite loop,
    consuming a lot of resources.
    
    Rails/nginx already log many request on production.log/access.log, but
    I can't know which of them timeouted.
    
    cheers,
    bráulio
    
    On Thu, Dec 4, 2014 at 10:06 PM, Eric Wong <e@80x24.org> wrote:
    > <Bráulio Bhavamitra <braulio@eita.org.br> wrote: >> ...>

    message raw reply parent threadlink

      Re: Show request details on timeout killing
      - by Eric Wong @ 12/05 22:16 UTC - next/prev

      Bráulio Bhavamitra <braulio@eita.org.br> wrote:
      > Rails/nginx already log many request on production.log/access.log, but
      > I can't know which of them timeouted.
      
      Can't Rails log when a request starts and finishes?
      (I haven't touched Rails in years)
      
      Look for starts on PIDs without finishes.
      
      Something along the lines of the following middlware (totally untested):
      
          class LogBeforeAfter
            def initialize(app)
              @app = app
            end
      
            def call(env)
              env["rack.logger"].info("#$$ start #{env['PATH_INFO']}")
              response = @app.call(env)
              env["rack.logger"].info("#$$   end #{env['PATH_INFO']}")
              response
            end
          end
      
          -------------- config.ru ---------------
          use LogBeforeAfter
          # other middlewares...
          run YourApp.new
      
      See Rack::CommonLogger source for more examples/detail.

      message raw reply parent threadlink

Re: No, passenger 5.0 is not faster than unicorn :)
- by Hongli Lai @ 12/03 11:00 UTC - next/prev

Unicorn *is* in general very good and very efficient, no doubt about that.
Eric Wong has made great design choices and is an excellent programmer.

Having said that, in certain specific cases there's still room for
improvement. That's why we focused so much on microoptimizations and
specific optimizations like turbocaching. Have you followed Phusion
Passenger's Server Optimization Guide?
https://www.phusionpassenger.com/documentation/ServerOptimizationGuide.html

Also, you have to ensure that your Rails app sets the correct caching
headers. By default, Rails sets "Cache-Control: private, no-store" so that
the turbocache cannot kick in. You should see very different results if you
add "headers['Cache-Control'] = 'public'" to your Rails app. If you need
any help with this, please feel free to contact me off-list. I'd be happy
to help. We have also a benchmarking kit so that you can double check the
results; email me if you're interested in this.

As Sam said, most of the time will be spent in the Rails app. But
turbocaching is one notable exception: it's the one feature that can speed
things up even if your app is slow - provided that you set HTTP caching
headers correctly.

Unicorn is excellent at what it does: it's a minimal server with a specific
I/O model that is supposed to be used behind a buffering reverse proxy.
There is nothing wrong with that, and for the workloads that it's designed
for, it's great. Phusion Passenger has merely chosen a non-generalist
approach that aims to squeeze additional performance from specific cases.
Of course, nothing's a silver bullet. Like any tool, it only works if you
use it correctly.

On Wed, Dec 3, 2014 at 10:50 AM, Bráulio Bhavamitra <braulio@eita.org.br>
wrote:

> <Hello all, > > I've just tested a one instance each (one ...>

more... raw reply parent threadlink

  Re: No, passenger 5.0 is not faster than unicorn :)
  - by Sam Saffron @ 12/03 11:05 UTC - next/prev

  Yeah, anonymous caching is super critical, we monkey it in here:
  https://github.com/discourse/discourse/blob/master/lib/middleware/anonymous_cache.rb
  to be honest this really should be part of rails.
  
  On Wed, Dec 3, 2014 at 10:00 PM, Hongli Lai <hongli@phusion.nl> wrote:
  > <Unicorn *is* in general very good and very efficient, no doubt about ...>

  message raw reply parent threadlink

    Re: No, passenger 5.0 is not faster than unicorn :)
    - by Xavier Noria @ 12/03 12:42 UTC - next/prev

    I've often found Cache-Control: public to be of limited use in practice
    because you cannot invalidate that cache by hand. (Sometimes that's fine of
    course.)
    
    While the reverse proxy cache could provide a mechanism for explicit
    expiration, there may be caches between your servers and the client, a
    corporate cache for example. Those other ones are out of your control.
    
    I suggested an extension for Rack::Cache called r-maxage that got merged:
    
        https://github.com/rtomayko/rack-cache/pull/55
    
    With that directive you trade the benefits of caching in intermediate
    proxies with more expiration control locally.

    message raw reply parent threadlink

  Re: No, passenger 5.0 is not faster than unicorn :)
  - by Bráulio Bhavamitra @ 12/03 14:10 UTC - next/prev

  Hello Hongli,
  
  Thank you the guide, I've already learned a bit from it.
  
  We already use nginx for static files and ssl and varnish for caching
  public pages, so maybe turbocaching won't help too much.
  
  In this test I've tested passenger in standalone mode (--max-pool-size
  1) and unicorn with one worker. On a slow page, the variation was
  minimal (~8.26 req/s in unicorn and ~8.11 in passenger). I haven't
  tested fast and cacheable page.
  
  Also, I've used ab for benchmarking. Next time will try wrk.
  
  cheers,
  bráulio
  
  On Wed, Dec 3, 2014 at 8:00 AM, Hongli Lai <hongli@phusion.nl> wrote:
  > <Unicorn *is* in general very good and very efficient, no doubt about ...>

  more... raw reply parent threadlink


page: next      atom permalink
- unicorn Rack HTTP server user/dev discussion
A public-inbox, anybody may post in plain-text (not HTML):
unicorn-public@bogomips.org
git URL for ssoma: git://bogomips.org/unicorn-public.git
homepage: http://unicorn.bogomips.org/
subscription optional: unicorn-public+subscribe@bogomips.org