Rainbows! Rack HTTP server user/dev discussion
From: Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org>
To: Rainbows! list <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Cc: Cody Fauser <cody.fauser-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>,
	ops <ops-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>,
	Harry Brundage
	<harry.brundage-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>,
	Jonathan Rudenberg
	<jonathan.rudenberg-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>
Subject: Re: Unicorn is killing our rainbows workers
Date: Thu, 19 Jul 2012 00:26:41 +0000
Message-ID: <20120719002641.GA17210@dcvr.yhbt.net>
In-Reply-To: <CAFFC5+N=_bnyM=0WbtLxPAncs0TV4wA9P8TXZ_-T3qOtW-+w3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Samuel Kadolph <samuel.kadolph-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org> wrote:
> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> > Samuel Kadolph <samuel.kadolph-/3HedJEncLlQ0OI7PeSoCw@public.gmane.org> wrote:
> >> Hey rainbows-talk,
> >>
> >> We have 40 servers that each run rainbows with 2 workers with 100
> >> threads using ThreadPool. We're having an issue where unicorn is
> >> killing the worker process. We use ThreadTimeout (set to 70 seconds)
> >> and originally had the unicorn timeout set to 150 seconds and we're
> >> seeing unicorn eventually killing each worker. So we bumped the
> >> timeout to 300 seconds and it took about 5 minutes but we started
> >> seeing unicorn starting to kill workers again. You can see our stderr
> >> log file (timeout at 300s) at
> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
> >> unicorn is killing our ThreadPool workers would help us greatly. If
> >> you require additional info I would be happy to provide it.

Also, are you using "preload_app true" ?

I'm a bit curious how these messages are happening, too:
D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
suspend/hibernation

Can you tell (from Rails logs) whether the to-be-killed workers were
still processing requests/responses in the 300s before the unicorn
timeout hit them?  AFAIK, Rails logs the PID of each worker processing
a request.
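
If the log lines use Ruby's standard Logger layout, like the DEBUG line
quoted above (severity letter, timestamp, then "#PID"), pulling out
everything a single worker PID logged takes only a few lines.  This is a
rough sketch under that assumption: LOGGER_LINE and entries_for_pid are
hypothetical names, and if your Rails log lines are formatted
differently the regex will need adjusting.

```ruby
# Hypothetical helper, assuming Ruby Logger's default layout, e.g.:
#   D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : message
LOGGER_LINE = /\A[A-Z], \[(?<time>\S+) #(?<pid>\d+)\]\s+(?<level>\w+) -- [^:]*: (?<msg>.*)\z/

# Returns [timestamp, message] pairs logged by the given worker PID,
# skipping any line that does not match the expected layout.
def entries_for_pid(lines, pid)
  lines.each_with_object([]) do |line, found|
    m = LOGGER_LINE.match(line.chomp) or next
    found << [m[:time], m[:msg]] if m[:pid].to_i == pid
  end
end

log = [
  "D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after suspend/hibernation",
  "I, [2012-07-18T15:12:44.000001 #17300]  INFO -- : Completed 200 OK",
]
p entries_for_pid(log, 17213)
# => [["2012-07-18T15:12:43.185808", "waiting 151.5s after suspend/hibernation"]]
```

Comparing the last timestamp for a killed PID against the time of the
"timeout ... killing" message shows whether it was still busy.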

Also, what in your app takes 150s, or even 70s?  I'm curious why the
timeouts are so high.  I wonder if there are bugs with unicorn/rainbows
with huge timeout values, too...

If anything, I'd lower the unicorn timeout to something short (maybe
5-10s), since it is there to detect hard lockups at the VM level.
Individual requests in Rainbows! _are_ allowed to take longer than the
unicorn timeout.
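
To illustrate the distinction, here is a toy model, not unicorn's or
Rainbows!' actual code.  It assumes the worker's main loop bumps a
heartbeat once per second while requests run in separate pool threads,
and that the master kills any worker whose heartbeat is older than the
timeout.  ToyWorker, hung? and simulate are hypothetical names, and time
is simulated in whole seconds.

```ruby
# Hypothetical toy model: the worker's main loop records a heartbeat;
# request threads never touch it.
class ToyWorker
  attr_reader :last_tick

  def initialize(now)
    @last_tick = now
  end

  def tick!(now)
    @last_tick = now
  end
end

# Master-side check: a worker counts as hung once its heartbeat is
# older than the timeout.
def hung?(worker, now, timeout)
  now - worker.last_tick > timeout
end

# Simulates a request lasting request_seconds.  If vm_locked_at is given,
# the main loop stops ticking from that second on (e.g. something that
# blocks the whole VM).  Returns the second at which the master would
# kill the worker, or nil if it never would.
def simulate(timeout, request_seconds, vm_locked_at = nil)
  worker = ToyWorker.new(0)
  (1..request_seconds).each do |now|
    locked = vm_locked_at && now >= vm_locked_at
    worker.tick!(now) unless locked
    return now if hung?(worker, now, timeout)
  end
  nil
end

p simulate(10, 70)      # => nil (a 70s request alone never trips a 10s timeout)
p simulate(10, 70, 20)  # => 30 (last tick at 19s, and 30 - 19 > 10)
```

Under this model, a long request by itself never trips a short timeout;
only something that stops the main loop from running does.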

Can you reproduce this in a simulation environment, or only with real
traffic?  If possible, can you set up an instance with a single worker
process and get an strace ("strace -f") of all the threads when this
happens?

> We're running ruby 1.9.3-p125 with the performance patches at
> https://gist.github.com/1688857.

Can you reproduce this with an unpatched 1.9.3-p194?  I'm not too
familiar with the performance patches, but I'd like to reduce the amount
of less-common/tested code to isolate the issue.

> I listed the gems we use and which
> ones that have c extension at https://gist.github.com/3139226.

Fortunately, I'm familiar with nearly all of these C gems.

Newer versions of mysql2 should avoid potential issues with
ThreadTimeout/Timeout (or anything that hits Thread#kill).  I think
mysql2 0.2.9 fixed a fairly important bug, and 0.2.18 fixed a very rare
(but possibly related to your issue) bug.
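
When checking which mysql2 version is actually deployed, compare
versions with RubyGems' Gem::Version rather than as strings, since
"0.2.9" sorts after "0.2.18" as a string.  A small sketch:
MYSQL2_MINIMUM and mysql2_recent_enough? are hypothetical names, and
0.2.18 is simply the newest fix mentioned above.

```ruby
require "rubygems"

# Hypothetical floor: the newest mysql2 fix mentioned in this thread.
MYSQL2_MINIMUM = Gem::Version.new("0.2.18")

# True if the given mysql2 version string is at or above the floor.
def mysql2_recent_enough?(version_string)
  Gem::Version.new(version_string) >= MYSQL2_MINIMUM
end

p mysql2_recent_enough?("0.2.9")   # => false
p mysql2_recent_enough?("0.2.18")  # => true
p "0.2.9" >= "0.2.18"              # => true (why plain string comparison is wrong here)
```

In a Gemfile, the same floor can be expressed as
gem "mysql2", ">= 0.2.18".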

Unrelated to your current issue, I strongly suggest Ruby 1.9.3-p194;
previous versions had a nasty GC memory corruption bug triggered
by Nokogiri (ref: https://github.com/tenderlove/nokogiri/issues/616).
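
To confirm what a given box is actually running (1.9.3-p125 in this
thread), RUBY_VERSION and RUBY_PATCHLEVEL are enough.  A minimal sketch;
ruby_193_needs_upgrade? is a hypothetical name, and it only covers the
1.9.3 series discussed here.

```ruby
# Hypothetical helper: true for 1.9.3 builds older than p194.
# Other version series are deliberately not covered by this check.
def ruby_193_needs_upgrade?(version, patchlevel)
  version == "1.9.3" && patchlevel < 194
end

if ruby_193_needs_upgrade?(RUBY_VERSION, RUBY_PATCHLEVEL)
  warn "Ruby #{RUBY_VERSION}-p#{RUBY_PATCHLEVEL}: consider upgrading to 1.9.3-p194"
end
```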

I also have no idea why mongrel is in there :x

> We'll try running without the ThreadTimeout. We don't think we're
> having deadlock issues because our stress tests do not timeout but
> they do 502 when the rainbows worker gets killed during a request.

OK.  I'm starting to believe ThreadTimeout isn't good for the majority
of applications out there, and perhaps the only way is to have support
for this tightly coupled with the VM.  Even then, "ensure" clauses would
still be tricky/ugly to deal with...  So maybe forcing developers to use
app/library-level timeouts for everything they do is the only way.
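
As an example of an app/library-level timeout, the code doing the I/O
can bound its own wait and decide what to do when time runs out,
instead of having another thread interrupt it via Thread#raise or
Thread#kill at an arbitrary point.  A minimal sketch using IO.select;
read_with_deadline is a hypothetical name, and a real client library
would also need to handle partial reads/writes and connect timeouts.

```ruby
# Reads up to maxlen bytes from io, waiting at most `seconds`.
# Returns nil on timeout instead of being interrupted from outside.
def read_with_deadline(io, maxlen, seconds)
  ready = IO.select([io], nil, nil, seconds)
  return nil unless ready  # timed out: the caller decides what happens next
  # IO.select reported readability, so readpartial returns the buffered
  # data promptly (or raises EOFError at end of file).
  io.readpartial(maxlen)
end

r, w = IO.pipe
p read_with_deadline(r, 16, 0.1)  # => nil (nothing written yet)
w.write("hello")
p read_with_deadline(r, 16, 0.1)  # => "hello"
```

Since the timeout is just a return value, nothing is interrupted from
another thread, and ensure clauses in the caller run normally.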
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


Thread overview: 17+ messages
2012-07-18 18:52 Unicorn is killing our rainbows workers Samuel Kadolph
2012-07-18 19:20   ` Jason Lewis
2012-07-18 21:52   ` Eric Wong
2012-07-18 23:06       ` Samuel Kadolph
2012-07-19  0:26           ` Eric Wong [this message]
2012-07-19 14:29               ` Samuel Kadolph
2012-07-19 20:16                   ` Eric Wong
2012-07-19 20:57                       ` Samuel Kadolph
2012-07-19 21:31                           ` Eric Wong
2012-07-20  0:23                               ` Samuel Kadolph
2012-07-26 23:48                                   ` Eric Wong
2012-07-27  0:00                                       ` Samuel Kadolph
2012-07-27  0:11                                           ` Eric Wong
2012-07-27 20:01                                               ` Samuel Kadolph
2012-07-27 20:40                                                   ` Eric Wong
2012-07-31 14:09                                                       ` Samuel Kadolph
2012-07-31 20:28                                                           ` Eric Wong
