From: Samuel Kadolph
Newsgroups: gmane.comp.lang.ruby.rainbows.general
Subject: Re: Unicorn is killing our rainbows workers
Date: Thu, 19 Jul 2012 20:23:35 -0400
References: <20120718215222.GA11539@dcvr.yhbt.net>
 <20120719002641.GA17210@dcvr.yhbt.net>
 <20120719201633.GA8203@dcvr.yhbt.net>
 <20120719213125.GA17708@dcvr.yhbt.net>
In-Reply-To: <20120719213125.GA17708-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>
To: "Rainbows! list"
Cc: Cody Fauser, ops, Harry Brundage, Jonathan Rudenberg
On Thu, Jul 19, 2012 at 5:31 PM, Eric Wong wrote:
> Samuel Kadolph wrote:
>> On Thu, Jul 19, 2012 at 4:16 PM, Eric Wong wrote:
>> > Samuel Kadolph wrote:
>> > > On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong wrote:
>> > > > Samuel Kadolph wrote:
>> > > >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong wrote:
>> > > >> > Samuel Kadolph wrote:
>> > > >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
>> > > >> >> unicorn is killing our ThreadPool workers would help us greatly. If
>> > > >> >> you require additional info I would be happy to provide it.
>> > > >
>> > > > Also, are you using "preload_app true" ?
>> > >
>> > > Yes we are using preload_app true.
>> > >
>> > > > I'm a bit curious how these messages are happening, too:
>> > > > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
>> > > > suspend/hibernation
>> > >
>> > > They are strange. My current hunch is the killing and that message are
>> > > symptoms of the same issue. Since it always follows a killing.
>> >
>> > I wonder if there's some background thread one of your gems spawns on
>> > load that causes the master to stall. I'm not seeing how else unicorn
>> > could think it was in suspend/hibernation.
>
>> > Anyways, I'm happy your problem seems to be fixed with the mysql2
>> > upgrade :)
>>
>> Unfortunately that didn't fix the problem. We had a large sale today
>> and had 2 502s. We're going to try p194 on next week and I'll let you
>> know if that fixes it.
>
> Are you seeing the same errors as before in stderr for those?

Yeah, we get the same killing, reaping and suspend/hibernation messages
with the 5 second timeout. Upgrading mysql2 seemed to have prevented any
502s during our stress tests but that was not the case.

> Can you also try disabling preload_app?
>
> But before disabling preload_app, you can also check a few things on
> a running master?
>
> * "lsof -p <pid>"
>
>   To see if there's odd connections the master is making.
>
> * Assuming you're on Linux, can you also check for any other threads
>   the master might be running (and possibly stuck on)?
>
>       ls /proc/<pid>/task/
>
>   The output should be 2 directories:
>
>       <pid>/
>       <tid>/
>
>   If you have a 3rd entry, you can confirm something in your app or one
>   of your gems is spawning a background thread which could be throwing
>   the master off...

I'll see if we can try this tomorrow but it will probably be on Monday.

>> > > Our ops guys say we had this problem before we were using ThreadTimeout.
>> >
>> > OK. That's somewhat reassuring to know (especially since the culprit
>> > seems to be an old mysql2 gem). I've had other users (privately) report
>> > issues with recursive locking because of ensure clauses (e.g.
>> > Mutex#synchronize) that I forgot to document.
>>
>> We're going to try going without ThreadTimeout again to make sure
>> that's not the issue.
>
> Alright.
>
> Btw, I also suggest any Rails/application-level logs include the PID and
> timestamp of the request. This way you can see and correlate the worker
> killing the request to when/if the Rails app stopped processing
> requests.

We found that one of our servers was actually out of the ELB pool so it
wasn't getting pinged constantly and it does not have any killing
messages (other than deploys, which also had the suspend/hibernation
messages).
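For the PID/timestamp suggestion, a minimal sketch of what we'd add
(assuming Rails 3.2's config.log_tags; the application class name below
is just a placeholder, and this is untested on our end):

    # config/environments/production.rb -- sketch only
    require "time" # for Time#iso8601

    MyApp::Application.configure do
      # Prefix each request's log lines with the worker PID and a timestamp
      # so they can be lined up against the killing/reaping messages in the
      # unicorn/rainbows stderr log.
      config.log_tags = [
        lambda { |req| "pid=#{Process.pid}" },
        lambda { |req| Time.now.utc.iso8601 }
      ]
    end

That should make it easy to see the last request each worker was handling
before the master reaped it.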
We'll have more time free next week to dig further into this.
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying