hopefully the end of *any* OobGC Eric Wong - 09/15
We may finally start deprecating OobGC, as it looks like incremental
GC[1] is working well in ruby-trunk and will make it into the Ruby
2.2.0-preview1 release (should be out soon).

I've been running unicorn.bogomips.org on ruby-trunk for most of the
year without problems, and the past few weeks with incremental GC.
However, keep in mind I mainly serve static files (served by extras/* in
yahns.git[2]) and don't use a lot of heavy-weight code.

If you find bugs in ruby-trunk itself, please report them to the
https://bugs.ruby-lang.org/ issue tracker.  That has ruby-core ML
integration[3], so I can take a look at them.  Thanks.


[1] https://bugs.ruby-lang.org/issues/10137
[2] git clone git://yhbt.net/yahns
[3] yes, I still hate using web browsers and logins :P
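
For anyone phasing OobGC out gradually, a minimal sketch of gating it on
the Ruby version follows.  The helper name oob_gc_wanted? and the 2.2.0
cutoff are assumptions for illustration (the cutoff simply follows the
preview release mentioned above); Unicorn::OobGC is the existing
middleware from unicorn/oob_gc.

```ruby
require "rubygems" # for Gem::Version

# Hypothetical helper: true on Rubies older than 2.2.0, i.e. those
# without incremental GC.  The cutoff is an assumption, not a unicorn rule.
def oob_gc_wanted?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) < Gem::Version.new("2.2.0")
end

# Possible use in config.ru (left as comments so this file runs standalone):
#
#   if oob_gc_wanted?
#     require "unicorn/oob_gc"
#     use Unicorn::OobGC, 5   # run GC after every 5th request
#   end
```

This keeps older deployments on OobGC while newer Rubies rely on the
incremental collector alone.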

  Re: hopefully the end of *any* OobGC Bráulio Bhavamitra - 09/15
  Eric, do you have any kind of benchmark comparing the ruby versions?
  
  On Mon, Sep 15, 2014 at 4:21 PM, Eric Wong <e@80x24.org> wrote:
  
  > <We may finally start deprecating OobGC, as it looks like incremental ...>

    Re: hopefully the end of *any* OobGC Eric Wong - 09/15
    Bráulio Bhavamitra <braulio@eita.org.br> wrote:
    > Eric, do you have any kind of benchmark comparing the ruby versions?
    
    See ko1's links and benchmarks in the original page.
    
    tl;dr: throughput is slightly worse, but performance is more
           consistent (fewer spikes).  Consistent performance is important
           for interactive use such as web servers.
    
    > > [1] https://bugs.ruby-lang.org/issues/10137
    
    Also, as I've told you before: please stop top-posting, sending HTML,
    and using the ridiculously large signature.  These messages get seen by
    hundreds, if not thousands of people and you're wasting precious
    bandwidth/storage/cache space for all of them.  Paying attention to
    these things is an important part of what makes unicorn work.
    
    And FWIW, I've been spending much of the past few months applying this
    philosophy to ruby-trunk and making tiny reductions in the core RubyVM
    data structures here and there, slowly working up to saving several
    hundreds of kilobytes in the hottest sections of memory (this will
    become megabytes saved with bigger Rails apps).
    
    There's much more work to be done, but the secondary goal of this is to
    get me more acquainted with VM internals so I can dig into less
    trivial improvements.

fork() errors lead to a completely dead unicorn Jonathan del Strother - 09/03
Hi - on SmartOS & Solaris, we occasionally run into problems where
unicorn receives USR2 to reload itself, but can't fork off its workers
due to not having enough RAM.  It then kills all of its workers and
sits there failing to process any requests.  Unfortunately, the master
process stays alive - if it actually died, we'd be able to
automatically restart it.

Can we do anything to handle this more elegantly?

Jonathan

PS: An example log file from when this occurs -

I, [2014-09-03T08:51:29.034227 #7556]  INFO -- : executing
["/app/common/bundle/ruby/2.1.0/bin/unicorn", "--env",
"production-live", "--daemonize", "--config-file",
"/app/code/config/unicorn.rb", {17=>#<Kgio::TCPServer:fd 17>}] (in
/app/code)
I, [2014-09-03T08:51:29.035223 #7556]  INFO -- : forked child re-executing...
I, [2014-09-03T08:51:30.480393 #7556]  INFO -- : inherited
addr=0.0.0.0:8090 fd=17
I, [2014-09-03T08:51:30.481257 #7556]  INFO -- : Refreshing Gem list
D, [2014-09-03T08:51:41.715061 #7556] DEBUG -- : ** [Airbrake]
Notifier 3.1.14 ready to catch errors
I, [2014-09-03T08:51:45.437499 #7952]  INFO -- : worker=0 ready
I, [2014-09-03T08:51:45.471084 #7959]  INFO -- : worker=1 ready
I, [2014-09-03T08:51:45.513301 #7960]  INFO -- : worker=2 ready
I, [2014-09-03T08:51:45.558417 #7961]  INFO -- : worker=3 ready
E, [2014-09-03T08:51:45.931282 #7556] ERROR -- : Not enough space -
fork(2) (Errno::ENOMEM)
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
`fork'
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
`spawn_missing_workers'
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:140:in
`start'
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/bin/unicorn:126:in
`<top (required)>'
/app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `load'
/app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `<main>'
E, [2014-09-03T08:51:46.139737 #10484] ERROR -- : reaped
#<Process::Status: pid 7556 exit 1> exec()-ed
I, [2014-09-03T08:51:48.069452 #10484]  INFO -- : reaped
#<Process::Status: pid 21801 exit 0> worker=1
I, [2014-09-03T08:51:48.372431 #10484]  INFO -- : reaped
#<Process::Status: pid 67829 exit 0> worker=3
I, [2014-09-03T08:51:48.473412 #10484]  INFO -- : reaped
#<Process::Status: pid 57211 exit 0> worker=0
I, [2014-09-03T08:51:48.574279 #10484]  INFO -- : reaped
#<Process::Status: pid 70992 exit 0> worker=2
I, [2014-09-03T08:51:48.675085 #10484]  INFO -- : reaped
#<Process::Status: pid 11195 exit 0> worker=5
I, [2014-09-03T08:51:48.876051 #10484]  INFO -- : reaped
#<Process::Status: pid 11194 exit 0> worker=4
I, [2014-09-03T08:51:48.876341 #10484]  INFO -- : master complete

  Re: fork() errors lead to a completely dead unicorn Eric Wong - 09/03
  Jonathan del Strother <maillist@steelskies.com> wrote:
  > Hi - on SmartOS & Solaris, we occasionally run into problems where
  > unicorn receives USR2 to reload itself, but can't fork off its workers
  > due to not having enough RAM.  It then kills all of its workers and
  > sits there failing to process any requests.  Unfortunately, the master
  > process stays alive - if it actually died, we'd be able to
  > automatically restart it.
  
  I wonder if this is an SMF problem.  At the bottom of your log,
  it says "master complete", which seems to be the master which received
  the USR2.
  
  I'll walk through the log to see how things look from my end...
  
  > Can we do anything to handle this more elegantly?
  
  > PS: An example log file from when this occurs -
  > 
  > I, [2014-09-03T08:51:29.034227 #7556]  INFO -- : executing
  > ["/app/common/bundle/ruby/2.1.0/bin/unicorn", "--env",
  > "production-live", "--daemonize", "--config-file",
  
  7556 is the new child which eventually fails.
  
  > <"/app/code/config/unicorn.rb", {17=>#<Kgio::TCPServer:fd ...>
  
  OK, fork fails from the new child; current behavior is to exit.
  
  > /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
  > `fork'
  > /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
  > `spawn_missing_workers'
  > /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:140:in
  > `start'
  > /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/bin/unicorn:126:in
  > `<top (required)>'
  
  OK, this should've hit the exit! case in spawn_missing_workers,
  and it does...
  
  > /app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `load'
  > /app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `<main>'
  > E, [2014-09-03T08:51:46.139737 #10484] ERROR -- : reaped
  > #<Process::Status: pid 7556 exit 1> exec()-ed
  
  old master which originally received USR2 is notified the new master
  (7556) died
  
  > I, [2014-09-03T08:51:48.069452 #10484]  INFO -- : reaped
  > #<Process::Status: pid 21801 exit 0> worker=1
  > I, [2014-09-03T08:51:48.372431 #10484]  INFO -- : reaped
  > #<Process::Status: pid 67829 exit 0> worker=3
  > I, [2014-09-03T08:51:48.473412 #10484]  INFO -- : reaped
  > #<Process::Status: pid 57211 exit 0> worker=0
  > I, [2014-09-03T08:51:48.574279 #10484]  INFO -- : reaped
  > #<Process::Status: pid 70992 exit 0> worker=2
  > I, [2014-09-03T08:51:48.675085 #10484]  INFO -- : reaped
  > #<Process::Status: pid 11195 exit 0> worker=5
  > I, [2014-09-03T08:51:48.876051 #10484]  INFO -- : reaped
  > #<Process::Status: pid 11194 exit 0> worker=4
  
  Workers in the old master dying looks like the SMF problem you
  encountered with SIGABRT earlier.
  
  > I, [2014-09-03T08:51:48.876341 #10484]  INFO -- : master complete
  
  But the original master does not die after this?
  
  Can you truss it and see if it's stuck on reading/unlinking the pidfile?
  That would be the only thing preventing the master from actually dying,
  but the old master dying should not happen in the first place.
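
One way to cope with a transient ENOMEM more gracefully would be to
retry the fork a few times before giving up.  The sketch below is not
what unicorn 4.8.3 does (as noted above, it exits); fork_with_retry, its
parameters and the injectable forker are hypothetical names used only
for illustration.

```ruby
# Retry fork on transient resource errors, backing off between attempts.
# `forker` is injectable so the logic can be exercised without forking.
def fork_with_retry(attempts: 3, delay: 0.5, forker: Process.method(:fork))
  tries = 0
  begin
    tries += 1
    forker.call { yield }        # returns the child pid in the parent
  rescue Errno::ENOMEM, Errno::EAGAIN
    raise if tries >= attempts   # give up: let the caller decide to exit
    sleep(delay * tries)         # linear backoff before the next attempt
    retry
  end
end

# Example: pid = fork_with_retry(attempts: 5, delay: 1) { run_worker }
# (run_worker is a placeholder for whatever the child should execute.)
```

Whether retrying is wise depends on why memory ran out; if the shortage
is not transient, exiting and letting a supervisor restart the master is
still the simpler outcome.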

    Re: fork() errors lead to a completely dead unicorn Jonathan del Strother - 09/07
    Just wanted to say thanks for the reply - I've been trying to figure
    this out over the weekend and not succeeding.  I can't seem to
    reproduce it in a self-contained environment, it only ever happens in
    production, which is making debugging a bit frustrating...
    
    >
    >> I, [2014-09-03T08:51:48.876341 #10484]  INFO -- : master complete
    >
    > But the original master does not die after this?
    
    99% sure it doesn't - it just sits there in a zombie state with no
    workers.  But I want to verify that, so I guess I'm stuck waiting
    until it happens in production again.  Will let you know.

Worker SIGABRT takes down all workers? Jonathan del Strother - 08/21
Hi there,
We're trying to figure out a problem we're having where Ruby 2.1.2 /
ImageMagick is causing a SIGABRT in a unicorn worker process.  When it
does so, I see this in unicorn.log :

I, [2014-08-21T11:40:44.282905 #65706]  INFO -- : reaped
#<Process::Status: pid 66787 exit 0> worker=0
I, [2014-08-21T11:40:44.283289 #65706]  INFO -- : reaped
#<Process::Status: pid 66801 exit 0> worker=7
I, [2014-08-21T11:40:44.283583 #65706]  INFO -- : reaped
#<Process::Status: pid 66790 exit 0> worker=3
I, [2014-08-21T11:40:44.283871 #65706]  INFO -- : reaped
#<Process::Status: pid 66793 exit 0> worker=4
I, [2014-08-21T11:40:44.284198 #65706]  INFO -- : reaped
#<Process::Status: pid 66794 exit 0> worker=5
I, [2014-08-21T11:40:44.284538 #65706]  INFO -- : reaped
#<Process::Status: pid 66798 exit 0> worker=6
I, [2014-08-21T11:40:44.284832 #65706]  INFO -- : reaped
#<Process::Status: pid 66788 exit 0> worker=1
E, [2014-08-21T11:40:44.385491 #65706] ERROR -- : reaped
#<Process::Status: pid 66789 SIGABRT (signal 6) (core dumped)>
worker=2
I, [2014-08-21T11:40:44.385951 #65706]  INFO -- : master complete
I, [2014-08-21T11:40:48.073191 #72528]  INFO -- : Refreshing Gem list
I, [2014-08-21T11:41:06.841582 #72528]  INFO -- : listening on
addr=0.0.0.0:8090 fd=17

So, as far as I can tell, the SIGABRT in a worker causes all sibling
workers to be reaped and restarted.   Is that intended/expected?
Shouldn't it just kill the single worker & restart that?

-Jonathan

  Re: Worker SIGABRT takes down all workers? Eric Wong - 08/21
  Jonathan del Strother <jon.delStrother@audioboo.fm> wrote:
  > So, as far as I can tell, the SIGABRT in a worker causes all siblings
  > workers to be reaped and restarted.   Is that intended/expected?
  > Shouldn't it just kill the single worker & restart that?
  
  Not intended or expected.  I cannot reproduce it by doing:
  
  	kill -ABRT $WORKER_PID
  
  Can you?
  Can you set up a small test case which reproduces the issue?
  Is anything else running that could be sending signals to all workers?
  
  Thanks for any more info you can provide.
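
A self-contained way to check the expected behavior outside unicorn is
sketched below: a parent forks two children, aborts one, and confirms
the sibling is untouched.  This is only an approximation of the master
and worker relationship, not unicorn's own code; abort_one_worker is a
made-up name.

```ruby
# Fork two idle children, send SIGABRT to one, and report what the
# parent observes.  Requires a platform with fork (Unix-like systems).
def abort_one_worker
  victim   = fork { sleep }   # sleeps until signalled
  survivor = fork { sleep }

  Process.kill(:ABRT, victim)
  _pid, status = Process.wait2(victim)

  # WNOHANG returns nil while the child is still running.
  still_running = Process.waitpid(survivor, Process::WNOHANG).nil?

  { signaled: status.signaled?,
    termsig: status.termsig,
    survivor_alive: still_running }
ensure
  if survivor
    begin
      Process.kill(:KILL, survivor)  # clean up the remaining child
      Process.wait(survivor)
    rescue Errno::ESRCH, Errno::ECHILD
      # survivor was already gone and reaped
    end
  end
end
```

If the sibling also dies in a real deployment, something outside the
process tree (a supervisor, for instance) is the likelier culprit.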

    Re: Worker SIGABRT takes down all workers? Jonathan del Strother - 08/22
    On 21 August 2014 21:32, Eric Wong <e@80x24.org> wrote:
    > Jonathan del Strother <jon.delStrother@audioboo.fm> wrote:
    >> So, as far as I can tell, the SIGABRT in a worker causes all siblings
    >> workers to be reaped and restarted.   Is that intended/expected?
    >> Shouldn't it just kill the single worker & restart that?
    >
    > Not intended or expected.  I cannot reproduce it by doing:
    >
    >         kill -ABRT $WORKER_PID
    >
    
    You're quite right, sorry.  I'm running via SMF, which is detecting
    the child process's SIGABRT and deciding to restart the entire group.
    I'll look into getting SMF to ignore those.

more house-cleaning for unicorn 5 Eric Wong - 08/17
The only user-visible change would be the removal of the Status:
header from the HTTP response, but I doubt anybody would even
notice.

Eric Wong (3):
      dev: remove isolate dependency
      unicorn.gemspec: depend on test-unit 3.0
      http_response: remove Status: header

  [PATCH 1/3] dev: remove isolate dependency Eric Wong - 08/17
  It seems unnecessary with current versions of RubyGems
  supporting development dependencies.

  [PATCH 2/3] unicorn.gemspec: depend on test-unit 3.0 Eric Wong - 08/17
  test-unit 3 and minitest 5 will have equal support status as
  bundled gems when Ruby 2.2.0 is released in December 2014.  These
  bundled gems will appear in the user-oriented tarball installations,
  but do not get installed by "make install" when installing Ruby
  from SVN or git.
  
  test-unit appears to be actively maintained and good at keeping
  backwards compatibility even on a major version change, so this
  means no code changes on our end.  I am not convinced switching to
  minitest is worth the effort.
  
  Cc: Ken Dreyer <ktdreyer@ktdreyer.com>

  [PATCH 3/3] http_response: remove Status: header Eric Wong - 08/17
  Whatever compatibility reasons existed in 2009 likely no longer
  exist.  Other servers (e.g. thin, puma) seem to work alright without it,
  so there's no reason to waste precious bytes.
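
To make the change concrete, here is a rough sketch of a response head
written without a Status: header.  It is not the code from
http_response.rb; response_head, REASONS and the choice to drop an
app-supplied Status header are assumptions for illustration.

```ruby
CRLF = "\r\n"
# Tiny reason-phrase table for the example; a real server has a full one.
REASONS = { 200 => "OK", 404 => "Not Found",
            500 => "Internal Server Error" }.freeze

# Build the status line and headers.  The status appears only on the
# first line; no separate "Status:" header is emitted.
def response_head(status, headers)
  code = status.to_i
  head = "HTTP/1.1 #{code} #{REASONS.fetch(code, 'Unknown')}#{CRLF}"
  headers.each do |key, value|
    next if key.casecmp("Status").zero?   # assumption: drop app-supplied one
    # Rack 1.x joins repeated header values with newlines.
    value.to_s.split("\n").each { |v| head << "#{key}: #{v}#{CRLF}" }
  end
  head << CRLF
end
```

For a typical response this saves the "Status: 200 OK" line plus its
CRLF on every reply.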

Re: Rack encodings (was: Please move to github) Gary Grossman - 08/05
It feels like we were getting some momentum here on an important but
long-dormant issue... maybe it's time to move this discussion
to rack-devel? Perhaps there's another Rack luminary who can lead
the charge, or at least see if there's some consensus after a few
more years of shared experience on what "sane" encodings might
look like.

A lightweight way to move the implementation forward might be a
simple Rack middleware gem which sets the new encodings on the 
environment, or adding the functionality to rack itself. Once
developers were comfortable with the new regime, the app servers
could follow suit and put those encodings in the env natively,
and the Rubyland implementation of the new encodings could be
dropped.

Gary
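
A middleware along these lines might look roughly like the sketch
below.  The thread does not settle which encodings to apply or to which
keys, so the class name EnvEncoding, the list of keys and the UTF-8
default are all assumptions made only to show the shape of the idea.

```ruby
# Hypothetical Rack middleware: tag selected env strings with an
# encoding when the bytes are valid in it, and leave them alone otherwise.
class EnvEncoding
  KEYS = %w[PATH_INFO QUERY_STRING].freeze  # assumed set of keys

  def initialize(app, encoding = Encoding::UTF_8)
    @app = app
    @encoding = encoding
  end

  def call(env)
    KEYS.each do |key|
      value = env[key] or next
      tagged = value.dup.force_encoding(@encoding)
      env[key] = tagged if tagged.valid_encoding?
    end
    @app.call(env)
  end
end

# In config.ru:  use EnvEncoding
```

Because it only re-tags strings, a server could later supply the same
encodings natively and the middleware could be dropped without changing
applications.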

  Re: Rack encodings (was: Please move to github) Eric Wong - 08/05
  Gary Grossman <gary.grossman@gmail.com> wrote:
  > It feels like we were getting some momentum here on an important but
  > long-dormant issue here... maybe it's time to move this discussion
  > to rack-devel?
  
  Sure, rack-devel is a pretty dormant mailing list but there's been a
  burst of activity a few weeks ago.
  
  Unlike this list, subscription is required to post; and first posts
  from newbies are moderated.  For folks who do not log in to Google
  (crazies like me :P) subscription is possible without any login
  or password: rack-devel+subscribe@googlegroups.com
  
  > Perhaps there's another Rack luminary who can lead
  > the charge, or at least see if there's some consensus after a few
  > more years of shared experience on what "sane" encodings might
  > look like.
  
  At least there are other server implementers who'll probably
  chime in.
  
  > A lightweight way to move the implementation forward might be a
  > simple Rack middleware gem which sets the new encodings on the 
  > environment, or adding the functionality to rack itself. Once
  > developers were comfortable with the new regime, the app servers
  > could follow suit and put those encodings in the env natively,
  > and the Rubyland implementation of the new encodings could be
  > dropped.
  
  Sounds like a good plan.  Thanks for bringing more attention to this.

  Re: Weird Unicorn Timeout Issues (Hibernation problem?) Eric Wong - 08/04
  Daniel Condomitti <daniel@condomitti.com> wrote:
  > It could also be that your TCP keepalive interval is higher than your
  > database server’s connection timeout. I’ve run into that in the past.
  
  That kicks in at around 2 hours by default on Linux systems.
  I'm not sure it would matter for Tony's case since he hit it
  after ~30 minutes of idle (unless he tuned the knobs himself).
  
  ref: tcp_keep* knobs in
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/ip-sysctl.txt
  
  unicorn itself has no timers outside of the configurable timeout.

    Re: Weird Unicorn Timeout Issues (Hibernation problem?) Tony Devlin - 08/04
    Yep, it occurs after 30 minutes of inactivity.  Down to the minute; I hit
    the site at 3:40 and tried at 4:10 and sure enough:
    
    E, [2014-08-04T16:10:52.143541 #2596] ERROR -- : worker=0 PID:2599 timeout
    (21s > 20s), killing
    E, [2014-08-04T16:10:52.158459 #2596] ERROR -- : reaped #<Process::Status:
    pid 2599 SIGKILL (signal 9)> worker=0
    I, [2014-08-04T16:10:52.181648 #3086]  INFO -- : worker=0 ready
    
    2014/08/04 16:10:52 [error] 1684#0: *13 upstream prematurely closed
    connection while reading response header from upstream, client: *.*.*.*,
    server: ***.org, request: "GET /outages HTTP/1.1", upstream:
    "http://unix:/var/www/sites/***/shared/sockets/.unicorn.sock.0:/outages",
    host: "***.org", referrer: "http://***.org/outages"
    
    ===
    
    This occurs on both instances of unicorn workers that we have opened.  I'm
    going to reduce that to one instance, per Eric, to continue troubleshooting
    in the smallest possible way.
    
    1) It does not appear to be an nginx persistent connection issue, because
    once the worker is reaped and restarted, nginx serves the content with no
    problems.
    2) No NFS mounts, no file locks, no FIFO issues.  (note: one of the apps
    does write to files, aside from logs, but problem exists in both apps).
    
    It's also important to note that once the worker is reaped the site is
    blazingly fast, with sub-second responses (at most 2s to render the
    biggest page).  That lasts until the next 30 minutes of inactivity, when
    the timeout hits and the worker is reaped again (rinse and repeat).
    
    For the database portion, the DBA says idle connections are killed after
    3 hours, a far longer span than the 30 minutes at which this issue occurs.
    
    Have any other ideas of places I can look?  It's too consistent, it has to
    be some specific setting or functionality that does this.
    
    I checked my TCP Timeout settings just in case, but the timeout is set to
    2hrs.
    
    
    On Mon, Aug 4, 2014 at 3:34 PM, Eric Wong <e@80x24.org> wrote:
    
    > Daniel Condomitti <daniel@condomitti.com> wrote:
    > > It could also be that your TCP keepalive interval is higher than your
    > > database server’s connection timeout. I’ve run into that in the past.
    >
    > That kicks in at around 2 hours by default on Linux systems.
    > I'm not sure it would matter for Tony's case since he hit it
    > after ~30 minutes of idle (unless he tuned the knobs himself).
    >
    > ref: tcp_keep* knobs in
    >
    > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/ip-sysctl.txt
    >
    > unicorn itself has no timers outside of the configurable timeout.
    >

      Re: Weird Unicorn Timeout Issues (Hibernation problem?) Eric Wong - 08/04
      Tony Devlin <tonydevlin@gmail.com> wrote:
      > Have any other ideas of places I can look?  It's too consistent, it has to
      > be some specific setting or functionality that does this.
      
      Unless you find something out-of-the-ordinary with lsof,
      we'd have to pull apart your apps to see what they're doing.
      
      I just tested my hello world app (inactive for ~45 minutes) and
      could not reproduce the error.
      
      Did you try strace-ing for 30 minutes and reproducing the error?
      
      I'm running out of ideas...
      
      Perhaps your NTP setup is broken?  Or even hardware clock failure
      (one of my machines hit that a few weeks ago).

        Re: Weird Unicorn Timeout Issues (Hibernation problem?) Eric Wong - 08/04
        Eric Wong <e@80x24.org> wrote:
        > Did you try strace-ing for 30 minutes and reproducing the error?
        
        You can also try setting the unicorn timeout to longer than 30
        minutes and get a longer/stalled strace.
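
For a debugging session like this, the unicorn config might be reduced
to something like the fragment below.  The values are only examples
(40 minutes is an arbitrary figure above the 30-minute mark), and the
fragment is meant to be loaded by unicorn via -c rather than run on its
own.

```ruby
# unicorn.rb, debugging only: one worker is easier to strace, and the
# timeout is raised past the point where the stall shows up.
worker_processes 1
timeout 2400   # seconds (40 minutes); restore a sane value afterwards
```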

          Re: Weird Unicorn Timeout Issues (Hibernation problem?) Tony Devlin - 08/05
          I appreciate all your help Eric and Daniel.  I have not solved this yet,
          but I think I have narrowed it down to a Firewall timeout issue.  One app
          uses a database connection to Oracle, the other app uses a 3rd Party API
          (still on location, but across the network).  The ping times to both of
          these devices are extremely fast, however 30 minutes of inactivity across
          the Firewall seems to disconnect these connections.  At least that appears
          to be what the strace is telling me.  The place in the strace that the
          timeout occurs is consistent, every time.  For example the strace of the
          app that connects to Oracle shows this:
          
          [pid  7825] write(14,
          "\0\373\0\0\6\0\0\0\0\0\21iB\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\v\0\0\0\3^Ca\201\0\0\0\0\0\0\376\377\377\377\377\377\377\377\22\0\0\0\376\377\377\377\377\377\377\377\r\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\0\0\0\0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\22select
          1 from
          dual\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
          251) = 251
          [pid  7825] read(14,  <unfinished ...>
          [pid  7827] +++ killed by SIGKILL +++
          PANIC: handle_group_exit: 7827 leader 7825
          [pid  7846] +++ killed by SIGKILL +++
          PANIC: handle_group_exit: 7846 leader 7825
          +++ killed by SIGKILL +++
          
          Clearly that is a database query 'select 1 from dual'.  It times out trying
          to read the response.  At the same time if I watch the lsof -p <pid>, I see
          that the database connection drops after 30 minutes.
          
          I'll update this thread again once it is solved, for historical and future
          issues (in case someone else experiences something similar).
          
          Again thank you for your help Eric!
          
          
          On Mon, Aug 4, 2014 at 4:46 PM, Eric Wong <e@80x24.org> wrote:
          
          > Eric Wong <e@80x24.org> wrote:
          > > Did you try strace-ing for 30 minutes and reproducing the error?
          >
          > You can also try setting the unicorn timeout to longer than 30
          > minutes and get a longer/stalled strace.
          >

            Re: Weird Unicorn Timeout Issues (Hibernation problem?) Eric Wong - 08/06
            Tony Devlin <tonydevlin@gmail.com> wrote:
            > pid  7825] write(14,
            > "\0\373\0\0\6\0\0\0\0\0\21iB\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\v\0\0\0\3^Ca\201\0\0\0\0\0\0\376\377\377\377\377\377\377\377\22\0\0\0\376\377\377\377\377\377\377\377\r\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\0\0\0\0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\376\377\377\377\377\377\377\377\22select
            > 1 from
            > dual\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
            > 251) = 251
            > [pid  7825] read(14,  <unfinished ...>
            > [pid  7827] +++ killed by SIGKILL +++
            
            Any update?  It looks like your DB driver is not using/respecting any
            timeout at all[1].  It is bad to not have a timeout there.  There should
            be a way to set a timeout so you can at least tell the user the DB
            connection dropped or maybe get your app to disconnect+retry once.
            
            A better looking strace would be something like:
            
                write(fd, ...); => success
                (poll|select|ppoll) syscall ...
                read(fd, ...); /* only if (poll|select|ppoll) was successful[2] */
            
            This goes for configuring all connections/services for any app.
            
            [1] or if it's relying on SO_RCVTIMEO socket option(rare), that's set
                way too high.  Any timeout set for any external connection should
                be lower than the unicorn (last-resort) timeout feature.
            
            [2] any read() syscall after (poll|select|ppoll) should be non-blocking,
                because (poll|select|ppoll) may spuriously wake up.
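
In Ruby terms, the wait-then-read pattern described above could be
sketched as follows.  This is a generic illustration, not code from any
DB driver; read_with_timeout is a made-up helper and raising
Timeout::Error is just one possible way to report the stall.

```ruby
require "timeout" # only for the Timeout::Error class

# Wait up to `timeout` seconds for `io` to become readable, then do a
# non-blocking read.  Loops on spurious wakeups until the deadline.
def read_with_timeout(io, maxlen, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise Timeout::Error, "no data within #{timeout}s" if remaining <= 0

    # IO.select returns nil on timeout; the next pass raises.
    next unless IO.select([io], nil, nil, remaining)

    chunk = io.read_nonblock(maxlen, exception: false)
    return chunk unless chunk == :wait_readable  # String, or nil at EOF
  end
end
```

With a timeout well below the unicorn timeout, the app gets a chance to
report the dropped connection or reconnect instead of being killed.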

              Re: Weird Unicorn Timeout Issues (Hibernation problem?) Tony Devlin - 08/06
              Eric,
              
              The problem is a firewall that sits between the servers and the database.
              It has an idle session timeout of 30 minutes, so it is silently killing the
              connection.  I have reached out to our Network Engineering department but
              they are saying they cannot change that idle session timeout, nor create a
              special rule to allow this connection to bypass it.
              
              Currently, I have set up a polling device that calls the application's URL
              every 20 minutes.  This causes the connection between the server and DB to
              refresh its idle timeout.  This is obviously a very hacky way to handle
              it, so I am trying to look into AR and Oracle_Enhanced to see if they have
              some sort of keepalive option for the database.  I thought it would work
              with the reaping_frequency, but apparently that does not work out as I had
              expected when you are not running in pools or a thread.  So I'm still on
              the lookout for something to handle that.
              
              
              
              
              On Wed, Aug 6, 2014 at 5:45 AM, Eric Wong <e@80x24.org> wrote:
              > <> > Any update? It looks like your DB driver is not ...>

                Re: Weird Unicorn Timeout Issues (Hibernation problem?) Daniel Condomitti - 08/06
                This is exactly what happened to us and I should have been clearer. I wasn’t referring to the default Linux kernel settings killing the connection; it was a network device between our application servers and the database server. It only affected certain applications, as some were hit hundreds of times per second and would never be disconnected, and the ones that would disconnect were only hit a few times per hour. I -believe- we just dropped the keepalive interval on both sides of the firewall below its idle timeout.
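
On a socket the application controls, lowering the keepalive timers
below a firewall's idle timeout could look like the sketch below.  The
helper name enable_keepalive and the 600s/60s/5 values are assumptions;
the TCP_KEEP* constants are platform-dependent, so they are applied only
where Ruby defines them.  DB drivers usually own their sockets, so in
practice this is more often done through driver options or the system
tcp_keepalive_* sysctls.

```ruby
require "socket"

# Turn on TCP keepalive and, where the platform exposes the knobs,
# start probing after `idle` seconds of silence.
def enable_keepalive(sock, idle: 600, interval: 60, probes: 5)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  { TCP_KEEPIDLE: idle, TCP_KEEPINTVL: interval,
    TCP_KEEPCNT: probes }.each do |name, value|
    next unless Socket.const_defined?(name)  # not available everywhere
    sock.setsockopt(Socket::IPPROTO_TCP, Socket.const_get(name), value)
  end
  sock
end
```

With a 30-minute idle timeout on the firewall, probing after 10 minutes
keeps the session entry fresh without any application-level traffic.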
                
                
                On Wednesday, August 6, 2014 at 7:05 AM, Tony Devlin wrote:
                
                > <Eric, > > The problem is a firewall that sits between the servers ...>

  Re: Weird Unicorn Timeout Issues (Hibernation problem?) Eric Wong - 08/04
  Tony Devlin <tonydevlin@gmail.com> wrote:
  > Thank you Eric,
  > 
  > I will look into the other worker to see what is going on with it.  I still
  > appreciate any hints you all can give me on where I can check.   I'm also
  > looking into the OS TCP timeouts to see if what Daniel said may be a
  > problem.
  
  General rule for me is to get the problem reproducible in the smallest
  possible way.  That could mean removing features, cutting out large
  chunks of code, cutting out certain request types, reducing
  workers.
  
  More things:
  
  1) Can you make sure nginx is not trying to maintain persistent
    connections?  nginx should respect unicorn closing the connection
    but I haven't checked the latest versions of nginx.
    lsof can help here, too.
  
    unicorn currently does not do persistent connections, allowing
    an M:N relationship between nginx instances and unicorn
    instances[1]
  
  2) Any other odd external dependencies such as NFS mounts,
     file locks, FIFOs, etc?
  
  
  [1] Perhaps persistent connections will be an option in the future
      if the support/documentation overhead is worth it, as nginx
      supports persistent connections to backends nowadays.

    Re: Weird Unicorn Timeout Issues (Hibernation problem?) Michael Fischer - 08/04
    On Mon, Aug 4, 2014 at 12:45 PM, Eric Wong <e@80x24.org> wrote:
    
    > [1] Perhaps persistent connections will be an option in the future
    >     if the support/documentation overhead is worth it, as nginx
    >     supports persistent connections to backends nowadays.
    >
    
    I don't believe the added complexity is worth the effort.
    
    --Michael


- unicorn Rack HTTP server user/dev discussion
A public-inbox, anybody may post in plain-text (not HTML):
unicorn-public@bogomips.org
git URL for ssoma: git://bogomips.org/unicorn-public.git
homepage: http://unicorn.bogomips.org/
subscription optional: unicorn-public+subscribe@bogomips.org