$USER and $HOME shell variables not set
- by Russell Jennings @ 03/27 14:32 UTC - next

Hello,

I am running into some issues with these variables not being set.  Since I am spawning a script from a unicorn worker (via a Rails controller), I figured I'd ask here.

Here is the stackoverflow with the full background:
http://stackoverflow.com/questions/29233181/why-is-envhome-nil-in-my-rake-task

In short: does unicorn have anything to do with $HOME or $USER not being defined? From what I can tell, unicorn is the only thing that's different versus running the same Ruby via a Rails console (which does have those shell variables set correctly).

I'm in no man's land here, so any help or insight would be greatly appreciated.

Thanks,
Russell


  Re: $USER and $HOME shell variables not set
  - by Eric Wong @ 03/27 18:34 UTC - next/prev

  Russell Jennings <violentpurr@gmail.com> wrote:
  > Hello,
  > 
  > I am running into some issues with these variables not being set.  Since I
  > am spawning a script from a unicorn worker (via a Rails controller), I
  > figured I'd ask here.
  > 
  > Here is the stackoverflow with the full background:
  > http://stackoverflow.com/questions/29233181/why-is-envhome-nil-in-my-rake-task
  > 
  > in short: does unicorn have anything to do with $HOME or $USER not
  > being defined?
  
  Nope, unicorn itself does not change these variables.  We only set
  UNICORN_FD (for SIGUSR2 upgrades) and PWD (if working_directory is used).

  Your init system may change users and clobber these variables via
  sudo/su/env and similar wrappers, so I think it has to do with how
  you're starting unicorn.  If you're using sudo anywhere, the env_*
  options in the sudoers file will also affect which environment
  variables get clobbered/preserved/added.
  
  > From what I can tell, unicorn is the only thing that's different versus
  > running the same Ruby via a Rails console (which does have those shell
  > variables set correctly).
  > 
  > I'm in no man's land here, so any help or insight would be greatly appreciated.
  
  Can you add something to log the contents of ENV to a log file?
  Perhaps something like:
  
  	Rails.logger.debug("env: #{ENV.inspect}")
  
  It would also be helpful to show the snippet of code from where you're
  running Rake in case you're accidentally setting an option wrong.
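
  If the environment really is missing HOME/USER because of how unicorn was
  started, one possible workaround is to pass them explicitly when spawning
  the task.  A rough sketch (the Etc lookup and the task name "some:task"
  are only illustrative):

        require 'etc'

        pw  = Etc.getpwuid(Process.uid)  # works even when $HOME/$USER are unset
        env = { "HOME" => pw.dir, "USER" => pw.name }
        system(env, "bundle", "exec", "rake", "some:task")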
  
  
  Under Linux, you can also inspect the original environment of any running
  process from /proc/$PID/environ.  In most cases/kernel versions, it won't
  reflect env changes made after the process started.
  I use tr to replace '\0' with '\n' (newline) to help with readability:
  
  	tr '\0' '\n' </proc/$PID/environ
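
  The same check from Ruby, for instance (the PID is just a placeholder):

        pid = 1234   # replace with the unicorn worker PID you're interested in
        puts File.read("/proc/#{pid}/environ").split("\0")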


nginx reverse proxy getting ECONNRESET
- by Michael Fischer @ 03/24 22:43 UTC - next/prev

We have an nginx 1.6.2 proxy in front of a Unicorn 4.8.3 server that
is frequently reporting the following error:

2015/03/24 01:46:01 [error] 11217#0: *872231 readv() failed (104:
Connection reset by peer) while reading upstream

The interesting things are:

1) The upstream is a Unix domain socket (to which Unicorn is bound)
2) Unicorn isn't reporting that a child died in the error log (and I
verified their lifetimes with ps(1))

Any hints as to what we should look for?

Thanks,

--Michael


  Re: nginx reverse proxy getting ECONNRESET
  - by Eric Wong @ 03/24 22:54 UTC - next/prev

  Michael Fischer <mfischer@zendesk.com> wrote:
  > <We have an nginx 1.6.2 proxy in front of a Unicorn 4.8.3 server that ...>
  
  What changed recently with your setup?
  
  Which OS/kernel version + vendor version?
  
  Since you've been around a while, I take it this is only a recent issue?
  
  Can you setup a test instance on a different nginx port/unicorn socket
  and with a config.ru such as:
  
  ------------------------- 8< ----------------------
  run(lambda do |env|
    $stderr.write("#$$ starting at #{Time.now}\n")
    # be sure to configure your unicorn timeout
    sleep
    # should not return, wait for unicorn to kill
  end)

  more...

    Re: nginx reverse proxy getting ECONNRESET
    - by Eric Wong @ 03/24 22:59 UTC - next/prev

    Another likely explanation might be that you're not draining rack.input on
    every request, since unicorn does lazy reads off the socket to prevent
    rejected uploads from wasting disk I/O [1].
    
    So you can send a bigger POST request with my example to maybe
    reproduce the issue.
    
    [1] you can use the Unicorn::PrereadInput middleware to forcibly
        disable the lazy read:
        http://unicorn.bogomips.org/Unicorn/PrereadInput.html
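
    For example, a minimal config.ru sketch with the middleware enabled
    (the trivial app below is just a placeholder):

        require 'unicorn/preread_input'
        use Unicorn::PrereadInput
        run(lambda { |env| [ 200, { "Content-Type" => "text/plain" }, [ "ok\n" ] ] })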


      Re: nginx reverse proxy getting ECONNRESET
      - by Michael Fischer @ 03/24 23:04 UTC - next/prev

      On Tue, Mar 24, 2015 at 10:59 PM, Eric Wong <e@80x24.org> wrote:
      > Another likely explanation might be you're not draining rack.input every
      > request, since unicorn does lazy reads off the socket to prevent
      > rejected uploads from wasting disk I/O[1]
      >
      > So you can send a bigger POST request with my example to maybe
      > reproduce the issue.
      >
      > [1] you can use the Unicorn::PrereadInput middleware to forcibly
      >     disable the lazy read:
      >     http://unicorn.bogomips.org/Unicorn/PrereadInput.html
      
      Actually, these are quite large POST requests we're attempting to
      service (on the order of 4MB).  Can you elaborate on the mechanism in
      play here?
      
      Thanks,
      
      --Michael


        Re: nginx reverse proxy getting ECONNRESET
        - by Eric Wong @ 03/24 23:23 UTC - next/prev

        Michael Fischer <mfischer@zendesk.com> wrote:
        > <On Tue, Mar 24, 2015 at 10:59 PM, Eric Wong <e@80x24.org> wrote: ...>
        
        Unlike a lot of servers, unicorn will not attempt to buffer request
        bodies on its own.  You'll need to actually process the POST request in
        your app (via Rack/Rails/whatever accessing env["rack.input"]).
        
        The PrereadInput middleware makes unicorn behave like other servers
        (at the cost of being slower if a request is worthless and rejected).
        
        So there might be data sitting on the socket if your application
        processing returns a response before it parsed the POST request.
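
        For contrast, a minimal config.ru sketch of a handler that does drain
        rack.input before responding (the 16384-byte read size is arbitrary):

        run(lambda do |env|
          # read and discard the request body so no unread bytes are left
          # on the socket when the response goes out
          input = env["rack.input"]
          buf = ''
          true while input.read(16384, buf)
          [ 200, { "Content-Type" => "text/plain" }, [ "drained\n" ] ]
        end)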
        
        In this case, the ECONNRESET messages are harmless, and not buffering
        the unused body saves your Ruby process from GC thrashing.
        
        Actually, can you try setting up a Rack::Lobster instance and sending
        a giant POST request to it?
        
        ------------- config.ru --------------
        require 'rack/lobster'
        run Rack::Lobster.new

  more...

          Re: nginx reverse proxy getting ECONNRESET
          - by Michael Fischer @ 03/24 23:29 UTC - next/prev

          On Tue, Mar 24, 2015 at 11:23 PM, Eric Wong <e@80x24.org> wrote:
          
          > So there might be data sitting on the socket if your application
          > processing returns a response before it parsed the POST request.
          
          When this occurs, the nginx access logs show an HTTP 200 (OK) response
          with a 0 byte response body.
          
          Is it your hypothesis that the application is just failing to consume
          the entire POST body in this instance?   In that case, wouldn't we
          expect to see nginx failing to write on the socket instead of read?
          
          > Actually, you can try setting up a Rack::Lobster instance but sending
          > a giant POST request?
          >
          > ------------- config.ru --------------
          > require 'rack/lobster'
          > run Rack::Lobster.new
          > --------------------------------------
          
          I don't know what this is -- systems guy here, not a Rack expert...
          how will this help?
          
          Thanks,
          
          --Michael


            Re: nginx reverse proxy getting ECONNRESET
            - by Eric Wong @ 03/24 23:46 UTC - next/prev

            Michael Fischer <mfischer@zendesk.com> wrote:
            > On Tue, Mar 24, 2015 at 11:23 PM, Eric Wong <e@80x24.org> wrote:
            > 
            > > So there might be data sitting on the socket if your application
            > > processing returns a response before it parsed the POST request.
            > 
            > When this occurs, the nginx access logs show an HTTP 200 (OK) response
            > with a 0 byte response body.
            > 
            > Is it your hypothesis that the application is just failing to consume
            > the entire POST body in this instance?   In that case, wouldn't we
            > expect to see nginx failing to write on the socket instead of read?
            
            The body could've been just big enough to fit inside the kernel socket
            buffers, but not so big that nginx had to wait on the write.  In the
            standalone example below, the server only reads 4092 bytes of the
            4096-byte request.
            
            > > Actually, you can try setting up a Rack::Lobster instance but sending
            > > a giant POST request?
            > >
            > > ------------- config.ru --------------
            > > require 'rack/lobster'
            > > run Rack::Lobster.new
            > > --------------------------------------
            > 
            > I don't know what this is -- systems guy here, not a Rack expert...
            > how will this help?
            
            Just a dumb "hello world" type app which doesn't read the input.
            
            Here's a bare-bones example without nginx/unicorn at all:
            
            require 'tmpdir'
            require 'socket'
            Dir.mktmpdir do |dir|
              Dir.chdir(dir) do
                path = 'sock'
                a = UNIXServer.new(path)
                srv = Thread.new do
                  c = a.accept
                  c.readpartial(4092) # read all but 4 bytes of the request
                  c.write "HTTP/1.1 200 OK\r\n\r\n"
                  c.close             # close with unread data still pending
                  puts "thread done"
                end

                con = UNIXSocket.new(path)
                bytes = ' ' * 4096
                con.write(bytes)
                while buf = con.sysread(4096) # ECONNRESET is raised here
                  p [ 'client read', buf ]
                end
              end
            end
            
            The above causes the client's sysread to fail with ECONNRESET.  If you
            change the server to read 4096 bytes instead of 4092, you get the
            expected EOFError instead.
            
            ECONNRESET is harmless in this case (unless nginx started pipelining or
            blindly attempting persistent connections to unicorn, which it should
            not be doing since unicorn sends "Connection: close" on every response)


              Re: nginx reverse proxy getting ECONNRESET
              - by Eric Wong @ 03/24 23:55 UTC - next/prev

              Eric Wong <e@80x24.org> wrote:
              > ECONNRESET is harmless in this case (unless nginx started pipelining or
              > blindly attempting persistent connections to unicorn, which it should
              > not be doing since unicorn sends "Connection: close" on every response)
              
              Actually, are you getting 502 errors returned from nginx in this case?
              That would not be harmless.  I suggest ensuring rack.input is
              fully-drained if that is the case (perhaps using PrereadInput).


                Re: nginx reverse proxy getting ECONNRESET
                - by Michael Fischer @ 03/25 09:41 UTC - next/prev

                On Tue, Mar 24, 2015 at 11:55 PM, Eric Wong <e@80x24.org> wrote:
                
                > Actually, are you getting 502 errors returned from nginx in this case?
                > That would not be harmless.  I suggest ensuring rack.input is
                > fully-drained if that is the case (perhaps using PrereadInput).
                
                No, they're all 200 responses with a zero-length body size.  It's the
                first time I'd ever seen such a combination of symptoms.
                
                Thanks,
                
                --Michael


                  Re: nginx reverse proxy getting ECONNRESET
                  - by Eric Wong @ 03/25 10:12 UTC - next/prev

                  Michael Fischer <mfischer@zendesk.com> wrote:
                  > On Tue, Mar 24, 2015 at 11:55 PM, Eric Wong <e@80x24.org> wrote:
                  > 
                  > > Actually, are you getting 502 errors returned from nginx in this case?
                  > > That would not be harmless.  I suggest ensuring rack.input is
                  > > fully-drained if that is the case (perhaps using PrereadInput).
                  > 
                  > No, they're all 200 responses with a zero-length body size.  It's the
                  > first time I'd ever seen such a combination of symptoms.
                  
                  OK, thanks for the update.
                  
                  I was wondering if a Unicorn::PostreadInput middleware should be
                  introduced to quiet your logs.  It should have the same effect as
                  PrereadInput, but should provide better performance in the common case
                  and also be compatible with "rewindable_input false" users.
                  
                  class Unicorn::PostreadInput
                    def initialize(app)
                      @app = app
                    end
                  
                    def call(env)
                      input = env["rack.input"] # save it here, in case the app reassigns it
                      @app.call(env)
                    ensure
                      # Ensure the HTTP request is entirely read off the socket even
                      # if the app aborts early.  This should prevent nginx from
                      # complaining about ECONNRESET errors.
                      unless env["rack.hijack_io"]
                        buf = ''
                        true while input.read(16384, buf)
                        buf.clear
                      end
                    end
                  end
                  
                  (totally untested)
                  
                  "Postread" doesn't sound quite right, though...


              Re: nginx reverse proxy getting ECONNRESET
              - by Michael Fischer @ 03/25 09:48 UTC - next/prev

              On Tue, Mar 24, 2015 at 11:46 PM, Eric Wong <e@80x24.org> wrote:
              
              >> Is it your hypothesis that the application is just failing to consume
              >> the entire POST body in this instance?   In that case, wouldn't we
              >> expect to see nginx failing to write on the socket instead of read?
              >
              > It could've been just big enough to fit inside the kernel socket
              > buffers, but not enough for nginx to wait on.  In the below standalone
              > example, the server only reads 4092 bytes of the 4096-byte request.
              
              > Here's a bare-bones example without nginx/unicorn at all:
              
              [snip]
              
              Thank you for the example; it really zeroes in on the situation.  I
              learn something new every day!
              
              I'll check with the developers to find out whether the app is
              misbehaving in a similar way.
              
              Best regards,
              
              --Michael


    Re: nginx reverse proxy getting ECONNRESET
    - by Michael Fischer @ 03/24 23:02 UTC - next/prev

    On Tue, Mar 24, 2015 at 10:54 PM, Eric Wong <e@80x24.org> wrote:
    > <Michael Fischer <mfischer@zendesk.com> wrote: >> We have an ...>
    
    We upgraded nginx from 1.4.7 to 1.6.2.  The frequency of the error has
    increased significantly since.  But I hesitate to point the finger at
    nginx without more evidence, since its developers are very skilled.
    
    > Which OS/kernel version + vendor version?
    
    uname: 3.13.0-40-generic #69~precise1-Ubuntu
    
    Ruby 2.1.1
    
    > <Can you setup a test instance on a different nginx port/unicorn socket ...>
    
    I'll take that step later if I have to, but I'm not sure what evidence
    that would provide, since we're not having timeout issues -- when this
    happens, the response time reported by nginx is usually just a few
    seconds (Unicorn timeout is 90 seconds).
    
    Thanks,
    
    --Michael


Is it possible to create a thread in Rails subscribe to Redis message channel?
- by 胡明 @ 03/24 15:28 UTC - next/prev

Hi there

I have a question related to threads in unicorn; I have asked the
question on stackoverflow.com but no one has answered:

http://stackoverflow.com/questions/29180275/is-it-possible-to-create-a-thread-in-rails-subscribe-to-redis-message-channel

I have also pasted the question in this email, in the hope that someone could
help me with it:

==============================================

I am trying to create a thread in Rails to subscribe to a Redis message
channel. Is there a way to do this? I am using unicorn.

I have tried to do this in the unicorn configuration like this:

after_fork do |server, worker|

  Thread.new do
    begin
      $redis.subscribe(:one, :two) do |on|
        on.subscribe do |channel, subscriptions|
          puts "Subscribed to ##{channel} (#{subscriptions} subscriptions)"
        end
        on.message do |channel, message|
          puts "##{channel}: #{message}"
          $redis.unsubscribe if message == "exit"
        end
        on.unsubscribe do |channel, subscriptions|
          puts "Unsubscribed from ##{channel} (#{subscriptions} subscriptions)"
        end
      end
    rescue Redis::BaseConnectionError => error
      puts "#{error}, retrying in 1s"
      sleep 1
      retry
    end
  end
end

But this makes the unicorn server unable to handle any web requests. I
thought that if I subscribe to Redis from a different thread, it wouldn't
block the main thread; am I missing something here?

more...

  Re: Is it possible to create a thread in Rails subscribe to Redis message channel?
  - by Eric Wong @ 03/24 22:44 UTC - next/prev

  胡明 <humings@gmail.com> wrote:
  > I am trying to create a thread in Rails to subscribe to a Redis message
  > channel. Is there a way to do this? I am using unicorn.
  
  Theoretically, yes, similar things are done with other services.
  
  <snip>
  
  > But this makes the unicorn server unable to handle any web requests. I
  > thought that if I subscribe to Redis from a different thread, it wouldn't
  > block the main thread; am I missing something here?
  
  I'm not familiar with Redis, but I know it's a server based on stream
  sockets similar to anything else HTTP/IMAP/MySQL/Postgres/memcached-based.
  
  But based on your observation, your client library is probably blocking
  the entire Ruby VM by not releasing the GVL when the thread waits on
  Redis.
  
  Which Redis client library are you using?
  
  If the client library is written in C, it should have rb_thread_* calls
  when it needs to wait on the server for anything (e.g.
  rb_thread_call_without_gvl, rb_thread_fd_select, etc...).  If you don't
  see those in the source code, get it fixed :)
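
  A rough way to check that from Ruby: run a heartbeat thread alongside the
  subscriber (e.g. in the same after_fork hook).  If the heartbeat stops
  printing while the subscriber is waiting on Redis, the client library is
  most likely holding the GVL in a blocking C call.  A sketch:

      Thread.new do
        loop do
          $stderr.puts "heartbeat #{Time.now} pid=#$$"  # should keep printing every 5s
          sleep 5
        end
      end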


[PATCH] doc: document Etc.nprocessors for worker_processes
- by Eric Wong @ 03/12 22:32 UTC - next/prev

Ruby 2.2 has Etc.nprocessors, and using that (directly or as a
factor) for setting worker_processes is often (but not always)
appropriate.
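
For instance, in a unicorn config file (a sketch; whether the raw CPU count
or some factor of it is right depends on the application):

  require 'etc'
  worker_processes Etc.nprocessors   # Ruby 2.2+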

more...

[PATCH] doc: document UNICORN_FD in manpage
- by Eric Wong @ 03/12 22:32 UTC - next/prev

Due to the prevalence of socket activation in modern init systems,
we shall document UNICORN_FD (previously an implementation detail)
in the manpage.
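
As a rough sketch of the idea (how a real init system wires this up will
differ; the listen address is just an example), the spawning process leaves
a listening socket open across exec and tells unicorn its file descriptor
number via UNICORN_FD:

  require 'socket'
  srv = TCPServer.new("127.0.0.1", 8080)
  srv.close_on_exec = false             # the unicorn master must inherit this descriptor
  ENV["UNICORN_FD"] = srv.fileno.to_s
  exec("bundle", "exec", "unicorn", "config.ru")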

more...

On USR2, new master runs with same PID
- by Kevin Yank @ 03/12 01:04 UTC - next/prev

Having recently migrated our Rails app to MRI 2.2.0 (which may or may not be related), we’re experiencing problems with our Unicorn zero-downtime restarts.

When I send USR2 to the master process (PID 19216 in this example), I get the following in the Unicorn log:

I, [2015-03-11T23:47:33.992274 #6848]  INFO -- : executing ["/srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn", "/srv/ourapp/current/config.ru", "-Dc", "/srv/ourapp/shared/config/unicorn.rb", {10=>#<Kgio::UNIXServer:/srv/ourapp/shared/sockets/unicorn.sock>}] (in /srv/ourapp/releases/a0e8b5df474ad5129200654f92a76af00a750f47)
I, [2015-03-11T23:47:36.504235 #6848]  INFO -- : inherited addr=/srv/ourapp/shared/sockets/unicorn.sock fd=10
/srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid=': Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)
  from /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:134:in `start'
  from /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/bin/unicorn:126:in `<top (required)>'
  from /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `load'
  from /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `<main>'
E, [2015-03-11T23:47:36.519549 #19216] ERROR -- : reaped #<Process::Status: pid 6848 exit 1> exec()-ed
E, [2015-03-11T23:47:36.520296 #19216] ERROR -- : master loop error: Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)
E, [2015-03-11T23:47:36.520496 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid='
E, [2015-03-11T23:47:36.520650 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:404:in `reap_all_workers'
E, [2015-03-11T23:47:36.520790 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:279:in `join'
E, [2015-03-11T23:47:36.520928 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/bin/unicorn:126:in `<top (required)>'
E, [2015-03-11T23:47:36.521115 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `load'
E, [2015-03-11T23:47:36.521254 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `<main>'

And when I check, indeed, there is now unicorn.pid and unicorn.pid.oldbin, both containing 19216.

What could cause this situation to arise?


Here’s my unicorn.rb FWIW:

# Set your full path to application.
app_path = "/srv/ourapp/current"

# Set unicorn options
worker_processes 3
preload_app true
timeout 30
listen "/srv/ourapp/shared/sockets/unicorn.sock", :backlog => 64

# Spawn unicorn master worker for user deploy (group: deploy)
user 'deploy', 'deploy'

# Fill path to your app
working_directory app_path

# Should be 'production' by default, otherwise use other env
rails_env = ENV['RAILS_ENV'] || 'production'

# Log everything to one file
stderr_path "/srv/ourapp/shared/log/unicorn.log"
stdout_path "/srv/ourapp/shared/log/unicorn.log"

# Set master PID location
pid "/srv/ourapp/shared/pids/unicorn.pid"

before_exec do |server|
  ENV["BUNDLE_GEMFILE"] = "#{app_path}/Gemfile"
end

before_fork do |server, worker|
  ActiveRecord::Base.connection.disconnect!

  sleep 10

  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

after_fork do |server, worker|
  ActiveRecord::Base.establish_connection

  Sidekiq.configure_client do |config|
    config.redis = { namespace: 'sidekiq' }
  end
end


--
Kevin Yank
Chief Technology Officer, Avalanche Technology Group
http://avalanche.com.au/

ph: +61 4 2241 0083


  Re: On USR2, new master runs with same PID
  - by Eric Wong @ 03/12 01:45 UTC - next/prev

  Kevin Yank <kyank@avalanche.com.au> wrote:
  > Having recently migrated our Rails app to MRI 2.2.0 (which may or may
  > not be related), we’re experiencing problems with our Unicorn
  > zero-downtime restarts.
  > 
  > When I send USR2 to the master process (PID 19216 in this example), I
  > get the following in the Unicorn log:
  >
  > I, [2015-03-11T23:47:33.992274 #6848]  INFO -- : executing ["/srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn", "/srv/ourapp/current/config.ru", "-Dc", "/srv/ourapp/shared/config/unicorn.rb", {10=>#<Kgio::UNIXServer:/srv/ourapp/shared/sockets/unicorn.sock>}] (in /srv/ourapp/releases/a0e8b5df474ad5129200654f92a76af00a750f47)
  > I, [2015-03-11T23:47:36.504235 #6848]  INFO -- : inherited addr=/srv/ourapp/shared/sockets/unicorn.sock fd=10
  > /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid=': Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)
  
  Nothing suspicious until the above line...
  
  <snip>
  
  > And when I check, indeed, there is now unicorn.pid and
  > unicorn.pid.oldbin, both containing 19216.
  > 
  > What could cause this situation to arise?
  
  Any chance you have a process manager or something else creating the
  (non-.oldbin) pid file behind unicorn's back?
  
  Can you check your process table to ensure there's not multiple
  unicorn instances running off the same config and pid files, too?
  
  Also, was there other activity (another USR2 or HUP) in the logs
  a few seconds beforehand?
  
  What kind of filesystem / kernel is the pid file on?
  A network FS or anything without the consistency guarantees provided
  by POSIX would not work for pid files.
  
  pid files are unfortunately prone to nasty race conditions,
  but I wouldn't expect what you're seeing to happen very often.
  
  Likewise, the check for stale Unix domain socket paths at startup is
  inevitably racy, too, but the window is usually small enough to be
  unnoticeable.  But yes, just in case, check the process table to make
  sure there aren't multiple, unrelated masters running off the
  same paths.
  
  <snip>
  
  > before_fork do |server, worker|
  >   ActiveRecord::Base.connection.disconnect!
  
  -------------------8<-------------------
  >   sleep 10
  > 
  >   old_pid = "#{server.config[:pid]}.oldbin"
  >   if File.exists?(old_pid) && server.pid != old_pid
  >     begin
  >       Process.kill("QUIT", File.read(old_pid).to_i)
  >     rescue Errno::ENOENT, Errno::ESRCH
  >       # someone else did our job for us
  >     end
  >   end
  -------------------8<-------------------
  
  I'd get rid of that hunk starting with the "sleep 10" (at least while
  debugging this issue).  If you did a USR2 previously, maybe it could
  affect the current USR2 upgrade process.  Sleeping so long in the master
  like that is pretty bad: it throws off timing and delays signal handling.
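
  With that hunk removed, the hook would be reduced to just the disconnect
  (sketch):

    before_fork do |server, worker|
      ActiveRecord::Base.connection.disconnect!
    end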
  
  That's a pretty fragile config and I regret ever including it in the
  example config files.


    Re: On USR2, new master runs with same PID
    - by Kevin Yank @ 03/12 06:26 UTC - next/prev

    Thanks for your help, Eric!
    
    > On 12 Mar 2015, at 12:45 pm, Eric Wong <e@80x24.org> wrote:
    > 
    > Kevin Yank <kyank@avalanche.com.au> wrote:
    >> When I send USR2 to the master process (PID 19216 in this example), I
    >> get the following in the Unicorn log:
    >> 
    >> I, [2015-03-11T23:47:33.992274 #6848]  INFO -- : executing ["/srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn", "/srv/ourapp/current/config.ru", "-Dc", "/srv/ourapp/shared/config/unicorn.rb", {10=>#<Kgio::UNIXServer:/srv/ourapp/shared/sockets/unicorn.sock>}] (in /srv/ourapp/releases/a0e8b5df474ad5129200654f92a76af00a750f47)
    >> I, [2015-03-11T23:47:36.504235 #6848]  INFO -- : inherited addr=/srv/ourapp/shared/sockets/unicorn.sock fd=10
    >> /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid=': Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)
    > 
    > Nothing suspicious until the above line...
    
    That’s right.
    
    >> And when I check, indeed, there is now unicorn.pid and
    >> unicorn.pid.oldbin, both containing 19216.
    > 
    > Any chance you have a process manager or something else creating the
    > (non-.oldbin) pid file behind unicorn's back?
    
    It’s possible; I’m using eye (https://github.com/kostya/eye) as a process monitor. I’m not aware of it writing .pid files for processes that self-daemonize like Unicorn, though. And once one of my servers “goes bad” (i.e. Unicorn starts failing to restart in response to a USR2), it does so 100% consistently until I stop and restart Unicorn entirely. Based on that, I don’t believe it’s a race condition where my process monitor is slipping in a new unicorn.pid file some of the time.
    
    > Can you check your process table to ensure there's not multiple
    > unicorn instances running off the same config and pid files, too?
    
    As far as I can tell, I have some servers that have gotten themselves into this “USR2 restart fails” state, while others are working just fine. In both cases, the Unicorn process tree (as shown in htop) looks like this “at rest” (i.e. no deployments/restarts in progress):
    
    unicorn master
    `- unicorn master
    `- unicorn worker[2]
    |  `- unicorn worker[2]
    `- unicorn worker[1]
    |  `- unicorn worker[1]
    `- unicorn worker[0]
       `- unicorn worker[0]
    
    At first glance I’d definitely say it appears that I have two masters running from the same config files. However, there’s only one unicorn.pid file of course (the root process in the tree above), and when I try to kill -TERM the master process that doesn’t have a .pid file, the entire process tree exits. Am I misinterpreting the process table? Is this process list actually normal?
    
    Thus far I’ve been unable to find any difference in state between a properly-behaving server and a misbehaving server, apart from the behaviour of the Unicorn master when it receives a USR2.
    
    > Also, was there other activity (another USR2 or HUP) in the logs
    > a few seconds beforehand?
    
    No, didn’t see anything like that (and I was looking for it).
    
    > What kind of filesystem / kernel is the pid file on?
    
    EXT4 / Ubuntu Server 12.04 LTS
    
    > A network FS or anything without the consistency guarantees provided
    > by POSIX would not work for pid files.
    
    Given my environment above, I should be OK, right?
    
    > pid files are unfortunately prone to to nasty race conditions,
    > but I'm not sure what you're seeing happens very often.
    
    This has been happening pretty frequently across multiple server instances, and again once it starts happening on an instance, it keeps happening 100% of the time (until I restart Unicorn completely). So it’s not a rare edge case.
    
    > <-------------------8<------------------- >> sleep 10 >> ...>
    
    If I simply delete this hunk, I’ll have old masters left running on my servers because they’ll never get sent the QUIT signal. I can definitely remove it temporarily (and kill the old master myself) while debugging, though.
    
    > If you did a USR2 previously, maybe it could
    > affect the current USR2 upgrade process.  Sleeping so long in the master
    > like that is pretty bad it throws off timing and delays signal handling.
    
    I’d definitely like to get rid of the sleep, as my restarts definitely feel slow. I’m not clear on what a better approach would be, though.
    
    > That's a pretty fragile config and I regret ever including it in the
    > example config files
    
    Any better examples/docs you’d recommend I consult for guidance? Or is expecting to achieve a robust zero-downtime restart using before_fork/after_fork hooks unrealistic?
    
    --
    Kevin Yank
    Chief Technology Officer, Avalanche Technology Group
    http://avalanche.com.au/
    
    ph: +61 4 2241 0083


      Re: On USR2, new master runs with same PID
      - by Eric Wong @ 03/12 06:45 UTC - next/prev

      Kevin Yank <kyank@avalanche.com.au> wrote:
      > It’s possible; I’m using eye (https://github.com/kostya/eye) as a
      
      Aha!  I forgot about that one; try upgrading to unicorn 4.8.3, which
      fixed this issue last year.  ref:
      
      http://bogomips.org/unicorn-public/m/20140504023338.GA583@dcvr.yhbt.net.html
      http://bogomips.org/unicorn-public/m/20140502231558.GA4215@dcvr.yhbt.net.html
      
      <snip>
      
      > unicorn master
      > `- unicorn master 
      > `- unicorn worker[2]
      > |  `- unicorn worker[2]
      > `- unicorn worker[1]
      > |  `- unicorn worker[1]
      > `- unicorn worker[0]
      >    `- unicorn worker[0]
      
      The second master there is the new one, and the first is the old one.
      
      I was more concerned if there are multiple masters with no parent-child
      relationship to each other; but that seems to not be the case.
      
      But yeah, give 4.8.3 a try since it should've fixed the problem
      (which was reported and confirmed privately)
      
      > This has been happening pretty frequently across multiple server
      > instances, and again once it starts happening on an instance, it keeps
      > happening 100% of the time (until I restart Unicorn completely). So
      > it’s not a rare edge case.
      
      You can probably recover by removing the pid files entirely
      and sending HUP to the (only) master; but I haven't tried it...
      
      > > That's a pretty fragile config and I regret ever including it in the
      > > example config files
      
      > Any better examples/docs you’d recommend I consult for guidance? Or is
      > expecting to achieve a robust zero-downtime restart using
      > before_fork/after_fork hooks unrealistic?
      
      Best bet would be to run with double the workers temporarily unless
      you're too low on memory (and swapping) or backend (DB) connections or
      any other resource.
      
      If you're really low on memory/connections, do WINCH + USR2 + QUIT (and
      pray the new master works right :).  Clients will suffer in response
      time as your new app loads, so do it during non-peak traffic and/or
      shift traffic away from that machine.
      
      You can also send TTOU a few times to reduce worker counts instead of
      WINCH to nuke all of them, but I'd send all signals via deploy
      script/commands rather than automatically via (synchronous) hooks.
      
      But that's just me; probably others do things differently...


        Re: On USR2, new master runs with same PID
        - by Kevin Yank @ 03/20 01:55 UTC - next/prev

        > On 12 Mar 2015, at 5:45 pm, Eric Wong <e@80x24.org> wrote:
        > 
        > Kevin Yank <kyank@avalanche.com.au> wrote:
        >> It’s possible; I’m using eye (https://github.com/kostya/eye) as a
        > 
        > Aha!  I forgot about that one, try upgrading to unicorn 4.8.3 which
        > fixed this issue last year.  ref:
        > 
        > http://bogomips.org/unicorn-public/m/20140504023338.GA583@dcvr.yhbt.net.html
        > http://bogomips.org/unicorn-public/m/20140502231558.GA4215@dcvr.yhbt.net.html
        
        Finally solved this definitively. It was user error to do with my setup of the eye process monitor.
        
        I’d accidentally deployed a buggy logrotate configuration for eye, which was causing a second eye daemon to be spawned once a day (when the logs were rotated). Those two eye daemons ran side-by-side, and fought with each other when one was told to restart Unicorn. I’d already anticipated and fixed this problem, but failed to deploy the correct version of the config to our cluster.
        
        All fixed now. Thanks for your pointers; they put me on the right track. :)
        
        --
        Kevin Yank
        Chief Technology Officer, Avalanche Technology Group
        http://avalanche.com.au/
        
        ph: +61 4 2241 0083


        Re: On USR2, new master runs with same PID
        - by Kevin Yank @ 03/20 01:58 UTC - next/prev

        Regarding zero-downtime deploys:
        
        On 12 Mar 2015, at 5:45 pm, Eric Wong <e@80x24.org> wrote:
        
        > Best bet would be to run with double the workers temporarily unless
        > you're too low on memory (and swapping) or backend (DB) connections or
        > any other resource.
        
        I’d like to take this approach as I do have enough memory to spare. How do you usually implement this? Any good write-ups or sample configs you can point me to?
        
        Thanks again,
        
        --
        Kevin Yank
        Chief Technology Officer, Avalanche Technology Group
        http://avalanche.com.au/
        
        ph: +61 4 2241 0083


          Re: On USR2, new master runs with same PID
          - by Eric Wong @ 03/20 02:08 UTC - next/prev

          Kevin Yank <kyank@avalanche.com.au> wrote:
          > Regarding zero-downtime deploys:
          > 
          > On 12 Mar 2015, at 5:45 pm, Eric Wong <e@80x24.org> wrote:
          > 
          > > Best bet would be to run with double the workers temporarily unless
          > > you're too low on memory (and swapping) or backend (DB) connections or
          > > any other resource.
          >  
          > I’d like to take this approach as I do have enough memory to spare.
          > How do you usually implement this? Any good write-ups or sample
          > configs you can point me to?
          
          Only send SIGUSR2 to the master, leaving you with two masters and two
          sets of workers.  Skip (automated) sending of SIGTTOU signals to lower
          worker count to the old master.
          
          Eventually, you'll decide to send SIGQUIT to the old master to stop
          it (or the new one, if you decide the new code is broken).
          
          You can still combine this with SIGWINCH (or SIGTTOU) to stop traffic
          flow to the old master, too.
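
          Roughly, as a deploy-script sketch in Ruby (the pid path is the one
          from earlier in this thread; the fixed sleep stands in for a real
          health check of the new workers):

            pid_file   = "/srv/ourapp/shared/pids/unicorn.pid"
            old_master = File.read(pid_file).to_i
            Process.kill(:USR2, old_master)   # old master execs a new master + workers
            sleep 30                          # wait for / health-check the new workers
            Process.kill(:QUIT, old_master)   # then gracefully retire the old master
            # (or QUIT the new master instead, if the new code turns out to be broken)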
          
          Thanks for following up on your logrotate/eye issue, by the way.



- unicorn Rack HTTP server user/dev discussion
A public-inbox, anybody may post in plain-text (not HTML):
unicorn-public@bogomips.org
git URL for ssoma: git://bogomips.org/unicorn-public.git
homepage: http://unicorn.bogomips.org/
subscription optional: unicorn-public+subscribe@bogomips.org