Dead PostgreSQL connections on worker restart

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help

* Dead PostgreSQL connections on worker restart
@ 2016-04-15  3:26 Adam Fields
  2016-04-15  5:42 ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Adam Fields @ 2016-04-15  3:26 UTC (permalink / raw)
  To: unicorn-public

We have discovered that every time a unicorn worker is restarted, Rails throws "PG::ConnectionBad: connection is closed" errors.

We are using preload_app true, and have before_fork and after_fork hooks to close and reopen the connections:

---------------------------------------------------
before_fork do |server, worker|
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!

  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

after_fork do |server, worker|
  defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection

  # write a per-worker pid file without shelling out
  child_pid = server.config[:pid].sub('.pid', ".#{worker.nr}.pid")
  File.write(child_pid, "#{Process.pid}\n")
end
---------------------------------------------------

Yet it seems that something is holding onto a dead connection and trying to use it. 

We have definitely correlated it with worker restarts: we have a monit process in place to restart individual workers if they exceed a memory threshold. When this threshold was too low, workers were recycled often and we saw a very high number of these errors. When we raised the threshold, the errors almost completely disappeared (but they still happen sometimes when a worker is recycled).

How can we troubleshoot this?

Thanks!


* Re: Dead PostgreSQL connections on worker restart
  2016-04-15  3:26 Dead PostgreSQL connections on worker restart Adam Fields
@ 2016-04-15  5:42 ` Eric Wong
  2016-04-21 19:42   ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2016-04-15  5:42 UTC (permalink / raw)
  To: Adam Fields; +Cc: unicorn-public

Adam Fields <unicorn5958@street86.com> wrote:
> We have discovered that every time a unicorn worker is restarted,
> Rails throws "PG::ConnectionBad: connection is closed" errors.

<snip>

> We are using preload_app true, and have before_fork and after_fork
> hooks to close and reopen the connections:

<snip>

> before_fork do |server, worker|
>  defined?(ActiveRecord::Base) and
>      ActiveRecord::Base.connection.disconnect!

<snip>

> Yet it seems that something is holding onto a dead connection
> and trying to use it. 

I'm not familiar with the Rails side anymore, so hopefully
others can chime in.  However, from the OS perspective,
it's generic and not dependent on Ruby or Pg, just *nix...

> How can we troubleshoot this?

Use lsof to determine which sockets are shared.
lsof should be available and common on most platforms,
definitely Linux-based ones.

	lsof -p $PID_OF_MASTER
	lsof -p $PID_OF_WORKER

And compare the sockets shared between the processes based on
the output of lsof.  I'm mostly going off Linux lsof output,
here, but the idea is the same across all *nixes...

Sockets marked as "LISTEN" are intended to be shared by unicorn.
(UDP and other packet-oriented sockets are fine to share, too;
 but most apps don't use them)

If your Pg connection is over TCP, look for TCP sockets
in the ESTABLISHED state and ensure each process has
a unique number in the DEVICE column (and/or local port).

Likewise if your Pg connection is over a Unix socket.  Just make
sure the path is the correct one (to distinguish from any Unix
sockets unicorn listens on).
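The comparison described above can be scripted. The following is a minimal sketch, not part of the original reply, and it covers only the TCP case. It assumes the default Linux-style `lsof -p PID` column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); the helper names `established_sockets` and `shared_sockets` are hypothetical. Any DEVICE value that shows up as ESTABLISHED in both the master and a worker is a connection both processes hold.

```ruby
# Sketch only: assumes default Linux lsof output, where whitespace-split
# field 5 is DEVICE and field 8 is NAME (local->remote address).
def established_sockets(lsof_output)
  lsof_output.each_line.with_object({}) do |line, sockets|
    # Skip the header, LISTEN sockets, regular files, and so on.
    next unless line.include?("(ESTABLISHED)")

    fields = line.split
    sockets[fields[5]] = fields[8]
  end
end

# Returns [device, name] pairs that appear in both processes.
def shared_sockets(master_output, worker_output)
  master = established_sockets(master_output)
  worker = established_sockets(worker_output)
  (master.keys & worker.keys).map { |device| [device, master[device]] }
end

# Runs only when invoked as: ruby this_script.rb MASTER_PID WORKER_PID
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  master_pid, worker_pid = ARGV.map { |arg| Integer(arg) }
  shared = shared_sockets(`lsof -p #{master_pid}`, `lsof -p #{worker_pid}`)
  if shared.empty?
    puts "no ESTABLISHED TCP sockets are shared"
  else
    shared.each { |device, name| puts "shared: DEVICE=#{device} #{name}" }
  end
end
```

If the script prints a connection to the Pg port, that socket was opened before the fork and never closed in one of the two processes.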

In other words, no stream-oriented connections you make should
be shared.  There probably shouldn't be any stream-oriented
connections in your master process at all.
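To illustrate why a shared stream connection is a problem, here is a small self-contained sketch (an illustration added for this thread, not unicorn or Rails code). It uses `UNIXSocket.pair` as a stand-in for a database connection opened before forking; the method name `lines_seen_by_peer` is hypothetical. It requires a platform with `fork`.

```ruby
require "socket"

# Opens a connected socket pair, forks, and lets both the child and the
# parent write on the same inherited end. Returns what the peer reads.
def lines_seen_by_peer
  app_side, peer_side = UNIXSocket.pair

  pid = fork do
    peer_side.close
    # The child never reconnected, so this goes out on the parent's socket.
    app_side.syswrite("from worker\n")
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  Process.wait(pid)

  app_side.syswrite("from master\n")
  [peer_side.gets, peer_side.gets]
ensure
  app_side&.close
  peer_side&.close
end

p lines_seen_by_peer if __FILE__ == $PROGRAM_NAME
```

The peer receives data from two different processes on one stream and cannot tell them apart. With a real database connection, the same sharing means one process can close or corrupt the protocol state of a connection the other still believes is usable.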


* Re: Dead PostgreSQL connections on worker restart
  2016-04-15  5:42 ` Eric Wong
@ 2016-04-21 19:42   ` Eric Wong
  0 siblings, 0 replies; 3+ messages in thread
From: Eric Wong @ 2016-04-21 19:42 UTC (permalink / raw)
  To: Adam Fields; +Cc: unicorn-public

Ping.  Were you able to troubleshoot this based on the info I gave in

http://bogomips.org/unicorn-public/20160415054202.GA4043@dcvr.yhbt.net/

?
Any followup would be appreciated for others following this, thanks.

