unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
* Log URL with murder_lazy_workers
@ 2018-01-14 23:27 Sam Saffron
  2018-01-15  1:57 ` Eric Wong
  2018-01-15 15:06 ` Carl Brasic
  0 siblings, 2 replies; 5+ messages in thread
From: Sam Saffron @ 2018-01-14 23:27 UTC (permalink / raw)
  To: unicorn-public

I would love to start logging the actual URL that timed out when
murder_lazy_workers does its thing.

Clearly the master process has no knowledge of the request here, but
perhaps if we had a named pipe from each child to the master we could
quickly post the current URL down the pipe, so we would have something
to log when we murder a worker.

Clearly this would be an opt-in thing, but it would be very handy for
quick diagnostics because we could then avoid deeper log analysis and
raise events as soon as this happens.


* Re: Log URL with murder_lazy_workers
  2018-01-14 23:27 Log URL with murder_lazy_workers Sam Saffron
@ 2018-01-15  1:57 ` Eric Wong
  2018-01-15  2:18   ` Sam Saffron
  2018-01-15 15:06 ` Carl Brasic
  1 sibling, 1 reply; 5+ messages in thread
From: Eric Wong @ 2018-01-15  1:57 UTC (permalink / raw)
  To: Sam Saffron; +Cc: unicorn-public

Sam Saffron <sam.saffron@gmail.com> wrote:
> I would love to start logging the actual URL that timed out when
> murder_lazy_workers does its thing.
> 
> Clearly the master process has no knowledge of the request here, but
> perhaps if we had a named pipe from each child to the master we could
> quickly post the current URL down the pipe, so we would have something
> to log when we murder a worker.

That would make the master a bottleneck.

Instead, I suggest logging a START action with the
URL+PID+Thread(*)+serial number, and then matching it with a
corresponding END action in the response_body#close.  Anything
without a corresponding END action can be deemed a loss and
matched up with the KILL action based on PID.

(*) Log Thread/Fiber so it can work with other servers, too.
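
A rough sketch of that as Rack middleware (untested, all class and
log-format names here are made up):

  require 'logger'

  class AuditedBody
    def initialize(body, logger, tag)
      @body, @logger, @tag = body, logger, tag
    end

    def each(&blk)
      @body.each(&blk)
    end

    def close
      @body.close if @body.respond_to?(:close)
      @logger.info("END #{@tag}") # no END line => request died mid-flight
    end
  end

  class AuditLogger
    def initialize(app, logger = Logger.new($stderr))
      @app, @logger = app, logger
      @serial, @lock = 0, Mutex.new
    end

    def call(env)
      serial = @lock.synchronize { @serial += 1 }
      tag = "pid=#{Process.pid} thr=#{Thread.current.object_id} serial=#{serial}"
      @logger.info("START #{tag} #{env['REQUEST_METHOD']} #{env['PATH_INFO']}")
      status, headers, body = @app.call(env)
      [status, headers, AuditedBody.new(body, @logger, tag)]
    end
  end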


I seem to remember it was already possible to get that information
out of Rails logs pretty easily, and I seem to recall doing that
many years ago when I used Rails. (This is probably why
USR1 log reopening waits until a response is done before
triggering...)


And as I've stated many times before, I don't want any sort of
lock-in or even guide-in to make people feel like they're stuck
using unicorn (by having code which depends on it).  I also
believe the unicorn `timeout' is a misfeature that probably set
the entire Rack/Ruby ecosystem back 10 years or more, so I'd
rather people stop depending on it and fix their timeouts.

(To that end, I may see about making timeout.rb in stdlib better
 for Ruby 2.6...)

> Clearly this would be an opt-in thing, but it would be very handy for
> quick diagnostics because we could then avoid deeper log analysis and
> raise events as soon as this happens.

Sorry, I prefer generic solutions which work with other servers, too.


* Re: Log URL with murder_lazy_workers
  2018-01-15  1:57 ` Eric Wong
@ 2018-01-15  2:18   ` Sam Saffron
       [not found]     ` <CADGZSScpXo7-PvM=Ki64hhPSzWAsjyT+fWKAZ9-30U69x+54iA@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Sam Saffron @ 2018-01-15  2:18 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public

> I seem to remember it was already possible to get that information
> out of Rails logs pretty easily, and I seem to recall doing that
> many years ago when I used Rails. (This is probably why USR1 log
> reopening waits until a response is done before triggering...)

Yes, I am able to get the info out of our logs; I just read through
the haproxy logs filtering on requests that take longer than N.
Getting alerts, though, is a bit trickier, but I can probably work
around that without hacking at unicorn.

> I also believe the unicorn `timeout' is a misfeature that probably
> set the entire Rack/Ruby ecosystem back 10 years or more

This is super duper tricky, and I do not think this is a deficiency
of timeout; it is merely being employed as a workaround for design
problems.

There are usually only one or two spots that could cause timeouts;
99% of the time it is slow database queries under extreme load.

It is super likely this could be handled in the app if we had:

db_connection.timeout_at Time.now + 29

Then the connection could trigger the timeout and kill off the request
without needing to tear down the entire process and re-fork.

Making this happen is a bit tricky because it would require some hacking
on the pg gem.
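
Roughly what I am imagining at the call site (timeout_at is
hypothetical here, nothing in the pg gem does this today):

  conn = PG.connect(dbname: 'app_production')  # illustrative only
  conn.timeout_at(Time.now + 29)               # hypothetical API
  conn.exec('SELECT ...')                      # would raise once the deadline passes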

I am just not sure how hacking at timeout can make things any better;
it is an escape hatch, just in case code misbehaves.



* Re: Log URL with murder_lazy_workers
       [not found]     ` <CADGZSScpXo7-PvM=Ki64hhPSzWAsjyT+fWKAZ9-30U69x+54iA@mail.gmail.com>
@ 2018-01-15  3:22       ` Sam Saffron
  0 siblings, 0 replies; 5+ messages in thread
From: Sam Saffron @ 2018-01-15  3:22 UTC (permalink / raw)
  To: Jeremy Evans; +Cc: Eric Wong, unicorn-public

Hi Jeremy,

Yes, we already have statement_timeout set; it just does not trigger
under extreme load because it is usually a pile-up of papercuts and
not one big nasty query: 26 seconds in, you issue yet another query
and... boom. I guess we could prepend a custom exec that times out
early and resets it per statement.
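
Something along these lines, maybe (untested sketch, the module name
is made up and only plain #exec is covered):

  require 'pg'

  module DeadlineExec
    attr_accessor :deadline  # a Time set per request; nil disables the check

    def exec(*args, &blk)
      if deadline
        remaining_ms = ((deadline - Time.now) * 1000).to_i
        raise 'request deadline passed' if remaining_ms <= 0
        # shrink statement_timeout to whatever is left of the request budget;
        # explicit-arg super calls the original exec, so no recursion here
        super("SET statement_timeout = #{remaining_ms}")
      end
      super(*args, &blk)
    end
    # exec_params / async_exec would want the same treatment
  end

  PG::Connection.prepend(DeadlineExec)

  # at the start of each request:
  #   conn.deadline = Time.now + 29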

Thanks for the suggestion of the info file; I will consider adding
something like it.
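
The worker side of that could look something like this (untested
sketch; the path and format are made up):

  require 'json'

  class RequestInfoFile
    def initialize(path = "/tmp/unicorn_worker_#{Process.pid}.info")
      @io = File.open(path, File::RDWR | File::CREAT, 0600)
    end

    # one line describing the in-flight request; overwritten per request
    def write(env)
      line = { pid: Process.pid,
               started_at: Time.now.to_f,
               method: env['REQUEST_METHOD'],
               path: env['PATH_INFO'] }.to_json
      @io.seek(0)
      @io.write(line)
      @io.truncate(@io.pos)
      @io.flush
    end

    def clear
      @io.truncate(0)
    end
  end

  # in middleware: info.write(env) before calling the app, info.clear after;
  # the master (or anything else) can read the file after killing the worker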

Sam

On Mon, Jan 15, 2018 at 1:40 PM, Jeremy Evans <jeremyevans0@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 6:18 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
>>
>> It is super likely this could be handled in the app if we had:
>>
>> db_connection.timeout_at Time.now + 29
>>
>> Then the connection could trigger the timeout and kill off the request
>> without needing to tear down the entire process and re-fork.
>>
>> Making this happen is a bit tricky because it would require some hacking
>> on the pg gem.
>
>
> Sam,
>
> For this particular case, you can SET statement_timeout = 29000 on the
> PostgreSQL connection, but note that is a per-statement timeout, not a
> per-web-request timeout.
>
> What I end up doing for important production apps is to have the worker
> log request information for each request to an open file descriptor
> (seek(0)/write/truncate), and have the master process use the contents of
> the worker-specific file to report errors in case of worker crashes.
>
> Thanks,
> Jeremy
>


* Re: Log URL with murder_lazy_workers
  2018-01-14 23:27 Log URL with murder_lazy_workers Sam Saffron
  2018-01-15  1:57 ` Eric Wong
@ 2018-01-15 15:06 ` Carl Brasic
  1 sibling, 0 replies; 5+ messages in thread
From: Carl Brasic @ 2018-01-15 15:06 UTC (permalink / raw)
  Cc: unicorn-public

We solved this issue by writing a small Rack middleware that launches a
background thread for every request; the thread sleeps for one second
short of the unicorn timeout and then logs some info (including the URL)
to a file.

The nice thing about this approach is that you have access to the main thread's
information, including env['action_controller.instance'] to get controller
params, current user information, app backtrace, etc.  That last one in
particular has been very valuable in diagnosing complex performance problems.

In very high-volume applications the thread-per-request model might be
a bottleneck; it could be modified to use a persistent buddy-thread
approach.
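
Stripped down, it looks roughly like this (sketch from memory; the
constant and class names are made up):

  require 'logger'

  class SlowRequestWatchdog
    TIMEOUT = 30 # keep in sync with the unicorn `timeout' setting

    def initialize(app, logger = Logger.new($stderr))
      @app, @logger = app, logger
    end

    def call(env)
      main = Thread.current
      watchdog = Thread.new do
        sleep(TIMEOUT - 1)
        # still here => the worker is about to be murdered; dump what we know
        @logger.warn("about to time out: #{env['REQUEST_METHOD']} " \
                     "#{env['PATH_INFO']} pid=#{Process.pid}\n" +
                     (main.backtrace || []).join("\n"))
        # env['action_controller.instance'] is also reachable here for
        # params, current user, etc.
      end
      @app.call(env)
    ensure
      watchdog.kill if watchdog
    end
  end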

Carl


