* Log URL with murder_lazy_workers
From: Sam Saffron @ 2018-01-14 23:27 UTC
To: unicorn-public

I would love to start logging the actual URL that timed out when
murder_lazy_workers does its thing.

Clearly the master process has no knowledge here, but perhaps if we had a
named pipe from each child to master we could quickly post the current URL
down the pipe, so we would have something to log when we murder a worker.

Clearly an opt-in thing, but it would be very handy for quick diagnostics,
because we could then avoid deeper log analysis and raise events just as
this happens.
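For concreteness only (Eric suggests a different approach in his reply), the named-pipe idea might be sketched like this; the URL, record format, and all names here are invented, not unicorn API:

```ruby
# Hypothetical sketch: a worker posts the URL it is about to serve down a
# pipe, so the master has something to log when it kills that worker.
r, w = IO.pipe

pid = fork do
  r.close
  # worker side: before dispatching each request, post the URL down the pipe
  w.puts "GET /slow/report"
  sleep # simulate a request stuck well past the timeout
end
w.close

# master side: block until the worker posts something, then "murder" it
last_url = r.readpartial(4096).lines.last.chomp
Process.kill(:KILL, pid)
Process.wait(pid)
puts "killing worker #{pid}, last URL seen: #{last_url}"
```

With many workers the master would need to poll one pipe per child, which is part of the bottleneck concern raised in the reply.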
* Re: Log URL with murder_lazy_workers
From: Eric Wong @ 2018-01-15 1:57 UTC
To: Sam Saffron; +Cc: unicorn-public

Sam Saffron <sam.saffron@gmail.com> wrote:
> I would love to start logging the actual URL that timed out when
> murder_lazy_workers does its thing.
>
> Clearly the master process has no knowledge here, but perhaps if we
> had a named pipe from each child to master we could quickly post
> current url down the pipe so we would have something to log when we
> murder a url.

That would make the master a bottleneck.

Instead, I suggest logging a START action with the
URL+PID+Thread(*)+serial number, and then matching it with a
corresponding END action in the response_body#close. Anything without a
corresponding END action can be deemed a loss and matched up with the
KILL action based on PID.

(*) Log Thread/Fiber so it can work with other servers, too.

I seem to remember it was possible to get that information out of Rails
logs pretty easily already, and I seem to recall doing that many years
ago when I used Rails. (This is probably why USR1 log reopening waits
until a response is done before triggering...)

And as I've stated many times before, I don't want any sort of lock-in,
or even guide-in, to make people feel like they're stuck using unicorn
(by having code which depends on it). I also believe the unicorn
`timeout' is a misfeature that probably set the entire Rack/Ruby
ecosystem back 10 years or more, so I'd rather people stop depending on
it and fix their timeouts.

(To that end, I may see about making timeout.rb in stdlib better for
Ruby 2.6...)

> Clearly an opt-in thing, but would be very handy for quick diagnostics
> cause we can then avoid deeper log analysis and raise events just as
> this happens.

Sorry, I prefer generic solutions which work with other servers, too.
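The START/END matching described above could look roughly like the following, as plain Rack middleware; this is a sketch, and all class and variable names are invented for illustration:

```ruby
require "logger"
require "stringio"

# Generic sketch of START/END request logging, server-agnostic.
class StartEndLogger
  # Wraps a response body so a block runs once the body is fully written.
  class Body
    def initialize(body, &on_close)
      @body, @on_close = body, on_close
    end

    def each(&blk)
      @body.each(&blk)
    end

    def close
      @body.close if @body.respond_to?(:close)
      @on_close.call
    end
  end

  def initialize(app, logger)
    @app, @logger = app, logger
    @serial, @mutex = 0, Mutex.new
  end

  def call(env)
    serial = @mutex.synchronize { @serial += 1 }
    id = "pid=#{Process.pid} thread=#{Thread.current.object_id} serial=#{serial}"
    @logger.info "START #{id} #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"
    status, headers, body = @app.call(env)
    # END only fires in #close, i.e. once the server has finished the
    # response; a START with no END can later be matched to a KILL by PID.
    [status, headers, Body.new(body) { @logger.info "END #{id}" }]
  end
end

# demo with a trivial app and an in-memory log
out = StringIO.new
app = StartEndLogger.new(->(env) { [200, {}, ["ok"]] }, Logger.new(out))
_status, _headers, body = app.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/slow")
body.each { |chunk| chunk }
body.close
puts out.string
```

Because the identity is PID+thread+serial rather than anything unicorn-specific, the same log lines work under any Rack server.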
* Re: Log URL with murder_lazy_workers
From: Sam Saffron @ 2018-01-15 2:18 UTC
To: Eric Wong; +Cc: unicorn-public

> I seem to remember it was possible to get that information out of
> Rails logs pretty easily, already; and I seem recall doing that back
> many years ago when I used Rails. (This is probably why USR1 log
> reopening waits until a response is done before triggering...)

Yes, I am able to get the info out of our logs; I just read through
haproxy logs, filtering on stuff that takes longer than N. Getting
alerts, though, is a bit trickier, but I can probably work around that
without hacking at unicorn.

> I also believe the unicorn `timeout' is a misfeature that probably set
> the entire Rack/Ruby ecosystem back 10 years or more

This is super duper tricky, and I do not think this is a deficiency of
timeout. It is merely being employed as a workaround for design
problems. There are usually only 1 or 2 spots that could cause timeouts;
99% of the time it is slow database queries under extreme load.

It is super likely this could be handled in the app if we had:

    db_connection.timeout_at Time.now + 29

Then the connection could trigger the timeout and kill off the request
without needing to tear down the entire process and re-fork.

Making this happen is a bit tricky because it would require some hacking
on the pg gem.

I am just not sure how hacking at timeout can make stuff any better; it
is an escape hatch, just in case code misbehaves.

On Mon, Jan 15, 2018 at 12:57 PM, Eric Wong <e@80x24.org> wrote:
> Sam Saffron <sam.saffron@gmail.com> wrote:
>> I would love to start logging the actual URL that timed out when
>> murder_lazy_workers does its thing.
> [...]
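The hypothetical `db_connection.timeout_at` could be sketched as a prepended `exec` that converts the request deadline into a per-statement timeout. The pg gem is stood in for by a stub connection so the sketch is self-contained; with a real `PG::Connection`, the `SET statement_timeout` call would be an actual round-trip:

```ruby
# Sketch of a per-request deadline applied per statement (invented API).
module DeadlineExec
  attr_accessor :timeout_at  # hypothetical per-request deadline

  def exec(sql)
    if timeout_at
      remaining_ms = ((timeout_at - Time.now) * 1000).to_i
      raise "request deadline exceeded" if remaining_ms <= 0
      # re-set the statement timeout to whatever time the request has left
      super("SET statement_timeout = #{remaining_ms}")
    end
    super(sql)
  end
end

# Stub standing in for a pg connection; records statements instead of
# sending them to PostgreSQL.
class StubConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def exec(sql)
    @statements << sql
  end

  prepend DeadlineExec
end

conn = StubConnection.new
conn.timeout_at = Time.now + 29  # the request must finish within 29s
conn.exec("SELECT 1")
puts conn.statements
```

This way a pile-up of individually fast queries still hits the deadline, because the budget shrinks with each statement rather than resetting.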
* Re: Log URL with murder_lazy_workers
From: Sam Saffron @ 2018-01-15 3:22 UTC
To: Jeremy Evans; +Cc: Eric Wong, unicorn-public

Hi Jeremy,

Yes, we already have statement_timeout set; it just does not trigger
under extreme load because it is usually a pile-up of papercuts and not
one big nasty query: 26 seconds in, you issue yet another query and...
boom. I guess we could prepend a custom exec that times out early and
re-sets it per statement.

Thanks for the suggestion of the info file, I will consider adding
something like it.

Sam

On Mon, Jan 15, 2018 at 1:40 PM, Jeremy Evans <jeremyevans0@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 6:18 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
>>
>> It is super likely this could be handled in the app if we had:
>>
>> db_connection.timeout_at Time.now + 29
>>
>> Then the connection could trigger the timeout and kill off the request
>> without needing to tear down the entire process and re-forking.
>>
>> Making this happen is a bit tricky cause it would require some hacking
>> on the pg gem.
>
> Sam,
>
> For this particular case, you can SET statement_timeout (29000) on the
> PostgreSQL connection, but note that is a per-statement timeout, not a
> per web request timeout.
>
> What I end up doing for important production apps is have the worker
> log request information for each request to an open file descriptor
> (seek(0)/write/truncate), and having the master process use the
> contents of the worker-specific file to report errors in case of
> worker crashes.
>
> Thanks,
> Jeremy
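The worker-info file Jeremy describes might look like this: the worker keeps one file descriptor open and rewrites it in place for every request, so after a kill the master can read the worker's last-known request. The file name and record format here are invented:

```ruby
require "tempfile"

# One always-open info file per worker (a Tempfile stands in here).
info = Tempfile.new("unicorn-worker-info")

def record_request(io, line)
  io.seek(0)
  io.write(line)
  io.truncate(line.bytesize)  # drop leftovers when the new record is shorter
  io.flush
end

# worker side: overwrite the record on every request
record_request(info, "GET /a-much-longer-earlier-request-path")
record_request(info, "GET /slow")

# master side: after a KILL, read the worker's file (handle reused here;
# a real master would reopen the worker-specific path)
info.seek(0)
last = info.read
puts last
```

The `truncate` is what keeps a short record from being polluted by the tail of a longer previous one, which is why seek/write alone is not enough.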
* Re: Log URL with murder_lazy_workers
From: Carl Brasic @ 2018-01-15 15:06 UTC
Cc: unicorn-public

We solved this issue by writing a small Rack middleware that launches a
background thread for every request; the thread sleeps for one second
short of the unicorn timeout and then logs some info (including the URL)
to a file.

The nice thing about this approach is that you have access to the main
thread's information, including env['action_controller.instance'] to get
controller params, current user information, app backtrace, etc. That
last one in particular has been very valuable in diagnosing complex
performance problems.

In very high volume applications, where the thread-per-request model
might be a bottleneck, it could be modified to a persistent buddy-thread
approach.

Carl

On Sun, Jan 14, 2018 at 5:27 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
> I would love to start logging the actual URL that timed out when
> murder_lazy_workers does its thing.
> [...]
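A minimal version of the watchdog middleware Carl describes might be sketched as follows; the delay is shrunk so the sketch runs quickly, and all names are invented:

```ruby
# Each request spawns a watchdog thread that sleeps just short of the
# unicorn timeout, then dumps the URL and the request thread's backtrace
# if the request is still running.
WATCHDOG_DELAY = 0.1  # stands in for (unicorn timeout - 1 second)

class SlowRequestWatchdog
  def initialize(app, log)
    @app, @log = app, log
  end

  def call(env)
    request_thread = Thread.current
    watchdog = Thread.new do
      sleep WATCHDOG_DELAY
      # still here: the request outlived the watchdog, so capture state
      @log << "SLOW #{env['PATH_INFO']} at #{request_thread.backtrace.first}"
    end
    @app.call(env)
  ensure
    watchdog.kill  # finished (or failed) in time: cancel the watchdog
  end
end

log = []
slow_app = ->(env) { sleep 0.3; [200, {}, ["done"]] }  # outlives the watchdog
SlowRequestWatchdog.new(slow_app, log).call("PATH_INFO" => "/report")
puts log
```

Because the watchdog samples the live request thread's backtrace, it can point at the exact line the worker was stuck on, which is the diagnostic value Carl highlights.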
Code repositories for project(s) associated with this public inbox:

  https://yhbt.net/unicorn.git/

This is a public inbox; see mirroring instructions for how to clone and
mirror all data and code used for this inbox, as well as URLs for
read-only IMAP folder(s) and NNTP newsgroup(s).