Shared Metrics Between Workers

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help

* Shared Metrics Between Workers
@ 2015-11-13  0:51 Jeff Utter
  2015-11-13  1:23 ` Eric Wong
  2015-11-13 11:04 ` Michael Fischer
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff Utter @ 2015-11-13  0:51 UTC (permalink / raw)
  To: unicorn-public

Hello,

I was wondering if anyone can offer any advice in handling stats
collections between worker processes in forking servers (like unicorn).
Specifically, I am attempting to work on a solution for the Prometheus ruby
gem. Some details are in this issue here:
https://github.com/prometheus/client_ruby/issues/9

Prometheus works with a "scrape" model, where every few seconds a
Prometheus server hits an HTTP endpoint that exposes status. With the
current middleware the stats will only represent whichever worker is hit.

I have read through the documentation for unicorn and poked around the
source code some -- as well as searched for similar projects for
inspiration.

The earliest, promising solution I considered was raindrops, but it looks
as though you need to know all of the possible metrics up front - which
won't necessarily work as prometheus could use metrics based on parameters
which could vary.

Does anyone have any experience working with something like this?

Thanks for any suggestions.

* Re: Shared Metrics Between Workers
  2015-11-13  0:51 Shared Metrics Between Workers Jeff Utter
@ 2015-11-13  1:23 ` Eric Wong
  2015-11-13 14:33   ` Jeff Utter
  2015-11-13 11:04 ` Michael Fischer
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Wong @ 2015-11-13  1:23 UTC (permalink / raw)
  To: Jeff Utter; +Cc: unicorn-public

Jeff Utter <jeff.utter@firespring.com> wrote:
> The earliest, promising solution I considered was raindrops, but it looks
> as though you need to know all of the possible metrics up front - which
> won't necessarily work as prometheus could use metrics based on parameters
> which could vary.

You don't have to return all the data you'd aggregate with raindrops,
though.  Just what was requested.

GDBM (in the stdlib), SQLite, RRD, or any "on-filesystem"[1] data store
should work, even.  If you do have a lot of stats, batch the updates
locally (per-worker) and write them to a shared area periodically.

I'd even batch incrementing the shared counters with raindrops if
you have too many stats.  Atomic increments (implemented with
GCC intrinsics) may get expensive (as they require SMP memory
barriers).

[1] Definitely put any on-disk stats files on tmpfs (/dev/shm on most
GNU/Linux setups) to avoid wearing out hard drives or SSDs.  Otherwise
disable fsync in GDBM (off by default, I think...) or SQLite (set the
synchronous pragma to "off").
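
A minimal sketch of the "batch locally, flush periodically" idea above, using only the Ruby standard library. The class name `BatchedCounters`, the JSON file format, and the `flush_every` threshold are assumptions made for illustration and are not part of unicorn, raindrops, or the Prometheus client. In practice the path would point at tmpfs (for example under /dev/shm), as the footnote suggests.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper: accumulates counters in process-local memory and
# merges them into one shared JSON file while holding an exclusive lock.
class BatchedCounters
  def initialize(path, flush_every: 100)
    @path = path
    @flush_every = flush_every # flush after this many local increments
    @pending = Hash.new(0)
    @since_flush = 0
  end

  # Cheap local increment; shared state is touched only on flush.
  def incr(name, by = 1)
    @pending[name.to_s] += by
    @since_flush += 1
    flush if @since_flush >= @flush_every
  end

  # Merge pending counts into the shared file under flock(LOCK_EX).
  # The lock is released when the block closes the file.
  def flush
    return if @pending.empty?
    File.open(@path, File::RDWR | File::CREAT, 0o644) do |f|
      f.flock(File::LOCK_EX)
      raw = f.read
      totals = raw.empty? ? {} : JSON.parse(raw)
      @pending.each { |k, v| totals[k] = totals.fetch(k, 0) + v }
      f.rewind
      f.write(JSON.generate(totals))
      f.flush
      f.truncate(f.pos)
    end
    @pending.clear
    @since_flush = 0
  end
end
```

Each worker would create its own `BatchedCounters` after forking and call `flush` once more on shutdown. The lock is held only for the short read-merge-write step, so contention grows with flush frequency rather than with request rate.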

* Re: Shared Metrics Between Workers
  2015-11-13  0:51 Shared Metrics Between Workers Jeff Utter
  2015-11-13  1:23 ` Eric Wong
@ 2015-11-13 11:04 ` Michael Fischer
  2015-11-13 14:37   ` Jeff Utter
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Fischer @ 2015-11-13 11:04 UTC (permalink / raw)
  To: Jeff Utter; +Cc: unicorn-public

On Fri, Nov 13, 2015 at 8:51 AM, Jeff Utter <jeff.utter@firespring.com> wrote:

> I was wondering if anyone can offer any advice in handling stats
> collections between worker processes in forking servers (like unicorn).
> Specifically, I am attempting to work on a solution for the Prometheus ruby
> gem. Some details are in this issue here:
> https://github.com/prometheus/client_ruby/issues/9

We run a statsd server on our application servers, and our
applications invoke statsd operations against various counters and
gauges.  The statsd protocol is UDP based and very fast.  statsd
itself keeps all data in memory and flushes it to its backend every
few seconds.

Multiple implementations of statsd exist.  We use Datadog's, but there
are lots of implementations out there.

Best regards,

--Michael
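
For readers unfamiliar with the statsd wire format mentioned above, here is a rough sketch using Ruby's standard `socket` library. The helper names `statsd_line` and `statsd_send` are hypothetical; real deployments would normally use an existing statsd client gem. Port 8125 is the conventional statsd default, and the metric names are made up.

```ruby
require "socket"

# Format one statsd datagram: "<name>:<value>|<type>", where the type is
# "c" for a counter or "g" for a gauge.
def statsd_line(name, value, type)
  "#{name}:#{value}|#{type}"
end

# Fire-and-forget send. UDP gives no delivery guarantee, which is part of
# why it is cheap for the sending process.
def statsd_send(sock, host, port, line)
  sock.send(line, 0, host, port)
end

# Example usage (assumes a statsd daemon listening on localhost:8125):
#   sock = UDPSocket.new
#   statsd_send(sock, "127.0.0.1", 8125, statsd_line("app.requests", 1, "c"))
#   statsd_send(sock, "127.0.0.1", 8125, statsd_line("app.queue_depth", 7, "g"))
```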

* Re: Shared Metrics Between Workers
  2015-11-13  1:23 ` Eric Wong
@ 2015-11-13 14:33   ` Jeff Utter
  2015-11-13 20:54     ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Utter @ 2015-11-13 14:33 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public

Thanks for the quick reply. I very much appreciate the insight.

On November 12, 2015 at 7:23:13 PM, Eric Wong (e@80x24.org) wrote:
> Jeff Utter wrote:  
> > The earliest, promising solution I considered was raindrops, but it looks  
> > as though you need to know all of the possible metrics up front - which  
> > won't necessarily work as prometheus could use metrics based on parameters  
> > which could vary.  
>  
> You don't have to return all the data you'd aggregate with raindrops,  
> though. Just what was requested.  

Just to make sure I understand this correctly though, in order for the metrics to be available between workers, the raindrops structs would need to be set up for each metric before unicorn forks?

> GDBM (in the stdlib), SQLite, RRD, or any "on-filesystem"[1] data store  
> should work, even. If you do have a lot of stats; batch the updates  
> locally (per-worker) and write them to a shared area periodically.

Are you suggesting this data store would be shared between workers or one per worker (and whatever displays the metrics would read all the stores)? I tried sharing between workers with DBM and GDBM and both of them end up losing metrics due to being overwritten by other threads. I imagine I would have to lock the file whenever one is writing, which would block other workers (not ideal). Out of the box PStore works fine for this (surprisingly). I'm guessing it does file locks behind the scenes.

Right now I'm thinking that the best way to handle this would be one data store per worker and then whatever reads the metrics scrapes them all read-only. My biggest concern with this approach is knowing which data-stores are valid. I suppose I could put them all in a folder based off the parent's pid. However, would it be possible that some could be orphaned if a worker is killed by the master? I would need some way for the master to communicate to the collector (probably in a worker) what other workers are actively running. Is that possible?
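
The one-store-per-worker idea above could be sketched roughly as follows, using only the standard library. The `worker-<pid>.json` file layout and the function names are assumptions for illustration. The sketch does not address how stale files from dead workers are detected or cleaned up.

```ruby
require "json"
require "tmpdir"
require "fileutils"

# Hypothetical layout: <dir>/worker-<pid>.json, one file per worker process.
# Each worker only ever writes its own file, so no cross-process lock is needed.
def write_worker_stats(dir, counters)
  FileUtils.mkdir_p(dir)
  final = File.join(dir, "worker-#{Process.pid}.json")
  tmp = "#{final}.tmp"
  File.write(tmp, JSON.generate(counters))
  File.rename(tmp, final) # atomic replace: readers never see a partial file
end

# Read-only aggregation: sum every counter across all worker files.
def aggregate_stats(dir)
  totals = Hash.new(0)
  Dir.glob(File.join(dir, "worker-*.json")).each do |file|
    JSON.parse(File.read(file)).each { |name, value| totals[name] += value }
  end
  totals
end
```

Whichever worker serves the scrape request would call `aggregate_stats` and render the result. Summing is only meaningful for counters; gauges would need a different merge rule.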

* Re: Shared Metrics Between Workers
  2015-11-13 11:04 ` Michael Fischer
@ 2015-11-13 14:37   ` Jeff Utter
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Utter @ 2015-11-13 14:37 UTC (permalink / raw)
  To: Michael Fischer; +Cc: unicorn-public

On November 13, 2015 at 5:04:28 AM, Michael Fischer (mfischer@zendesk.com) wrote:

> On Fri, Nov 13, 2015 at 8:51 AM, Jeff Utter wrote:
> 
> > I was wondering if anyone can offer any advice in handling stats
> > collections between worker processes in forking servers (like unicorn).
> > Specifically, I am attempting to work on a solution for the Prometheus ruby
> > gem. Some details are in this issue here:
> > https://github.com/prometheus/client_ruby/issues/9
> 
> We run a statsd server on our application servers, and our
> applications invoke statsd operations against various counters and
> gauges. The statsd protocol is UDP based and very fast. statsd
> itself keeps all data in memory and flushes it to its backend every
> few seconds.

Yeah, this does seem simpler in the case of forking servers. Part of Prometheus' ethos, however, is that metrics are scraped. I suppose it might be possible to have each worker push (with statsd) to a locally running collector that then creates a scraping endpoint. This, however, creates additional load on the server to handle all the incoming stats into the statsd server which would otherwise not be needed if the workers could just increment their own counts.

For this specific project I may look into statsd instead of Prometheus, since it doesn't seem to play well with forking servers at the moment. However, I would really prefer to find a way to make it play well with forking servers.

* Re: Shared Metrics Between Workers
  2015-11-13 14:33   ` Jeff Utter
@ 2015-11-13 20:54     ` Eric Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2015-11-13 20:54 UTC (permalink / raw)
  To: Jeff Utter; +Cc: unicorn-public

Jeff Utter <jeff.utter@firespring.com> wrote:
> On November 12, 2015 at 7:23:13 PM, Eric Wong (e@80x24.org) wrote:
> > You don't have to return all the data you'd aggregate with raindrops,  
> > though. Just what was requested.  
> 
> Just to make sure I understand this correctly though, in order for the
> metrics to be available between workers, the raindrops structs would
> need to be set up for each metric before unicorn forks?

Yes.  But most (if not all) metrics you'd care about will need
aggregation, and thus must be known/aggregated for the lifetime
of a process, correct?

> > GDBM (in the stdlib), SQLite, RRD, or any "on-filesystem"[1] data store  
> > should work, even. If you do have a lot of stats; batch the updates  
> > locally (per-worker) and write them to a shared area periodically.
> 
> Are you suggesting this data store would be shared between workers or
> one per worker (and whatever displays the metrics would read all the
> stores)? I tried sharing between workers with DBM and GDBM and both of
> them end up losing metrics due to being overwritten by other threads.
> I imagine I would have to lock the file whenever one is writing, which
> would block other workers (not ideal). Out of the box PStore works
> fine for this (surprisingly). I'm guessing it does file locks behind
> the scenes.

The data in the on-filesystem store would be shared across processes.
But you'd probably want to aggregate locally in a hash before flushing
periodically.

You're probably losing data because DB file descriptors are shared
across fork.  You need to open DBs/connections after forking.  With any
DB, you can't expect to open/share open file descriptors across fork.
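
One way to follow the "open after forking" advice above is to remember which process opened a handle and reopen it when the PID changes. The `ForkSafeHandle` class below is a hypothetical illustration, not an API of unicorn or of any database library. With unicorn, the `after_fork` hook in the config file is the usual place to open per-worker connections explicitly.

```ruby
# Hypothetical wrapper: remembers which PID opened the handle and opens a
# fresh one the first time it is used from a different process (after fork).
class ForkSafeHandle
  def initialize(&opener)
    @opener = opener # block that opens and returns a new handle
    @pid = nil
    @handle = nil
  end

  def get
    if @pid != Process.pid
      # Either first use, or we are in a forked child: do not reuse the
      # handle inherited from the parent, open our own instead.
      @handle = @opener.call
      @pid = Process.pid
    end
    @handle
  end
end

# Example usage (the path is illustrative):
#   STORE = ForkSafeHandle.new { File.open("/dev/shm/app-stats.log", "a") }
#   STORE.get.puts("requests_total 1")
```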

You can safely share UDP sockets across fork, and likely SCTP if
implemented in the kernel (I haven't tried).  But any userland wrappers
on top of the UDP socket (e.g. statsd, as Michael mentioned) will need
to be checked for fork-friendliness.
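
As a small illustration of sharing a UDP socket across fork, the sketch below creates the socket in the parent and lets two forked children send one datagram each. The local receiver stands in for a stats daemon, and the statsd-style payloads and metric names are made up for the example.

```ruby
require "socket"

# Receiver stands in for a local stats daemon; port 0 lets the OS pick a
# free port.
receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0)
port = receiver.addr[1]

# Created once in the parent, before forking, and inherited by every child.
shared = UDPSocket.new
shared.connect("127.0.0.1", port)

pids = 2.times.map do |i|
  fork { shared.send("worker.#{i}.requests:1|c", 0) }
end
pids.each { |pid| Process.wait(pid) }

# Collect what arrived, waiting at most two seconds per datagram.
received = []
2.times do
  break unless IO.select([receiver], nil, nil, 2)
  received << receiver.recvfrom(512).first
end
```

Each `send` is a single datagram, so messages from different workers do not interleave the way writes to a shared stream or database handle can.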

> Right now I'm thinking that the best way to handle this would be one
> data store per worker and then whatever reads the metrics scrapes them
> all read-only. My biggest concern with this approach is knowing which
> data-stores are valid. I suppose I could put them all in a folder
> based off the parent's pid. However, would it be possible that some
> could be orphaned if a worker is killed by the master? I would need
> some way for the master to communicate to the collector (probably in a
> worker) what other workers are actively running. Is that possible?

I don't think you need to worry about all that.  You'd want stats even for
dead workers to stick around if they were running the same code as
the current worker.

OTOH, you probably want to reset/drop stats on new deploys;
so maybe key the stats based on the version of the app you're running.
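
Keying stats by application version can be as simple as putting the version into the storage path, so a new deploy starts from an empty directory. The `stats_dir` helper and the way the version string is obtained are assumptions for illustration.

```ruby
# Hypothetical helper: derive a per-version stats directory. The version
# string would come from your deploy tooling (a revision file, an
# environment variable, etc.).
def stats_dir(base, app_version)
  # Replace characters that are unsafe in a single path component.
  safe = app_version.gsub(/[^A-Za-z0-9._-]/, "_")
  File.join(base, "stats-#{safe}")
end

# Example usage:
#   stats_dir("/dev/shm", "v1.2.3")
```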

I also forgot to mention I've used memcached for some stats, too.  It's
great when the data is fast-expiring, disposable and needs to be shared
across several machines; not just processes within the same host.
