WTF is up with memory usage nowadays?

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help

* WTF is up with memory usage nowadays?
@ 2016-12-12  2:10 Eric Wong
  2016-12-12  4:05 ` Sam Saffron
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Wong @ 2016-12-12  2:10 UTC (permalink / raw)
  To: unicorn-public

<rant> Came across this in my feeds today:

https://about.gitlab.com/2016/12/11/proposed-server-purchase-for-gitlab-com/

... Yeah, they cite 0.5 GB of memory usage per unicorn worker.
I guess this is typical nowadays, but damn, it sucks :<

This is not the future I had in mind or ever wanted unicorn to
be associated with back in 2009 when I started.

I don't think it's the fault of unicorn itself; unicorn recycles
request buffers, uses pre-frozen hash keys, and even
uses String#clear nowadays to discard heap memory, and never
buffers more than it has to.

Since day one, unicorn was built to handle multi-gigabyte
uploads and responses; even from a crappy 256MB laptop.
"curl -T-" is my co-pilot :)

So... I guess the problem is up the stack in the app or
framework.  Maybe Rails?  *shrug*  I don't use that anymore...

I remember using Rails over a decade ago and being shocked at
50MB (yes, fifty megabytes) of RSS usage.  This was on 32-bit,
but even in the worst case on 64-bit, it would be 100MB.
Of course, nowadays Rails has grown to the point where I'm
afraid to go near it; instead I work directly off Rack.

And yes, I still freak out nowadays when my Rack processes
exceed 100MB...

So, what can and should we do about it?

* First step: Limit ourselves.

  Use slower, older hardware, slower Internet connection so you
  force yourself to eke out every bit of performance out of
  what you have.

  It's utterly hilarious for me to hear people complain
  about laptops which can "only" have 16GB RAM.

  I've definitely made transgressions in the past, and the worst
  code I've written was on powerful hardware.

Disclaimer: Some of the following may not be very Ruby-ish :P
And everything else is optional and the result of the first step
above.

* Recycle.  Don't waste object slots: {Array,Hash,String}#clear
  can allow you to recycle heap memory for large objects
  and minimize GC pressure.  Using thread-local variables
  in your app helps maintain compatibility with multi-threaded
  Rack servers; or perhaps go Rack env-local for compatibility
  with single-threaded non-blocking servers.
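
A minimal sketch of the recycling idea above, not code taken from unicorn.
The `ReusableBuffer` module and its `KEY` are hypothetical names invented
for illustration; the point is only that one String per thread gets
#clear-ed and reused instead of allocating a fresh one per request.

```ruby
# Hypothetical helper: one reusable String per thread, cleared after use.
module ReusableBuffer
  KEY = :reusable_buffer_example # hypothetical thread-local key

  def self.with_buffer
    thread = Thread.current
    buf = thread.thread_variable_get(KEY)
    unless buf
      buf = String.new
      thread.thread_variable_set(KEY, buf)
    end
    yield buf
  ensure
    # String#clear empties the string but keeps the same object,
    # so the next caller on this thread reuses it.
    buf.clear if buf
  end
end

# Usage: build a fragment in the shared buffer and keep only what is
# needed from it (here, just its size).
size = ReusableBuffer.with_buffer do |buf|
  buf << "hello, " << "world"
  buf.bytesize
end
```

A Rack env-local variant would stash the buffer in the env hash under an
app-specific key instead of on the thread.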

* Can't recycle?  Discard objects you don't need, ASAP,
  and continue #clear-ing what you can. Take advantage
  of streaming built into Rack.

  The Rack response body only needs to respond to #each.
  There should be no reason to build giant response
  documents in memory before sending them to a client.

  unicorn can't do the following for you automatically since
  we don't know how/if a Rack app will reuse a string;
  but upstack authors can String#clear after yielding
  in #each to ensure any malloced heap memory is immediately
  available for future use (but beware of downstream middlewares
  which do not expect this, too(**)):

    def each
      # .. do something to generate a giant string
      yield giant_string
      giant_string.clear # String#clear
    end

  A Rack response body may also respond to #close; it can
  be used to explicitly release any response-local resources.
  Rack::TempfileReaper + Rack::BodyProxy is an example of
  this for Tempfiles.

  Smaller functions and smaller code help keep this manageable.
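
As a sketch of the streaming approach above: a response body that only
implements #each and #close and never holds more than one chunk at a time.
`StreamingBody`, `CHUNK_SIZE` and the StringIO source are assumptions made
for this example, not part of Rack or unicorn.

```ruby
require 'stringio'

# Hypothetical Rack response body that streams from any IO-like object.
class StreamingBody
  CHUNK_SIZE = 16_384 # arbitrary chunk size chosen for this sketch

  def initialize(io)
    @io = io
  end

  # Rack only requires #each; each chunk is yielded as soon as it is read.
  def each
    buf = String.new
    # IO#read(length, outbuf) refills the same String on every call
    # and returns nil at EOF.
    yield buf while @io.read(CHUNK_SIZE, buf)
  ensure
    buf.clear if buf
  end

  # Optional in Rack; servers call it once the response is finished.
  def close
    @io.close unless @io.closed?
  end
end

# A Rack app returning the body (no Rack gem is needed to run this sketch).
app = lambda do |_env|
  io = StringIO.new("x" * 40_000) # stands in for a large file or socket
  [200, { 'Content-Type' => 'text/plain' }, StreamingBody.new(io)]
end
```

Because the same buffer is refilled for every chunk, whoever consumes the
body has to write or copy each chunk before the next one is yielded; that
is the same caveat about downstream middleware mentioned above.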

* Avoid slurping.  Large datasets do not need everything up
  front.  For example, threading 10K messages entirely
  in memory is no problem: just don't load entire messages
  into memory up front, only what you need.
  JWZ's algorithm was doing this in the 90s:
  https://www.jwz.org/doc/threading.html
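
To make "only what you need" concrete, here is a sketch that reads just
the header block of each message and stops at the first blank line, so
bodies are never loaded.  This is not JWZ's algorithm itself, only the
input side of it; `read_threading_headers` and `WANTED` are hypothetical
names, and the parsing is deliberately simplistic.

```ruby
require 'stringio'

# Headers a threading pass typically cares about (assumption for this sketch).
WANTED = %w[message-id in-reply-to references subject].freeze

# Hypothetical helper: parse headers from an IO, stopping before the body.
def read_threading_headers(io)
  headers = {}
  last = nil
  io.each_line do |line|
    break if line == "\n" || line == "\r\n" # blank line ends the headers
    if line =~ /\A[ \t]/ && last
      # folded continuation of the previous header
      headers[last] << " " << line.strip if headers.key?(last)
    elsif line =~ /\A([^:\s]+):\s*(.*)/
      last = $1.downcase
      headers[last] = $2.strip if WANTED.include?(last)
    end
  end
  headers
end

message = "Message-ID: <a@example>\n" \
          "References: <1@example>\n <2@example>\n" \
          "Subject: hi\n\n" + ("body line\n" * 1_000)
io = StringIO.new(message)
headers = read_threading_headers(io)
```

With real files, the same function works on a File opened for reading;
the body bytes simply stay on disk.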

Disclaimer: Some of these things may hurt throughput and
performance in benchmarks, especially with smaller datasets;
but I consider predictable and consistent performance
far more important than burst throughput.

** Know your entire stack; top to bottom.
   You ought to be able to track every single line of code
   in a high-level Rack app you maintain down through each
   and every layer of framework, middleware, Rack server,
   Ruby VM, C library, down to the OS kernel.

   Yes, this limits you to using smaller and simpler stacks :P

*** Why stick with Ruby if you care about memory usage?

I'm too impatient to wait on compilers, and don't like the extra
storage of binaries.  Scripting languages force authors to
distribute (hopefully non-obfuscated) code; reducing network and
storage costs, and that also lowers the barrier from user to
hacker.  Fwiw, I actually prefer Perl5 with the predictability
(and caveats of) refcounting over a GC like Ruby's.

</rant>


* Re: WTF is up with memory usage nowadays?
  2016-12-12  2:10 WTF is up with memory usage nowadays? Eric Wong
@ 2016-12-12  4:05 ` Sam Saffron
  2016-12-12  5:48   ` Eric Wong
  2016-12-12  9:49 ` hukl
  2017-02-08 20:00 ` Eric Wong
  2 siblings, 1 reply; 5+ messages in thread
From: Sam Saffron @ 2016-12-12  4:05 UTC (permalink / raw)
  To: unicorn-public

As to who is at fault here, it is a little bit of "everyone" in a big bucket.

- The new GC is far more memory hungry than the 1.9 line; even with
`RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5` it still uses a lot more space
than it used to

- Ruby has been very slow at dealing with the "elephant in the room",
which is larger processes; stuff like
https://bugs.ruby-lang.org/issues/12967 goes ignored and is sadly
unlikely to happen for the upcoming release

- The current focus for "Ruby" is 3x3 (ruby 3 being 3 times faster)
... there is no focus on reducing memory usage

- Most people use the default C memory allocator, despite tcmalloc
offering quite a decent win
https://github.com/SamSaffron/allocator_bench

- mime types gem is a pariah... apparently EVERY install of rails
needs to hold 46 megabytes of mime types in memory cause ... I don't
know why ... (even with 'mime/types/columnar')

- Booting a rails app eats up half a million slots in your ruby heaps;
clearly the vast majority of this is unneeded waste. Stuff like
keeping the string "MIT" in memory 125 times forever because of "rubygems"
jumps out.  There is zero focus anywhere to fix this issue.

- Ruby heaps grow too fast; even if you put on the brakes, it is hard
to diagnose how you reached 7000 heaps at runtime when 6000 of them
are empty.

Overall there is tons that can be done, but unfortunately there is no
focus anywhere to fix stuff.
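
For anyone wanting to look at the slot and duplicate-string numbers in
their own process, a rough diagnostic sketch follows.  `heap_summary` and
`duplicate_strings` are hypothetical helper names; the GC.stat keys used
here exist in the MRI versions I know of but have changed between
releases, so treat them as an assumption.

```ruby
# Hypothetical helper: a few heap counters from GC.stat.
# Missing keys simply come back as nil on Rubies that lack them.
def heap_summary
  stat = GC.stat
  {
    pages: stat[:heap_allocated_pages],
    live_slots: stat[:heap_live_slots],
    free_slots: stat[:heap_free_slots],
  }
end

# Hypothetical helper: count live String objects by content and return
# those that occur at least min_count times.
def duplicate_strings(min_count = 2)
  counts = Hash.new(0)
  ObjectSpace.each_object(String) { |s| counts[s] += 1 }
  counts.select { |_, n| n >= min_count }
end

puts heap_summary.inspect
top = duplicate_strings(50).sort_by { |_, n| -n }.first(10)
top.each { |str, n| puts "#{n}x #{str[0, 40].inspect}" }
```

Walking every String is slow and allocates on its own, so this is
something to run once in a console, not in a request path.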


* Re: WTF is up with memory usage nowadays?
  2016-12-12  4:05 ` Sam Saffron
@ 2016-12-12  5:48   ` Eric Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2016-12-12  5:48 UTC (permalink / raw)
  To: Sam Saffron; +Cc: unicorn-public

Sam Saffron <sam.saffron@gmail.com> wrote:
> As to who is at fault here, it is a little bit of "everyone" in a big bucket.

I agree.  At least everyone who was blindly upgrading hardware :)

But yeah, memory usage really is a whack-a-mole affair and
programmers need to constantly watch out for this.

> - The new GC is far more memory hungry than the 1.9 line; even with
> `RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5` it still uses a lot more space
> than it used to
> 
> - Ruby has been very slow at dealing with the "elephant in the room",
> which is larger processes; stuff like
> https://bugs.ruby-lang.org/issues/12967 goes ignored and is sadly
> unlikely to happen for the upcoming release

I suggest re-pinging ruby-core every week or two about these things.
That goes for anyone, for just about any maintained project :)
Sometimes stuff is honestly forgotten, not consciously ignored.

> - The current focus for "Ruby" is 3x3 (ruby 3 being 3 times faster)
> ... there is no focus on reducing memory usage

Yeah, I was disappointed about that, too.  And I think it was
too lofty of a goal and marketing hype.

> - Most people use the default C memory allocator, despite tcmalloc
> offering quite a decent win
> https://github.com/SamSaffron/allocator_bench

I haven't checked in a while, but doesn't tcmalloc hold
once-allocated memory forever?  But yeah, I remember tcmalloc
having good fragmentation avoidance, at least.

Unfortunately, tcmalloc being C++ means it's less-debuggable to
ordinary C programmers; so it wasn't something I wanted to deal
with myself.

And I thought jemalloc was the accepted standard, nowadays;
though I don't notice a difference from glibc
(I cap both at one arena for MRI).

> - mime types gem is a pariah... apparently EVERY install of rails
> needs to hold 46 megabytes of mime types in memory cause ... I don't
> know why ... (even with 'mime/types/columnar')

Eeep!  I guess I'm lucky to not have that...

> - Booting a rails app eats up half a million slots in your ruby heaps;
> clearly the vast majority of this is unneeded waste. Stuff like
> keeping the string "MIT" in memory 125 times forever because of "rubygems"
> jumps out.  There is zero focus anywhere to fix this issue.

...Or that problem.  Though I do sometimes wish pure Ruby could
have a reasonable way to use the rb_fstring C API
(for dedupe+freeze).
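
For reference, Ruby releases newer than this thread expose something
close to that from pure Ruby: as far as I know, String#-@ returns a
frozen and (since Ruby 2.5) deduplicated copy.  A small sketch, where
`dedupe` is just a hypothetical wrapper name:

```ruby
# Hypothetical wrapper around String#-@ (assumes Ruby 2.5 or later,
# where unary minus is expected to intern unfrozen strings).
def dedupe(str)
  -str
end

license_a = dedupe("MI" + "T")          # built at runtime, not a literal
license_b = dedupe(String.new("MIT"))   # a second, separate String object

same_object = license_a.equal?(license_b)
puts "frozen: #{license_a.frozen?}, same object: #{same_object}"
```

On older Rubies the method either does not exist or only freezes, so
this is no help for the versions discussed here.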

But yeah; aside from obscure projects, I've mostly left Ruby due
to the expectation to use JS, accept ToS and have accounts on
non-Free services; in addition to...

> - Ruby heaps grow too fast; even if you put on the brakes, it is hard
> to diagnose how you reached 7000 heaps at runtime when 6000 of them
> are empty.

...the unpredictability of GC.

> Overall there is tons that can be done, but unfortunately there is no
> focus anywhere to fix stuff.

Maybe it starts with things like not top posting :)
Unfortunately, most mail processing software, even Email::MIME
in Perl (which I use for hosting the archives), still requires
loading an entire message into memory :<  Gross, but at least
I only have one full message in memory at once; even for
rendering threads with 368 messages.


* Re: WTF is up with memory usage nowadays?
  2016-12-12  2:10 WTF is up with memory usage nowadays? Eric Wong
  2016-12-12  4:05 ` Sam Saffron
@ 2016-12-12  9:49 ` hukl
  2017-02-08 20:00 ` Eric Wong
  2 siblings, 0 replies; 5+ messages in thread
From: hukl @ 2016-12-12  9:49 UTC (permalink / raw)
  To: unicorn-public

I agree with most of what you've written. I drew my conclusions a while 
ago and switched to using Erlang (not Elixir!) for all my backend jobs 
in the past couple of years. In a way it is a more constrained /
restricted environment. There are fewer libraries and frameworks
available and the language itself is smaller in terms of available 
constructs, ways of doing things and how dynamic it feels. But I like 
the constraints. A small language is easier to reason about especially 
if there are not 23 different ways of assigning a value to a variable etc.

Looking back at my Ruby times, I often feel that Ruby was not
created with the intention of becoming a universal, high performance
backend programming language, and it's amazing that it still got that far.
Unicorn was one of the big steps enabling us to build somewhat large
scale backends, and that was a big help.

However it feels like people keep putting lipstick on a pig, hoping that 
eventually it will become a shiny, metallic, high performance backend 
programming environment. But it will always remain a pig … or Ruby - no 
matter how many layers of lipstick or abstractions you put on it - and 
in Rubyland people love adding more layers :)

I'm happy with Erlang now because it feels much more designed for the 
task and it has a very interesting and practical mix of design decisions.

What you've described is not an isolated Ruby problem of course, as the
general trend is always towards more abstractions rather than fewer, but
I guess it's a little amplified in this particular ecosystem.

Coming back to what I initially wrote:

I agree with most of what you've written, I just came to a different 
conclusion which is basically giving up the constraint or desire to keep 
programming in Ruby. (But I'm still reading this mailing list ;) )

~ John

* Re: WTF is up with memory usage nowadays?
  2016-12-12  2:10 WTF is up with memory usage nowadays? Eric Wong
  2016-12-12  4:05 ` Sam Saffron
  2016-12-12  9:49 ` hukl
@ 2017-02-08 20:00 ` Eric Wong
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2017-02-08 20:00 UTC (permalink / raw)
  To: unicorn-public

Eric Wong <e@80x24.org> wrote:
>   The Rack response body only needs to respond to #each.
>   There should be no reason to build giant response
>   documents in memory before sending them to a client.
> 
>   unicorn can't do the following for you automatically since
>   we don't know how/if a Rack app will reuse a string;
>   but upstack authors can String#clear after yielding
>   in #each to ensure any malloced heap memory is immediately
>   available for future use (but beware of downstream middlewares
>   which do not expect this, too(**)):
> 
>     def each
>       # .. do something to generate a giant string
>       yield giant_string

... The yield above is so unicorn (or any server) can call
IO#write or similar (send(,_nonblock), write_nonblock, etc...).

That means once IO#write is complete, the contents of the string
are shipped off to the OS TCP stack and Ruby can forget about them:

>       giant_string.clear # String#clear
>     end
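
A rough sketch of both sides of that exchange.  `write_body` is a
hypothetical stand-in for a server's write loop (it is not unicorn's
actual code), `GiantBody` is an invented app-side body, and a StringIO
plays the part of the client socket.

```ruby
require 'stringio'

# Hypothetical app-side body: generates one large string at a time.
class GiantBody
  def initialize(count)
    @count = count
  end

  def each
    @count.times do |i|
      giant_string = "chunk #{i}\n" * 1_000
      yield giant_string   # the server writes it out here
      giant_string.clear   # Ruby no longer needs the contents
    end
  end
end

# Hypothetical stand-in for a server's write loop.
def write_body(io, body)
  body.each { |chunk| io.write(chunk) }
ensure
  body.close if body.respond_to?(:close)
end

client = StringIO.new # stands in for the client socket
write_body(client, GiantBody.new(3))

# The caveat from the original message: anything that keeps a reference
# to a chunk instead of writing or copying it sees an empty string later.
retained = []
GiantBody.new(3).each { |chunk| retained << chunk }
```

The second loop is what a middleware that buffers chunks for later would
run into, which is why the server cannot do the #clear on the app's behalf.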

However, this is largely ineffective because Ruby 2.0.0 through 2.4.0
have a thread-safety fix which causes excessive garbage:

https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=34847
https://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/io.c?r1=34847&r2=34846&pathrev=34847

And it looks like that went unnoticed for a few years...
Shame on all of us! :<

Anyways, it looks like an acceptable fix finally landed
for Ruby 2.5 (December 2017): https://bugs.ruby-lang.org/issues/13085

I've considered backporting a workaround into unicorn; but I'm
leaning against it since most apps do not recycle buffers at the
moment, so they make a lot of garbage, anyways.

Another place unicorn uses IO#write is buffering large files in
the TeeInput class; but I'm not sure if enough people care about
large uploads.  Embarrassingly, most of the large I/O apps I
maintain are still on 1.9.3, so I did not notice it :x
In any case, anybody running an up-to-date trunk or willing
to wait until Ruby 2.5 won't have to worry or think about this.

But yeah, all of this means there's still both a runtime and
code complexity cost for supporting 1:1 native threads (at least
the way MRI does it).  This cost is there regardless of whether
or not the code you run uses threads.


Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/
