From: Simon Eskildsen
Date: Wed, 22 Feb 2017 15:09:38 -0500
Subject: Re: check_client_connection using getsockopt(2)
To: Eric Wong
Cc: unicorn-public@bogomips.org, raindrops-public@bogomips.org

On Wed, Feb 22, 2017 at 1:33 PM, Eric Wong wrote:
> Simon Eskildsen wrote:
>
> great to know it's still working after all these years :>
>
>> This confirms Eric's comment that the existing
>> `check_client_connection` works perfectly on loopback, but as soon
>> as you put an actual network between the Unicorn and the client,
>> it's only effective 20% of the time, even with `TCP_NODELAY`. I
>> assume this is due to buffering, even with Nagle's algorithm
>> disabled. As we're changing our architecture, we're moving from
>> ngx (lb) -> ngx (host) -> unicorn to ngx (lb) -> unicorn. That
>> means this patch will no longer work for us.
>
> Side comment: I'm a bit curious how you guys arrived at removing
> nginx at the host level (if you're allowed to share that info)
>
> I've mainly kept nginx on the same host as unicorn, but used
> haproxy or similar (with minimal buffering) at the LB level.
> That allows better filesystem load distribution for large uploads.

Absolutely. We're starting to experiment with Kubernetes, and as a
result we're interested in simplifying ingress as much as possible.
We could still run the host nginxes, but if I can avoid explaining
why they're there for the next 5 years, I'll be happy :) As I see it,
the current use-cases we have for the host nginx are (with why we can
get rid of each):

* Keepalive between edge nginx LBs and host LBs to avoid excessive
  network traffic. This is not a deal-breaker, so we'll just ignore
  this problem, although it's normally a _massive_ amount of traffic.

* Out of rotation. We take nodes gracefully out of rotation by
  touching a file that'll make a health-checking endpoint return 404
  to the edge LBs. Kubernetes implements this by simply removing all
  the containers on that node.

* Graceful deploys. When we deploy with containers, we take each
  container out of rotation with the local host nginx, wait for the
  new one to come up, and put it back in rotation. We don't use
  Unicorn's graceful restarts because we want a Ruby upgrade deploy
  (replacing the container) to work the same way as an updated code
  rollout.

* File uploads. As you mention, we currently load-balance these
  between the host nginxes. I haven't finished investigating whether
  this is feasible on the front LBs; otherwise we may go for a
  two-tier nginx solution or expand the edge. With Kubernetes,
  though, I'd really like to avoid host nginxes altogether, since
  they're not very native to the Kubernetes paradigm. (Does it
  balance across the network, or only to the pods on that node?)

* check_client_connection working. :-) This thread. Plus slow-client
  protection and the other advantages we get from the edge nginxes.

>> I propose instead of the early `write(2)` hack, that we use
>> `getsockopt(2)` with the `TCP_INFO` flag and read the state of the
>> socket. If it's in `CLOSE_WAIT` or `CLOSE`, we kill the connection
>> and move on to the next.
>>
>> https://github.com/torvalds/linux/blob/8fa3b6f9392bf6d90cb7b908e07bd90166639f0a/include/uapi/linux/tcp.h#L163
>>
>> This is not going to be portable, but we can do this only on Linux,
>> which I suspect is where most production deployments of Unicorn
>> that would benefit from this feature run. It's not useful in
>> development (where it's more common not to be on Linux). We could
>> also introduce it under a new configuration option if desired. In
>> my testing, this works to reject 100% of requests early when not
>> on loopback.
>
> Good to know about its effectiveness! I don't mind adding
> Linux-only features as long as existing functionality still
> works on the server-oriented *BSDs.
>
>> The code is essentially:
>>
>> def client_closed?
>>   tcp_info = socket.getsockopt(Socket::SOL_TCP, Socket::TCP_INFO)
>>   state = tcp_info.unpack("c")[0]
>>   state == TCP_CLOSE || state == TCP_CLOSE_WAIT
>> end
>
> +Cc: raindrops-public@bogomips.org -> https://bogomips.org/raindrops-public/
>
> Fwiw, raindrops (already a dependency of unicorn) has TCP_INFO
> support under Linux:
>
> https://bogomips.org/raindrops.git/tree/ext/raindrops/linux_tcp_info.c
>
> I haven't used it much, but FreeBSD also implements a subset of
> this feature nowadays, and it ought to be supportable, too. We
> can look into adding it for raindrops.

Cool, I think it makes sense to use Raindrops here, the advantage
being that it reads the actual struct instead of relying on a
sketchy `#unpack`.
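Something along these lines, perhaps (only a sketch; I'm assuming
Raindrops::TCP_Info takes the client socket and exposes the struct
tcp_info fields as accessors, e.g. #state for tcpi_state, and the
state numbers below are Linux's from include/net/tcp_states.h):

  require 'socket'
  require 'raindrops'

  # Linux TCP state numbers (include/net/tcp_states.h)
  TCP_CLOSE      = 7
  TCP_CLOSE_WAIT = 8

  def client_closed?(socket)
    # reads struct tcp_info via getsockopt(2) under the hood
    state = Raindrops::TCP_Info.new(socket).state
    state == TCP_CLOSE || state == TCP_CLOSE_WAIT
  end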
I meant to ask: in Raindrops, why do you use the netlink API to get
the socket backlog instead of `getsockopt(2)` with `TCP_INFO` to get
`tcpi_unacked`? (As described in
http://www.ryanfrantz.com/posts/apache-tcp-backlog/.) We use this to
monitor socket backlogs with a sidekick Ruby daemon, although we're
looking to replace it with a middleware to simplify things for
Kubernetes. It's one of our main metrics for monitoring performance,
especially around deploys.
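For reference, the getsockopt(2) route from that post would look
roughly like this (again just a sketch; I'm assuming #unacked and
#sacked mirror tcpi_unacked and tcpi_sacked, which for a listening
socket report the accept queue's current depth and its limit):

  require 'socket'
  require 'raindrops'

  listener = TCPServer.new(8080)

  # on a listen socket, tcpi_unacked is the number of connections
  # waiting in the accept queue; tcpi_sacked is the queue's limit
  info = Raindrops::TCP_Info.new(listener)
  p info.unacked
  p info.sacked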
> I don't know about other BSDs...

>> This could be called at the top of `#process_client` in
>> `http_server.rb`.
>>
>> Would there be interest in this upstream? Any comments on this
>> proposed implementation? Currently, we're using a middleware with
>> the Rack hijack API, but this seems like it belongs at the
>> webserver level.
>
> I guess hijack means you have to discard other middlewares and
> the normal rack response handling in unicorn? If so, ouch!
>
> Unfortunately, without hijack, there's no portable way in Rack
> to get at the underlying IO object; so I guess this needs to
> be done at the server level...
>
> So yeah, I guess it belongs in the webserver.

I was going to use `env["unicorn.socket"]`/`env["puma.socket"]`, but
you could also do `env.delete("rack.hijack_io")` after hijacking to
allow Unicorn to still render the response. Unfortunately the
`.socket` key is not part of the Rack standard, so I'm hesitant to
use it. When this gets into Unicorn I'm planning to propose it
upstream to Puma as well.

> I think we can automatically use TCP_INFO if available for
> check_client_connection; then fall back to the old early write
> hack for Unix sockets and systems w/o TCP_INFO (which would set
> the response_start_sent flag).
>
> No need to introduce yet another configuration option.

Cool. How would you suggest I check for TCP_INFO-compatible platforms
in Unicorn? Is `RUBY_PLATFORM.end_with?("linux".freeze)` sufficient,
or do you prefer another mechanism? I agree that we should fall back
to the write hack on other platforms.
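(For what it's worth, a feature check might be less brittle than
platform sniffing. A sketch, assuming raindrops only defines
Raindrops::TCP_Info on platforms where its TCP_INFO support
compiled:)

  require 'raindrops'

  # detect the feature rather than the platform
  if defined?(Raindrops::TCP_Info)
    check = :tcp_info    # use the socket-state check
  else
    check = :early_write # fall back to the early write(2) hack
  end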