From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-3.3 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from mail-lj1-x232.google.com (mail-lj1-x232.google.com [IPv6:2a00:1450:4864:20::232]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 2DD481F5AE for ; Mon, 24 May 2021 19:00:40 +0000 (UTC) Received: by mail-lj1-x232.google.com with SMTP id o8so35041844ljp.0 for ; Mon, 24 May 2021 12:00:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=h58o/ldPNsM2k3XyveVigBFnLARAIDlh4qpcIbfO9jk=; b=RPpexlwD4k7yFnLGpu9SQcplGVIh7j/8kDo9B1jUuDkfYtFzSeCBJHhjOIVFZRmkbP HPlpxrT0HD8KL99swhLFoXhdlGIaOZQI2kyCKXmv1Kg8ODMSuJHc7isxAAuhZox9SCjS OiUacL41Z4xvgQaQLMINc1/Rp7/s27B/qNz0ccxnOTxNxUHFb8doO8D30RBEL5c34lTl uP7nKHfYyd3T+aNExQLAze5Um3ZCPRT8hXc+yL3w6m08Oj1qXkK9aFFfGZbDg1ekdipd 4f+r0vt1QT7uwroiizQzZMeA5e6g2naqB26uJcRjJ9rtNWvg/M3ffu6tuw2l3lIH5+Gt mfwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=h58o/ldPNsM2k3XyveVigBFnLARAIDlh4qpcIbfO9jk=; b=PWpkPfWr7kxP4cyeOyXJwU2kzMwCmXbSe1cWJzr+17IFazUuCLGQook3AGV4N69pTu i77P+ojKl39fygIxPOKEXD+d/4/uEwSsQP4vlCu5mju+PRg5MkF2j0k2dS4SqXXbx7hy 7uQ9/Oeep+XmK47vJhhexZMVMZCct0NRDxAJ3Ztv6xBw/OJgxLMgPZ9CFq6mtfk4NX07 Php7yfzLJxFE0tljVK1WNTvnzkQ2gIjbewWnNP7uKXs+uUxS+G75WYCBVHelt0XBCwyL UDAlgZGyWxgZ36bJh+PZWuppYUHbMexByPLhtlrB+jmMbXC9vs9KAbFki1zmQAyxYXtK S2lQ== X-Gm-Message-State: AOAM530s8v8Y/PiHxDOKwcg45jQnWlNAls2Qztk1JSAT9gTyXi/1bW7V kOmjW8IgfkMu6LssLJARRSmabWJJhyJVVsz0DbY= X-Google-Smtp-Source: ABdhPJyH/vTAOgE0TU7ugeJiKE233oFU62gmfgX8U0OFa1Fc70cFUMdMWsx1mk5bFOCn/F79105sz9RyeozhOZ2HaT0= X-Received: by 2002:a2e:502:: with SMTP id 2mr18133337ljf.384.1621882838262; Mon, 24 May 2021 12:00:38 -0700 (PDT) MIME-Version: 1.0 References: <20210524172208.GA7627@dcvr> In-Reply-To: <20210524172208.GA7627@dcvr> From: Ngan Pham Date: Mon, 24 May 2021 12:00:26 -0700 Message-ID: Subject: Re: Patch for GC.compact memory issue To: Eric Wong Cc: kgio-public@yhbt.net Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: Hey Eric, actually there's 2 more references that need to be marked. I've removed the comments since it's not really worth mentioning. Here's the final patch: diff --git a/ext/kgio/accept.c b/ext/kgio/accept.c index f1de39d..4bd9cfe 100644 --- a/ext/kgio/accept.c +++ b/ext/kgio/accept.c @@ -498,9 +498,12 @@ void init_kgio_accept(void) rb_define_const(mKgio, "SOCK_CLOEXEC", INT2NUM(SOCK_CLOEXEC)); localhost =3D rb_const_get(mKgio, rb_intern("LOCALHOST")); + rb_gc_register_mark_object(localhost); cKgio_Socket =3D rb_const_get(mKgio, rb_intern("Socket")); + rb_gc_register_mark_object(cKgio_Socket); cClientSocket =3D cKgio_Socket; mSocketMethods =3D rb_const_get(mKgio, rb_intern("SocketMethods")); + rb_gc_register_mark_object(mSocketMethods); rb_define_method(mSocketMethods, "kgio_addr!", addr_bang, 0); Twitter conversation: @tenderlove: It introduces a new function `GC.compact` which we added to the `before_fork` Unicorn hook @nganpham: Have you tried this on Unicorn 6.0? We seem to be getting memory corruption. Specifically, the `env` hash is getting unexpected data. More specifically, `env["REMOTE_ADDR"]` is resolving to random non-string objects. @nganpham: This was the big change: https://github.com/defunkt/unicorn/commit/c917ac526df214611ec33c21de2b07045= 2ec8434 @tenderlove: Can you send me a repro? I haven't heard of anything like it @nganpham: Not at the moment :-(. We're only seeing it happen pretty randomly...1 pod out of ~100 pods will exhibit this problem. I'm trying to reproduce it. Will keep you posted. @nganpham: No reproduction yet...but I've been digging into kgio and noticed `addr` is never rb_gc_mark'd: https://yhbt.net/kgio.git/tree/ext/kgio/accept.c#n25. maybe GC.compact in before_fork botches the pointer? https://github.com/defunkt/unicorn/blob/master/lib/unicorn/http_request.rb#= L74=E2=80=A6 seems to return the same "random" object for subsequent requests. @tenderlove: Hmm. `addr` isn=E2=80=99t a Ruby object, so it needn=E2=80=99t= be marked. I would find the implementation of `kgio_addr`. It=E2=80=99s probably point= ing at a global that doesn=E2=80=99t get pinned. @nganpham: So `kgio_addr` always returns the value of `Kgio::LOCALHOST`, which is defined in Ruby: https://yhbt.net/kgio.git/tree/lib/kgio.rb#n10. It's then assigned on `accept` with `rb_const_get`: https://yhbt.net/kgio.git/tree/ext/kgio/accept.c#n500. Pinning needed? Or something has overwritten that global constant? @nganpham: Yea, I think that's it. Ptr to `localhost` is only resolved once at boot: https://yhbt.net/kgio.git/tree/ext/kgio/accept.c#n500 but doesn't get pinned. If object is moved during compact, `localhost` now references a different Ruby obj: https://yhbt.net/kgio.git/tree/ext/kgio/accept.c#n220 @tenderlove: Yep, that's the problem. The localhost C global needs to be pinned. Just pass it to `rb_gc_register_mark_object` and that should solve the issue. Thanks for finding this!! @nganpham: Downgrading to Unicorn 5 seems to mitigated the issue for us. My hypothesis is that in 5, `Unicorn::http://HttpRequest.new` is allocated much earlier in the process so its internals are less likely to be moved during compact. In 6, it=E2=80=99s initialized on request. @nganpham: Regarding actually fixing this on kgio, do you have contacts to maintainer(s)? Or should I follow the instructions on the (super confusing) site? https://yhbt.net/kgio/ @tenderlove: Unfortunately I don't have any better contact with the author than you. Though if you follow the instructions I'm sure he'll help. My only advice is to make very sure you send a plain text email. Alternatively I can send a patch tomorrow (up to you just lmk!) @nganpham: I can take a crack it in the morning (will lyk). Also, I=E2=80= =99ve confirmed that this was affecting Unicorn 5, just much less so. Not sure what version you=E2=80=99re running on, but I=E2=80=99d be curious to = know if you=E2=80=99re seeing `NoMethodError`s. @nganpham: Patch submitted: https://yhbt.net/kgio-public/CAAvYYt480L4izvVmLnOvQCe5N=3Dbq3cBZ=3DFrDyQoo= =3DjD3e0qfsg@mail.gmail.com/T/#u @tenderlove: I think the other "const_gets" in that file probably need the same treatment. https://yhbt.net/kgio.git/tree/ext/kgio/accept.c#n501 and https://yhbt.net/kgio.git/tree/ext/kgio/accept.c#n503 If a constant is defined in pure Ruby (not with rb_define_class), then it's also allowed to move. @nganpham: oof! You're right! I will amend the patch. Good catch. Thanks, Ngan On Mon, May 24, 2021 at 10:22 AM Eric Wong wrote: > > Ngan Pham wrote: > > Hello, I'd like to submit a patch to fix an issue after running > > `GC.compact`. The LOCALHOST variable is defined in Ruby and is > > resolved and memoized (to `localhost`) in C once. However, the pointer > > in C is not GC marked. Running `GC.compact` potentially moves the > > `LOCALHOST` object in Ruby causing `localhost` to point to a bad or > > empty slot. This causes `env['REMOTE_ADDR']` to a bad object in > > Unicorn. > > Seems to make sense. > > > Some useful links: > > - `GC.compact` introduced: https://bugs.ruby-lang.org/issues/15626. > > - My conversation with tenderlove over twitter on this issue: > > https://twitter.com/tenderlove/status/1396625048861954049 > > Can you please copy+paste the conversation here? > Accessibility-challenged users such as myself cannot > read it without JavaScript. Thanks.