From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net X-Spam-Level: * X-Spam-ASN: AS14383 205.234.109.0/24 X-Spam-Status: No, score=1.0 required=3.0 tests=AWL,HK_RANDOM_FROM, MSGID_FROM_MTA_HEADER shortcircuit=no autolearn=no version=3.3.2 Path: news.gmane.org!not-for-mail From: Eric Wong Newsgroups: gmane.comp.lang.ruby.rainbows.general Subject: [RFC] thread_timeout middleware safety Date: Thu, 21 Apr 2011 15:52:53 -0700 Message-ID: <20110421225253.GA23446@dcvr.yhbt.net> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Trace: dough.gmane.org 1303427007 1876 80.91.229.12 (21 Apr 2011 23:03:27 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Thu, 21 Apr 2011 23:03:27 +0000 (UTC) To: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org Original-X-From: rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org Fri Apr 22 01:03:19 2011 Return-path: Envelope-to: gclrrg-rainbows-talk@m.gmane.org X-Original-To: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org Delivered-To: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) X-BeenThere: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org Errors-To: rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org Xref: news.gmane.org gmane.comp.lang.ruby.rainbows.general:222 Archived-At: Received: from rubyforge.org ([205.234.109.19]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1QD2u6-0001In-KO for gclrrg-rainbows-talk@m.gmane.org; Fri, 22 Apr 2011 01:03:19 +0200 Received: from rubyforge.org (rubyforge.org [127.0.0.1]) by rubyforge.org (Postfix) with ESMTP id 30CF61858346; Thu, 21 Apr 2011 19:03:17 -0400 (EDT) Received: from dcvr.yhbt.net (dcvr.yhbt.net [64.71.152.64]) by rubyforge.org (Postfix) with ESMTP id 568FE18582EE for ; Thu, 21 Apr 2011 18:52:54 -0400 (EDT) Received: from localhost (unknown [127.0.2.5]) by dcvr.yhbt.net (Postfix) with ESMTP id CF4761F577; Thu, 21 Apr 2011 22:52:53 +0000 (UTC) I've rewritten the code in rainbows.git so it's hopefully safe. I would very much appreciate an independent code review and comments. Thread#raise is difficult to use, but I think I managed to use it safely (at least for all MRI). Unfortunately, I do not believe the way I implemented timeouts can be used to implement timeout.rb in Ruby-core. timeout.rb supports nesting while this module does not support nesting. I just pushed the below patch out to annotate all the internal behavior to make life easier for potential reviewers. The latest version of the file is here: http://bogomips.org/rainbows.git/plain/lib/rainbows/thread_timeout.rb Or: git clone git://bogomips.org/rainbows.git cd rainbows $EDITOR lib/rainbows/thread_timeout.rb >>From f6471ed9cafe6e65e72fa9486ecdcc4b2f8d1373 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Thu, 21 Apr 2011 15:38:20 -0700 Subject: [PATCH] thread_timeout: annotate as much as possible This should make code review easier. --- lib/rainbows/thread_timeout.rb | 92 ++++++++++++++++++++++++++++++++++++++-- 1 files changed, 88 insertions(+), 4 deletions(-) diff --git a/lib/rainbows/thread_timeout.rb b/lib/rainbows/thread_timeout.rb index 3d28daf..4f62aba 100644 --- a/lib/rainbows/thread_timeout.rb +++ b/lib/rainbows/thread_timeout.rb @@ -29,14 +29,35 @@ require 'thread' # # Timed-out requests will cause this middleware to return with a # "408 Request Timeout" response. - +# +# == Caveats +# +# Badly-written C extensions may not be timed out. Audit and fix +# (or remove) those extensions before relying on this module. +# +# Do NOT, under any circumstances nest and load this in +# the same middleware stack. You may load this in parallel in the +# same process completely independent middleware stacks, but DO NOT +# load this twice so it nests. Things will break! +# +# This will behave badly if system time is changed since Ruby +# does not expose a monotonic clock for users, so don't change +# the system time while this is running. All servers should be +# running ntpd anyways. class Rainbows::ThreadTimeout # :stopdoc: + # + # we subclass Exception to get rid of normal StandardError rescues + # in app-level code. timeout.rb does something similar ExecutionExpired = Class.new(Exception) - NEVER = Time.at(0x7fffffff) # MRI 1.8 won't be usable in January, 2038 + + # The MRI 1.8 won't be usable in January 2038, we'll raise this + # when we eventually drop support for 1.8 (before 2038, hopefully) + NEVER = Time.at(0x7fffffff) def initialize(app, opts) + # @timeout must be Numeric since we add this to Time @timeout = opts[:timeout] Numeric === @timeout or raise TypeError, "timeout=#{@timeout.inspect} is not numeric" @@ -48,56 +69,119 @@ class Rainbows::ThreadTimeout @threshold < 0 and @threshold += Rainbows.server.worker_connections end @app = app + + # This is the main datastructure for communicating Threads eligible + # for expiration to the watchdog thread. If the eligible thread + # completes its job before its expiration time, it will delete itself + # @active. If the watchdog thread notices the thread is timed out, + # the watchdog thread will delete the thread from this hash as it + # raises the exception. + # + # key: Thread to be timed out + # value: Time of expiration @active = {} + + # Protects all access to @active. It is important since it also limits + # safe points for asynchronously raising exceptions. @lock = Mutex.new + + # There is one long-running watchdog thread that watches @active and + # kills threads that have been running too long + # see start_watchdog @watchdog = nil end + # entry point for Rack middleware def call(env) + # Once we have this lock, we ensure two things: + # 1) there is only one watchdog thread started + # 2) we can't be killed once we have this lock, it's unlikely + # to happen unless @timeout is really low and the machine + # is very slow. @lock.lock + + # we're dead if anything in the next two lines raises, but it's + # highly unlikely that they will, and anything such as NoMemoryError + # is hopeless and we might as well just die anyways. + # initialize guarantees @timeout will be Numeric start_watchdog(env) unless @watchdog @active[Thread.current] = Time.now + @timeout + begin + # It is important to unlock inside this begin block + # Mutex#unlock really can't fail here since we did a successful + # Mutex#lock before @lock.unlock + + # Once the Mutex was unlocked, we're open to Thread#raise from + # the watchdog process. This is the main place we expect to receive + # Thread#raise. @app is of course the next layer of the Rack + # application stack @app.call(env) ensure + # I's still possible to receive a Thread#raise here from + # the watchdog, but that's alright, the "rescue ExecutionExpired" + # line will catch that. @lock.synchronize { @active.delete(Thread.current) } + # Thread#raise no longer possible here end rescue ExecutionExpired + # If we got here, it's because the watchdog thread raised an exception + # here to kill us. The watchdog uses @active.delete_if with a lock, + # so we guaranteed it's [ 408, { 'Content-Type' => 'text/plain', 'Content-Length' => '0' }, [] ] end + # The watchdog thread is the one that does the job of killing threads + # that have expired. def start_watchdog(env) @watchdog = Thread.new(env["rack.logger"]) do |logger| begin if @threshold - # "active.size" is atomic in MRI 1.8 and 1.9 + # Hash#size is atomic in MRI 1.8 and 1.9 and we + # expect that from other implementations. + # + # Even without a memory barrier, sleep(@timeout) vs + # sleep(@timeout - time-for-SMP-to-synchronize-a-word) + # is too trivial to worry about here. sleep(@timeout) while @active.size < @threshold end next_expiry = NEVER + + # We always lock access to @active, so we can't kill threads + # that are about to release themselves from the eye of the + # watchdog thread. @lock.synchronize do now = Time.now @active.delete_if do |thread, expire_at| + # We also use this loop to get the maximum possible time to + # sleep for if we're not killing the thread. if expire_at > now next_expiry = expire_at if next_expiry > expire_at false else + # Terminate execution and delete this from the @active thread.raise(ExecutionExpired) true end end end + # We always try to sleep as long as possible to avoid consuming + # resources from the app. So that's the user-configured @timeout + # value. if next_expiry == NEVER sleep(@timeout) else + # sleep until the next known thread is about to expire. sec = next_expiry - Time.now sec > 0.0 ? sleep(sec) : Thread.pass # give other threads a chance end rescue => e + # just in case logger.error e - end while true + end while true # we run this forever end end # :startdoc: -- Eric Wong _______________________________________________ Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org http://rubyforge.org/mailman/listinfo/rainbows-talk Do not quote signatures (like this one) or top post when replying