unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
From: Gary Grossman <gary.grossman@gmail.com>
To: e@80x24.org
Cc: unicorn-public@bogomips.org, michael@grosser.it
Subject: Re: Please move to github
Date: Sat, 2 Aug 2014 00:46:11 -0700	[thread overview]
Message-ID: <CACNmK=PCZmNxS-3KgBKJzqiN=7UmZ_HQi22ZjtrgD60REtrAZA@mail.gmail.com> (raw)

Hi Eric,

I work with Michael, and this discussion sure got off on the
wrong foot... we love unicorn and use it heavily, and just
want to contribute back to it.

To detail the encoding problem we were trying to fix, unicorn
uses rb_str_new in several places to create Ruby strings.
For Ruby 1.9 and later, these strings are assigned ASCII-8BIT
encoding.

While the Rack specification doesn't dictate what encoding
should be used for strings in the environment, many
developers would probably expect the default external encoding
setting in Encoding.default_external to be used.

Many Rails applications use UTF8 heavily. The use of ASCII-8BIT
in the env can lead to Encoding::CompatibilityErrors being
raised when a UTF8 string and ASCII-8BIT string are concatenated,
which happens frequently when properties like request.url are
referenced in erb templates. To get around these problems,
an app would have to force encoding on the strings in the env
manually. It seems a shame to do this in slower Ruby code when
it could be done up front by unicorn.

We'd like to propose that unicorn use rb_external_str_new to
make strings instead of rb_str_new.

Perhaps you have your reasons for continuing to use rb_str_new
but we figured we'd run this by you.

Here's a proposed patch.

From befb01530c8d930ba53cc58b979ddf42a4c12565 Mon Sep 17 00:00:00 2001
From: Gary Grossman <gary.grossman@gmail.com>
Date: Sat, 2 Aug 2014 00:19:30 -0700
Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use
 rb_external_str_new instead of rb_str_new to create strings. The resulting
 strings will use the default external encoding. Continue using rb_str_new
for
 older versions of Ruby.

Using the default external encoding instead of ASCII-8BIT for
strings is more in line with developer expectations and will cause
less unexpected bugs such as Encoding::CompatibilityErrors which
result when, say, a UTF8 string and ASCII-8BIT string are
concatenated together.

Added a unit test to ensure that strings returned in the Rack
environment conform to the default external encoding.
---
 ext/unicorn_http/ext_help.h |  6 ++++++
 test/unit/test_request.rb   | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/ext/unicorn_http/ext_help.h b/ext/unicorn_http/ext_help.h
index c87c272..6806f8e 100644
--- a/ext/unicorn_http/ext_help.h
+++ b/ext/unicorn_http/ext_help.h
@@ -79,4 +79,10 @@ static int str_cstr_case_eq(VALUE val, const char *ptr,
long len)
 #define STR_CSTR_CASE_EQ(val, const_str) \
   str_cstr_case_eq(val, const_str, sizeof(const_str) - 1)

+#ifdef HAVE_RUBY_ENCODING_H
+/* Use default external encoding for strings for Ruby 1.9+,
+ * fall back to rb_str_new when unavailable */
+#define rb_str_new rb_external_str_new
+#endif
+
 #endif /* ext_help_h */
diff --git a/test/unit/test_request.rb b/test/unit/test_request.rb
index fbda1a2..0a105e0 100644
--- a/test/unit/test_request.rb
+++ b/test/unit/test_request.rb
@@ -179,4 +179,17 @@ class RequestTest < Test::Unit::TestCase
     env['rack.input'].rewind
     res = @lint.call(env)
   end
+
+  def test_encoding
+    if ''.respond_to?(:encoding)
+      client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \
+                               "Host: foo\r\n\r\n")
+      env = @request.read(client)
+      encoding = Encoding.default_external
+      assert_equal encoding, env['REQUEST_PATH'].encoding
+      assert_equal encoding, env['PATH_INFO'].encoding
+      assert_equal encoding, env['QUERY_STRING'].encoding
+    end
+  end
+
 end
-- 
1.9.1

Gary


             reply	other threads:[~2014-08-02  7:46 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-02  7:46 Gary Grossman [this message]
  -- strict thread matches above, loose matches on Subject: below --
2014-08-02  7:51 Please move to github Gary Grossman
2014-08-02  7:54 ` Kapil Israni
2014-08-02  8:02   ` Eric Wong
2014-08-02  8:50 ` Eric Wong
2014-08-02 19:07   ` Gary Grossman
2014-08-02 19:33     ` Michael Fischer
2014-08-04  7:22       ` Hongli Lai
2014-08-02 20:15     ` Eric Wong
2014-08-01 22:32 Victor Kmita
2014-08-01 20:27 Michael Grosser
2014-08-01 21:32 ` Eric Wong
2014-08-01 22:27   ` Michael Grosser
2014-08-01 22:41     ` Eric Wong
     [not found]       ` <CAAms34NByMcXnbGQ3DvCZTsQuh6yBn8sMSNeTi7Ss4VmmYoOrQ@mail.gmail.com>
2014-08-01 23:07         ` Eric Wong
2014-08-01 22:48     ` Gabe da Silveira
2014-08-01 23:09       ` Xavier Noria
2014-08-01 23:12         ` Michael Grosser
2014-08-01 23:24           ` Eric Wong
2014-08-01 23:26           ` Daniel Evans
2014-08-01 23:38             ` Michael Grosser
2014-08-01 23:18         ` Aaron Suggs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://yhbt.net/unicorn/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACNmK=PCZmNxS-3KgBKJzqiN=7UmZ_HQi22ZjtrgD60REtrAZA@mail.gmail.com' \
    --to=gary.grossman@gmail.com \
    --cc=e@80x24.org \
    --cc=michael@grosser.it \
    --cc=unicorn-public@bogomips.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).