Some issues in the HTTP parser (and suggestions)

Eric Wong normalperson at yhbt.net
Fri May 7 14:33:28 EDT 2010


Iñaki Baz Castillo <ibc at aliax.net> wrote:
> Hi, by inspecting the Ragel grammar of the HTTP parser (coming from
> Mongrel) I've realized of some possible issues and bugs:
> 
> 
> hostname = (alnum | "-" | "." | "_")+;
> 
> - It doesn't allow IPv6. This is important IMHO.
> - It allows "_" which is an invalid symbol (not valid for a domain).
> 
> I suggest:
>   hostname = (alnum | "-" | "." | "[" | "]" | ":")+;

Hi Iñaki,

Underscore isn't valid for hostnames, but it is allowed in domain names
and most DNS servers will resolve them.  I've personally seen websites
with underscores in their domain names in the wild[1].

We'll have to test the IPv6 addresses and probably split that out into a
separate regexp since ":" would raise issues with the port number in
existing cases.  This is probably something for post-1.0.

> ------------------
> 
> host_with_port = (hostname (":" digit*)?) >mark %host;
> 
> - It allows something ugly as "mydomain.org:"
> 
> I suggest:
>   host_with_port = (hostname (":" digit{1,5})?) >mark %host;

It's ugly, but section 3.2.2 of RFC 2396 appears to allows it.

> ------------------
> 
> message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
> 
> - It doesn't allow valid spaces before ":" as:
>      Host : mydomain.org

Spaces before the ":" aren't allowed in rfc2616, and I have yet to see
evidence of clients sending headers like this in ~4 years of using this
parser.

> - Tabulators are also allowed.
> 
> I suggest:
>   message_header = ((field_name [ \t]* ":" [ \t]*
> field_value)|value_cont) :> CRLF;

I just pushed this out to unicorn.git to allow horizontal tabs:

>From 935912a422cabfd323f9b4ff268ded09a2ebe7a6 Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson at yhbt.net>
Date: Fri, 7 May 2010 18:20:49 +0000
Subject: [PATCH] http: allow horizontal tab as leading whitespace in header values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is allowed by RFC 2616, section 2.2, where spaces and
horizontal tabs are counted as linear white space and linear
white space (not just regular spaces) may prefix field-values
(section 4.2).

This has _not_ been a real issue in ~4 years of using this
parser (starting with Mongrel) with clients in the wild.

Thanks to Iñaki Baz Castillo for pointing this out.
---
 ext/unicorn_http/unicorn_http_common.rl |    2 +-
 test/unit/test_http_parser.rb           |    8 ++++++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl
index 6fca604..f165e3f 100644
--- a/ext/unicorn_http/unicorn_http_common.rl
+++ b/ext/unicorn_http/unicorn_http_common.rl
@@ -54,7 +54,7 @@
 
   value_cont = lws+ any* >start_value %write_cont_value;
 
-  message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
+  message_header = ((field_name ":" lws* field_value)|value_cont) :> CRLF;
   chunk_ext_val = token*;
   chunk_ext_name = token*;
   chunk_extension = ( ";" " "* chunk_ext_name ("=" chunk_ext_val)? )*;
diff --git a/test/unit/test_http_parser.rb b/test/unit/test_http_parser.rb
index 5b0ca9f..b7c8a1c 100644
--- a/test/unit/test_http_parser.rb
+++ b/test/unit/test_http_parser.rb
@@ -51,6 +51,14 @@ class HttpParserTest < Test::Unit::TestCase
     assert parser.keepalive?
   end
 
+  def test_tab_lws
+    parser = HttpParser.new
+    req = {}
+    tmp = "GET / HTTP/1.1\r\nHost:\tfoo.bar\r\n\r\n"
+    assert_equal req.object_id, parser.headers(req, tmp).object_id
+    assert_equal "foo.bar", req['HTTP_HOST']
+  end
+
   def test_connection_close_no_ka
     parser = HttpParser.new
     req = {}
-- 

[1] - and those were wild NSFW sites, of course :>

-- 
Eric Wong


More information about the mongrel-unicorn mailing list