Alex Y. writes:
I’m not sure whether this is a bug:
irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
=> #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
irb(main):014:0> b = URI.parse("?a=c")
=> #<URI::Generic:0xfdbb6b770 URL:?a=c>
irb(main):015:0> puts a.merge(b).to_s
http://www.example.com/foo/?a=c
I’ll note that although firefox agrees with your expectations, lynx
agrees with the behavior of the uri module.
To understand what the uri module is doing, look at this:
irb(main):001:0> require 'uri'
=> true
irb(main):002:0> b = URI.parse("?a=c")
=> #<URI::Generic:0xfdbccb1d0 URL:?a=c>
irb(main):003:0> b.scheme
=> nil
irb(main):004:0> b.userinfo || b.host || b.port
=> nil
irb(main):005:0> b.path
=> ""
irb(main):006:0> b.query
=> "a=c"
irb(main):007:0> b.fragment
=> nil
That is, the scheme and authority portions of the uri are nil, but
the path is present, as the empty string. When merging an empty path
with the path “/foo/bar”, the uri module comes up with “/foo/”. Not
a totally unreasonable choice.
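
The “/foo/” result is what the first two path-merging sub-steps of RFC 2396 (section 5.2, steps 6a and 6b) produce: keep the base path up to and including its last slash, then append the reference's path. Here is a minimal sketch of just those two sub-steps. The helper name merge_paths_sketch is made up for illustration and is not part of the uri module, and dot-segment removal is left out:

```ruby
# Hypothetical helper, not part of the uri library: a simplified take on
# RFC 2396 section 5.2, steps 6a and 6b. Dot-segment removal (6c-6f) is omitted.
def merge_paths_sketch(base_path, rel_path)
  # 6a: copy everything up to and including the right-most slash.
  last_slash = base_path.rindex('/')
  buffer = last_slash ? base_path[0..last_slash] : ''
  # 6b: append the reference's path.
  buffer + rel_path
end

puts merge_paths_sketch('/foo/bar', '')     # prints /foo/
puts merge_paths_sketch('/foo/bar', 'baz')  # prints /foo/baz
```

With an empty reference path, nothing is appended after “/foo/”, so the last segment of the base path simply disappears.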
In fact, this is a bug, but not the one you think. “?a=c” is a
malformed relative URI. You should get a parse error trying to create
that.
According to RFC2396, a relative URI consists of (section 5, near the
bottom of pg. 17):
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
rel_path    = rel_segment [ abs_path ]
rel_segment = 1*( unreserved | escaped |
                  ";" | "@" | "&" | "=" | "+" | "$" | "," )
See the 1* part? That means that a relative uri path segment must
consist of at least one character. An empty path segment is illegal.
(Note that uri references that begin with ‘#’ are covered in section 4
of the RFC, and match the rule “URI-reference” rather than the rule
“relativeURI”.)
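
The 1* requirement can be checked mechanically. Below is a rough sketch using a hand-written regexp transcribed from the rel_segment rule together with the RFC's definitions of unreserved (alphanumerics plus the marks - _ . ! ~ * ' ( )) and escaped (% followed by two hex digits). The names REL_SEGMENT_SKETCH and valid_rel_segment? are invented for this example and are not constants or methods of the uri module:

```ruby
# Hypothetical regexp for RFC 2396's rel_segment rule:
#   1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
# The trailing "+" is the "1*": at least one character is required.
REL_SEGMENT_SKETCH = /\A(?:[A-Za-z0-9\-_.!~*'();@&=+$,]|%[0-9A-Fa-f]{2})+\z/

def valid_rel_segment?(segment)
  REL_SEGMENT_SKETCH.match?(segment)
end

puts valid_rel_segment?('bar')  # prints true
puts valid_rel_segment?('')     # prints false
```

The path portion of “?a=c” is the empty string, which is the second case.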
Now, given that the URI module does indeed accept relative URIs like
this, perhaps we should redefine URI merging for these pathological
cases so that the URI module behaves as some particular well-known
browser does:
require 'uri'

module URI
  class Generic
    # Merge other into self, handling an empty relative path the way
    # the named browser does.
    def merge_like(browser, other)
      if !other.absolute? and other.path and other.path.empty? and
         not (other.userinfo || other.host || other.port) then
        case browser
        when :firefox, :netscape
          other = other.dup
          other.path = self.path
        when :ie, :microsoft, :links
          other = other.dup
          if other.query || other.fragment
            other.path = self.path
          else
            other.path = '.'
          end
        when :lynx
          # we're good already, so we don't need to do
          # this, but let's pass the real merge function
          # valid relative uris anyway, okay?
          other = other.dup
          if other.query
            other.path = '.'
          else
            other.path = self.path
          end
        else
          # Could someone test how opera handles the three links on
          # http://snowplow.org/martin/relative_uri_test.html ?
          raise "Unhandled browser type #{browser}"
        end
      end
      return merge(other)
    end
  end
end