When a symbol is defined, the memory used to store the symbol is
permanently lost. If one is parsing external input, this makes one’s
application vulnerable to DOS.
Secondarily, if, while parsing external input, one refuses to make new symbols blindly, then the symbol list is something over which one has direct control, and it can be trusted in some situations to speed processing.
I see that some of the core team appear to hang out here, so I thought I
would bring it up here.
Certainly, if I were to optimize things, I would assume that the list of symbols is append-only. Then I would put the strings of the existing symbols in a set and add the new ones as they are found on each call. After the first call, this would likely be pretty cheap, but there must be similar functionality already in place at the ‘C’ level, so this is really a request to expose that as part of the class. (And avoid doubling the amount of memory used by symbols!)
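To make that concrete, a rough sketch of the idea in plain Ruby (the cached set and the safe_symbol helper are just illustrative names, not an existing API), which also shows the doubled string storage I’d like to avoid by having this at the ‘C’ level:

  require 'set'

  # Cache of every symbol's string form; assumes the symbol table is
  # append-only, so entries never need to be removed.
  KNOWN_SYMBOL_STRINGS = Set.new(Symbol.all_symbols.map(&:to_s))

  # Returns the existing symbol for +string+, or nil without interning
  # anything new. Refreshes the cache only on a miss.
  def safe_symbol(string)
    unless KNOWN_SYMBOL_STRINGS.include?(string)
      KNOWN_SYMBOL_STRINGS.merge(Symbol.all_symbols.map(&:to_s))
    end
    KNOWN_SYMBOL_STRINGS.include?(string) ? string.to_sym : nil
  end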
While on the topic, I have a related question about this Symbol DOS attack vector: Can’t an upper limit be put on the size of the symbols table, and if it is exceeded, then an error is raised? Wouldn’t that alone be sufficient to neuter such an attack?
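At the application level, the kind of limit I have in mind would look roughly like this (a sketch only; MAX_SYMBOLS and intern_checked are made-up names, and nothing like this is built in):

  # Made-up application-level guard; a real limit would belong in the
  # interpreter itself. Note that Symbol.all_symbols builds a new array on
  # every call, so a native counter would be far cheaper.
  MAX_SYMBOLS = 100_000

  def intern_checked(string)
    if Symbol.all_symbols.size >= MAX_SYMBOLS
      raise "symbol table has reached #{MAX_SYMBOLS} entries"
    end
    string.to_sym
  end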
Or, rather than error, just flush a bunch. If they’re needed, they’ll come back. If not, no loss. I’m sure symbol creation isn’t that expensive.
I see.
If there’s a logical distinction between externally- and internally-defined symbols, you could override the entrypoint (your deserialiser or whatever) to build a hash of String=>Symbol pairs. That way, instead of using Symbol.all_symbols.any?{|sym| sym.to_s == string}, you could use my_hash.has_key? string. Not sure how you’d ever populate said hash, though. Trusted entrypoints or something.
However, if you want to reuse existing symbols you’d have to have a way to prepopulate and continuously update the hash. I can think of a bunch of klugey ways to get it to work, but I’m not proud of any of them.
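One of the less klugey variants might look roughly like this (a sketch only; the constant and helper names are made up):

  # String => Symbol whitelist; populated only from trusted code paths.
  SYMBOL_WHITELIST = {}

  def register_symbol(sym)
    SYMBOL_WHITELIST[sym.to_s] = sym
  end

  # Used by the deserialiser on untrusted input: returns an existing
  # whitelisted symbol, or nil instead of interning an arbitrary string.
  def lookup_symbol(string)
    SYMBOL_WHITELIST[string]
  end

  [:id, :name, :created_at].each { |s| register_symbol(s) }
  lookup_symbol("name")   # => :name
  lookup_symbol("bogus")  # => nil, no new symbol created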
I imagine it should be relatively easy* to define a new native singleton method defined? on Symbol… There’s obviously a legitimate use-case; I think it would be worth making a feature request for this.
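In pure Ruby the proposal amounts to roughly the following; it is slow because it walks the whole symbol table, which is exactly why a native version would be worth having. (The method is named interned? here only to sidestep the defined? keyword; both the name and the example are purely illustrative.)

  # Pure-Ruby approximation of the proposed native lookup. A native
  # implementation could consult the interned-symbol table directly
  # instead of stringifying every symbol on each call.
  def Symbol.interned?(string)
    Symbol.all_symbols.any? { |sym| sym.to_s == string }
  end

  Symbol.interned?("puts")         # => true
  Symbol.interned?("never_seen")   # => false, and no symbol is created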
I don’t believe this to be such a big deal: if you parse external data and you do not know how many different strings there are of a kind, you would not use symbols anyway. Symbols make most sense for a fixed set of values - similar to an enum.
Also, a DOS can occur if external data is parsed and all the Strings are stored somewhere during the import (e.g. as Hash keys), which is quite a common scenario. If there are more Strings than fit into memory, the program will crash as well.
DOS does not occur with strings because strings can be garbage
collected. Symbols are forever.
I am very well aware of that. Still the fact remains that you can
create a DOS with any external data if the data set is large enough
and the processing does not take that possibility into account. There
is nothing really special about Symbols here - as I have pointed out
earlier. (And, btw., you did not argue against that.) It is the way
input from external sources is read. The choice to use Symbols for
data with large variance is just one of many decisions that can do
harm to an application.
Certainly if you accept arbitrary user input for parsing, you have an
automatic DOS vector by dint of sending a very large packet. Fine.
But if someone can make a thousand connections, and over the course of the thousand connections PERMANENTLY chew up 100k of memory per connection, you start to have a problem of a very different sort.
It is in that sense–the sense of a memory leak–that symbols are
different in this regard.
And before you come back with “don’t do that”, remember that the ability
to create arbitrary objects is a prime feature of YAML. There needs to
be a way to scope that feature, and this is one option.
I’m running into something now with an API that converts XML to a nested Hash with symbol keys via Savon. At some point, we’re going to be getting near 5000 items in these XML responses. It’s not direly problematic for this particular case: it gets called infrequently at that rate, the XML is a response to a request on our end (i.e. it is not open to the wild wild internet), and it runs in a self-contained job, so it never permanently eats up memory. But it does give me pause.
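The growth is easy to observe, which is part of what gives me pause; a self-contained check along these lines (JSON standing in here for the Savon XML-to-Hash step, purely for illustration) shows it:

  require 'json'

  payload = '{"item_1": 1, "item_2": 2}'   # stand-in for the XML response

  before = Symbol.all_symbols.size
  data   = JSON.parse(payload, symbolize_names: true)  # symbol keys, like the Savon output
  after  = Symbol.all_symbols.size

  puts "parse added #{after - before} symbols"  # nonzero on the first run for unseen keys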
It’s not just Rails. Rails happens to be a hotbed of bad programming
style to be sure, but the utility of allowing users to specify symbols
is substantial. Allowing them to create symbols is a memory leak &
therefore a DOS vulnerability. Thus the idea.
Certainly if you accept arbitrary user input for parsing, you have an
automatic DOS vector by dint of sending a very large packet. Fine.
But if someone can make a thousand connections, and over the course of the thousand connections PERMANENTLY chew up 100k of memory per connection, you start to have a problem of a very different sort.
That can be achieved with any bad coding.
It is in that sense–the sense of a memory leak–that symbols are
different in this regard.
Yes and no: yes, because Symbols accumulate in memory; no, because it is the programmer’s choice that allows bad things to happen. You cannot simply force a YAML.load() on a program via a network connection.
And before you come back with “don’t do that”, remember that the ability
to create arbitrary objects is a prime feature of YAML. There needs to
be a way to scope that feature, and this is one option.
OK, now we’re cooking! I can see where YAML is an issue, because I couldn’t find a way to customize Symbol deserialization for YAML. If that way existed, fairly easy measures could be taken to prevent excessive Symbol creation. For the time being, one would have to patch the library to prevent this DOS, or modify the input before throwing it at YAML.load().
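For illustration, the kind of measure I mean might look like this, assuming a Psych that exposes safe_load with class and symbol whitelists (later Psych releases provide exactly that; the keyword arguments shown are the Psych 3.1+ form):

  require 'yaml'

  untrusted = "--- !ruby/symbol not_in_my_app\n"

  # Only whitelisted symbols may be deserialised; anything else raises
  # instead of interning an attacker-chosen symbol.
  YAML.safe_load(untrusted,
                 permitted_classes: [Symbol],
                 permitted_symbols: [:known_key, :other_known_key])
  # => raises Psych::DisallowedClass for :not_in_my_app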
OTOH I cannot remember having read of a DOS via YAML Symbol
deserialization on this list.
Cheers
robert