When a symbol is defined, the memory used to store the symbol is
permanently lost. If one is parsing external input, this makes one’s
application vulnerable to DOS.
Secondarily, if, while parsing external input, one refuses to make new symbols blindly, then the symbol list is something over which one has direct control, and it can be trusted in some situations to speed processing.
I see that some of the core team appear to hang out here, so I thought I
would bring it up here.
Certainly, if I were to optimize things, I would assume that the list of symbols is append-only. Then I would put the strings of the existing symbols in a set and add the new ones as they are found on each call. After the first call, this would likely be pretty cheap, but there must be similar functionality already in place at the ‘C’ level, so this is really a request to expose that as part of the class. (And avoid doubling the amount of memory used by symbols!)
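To make that concrete, a rough sketch of the idea in plain Ruby (the cached set and the safe_symbol helper are just illustrative names, not an existing API), which also shows the doubled string storage I’d like to avoid by having this at the ‘C’ level:

  require 'set'

  # Cache of every symbol's string form; assumes the symbol table is
  # append-only, so entries never need to be removed.
  KNOWN_SYMBOL_STRINGS = Set.new(Symbol.all_symbols.map(&:to_s))

  # Returns the existing symbol for +string+, or nil without interning
  # anything new. Refreshes the cache only on a miss.
  def safe_symbol(string)
    unless KNOWN_SYMBOL_STRINGS.include?(string)
      KNOWN_SYMBOL_STRINGS.merge(Symbol.all_symbols.map(&:to_s))
    end
    KNOWN_SYMBOL_STRINGS.include?(string) ? string.to_sym : nil
  end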
While on the topic, I have a related question about this Symbol DOS attack vector: Can’t an upper limit be put on the size of the symbols table, and if it is exceeded, then an error is raised? Wouldn’t that alone be sufficient to neuter such an attack?
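At the application level, the kind of limit I have in mind would look roughly like this (a sketch only; MAX_SYMBOLS and intern_checked are made-up names, and nothing like this is built in):

  # Made-up application-level guard; a real limit would belong in the
  # interpreter itself. Note that Symbol.all_symbols builds a new array on
  # every call, so a native counter would be far cheaper.
  MAX_SYMBOLS = 100_000

  def intern_checked(string)
    if Symbol.all_symbols.size >= MAX_SYMBOLS
      raise "symbol table has reached #{MAX_SYMBOLS} entries"
    end
    string.to_sym
  end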
Or, rather than error, just flush a bunch. If they’re needed, they’ll come back. If not, no loss. I’m sure symbol creation isn’t that expensive.
I see.
If there’s a logical distinction between externally- and internally-defined symbols, you could override the entrypoint (your deserialiser or whatever) to build a hash of String=>Symbol pairs. That way, instead of using Symbol.all_symbols.any?{|sym| sym.to_s == string}, you could use my_hash.has_key? string. Not sure how you’d ever populate said hash, though. Trusted entrypoints or something.
However, if you want to reuse existing symbols you’d have to have a way to prepopulate and continuously update the hash. I can think of a bunch of klugey ways to get it to work, but I’m not proud of any of them.
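One of the less klugey variants might look roughly like this (a sketch only; the constant and helper names are made up):

  # String => Symbol whitelist; populated only from trusted code paths.
  SYMBOL_WHITELIST = {}

  def register_symbol(sym)
    SYMBOL_WHITELIST[sym.to_s] = sym
  end

  # Used by the deserialiser on untrusted input: returns an existing
  # whitelisted symbol, or nil instead of interning an arbitrary string.
  def lookup_symbol(string)
    SYMBOL_WHITELIST[string]
  end

  [:id, :name, :created_at].each { |s| register_symbol(s) }
  lookup_symbol("name")   # => :name
  lookup_symbol("bogus")  # => nil, no new symbol created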
I imagine it should be relatively easy* to define a new native singleton method defined? on Symbol… There’s obviously a legitimate use-case; I think it would be worth making a feature request for this.
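In pure Ruby the proposal amounts to roughly the following; it is slow because it walks the whole symbol table, which is exactly why a native version would be worth having. (The method is named interned? here only to sidestep the defined? keyword; both the name and the example are purely illustrative.)

  # Pure-Ruby approximation of the proposed native lookup. A native
  # implementation could consult the interned-symbol table directly
  # instead of stringifying every symbol on each call.
  def Symbol.interned?(string)
    Symbol.all_symbols.any? { |sym| sym.to_s == string }
  end

  Symbol.interned?("puts")         # => true
  Symbol.interned?("never_seen")   # => false, and no symbol is created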
I don’t believe this to be such a big deal: if you parse external data and you do not know how many different strings there are of a kind, you would not use symbols anyway. Symbols make most sense for a fixed set of values - similar to an enum.
Also, a DOS can occur if external data is parsed and all the Strings are stored somewhere during the import (e.g. as Hash keys), which is quite a common scenario. If there are more Strings than fit into memory, the program will crash as well.
DOS does not occur with strings because strings can be garbage
collected. Symbols are forever.
I am very well aware of that. Still the fact remains that you can
create a DOS with any external data if the data set is large enough
and the processing does not take that possibility into account. There
is nothing really special about Symbols here - as I have pointed out
earlier. (And, btw., you did not argue against that.) It is the way
input from external sources is read. The choice to use Symbols for
data with large variance is just one of many decisions that can do
harm to an application.
Certainly if you accept arbitrary user input for parsing, you have an
automatic DOS vector by dint of sending a very large packet. Fine.
But if someone can make a thousand connections, and over the course of the thousand connections PERMANENTLY chew up 100k of memory per connection, you start to have a problem of a very different sort.
It is in that sense–the sense of a memory leak–that symbols are
different in this regard.
And before you come back with “don’t do that”, remember that the ability
to create arbitrary objects is a prime feature of YAML. There needs to
be a way to scope that feature, and this is one option.
I’m running into something now with an API that converts XML to a nested Hash with symbol keys via Savon. At some point, we’re going to be getting near 5000 items in these XML responses. It’s not direly problematic for this particular case: it gets called infrequently at that rate, the XML is a response to a request on our end (i.e. it is not open to the wild wild internet), and it runs in a self-contained job, so it never permanently eats up memory. But it does give me pause.
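The growth is easy to observe, which is part of what gives me pause; a self-contained check along these lines (JSON standing in here for the Savon XML-to-Hash step, purely for illustration) shows it:

  require 'json'

  payload = '{"item_1": 1, "item_2": 2}'   # stand-in for the XML response

  before = Symbol.all_symbols.size
  data   = JSON.parse(payload, symbolize_names: true)  # symbol keys, like the Savon output
  after  = Symbol.all_symbols.size

  puts "parse added #{after - before} symbols"  # nonzero on the first run for unseen keys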
It’s not just Rails. Rails happens to be a hotbed of bad programming
style to be sure, but the utility of allowing users to specify symbols
is substantial. Allowing them to create symbols is a memory leak &
therefore a DOS vulnerability. Thus the idea.
Certainly if you accept arbitrary user input for parsing, you have an
automatic DOS vector by dint of sending a very large packet. Fine.
But if someone can make a thousand connections, and over the course of the thousand connections PERMANENTLY chew up 100k of memory per connection, you start to have a problem of a very different sort.
That can be achieved with any bad coding.
It is in that sense–the sense of a memory leak–that symbols are
different in this regard.
Yes and no: yes, because Symbols accumulate in memory; no, because it is the programmer’s choice that allows bad things to happen. You cannot simply force a YAML.load() on a program via a network connection.
And before you come back with “don’t do that”, remember that the ability
to create arbitrary objects is a prime feature of YAML. There needs to
be a way to scope that feature, and this is one option.
OK, now we’re cooking! I can see where YAML is an issue, because I couldn’t find a way to customize Symbol deserialization for YAML. If that way existed, fairly easy measures could be taken to prevent excessive Symbol creation. For the time being, one would have to patch the library to prevent this DOS, or modify the input before throwing it at YAML.load().
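For illustration, the kind of measure I mean might look like this, assuming a Psych that exposes safe_load with class and symbol whitelists (later Psych releases provide exactly that; the keyword arguments shown are the Psych 3.1+ form):

  require 'yaml'

  untrusted = "--- !ruby/symbol not_in_my_app\n"

  # Only whitelisted symbols may be deserialised; anything else raises
  # instead of interning an attacker-chosen symbol.
  YAML.safe_load(untrusted,
                 permitted_classes: [Symbol],
                 permitted_symbols: [:known_key, :other_known_key])
  # => raises Psych::DisallowedClass for :not_in_my_app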
OTOH I cannot remember having read of a DOS via YAML Symbol
deserialization on this list.
Cheers
robert