Symbol table performance impact

We found a bug that was symbolizing a rather large number of
strings; we had around 3 million entries in the symbol table. While
this is a lot, it seems to be impacting performance more than I
thought it would. Does anyone have insight into why it has this much
impact on performance?

The symbolized strings were all 64-bit integers converted to strings
and then symbolized.
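
A minimal sketch of what such a pattern might look like follows. The method name `symbolize_ids` and the ID values are hypothetical, not taken from the application; the point is only that every distinct string passed to `to_sym` becomes its own symbol.

```ruby
# Hypothetical reproduction of the pattern: each distinct integer string
# becomes a distinct symbol, and so a distinct symbol table entry.
def symbolize_ids(ids)
  ids.map { |id| id.to_s.to_sym }
end

base = 2**62                                  # large values, standing in for 64-bit IDs
ids  = (0...1_000).map { |i| base + i }

before = Symbol.all_symbols.size
syms   = symbolize_ids(ids)
after  = Symbol.all_symbols.size

puts "unique symbols created: #{syms.uniq.size}"
puts "change in Symbol.all_symbols.size: #{after - before}"
```

Scaling the ID count from a thousand to a few million gives a symbol table of the size described above.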

Chris

Is it degrading performance of the entire application (like you are
running out of heap space)? Or is it something more specific than that,
like creating a new symbol?

Joe

On Fri, Jan 6, 2012 at 2:52 PM, Joseph A. [email protected] wrote:

Is it degrading performance of the entire application (like you are running
out of heap space)? Or is it something more specific than that like
creating a new symbol?

It degrades performance for the entire app. Everything just takes
around 20x longer to run and chews up significantly more CPU time.

Chris

We have a hand-rolled hash table structure for symbols. I think
the tongue-in-cheek reaction is “stop generating random symbols”. The
main problem with symbols in both MRI and JRuby is that they are never
garbage collected. Your performance problem is likely too many entries
hashing into the same bucket, which keeps slowing down access to any
symbol.
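
To illustrate the bucket effect, here is a toy chained hash table with a fixed bucket count. It is an assumption-laden sketch, not JRuby's actual symbol table: the class `ToyTable`, the bucket count, and the entry counts are all made up. It only shows that the average lookup walks a longer chain as entries pile up.

```ruby
# Toy chained hash table with a fixed number of buckets. This is NOT
# JRuby's symbol table; it only illustrates how long chains slow lookups.
class ToyTable
  attr_reader :comparisons

  def initialize(bucket_count)
    @buckets = Array.new(bucket_count) { [] }
    @comparisons = 0
  end

  def insert(key)
    chain = @buckets[key.hash % @buckets.size]
    chain << key unless chain.include?(key)
  end

  # Walks the chain for the key's bucket, counting each comparison.
  def include?(key)
    chain = @buckets[key.hash % @buckets.size]
    chain.each do |candidate|
      @comparisons += 1
      return true if candidate == key
    end
    false
  end
end

# Average number of comparisons needed to find each stored key once.
def average_lookup_cost(entries, bucket_count)
  table = ToyTable.new(bucket_count)
  keys = (0...entries).map(&:to_s)
  keys.each { |k| table.insert(k) }
  keys.each { |k| table.include?(k) }
  table.comparisons.to_f / entries
end

small = average_lookup_cost(1_000, 1_024)
large = average_lookup_cost(50_000, 1_024)
puts "avg comparisons with 1,000 entries:  #{small.round(2)}"
puts "avg comparisons with 50,000 entries: #{large.round(2)}"
```

A table that resizes as it fills would keep chains short; the slowdown in this sketch comes from the bucket count staying fixed while the entry count grows.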

I am wondering how easy it would be for us to make symbols
garbage-collectable. Short of that, I recommend reconsidering what you
call to_sym on.
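
One way to act on that advice is sketched below, assuming a small fixed set of key names is known ahead of time. `KNOWN_KEYS` and `normalize_key` are hypothetical names for illustration: only whitelisted names become symbols, and anything data-driven stays a String.

```ruby
# Hypothetical whitelist: only names known ahead of time become symbols.
KNOWN_KEYS = %w[name level health].freeze

# Returns a Symbol for whitelisted names and a String for everything else,
# so data-driven values never reach the symbol table.
def normalize_key(raw)
  key = raw.to_s
  KNOWN_KEYS.include?(key) ? key.to_sym : key
end

record = {}
record[normalize_key("level")] = 12                           # whitelisted, becomes a Symbol
record[normalize_key(9_007_199_254_740_993)] = "player row"   # not whitelisted, stays a String

p record.keys
```

The trade-off is that callers must look up dynamic entries with String keys, which is usually acceptable for data that was never a fixed vocabulary to begin with.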

-Tom

On Mon, Jan 9, 2012 at 12:46 PM, snacktime [email protected] wrote:




blog: http://blog.enebo.com twitter: tom_enebo
mail: [email protected]

Symbolizing random strings was a bug; once we fixed that, our
performance issues went away.

Running some tests to find out where symbolizing is no longer a
performance gain is something we will be doing, as the number of
unique names in our hash structures will just keep growing over time.
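
A rough starting point for such a test is sketched below, using the standard `benchmark` library. The method `lookup_times`, the `item_N` names, and the sizes are assumptions for illustration; it times only Hash lookups with String keys versus Symbol keys, so it will not capture symbol table growth or memory effects on its own.

```ruby
require 'benchmark'

# Rough sketch: time Hash lookups with String keys versus Symbol keys
# for a given number of unique names. Names and sizes are made up.
def lookup_times(unique_names, lookups = 200_000)
  names = (0...unique_names).map { |i| "item_#{i}" }
  by_string = names.each_with_index.to_h
  by_symbol = names.map(&:to_sym).each_with_index.to_h

  # Probe with a fixed sample so both runs touch the same entries.
  string_probe = names.sample(1_000)
  symbol_probe = string_probe.map(&:to_sym)
  n = string_probe.size

  {
    string: Benchmark.realtime { lookups.times { |i| by_string[string_probe[i % n]] } },
    symbol: Benchmark.realtime { lookups.times { |i| by_symbol[symbol_probe[i % n]] } }
  }
end

[1_000, 10_000, 100_000].each do |count|
  t = lookup_times(count)
  puts format("%7d names: string %.4fs, symbol %.4fs", count, t[:string], t[:symbol])
end
```

Results will vary by runtime and version, so the numbers are only meaningful when measured on the same JRuby build and heap settings as production.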

The use case, FYI, is a large RPG. The data pretty much never
stops growing. Even when we only symbolize things that you would
think should be symbolized, we will probably have tens of thousands of
unique keys, and that’s not the IDs, just the unique friendly
names we assign to certain things.

Chris