Dorren wrote:
I know that using another markup language, like wiki syntax or Textile,
is meant to prevent JavaScript injection. But for users who don’t know
wiki syntax or Textile, I’m thinking about just allowing them to enter
plain HTML, parsing the content, and rejecting all questionable tags
and attributes, allowing only predefined (safe) tags, like bold or
italic, etc.
Wiki-like syntax is easy to learn (Textile is one such syntax: markup
that is not HTML), and it saves you the hassle of sanitizing the input.
If you accept HTML, you’ll have to handle a lot of special cases caused
by browser incompatibilities. IE6, for example, accepts javas\ncript
(with a line break in the middle) as a valid javascript: URL in an
attribute, while a filter doing a plain string comparison sees something
other than javascript and lets it through.
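To illustrate the line-break case, here is a minimal Ruby sketch. The
method names are hypothetical, and this only covers one attribute check,
not a complete sanitizer:

```ruby
# Hypothetical helpers for illustration only; not part of any library.

# Naive check: only matches the literal scheme at the start of the value.
def naive_unsafe_url?(value)
  value.downcase.start_with?("javascript:")
end

# Stricter check: drop whitespace and control characters first, because
# some browsers ignore them inside the scheme name.
def unsafe_url?(value)
  normalized = value.gsub(/[\s\x00-\x1f]/, "").downcase
  normalized.start_with?("javascript:")
end

payload = "java\nscript:alert(1)"
naive_unsafe_url?(payload)  # => false, the payload slips through
unsafe_url?(payload)        # => true
```

Even the stricter version is a blocklist; allowing only known-good
schemes (http, https, mailto) would probably hold up better than trying
to list every bad one.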
Is using html for markup less secure than using non-html markup?
What’s the main reason people use non-html markup?
Yes, accepting HTML is less secure, mainly because of JavaScript
exploits, and raw HTML is also harder for humans to read.
If you can avoid HTML input, do so.
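If no HTML is accepted at all, the remaining job is to escape whatever
the user typed before it is displayed. A minimal sketch using
ERB::Util.html_escape from Ruby's standard library; escape_user_text is
a hypothetical name:

```ruby
require "erb"

# Hypothetical helper: turn any HTML the user typed into inert text.
def escape_user_text(text)
  ERB::Util.html_escape(text)
end

escape_user_text("<script>alert(1)</script>")
# => "&lt;script&gt;alert(1)&lt;/script&gt;"
```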
Shameless plug:
ClothRed aims to convert HTML into Textile, and it should be able to
serve as a sanitizer in the (hopefully) not too distant future:
http://clothred.rubyforge.org
(P.S.: I came up with this library out of a need similar to yours.)
–
Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/
Rule of Open-Source Programming #13:
Your first release can always be improved upon.