Hi Jason!
I’ve looked at the ragel code for textile and you are right: it has
become quite hard to understand. I have gone through the list of
difficult defects and through the current textile reference and I have
the feeling that the current parser is quite complicated for the task
at hand. Textile does not look like such a complicated grammar (at
least not what is listed in the reference page), but maybe I’m wrong
and there are many places where determinism is not easily attained.
I really feel that the parts that are difficult for the parser are
also difficult for the reader when editing text. And most of these
hard-to-parse and hard-to-read features in textile (except for tables)
are not related to describing content but to styling: something like
setting an “id” in an article seems really bad to me: what if you
display two articles on one page and they both define a “hot” id? The
same goes for “em” padding: that’s not content, that’s styling.
I feel very concerned about all these issues related to textile
because I am building a CMS in which my clients put everything:
letters, comments, documents, quality certification stuff, control
lists, etc. So I really need a textile parser that can survive in the
long run (10 years). To achieve this goal, we need to:
a. have a parser that is easy to enhance with new needs without
breaking old text
b. have a grammar that is easy to parse
For point “a”, I think we can have the parser generate S-expressions
and do the customization while processing the S-expression tree. For
example, an image with a caption would be parsed as:
!file.jpg (foo bar baz)! ==> [:image, "file.jpg (foo bar baz)"]
The processor would then run a ruby regex to “finish the work”. This
means the parser in “C” is kept simple, and if someone wants to add
more features to the “image” tag, she just has to change the ruby regex.
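To make the second step a bit more concrete, here is a rough sketch of what the ruby side could look like. Everything in it is an assumption for illustration: the node shape beyond the [:image, ...] example above, the names IMAGE_RE, render_image and process, and the choice to put the caption in the alt and title attributes. It is not how redcloth works today.

```ruby
# Hypothetical second stage: the "C" parser is assumed to hand us nested
# arrays such as [:p, "text", [:image, "file.jpg (foo bar baz)"]].
# All names below are illustrative, not an existing redcloth API.

ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Minimal HTML escaping so the sketch has no dependencies.
def escape_html(text)
  text.gsub(/[&<>"]/, ESCAPES)
end

# The regex that "finishes the work": a source, then an optional
# caption in parentheses. New image features would be added here.
IMAGE_RE = /\A(?<src>\S+)(?:\s+\((?<caption>[^)]*)\))?\z/

def render_image(raw)
  m = IMAGE_RE.match(raw.strip)
  # If the content does not look like an image, give the text back.
  return escape_html("!#{raw}!") unless m

  src = escape_html(m[:src])
  if m[:caption]
    caption = escape_html(m[:caption])
    %(<img src="#{src}" alt="#{caption}" title="#{caption}" />)
  else
    %(<img src="#{src}" alt="" />)
  end
end

# Walk the S-expression tree and turn each node into HTML.
def process(node)
  return escape_html(node) if node.is_a?(String)

  tag, *children = node
  case tag
  when :image then render_image(children.first)
  when :p     then "<p>#{children.map { |c| process(c) }.join}</p>"
  else             children.map { |c| process(c) }.join
  end
end

puts process([:p, "Look: ", [:image, "file.jpg (foo bar baz)"]])
# prints: <p>Look: <img src="file.jpg" alt="foo bar baz" title="foo bar baz" /></p>
```

The point is only that the “C” side never needs to know what is inside the image tag: supporting, say, a link target would mean touching IMAGE_RE and render_image, nothing else.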
For point “b”, we should not support shortcut syntax for styling
features such as the “id” thing or “em” padding (at least not at the
“C” parser level). If someone really wants em padding, she should
use html (it’s not nice to use, which is itself an indication that
this is bad practice):
# one
# two
Since I really need such a tool, I could help refactor redcloth
into a two-step parser (half in “C”, half in ruby).
What do you think?
Gaspard