Hello,
I have come across certain situations where text marked up with textile
syntax needs to be displayed where HTML isn’t wanted. For example, in
the title element of an HTML page.
In these situations, I would like a way of stripping away the textile
markup from a string and displaying it completely “naked” - without HTML
tags and, crucially, without the textile markup too.
So:
The quick brown “fox”:/fox.html jumps over the lazy dog’s tail.
becomes
The quick brown fox jumps over the lazy dog’s tail.
(N.B. In the example above, I still want my punctuation made pretty:
"dog's tail" should still become “dog’s tail”.)
What’s the best way of doing this? If there isn’t an elegant way of
doing it, could RedCloth have a to_plaintext method?
Thanks,
You could use Nokogiri:
require 'redcloth'
require 'nokogiri'
# str holds the Textile source
html = RedCloth.new(str).to_html
# parse the HTML fragment and keep only its text content
plaintext = Nokogiri::HTML.fragment(html).text
// Magnus H.
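If pulling in an HTML parser is not an option, the tag-stripping half of the approach above can be roughly approximated with Ruby's standard library. This is only a sketch: the helper name strip_html and the sample HTML string are assumptions made up for illustration (the string was written by hand, not captured from RedCloth), and a regular expression is not a real HTML parser, so it will misbehave on input such as a ">" inside an attribute value. Decoding character references is what keeps the typographic apostrophe intact.

```ruby
require 'cgi'

# Hypothetical helper (not part of RedCloth or Nokogiri): removes anything
# that looks like an HTML tag, then decodes character references so that
# typographic punctuation such as &#8217; becomes a real character.
def strip_html(html)
  text = html.gsub(/<[^>]*>/, '') # drop tag-like spans
  CGI.unescapeHTML(text).strip    # decode &amp;, &#8217;, and similar
end

# Hand-written sample HTML, assumed for illustration only.
html = '<p>The quick brown <a href="/fox.html">fox</a> ' \
       'jumps over the lazy dog&#8217;s tail.</p>'

puts strip_html(html)
```

For anything beyond simple, trusted markup, a parser-based approach such as the Nokogiri one is the safer choice.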
Thanks Magnus!
This works nicely. I have also found an example using Hpricot at
http://wiki.github.com/hpricot/hpricot/hpricot-challenge (see under
“Strip All HTML Tags”).
I’m happy using this, but I’m still left wondering whether it would make
more sense for RedCloth to have an elegant “to_plaintext” method. It
seems better not to go through an HTML intermediate stage.
Best,
That’s a simple, elegant solution! I was going to suggest writing your
own formatter that passed everything through without adding anything,
but that’s way too much work!
You should probably add that as a TODO, since I’ve seen plenty of
sites that use the Textile source as a simple plaintext preview (where
a proper formatter would definitely be better).
// Magnus H.