Hi,
I am using Sanitize.clean() to strip HTML tags from content, but I want
to keep some of the tags instead of removing them. This is what I have
so far:
html = File.new(file).read
soup = BeautifulSoup.new(html)
soup.title.contents = ['']
soup.find_all.each do |tag|
  if tag.string != nil
    tag.contents = ['' + tag.contents.to_s + ''] if (tag['style'] =~ /bold/)
    tag.contents = ['' + tag.contents.to_s + ''] if (tag['style'] =~ /italic/)
    tag.contents = ['' + tag.contents.to_s + ''] if (tag['style'] =~ /underline/)
  end
end
soup_string = str_replace(soup.html.to_s)
return Sanitize.clean(soup_string.to_s, :elements =>
  ['div', 'p', 'span', 'center', 'table', 'tr', 'th', 'td', 'blockquote', 'br',
   'cite', 'code', 'dd', 'dl', 'dt', 'em', 'i', 'li', 'ol', 'pre', 'q',
   'small', 'strike', 'strong', 'sub', 'sup', 'u', 'ul', 'tbody'])
The problem is that I also want to preserve center and right
justification, which does not happen even when I include 'center' here.
If anybody knows how to preserve justification, please help me.
Thanks In Advance,
Santosh
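
One possible direction for the justification question, offered as a sketch under assumptions rather than a confirmed fix: alignment is usually carried by an attribute such as align="center" rather than by the element name, and Sanitize drops every attribute that is not whitelisted, so listing 'center' under :elements only keeps literal center tags. The config below whitelists the align attribute through the :attributes option. The names ALIGNABLE, ALIGN_CONFIG and clean_keeping_alignment are hypothetical, and the element list is shortened for readability.

```ruby
# Hypothetical list of block-level elements whose alignment should survive.
ALIGNABLE = %w[div p table tr th td]

ALIGN_CONFIG = {
  :elements   => ALIGNABLE + %w[span center i u em strong br],
  # Attributes that are not listed here are removed, so `align` has to be
  # whitelisted for each element that may carry it.
  :attributes => ALIGNABLE.each_with_object({}) { |name, h| h[name] = ['align'] }
}

# Assumes the sanitize gem is installed. Sanitize.clean is the call used in
# this thread; later releases of the gem provide Sanitize.fragment instead.
def clean_keeping_alignment(html)
  require 'sanitize'
  Sanitize.clean(html, ALIGN_CONFIG)
end
```

If the source HTML expresses alignment as style="text-align: ..." instead of an align attribute, this sketch would not preserve it, because the style attribute is not whitelisted here.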
Jun Y. Kim wrote:
You can also use the Ruby library Sanitize (Sanitize: A whitelist-based Ruby HTML sanitizer - wonko.com).
This library makes it very easy to clean up HTML.
Let's look at the following examples.
Using Sanitize is easy. First, install it:
sudo gem install sanitize
Then call it like so:
require 'rubygems'
require 'sanitize'
html = 'foo'
Sanitize.clean(html) # => 'foo'
By default, Sanitize removes all HTML. You can use one of the built-in
configs to tell Sanitize to allow certain attributes and elements:
Sanitize.clean(html, Sanitize::Config::RESTRICTED)
# => 'foo'
Sanitize.clean(html, Sanitize::Config::BASIC)
Sanitize.clean(html, Sanitize::Config::RELAXED)
# => 'foo<img src="http://foo.com/bar.jpg" />'
Or, if you'd like more control over what's allowed, you can provide
your own custom configuration:
Sanitize.clean(html, :elements => ['a', 'span'],
  :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
  :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
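
To illustrate what the :protocols entry in the custom configuration expresses, here is a rough stand-alone sketch using only Ruby's standard library. It is not Sanitize's implementation; the helper protocol_allowed? and the constant PROTOCOLS are hypothetical names, and the choice to reject relative and unparseable URLs is an assumption of this sketch.

```ruby
require 'uri'

# Same shape as the :protocols option in the custom configuration above.
PROTOCOLS = { 'a' => { 'href' => ['http', 'https', 'mailto'] } }

# Hypothetical helper (not part of Sanitize): returns true when the attribute
# value may be kept under the given protocol whitelist.
def protocol_allowed?(element, attribute, value, protocols = PROTOCOLS)
  allowed = (protocols[element] || {})[attribute]
  return true if allowed.nil?      # no restriction configured for this attribute

  begin
    scheme = URI.parse(value.strip).scheme
  rescue URI::InvalidURIError
    return false                   # unparseable values are rejected in this sketch
  end
  return false if scheme.nil?      # relative URLs are rejected in this sketch
  allowed.include?(scheme.downcase)
end
```

For example, protocol_allowed?('a', 'href', 'http://foo.com/') is true, while a javascript: link in the same attribute is rejected because its scheme is not in the list.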
good one
On 02, 6:42, Vivek Netha wrote: