Help me

Hi,
I am doing a string search in an HTML file using Ruby.
If the search string contains HTML entities, I do not want to match that word.
How can I do it? Could anyone please help me?

regards,
S.Sangeetha.

On 7/23/07, geetha wrote:

> Hi,
> I am doing a string search in an HTML file using Ruby.
> If the search string contains HTML entities, I do not want to match that word.
> How can I do it? Could anyone please help me?
>
> regards,
> S.Sangeetha.

We might be able to help you better if you post the data and what you
expect to get out from it exactly.

Robert

On 7/23/07, Robert D. wrote:

> We might be able to help you better if you post the data and what you
> expect to get out from it exactly.
>
> Robert

Robert:
If the search string has HTML entities, then do not proceed with the search.

Well, it's very hard to define whether a query string has HTML entities
or not. For example, do you consider that the following string has HTML
entities?

b = "hello world and so what; and < and there we go >"

Dunno, yes and no, but if your answer is yes, and string b HAS HTML
entities, then:

require 'cgi'

escaped_html = CGI::escapeHTML(b)
if escaped_html != b
  # string contains html entities
end

If you want strict validation of HTML tags, and of whether the query is
valid HTML, then Hpricot may help.
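
To make it clearer what the comparison above actually flags, here is a minimal sketch. It relies only on CGI.escapeHTML from the standard library; the helper name needs_escaping? is made up for illustration. Note that it reports any string containing characters such as &, < or >, which is not necessarily the same thing as containing an entity.

```ruby
require 'cgi'

# Hypothetical helper: true when escaping would change the string,
# i.e. when it contains at least one character that CGI.escapeHTML rewrites.
def needs_escaping?(str)
  CGI.escapeHTML(str) != str
end

b = "hello world and so what; and < and there we go >"

puts needs_escaping?(b)              # b contains < and >, so this is flagged
puts needs_escaping?("plain words")  # nothing here needs escaping
```

Whether a flagged string should count as "having HTML entities" is exactly the judgment call described above.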


Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://blog.gnufied.org

On 7/23/07, hemant wrote:

> If you want strict validation of HTML tags, and of whether the query is
> valid HTML, then Hpricot may help.

Some tips on asking questions:

  1. Use a meaningful subject line, or else your question will look like spam.
  2. Please also respond to the answers that people post in response
    to your question.
  3. Follow Robert's suggestion: post the data and exactly what you
    expect to get out of it.


On 7/23/07, Alex Y. wrote:

> > Well, it's very hard to define whether a query string has HTML
> > entities or not.
> No it's not…

Honestly, it's up to the user. Unless we are talking about valid XHTML,
which is definitely defined.

> irb(main):002:0> re = REXML::Text::REFERENCE

I thought about this when I was posting that response, but somehow I
felt the user is not looking for valid HTML, just for whether the string
contains HTML entities or not.

Hpricot is certainly one tool you should consider, as are REXML and
Scrubyt. Scrubyt is more for web scraping, but if you can scrape it, you
can remove it too.

hemant wrote:

> Well, it's very hard to define whether a query string has HTML
> entities or not.

No it's not…

> For example, do you consider that the following string has HTML
> entities?
>
> b = "hello world and so what; and < and there we go >"
>
> Dunno, yes and no, but if your answer is yes,

Then you'd be wrong.

irb(main):001:0> require 'rexml/text'
=> true
irb(main):002:0> re = REXML::Text::REFERENCE
=> /(?:&([\w:][-\w\d.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/
irb(main):003:0> "this &amp; that" =~ re
=> 5
irb(main):004:0> "hello world and so what; and < and there we go >" =~ re
=> nil

Admittedly I’m not scanning for all defined HTML entities, just for
valid XML entities, but given that one’s a superset of the other, and
undefined entity references probably shouldn’t occur within a valid HTML
document anyway, it’s good enough for most purposes…
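
Applied to the original question, the same idea might look like the rough sketch below. The pattern is an assumption modelled on the REXML one shown above, not the library constant itself, and the names ENTITY_REFERENCE, contains_entity? and search_unless_entities are made up for illustration.

```ruby
# Assumed pattern, modelled on the REXML one above: a named reference
# (&name;), a decimal one (&#60;) or a hexadecimal one (&#x3C;).
ENTITY_REFERENCE = /&(?:[A-Za-z_:][-\w.:]*|#\d+|#x[0-9a-fA-F]+);/

# Hypothetical helper: does the string contain an entity reference?
def contains_entity?(str)
  !!(str =~ ENTITY_REFERENCE)
end

# Hypothetical helper: return the offset of query in html, or nil when the
# query is absent or when it contains an entity reference and is skipped.
def search_unless_entities(html, query)
  return nil if contains_entity?(query)
  html.index(query)
end

html = "<p>fish &amp; chips</p>"

p contains_entity?("this &amp; that")
p contains_entity?("hello world and so what; and < and there we go >")
p search_unless_entities(html, "chips")
p search_unless_entities(html, "&amp;")
```

If the goal is instead to match the decoded text, CGI.unescapeHTML from the standard library can turn common references such as &amp;amp; back into characters before searching.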