No, I don’t want to censor what people are querying, I want to censor
the searches I display.
I display the last 5 queries entered in the search box on my website.
I do not want to display offensive terms, such as the '7 words you
can’t say on TV'. I am already censoring the words themselves, but I
also want to censor terms that contain those words.
In case you need a reason why: the webpage is on the website of a
major University. I want to provide students with search topic tips,
but do not want to display any offensive words.
This is what I am doing:
censor = get_censor # returns an array of offensive words
unless censor.include?(query) # only an exact match is blocked here
  mod.update_search
end
How do I extend this to filter out queries that contain the censored
terms? I am thinking of MySQL pattern matching with LIKE, but I am not
sure how to do it.
Come on - this is an interesting problem. Too easy? Too hard? Too
controversial?
Well, it certainly is interesting, and it can be a can of worms, as it
will require regular maintenance. This is because there are many patterns
that can be used to work around your censor.
I’d suggest you get to know regex pattern matching and see how
it can be done using MySQL functions. I think performance may
be an issue if you have a lot of content to filter.
If you can, try to filter as you store content to the DB, or run an
independent background process that cleans up the DB content while
you sleep.
Yes, my idea is to filter as I store in the DB. It seems like I should be
able to use wildcards around a value to match it to a list of words. For
example, with the current exact-match check, if a user enters ‘ass’ the
query will get filtered out and not displayed, but if they enter
‘assmonkey’ then it will not get filtered. If wildcards were used then it
should filter it.
I do not expect to be able to filter every offensive query, but if I can get
some of them I would be happier.
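For what it's worth, the wildcard idea amounts to a substring check, and that can be done in Ruby before anything is stored. Here is a minimal sketch, not a definitive implementation: the helper name offensive? and the placeholder word list are my own assumptions, and the list stands in for whatever get_censor returns.

```ruby
# Hypothetical helper (not from the thread): true when the query
# contains any censored word as a substring, ignoring case.
def offensive?(query, censor)
  normalized = query.to_s.downcase
  censor.any? { |word| normalized.include?(word.downcase) }
end

# Placeholder list standing in for whatever get_censor returns.
censor = %w[darn heck]

["darn", "darnmonkey", "campus map"].each do |query|
  # In the original snippet this is where mod.update_search would run.
  puts "#{query}: #{offensive?(query, censor) ? 'skip' : 'store'}"
end
```

One caveat with plain substring matching: short censored words can also turn up inside harmless ones ("class", "assignment"), so some legitimate queries would be skipped too. Since the goal is only to keep offensive text off the page, that may be an acceptable trade.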
I agree. Though you may want to consider if the following is acceptable.
a s s
a # s # s
a.s.s
etc.
I hope you get the idea.
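The spaced-out variants above could be caught with a regular expression that allows any run of non-letter characters between the letters of a censored word. The following is only a sketch under that assumption: the helper name disguised_match? and the placeholder word list are hypothetical, and it will not catch letter substitutions or other tricks.

```ruby
# Hypothetical helper (not from the thread): true when the query contains
# a censored word, even if non-letter characters separate its letters.
def disguised_match?(query, censor)
  normalized = query.to_s.downcase
  censor.any? do |word|
    # "darn" becomes d[^a-z]*a[^a-z]*r[^a-z]*n
    pattern = word.downcase.chars.map { |c| Regexp.escape(c) }.join("[^a-z]*")
    normalized.match?(Regexp.new(pattern))
  end
end

# Placeholder list standing in for whatever get_censor returns.
censor = %w[darn heck]

["d a r n", "d # a # r # n", "d.a.r.n", "campus map"].each do |query|
  puts "#{query}: #{disguised_match?(query, censor) ? 'skip' : 'store'}"
end
```

Because the separators are optional, this also covers the plain and "contains" cases, at the price of a few more false positives where a match happens to span two adjacent words.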
I am familiar with reg ex, but how would I apply it? See my first post.
Any other takers?
I think Shawn may have alluded to the start of a possible solution…