Getting distinct result lists

Is there a way in Ferret to do something akin to SQL’s “SELECT DISTINCT”
query? I have a table with multiple columns, and I would like to
search just a couple of them (but index others for other kinds of
searches on the table). The two columns I want to search do not have
unique values, because the other columns finish specifying the record.
Is there a way to avoid picking up the duplicates in the result list?
Thanks,
–Paul

On 2007-08-16, at 3:15 AM, Paul L. wrote:

Is there a way in Ferret to do something akin to SQL’s “SELECT
DISTINCT”
query? I have a table with multiple columns, and I would like to
search just a couple of them (but index others for other kinds of
searches on the table). The two columns I want to search do not have
unique values, because the other columns finish specifying the record.
Is there a way to avoid picking up the duplicates in the result list?

Paul,

i think i don’t get your point… If you search for something in ferret,
you will never get duplicates in your result set (meaning ferret
documents). However, your columns (fields in ferret terms) might
have the same value for several documents.

So you just want to get the different field values for your search,
but not the actual ferret documents?

Maybe you can give us a small example of your data, searches,
actual results and what you would like to have as a result :slight_smile:

Ben

Here’s an example. Suppose you have a table “UsedCars” that looks like:

Color Make Age

Green Saturn New
Green Saturn Old
Red Saturn New
Purple Toyota New
Purple Toyota Old
Blue Yugo Ancient

The users searches on “Make”, and the returned data to the user is a
combination of Color and Make. The user just wants to see the unique
values, i.e.

Green Saturn
Red Saturn
Purple Toyota
Blue Yugo

(or whichever of those match the query). I could get rid of the
duplicates (e.g. the two Green Saturns) after doing the query, but the
table is large, has a lot of duplicates, and I’m limiting size of the
return list, so if I get rid of duplicates then I may need to make a
second query to get more if it turns out that after getting rid of the
duplicates my list is too short. (Potentially, I could be left with
just one item to show.)

In SQL, I could get the list like:
select distinct Color, Make from UsedCars limit 15

Is there a good way of doing that in ferret? If not, would writing a
filter be a good approach?

Thanks,
–Paul

Benjamin K. wrote:

On 2007-08-16, at 3:15 AM, Paul L. wrote:

Is there a way in Ferret to do something akin to SQL’s “SELECT
DISTINCT”
query? I have a table with multiple columns, and I would like to
search just a couple of them (but index others for other kinds of
searches on the table). The two columns I want to search do not have
unique values, because the other columns finish specifying the record.
Is there a way to avoid picking up the duplicates in the result list?

Paul,

i think i don’t get your point… If you search for something in ferret,
you will never get duplicates in your result set (meaning ferret
documents). However, your columns (fields in ferret terms) might
have the same value for several documents.

So you just want to get the different field values for your search,
but not the actual ferret documents?

Maybe you can give us a small example of your data, searches,
actual results and what you would like to have as a result :slight_smile:

Ben

Hi!

On Thu, Aug 16, 2007 at 07:23:14PM +0200, Paul L. wrote:

duplicates (e.g. the two Green Saturns) after doing the query, but the
table is large, has a lot of duplicates, and I’m limiting size of the
return list, so if I get rid of duplicates then I may need to make a
second query to get more if it turns out that after getting rid of the
duplicates my list is too short. (Potentially, I could be left with
just one item to show.)

In SQL, I could get the list like:
select distinct Color, Make from UsedCars limit 15

Is there a good way of doing that in ferret?

None that I know of - imho the best way to do this would be to use
Ferret for just fetching the IDs of all matching records and then use
these IDs with sql like above to let the database do it’s job.

If not, would writing a filter be a good approach?

I don’t think a Filter would be a good way to solve this problem -
Filter’s don’t have a clue about the query you’re running, they just
operate on bit vectors indicating which documents may appear in a query
result, and which may not.

Cheers,
Jens


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database