Ticket #341 (closed annoyingness: fixed)

Opened 11 years ago

Last modified 11 years ago

Searching doesn't work as you might think when building index from multiple metadata elements

Reported by: mdewsnip
Owned by: kjdon
Priority: low
Milestone:
Component: Indexers
Severity: minor
Keywords:
Cc:

Description

If you build an index like:

indexes dc.Title,dc.Creator

and then search this index for "X and Y" (where X is a title and Y is an author), the query will not match. This is because what MGPP indexes looks like:

<Sec> <DC>X <DC>Y </Sec>

and MGPP won't match both X and Y because they come from different fields (even though it is one index). Searching for "X" in DC and "Y" in DC would probably work, but this can't be done from a simple query form. The only way I could find to get the search to succeed was to use the "ZZ" index, and this was a pain in the ass (and not always appropriate).
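The failure described above can be illustrated with a toy model. This is only a sketch of the behaviour reported in this ticket, not MGPP code: the function name and the (tag, text) representation are assumptions made for illustration, and it assumes an "all terms" query must be satisfied inside a single field.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (tag, text), e.g. ("DC", "X"); hypothetical representation


def matches_within_one_field(fields: List[Field], terms: List[str]) -> bool:
    """Return True only if some single field contains every query term."""
    wanted = {t.lower() for t in terms}
    for _tag, text in fields:
        words = set(text.lower().split())
        if wanted <= words:  # every query term occurs in this one field
            return True
    return False


# <Sec> <DC>X <DC>Y </Sec>: title and creator indexed as separate fields
separate = [("DC", "X"), ("DC", "Y")]

both_terms = matches_within_one_field(separate, ["X", "Y"])  # no single field has both
one_term = matches_within_one_field(separate, ["X"])         # the first field has X
```

Under this model the two-term query fails even though both terms are present in the section, which is the behaviour the ticket complains about.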

It's not clear whether this is a bug or a feature. Katherine suggested joining all the values together when indexing, to look like:

<Sec> <DC>X Y </Sec>

This would fix the problem, but there are cases where this isn't desirable. Searching for text at section level is one example. There will be other cases where you only want the search to match if all the query terms match in one field as well. This is a very fundamental aspect of the indexing so changing anything is potentially dangerous.

Katherine and I decided to document this issue here, but leave changing anything until someone has a clearer idea of how often this is a problem, and how best to fix it.

Change History

Changed 11 years ago by kjdon

  • status changed from new to closed
  • resolution set to fixed

I have fixed this by implementing the suggestion described above (joining the values together when indexing), which we had originally decided to leave alone.

For example, if your indexes are "title", "subject" and "title,subject", then the indexer gets

<TI>title</TI><SU>subject</SU><TT>title subject</TT>

This is what you want. If you have an index title,subject, then the two fields should be indexed together, and not separately unless they are also listed as separate indexes.
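A rough sketch of how the index specification maps onto the tagged text shown above. The function, the tag table and the metadata dictionary are hypothetical and only reproduce the example from this comment; the real indexer will differ.

```python
from typing import Dict, List


def build_index_text(index_specs: List[str],
                     tags: Dict[str, str],
                     metadata: Dict[str, List[str]]) -> str:
    """Emit one tagged field per index spec, joining all of its values."""
    parts = []
    for spec in index_specs:
        values: List[str] = []
        for name in spec.split(","):  # "title,subject" -> ["title", "subject"]
            values.extend(metadata.get(name, []))
        tag = tags[spec]
        parts.append(f"<{tag}>{' '.join(values)}</{tag}>")
    return "".join(parts)


# Index specs and tags taken from the example in this comment
specs = ["title", "subject", "title,subject"]
tags = {"title": "TI", "subject": "SU", "title,subject": "TT"}
doc = {"title": ["title"], "subject": ["subject"]}

indexed = build_index_text(specs, tags, doc)
# "<TI>title</TI><SU>subject</SU><TT>title subject</TT>"
```

The combined index gets its own field holding all the values, while the individual indexes keep their separate fields.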

I can't see where this would be a problem.

This also means that if an item has two values for the same element (e.g. two subjects), they both end up in the same field, e.g.

<SU>subject 1 subject 2</SU>

so a search will match if one search term is in one subject and a second search term is in another subject.

I think this behaviour is what you'd want. If not, then we could look at making this an option.
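The multi-valued case can be sketched the same way. The subject values and the helper below are made up for illustration and say nothing about how MGPP tokenises text; the point is only the difference between matching per value and matching the joined field.

```python
from typing import List


def all_terms_in_field(field_text: str, terms: List[str]) -> bool:
    """True if every query term occurs as a word in the field text."""
    words = set(field_text.lower().split())
    return all(t.lower() in words for t in terms)


# Two hypothetical subject values for one document
subjects = ["marine biology", "coastal erosion"]
query = ["marine", "erosion"]

# Old behaviour: each value is its own field, so no single field has both terms
separate_hit = any(all_terms_in_field(s, query) for s in subjects)

# New behaviour: values are joined into one <SU> field before indexing
joined_hit = all_terms_in_field(" ".join(subjects), query)
```

Here `separate_hit` is False and `joined_hit` is True, so a query whose terms are spread over different subjects now matches. If that turns out to be unwanted, this is the behaviour an option would need to switch off.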
