org.greenstone.fedora.services
Class GSearchConnection

java.lang.Object
  extended by org.greenstone.fedora.services.GSearchConnection
All Implemented Interfaces:
FedoraToGS3Interface.Constants

public class GSearchConnection
extends java.lang.Object
implements FedoraToGS3Interface.Constants

Class GSearchConnection connects to FedoraGSearch's web services. FedoraGSearch offers indexing and full-text search functionality for Fedora repositories. Its search web service (method gfindObjects) returns the response of a search as XML. GSearchConnection offers more convenient methods that extract just the parts of the search results that FedoraGS3Connection needs and return them.

Author:
ak19

Field Summary
protected  javax.xml.parsers.DocumentBuilder builder
          A DocumentBuilder object used to construct and parse XML
protected  org.apache.axis.client.Call call
          The Call object used to connect to the FedoraGSearch web services
protected static java.lang.String DC_TITLE_FIELD
           
protected static java.lang.String FIELD
           
protected static java.lang.String FULLTEXT_FIELD
           
protected static java.lang.String G_FIND_OBJECTS
          The names of the Fedora Generic Search web service methods that this class uses are declared here as static final Strings.
protected static java.lang.String HIT_TOTAL
           
protected static java.lang.String INDEX_NAME
           
protected static java.lang.String NAME
           
protected static java.lang.String NAMESPACE_URI
           
protected static java.lang.String OBJECT
           
protected static java.lang.String PID
           
protected  javax.xml.namespace.QName portName
          The portName object used when connecting to FedoraGSearch's web services
protected  org.apache.axis.client.Service service
          The Service object used to connect to the FedoraGSearch web services
protected static java.lang.String SERVICE_NAME
           
protected static java.lang.String SPACE
          Separator used internally to separate the values of a search field
 
Fields inherited from interface org.greenstone.fedora.services.FedoraToGS3Interface.Constants
ALL_FIELDS, ALL_TITLES, ASSOCFILEPREFIX, COMMA, DOC_TITLES, FIELDNAME_ATT, FULLTEXT, GS3FilePathMacro, MAXDOCS, NUM_DOCS_MATCHED, OCCURS_ATT, QUERY, SIMPLEFIELD_ATT
 
Constructor Summary
GSearchConnection(java.lang.String wsdlFileLocation)
          Constructor that takes a String representing the URL of the WSDL file for FedoraGSearch's web services, and tries to establish a connection to those web services.
 
Method Summary
protected  java.lang.String formatSearchTermsInField(java.lang.String field, java.lang.String fieldName)
          Each field is a comma separated list of terms that may be either a word OR a phrase.
 java.lang.String[] getPIDsFromSearchResult(java.lang.String collectionName, java.lang.String searchResult)
          Call this method with the return value of calling search().
protected  java.lang.String gFindObjects(java.lang.String searchFieldedTerms, java.lang.String sort, int hitPageStart, int hitPageSize, int snippetsMax, int fieldMaxLength, java.lang.String indexName, java.lang.String resultPageXslt)
          Method to invoke the gfindObjects operation of the Fedora Generic Search web services.
static void main(java.lang.String[] args)
           
 java.lang.String search(java.util.Map fieldsToSearchTerms, int hitPageStart, int hitPageSize)
          FedoraGSearch accepts a query of the form: <"cyclone val" "Gender Inequalities" ds.fulltext:"cyclone val" ds.fulltext:"worst storm"> where the first two phrases are searched for in all indexed fields (in this case dc.title and ds.fulltext), while the last two are searched for in the ds.fulltext field only.
 java.lang.String search(java.lang.String fieldedSearchTerms, int hitPageStart, int hitPageSize, int snippetsMax)
          Uses FedoraGSearch to perform a search where the query is embedded in fieldedSearchTerms, which not only provides the terms to search on, but also the fields to search the (various) given terms in.
 java.lang.String search(java.lang.String searchFieldName, java.lang.String searchTerm, int hitPageStart, int hitPageSize, int snippetsMax)
          Method that performs a search for the given searchTerm inside the given indexed field.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAMESPACE_URI

protected static java.lang.String NAMESPACE_URI

SERVICE_NAME

protected static java.lang.String SERVICE_NAME

INDEX_NAME

protected static final java.lang.String INDEX_NAME
See Also:
Constant Field Values

G_FIND_OBJECTS

protected static final java.lang.String G_FIND_OBJECTS
The names of the Fedora Generic Search web service methods that this class uses are declared here as static final Strings.

See Also:
Constant Field Values

PID

protected static final java.lang.String PID
See Also:
Constant Field Values

HIT_TOTAL

protected static final java.lang.String HIT_TOTAL
See Also:
Constant Field Values

OBJECT

protected static final java.lang.String OBJECT
See Also:
Constant Field Values

FIELD

protected static final java.lang.String FIELD
See Also:
Constant Field Values

NAME

protected static final java.lang.String NAME
See Also:
Constant Field Values

DC_TITLE_FIELD

protected static final java.lang.String DC_TITLE_FIELD
See Also:
Constant Field Values

FULLTEXT_FIELD

protected static final java.lang.String FULLTEXT_FIELD
See Also:
Constant Field Values

SPACE

protected static final java.lang.String SPACE
Separator used internally to separate the values of a search field

See Also:
Constant Field Values

service

protected final org.apache.axis.client.Service service
The Service object used to connect to the FedoraGSearch web services


call

protected final org.apache.axis.client.Call call
The Call object used to connect to the FedoraGSearch web services


portName

protected final javax.xml.namespace.QName portName
The portName object used when connecting to FedoraGSearch's web services


builder

protected final javax.xml.parsers.DocumentBuilder builder
A DocumentBuilder object used to construct and parse XML

Constructor Detail

GSearchConnection

public GSearchConnection(java.lang.String wsdlFileLocation)
                  throws java.net.MalformedURLException,
                         javax.xml.rpc.ServiceException,
                         javax.xml.parsers.ParserConfigurationException
Constructor that takes a String representing the URL of the WSDL file for FedoraGSearch's web services, and tries to establish a connection to those web services.

Parameters:
wsdlFileLocation - is a String representing the URL of the WSDL file
Throws:
java.net.MalformedURLException
javax.xml.rpc.ServiceException
javax.xml.parsers.ParserConfigurationException
Method Detail

gFindObjects

protected java.lang.String gFindObjects(java.lang.String searchFieldedTerms,
                                        java.lang.String sort,
                                        int hitPageStart,
                                        int hitPageSize,
                                        int snippetsMax,
                                        int fieldMaxLength,
                                        java.lang.String indexName,
                                        java.lang.String resultPageXslt)
                                 throws java.lang.Exception
Method to invoke the gfindObjects operation of the Fedora Generic Search web services. The parameter types, parameter order and return type of gFindObjects are as obtained from the WSDL file for the Fedora Generic Search web services located at: http://localhost:8080/fedoragsearch/services/FgsOperations?wsdl

<wsdl:message name="gfindObjectsRequest">
  <wsdl:part name="query" type="xsd:string"/>
  <wsdl:part name="sort" type="xsd:string"/>
  <wsdl:part name="hitPageStart" type="xsd:int"/>
  <wsdl:part name="hitPageSize" type="xsd:int"/>
  <wsdl:part name="snippetsMax" type="xsd:int"/>
  <wsdl:part name="fieldMaxLength" type="xsd:int"/>
  <wsdl:part name="indexName" type="xsd:string"/>
  <wsdl:part name="resultPageXslt" type="xsd:string"/>
</wsdl:message>
<wsdl:message name="gfindObjectsResponse">
  <wsdl:part name="gfindObjectsReturn" type="xsd:string"/>
</wsdl:message>
<wsdl:operation name="gfindObjects" parameterOrder="query sort hitPageStart hitPageSize snippetsMax fieldMaxLength indexName resultPageXslt">

This method works: it searches the dc.title field of our FedoraIndex for the term (e.g. "interview") and the result returned is an XML String. There is no example of how to call gfindObjects with parameters. In particular, I don't know what values the parameter sort can take. However, topazproject has an example of how to call updateIndex().

Throws:
java.lang.Exception
See Also:
An example on how to call updateIndex() with parameters
Axis Service class
Axis RPC Call, for specification of interface Call
Axis client Call class, for implementation of interface Call
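
The parameterOrder attribute in the WSDL excerpt above fixes the positional order of the eight arguments. As a minimal sketch, and not the code of GSearchConnection itself, the hypothetical helper below packs them into an Object[] in that order, which is the shape that Axis's Call.invoke(Object[]) accepts. The class name GfindObjectsArgs, the sample values, and the empty strings passed for sort and resultPageXslt are assumptions made for illustration only; the valid values of sort are not documented here.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of GSearchConnection. */
final class GfindObjectsArgs {
    private GfindObjectsArgs() {}

    /**
     * Packs the arguments in the WSDL parameterOrder:
     * query sort hitPageStart hitPageSize snippetsMax fieldMaxLength indexName resultPageXslt
     */
    static Object[] inWsdlOrder(String query, String sort,
                                int hitPageStart, int hitPageSize,
                                int snippetsMax, int fieldMaxLength,
                                String indexName, String resultPageXslt) {
        return new Object[] {
            query,                            // xsd:string
            sort,                             // xsd:string
            Integer.valueOf(hitPageStart),    // xsd:int
            Integer.valueOf(hitPageSize),     // xsd:int
            Integer.valueOf(snippetsMax),     // xsd:int
            Integer.valueOf(fieldMaxLength),  // xsd:int
            indexName,                        // xsd:string
            resultPageXslt                    // xsd:string
        };
    }

    public static void main(String[] args) {
        // Sample values are assumptions; "" for sort is only a placeholder.
        Object[] packed = inWsdlOrder("dc.title:interview", "", 1, 10, 3, 100, "FedoraIndex", "");
        System.out.println(Arrays.toString(packed));
    }
}
```

Keeping the ordering in one place means a change to the WSDL's parameterOrder only has to be mirrored once.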

search

public java.lang.String search(java.lang.String searchFieldName,
                               java.lang.String searchTerm,
                               int hitPageStart,
                               int hitPageSize,
                               int snippetsMax)
                        throws java.lang.Exception
Method that performs a search for the given searchTerm inside the given indexed field.

Parameters:
searchFieldName - is the name of the indexed field within which the given searchTerm is to be searched for.
searchTerm - is the term to be searched for.
hitPageStart - is the page of search results to start returning.
hitPageSize - is the number of search result pages to return, starting from hitPageStart.
snippetsMax - is the maximum number of separate snippets containing the searchTerm that are to be returned. (At most snippetsMax occurrences of the term in the text will be returned.)
Throws:
java.lang.Exception

search

public java.lang.String search(java.util.Map fieldsToSearchTerms,
                               int hitPageStart,
                               int hitPageSize)
                        throws java.lang.Exception
FedoraGSearch accepts a query of the form:
<"cyclone val" "Gender Inequalities" ds.fulltext:"cyclone val" ds.fulltext:"worst storm">
where the first two phrases are searched for in all indexed fields (in this case dc.title and ds.fulltext), while the last two are searched for in the ds.fulltext field only. Another example:
<gender dc.title:interview ds.fulltext:"cyclone val">
Here titles and full texts are searched for "gender", while the title index is searched for "interview" and full texts are searched for the phrase "cyclone val".

Parameters:
fieldsToSearchTerms - is a Map of (search field, associated search terms) pairs. It can contain 3 search fields: allfields, titles, text. The value for each is a comma-separated list of search terms (words or phrases) for that field. Internally the field names get converted to what FedoraGSearch's gfindObjects understands: titles becomes dc.title:, text becomes ds.fulltext: and allfields becomes nothing.
hitPageStart - is the page of search results to start returning.
hitPageSize - is the number of search result pages to return, starting from hitPageStart.
Returns:
the XML (in string format) returned from Fedora Generic Search's gfindObjects method
Throws:
java.lang.Exception
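
The conversion described above can be sketched roughly as follows. This is an illustrative reimplementation under stated assumptions, not the actual body of search() or formatSearchTermsInField(): the class name FieldedQuerySketch and the method buildQuery are hypothetical, every term is given its own field prefix (as in the query forms shown above), a term is treated as a phrase when it contains a space, and unfielded terms are emitted first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; names and behaviour are assumptions, not the real implementation. */
final class FieldedQuerySketch {
    // Map keys and prefixes as named in the parameter description; insertion
    // order puts the unfielded (allfields) terms first.
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("allfields", "");
        PREFIXES.put("titles", "dc.title:");
        PREFIXES.put("text", "ds.fulltext:");
    }

    private FieldedQuerySketch() {}

    static String buildQuery(Map<String, String> fieldsToSearchTerms) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> entry : PREFIXES.entrySet()) {
            String terms = fieldsToSearchTerms.get(entry.getKey());
            if (terms == null) {
                continue; // this search field was not supplied
            }
            for (String raw : terms.split(",")) {
                String term = raw.trim();
                if (term.isEmpty()) {
                    continue; // skip empty entries such as "a,,b"
                }
                // Assumption: a term containing a space is a phrase and gets quoted.
                String formatted = term.contains(" ") ? "\"" + term + "\"" : term;
                parts.add(entry.getValue() + formatted);
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("text", "cyclone val");
        fields.put("titles", "interview");
        fields.put("allfields", "gender");
        System.out.println(buildQuery(fields));
    }
}
```

With the sample map in main, the sketch produces the second query form shown above: gender dc.title:interview ds.fulltext:"cyclone val".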

formatSearchTermsInField

protected java.lang.String formatSearchTermsInField(java.lang.String field,
                                                    java.lang.String fieldName)
Each field is a comma separated list of terms that may be either a word OR a phrase. We're going to separate each term from the list, and put quotes around phrases, then combine all the terms together again with spaces to separate them. Examples:
 dc.title:"a phrase" word
 ds.fulltext:"cyclone val"
 (ALL_FIELDS) interview gender
This is required to facilitate fielded searching with fedoraGSearch.

Parameters:
field - is a comma separated list of search terms (corresponding to one fieldName) to be reorganised
fieldName - is the name of the field to prepend to the reorganised field value. FieldName ALL_FIELDS is ignored.
Returns:
parameter field reorganised such that terms that are phrases are in quotes and each term is separated by a space from the previous one.

search

public java.lang.String search(java.lang.String fieldedSearchTerms,
                               int hitPageStart,
                               int hitPageSize,
                               int snippetsMax)
                        throws java.lang.Exception
Uses FedoraGSearch to perform a search where the query is embedded in fieldedSearchTerms, which not only provides the terms to search on, but also the fields to search the (various) given terms in.

Parameters:
fieldedSearchTerms - is the String specifying all the search terms with their fields (or no field if it should search for the terms in all fields). The terms with no associated search-fields should come first. Search terms may be in quotes.
snippetsMax - is the maximum number of separate snippets containing the search term (that is, at most snippetsMax occurrences of the term in the text) to be returned.
hitPageStart - is the page of search results to start returning.
hitPageSize - is the number of search result pages to return, starting from hitPageStart.
Returns:
the XML (in string format) returned from Fedora Generic Search's gfindObjects method
Throws:
java.lang.Exception

getPIDsFromSearchResult

public java.lang.String[] getPIDsFromSearchResult(java.lang.String collectionName,
                                                  java.lang.String searchResult)
                                           throws java.lang.Exception
Call this method with the return value of calling search(). Search results are returned in GSearch's XML response format, containing information that includes the PIDs of the documents that matched the search. These PIDs are returned in the array.

Parameters:
collectionName - is the name of the collection to restrict the search results by. If it is "", results from all collections are returned. Passing "" is generally not advisable because, in theory, all indexed collections in the repository would then be considered, and not all of them may be Greenstone collections. To search all Greenstone collections, pass "greenstone" as the collection name instead.
searchResult - is the Fedora Generic Search XML response returned from performing a gfindObjects() operation.
Returns:
an array of the PIDs of documents found for the search.
Throws:
java.lang.Exception
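
As a rough illustration of the kind of extraction this method performs, the sketch below parses a search response with a DocumentBuilder and collects the text of every field element whose name attribute is "PID". Several things here are assumptions rather than facts from this page: the sample XML layout (resultPage, gfindObjects, objects, object, field), the class and method names, the choice to wrap parser exceptions in an unchecked exception, and the filtering rule, which simply keeps PIDs that start with a caller-supplied prefix. How the real method matches collectionName against PIDs is not specified here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch; element names and the filtering rule are assumptions. */
final class PidExtractionSketch {
    private PidExtractionSketch() {}

    /** Returns the PIDs found in the XML; an empty pidPrefix keeps all of them. */
    static String[] extractPids(String searchResultXml, String pidPrefix) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(searchResultXml)));
            List<String> pids = new ArrayList<>();
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (!"PID".equals(field.getAttribute("name"))) {
                    continue; // some other indexed field, e.g. a title
                }
                String pid = field.getTextContent().trim();
                if (pidPrefix.isEmpty() || pid.startsWith(pidPrefix)) {
                    pids.add(pid);
                }
            }
            return pids.toArray(new String[0]);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse search result XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed response shape, for illustration only.
        String xml = "<resultPage><gfindObjects hitTotal=\"2\"><objects>"
            + "<object><field name=\"PID\">greenstone:demo-1</field>"
            + "<field name=\"dc.title\">Interview</field></object>"
            + "<object><field name=\"PID\">other:42</field></object>"
            + "</objects></gfindObjects></resultPage>";
        for (String pid : extractPids(xml, "greenstone")) {
            System.out.println(pid);
        }
    }
}
```

In real use, the XML string would be the value returned by one of the search() methods rather than a hand-written sample.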

main

public static void main(java.lang.String[] args)