General todo list, in no particular order.

Large jobs:

* gs3 release
  - basic stuff, translations, installation, documentation. oct 31?
  - gs2 building only, both default and nzdl interfaces? gatherer?

* generic resource retrieval
  - associated documents, src docs eg pdf/word, niupepa image files etc.
  - if http accessible, put a link; otherwise use a general resource retrieve?

* collector collection building
  - do we want to finish this? needs addDocument and configureCollection services.
  The addDocument service (in GS2Construct) has not been implemented. It should take a file name and add the document to the import directory of the collection. There are problems with just transmitting a file name: the service may live remotely, in which case the document is not there. We should probably send it attached to the html form, so need to work out:
  - how to get the document attached to the form (there's something to do with post and encoding = multipart?? otherwise the browser just sends the filename)
  - where to get it in the servlet - is it a parameter? or something else?
  - how to add it into the xml request to be passed to the service.
  If the program is running locally it's much simpler just to send a filename - can we somehow check for this?

* configure Collection for building
  Also to do with building, a little harder, is the ConfigureCollection service. There is no stub for it yet, but it is easy enough to add one.
  - need to add this service to the service description xml stuff, and write a processConfigureCollection() method.
  It would be easy enough to display the config file in a big text box, and have the user edit it like the collector does. The hard bit is that when you click ConfigureCollection, you don't know what collection you are going to be dealing with. For all the building services, you select the collection on the service page. With the configure stuff, you need to select the collection, and then the config file needs to be retrieved.
  So it's really a two step process to configure the coll: first select the coll and submit that, then edit the config file and submit that. All the services currently are one step, so need to think about how this type of service fits into the model. Maybe it needs a hidden arg to tell the service if you're at stage 1 or 2? When the action does the request, it then asks for the service description again to redisplay it for the user. Maybe if the service knows that it has done the first half, it sends the second type of description? Also, do we use the collect.cfg file or the collectionConfig.xml file to show the user?

* sequence of services
  Some service clusters have services that you are supposed to carry out in sequence, such as building, but there may be others. Can we do a generic action or xslt or something that sends the user to the next service once they've completed the first one? Maybe the service cluster/serviceRack class specifies the sequence of services, and they are all handled individually as at present, except that some xslt puts a next button on each page with a link to the next service in the list.

* proper install package

* currently nothing is cached - service descriptions, text strings etc could all be cached. (cgi args are cached by tomcat)

* document structure
  - new greenstone archive format? TEI, XHTML, OO, GML compatible? parallel document structures?
  - associated resources/documents
  - xlinks

* greenstone 3 building
  - is building going to change for gs3, or will we continue to use the perl scripts?
  - import, build, activate
  - info extraction, augmentation
  - incremental update?
  - modular? xml pipeline?

* additional services
  - music search
  - keyphrase stuff
  - search history

* all the gs2 admin/security stuff - user management, authentication

* server side threading issues - is it thread safe?

* usage logs

* better error handling

* document version control

* combining requests to MR - results from first one becoming content for second one??
* language translation stuff
  - when translating, we want to click on a text piece or macro and go to a sample page with that text string in it. can we do this? can we tell in gs3 where bits will be in pages?
  I think we could have a new type of request that provides a page containing all the text strings used by a service/agent.
  - would it be class specific or service specific? eg it would contain the service form, some sample results, and if it's a process type of thing, all the possible error and success messages. what about the interface bits?

* are message formats fixed? can we create a DTD? or will they evolve too much over time?

* sitewide service agents - may want to specify that only some of the services that could potentially be provided by an agent are actually provided.

* should actions respond to describe requests? and what would they say if they did? are they agents and part of the system, or somehow outside the system?

* what about soap? is there a new version of that?

* combined query services
  - eg one page with two service forms on there, eg text query and music query
  - need to combine the results. eg the action could carry out both queries then use xslt to combine the results. or do the same thing across title and creator indexes in mg to approximate fielded searching? true cross collection search would probably need a new service??

* more generally, combining services in general, action and service level

* action helpers: query term highlighting, page transformation and metadata determination (remove from receptionist?)

* if you leave an applet page you lose all the previous info, eg for the status display for importing. need a function to retrieve again all the previous messages to redisplay them.

* can the xslt dynamically retrieve the metadata it needs for say query results??

* CSS instead of tables? - use html lists then use css to format them.

* cross coll search - if colls are built the same, just present the form and search both.
  need to think about ranking.
  - if colls are not built the same, and have indexes, subcolls, langs - do we merge the lists? and only search those that have the right ones?
  - what if two indexes have different meanings and display text but the same id? are they treated as the same index? which name do you display?
  - what about colls with different levels, eg one only has document, the other doc and section. what do you display?
  - field lists? just combine?

* search history

* use objects instead of applets? java provides a conversion tool. NPX_PLUGIN_PATH=

* dana's combined search and browse stuff

* can xslt be used to generate requests as well as transform the output?

* kea.
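
Rough sketch for the cross coll search item above: one possible way to search only the colls that have the requested index and then merge their ranked lists. This is a language-neutral illustration written in Python, not gs3 code. The function names, the (doc id, score) result shape, the collection name "demo", and the divide-by-top-score normalisation are all assumptions made up for the example; the ranking question is still open.

```python
from typing import Dict, List, Set, Tuple

Result = Tuple[str, float]  # (document id, raw score); hypothetical shape


def searchable(colls: Dict[str, Set[str]], index_id: str) -> List[str]:
    """Return the names of collections that actually have the requested index."""
    return sorted(name for name, indexes in colls.items() if index_id in indexes)


def merge_ranked(per_coll: Dict[str, List[Result]]) -> List[Tuple[str, str, float]]:
    """Merge per-collection result lists into one ranked list.

    Each score is divided by its collection's top score so that lists from
    differently built collections are roughly comparable. This normalisation
    is an assumption; other schemes are possible.
    """
    merged = []
    for coll, results in per_coll.items():
        top = max((score for _, score in results), default=0.0)
        for doc_id, score in results:
            norm = score / top if top > 0 else 0.0
            merged.append((coll, doc_id, norm))
    # Highest normalised score first; ties broken by collection, then doc id.
    merged.sort(key=lambda r: (-r[2], r[0], r[1]))
    return merged


if __name__ == "__main__":
    colls = {"demo": {"text", "title"}, "niupepa": {"text"}}
    print(searchable(colls, "title"))
    print(merge_ranked({
        "demo": [("d1", 8.0), ("d2", 4.0)],
        "niupepa": [("n1", 2.0), ("n2", 1.5)],
    }))
```

Matching on index id alone sidesteps the "same id, different meaning" question rather than answering it; if ids can clash, the lookup would need the display text or some other key as well.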