source: gs3-installations/intermuse/trunk/interfaces/intermuse/transform/pages/home.xsl@ 38220

Last change on this file since 38220 was 38220, checked in by davidb, 8 months ago

Make it easier to access Triplestore

1<?xml version="1.0" encoding="UTF-8"?>
2<xsl:stylesheet version="1.0"
3 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
4 xmlns:java="http://xml.apache.org/xslt/java"
5 xmlns:util="xalan://org.greenstone.gsdl3.util.XSLTUtil"
6 xmlns:gslib="http://www.greenstone.org/skinning"
7 extension-element-prefixes="java util"
8 exclude-result-prefixes="java util xsl gslib">
9
10
11
12 <!-- the page content -->
13 <xsl:template match="/page/pageResponse">
14
15 <div id="quickSearch">
16 <gslib:crossCollectionQuickSearchForm/>
17 </div>
18
19 <h2><gslib:selectACollectionTextBar/></h2>
20 <xsl:call-template name="collectionAndGroupLinks"/>
21
22 <xsl:call-template name="homePageDescription"/>
23
24 <!--
25 <div style="clear: both; padding-top: 4px; padding-bottom: 4px;"><hr/></div>
26 <xsl:variable name="siteDesc"><xsl:choose><xsl:when test="$groupPath != ''"><gslib:groupDescription path="{$groupPath}"/></xsl:when><xsl:otherwise><gslib:siteDescription/></xsl:otherwise></xsl:choose></xsl:variable>
27 <xsl:if test="$siteDesc != ''">
28 <xsl:value-of select="$siteDesc"/>
29 <div style="clear: both; padding-top: 4px; padding-bottom: 4px;"><hr/></div>
30 </xsl:if>
31 -->
32
33 <gslib:serviceClusterList/>
34
35 <xsl:for-each select="serviceList/service[@type='query']">
36 <gslib:serviceLink/><br/>
37 </xsl:for-each>
38
39 <xsl:for-each select="serviceList/service[@type='authen']">
40 <!--<gslib:libraryInterfaceLink/><br/><br/>-->
41 <gslib:authenticationLink/><br/><br/>
42 <gslib:depositorTitleMainLink/><br/><br/>
43 <gslib:registerLink/><br/><br/>
44 </xsl:for-each>
45
46 <!--
47 <gslib:oaipmhServerLink/><br/><br/>
48 -->
49
50 <gslib:webswingGLILink/><br/><br/>
51
52 <gslib:aboutGreenstoneLink/><br/>
53 </xsl:template>
54
55 <xsl:template name="homePageDescriptionXXXXX">
56 <div style="clear: both; padding-top: 4px; padding-bottom: 4px;"><hr/></div>
57 <xsl:variable name="siteDesc"><xsl:choose><xsl:when test="$groupPath != ''"><gslib:groupDescription path="{$groupPath}"/></xsl:when><xsl:otherwise><gslib:siteDescription/></xsl:otherwise></xsl:choose></xsl:variable>
58 <xsl:if test="$siteDesc != ''">
59 <xsl:value-of select="$siteDesc"/>
60 <div style="clear: both; padding-top: 4px; padding-bottom: 4px;"><hr/></div>
61 </xsl:if>
62 </xsl:template>
63
64
65 <xsl:template name="homePageDescription">
66 <div style="clear: both; padding-top: 4px; padding-bottom: 4px;"><hr/></div>
67 <div style="float: right; width: 300px;">
68 <img style="width: 100%;" src="interfaces/{$interface_name}/images/intermuse-title-logo.png" />
69 <div style="float: right; padding-right: 0.7em;">
70 <a href="/fuseki3/dataset.html?tab=query&amp;ds=/greenstone">Access as Linked Data</a>
71 </div>
72 </div>
73
74 <p style="padding-top: 0.7rem;">
75
76 Live musical events play a vital role in community life across the
77 globe, yet their very ‘liveness’ means they often leave only faint
78 traces on the historical record, even in modern times. While
79 musicologists have used some types of concert ephemera to capture the
80 nature and identity of musical events, by their very nature these
81 resources can be confusingly inconsistent, tantalisingly incomplete,
82 and often scattered between different archives and collections.
83 </p>
84 <!--
85 <p>
86 This <i>prototype</i> InterMusE Digital Library is a resource developed as the result
87 of a two-year project, funded by AHRC’s UK-US New Directions for Digital
88 Scholarship in Cultural Institutions programme, that seeks
89 to better capture and represent these historical events.
90
91 Using natural-language processing, optical character recognition (OCR),
92 and other forms of artificial intelligence, this digital library
93 brings together an array of digitised resources sourced from:
94 </p>
95 -->
96
97 <p>
98 The <a href="https://intermuse.datatodata.org/" target="_blank">InterMusE Project</a>
99 is a two-year research endeavour,
100 funded by AHRC’s UK-US New Directions for Digital
101 Scholarship in Cultural Institutions programme, that seeks
102 to better capture and represent these historical events,
103 leveraging natural-language processing, optical character recognition (OCR),
104 and other forms of artificial intelligence.
105 To illustrate the potential of the approach, we work with digitised resources
106 sourced from:
107 </p>
108
109 <ul>
110 <li>
111 <a href="https://www.york.ac.uk/borthwick/" target="_blank" rel="noreferrer noopener">Borthwick Institute for Archives</a> (University of York),
112 </li>
113 <li>
114 <a href="https://krannertcenter.com/" target="_blank" rel="noreferrer noopener">Krannert Center for the Performing Arts</a> (University of Illinois at Urbana-Champaign),
115 </li>
116 <!--
117 <li>
118 <a href="https://linenhall.com/?gclid=CjwKCAiAmrOBBhA0EiwArn3mfAtN1-wdaaatezqK2X672WkKTS2he3g-8eDTZPc2INaYSHIdBQ4DExoCh-kQAvD_BwE" target="_blank" rel="noreferrer noopener">Linen Hall Library (Belfast)</a>, and
119 </li>
120 -->
121 <li>
122 <a href="https://www.bl.uk/" target="_blank" rel="noreferrer noopener">The British Library</a>, and
123 </li>
124 <li>
125 <a href="https://www.rcm.ac.uk/" target="_blank" rel="noreferrer noopener">The Royal College of Music</a>.
126 </li>
127 </ul>
128 <p>Material is also sourced from three former chapters of the British Music Society (est. 1918):</p>
129 <ul>
130 <li>
131 <a href="http://www.huddersfield-music-society.org.uk/" target="_blank" rel="noreferrer noopener">Huddersfield Music Society</a>,
132 </li>
133 <li>
134 <a href="http://www.bms-york.org.uk/" target="_blank" rel="noreferrer noopener">British Music Society of York</a>, and
135 </li>
136 <li>
137 <a href="https://www.belfastmusicsociety.org/" target="_blank" rel="noreferrer noopener">Belfast Music Society</a>.
138 </li>
139 </ul>
140
141
142
143 <h3>Prototype Digital Library</h3>
144
145 <p>
146 Greenstone3 is an open-source digital-library system with a
147 versatile service-based software architecture, managed through
148 an extension mechanism. Taking the Huddersfield Music Society
149 Programmes as the set of digitised content processed, this
150 online resource demonstrates how Greenstone3 can be used to meet
151 the aspirations of the InterMusE project.
152 </p>
153
154 <!--
155 <p>
156 This prototype collection contains <xsl:value-of select="$numdocs"/> documents
157 focusing on a sample of programmes from the Huddersfield Music Society.
158 </p>
159 -->
160
161 <p>
162 <!-- Linked Open Data is used to unify these resources. -->
163
164 When content is added to the digital library, it is automatically
165 processed using the Google Vision API, and any text extracted is added
166 to the digital library's full-text index,
167 as well as stored as Linked Open Data using the
168 <a href="https://dev.gdmrdigital.com/" target="_blank">SimpleAnnotationServer</a>.
169 We make the OCR'd text available as Open Annotations,
170 accessible through a <a href="https://projectmirador.org/" target="_blank">Mirador3 Image Viewer</a>
171 embedded into the digital library.
172 Through the Mirador3 Viewer, annotations can be edited (to correct OCR errors, for example),
173 and completely new annotations can be added
174 (unrelated to the OCR'd text, if so desired).
175 Because <a href="https://jena.apache.org/" target="_blank">Apache Jena Fuseki</a>
176 is the internal triplestore the Simple Annotation Server uses,
177 all the OCR'd content, along with
178 all the other metadata amassed in the digital library, can also be accessed via a SPARQL endpoint.
179 More details are available through the
180 <a href="https://intermuse.datatodata.org/" target="_blank">InterMusE project website</a>.
181 </p>
182
183 <!--
184 <p>
185 Prior to the start of the InterMusE project, HMS archivist
186 Hilary Norcliffe had been painstakingly assembled from the
187 programmes an Excel spreadsheet which records who the performers
188 were, and which musical works they performed at what concert.
189 In addition to the automatically generated OCR'd content, we
190 fold this into the digital library collection, both as
191 information to display, but also as metadata that can be used to
192 enrich how users can locate content of interest to them in the
193 collection.
194 </p>
195 -->
196
197
198 <p>
199 In addition to the automatically generated OCR'd content,
200 an Excel spreadsheet has been painstakingly
201 assembled from the programmes by the Huddersfield Music Society archivist,
202 recording who the performers were and which musical works they
203 performed at which concert.
204 We fold this into the digital-library collection, both as
205 information to display and as metadata that can be used to
206 enrich how users locate content of interest to them in the
207 collection.
208
209
210 <h3>Designed for Different Types of User</h3>
211 <p>
212 Use the browsing and searching features the digital library provides to locate content
213 of interest. Register as a user to become an annotator/editor of the content.
214 For external developers interested in further enriching the forms of access to this content,
215 a machine-readable version of the content is accessible through the following
216 <a href="{$library_name}/collection/{$collName}/page/sparql">SPARQL endpoint</a>.
217 </p>
218 <p>
219 <gslib:collectionDescriptionTextAndServicesLinks/>
220 </p>
221
222 <xsl:variable name="raw_date">
223 <gslib:collectionMeta name="buildDate"/>
224 </xsl:variable>
225 <xsl:variable name="formatted_date">
226 <xsl:value-of select="util:formatTimeStamp($raw_date, 0, 3, /page/@lang)"/>
227 </xsl:variable>
228 <xsl:variable name="numdocs">
229 <gslib:collectionMeta name="numDocs"/>
230 </xsl:variable>
231 <p>
232 This prototype collection contains <xsl:value-of select="$numdocs"/> documents focusing on a sample of programmes from the Huddersfield Music Society.
233 <!--
234 <xsl:value-of select="util:getInterfaceText($interface_name, /page/@lang, 'about.standarddescriptiondays', concat($numdocs, ';', $formatted_date))"/>
235 -->
236 </p>
237
238
239 <h3>Implementation Details</h3>
240
241 <p>
242 To form this prototype InterMusE digital library, we have taken
243 the base digital-library system and added the following
244 Greenstone extensions:
245 <ul>
246 <li>
247 <a href="https://trac.greenstone.org/browser/gs3-extensions/structured-image/trunk">structured-image</a>
248 to automatically perform OCR on programme pages using Google Vision's API;
249 </li>
250 <li>
251 <a href="https://trac.greenstone.org/browser/gs3-extensions/iiif-servlet/trunk/src">iiif-servlet</a>
252 to allow images in the digital library to be
253 available at a range of resolutions via the IIIF Image API; and
254 </li>
255 <li>
256 <a href="https://trac.greenstone.org/browser/gs2-extensions/apache-jena/trunk/src">apache-jena</a>
257 so that content—such as annotations added to
258 programme pages—can be accessed as Linked Data.
259 </li>
260 </ul>
261 </p>
262
263 <p>
264 A key strength of the Greenstone3 software architecture is its
265 ability to be customised, which is aligned with its three phases
266 for forming a digital-library collection: importing, building,
267 and runtime presentation. The first two phases typically go
268 hand-in-hand, and form the ingest process by which content
269 selected for the digital-library collection is turned into a
270 browseable and searchable online resource.
271 </p>
272 <p>
273 Importing centres around a pipeline of document-processing
274 plugins, written in Perl, that turn a wide array of document
275 and metadata formats into a canonical format known as
276 GreenstoneXML. Using one folder per document, this format
277 represents everything that constitutes the processed document:
278 the text and metadata of the document,
279 along with any supporting files. The internal format
280 allows for hierarchical structure, such as that expressed
281 through headings in Word, PDF, and HTML documents.
282 Metadata can be attached to any level of the hierarchy.
283 Examples of associated files include: automatically generated
284 web-friendly resources, such as
285 <!-- as an MP3 version of
286 a high quality FLAC audio recording, for instance, -->
287 screen-sized and thumbnail-sized images in the case
288 of photos; embedded resources in the case of HTML; and the original file itself, so it can be
289 downloaded.
290
291 <!-- GreenstoneMETS -->
292 </p>
293 <p>
294 In terms of customisation, plugins support a
295 myriad of settings for fine-tuning how the processing is
296 undertaken. New plugins can also be introduced at any time,
297 with the digital-library system automatically detecting their
298 presence.
299 </p>
300
301 <p>
302 The building step takes the standardised XML form, and processes
303 it to form the backend indexes and database structures needed to
304 deliver the forms of searching—such as full-text search, and
305 search by title—and browsing—such as a hierarchical subject
306 classification—specified in the collection's configuration file.
307
308 Effectively the building phase turns the standardised/serialised
309 GreenstoneXML form back into in-memory data-structures representing
310 a document's hierarchical structure of text and metadata, along
311 with how supporting files relate to that.
312 Following the directives specified in the collection's
313 configuration file, it is
314 then a simple matter to transmit this text, metadata, and associated
315 files as needed to the digital-library's indexing/database/backing-store.
316 <!--
317 so it can be used by the runtime system to provide the
318
319 to be used by the runtime system
320 -->
321 </p>
322 <p>
323 Beyond the customisations that can be specified in a
324 collection-configuration file for the building phase, Greenstone
325 supports orthogonal indexers. Like the document-processing
326 plugins used in importing, orthogonal indexers are modules
327 written in Perl, and their inclusion is automatically detected
328 by the Greenstone3 installation. Orthogonal indexers get
329 presented with the same in-memory stream of
330 &quot;reconstructed&quot; documents, allowing them to undertake
331 additional processing if required (such as computing audio
332 features), which can then be transmitted to a specialist
333 indexing/database/backing-store (such as a content-based
334 music-recommender system), or otherwise added to the existing
335 indexing/database/backing-store.
336 </p>
337 <p>
338 The third phase of the Greenstone3 digital-library architecture
339 governs how functionality is accessed and data is extracted from
340 the digital library and presented to the user. The Greenstone3
341 runtime is a service-based architecture, written in Java,
342 consisting of a network of connected modules. Modules are
343 self-describing and advertise the services they offer.
344 Communication between modules is by XML messages, with the
345 service handling the final layer of communication responsible
346 for presentation. Here, XSL Transforms (XSLTs) are used to
347 convert the underlying XML content into the web page displayed
348 by the digital library, blending in CSS and JavaScript
349 files that control appearance and functionality.
350
351 </p>
352 <p>
353 The XSLT files are grouped together in one place, forming the
354 interface for the digital library. An inheritance mechanism is
355 deployed throughout this part of the design. A collection can
356 override individual XSLT template rules, as required to tweak
357 presentation details. A collection can also provide an entire
358 replacement XSLT file, if so desired. For more substantial
359 changes a new interface is typically developed.
360 </p>
361
362 <p>
363
364 In terms of crafting the features and functionality to form this
365 prototype InterMusE digital library, we make use of all three
366 areas for customisation. Mirador3 is a NodeJS web stack, and so
367 to switch the digital library's document display to use this
368 viewer, replacement XSL template rules were introduced to load
369 in the necessary CSS and JavaScript files, and call the viewer's
370 initialisation function. Mirador draws its image content from
371 an IIIF-compliant server. This was achieved by using the
372 above-mentioned IIIF-Servlet extension to Greenstone3. Mirador,
373 however, cannot natively handle Google Vision JSON, but does
374 support the OpenAnnotation JSON format. We therefore extended
375 <i>StructuredImagePlugin</i> to include a function that performs
376 a cross-walk of the former JSON format to the latter.
377 </p>
378
379 <p>
380 To support the editing of annotations, the
381 sequence undertaken is informally, but best, described as “a
382 plumbing exercise.” Mirador3 requires the addition of the
383 <i>mirador-annotations</i> plugin to allow editing. This plugin
384 was in turn configured to use a Simple
385 Annotation Server (SAS) endpoint to store the annotations. SAS
386 supports a variety of different storage backends. We set this
387 to be Apache Jena and directed SAS to use the one we had
388 installed as a Greenstone3 extension. To get the
389 OpenAnnotations produced by <i>StructuredImagePlugin</i> into
390 the Jena store, we added in a new orthogonal indexer. The net
391 result of all of this is that, upon a fresh rebuild of the digital-library
392 collection, a user accessing one of the digitised programmes can
393 now edit the OCR'd text, or else overlay new annotations on the
394 page. An XSLT-based <i>if-statement</i> completes the plumbing
395 exercise, checking settings provided by the digital library to ensure the
396 editing-based version of the Mirador viewer is only activated if
397 the user is logged in and has an editing role assigned for the
398 collection they are accessing.
399 </p>
400
401
402<!--
403 there
404 are three keys parts of the Greenstone3 design where
405
406 area where customisation
407
408
409 digital library collection into
410
411 the online resource
412
413 features and functionality
414
415 In developing a Greenstone3 digital library collection, there are three key phases
416 to consider: importing, building, and runtime-display.
417
418 The first
419 XML message based .. XSL Transforms (XSLT)
420
421 The there are three key phases to the
422
423 Its modular design
424
425 The modular design of Greenstone3 provides several stages where
426
427 ...
428
429 importing
430 building
431 runtime-display
432
433 orthogonal indexes
434-->
435
436<!--
437 Three key 'hook-in' points within the Greenstone3 software architecture
438 for customisation are: the Perl-based document-processing plugins
439 used in the content ingest pipeline, through which
440 content and metadata are ingested into a digital library
441 collection
442
443 Perl-based document-processing pipeline
444
445 Woven together in the following way
446
447 We have applied
448
449 Mirador
450 SimpleAnnotationServer
451
452 This forms the framework for this developed
453
454 In developing this online resource, we have applied it
455 -->
456 </p>
457
458 <h2>Available Services</h2>
459
460 </xsl:template>
461</xsl:stylesheet>
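
The page text above mentions an XSLT-based if-statement that activates the editing version of the Mirador viewer only when the user is logged in and holds an editing role for the collection. A minimal, self-contained sketch of that kind of test follows. It is not taken from the Greenstone3 source: the input layout (`/page/@collection`, `user/@name`, `user/role`) and the `data-mode` attribute are hypothetical names chosen purely for illustration, and `xsl:choose` is used in place of a bare `xsl:if` so that the read-only fallback is visible.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical input: /page carries the collection name, and /page/user
       has a 'name' attribute when someone is logged in, plus one
       <role collection="..."> child per role that user holds. -->
  <xsl:template match="/page">
    <xsl:variable name="coll" select="@collection"/>
    <xsl:choose>
      <!-- Logged in and holding an editor role for this collection -->
      <xsl:when test="user/@name != '' and user/role[@collection = $coll and . = 'editor']">
        <div id="viewer" data-mode="edit"/>
      </xsl:when>
      <!-- Everyone else gets the read-only viewer -->
      <xsl:otherwise>
        <div id="viewer" data-mode="view"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Applied to an input such as `<page collection="hms"><user name="alice"><role collection="hms">editor</role></user></page>`, the first branch is selected. If there is no matching `role` element, or no `user` element at all, both node-set tests are false and the stylesheet falls through to the read-only branch.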