Changeset 27116
- Timestamp: 2013-03-25T08:59:24+13:00
- File: 1 edited
documentation/trunk/tutorials/xml-source/tutorial_en.xml
r27115 r27116
1648 1648   </Title>
1649 1649   <SampleFiles folder="tudor"/>
1650      - <Version initial="2.60" current="2.85"/>
     1650 + <Version initial="2.60" current="2.85|3.05"/>
1651 1651   <Content>
1652 1652   <Comment>
… …
1672 1672   </Heading>
1673 1673   <NumberedItem>
1674      - <Text id="0393">The browsing facilities in this collection (<AutoText key="coredm::_Global:labelTitle_" type="italics"/> and <AutoText key="coredm::_Global:labelSource_" type="italics"/>) are based entirely on extracted metadata. Switch to the <AutoText key="glidict::GUI.Enrich"/> panel in the Librarian Interface and examine the metadata that has been extracted for some of the files.</Text>
1675      - </NumberedItem>
1676      - <NumberedItem>
1677      - <Text id="0393a">Many HTML documents contain metadata in <Format><meta></Format> tags in the <Format><head></Format> of the page. Open up the <Path>englishhistory.net → tudor → monarchs → boleyn.html</Path> file by navigating to it in the tree on the left hand side, and double clicking it. This will open it in a web browser. View the HTML source of the page (<Menu>View → Source</Menu> in Internet Explorer, <Menu>View → Page Source</Menu> in Mozilla). You will notice that this page has <AutoText text="page_topic, content" type="italics"/> and <AutoText text="author" type="italics"/> metadata.</Text>
     1674 + <Text id="0393">The browsing facilities in this collection <MajorVersion number="2">(<AutoText key="coredm::_Global:labelTitle_" type="italics"/> and <AutoText key="coredm::_Global:labelSource_" type="italics"/>)</MajorVersion><MajorVersion number="3">(<AutoText key="gs3::metadata_names::Title.buttonname" />
     1675 + and <AutoText key="gs3::metadata_names::Source.buttonname" />)</MajorVersion> are based entirely on extracted metadata. Switch to the <AutoText key="glidict::GUI.Enrich"/> panel in the Librarian Interface and examine the metadata that has been extracted for some of the files.</Text>
     1676 + </NumberedItem>
     1677 + <NumberedItem>
     1678 + <Text id="0393a">Many HTML documents contain metadata in <Format><meta></Format> tags in the <Format><head></Format> of the page. Open up the <Path>englishhistory.net → tudor → monarchs → boleyn.html</Path> file by navigating to it in the tree on the left hand side, and double clicking it. This will open it in a web browser. View the HTML source of the page (<Menu>View → Source</Menu> in Internet Explorer, <Menu>Tools → Web Developer → Page Source</Menu> in Mozilla). You will notice that this page has <AutoText text="page_topic, content" type="italics"/> and <AutoText text="author" type="italics"/> metadata.</Text>
1678 1679   </NumberedItem>
1679 1680   <NumberedItem>
… …
1709 1710   </Title>
1710 1711   <Prerequisite id="large_html_collection"/>
1711      - <Version initial="2.60" current="2.85"/>
     1712 + <Version initial="2.60" current="2.85|3.05"/>
1712 1713   <Content>
1713 1714   <Comment>
… …
1739 1740   </NumberedItem>
1740 1741   <NumberedItem>
1741      - <Text id="0444">Now switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, and <b>preview</b> it. Choose the new <AutoText key="coredm::_Global:labelSubject_"/> link that appears in the navigation bar, and click the bookshelves to navigate around the four-entry hierarchy that you have created.</Text>
     1742 + <Text id="0444">Now switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, and <b>preview</b> it. Choose the new <MajorVersion number="2"><AutoText key="coredm::_Global:labelSubject_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Subjects.buttonname" /></MajorVersion> link that appears in the navigation bar, and click the bookshelves to navigate around the four-entry hierarchy that you have created.</Text>
1742 1743   </NumberedItem>
1743 1744   <Heading>
… …
1754 1755   </NumberedItem>
1755 1756   <NumberedItem>
1756      - <Text id="0460"><b>Build</b> the collection again, <b>preview</b> it, and try out the new <AutoText key="coredm::_Global:labelPhrase_"/> option in the navigation bar. An interesting PHIND search term for this collection is <AutoText text="king" type="quoted"/>. Note that even though it is called a phrase browser, only single terms can be used as the starting point for browsing.</Text>
     1757 + <Text id="0460"><b>Build</b> the collection again, <b>preview</b> it, and try out the new <MajorVersion number="2"><AutoText key="coredm::_Global:labelPhrase_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::PhindPhraseBrowse::PhindApplet.name" /></MajorVersion> option in the navigation bar. An interesting PHIND search term for this collection is <AutoText text="king" type="quoted"/>. Note that even though it is called a phrase browser, only single terms can be used as the starting point for browsing.</Text>
1757 1758   </NumberedItem>
1758 1759   <Heading>
… …
1803 1804   </NumberedItem>
1804 1805   <NumberedItem>
1805      - <Text id="0464">Preview the newly rebuilt collection's <AutoText key="coredm::_Global:labelTitle_"/> page. Previously this listed more than a dozen pages per letter of the alphabet, but now there are just three—the first three files encountered by the building process.</Text>
     1806 + <Text id="0464">Preview the newly rebuilt collection's <MajorVersion number="2"><AutoText key="coredm::_Global:labelTitle_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Title.buttonname" /></MajorVersion> page. Previously this listed more than a dozen pages per letter of the alphabet, but now there are just three—the first three files encountered by the building process.</Text>
1806 1807   </NumberedItem>
1807 1808   <NumberedItem>
… …
1815 1816   </Title>
1816 1817   <Prerequisite id="large_html_collection"/>
1817      - <Version initial="2.60" current="2.85"/>
     1818 + <Version initial="2.60" current="2.85|3.05"/>
1818 1819   <Content>
1819 1820   <NumberedItem>
… …
1833 1834   </Indent>
1834 1835   <Text id="0472">for a particular document whose <i>Title</i> metadata is <AutoText text="A discussion of question five from Tudor Quiz: Henry VIII"/> and whose <i>Source</i> metadata is <AutoText text="quizstuff.html"/>.</Text>
1835      - <Text id="0473">This format appears in the search results list, in the <AutoText key="coredm::_Global:labelTitle_"/> list, and also when you get down to individual documents in the <AutoText key="coredm::_Global:labelSubject_"/> hierarchy. This is Greenstone's default format statement<MajorVersion number="3"> used in the <AutoText text="browse"/> and <AutoText text="search"/> format features.</MajorVersion>.</Text>
     1836 + <MajorVersion number="2">
     1837 + <Text id="0473a">This format appears in the search results list, in the <AutoText key="coredm::_Global:labelTitle_"/> list, and also when you get down to individual documents in the <AutoText key="coredm::_Global:labelSubject_"/> hierarchy. This is Greenstone's default format statement.</Text>
     1838 + </MajorVersion>
     1839 + <MajorVersion number="3">
     1840 + <Text id="0473b">This format appears in the search results list, in the <AutoText key="gs3::metadata_names::Title.buttonname" /> list, and also when you get down to individual documents in the <AutoText key="gs3::metadata_names::Subjects.buttonname" /> hierarchy. This is Greenstone's default format statement used in the <AutoText text="browse"/> and <AutoText text="search"/> format features.</Text>
     1841 + </MajorVersion>
1836 1842   </NumberedItem>
1837 1843   <Comment>
… …
1866 1872   <Text id="0475-3a">Replace the <AutoText text="search"/> format feature with the above format statement too.</Text>
1867 1873   </MajorVersion>
1868      - <Text id="0476"><b>Preview</b> the result (you don't need to build the collection, because changes to format statements take effect immediately). Look at some search results and at the <AutoText key="coredm::_Global:labelTitle_"/> list. They are just the same as before! Under most circumstances this far simpler format statement is entirely equivalent to Greenstone's more complex default.</Text>
     1874 + <Text id="0476"><b>Preview</b> the result (you don't need to build the collection, because changes to format statements take effect immediately). Look at some search results and at the <MajorVersion number="2"><AutoText key="coredm::_Global:labelTitle_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Title.buttonname" /></MajorVersion> list. They are just the same as before! Under most circumstances this far simpler format statement is entirely equivalent to Greenstone's more complex default.</Text>
1869 1875   <MajorVersion number="3">
1870 1876   <Text id="0476-3">We can also reduce the <AutoText text="VList classifierNode"/> template of the <AutoText text="browse"/> format feature further, also without changing the display. Replace it with:</Text>
… …
1939 1945   </NumberedItem>
1940 1946   <NumberedItem>
1941      - <Text id="0486"><b>Preview</b> the <AutoText key="coredm::_Global:labelSubject_"/> list in the collection. <MajorVersion number="2">First, the offending "()" has disappeared from the bookshelves. Second, when</MajorVersion><MajorVersion number="3">When</MajorVersion> you get down to a list of documents in the subject hierarchy, the filename does not appear beside the title, because <AutoText key="metadata::ex.Source"/> is not specified in the format statement and this format statement applies to all nodes in the <i>subject</i> classifier. Note that the search results and titles lists have not changed: they still display the filename underneath the title.</Text>
     1947 + <Text id="0486"><b>Preview</b> the <MajorVersion number="2"><AutoText key="coredm::_Global:labelSubject_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Subjects.buttonname" /></MajorVersion> list in the collection. <MajorVersion number="2">First, the offending "()" has disappeared from the bookshelves. Second, when</MajorVersion><MajorVersion number="3">When</MajorVersion> you get down to a list of documents in the subject hierarchy, the filename does not appear beside the title, because <AutoText key="metadata::ex.Source"/> is not specified in the format statement and this format statement applies to all nodes in the <i>subject</i> classifier. Note that the search results and titles lists have not changed: they still display the filename underneath the title.</Text>
1942 1948   </NumberedItem>
1943 1949   <NumberedItem>
… …
1978 1984   </NumberedItem>
1979 1985   <NumberedItem>
1980      - <Text id="0494">Finally, let's return to the <AutoText key="coredm::_Global:labelSubject_" type="italics"/> hierarchy and learn how to do different things to the bookshelves and to the documents themselves. <MajorVersion number="2">In the <AutoText key="glidict::CDM.FormatManager.Feature"/> menu, re-select the item</MajorVersion><MajorVersion number="3">Reselect the format feature for</MajorVersion></Text>
     1986 + <Text id="0494">Finally, let's return to the <MajorVersion number="2"><AutoText key="coredm::_Global:labelSubject_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Subjects.buttonname" /></MajorVersion> hierarchy and learn how to do different things to the bookshelves and to the documents themselves. <MajorVersion number="2">In the <AutoText key="glidict::CDM.FormatManager.Feature"/> menu, re-select the item</MajorVersion><MajorVersion number="3">Reselect the format feature for</MajorVersion></Text>
1981 1987   <Indent>
1982 1988   CL2<MajorVersion number="2">:</MajorVersion> Hierarchy -metadata <AutoText key="metadata::dc.Subject" type="plain"/>
… …
2019 2025   <Text id="st-1">Section tagging for HTML documents</Text>
2020 2026   </Title>
2021      - <Version initial="2.70w" current="2.85"/>
     2027 + <Version initial="2.70w" current="2.85|3.05"/>
2022 2028   <Content>
2023 2029   <NumberedItem>
… …
2025 2031   </NumberedItem>
2026 2032   <NumberedItem>
2027      - <Text id="st-2">Using a text editor (e.g. WordPad) open up one of the HTML files from the demo collection: <Path>Greenstone → collect → demo → import → fb33fe →fb33fe.htm</Path>. You will see some HTML comments which contain section information for Greenstone. They look like:</Text>
     2033 + <Text id="st-2">Using a text editor (e.g. WordPad) open up one of the HTML files from the demo collection:
     2034 + <MajorVersion number="2">
     2035 + <Path>Greenstone → collect → demo → import → fb33fe → fb33fe.htm</Path>
     2036 + </MajorVersion>
     2037 + <MajorVersion number="3">
     2038 + <Path>Greenstone3 → web → sites → localsite → collect → lucene-jdbm-demo → import → fb33fe → fb33fe.htm</Path>
     2039 + </MajorVersion>
     2040 + . You will see some HTML comments which contain section information for Greenstone. They look like:</Text>
2028 2041   <Format>
2029 2042   <!--<br/>
… …
2043 2056   -->
2044 2057   </Format>
2045      - <Text id="st-3">When Greenstone encounters a <Format><Section></Format> tag in one of these comments, it will start a new subsection of the document. This will be closed when a <Format></Section></Format> tag is encountered. Metadata can also be added for each section—in this case, <AutoText text="Title"/> metadata has been added for each section. In the browser, find the <AutoText text="Farming snails 1"/> document in the demo collection (through the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> browser). Look at its table of contents and compare it to the <Format><Section></Format> tags in the HTML document.</Text>
     2058 + <Text id="st-3">When Greenstone encounters a <Format><Section></Format> tag in one of these comments, it will start a new subsection of the document. This will be closed when a <Format></Section></Format> tag is encountered. Metadata can also be added for each section—in this case, <AutoText text="Title"/> metadata has been added for each section. In the browser, find the <AutoText text="Farming snails 1"/> document in the demo collection (through the <MajorVersion number="2"><AutoText key="coredm::_Global:labelTitle_" type="italics"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Title.buttonname" type="italics"/></MajorVersion> browser). Look at its table of contents and compare it to the <Format><Section></Format> tags in the HTML document.</Text>
2046 2059   </NumberedItem>
2047 2060   <NumberedItem>
… …
2075 2088   <Text id="0411">Downloading files from the web</Text>
2076 2089   </Title>
2077      - <Version initial="2.60" current="2.85"/>
     2090 + <Version initial="2.60" current="2.85|3.05"/>
2078 2091   <Content>
2079 2092   <Comment>
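This changeset wraps the Greenstone 2 wording (coredm:: keys) and the Greenstone 3 wording (gs3:: keys) of each affected sentence in sibling MajorVersion elements with number="2" and number="3", and widens each Version current attribute from "2.85" to "2.85|3.05". The Python sketch that follows illustrates one way such a fragment could be resolved for a single major version: non-matching MajorVersion elements are dropped, matching ones are unwrapped, and the surrounding text is kept in place. It is not Greenstone's own tutorial build, which is not part of this changeset; the function names resolve_for_version and current_versions are hypothetical, and reading "|" as a separator between the current Greenstone 2 and Greenstone 3 release numbers is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def _append_text(parent: ET.Element, kept: List[ET.Element], text: Optional[str]) -> None:
    """Attach text after the last kept child, or to the parent's leading text."""
    if not text:
        return
    if kept:
        kept[-1].tail = (kept[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _resolve(parent: ET.Element, version: str) -> None:
    """Drop non-matching <MajorVersion> children and unwrap matching ones, in place."""
    old_children = list(parent)
    for child in old_children:
        parent.remove(child)
    kept: List[ET.Element] = []
    for child in old_children:
        if child.tag == "MajorVersion":
            if child.get("number") == version:
                _resolve(child, version)          # resolve any nested markup first
                _append_text(parent, kept, child.text)
                kept.extend(list(child))          # splice the wrapper's children in
            # Whether the wrapper is kept or dropped, the text after it survives.
            _append_text(parent, kept, child.tail)
        else:
            _resolve(child, version)
            kept.append(child)
    parent.extend(kept)


def resolve_for_version(xml_text: str, version: str) -> ET.Element:
    """Hypothetical helper: parse a tutorial fragment and keep one major version."""
    root = ET.fromstring(xml_text)
    _resolve(root, version)
    return root


def current_versions(version_attr: str) -> List[str]:
    """Split a Version/@current value such as "2.85|3.05" (assumed '|'-separated)."""
    return [v.strip() for v in version_attr.split("|") if v.strip()]


if __name__ == "__main__":
    # A shortened copy of the new Text id="0486" from this changeset.
    fragment = (
        '<Text id="0486"><MajorVersion number="2">First, the offending "()" has '
        'disappeared from the bookshelves. Second, when</MajorVersion>'
        '<MajorVersion number="3">When</MajorVersion> you get down to a list '
        'of documents in the subject hierarchy</Text>'
    )
    for v in ("2", "3"):
        print(v, "".join(resolve_for_version(fragment, v).itertext()))
    print(current_versions("2.85|3.05"))
```

Splicing children into the parent, rather than deleting whole subtrees, matters here because the MajorVersion alternatives in this changeset sit mid-sentence (ids 0393, 0486 and st-2) and the text on either side must join up for whichever version is kept.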