root/other-projects/gti/gti-tmx-to-spreadsheet.xsl @ 25288

Revision 25288, 5.0 KB (checked in by ak19, 8 years ago)

For ease of import and export in Excel, these scripts no longer generate a spreadsheet .txt file of comma-separated values but one of tab-separated values.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:java="http://xml.apache.org/xslt/java" xmlns:tmx="http://www.lisa.org/tmx14">
  <!-- For character entities: http://www.w3.org/MarkUp/html3/latin1.html -->
  <!-- Why XML namespaces are needed throughout, and why templates must match on namespaced element names: http://www.stylusstudio.com/xsllist/200302/post70120.html -->

  <!-- HOW AND WHEN TO USE THIS FILE
       1. Generate an XML file containing the chunks requiring work.
       GS2/bin/script> perl -S gti.pl get-first-n-chunks-requiring-work mi coredm 1000 > ../../macros/maori/mi-core.xml

       2. Generate a TMX file from that XML file as follows:
       GS2/bin/script> java -cp /research/ak19/gs2-svn/bin/java/ApplyXSLT.jar org.nzdl.gsdl.ApplyXSLT -x ../../macros/maori/mi-core.xml -t ../script/gti-generate-tmx-xml.xsl -l mi > ../../maori/core-mi.tmx

       3. Apply this XSLT to that TMX file to obtain a Unicode text file containing tab-separated values (a spreadsheet).
       NOTE: Make sure the output is a *.txt file if you wish to open it in Excel without losing the Unicode
       (when opened with a .csv extension, the Unicode is not preserved).

       GS2/bin/script> java -cp /research/ak19/gs2-svn/bin/java/ApplyXSLT.jar org.nzdl.gsdl.ApplyXSLT -x ../../maori/core-mi.tmx -t ../script/gti-tmx-to-spreadsheet.xsl > ../../maori/core-mi-tmx2spreadsheet.txt

       4. Translators wishing to read this text file into Excel need to open Excel first.
       Then go to File > Open, set the file types drop-down to show All Files *.*,
       and choose to open the Unicode .txt file containing the tab-separated values.

       - A wizard will appear, allowing users to open this .txt file as a proper spreadsheet.
       In the first frame of this dialog:
       a. on the left, specify that the file is "delimited"
       b. in the drop-down on the right, select Unicode (UTF-8 or UTF-16)
       c. click Next
       In the second frame of the dialog, select "tab" as the delimiter. Click Finish to open the spreadsheet data.

       5. When translators have finished working on the file, they should save it as an Excel spreadsheet Unicode .txt file
       (through File > Save As > Excel 2003 spreadsheet > choose Unicode .txt in the file types box)
       and mail this text file back to Greenstone.

       6. The file returned by the translator should first be processed with the new gti-process-google-spreadsheet.pl script:
       GS2/bin/script> perl -S gti-process-google-spreadsheet.pl ~/Desktop/core-mi-xml2spreadsheet-out.txt > <language>-submission.xml

       Then continue processing as usual:
       > cat <language>-submission.xml | perl -S gti.pl submit-translations <language-code> <module-name> <username>

    -->

  <xsl:output method="text" encoding="UTF-16"/> <!-- When we save as txt from Excel, we choose UTF-16 too -->

  <xsl:template match="tmx:tmx">
    <xsl:apply-templates select="tmx:body"/>
  </xsl:template>

  <xsl:template match="tmx:body">
    <xsl:text>Source key&#09;Source text&#09;Target key&#09;Target text</xsl:text><!-- column headings separated by tabs -->
    <xsl:text>&#10;</xsl:text><!-- newline -->
    <xsl:apply-templates select="tmx:tu"/>
  </xsl:template>

  <!-- Each translation unit becomes one row: source key, source text, target key, target text. -->
  <xsl:template match="tmx:tu">
    <xsl:for-each select="tmx:tuv">
      <xsl:if test="tmx:prop[@type='source']">
        <xsl:text>source::</xsl:text>
        <xsl:value-of select="tmx:prop"/>
        <xsl:text>&#09;</xsl:text><!-- tab -->
        <!-- Undo escaping in the segment text: drop the line break that follows a literal &#10;, then turn &lt;, &gt; and &amp; back into the characters they stand for (&amp; last). -->
        <xsl:variable name="tempText1"><xsl:value-of select="tmx:seg"/></xsl:variable>
        <xsl:variable name="tempText2" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText1, "&amp;#10;&#10;", "&amp;#10;")'/>
        <xsl:variable name="tempText3" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText2, "&amp;lt;", "&#60;")'/>
        <xsl:variable name="tempText4" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText3, "&amp;gt;", "&#62;")'/>
        <xsl:variable name="escapedText" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText4, "&amp;amp;", "&#38;")'/>
        <xsl:value-of select='$escapedText' disable-output-escaping="yes"/>
        <xsl:text>&#09;</xsl:text><!-- tab -->
      </xsl:if>
    </xsl:for-each>

    <xsl:for-each select="tmx:tuv">
      <xsl:if test="tmx:prop[@type='target']">
        <xsl:text>target::</xsl:text>
        <xsl:value-of select="tmx:prop"/>
        <xsl:text>&#09;</xsl:text><!-- tab -->
        <!-- Same unescaping as for the source text above. -->
        <xsl:variable name="tempText1"><xsl:value-of select="tmx:seg"/></xsl:variable>
        <xsl:variable name="tempText2" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText1, "&amp;#10;&#10;", "&amp;#10;")'/>
        <xsl:variable name="tempText3" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText2, "&amp;lt;", "&#60;")'/>
        <xsl:variable name="tempText4" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText3, "&amp;gt;", "&#62;")'/>
        <xsl:variable name="escapedText" select='java:org.nzdl.gsdl.ApplyXSLT.replaceAll($tempText4, "&amp;amp;", "&#38;")'/>
        <xsl:value-of select='$escapedText' disable-output-escaping="yes"/>
        <xsl:text>&#10;</xsl:text><!-- newline -->
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
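
The chain of replaceAll calls in the tmx:tu template can be read as the plain-Java sketch below. This is only an illustration: the class UnescapeSketch and its unescape method are hypothetical and are not part of Greenstone, and the sketch assumes that ApplyXSLT.replaceAll substitutes literal strings, which the stylesheet itself does not confirm.

```java
public class UnescapeSketch {

    // Hypothetical stand-in for the four chained replaceAll calls in the
    // stylesheet. String.replace does literal (non-regex) substitution,
    // which is an assumption about how ApplyXSLT.replaceAll behaves.
    static String unescape(String segment) {
        return segment
                .replace("&#10;\n", "&#10;") // drop the line break after a literal &#10;
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");      // last, so "&amp;lt;" becomes "&lt;" and not "<"
    }

    public static void main(String[] args) {
        // prints: <b>Fish &amp; chips</b>
        System.out.println(unescape("&lt;b&gt;Fish &amp;amp; chips&lt;/b&gt;"));
    }
}
```

Unescaping &amp; last matters: doing it first would turn a doubly escaped "&amp;lt;" into "&lt;" and then into "<", changing text that the translator is meant to see as "&lt;".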