source: other-projects/nightly-tasks/diffcol/trunk/gs3-model-collect/Word-PDF-Enhanced/archives/HASHeaa2.dir/doc.xml@ 30029

Last change on this file since 30029 was 30029, checked in by ak19, 9 years ago
Adding the Enhanced Word tutorial collection that uses Windows Scripting. Pre-built on Windows 7 64 bit.
File size: 145.1 KB

1	<?xml version="1.0" encoding="utf-8" standalone="no"?>
2	<!DOCTYPE Archive SYSTEM "http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
3	<Archive>
4	<Section>
5	<Description>
6	<Metadata name="gsdldoctype">indexed_doc</Metadata>
7	<Metadata name="Language">en</Metadata>
8	<Metadata name="Encoding">windows_1252</Metadata>
9	<Metadata name="Creator">dg5</Metadata>
10	<Metadata name="Title">1997-00 Listing of Working Papers</Metadata>
11	<Metadata name="URL">http://C:/Users/Anupama/GS307_13July2015/web/sites/localsite/collect/Word-PDF-Enhanced/tmp/1436775751/word01.html</Metadata>
12	<Metadata name="UTF8URL">http://C:/Users/Anupama/GS307_13July2015/web/sites/localsite/collect/Word-PDF-Enhanced/tmp/1436775751/word01.html</Metadata>
13	<Metadata name="gsdlsourcefilename">import\word01.doc</Metadata>
14	<Metadata name="gsdlconvertedfilename">tmp\1436775751\word01.html</Metadata>
15	<Metadata name="OrigSource">word01.html</Metadata>
16	<Metadata name="Source">word01.doc</Metadata>
17	<Metadata name="SourceFile">word01.doc</Metadata>
18	<Metadata name="Plugin">WordPlugin</Metadata>
19	<Metadata name="FileSize">110080</Metadata>
20	<Metadata name="FilenameRoot">word01</Metadata>
21	<Metadata name="FileFormat">Word</Metadata>
22	<Metadata name="srcicon">_icondoc_</Metadata>
23	<Metadata name="srclink_file">doc.doc</Metadata>
24	<Metadata name="srclinkFile">doc.doc</Metadata>
25	<Metadata name="Identifier">HASHeaa2992e081949673150f3</Metadata>
26	<Metadata name="lastmodified">1436763858</Metadata>
27	<Metadata name="lastmodifieddate">20150713</Metadata>
28	<Metadata name="oailastmodified">1436775752</Metadata>
29	<Metadata name="oailastmodifieddate">20150713</Metadata>
30	<Metadata name="assocfilepath">HASHeaa2.dir</Metadata>
31	<Metadata name="gsdlassocfile">doc.doc:application/msword:</Metadata>
32	</Description>
33	<Content>
34
35
36
37	<div class=WordSection1>
38
39
40
41	<p class=MsoTitle><span lang=EN-US>1997-00 Listing of Working Papers </span></p>
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/1</span></p>
58
59
60
61	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Using
62
63	compression to identify acronyms in text</span></p>
64
65
66
67	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Stuart <span
68
69	class=SpellE>Yeates</span>, David Bainbridge, Ian H. <span class=SpellE>Witten</span></span></p>
70
71
72
73	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Text mining is
74
75	about looking for patterns in natural language text, and may be defined as the
76
77	process of <span class=SpellE>analyzing</span> text to extract information from
78
79	it for particular purposes.<span style='mso-spacerun:yes'> </span>In previous
80
81	work, we claimed that compression is a key technology for text mining, and
82
83	backed this up with a study that showed how particular kinds of lexical
84
85	tokens - names, dates, locations, <i style='mso-bidi-font-style:normal'>etc.</i> - can
86
87	be identified and located in running text, using compression models to provide
88
89	the leverage necessary to distinguish different token types (Witten <i
90
91	style='mso-bidi-font-style:normal'>et al.</i>, 1999).</span></p>
92
93
94
95
96
97
98
99	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/2</span></p>
100
101
102
103	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Text <span
104
105	class=SpellE>categorization</span> using compression models</span></p>
106
107
108
109	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
110
111	lang=EN-GB>Eibe</span></span><span lang=EN-GB> Frank, Chang <span class=SpellE>Chui</span>,
112
113	Ian H. <span class=SpellE>Witten</span></span></p>
114
115
116
117	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Text <span
118
119	class=SpellE>categorization</span>, or the assignment of natural language texts
120
121	to predefined categories based on their content, is of growing importance as
122
123	the volume of information available on the internet continues to overwhelm
124
125	us.<span style='mso-spacerun:yes'> </span>The use of predefined categories implies
126
127	a "supervised learning" approach to <span class=SpellE>categorization</span>,
128
129	where already-classified articles - which effectively define the categories -
130
131	are used as "training data" to build a model that can be used for classifying
132
133	new articles that comprise the "test data".<span style='mso-spacerun:yes'>
134
135	</span>This contrasts with "unsupervised" learning, where there is no training
136
137	data and clusters of like documents are sought amongst the test articles.<span
138
139	style='mso-spacerun:yes'> </span>With supervised learning, meaningful labels
140
141	(such as <span class=SpellE>keyphrases</span>) are attached to the training
142
143	documents, and appropriate labels can be assigned automatically to test
144
145	documents depending on which category they fall into.</span></p>
146
147
148
149
150
151
152
153	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/3</span></p>
154
155
156
157	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Reserved for
158
159	Sally Jo</span></p>
160
161
162
163
164
165
166
167	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/4</span></p>
168
169
170
171	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Interactive
172
173	machine learning - letting users build classifiers</span></p>
174
175
176
177	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Malcolm Ware, <span
178
179	class=SpellE>Eibe</span> Frank, Geoffrey Holmes, Mark Hall, Ian H. <span
180
181	class=SpellE>Witten</span></span></p>
182
183
184
185	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>According to
186
187	standard procedure, building a classifier is a fully automated process that
188
189	follows data preparation by a domain expert.<span style='mso-spacerun:yes'>
190
191	</span>In contrast, <i>interactive</i> machine learning engages
192
193	users in actually generating the classifier themselves.<span
194
195	style='mso-spacerun:yes'> </span>This offers a natural way of integrating
196
197	background knowledge into the <span class=SpellE>modeling</span> stage - so long
198
199	as interactive tools can be designed that support efficient and effective
200
201	communication.<span style='mso-spacerun:yes'> </span>This paper shows that
202
203	appropriate techniques can empower users to create models that compete with
204
205	classifiers built by state-of-the-art learning algorithms.<span
206
207	style='mso-spacerun:yes'> </span>It demonstrates that users - even users who are
208
209	not domain experts - can often construct good classifiers, without any help from
210
211	a learning algorithm, using a simple two-dimensional visual interface.<span
212
213	style='mso-spacerun:yes'> </span>Experiments demonstrate that, not
214
215	surprisingly, success hinges on the domain: if a few attributes can support
216
217	good predictions, users generate accurate classifiers, whereas domains with
218
219	many high-order attribute interactions <span class=SpellE>favor</span> standard
220
221	machine learning techniques.<span style='mso-spacerun:yes'> </span>The future
222
223	challenge is to achieve a symbiosis between human user and machine learning
224
225	algorithm.</span></p>
226
227
228
229
230
231
232
233	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/5</span></p>
234
235
236
237	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>KEA: Practical
238
239	automatic <span class=SpellE>keyphrase</span> extraction</span></p>
240
241
242
243	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Ian H. <span
244
245	class=SpellE>Witten</span>, Gordon W. <span class=SpellE>Paynter</span>, <span
246
247	class=SpellE>Eibe</span> Frank, Carl <span class=SpellE>Gutwin</span>, Craig G.
248
249	<span class=SpellE>Nevill</span>-Manning</span></p>
250
251
252
253	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
254
255	lang=EN-GB>Keyphrases</span></span><span lang=EN-GB> provide semantic metadata
256
257	that <span class=SpellE>summarize</span> and <span class=SpellE>characterize</span>
258
259	documents.<span style='mso-spacerun:yes'> </span>This paper describes <span
260
261	class=SpellE>Kea</span>, an algorithm for automatically extracting <span
262
263	class=SpellE>keyphrases</span> from text.<span style='mso-spacerun:yes'>
264
265	</span><span class=SpellE>Kea</span> identifies candidate <span class=SpellE>keyphrases</span>
266
267	using lexical methods, calculates feature values for each candidate, and uses a
268
269	machine learning algorithm to predict which candidates are good <span
270
271	class=SpellE>keyphrases</span>.<span style='mso-spacerun:yes'> </span>The
272
273	machine learning scheme first builds a prediction model using training
274
275	documents with known <span class=SpellE>keyphrases</span>, and then uses the
276
277	model to find <span class=SpellE>keyphrases</span> in new documents.<span
278
279	style='mso-spacerun:yes'> </span>We use a large test corpus to evaluate <span
280
281	class=SpellE>Kea's</span> effectiveness in terms of how many author-assigned <span
282
283	class=SpellE>keyphrases</span> are correctly identified.<span
284
285	style='mso-spacerun:yes'> </span>The system is simple, robust, and publicly
286
287	available.</span></p>
288
289
290
291
292
293
294
295	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/6</span></p>
296
297
298
299	<p class=MsoNormal style='margin-right:-.4pt'><i style='mso-bidi-font-style:
300
301	normal'><span lang=EN-GB style='font-family:Symbol;mso-ascii-font-family:"Times New Roman";
302
303	mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family:
304
305	Symbol'><span style='mso-char-type:symbol;mso-symbol-font-family:Symbol'>m</span></span></i><span
306
307	lang=EN-GB>-Charts and Z:<span style='mso-spacerun:yes'> </span><span
308
309	class=SpellE>hows</span>, <span class=SpellE>whys</span> and <span
310
311	class=SpellE>wherefores</span></span></p>
312
313
314
315	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Greg Reeve,
316
317	Steve Reeves</span></p>
318
319
320
321	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>In this paper we
322
323	show, by a series of examples, how the </span><i style='mso-bidi-font-style:
324
325	normal'><span lang=EN-GB style='font-family:Symbol;mso-ascii-font-family:"Times New Roman";
326
327	mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family:
328
329	Symbol'><span style='mso-char-type:symbol;mso-symbol-font-family:Symbol'>m</span></span></i><span
330
331	lang=EN-GB>-chart formalism can be translated into Z.<span
332
333	style='mso-spacerun:yes'> </span>We give reasons for why this is an
334
335	interesting and sensible thing to do and what it might be used for.</span></p>
336
337
338
339
340
341
342
343
344
345
346
347	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/7</span></p>
348
349
350
351	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>One dimensional
352
353	non-uniform rational B-splines for animation control</span></p>
354
355
356
357	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
358
359	lang=EN-GB>Abdelaziz</span></span><span lang=EN-GB> <span class=SpellE>Mahoui</span></span></p>
360
361
362
363	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Most 3D
364
365	animation packages use graphical representations called motion graphs to
366
367	represent the variation in time of the motion parameters.<span
368
369	style='mso-spacerun:yes'> </span>Many use two-dimensional B-splines as
370
371	animation curves because of their power to represent free-form curves.<span
372
373	style='mso-spacerun:yes'> </span>In this project, we investigate the
374
375	possibility of using One-dimensional Non-Uniform Rational B-<span class=SpellE>Spline</span>
376
377	(NURBS) curves for the interactive construction of animation control
378
379	curves.<span style='mso-spacerun:yes'> </span>One-dimensional NURBS curves
380
381	have the potential to solve some problems encountered in motion graphs
382
383	when two-dimensional B-splines are used.<span style='mso-spacerun:yes'>
384
385	</span>The study focuses on the properties of the one-dimensional NURBS
386
387	mathematical model.<span style='mso-spacerun:yes'> </span>It also investigates
388
389	the algorithms and shape modification tools devised for two-dimensional curves
390
391	and their port to the one-dimensional NURBS model.<span
392
393	style='mso-spacerun:yes'> </span>It also looks at the issues related to the
394
395	user interface used to interactively modify the shape of the curves.</span></p>
396
397
398
399
400
401
402
403	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/8</span></p>
404
405
406
407	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Correlation-based
408
409	feature selection for discrete and numeric class machine learning</span></p>
410
411
412
413	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Mark A. Hall</span></p>
414
415
416
417	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Algorithms for
418
419	feature selection fall into two broad categories:
420
421	<i>wrappers</i> that use the learning algorithm itself to evaluate
422
423	the usefulness of features and <i>filters</i> that evaluate features
424
425	according to heuristics based on general characteristics of the data.<span
426
427	style='mso-spacerun:yes'> </span>For application to large databases, filters
428
429	have proven to be more practical than wrappers because they are much
430
431	faster.<span style='mso-spacerun:yes'> </span>However, most existing filter
432
433	algorithms only work with discrete classification problems.<span
434
435	style='mso-spacerun:yes'> </span>This paper describes a fast,
436
437	correlation-based filter algorithm that can be applied to continuous and
438
439	discrete problems.<span style='mso-spacerun:yes'> </span>The algorithm often
440
441	out-performs the well-known <span class=SpellE>ReliefF</span> attribute
442
443	estimator when used as a <span class=SpellE>preprocessing</span> step for naïve
444
445	<span class=SpellE>Bayes</span>, instance-based learning, decision trees,
446
447	locally weighted regression, and model trees.<span style='mso-spacerun:yes'>
448
449	</span>It performs more feature selection than <span class=SpellE>ReliefF</span>
450
451	does, reducing the data dimensionality by fifty percent in most cases.<span
452
453	style='mso-spacerun:yes'> </span>Also, decision and model trees built from the
454
455	<span class=SpellE>preprocessed</span> data are often significantly smaller.</span></p>
456
457
458
459
460
461
462
463	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/9</span></p>
464
465
466
467	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A development
468
469	environment for predictive modelling in foods</span></p>
470
471
472
473	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>G. Holmes, <span
474
475	class=SpellE>M.A.</span> Hall</span></p>
476
477
478
479	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>WEKA (Waikato
480
481	Environment for Knowledge Analysis) is a comprehensive suite of Java class
482
483	libraries that implement many state-of-the-art machine learning/data mining
484
485	algorithms.<span style='mso-spacerun:yes'> </span>Non-programmers interact
486
487	with the software via a user interface component called the Knowledge Explorer.</span></p>
488
489
490
491
492
493
494
495	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Applications
496
497	constructed from the WEKA class libraries can be run on any computer with a web
498
499	browsing capability, allowing users to apply machine learning techniques to
500
501	their own data regardless of computer platform.<span style='mso-spacerun:yes'>
502
503	</span>This paper describes the user interface component of the WEKA system in
504
505	reference to previous applications in the predictive <span class=SpellE>modeling</span>
506
507	of foods.</span></p>
508
509
510
511
512
513
514
515	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/10</span></p>
516
517
518
519	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Benchmarking
520
521	attribute selection techniques for data mining</span></p>
522
523
524
525	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Mark A. Hall,
526
527	Geoffrey Holmes</span></p>
528
529
530
531	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Data engineering
532
533	is generally considered to be a central issue in the development of data mining
534
535	applications.<span style='mso-spacerun:yes'> </span>The success of many
536
537	learning schemes, in their attempts to construct models of data, hinges on the
538
539	reliable identification of a small set of highly predictive attributes.<span
540
541	style='mso-spacerun:yes'> </span>The inclusion of irrelevant, redundant and
542
543	noisy attributes in the model building phase can result in poor
544
545	predictive performance and increased computation.</span></p>
546
547
548
549
550
551
552
553	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Attribute
554
555	selection generally involves a combination of search and attribute utility
556
557	estimation plus evaluation with respect to specific learning schemes.<span
558
559	style='mso-spacerun:yes'> </span>This leads to a large number of possible
560
561	permutations and has led to a situation where very few benchmark studies have
562
563	been conducted.</span></p>
564
565
566
567
568
569
570
571	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
572
573	presents a benchmark comparison of several attribute selection methods.<span
574
575	style='mso-spacerun:yes'> </span>All the methods produce an attribute ranking,
576
577	a useful device for isolating the individual merit of an attribute.<span
578
579	style='mso-spacerun:yes'> </span>Attribute selection is achieved by
580
581	cross-validating the rankings with respect to a learning scheme to find the
582
583	best attributes.<span style='mso-spacerun:yes'> </span>Results are reported
584
585	for a selection of standard data sets and two learning schemes, C4.5 and naïve <span
586
587	class=SpellE>Bayes</span>.</span></p>
588
589
590
591
592
593
594
595	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/11</span></p>
596
597
598
599	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Steve Reeves,
600
601	Greg Reeve</span></p>
602
603
604
605
606
607
608
609
610
611
612
613	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>2000/12</span></p>
614
615
616
617	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
618
619	lang=EN-GB>Malika</span></span><span lang=EN-GB> <span class=SpellE>Mahoui</span>,
620
621	Sally Jo Cunningham</span></p>
622
623
624
625	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Transaction logs
626
627	are invaluable sources of fine-grained information about users' search <span
628
629	class=SpellE>behavior</span>.<span style='mso-spacerun:yes'> </span>This paper
630
631	compares the searching <span class=SpellE>behavior</span> of users across two
632
633	WWW-accessible digital libraries: the New Zealand Digital Library's Computer
634
635	Science Technical Reports collection (CSTR), and the <span class=SpellE>Karlsruhe</span>
636
637	Computer Science Bibliographies (CSBIB) collection.<span
638
639	style='mso-spacerun:yes'> </span>Since the two collections are designed to
640
641	support the same type of users - researchers/students in computer science - a
642
643	comparative log analysis is likely to uncover common searching preferences for
644
645	that user group.<span style='mso-spacerun:yes'> </span>The two collections
646
647	differ in their content, however; the CSTR indexes a full text collection,
648
649	while the CSBIB is primarily a bibliographic database.<span
650
651	style='mso-spacerun:yes'> </span>Differences in searching <span class=SpellE>behavior</span>
652
653	between the two systems may indicate the effect of differing search facilities
654
655	and content type.</span></p>
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/1</span></p>
684
685
686
687	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Lexical
688
689	attraction for text compression</span></p>
690
691
692
693	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
694
695	lang=EN-GB>Joscha</span></span><span lang=EN-GB> Bach, Ian H. <span
696
697	class=SpellE>Witten</span></span></p>
698
699
700
701	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>New methods of
702
703	acquiring structural information in text documents may support better
704
705	compression by identifying an appropriate prediction context for each
706
707	symbol.<span style='mso-spacerun:yes'> </span>The method of "lexical
708
709	attraction" infers syntactic dependency structures from statistical analysis of
710
711	large corpora.<span style='mso-spacerun:yes'> </span>We describe the
712
713	generation of a lexical attraction model, discuss its application to text
714
715	compression, and explore its potential to outperform fixed-context models such
716
717	as word-level PPM.<span style='mso-spacerun:yes'> </span>Perhaps the most
718
719	exciting aspect of this work is the prospect of using compression as a metric
720
721	for structure discovery in text.</span></p>
722
723
724
725
726
727
728
729
730
731
732
733	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/2</span></p>
734
735
736
737	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Generating rule
738
739	sets from model trees</span></p>
740
741
742
743	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Geoffrey Holmes,
744
745	Mark Hall, <span class=SpellE>Eibe</span> Frank</span></p>
746
747
748
749	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Knowledge discovered
750
751	in a database must be represented in a form that is easy to understand.<span
752
753	style='mso-spacerun:yes'> </span>Small, easy to interpret nuggets of knowledge
754
755	from data are one requirement and the ability to induce them from a variety of
756
757	data sources is a second.<span style='mso-spacerun:yes'> </span>The literature
758
759	abounds with classification algorithms, and in recent years with algorithms
760
761	for time sequence analysis, but relatively little has been published on
762
763	extracting meaningful information from problems involving continuous classes
764
765	(regression).</span></p>
766
767
768
769
770
771
772
773	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Model
774
775	trees - decision trees with linear models at the leaf nodes - have recently emerged
776
777	as an accurate method for numeric prediction that produces understandable
778
779	models.<span style='mso-spacerun:yes'> </span>However, it is well known that
780
781	decision lists - ordered sets of If-Then rules - have the potential to be more compact
782
783	and therefore more understandable than their tree counterparts.</span></p>
784
785
786
787
788
789
790
791	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>In this paper we
792
793	present an algorithm for inducing simple, yet accurate rule sets from model
794
795	trees.<span style='mso-spacerun:yes'> </span>The algorithm works by repeatedly
796
797	building model trees and selecting the best rule at each iteration.<span
798
799	style='mso-spacerun:yes'> </span>It produces rule sets that are, on the whole,
800
801	as accurate as, but smaller than, the model tree constructed from the entire <span
802
803	class=SpellE>dataset</span>.<span style='mso-spacerun:yes'>
804
805	</span>Experimental results for various heuristics which attempt to find a
806
807	compromise between rule accuracy and rule coverage are reported.<span
808
809	style='mso-spacerun:yes'> </span>We also show empirically that our method
810
811	produces more accurate and smaller rule sets than the commercial
812
813	state-of-the-art rule learning system Cubist.</span></p>
814
815
816
817
818
819
820
821
822
823
824
825	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/3</span></p>
826
827
828
829	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A diagnostic
830
831	tool for tree based supervised classification learning algorithms</span></p>
832
833
834
835	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Leonard <span
836
837	class=SpellE>Trigg</span>, Geoffrey Holmes</span></p>
838
839
840
841	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The process of
842
843	developing applications of machine learning and data mining that employ
844
845	supervised classification algorithms includes the important step of knowledge
846
847	verification.<span style='mso-spacerun:yes'> </span>Interpretable output is
848
849	presented to a user so that they can verify that the knowledge contained in the
850
851	output makes sense for the given application.<span style='mso-spacerun:yes'>
852
853	</span>As the development of an application is an iterative process it is quite
854
855	likely that a user would wish to compare models constructed at various times or
856
857	stages.</span></p>
858
859
860
861
862
863
864
865	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>One crucial
866
867	stage where comparison of models is important is when the accuracy of a model
868
869	is being estimated, typically using some form of cross-validation.<span
870
871	style='mso-spacerun:yes'> </span>This stage is used to establish an estimate
872
873	of how well a model will perform on unseen data.<span
874
875	style='mso-spacerun:yes'> </span>This is vital information to present to a
876
877	user, but it is also important to show the degree of variation between models
878
879	obtained from the entire <span class=SpellE>dataset</span> and models obtained
880
881	during cross-validation.<span style='mso-spacerun:yes'> </span>In this way it
882
883	can be verified that the cross-validation models are at least structurally
884
885	aligned with the model garnered from the entire <span class=SpellE>dataset</span>.</span></p>
886
887
888
889
890
891
892
893	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
894
895	presents a diagnostic tool for the comparison of tree-based supervised
896
897	classification models.<span style='mso-spacerun:yes'> </span>The method is
898
899	adapted from work on approximate tree matching and applied to decision
900
901	trees.<span style='mso-spacerun:yes'> </span>The tool is described together
902
903	with experimental results on standard <span class=SpellE>datasets</span>.</span></p>
904
905
906
907
908
909
910
911
912
913
914
915	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/4</span></p>
916
917
918
919	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Feature
920
921	selection for discrete and numeric class machine learning</span></p>
922
923
924
925	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Mark A. Hall</span></p>
926
927
928
929	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Algorithms for
930
931	feature selection fall into two broad categories:
932
933	<i>wrappers</i> use the learning algorithm itself to evaluate the
934
935	usefulness of features, while <i>filters</i> evaluate features
936
937	according to heuristics based on general characteristics of the data.<span
938
939	style='mso-spacerun:yes'> </span>For application to large databases, filters
940
941	have proven to be more practical than wrappers because they are much
942
943	faster.<span style='mso-spacerun:yes'> </span>However, most existing filter
944
945	algorithms only work with discrete classification problems.</span></p>
946
947
948
949
950
951
952
953	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
954
955	describes a fast, correlation-based filter algorithm that can be applied to
956
957	continuous and discrete problems.<span style='mso-spacerun:yes'>
958
959	</span>Experiments using the new method as a <span class=SpellE>preprocessing</span>
960
961	step for naïve <span class=SpellE>Bayes</span>, instance-based learning,
962
963	decision trees, locally weighted regression, and model trees show it to be an
964
965	effective feature selector: it reduces the dimensionality of the data by more than
966
967	sixty percent in most cases without negatively affecting accuracy.<span
968
969	style='mso-spacerun:yes'> </span>Also, decision and model trees built from the
970
971	pre-processed data are often significantly smaller.</span></p>
972
973
974
975
976
977
978
979
980
981
982
983	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/5</span></p>
984
985
986
987	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Browsing tree
988
989	structures</span></p>
990
991
992
993	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Mark <span
994
995	class=SpellE>Apperley</span>, Robert <span class=SpellE>Spence</span>, Stephen <span
996
997	class=SpellE>Hodge</span>, Michael Chester</span></p>
998
999
1000
1001	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Graphic
1002
1003	representations of tree structures are notoriously difficult to create,
1004
1005	display, and interpret, particularly when the volume of information they
1006
1007	contain, and hence the number of nodes, is large.<span
1008
1009	style='mso-spacerun:yes'> </span>The problem of interactively browsing
1010
1011	information held in tree structures is examined, and the implementation of an
1012
1013	innovative tree browser is described.<span style='mso-spacerun:yes'> </span>This
1014
1015	browser is based on distortion-oriented display techniques and intuitive direct
1016
1017	manipulation interaction.<span style='mso-spacerun:yes'> </span>The tree
1018
1019	layout is automatically generated, but the location and extent of detail shown
1020
1021	are controlled by the user.<span style='mso-spacerun:yes'> </span>It is
1022
1023	suggested that these techniques could be extended to the browsing of more
1024
1025	general networks.</span></p>
1026
1027
1028
1029
1030
1031
1032
1033	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/6</span></p>
1034
1035
1036
1037	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Facilitating
1038
1039	multiple copy/paste operations</span></p>
1040
1041
1042
1043	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Mark <span
1044
1045	class=SpellE>Apperley</span>, Jay Baker, Dale Fletcher, Bill Rogers</span></p>
1046
1047
1048
1049	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Copy and paste,
1050
1051	or cut and paste, using a clipboard or paste buffer has long been the principle
1052
1053	facility provided to users for transferring data between and within GUI
1054
1055	applications.<span style='mso-spacerun:yes'> </span>We argue that this
1056
1057	mechanism can be clumsy in circumstances where several pieces of information
1058
1059	must be moved systematically.<span style='mso-spacerun:yes'> </span>In two
1060
1061	situations - extraction of data fields from unstructured data found in a
1062
1063	directed search process, and reorganisation of computer program source text -
1064
1065	we present alternative, more natural, user interface facilities to make the
1066
1067	task less onerous, and to provide improved visual feedback during the
1068
1069	operation.</span></p>
1070
1071
1072
1073
1074
1075
1076
1077	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>For the data
1078
1079	extraction task we introduce the Stretchable Selection Tool, a <span
1080
1081	class=SpellE>semi</span>-transparent overlay augmenting the mouse pointer to
1082
1083	automate paste operations and provide information to prompt the user.<span
1084
1085	style='mso-spacerun:yes'> </span>We describe a prototype implementation that
1086
1087	functions in a collaborative software environment, allowing users to <span
1088
1089	class=SpellE>cooperate</span> on a multiple copy/paste operation.<span
1090
1091	style='mso-spacerun:yes'> </span>For text reorganisation, we present an
1092
1093	extension to <span class=SpellE>Emacs</span>, providing similar functionality,
1094
1095	but without the collaborative features.</span></p>
1096
1097
1098
1099
1100
1101
1102
1103	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/7</span></p>
1104
1105
1106
1107	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Automating
1108
1109	iterative tasks with programming by demonstration: a user evaluation</span></p>
1110
1111
1112
1113	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Gordon W. <span
1114
1115	class=SpellE>Paynter</span>, Ian H. <span class=SpellE>Witten</span></span></p>
1116
1117
1118
1119	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Computer users
1120
1121	often face iterative tasks that cannot be automated using the tools and
1122
1123	aggregation techniques provided by their application program: they end up
1124
1125	performing the iteration by hand, repeating user interface actions over and
1126
1127	over again.<span style='mso-spacerun:yes'> </span>We have implemented an
1128
1129	agent, called Familiar, that can be taught to perform iterative tasks using
1130
1131	programming by demonstration (PBD).<span style='mso-spacerun:yes'>
1132
1133	</span>Unlike other PBD systems, it is domain independent and works with
1134
1135	unmodified, widely-used, applications in a popular operating system.<span
1136
1137	style='mso-spacerun:yes'> </span>In a formal evaluation, we found that users
1138
1139	quickly learned to use the agent to automate iterative tasks.<span
1140
1141	style='mso-spacerun:yes'> </span>Generally, the participants preferred to use
1142
1143	multiple selection where possible, but could and did use PBD in situations
1144
1145	involving iteration over many commands, or when other techniques were
1146
1147	unavailable.</span></p>
1148
1149
1150
1151
1152
1153
1154
1155	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/8</span></p>
1156
1157
1158
1159	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A survey of
1160
1161	software requirements specification practices in the New Zealand software
1162
1163	industry</span></p>
1164
1165
1166
1167	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Lindsay Groves,
1168
1169	Ray <span class=SpellE>Nickson</span>, Greg Reeve, Steve Reeves, Mark Utting</span></p>
1170
1171
1172
1173	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We report on the
1174
1175	software development techniques used in the New Zealand software industry,
1176
1177	paying particular attention to requirements gathering.<span
1178
1179	style='mso-spacerun:yes'> </span>We surveyed a selection of software companies
1180
1181	with a general questionnaire and then conducted in-depth interviews with four
1182
1183	companies.<span style='mso-spacerun:yes'> </span>Our results show a wide
1184
1185	variety in the kinds of companies undertaking software development, employing a
1186
1187	wide range of software development techniques.<span style='mso-spacerun:yes'>
1188
1189	</span>Although our data are not sufficiently detailed to draw statistically
1190
1191	significant conclusions, it appears that larger software development groups
1192
1193	typically have more well-defined software development processes, spend
1194
1195	proportionally more time on requirements gathering, and follow more rigorous
1196
1197	testing regimes.</span></p>
1198
1199
1200
1201
1202
1203
1204
1205	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/9</span></p>
1206
1207
1208
1209	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The LRU* WWW proxy
1210
1211	cache document replacement algorithm</span></p>
1212
1213
1214
1215	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Chung-<span
1216
1217	class=SpellE>yi</span> Chang, Tony <span class=SpellE>McGregor</span>, Geoffrey
1218
1219	Holmes</span></p>
1220
1221
1222
1223	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Obtaining good
1224
1225	performance from WWW proxy caches is critically dependent on the document
1226
1227	replacement policy used by the proxy.<span style='mso-spacerun:yes'>
1228
1229	</span>This paper validates the work of other authors by reproducing their
1230
1231	studies of proxy cache document replacement algorithms.<span
1232
1233	style='mso-spacerun:yes'> </span>From this basis a cross-trace study is
1234
1235	mounted.<span style='mso-spacerun:yes'> </span>This demonstrates that the
1236
1237	performance of most document replacement algorithms is dependent on the type of
1238
1239	workload that they are presented with.<span style='mso-spacerun:yes'>
1240
1241	</span>Finally, we propose a new algorithm, LRU*, that consistently performs
1242
1243	well across all our traces.</span></p>
1244
1245
1246
1247
1248
1249
1250
1251	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/10</span></p>
1252
1253
1254
1255	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Reduced-error
1256
1257	pruning with significance tests</span></p>
1258
1259
1260
1261	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
1262
1263	lang=EN-GB>Eibe</span></span><span lang=EN-GB> Frank, Ian H. <span
1264
1265	class=SpellE>Witten</span></span></p>
1266
1267
1268
1269	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>When building
1270
1271	classification models, it is common practice to prune them to counter spurious
1272
1273	effects of the training data: this often improves performance and reduces model
1274
1275	size.<span style='mso-spacerun:yes'> </span>&quot;Reduced-error pruning&quot;
1276
1277	is a fast pruning procedure for decision trees that is known to produce small
1278
1279	and accurate trees.<span style='mso-spacerun:yes'> </span>Apart from the data
1280
1281	from which the tree is grown, it uses an independent &quot;pruning&quot; set,
1282
1283	and pruning decisions are based on the model's error rate on this fresh
1284
1285	data.<span style='mso-spacerun:yes'> </span>Recently it has been observed that
1286
1287	reduced-error pruning <span class=SpellE>overfits</span> the pruning data,
1288
1289	producing unnecessarily large decision trees.<span style='mso-spacerun:yes'>
1290
1291	</span>This paper investigates whether standard statistical significance tests
1292
1293	can be used to counter this phenomenon.</span></p>
1294
1295
1296
1297
1298
1299
1300
1301	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The problem of <span
1302
1303	class=SpellE>overfitting</span> to the pruning set highlights the need for
1304
1305	significance testing.<span style='mso-spacerun:yes'> </span>We investigate two
1306
1307	classes of test, &quot;parametric&quot; and &quot;non-parametric.&quot;<span
1308
1309	style='mso-spacerun:yes'> </span>The standard chi-squared statistic can be
1310
1311	used both in a parametric test and as the basis for a non-parametric
1312
1313	permutation test.<span style='mso-spacerun:yes'> </span>In both cases it is
1314
1315	necessary to select the significance level at which pruning is applied.<span
1316
1317	style='mso-spacerun:yes'> </span>We show empirically that both versions of the
1318
1319	chi-squared test perform equally well if their significance levels are adjusted
1320
1321	appropriately.<span style='mso-spacerun:yes'> </span>Using a collection of
1322
1323	standard <span class=SpellE>datasets</span>, we show that significance testing
1324
1325	improves on standard reduced-error pruning if the significance level is
1326
1327	tailored to the particular <span class=SpellE>dataset</span> at hand using
1328
1329	cross-validation, yielding consistently smaller trees that perform at least as
1330
1331	well and sometimes better.</span></p>
1332
1333
1334
1335
1336
1337
1338
1339	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/11</span></p>
1340
1341
1342
1343	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
1344
1345	lang=EN-GB>Weka</span></span><span lang=EN-GB>: Practical machine learning
1346
1347	tools and techniques with Java implementations</span></p>
1348
1349
1350
1351	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Ian H. <span
1352
1353	class=SpellE>Witten</span>, <span class=SpellE>Eibe</span> Frank, Len <span
1354
1355	class=SpellE>Trigg</span>, Mark Hall, Geoffrey Holmes, Sally Jo Cunningham</span></p>
1356
1357
1358
1359	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The Waikato
1360
1361	Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java
1362
1363	class libraries that implement many state-of-the-art machine learning and data
1364
1365	mining algorithms.<span style='mso-spacerun:yes'> </span><span class=SpellE>Weka</span>
1366
1367	is freely available on the <span class=SpellE>World-Wide</span> Web and
1368
1369	accompanies a new text on data mining [1] which documents and fully explains
1370
1371	all the algorithms it contains.<span style='mso-spacerun:yes'>
1372
1373	</span>Applications written using the <span class=SpellE>Weka</span> class
1374
1375	libraries can be run on any computer with a Web browsing capability; this
1376
1377	allows users to apply machine learning techniques to their own data regardless
1378
1379	of computer platform.</span></p>
1380
1381
1382
1383
1384
1385
1386
1387	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/12</span></p>
1388
1389
1390
1391	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Pace Regression</span></p>
1392
1393
1394
1395	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Yong Wang, Ian
1396
1397	H. <span class=SpellE>Witten</span></span></p>
1398
1399
1400
1401	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
1402
1403	articulates a new method of linear regression, “pace regression”, that
1404
1405	addresses many drawbacks of standard regression reported in the
1406
1407	literature, particularly the subset selection problem.<span
1408
1409	style='mso-spacerun:yes'> </span>Pace regression improves on classical ordinary
1410
1411	least squares (OLS) regression by evaluating the effect of each variable and
1412
1413	using a clustering analysis to improve the statistical basis for estimating
1414
1415	their contribution to the overall regression.<span style='mso-spacerun:yes'>
1416
1417	</span>As well as outperforming OLS, it also outperforms, in a remarkably
1418
1419	general sense, other linear <span class=SpellE>modeling</span> techniques in the
1420
1421	literature, including subset selection procedures, which seek a reduction in
1422
1423	dimensionality that falls out as a natural <span class=SpellE>byproduct</span>
1424
1425	of pace regression.<span style='mso-spacerun:yes'> </span>The paper defines
1426
1427	six procedures that share the fundamental idea of pace regression, all of which
1428
1429	are theoretically justified in terms of asymptotic performance.<span
1430
1431	style='mso-spacerun:yes'> </span>Experiments confirm the performance
1432
1433	improvement over other techniques.</span></p>
1434
1435
1436
1437
1438
1439
1440
1441	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/13</span></p>
1442
1443
1444
1445	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A
1446
1447	compression-based algorithm for Chinese word segmentation</span></p>
1448
1449
1450
1451	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>W.J. <span
1452
1453	class=SpellE>Teahan</span>, <span class=SpellE>Yingying</span> Wen, <span
1454
1455	class=SpellE>Rodger</span> <span class=SpellE>McNab</span>, Ian H. <span
1456
1457	class=SpellE>Witten</span></span></p>
1458
1459
1460
1461	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The Chinese
1462
1463	language is written without using spaces or other word delimiters.<span
1464
1465	style='mso-spacerun:yes'> </span>Although a text may be thought of as a
1466
1467	corresponding sequence of words, there is considerable ambiguity in the
1468
1469	placement of boundaries.<span style='mso-spacerun:yes'> </span>Interpreting a
1470
1471	text as a sequence of words is beneficial for some information retrieval and
1472
1473	storage tasks: for example, full-text search, word-based compression, and <span
1474
1475	class=SpellE>keyphrase</span> extraction.</span></p>
1476
1477
1478
1479
1480
1481
1482
1483	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We describe a
1484
1485	scheme that infers appropriate positions for word boundaries using an adaptive
1486
1487	language model that is standard in text compression.<span
1488
1489	style='mso-spacerun:yes'> </span>It is trained on a corpus of pre-segmented
1490
1491	text, and when applied to new text, interpolates word boundaries so as to <span
1492
1493	class=SpellE>maximize</span> the compression obtained.<span
1494
1495	style='mso-spacerun:yes'> </span>This simple and general method performs well
1496
1497	with respect to <span class=SpellE>specialized</span> schemes for Chinese
1498
1499	language segmentation.</span></p>
1500
1501
1502
1503
1504
1505
1506
1507	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/14</span></p>
1508
1509
1510
1511	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Clustering with
1512
1513	finite data from <span class=SpellE>semi</span>-parametric mixture
1514
1515	distributions</span></p>
1516
1517
1518
1519	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Yong Wang, Ian
1520
1521	H. <span class=SpellE>Witten</span></span></p>
1522
1523
1524
1525	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Existing
1526
1527	clustering methods for the <span class=SpellE>semi</span>-parametric mixture
1528
1529	distribution perform well as the volume of data increases.<span
1530
1531	style='mso-spacerun:yes'> </span>However, they all suffer from a serious
1532
1533	drawback in finite-data situations: small outlying groups of data points can be
1534
1535	completely ignored in the clusters that are produced, no matter how far away
1536
1537	they lie from the major clusters.<span style='mso-spacerun:yes'> </span>This
1538
1539	can result in unbounded loss if the loss function is sensitive to the distance
1540
1541	between clusters.</span></p>
1542
1543
1544
1545	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
1546
1547	proposes a new distance-based clustering method that overcomes the problem by
1548
1549	avoiding global constraints.<span style='mso-spacerun:yes'>
1550
1551	</span>Experimental results illustrate its superiority to existing methods when
1552
1553	small clusters are present in finite data sets; they also suggest that it is
1554
1555	more accurate and stable than other methods even when there are no small
1556
1557	clusters.</span></p>
1558
1559
1560
1561
1562
1563
1564
1565	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/15</span></p>
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>99/16</span></p>
1578
1579
1580
1581	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The <span
1582
1583	class=SpellE>Niupepa</span> Collection:<span style='mso-spacerun:yes'> </span>Opening
1584
1585	the blinds on a window to the past</span></p>
1586
1587
1588
1589	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
1590
1591	lang=EN-GB>Te</span></span><span lang=EN-GB> <span class=SpellE>Taka</span> <span
1592
1593	class=SpellE>Keegan</span>, Sally Jo Cunningham, Mark <span class=SpellE>Apperley</span></span></p>
1594
1595
1596
1597	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
1598
1599	describes the building of a digital library collection of historic
1600
1601	newspapers.<span style='mso-spacerun:yes'> </span>The newspapers (<span
1602
1603	class=SpellE><i style='mso-bidi-font-style:normal'>Niupepa</i></span> in <span
1604
1605	class=SpellE>Maori</span>), which were published in New Zealand during the
1606
1607	period 1842 to 1933, form a unique historical record of the <span class=SpellE>Maori</span>
1608
1609	language, and of events from an historical perspective.<span
1610
1611	style='mso-spacerun:yes'> </span>Images of these newspapers have been
1612
1613	converted to digital form, electronic text extracted from these, and the
1614
1615	collection is now being made available over the Internet as a part of the New
1616
1617	Zealand Digital Library (NZDL) project at the University of Waikato.</span></p>
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/1</span></p>
1634
1635
1636
1637	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Boosting trees
1638
1639	for cost-sensitive classifications</span></p>
1640
1641
1642
1643	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Kai <span
1644
1645	class=SpellE>Ming</span> Ting, <span class=SpellE>Zijian</span> <span
1646
1647	class=SpellE>Zheng</span></span></p>
1648
1649
1650
1651	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
1652
1653	explores two boosting techniques for cost-sensitive tree classification in the
1654
1655	situation where misclassification costs change very often.<span
1656
1657	style='mso-spacerun:yes'> </span>Ideally, one would like to have only one
1658
1659	induction, and use the induced model for different misclassification
1660
1661	costs.<span style='mso-spacerun:yes'> </span>Thus, it demands robustness of
1662
1663	the induced model against cost changes.<span style='mso-spacerun:yes'>
1664
1665	</span>Combining multiple trees gives robust predictions against this
1666
1667	change.<span style='mso-spacerun:yes'> </span>We demonstrate that ordinary
1668
1669	boosting combined with the minimum expected cost criterion to select the
1670
1671	prediction class is a good solution under this situation.<span
1672
1673	style='mso-spacerun:yes'> </span>We also introduce a variant of the ordinary
1674
1675	boosting procedure which <span class=SpellE>utilizes</span> the cost
1676
1677	information during training.<span style='mso-spacerun:yes'> </span>We show
1678
1679	that the proposed technique performs better than the ordinary boosting in terms
1680
1681	of misclassification cost.<span style='mso-spacerun:yes'> </span>However, this
1682
1683	technique requires inducing a set of new trees every time the cost
1684
1685	changes.<span style='mso-spacerun:yes'> </span>Our empirical investigation
1686
1687	also reveals some interesting <span class=SpellE>behavior</span> of boosting
1688
1689	decision trees for cost-sensitive classification.</span></p>
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/2</span></p>
1702
1703
1704
1705	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Generating
1706
1707	accurate rule sets without global <span class=SpellE>optimization</span> </span></p>
1708
1709
1710
1711	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
1712
1713	lang=EN-GB>Eibe</span></span><span lang=EN-GB> Frank, Ian H. <span
1714
1715	class=SpellE>Witten</span></span></p>
1716
1717
1718
1719	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The two dominant
1720
1721	schemes for rule-learning, C4.5 and RIPPER, both operate in two stages.<span
1722
1723	style='mso-spacerun:yes'> </span>First they induce an initial rule set and
1724
1725	then they refine it using a rather complex <span class=SpellE>optimization</span>
1726
1727	stage that discards (C4.5) or adjusts (RIPPER) individual rules to make them
1728
1729	work better together.<span style='mso-spacerun:yes'> </span>In contrast, this
1730
1731	paper shows how good rule sets can be learned one rule at a time, without any
1732
1733	need for global <span class=SpellE>optimization</span>.<span
1734
1735	style='mso-spacerun:yes'> </span>We present an algorithm for inferring rules
1736
1737	by repeatedly generating partial decision trees, thus combining the two major
1738
1739	paradigms for rule generation: creating rules from decision trees and the
1740
1741	separate-and-conquer rule-learning technique.<span style='mso-spacerun:yes'>
1742
1743	</span>The algorithm is straightforward and elegant: despite this, experiments
1744
1745	on standard <span class=SpellE>datasets</span> show that it produces rule sets
1746
1747	that are as accurate as and of similar size to those generated by C4.5, and
1748
1749	more accurate than <span class=SpellE>RIPPER's</span>.<span
1750
1751	style='mso-spacerun:yes'> </span>Moreover, it operates efficiently, and
1752
1753	because it avoids <span class=SpellE>postprocessing</span>, does not suffer the
1754
1755	extremely slow performance on pathological example sets for which the C4.5
1756
1757	method has been <span class=SpellE>criticized</span>.</span></p>
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/3</span></p>
1770
1771
1772
1773	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
1774
1775	lang=EN-GB>VQuery</span></span><span lang=EN-GB>: a graphical user interface
1776
1777	for Boolean query specification and dynamic result preview</span></p>
1778
1779
1780
1781	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Steve Jones</span></p>
1782
1783
1784
1785	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Textual query
1786
1787	languages based on Boolean logic are common amongst the search facilities of
1788
1789	on-line information repositories.<span style='mso-spacerun:yes'>
1790
1791	</span>However, there is evidence to suggest that the syntactic and semantic
1792
1793	demands of such languages lead to user errors and adversely affect the time
1794
1795	that it takes users to form queries.<span style='mso-spacerun:yes'>
1796
1797	</span>Additionally, users are faced with user interfaces to these repositories
1798
1799	which are unresponsive and uninformative, and consequently fail to support
1800
1801	effective query refinement.<span style='mso-spacerun:yes'> </span>We suggest
1802
1803	that graphical query languages, particularly Venn-like diagrams, provide a
1804
1805	natural medium for Boolean query specification which overcomes the problems of
1806
1807	textual query languages.<span style='mso-spacerun:yes'> </span>Also, dynamic
1808
1809	result previews can be seamlessly integrated with graphical query specification
1810
1811	to increase the effectiveness of query refinements.<span
1812
1813	style='mso-spacerun:yes'> </span>We describe <span class=SpellE>VQuery</span>,
1814
1815	a query interface to the New Zealand Digital Library which exploits querying by
1816
1817	Venn diagrams and integrated query result previews.</span></p>
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/4</span></p>
1830
1831
1832
1833	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Revising
1834
1835	&lt;I&gt;Z&lt;/I&gt;: semantics and logic</span></p>
1836
1837
1838
1839	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Martin C. <span
1840
1841	class=SpellE>Henson</span>, Steve Reeves</span></p>
1842
1843
1844
1845	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We introduce a
1846
1847	simple specification logic &lt;I&gt;Z&lt;/I&gt;c comprising a logic and
1848
1849	semantics (in &lt;I&gt;ZF&lt;/I&gt; set theory).<span
1850
1851	style='mso-spacerun:yes'> </span>We then provide an interpretation for (a
1852
1853	rational reconstruction of) the specification language &lt;I&gt;Z&lt;/I&gt;
1854
1855	within &lt;I&gt;Z&lt;/I&gt;c.<span style='mso-spacerun:yes'> </span>As a
1856
1857	result we obtain a sound logic for &lt;I&gt;Z&lt;/I&gt;, including the schema
1858
1859	calculus.<span style='mso-spacerun:yes'> </span>A consequence of our
1860
1861	formalisation is a critique of a number of concepts used in
1862
1863	&lt;I&gt;Z&lt;/I&gt;.<span style='mso-spacerun:yes'> </span>We demonstrate
1864
1865	that the complications and confusions which these concepts introduce can be avoided
1866
1867	without compromising <span class=SpellE>expressibility</span>.</span></p>
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/5</span></p>
1880
1881
1882
1883	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A logic for the
1884
1885	schema calculus</span></p>
1886
1887
1888
1889	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Martin C. <span
1890
1891	class=SpellE>Henson</span>, Steve Reeves</span></p>
1892
1893
1894
1895	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>In this paper we
1896
1897	introduce and investigate a logic for the schema calculus of
1898
1899	&lt;I&gt;Z&lt;/I&gt;.<span style='mso-spacerun:yes'> </span>The schema
1900
1901	calculus is arguably the reason for &lt;I&gt;Z&lt;/I&gt;'s popularity but so
1902
1903	far no true calculus (a sound system of rules for reasoning about schema
1904
1905	expressions) has been given.<span style='mso-spacerun:yes'>
1906
1907	</span>Presentations thus far have either failed to provide a calculus (e.g.
1908
1909	the draft standard [3]) or have fallen back on informal descriptions at a
1910
1911	syntactic level (most text books e.g. [7]).<span style='mso-spacerun:yes'>
1912
1913	</span>Once the calculus is established we introduce a derived <span
1914
1915	class=SpellE>equational</span> logic which enables us to formalise properly the
1916
1917	informal notations of schema expression equality to be found in the literature.</span></p>
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/6</span></p>
1930
1931
1932
1933	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>New foundations
1934
1935	for &lt;I&gt;Z&lt;/I&gt;</span></p>
1936
1937
1938
1939	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Martin C. <span
1940
1941	class=SpellE>Henson</span>, Steve Reeves</span></p>
1942
1943
1944
1945	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We provide a
1946
1947	constructive and <span class=SpellE>intensional</span> interpretation for the
1948
1949	specification language &lt;I&gt;Z&lt;/I&gt; in a theory of operations and kinds
1950
1951	&lt;I&gt;T&lt;/I&gt;.<span style='mso-spacerun:yes'> </span>The motivation is
1952
1953	to facilitate the development of an integrated approach to program
1954
1955	construction.<span style='mso-spacerun:yes'> </span>We illustrate the new
1956
1957	foundations for &lt;I&gt;Z&lt;/I&gt; with examples.</span></p>
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/7</span></p>
1970
1971
1972
1973	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Predicting apple
1974
1975	bruising relationships using machine learning</span></p>
1976
1977
1978
1979	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>G. Holmes, S.J.
1980
1981	Cunningham, B.T. <span class=SpellE>Dela</span> Rue, <span class=SpellE>A.F.</span>
1982
1983	<span class=SpellE>Bollen</span></span></p>
1984
1985
1986
1987	<p class=MsoBodyText><span lang=EN-US>Many models have been used to describe
1988
1989	the influence of internal or external factors on apple bruising.<span
1990
1991	style='mso-spacerun:yes'> </span>Few of these have addressed the application
1992
1993	of derived relationships to the evaluation of commercial operations.<span
1994
1995	style='mso-spacerun:yes'> </span>From an industry perspective, a model must
1996
1997	enable fruit to be rejected on the basis of a commercially significant bruise
1998
1999	and must also accurately quantify the effects of various combinations of input
2000
2001	features (such as <span class=SpellE>cultivar</span>, maturity, size, and so
2002
2003	on) on bruise prediction.<span style='mso-spacerun:yes'> </span>Input features
2004
2005	must in turn have characteristics which are measurable commercially; for
2006
2007	example, the measure of force should be impact energy rather than energy
2008
2009	absorbed.<span style='mso-spacerun:yes'> </span>Further, as the commercial
2010
2011	criteria for acceptable damage levels change, the model should be versatile
2012
2013	enough to regenerate new bruise thresholds from existing data.</span></p>
2014
2015
2016
2017
2018
2019
2020
2021	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Machine learning
2022
2023	is a burgeoning technology with a vast range of potential applications,
2024
2025	particularly in agriculture, where large amounts of data can be readily
2026
2027	collected [1].<span style='mso-spacerun:yes'> </span>The main advantage of
2028
2029	using a machine learning method in an application is that the models built for
2030
2031	prediction can be viewed and understood by the owner of the data who is in a
2032
2033	position to determine the usefulness of the model, an essential component in a
2034
2035	commercial environment.</span></p>
2036
2037
2038
2039
2040
2041
2042
2043
2044
2045
2046
2047	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/8</span></p>
2048
2049
2050
2051	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>An evaluation of
2052
2053	passage-level indexing strategies for a technical report archive</span></p>
2054
2055
2056
2057	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Michael Williams</span></p>
2058
2059
2060
2061	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Past research
2062
2063	has shown that using evidence from document passages rather than complete
2064
2065	documents is an effective way of improving the precision of full-text database
2066
2067	searches.<span style='mso-spacerun:yes'> </span>However, passage-level
2068
2069	indexing has yet to be widely adopted for commercial or online databases.</span></p>
2070
2071
2072
2073
2074
2075
2076
2077	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
2078
2079	reports on experiments designed to test the efficacy of passage-level indexing
2080
2081	with a particular collection of a full-text online database, the New Zealand
2082
2083	Digital Library.<span style='mso-spacerun:yes'> </span>Discourse passages and
2084
2085	word-window passages are used for the indexing process.<span
2086
2087	style='mso-spacerun:yes'> </span>Both ranked and Boolean searching are used to
2088
2089	test the resulting indexes.</span></p>
2090
2091
2092
2093
2094
2095
2096
2097	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Overlapping
2098
2099	window passages are shown to offer the best retrieval performance with both
2100
2101	ranked and Boolean queries.<span style='mso-spacerun:yes'>
2102
2103	</span>Modifications may be necessary to the term weighting methodology in
2104
2105	order to ensure optimal ranked query performance.</span></p>
2106
2107
2108
2109
2110
2111
2112
2113
2114
2115
2116
2117	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/9</span></p>
2118
2119
2120
2121	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Managing
2122
2123	multiple collections, multiple languages, and multiple media in a distributed
2124
2125	digital library</span></p>
2126
2127
2128
2129	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Ian H. <span
2130
2131	class=SpellE>Witten</span>, <span class=SpellE>Rodger</span> <span
2132
2133	class=SpellE>McNab</span>, Steve Jones, Sally Jo Cunningham, David Bainbridge,
2134
2135	Mark <span class=SpellE>Apperley</span></span></p>
2136
2137
2138
2139	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Managing the <span
2140
2141	class=SpellE>organizational</span> and software complexity of a comprehensive
2142
2143	digital library presents a significant challenge.<span
2144
2145	style='mso-spacerun:yes'> </span>Different library collections each have their
2146
2147	own distinctive features.<span style='mso-spacerun:yes'> </span>Different
2148
2149	presentation languages have structural implications such as left-to-right
2150
2151	writing order and text-only interfaces for the visually impaired.<span
2152
2153	style='mso-spacerun:yes'> </span>Different media involve different file
2154
2155	formats, and, more importantly, radically different search strategies are
2156
2157	required for non-textual media.<span style='mso-spacerun:yes'> </span>In a
2158
2159	distributed library, new collections can appear asynchronously on servers in
2160
2161	different parts of the world.<span style='mso-spacerun:yes'> </span>And as
2162
2163	searching interfaces mature from the command-line era exemplified by current
2164
2165	Web search engines into the age of reactive visual interfaces, experimental new
2166
2167	interfaces must be developed, supported, and tested.<span
2168
2169	style='mso-spacerun:yes'> </span>This paper describes our experience, gained
2170
2171	from operating a substantial digital library service over several years, in
2172
2173	solving these problems by designing an appropriate software architecture.</span></p>
2174
2175
2176
2177
2178
2179
2180
2181
2182
2183
2184
2185	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/10</span></p>
2186
2187
2188
2189	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Experiences with
2190
2191	a weighted decision tree learner</span></p>
2192
2193
2194
2195	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>John G. <span
2196
2197	class=SpellE>Cleary</span>, Leonard E. <span class=SpellE>Trigg</span></span></p>
2198
2199
2200
2201	<p class=MsoBodyText><span lang=EN-US>Machine learning algorithms for inferring
2202
2203	decision trees typically choose a single “best” tree to describe the training
2204
2205	data.<span style='mso-spacerun:yes'> </span>Recent research has shown that
2206
2207	classification performance can be significantly improved by voting predictions
2208
2209	of multiple, independently produced decision trees.<span
2210
2211	style='mso-spacerun:yes'> </span>This paper describes an algorithm, OB1, that
2212
2213	makes a weighted sum over many possible models.<span style='mso-spacerun:yes'>
2214
2215	</span>We describe one instance of OB1 that includes <i>all</i>
2216
2217	possible decision trees as well as naïve <span class=SpellE>Bayesian</span>
2218
2219	models.<span style='mso-spacerun:yes'> </span>OB1 is compared with a number of
2220
2221	other decision tree and instance-based learning <span class=SpellE>algorithms</span>
2222
2223	on some of the data sets from the UCI repository.<span
2224
2225	style='mso-spacerun:yes'> </span>Both an information gain and an accuracy
2226
2227	measure are used for the comparison.<span style='mso-spacerun:yes'> </span>On
2228
2229	the information gain measure OB1 performs significantly better than all the
2230
2231	other algorithms.<span style='mso-spacerun:yes'> </span>On the accuracy
2232
2233	measure it is significantly better than all the algorithms except naïve <span
2234
2235	class=SpellE>Bayes</span>, which performs comparably to OB1.</span></p>
2236
2237
2238
2239
2240
2241
2242
2243
2244
2245
2246
2247	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/11</span></p>
2248
2249
2250
2251	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>An entropy gain
2252
2253	measure of numeric prediction performance</span></p>
2254
2255
2256
2257	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Leonard <span
2258
2259	class=SpellE>Trigg</span></span></p>
2260
2261
2262
2263	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Categorical
2264
2265	classifier performance is typically evaluated with respect to error rate,
2266
2267	expressed as a percentage of test instances that were not correctly
2268
2269	classified.<span style='mso-spacerun:yes'> </span>When a classifier produces
2270
2271	multiple classifications for a test instance, the prediction is counted as
2272
2273	incorrect (even if the correct class was one of the predictions).<span
2274
2275	style='mso-spacerun:yes'> </span>Although commonly used in the literature,
2276
2277	error rate is a coarse measure of classifier performance, as it is based only
2278
2279	on a single prediction offered for a test instance.<span
2280
2281	style='mso-spacerun:yes'> </span>Since many classifiers can produce a class
2282
2283	distribution as a prediction, we should use this to provide a better measure of
2284
2285	how much information the classifier is extracting from the domain.</span></p>
2286
2287
2288
2289
2290
2291
2292
2293	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Numeric
2294
2295	classifiers are a relatively new development in machine learning, and as such
2296
2297	there is no single performance measure that has become standard.<span
2298
2299	style='mso-spacerun:yes'> </span>Typically these machine learning schemes
2300
2301	predict a single real number for each test instance, and the error between the
2302
2303	predicted and actual value is used to calculate a myriad of performance
2304
2305	measures such as correlation coefficient, root mean squared error, mean
2306
2307	absolute error, relative absolute error, and root relative squared error.<span
2308
2309	style='mso-spacerun:yes'> </span>With so many performance measures it is
2310
2311	difficult to establish an overall performance evaluation.</span></p>
2312
2313
2314
2315
2316
2317
2318
2319	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The next section
2320
2321	describes a performance measure for machine learning schemes that attempts to
2322
2323	overcome the problems with current measures.<span style='mso-spacerun:yes'>
2324
2325	</span>In addition, the same evaluation measure is used for categorical and
2326
2327	numeric classifiers.</span></p>
2328
2329
2330
2331
2332
2333
2334
2335
2336
2337
2338
2339
2340
2341
2342
2343	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/12</span></p>
2344
2345
2346
2347	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Proceedings of
2348
2349	CBISE ’98 CaiSE*98 Workshop on Component Based Information Systems Engineering</span></p>
2350
2351
2352
2353	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Edited by John <span
2354
2355	class=SpellE>Grundy</span></span></p>
2356
2357
2358
2359	<p class=MsoBodyText><span lang=EN-US>Component-based information systems
2360
2361	development is an area of research and practice of increasing importance.<span
2362
2363	style='mso-spacerun:yes'> </span>Information Systems developers have <span
2364
2365	class=SpellE>realised</span> that traditional approaches to IS engineering
2366
2367	produce monolithic, difficult to maintain, difficult to reuse systems.<span
2368
2369	style='mso-spacerun:yes'> </span>In contrast, the use of software components,
2370
2371	which embody data, functionality and well-specified and understood interfaces,
2372
2373	makes interoperable, distributed and highly reusable IS components
2374
2375	feasible.<span style='mso-spacerun:yes'> </span>Component-based approaches to
2376
2377	IS engineering can be used at strategic and <span class=SpellE>organisational</span>
2378
2379	levels, to model business processes and whole IS architectures, in development
2380
2381	methods which <span class=SpellE>utilise</span> component-based models during
2382
2383	analysis and design, and in system implementation.<span
2384
2385	style='mso-spacerun:yes'> </span>Reusable components can allow end users to
2386
2387	compose and configure their own Information Systems, possibly from a range of
2388
2389	suppliers, and to more tightly couple their <span class=SpellE>organisational</span>
2390
2391	<span class=SpellE>workflows</span> with their IS support.</span></p>
2392
2393
2394
2395
2396
2397
2398
2399	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This workshop
2400
2401	proceedings contains a range of papers addressing one or more of the above
2402
2403	issues relating to the use of component models for IS development.<span
2404
2405	style='mso-spacerun:yes'> </span>All of these papers were refereed by at least
2406
2407	two members of an international workshop committee comprising industry and
2408
2409	academic researchers and users of component technologies.<span
2410
2411	style='mso-spacerun:yes'> </span>Strategic uses of components are addressed in
2412
2413	the first three papers, while the following three address uses of components for
2414
2415	systems design and workflow management.<span style='mso-spacerun:yes'>
2416
2417	</span>Systems development using components, and the provision of environments
2418
2419	for component management are addressed in the following group of five
2420
2421	papers.<span style='mso-spacerun:yes'> </span>The last three papers in this
2422
2423	proceedings address component management and analysis techniques.</span></p>
2424
2425
2426
2427
2428
2429
2430
2431	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>All of these
2432
2433	papers provide new insights into the many<span style='mso-spacerun:yes'>
2434
2435	</span>varied uses of component technology for IS engineering.<span
2436
2437	style='mso-spacerun:yes'> </span>I hope you find them as interesting and
2438
2439	useful as I have when collating this proceedings and organising the workshop.</span></p>
2440
2441
2442
2443
2444
2445
2446
2447
2448
2449
2450
2451	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/13</span></p>
2452
2453
2454
2455	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>An analysis of
2456
2457	usage of a digital library</span></p>
2458
2459
2460
2461	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Steve Jones,
2462
2463	Sally Jo Cunningham, <span class=SpellE>Rodger</span> <span class=SpellE>McNab</span></span></p>
2464
2465
2466
2467	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>As experimental
2468
2469	digital library <span class=SpellE>testbeds</span> gain wider acceptance and
2470
2471	develop significant user bases, it becomes important to investigate the ways in
2472
2473	which users interact with the systems in practice.<span
2474
2475	style='mso-spacerun:yes'> </span>Transaction logs are one source of usage
2476
2477	information, and information on user behaviour can be culled from them both
2478
2479	automatically (through calculation of summary statistics) and manually (by
2480
2481	examining query strings for semantic clues on search motivations and searching
2482
2483	strategy).<span style='mso-spacerun:yes'> </span>We conduct a transaction log
2484
2485	analysis on user activity in the Computer Science Technical Reports Collection
2486
2487	of the New Zealand Digital Library, and report insights gained and identify
2488
2489	resulting search interface design issues.</span></p>
2490
2491
2492
2493
2494
2495
2496
2497
2498
2499
2500
2501	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/14</span></p>
2502
2503
2504
2505	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Measuring ATM
2506
2507	traffic: final report for New Zealand Telecom</span></p>
2508
2509
2510
2511	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>John <span
2512
2513	class=SpellE>Cleary</span>, Ian Graham, Murray Pearson, Tony <span
2514
2515	class=SpellE>McGregor</span></span></p>
2516
2517
2518
2519	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The report
2520
2521	describes the development of a low-cost ATM monitoring system, hosted by a
2522
2523	standard PC.<span style='mso-spacerun:yes'> </span>The monitor can be used
2524
2525	remotely, returning information on ATM traffic flows to a central site.<span
2526
2527	style='mso-spacerun:yes'> </span>The monitor is interfaced to a GPS timing
2528
2529	receiver, which provides an absolute time accuracy of better than 1 <span
2530
2531	class=SpellE>usec</span>.<span style='mso-spacerun:yes'> </span>By monitoring
2532
2533	the same traffic flow at different points in a network it is possible to
2534
2535	measure cell delay and delay variation in real time, and with existing
2536
2537	traffic.<span style='mso-spacerun:yes'> </span>The monitoring system
2538
2539	characterises cells by a CRC calculated over the cell payload, thus special
2540
2541	measurement cells are not required.<span style='mso-spacerun:yes'>
2542
2543	</span>Delays in both local area and wide-area networks have been measured
2544
2545	using this system.<span style='mso-spacerun:yes'> </span>It is possible to
2546
2547	measure delay in a network that is not end-to-end ATM, as long as some cells
2548
2549	remain identical at the entry and exit points.<span style='mso-spacerun:yes'>
2550
2551	</span>Examples are given of traffic and delay measurements in both wide and
2552
2553	local area network systems, including delays measured over the Internet from
2554
2555	Canada to New Zealand.</span></p>
2556
2557
2558
2559
2560
2561
2562
2563
2564
2565
2566
2567
2568
2569
2570
2571	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/15</span></p>
2572
2573
2574
2575	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Despite its
2576
2577	simplicity, the naïve <span class=SpellE>Bayes</span> learning scheme performs
2578
2579	well on most classification tasks, and is often significantly more accurate
2580
2581	than more sophisticated methods.<span style='mso-spacerun:yes'>
2582
2583	</span>Although the probability estimates that it produces can be inaccurate,
2584
2585	it often assigns maximum probability to the correct class.<span
2586
2587	style='mso-spacerun:yes'> </span>This suggests that its good performance might
2588
2589	be restricted to situations where the output is categorical.<span
2590
2591	style='mso-spacerun:yes'> </span>It is therefore interesting to see how it
2592
2593	performs in domains where the predicted value is numeric, because in this case,
2594
2595	predictions are more sensitive to inaccurate probability estimates.</span></p>
2596
2597
2598
2599
2600
2601
2602
2603	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper shows
2604
2605	how to apply the naïve <span class=SpellE>Bayes</span> methodology to numeric
2606
2607	prediction (i.e. regression) tasks, and compares it to linear regression,
2608
2609	instance-based learning, and a method that produces “model trees”, that is, decision
2610
2611	trees with linear regression functions at the leaves.<span
2612
2613	style='mso-spacerun:yes'> </span>Although we exhibit an artificial <span
2614
2615	class=SpellE>dataset</span> for which naïve <span class=SpellE>Bayes</span> is
2616
2617	the method of choice, on real-world <span class=SpellE>datasets</span> it is
2618
2619	almost uniformly worse than model trees.<span style='mso-spacerun:yes'>
2620
2621	</span>The comparison with linear regression depends on the error measure: for
2622
2623	one measure naïve <span class=SpellE>Bayes</span> performs similarly, for
2624
2625	another it is worse.<span style='mso-spacerun:yes'> </span>Compared to
2626
2627	instance-based learning, it performs similarly with respect to both
2628
2629	measures.<span style='mso-spacerun:yes'> </span>These results indicate that
2630
2631	the simplistic statistical assumption that naïve <span class=SpellE>Bayes</span>
2632
2633	makes is indeed more restrictive for regression than for classification.</span></p>
2634
2635
2636
2637
2638
2639
2640
2641	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/16</span></p>
2642
2643
2644
2645	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Link as you
2646
2647	type: using key phrases for automated dynamic link generation</span></p>
2648
2649
2650
2651	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Steve Jones</span></p>
2652
2653
2654
2655	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>When documents
2656
2657	are collected together from diverse sources they are unlikely to contain useful
2658
2659	hypertext links to support browsing amongst them.<span
2660
2661	style='mso-spacerun:yes'> </span>For large collections of thousands of
2662
2663	documents it is prohibitively resource intensive to manually insert links into
2664
2665	each document.<span style='mso-spacerun:yes'> </span>Users of such collections
2666
2667	may wish to relate documents within them to text that they are themselves
2668
2669	generating.<span style='mso-spacerun:yes'> </span>This process, often
2670
2671	involving keyword searching, distracts from the authoring process and results
2672
2673	in material related to query terms but not necessarily to the author’s
2674
2675	document.<span style='mso-spacerun:yes'> </span>Query terms that are effective
2676
2677	in one collection might not be so in another.<span style='mso-spacerun:yes'>
2678
2679	</span>We have developed <span class=SpellE>Phrasier</span>, a system that
2680
2681	integrates authoring (of text and hyperlinks), browsing, querying and reading
2682
2683	in support of information retrieval activities.<span style='mso-spacerun:yes'>
2684
2685	</span><span class=SpellE>Phrasier</span> exploits key phrases which are
2686
2687	automatically extracted from documents in a collection, and uses them as link
2688
2689	anchors and to identify candidate destinations for hyperlinks.<span
2690
2691	style='mso-spacerun:yes'> </span>This system suggests links into existing
2692
2693	collections for purposes of authoring and retrieval of related information,
2694
2695	creates links between documents in a collection and provides supportive
2696
2697	document and link overviews.</span></p>
2698
2699
2700
2701
2702
2703
2704
2705
2706
2707
2708
2709	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/17</span></p>
2710
2711
2712
2713	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Melody based
2714
2715	tune retrieval over the World Wide Web</span></p>
2716
2717
2718
2719	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>David
2720
2721	Bainbridge, <span class=SpellE>Rodger</span> J. <span class=SpellE>McNab</span>,
2722
2723	Lloyd A. Smith</span></p>
2724
2725
2726
2727	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>In this paper we
2728
2729	describe the steps taken to develop a Web-based version of an existing
2730
2731	stand-alone, single-user digital library application for <span class=SpellE>melodical</span>
2732
2733	searching of a collection of music.<span style='mso-spacerun:yes'> </span>For
2734
2735	the three key components: input, searching, and output, we assess the
2736
2737	suitability of various Web-based strategies that deal with the now distributed
2738
2739	software architecture and explain the decisions we made.<span
2740
2741	style='mso-spacerun:yes'> </span>The resulting melody indexing service, known
2742
2743	as MELDEX, has been in operation for one year, and the feedback we have
2744
2745	received has been <span class=SpellE>favorable</span>.</span></p>
2746
2747
2748
2749
2750
2751
2752
2753	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>98/18</span></p>
2754
2755
2756
2757	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Making oral
2758
2759	history accessible over the World Wide Web</span></p>
2760
2761
2762
2763	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>David
2764
2765	Bainbridge, Sally Jo Cunningham</span></p>
2766
2767
2768
2769	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We describe a
2770
2771	multimedia, WWW-based oral history collection constructed from off-the-shelf or
2772
2773	publicly available software.<span style='mso-spacerun:yes'> </span>The source
2774
2775	materials for the collection include audio tapes of interviews and summary
2776
2777	transcripts of each interview, as well as photographs illustrating episodes
2778
2779	mentioned in the tapes.<span style='mso-spacerun:yes'> </span>Sections of the
2780
2781	transcripts are manually matched to associated segments of the tapes, and the
2782
2783	tapes are <span class=SpellE>digitized</span>.<span style='mso-spacerun:yes'>
2784
2785	</span>Users search a full-text retrieval system based on the text transcripts
2786
2787	to retrieve relevant transcript sections and their associated audio recordings
2788
2789	and photographs.<span style='mso-spacerun:yes'> </span>It is also possible to
2790
2791	search for photos by matching text queries against text descriptions of the
2792
2793	photos in the collection, where the located photos link back to their
2794
2795	respective interview transcript and audio recordings.</span></p>
2796
2797
2798
2799
2800
2801
2802
2803
2804
2805
2806
2807
2808
2809
2810
2811
2812
2813
2814
2815	<p class=MsoNormal style='margin-right:-.4pt'><b style='mso-bidi-font-weight:
2816
2817	normal'><span lang=EN-GB>1997<o:p></o:p></span></b></p>
2818
2819
2820
2821
2822
2823
2824
2825	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/1</span></p>
2826
2827
2828
2829	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A dynamic and
2830
2831	flexible representation of social relationships in CSCW</span></p>
2832
2833
2834
2835	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Steve Jones,
2836
2837	Steve Marsh</span></p>
2838
2839
2840
2841	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>CSCW system
2842
2843	designers lack effective support in addressing the social issues and
2844
2845	interpersonal relationships which are linked with the use of CSCW systems.<span
2846
2847	style='mso-spacerun:yes'> </span>We present a formal description of trust to
2848
2849	support CSCW system designers in considering the social aspects of group work,
2850
2851	embedding those considerations in systems and analysing computer supported
2852
2853	group processes.</span></p>
2854
2855
2856
2857
2858
2859
2860
2861	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We argue that
2862
2863	trust is a critical aspect in group work, and describe what we consider to be
2864
2865	the building blocks of trust.<span style='mso-spacerun:yes'> </span>We then
2866
2867	present a formal notation for the building blocks, their use in reasoning about
2868
2869	social interactions and how they are amended over time.</span></p>
2870
2871
2872
2873
2874
2875
2876
2877	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We then consider
2878
2879	how the formalism may be used in practice, and present some insights from
2880
2881	initial analysis of the behaviour of the formalism.<span
2882
2883	style='mso-spacerun:yes'> </span>This is followed by a description of possible
2884
2885	amendments and extensions to the formalism.<span style='mso-spacerun:yes'>
2886
2887	</span>We conclude that it is possible to formalise a notion of trust and to
2888
2889	model the formalisation by a computational mechanism.</span></p>
2890
2891
2892
2893
2894
2895
2896
2897
2898
2899
2900
2901	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/2</span></p>
2902
2903
2904
2905	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Design issues
2906
2907	for World Wide Web navigation visualisation tools</span></p>
2908
2909
2910
2911	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Andy <span
2912
2913	class=SpellE>Cockburn</span>, Steve Jones</span></p>
2914
2915
2916
2917	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The World Wide
2918
2919	Web (WWW) is a successful hypermedia information space used by millions of
2920
2921	people, yet it suffers from many deficiencies and problems in support for
2922
2923	navigation around its vast information space.<span style='mso-spacerun:yes'>
2924
2925	</span>In this paper we identify the origins of these navigation problems,
2926
2927	namely WWW browser design, WWW page design, and WWW page description
2928
2929	languages.<span style='mso-spacerun:yes'> </span>Regardless of their origins,
2930
2931	these problems are eventually represented to the user at the browser’s user
2932
2933	interface.<span style='mso-spacerun:yes'> </span>To help overcome these
2934
2935	problems, many tools are being developed which allow users to visualise WWW
2936
2937	subspaces.<span style='mso-spacerun:yes'> </span>We identify five key issues
2938
2939	in the design and functionality of these visualisation systems: characteristics
2940
2941	of the visual representation, the scope of the subspace representation, the
2942
2943	mechanisms for generating the visualisation, the degree of browser
2944
2945	independence, and the navigation support facilities.<span
2946
2947	style='mso-spacerun:yes'> </span>We provide a critical review of the diverse
2948
2949	range of WWW visualisation tools with respect to these issues.</span></p>
2950
2951
2952
2953
2954
2955
2956
2957
2958
2959
2960
2961	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/3</span></p>
2962
2963
2964
2965	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Stacked <span
2966
2967	class=SpellE>generalization</span>:<span style='mso-spacerun:yes'> </span>when
2968
2969	does it work?</span></p>
2970
2971
2972
2973	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Kai <span
2974
2975	class=SpellE>Ming</span> Ting, Ian H. <span class=SpellE>Witten</span></span></p>
2976
2977
2978
2979	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Stacked <span
2980
2981	class=SpellE>generalization</span> is a general method of using a high-level
2982
2983	model to combine lower-level models to achieve greater predictive
2984
2985	accuracy.<span style='mso-spacerun:yes'> </span>In this paper we address two
2986
2987	crucial issues which have been considered to be a 'black art' in classification
2988
2989	tasks ever since the introduction of stacked <span class=SpellE>generalization</span>
2990
2991	in 1992 by <span class=SpellE>Wolpert</span>: the type of <span class=SpellE>generalizer</span>
2992
2993	that is suitable to derive the higher-level model, and the kind of attributes
2994
2995	that should be used as its input. </span></p>
2996
2997
2998
2999
3000
3001
3002
3003	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We demonstrate
3004
3005	the effectiveness of stacked <span class=SpellE>generalization</span> for
3006
3007	combining three different types of learning algorithms, and also for combining
3008
3009	models of the same type derived from a single learning algorithm in a
3010
3011	multiple-data-batches scenario.<span style='mso-spacerun:yes'> </span>We also
3012
3013	compare the performance of stacked <span class=SpellE>generalization</span>
3014
3015	with published results for arcing and bagging.</span></p>
3016
3017
3018
3019
3020
3021
3022
3023
3024
3025
3026
3027	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/4</span></p>
3028
3029
3030
3031	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Browsing in
3032
3033	digital libraries:<span style='mso-spacerun:yes'> </span>a phrase-based
3034
3035	approach</span></p>
3036
3037
3038
3039	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Craig <span
3040
3041	class=SpellE>Nevill</span>-Manning, Ian H. <span class=SpellE>Witten</span>,
3042
3043	Gordon W. <span class=SpellE>Paynter</span></span></p>
3044
3045
3046
3047	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A key question
3048
3049	for digital libraries is this: how should one go about becoming familiar with a
3050
3051	digital collection, as opposed to a physical one?<span
3052
3053	style='mso-spacerun:yes'> </span>Digital collections generally present an
3054
3055	appearance which is extremely opaque – a screen, typically a Web page, with no
3056
3057	indication of what, or how much, lies beyond: whether a carefully-selected
3058
3059	collection or a morass of worthless ephemera; whether half a dozen documents or
3060
3061	many millions.<span style='mso-spacerun:yes'> </span>At least physical
3062
3063	collections occupy physical space, present a physical appearance, and exhibit
3064
3065	tangible physical <span class=SpellE>organization</span>.<span
3066
3067	style='mso-spacerun:yes'> </span>When standing on the threshold of a large
3068
3069	library one gains a sense of presence and permanence that reflects the care
3070
3071	taken in building and maintaining the collection inside.<span
3072
3073	style='mso-spacerun:yes'> </span>No-one could confuse it with a
3074
3075	dung-heap!<span style='mso-spacerun:yes'> </span>Yet in the digital world the
3076
3077	difference is not so palpable.</span></p>
3078
3079
3080
3081
3082
3083
3084
3085
3086
3087
3088
3089
3090
3091
3092
3093	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/5</span></p>
3094
3095
3096
3097	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A graphical
3098
3099	notation for the design of information visualisations</span></p>
3100
3101
3102
3103	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Matthew C. <span
3104
3105	class=SpellE>Humphrey</span></span></p>
3106
3107
3108
3109	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Visualisations
3110
3111	are coherent, graphical expressions of complex information that enhance people’s
3112
3113	ability to communicate and reason about that information.<span
3114
3115	style='mso-spacerun:yes'> </span>Yet despite the importance of visualisations
3116
3117	in helping people to understand and solve a wide variety of problems, there is
3118
3119	a dearth of formal tools and methods for discussing, describing and designing
3120
3121	them.<span style='mso-spacerun:yes'> </span>Although simple visualisations,
3122
3123	such as bar charts and <span class=SpellE>scatterplots</span>, are easily
3124
3125	produced by modern interactive software, novel visualisations of multivariate, <span
3126
3127	class=SpellE>multirelational</span> data must be expressed in a programming
3128
3129	language.<span style='mso-spacerun:yes'> </span>The Relational Visualisation
3130
3131	Notation is a new, graphical language for designing such highly expressive
3132
3133	visualisations that does not use programming constructs.<span
3134
3135	style='mso-spacerun:yes'> </span>Instead, the notation is based on relational
3136
3137	algebra, which is widely used in database query languages, and it is supported
3138
3139	by a suite of direct manipulation tools.<span style='mso-spacerun:yes'>
3140
3141	</span>This article presents the notation and examines the designs of some
3142
3143	interesting visualisations.</span></p>
3144
3145
3146
3147
3148
3149
3150
3151
3152
3153
3154
3155
3156
3157
3158
3159	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/6</span></p>
3160
3161
3162
3163	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Applications of
3164
3165	machine learning in information retrieval</span></p>
3166
3167
3168
3169	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Sally Jo
3170
3171	Cunningham, James <span class=SpellE>Littin</span>, Ian H. <span class=SpellE>Witten</span></span></p>
3172
3173
3174
3175	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Information
3176
3177	retrieval systems provide access to collections of thousands, or millions, of
3178
3179	documents, from which, by providing an appropriate description, users can
3180
3181	recover any one.<span style='mso-spacerun:yes'> </span>Typically, users <span
3182
3183	class=SpellE>iteratively</span> refine the descriptions they provide to satisfy
3184
3185	their needs, and retrieval systems can <span class=SpellE>utilize</span> user
3186
3187	feedback on selected documents to indicate the accuracy of the description at
3188
3189	any stage.<span style='mso-spacerun:yes'> </span>The style of description
3190
3191	required from the user, and the way it is employed to search the document
3192
3193	database, are consequences of the indexing method used for the collection.<span
3194
3195	style='mso-spacerun:yes'> </span>The index may take different forms, from
3196
3197	storing keywords with links to individual documents, to clustering documents
3198
3199	under related topics.</span></p>
3200
3201
3202
3203
3204
3205
3206
3207
3208
3209
3210
3211
3212
3213
3214
3215	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/7</span></p>
3216
3217
3218
3219	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Computer
3220
3221	concepts without computers:<span style='mso-spacerun:yes'> </span>a first
3222
3223	course in computer science</span></p>
3224
3225
3226
3227	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Geoffrey Holmes,
3228
3229	Tony C. Smith, William J. Rogers</span></p>
3230
3231
3232
3233	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>While some
3234
3235	institutions seek to make CS1 curricula more enjoyable by incorporating
3236
3237	specialised educational software [1] or by setting more enjoyable programming
3238
3239	assignments [2], we have joined the growing number of Computer Science
3240
3241	departments that seek to improve the quality of the CS1 experience by focusing
3242
3243	student attention away from the computer monitor [3,4].<span
3244
3245	style='mso-spacerun:yes'> </span>Sophisticated computing concepts usually
3246
3247	reserved for senior level courses are presented in a <i style='mso-bidi-font-style:normal'>popular
3248
3249	science</i> manner, and given equal time alongside the essential
3250
3251	introductory programming material.<span style='mso-spacerun:yes'> </span>By
3252
3253	exposing students to a broad range of specific computational problems we
3254
3255	endeavour to make the introductory course more interesting and enjoyable, and
3256
3257	instil in students a sense of vision for areas they might specialise in as
3258
3259	computing majors.</span></p>
3260
3261
3262
3263
3264
3265
3266
3267
3268
3269
3270
3271
3272
3273
3274
3275	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/8</span></p>
3276
3277
3278
3279	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A sight-singing
3280
3281	tutor</span></p>
3282
3283
3284
3285	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Lloyd A. Smith, <span
3286
3287	class=SpellE>Rodger</span> J. <span class=SpellE>McNab</span></span></p>
3288
3289
3290
3291	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
3292
3293	describes a computer program designed to aid its users in learning to
3294
3295	sight-sing.<span style='mso-spacerun:yes'> </span>Sight-singing, the ability to
3296
3297	sing music from a score without prior study, is an important skill for musicians
3298
3299	and holds a central place in most university music curricula.<span
3300
3301	style='mso-spacerun:yes'> </span>Its importance to vocalists is obvious; it is
3302
3303	also an important skill for instrumentalists and conductors because it develops
3304
3305	the aural imagination necessary to judge how the music should sound, when
3306
3307	played (<span class=SpellE>Benward</span> and Carr 1991).<span
3308
3309	style='mso-spacerun:yes'> </span>Furthermore, it is an important skill for
3310
3311	amateur musicians, who can save a great deal of rehearsal time through an
3312
3313	ability to sing music at sight.</span></p>
3314
3315
3316
3317
3318
3319
3320
3321
3322
3323
3324
3325	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/9</span></p>
3326
3327
3328
3329	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Stacking bagged
3330
3331	and <span class=SpellE>dagged</span> models</span></p>
3332
3333
3334
3335	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Kai <span
3336
3337	class=SpellE>Ming</span> Ting, I.H. <span class=SpellE>Witten</span></span></p>
3338
3339
3340
3341	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>In this paper,
3342
3343	we investigate the method of <i style='mso-bidi-font-style:normal'>stacked <span
3344
3345	class=SpellE>generalization</span></i> in combining models derived from
3346
3347	different subsets of a training <span class=SpellE>dataset</span> by a single
3348
3349	learning algorithm, as well as different algorithms.<span
3350
3351	style='mso-spacerun:yes'> </span>The simplest way to combine predictions from
3352
3353	competing models is majority vote, and the effect of the sampling regime used
3354
3355	to generate training subsets has already been studied in this context: when
3356
3357	bootstrap samples are used the method is called <i style='mso-bidi-font-style:
3358
3359	normal'>bagging</i>, and for disjoint samples we call it <span class=SpellE><i
3360
3361	style='mso-bidi-font-style:normal'>dagging</i></span>.<span
3362
3363	style='mso-spacerun:yes'> </span>This paper extends these studies to stacked <span
3364
3365	class=SpellE>generalization</span>, where a learning algorithm is employed to combine
3366
3367	the models.<span style='mso-spacerun:yes'> </span>This yields new methods
3368
3369	dubbed <i style='mso-bidi-font-style:normal'>bag-stacking</i> and <span
3370
3371	class=SpellE><i style='mso-bidi-font-style:normal'>dag</i></span><i
3372
3373	style='mso-bidi-font-style:normal'>-stacking</i>.</span></p>
3374
3375
3376
3377
3378
3379
3380
3381	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We demonstrate
3382
3383	that bag-stacking and <span class=SpellE>dag</span>-stacking can be effective
3384
3385	for classification tasks even when the training samples cover just a small
3386
3387	fraction of the full <span class=SpellE>dataset</span>.<span
3388
3389	style='mso-spacerun:yes'> </span>In contrast to earlier bagging results, we
3390
3391	show that bagging and bag-stacking work for stable as well as unstable learning
3392
3393	algorithms, as do <span class=SpellE>dagging</span> and <span class=SpellE>dag</span>-stacking.<span
3394
3395	style='mso-spacerun:yes'> </span>We find that bag-stacking (<span
3396
3397	class=SpellE>dag</span>-stacking) almost always has higher predictive accuracy
3398
3399	than bagging (<span class=SpellE>dagging</span>), and we also show that
3400
3401	bag-stacking models derived using two different algorithms is more effective
3402
3403	than bagging.</span></p>
3404
3405
3406
3407
3408
3409
3410
3411
3412
3413
3414
3415	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/10</span></p>
3416
3417
3418
3419	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Extracting text
3420
3421	from PostScript</span></p>
3422
3423
3424
3425	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Craig <span
3426
3427	class=SpellE>Nevill</span>-Manning, Todd Reed, Ian H. <span class=SpellE>Witten</span></span></p>
3428
3429
3430
3431	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We show how to
3432
3433	extract plain text from PostScript files. A textual scan is inadequate because
3434
3435	PostScript interpreters can generate characters on the page that do not appear
3436
3437	in the source file. Furthermore, word and line breaks are implicit in the
3438
3439	graphical rendition, and must be inferred from the positioning of word
3440
3441	fragments. We present a robust technique for extracting text and <span
3442
3443	class=SpellE>recognizing</span> words and paragraphs. The method uses a
3444
3445	standard PostScript interpreter but redefines several PostScript operators, and
3446
3447	simple heuristics are employed to locate word and line breaks. The scheme has
3448
3449	been used to create a full-text index, and plain-text versions, of 40,000
3450
3451	technical reports (34 <span class=SpellE>Gbyte</span> of PostScript). Other
3452
3453	text-extraction systems are reviewed: none offer the same combination of
3454
3455	robustness and simplicity.</span></p>
3456
3457
3458
3459
3460
3461
3462
3463
3464
3465
3466
3467	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/11</span></p>
3468
3469
3470
3471	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Gathering and
3472
3473	indexing rich fragments of the World Wide Web</span></p>
3474
3475
3476
3477	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Geoffrey Holmes,
3478
3479	William J Rogers</span></p>
3480
3481
3482
3483	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>While the World
3484
3485	Wide Web (WWW) is an attractive option as a resource for teaching and research,
3486
3487	it does have some undesirable features. The cost of allowing students unlimited
3488
3489	access can be high, both in money and time; students may become addicted to
3490
3491	'surfing' the web (exploring purely for entertainment) and jeopardise their
3492
3493	studies. Students are likely to discover undesirable material because large
3494
3495	scale search engines index sites regardless of their merit. Finally, the
3496
3497	explosive growth of WWW usage means that servers and networks are often
3498
3499	overloaded, to the extent that a student may gain a very negative view of the
3500
3501	technology.</span></p>
3502
3503
3504
3505
3506
3507
3508
3509	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We have developed
3510
3511	a piece of software which attempts to address these issues by capturing rich
3512
3513	fragments of the WWW onto local storage media. It is possible to put a
3514
3515	collection onto CD ROM, providing portability and inexpensive storage. This
3516
3517	enables the presentation of the WWW to distance learning students, who do not
3518
3519	have internet access. The software interfaces to standard, commonly available
3520
3521	web browsers, acting as a proxy server to the files stored on the local media,
3522
3523	and provides a search engine giving full text searching capability within the
3524
3525	collection.</span></p>
3526
3527
3528
3529
3530
3531
3532
3533
3534
3535
3536
3537
3538
3539
3540
3541	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/12</span></p>
3542
3543
3544
3545	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Using model
3546
3547	trees for classification</span></p>
3548
3549
3550
3551	<p class=MsoNormal style='margin-right:-.4pt'><span class=SpellE><span
3552
3553	lang=EN-GB>Eibe</span></span><span lang=EN-GB> Frank, Yong Wang, Stuart <span
3554
3555	class=SpellE>Inglis</span>, Geoffrey Holmes, Ian H. <span class=SpellE>Witten</span></span></p>
3556
3557
3558
3559	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Model trees,
3560
3561	which are a type of decision tree with linear regression functions at the
3562
3563	leaves, form the basis of a recent successful technique for predicting
3564
3565	continuous numeric values.<span style='mso-spacerun:yes'> </span>They can be
3566
3567	applied to classification problems by employing a standard method of
3568
3569	transforming a classification problem into a problem of function
3570
3571	approximation.<span style='mso-spacerun:yes'> </span>Surprisingly, using this
3572
3573	simple transformation the model tree <span class=SpellE>inducer</span> M5',
3574
3575	based on <span class=SpellE>Quinlan's</span> M5, generates more accurate
3576
3577	classifiers than the state-of-the-art decision tree learner C5.0, particularly
3578
3579	when most of the attributes are numeric.</span></p>
3580
3581
3582
3583
3584
3585
3586
3587
3588
3589
3590
3591	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/13</span></p>
3592
3593
3594
3595	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Discovering inter-attribute
3596
3597	relationships</span></p>
3598
3599
3600
3601	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Geoffrey Holmes</span></p>
3602
3603
3604
3605	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>It is important
3606
3607	to discover relationships between attributes being used to predict a class
3608
3609	attribute in supervised learning situations for two reasons.<span
3610
3611	style='mso-spacerun:yes'> </span>First, any such relationship will be
3612
3613	potentially interesting to the provider of a <span class=SpellE>dataset</span>
3614
3615	in its own right.<span style='mso-spacerun:yes'> </span>Second, it would
3616
3617	simplify a learning algorithm's search space, and the related irrelevant
3618
3619	feature and subset selection problem, if the relationships were removed from <span
3620
3621	class=SpellE>datasets</span> ahead of learning.<span style='mso-spacerun:yes'>
3622
3623	</span>An algorithm to discover such relationships is presented in this
3624
3625	paper.<span style='mso-spacerun:yes'> </span>The algorithm is described and a
3626
3627	surprising number of inter-attribute relationships are discovered in <span
3628
3629	class=SpellE>datasets</span> from the University of California at Irvine (UCI)
3630
3631	repository.</span></p>
3632
3633
3634
3635
3636
3637
3638
3639
3640
3641
3642
3643	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/14</span></p>
3644
3645
3646
3647	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Learning from <span
3648
3649	class=SpellE>batched</span> data:<span style='mso-spacerun:yes'> </span>model
3650
3651	combination <span class=SpellE>vs</span> data combination</span></p>
3652
3653
3654
3655	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Kai <span
3656
3657	class=SpellE>Ming</span> Ting, Boon <span class=SpellE>Toh</span> Low, Ian H. <span
3658
3659	class=SpellE>Witten</span></span></p>
3660
3661
3662
3663	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>When presented
3664
3665	with multiple batches of data, one can either combine them into a single batch
3666
3667	before applying a machine learning procedure or learn from each batch
3668
3669	independently and combine the resulting models.<span style='mso-spacerun:yes'>
3670
3671	</span>The former procedure, data combination, is straightforward; this paper
3672
3673	investigates the latter, model combination.<span style='mso-spacerun:yes'>
3674
3675	</span>Given an appropriate combination method, one might expect model
3676
3677	combination to prove superior when the data in each batch was obtained under
3678
3679	somewhat different conditions or when different learning algorithms were used
3680
3681	on the batches.<span style='mso-spacerun:yes'> </span>Empirical results show
3682
3683	that model combination often outperforms data combination even when the batches
3684
3685	are drawn randomly from a single source of data and the same learning method is
3686
3687	used on each.<span style='mso-spacerun:yes'> </span>Moreover, this is not just
3688
3689	an <span class=SpellE>artifact</span> of one particular method of combining
3690
3691	models: it occurs with several different combination methods.</span></p>
3692
3693
3694
3695
3696
3697
3698
3699	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We relate this
3700
3701	phenomenon to the learning curve of the classifiers being used.<span
3702
3703	style='mso-spacerun:yes'> </span>Early in the learning process when the
3704
3705	learning curve is steep there is much to gain from data combination, but later
3706
3707	when it becomes shallow there is less to gain and model combination achieves a
3708
3709	greater reduction in variance and hence a lower error rate.</span></p>
3710
3711
3712
3713
3714
3715
3716
3717	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The practical
3718
3719	implication of these results is that one should consider using model
3720
3721	combination rather than data combination, especially when multiple batches of
3722
3723	data for the same task are readily available.<span style='mso-spacerun:yes'>
3724
3725	</span>It is often superior even when the batches are drawn randomly from a
3726
3727	single sample, and we expect its advantage to increase if genuine statistical
3728
3729	differences between the batches exist.</span></p>
3730
3731
3732
3733
3734
3735
3736
3737
3738
3739
3740
3741	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/15</span></p>
3742
3743
3744
3745	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Information
3746
3747	seeking, retrieval, reading and storing behaviour of library users</span></p>
3748
3749
3750
3751	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Turner K.</span></p>
3752
3753
3754
3755	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>In the interest
3756
3757	of digital libraries, it is advisable that designers be aware of the potential
3758
3759	behaviour of the users of such a system.<span style='mso-spacerun:yes'>
3760
3761	</span>There are two distinct parts under investigation: the interaction
3762
3763	between traditional libraries involving the seeking and retrieval of relevant
3764
3765	material, and the reading and storage behaviours ensuing. Through this
3766
3767	analysis, the findings could be incorporated into digital library facilities.
3768
3769	There have been copious amounts of research on information seeking leading to
3770
3771	the development of behavioural models to describe the process. Often research
3772
3773	on the information seeking practices of individuals is based on the task and
3774
3775	field of study. The information seeking model, presented by Ellis et al.
3776
3777	(1993), characterises the format of this study where it is used to compare
3778
3779	various research on the information seeking practices of groups of people (from
3780
3781	academics to professionals). It is found that, although researchers do make use
3782
3783	of library facilities, they tend to rely heavily on their own collections and
3784
3785	primarily use the library as a source for previously identified information,
3786
3787	browsing and <span class=SpellE>interloan</span>. It was found that there are
3788
3789	significant differences in user behaviour between the groups analysed. When
3790
3791	looking at the reading and storage of material it was hard to draw conclusions,
3792
3793	due to the lack of substantial research and information on the topic. However,
3794
3795	through the use of reading strategies, a general idea on how readers behave can
3796
3797	be developed. Designers of digital libraries can benefit from the guidelines
3798
3799	presented here to better understand their audience.</span></p>
3800
3801
3802
3803
3804
3805
3806
3807
3808
3809
3810
3811	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/16</span></p>
3812
3813
3814
3815	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Proceedings of
3816
3817	the INTERACT97 Combined Workshop on CSCW in HCI-<span class=SpellE>Worldwide</span></span></p>
3818
3819
3820
3821	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Matthias <span
3822
3823	class=SpellE>Rauterberg</span>, Lars <span class=SpellE>Oestreicher</span>,
3824
3825	John <span class=SpellE>Grundy</span></span></p>
3826
3827
3828
3829	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This is the
3830
3831	proceedings for the INTERACT97 combined workshop on "CSCW in HCI-<span
3832
3833	class=SpellE>worldwide</span>".<span style='mso-spacerun:yes'> </span>The
3834
3835	position papers in this proceedings are those selected from topics relating to
3836
3837	HCI community development <span class=SpellE>worldwide</span> and to CSCW
3838
3839	issues.<span style='mso-spacerun:yes'> </span>Originally these were to be two
3840
3841	separate INTERACT workshops, but were combined to ensure sufficient
3842
3843	participation for a combined workshop to run.</span></p>
3844
3845
3846
3847
3848
3849
3850
3851	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The combined
3852
3853	workshop has been split into two separate sessions to run in the morning of
3854
3855	July 15<sup>th</sup>, Sydney, Australia.<span style='mso-spacerun:yes'>
3856
3857	</span>One session discusses the issues relating to the position papers focusing on
3858
3859	general CSCW systems, the other the development of HCI communities in a <span
3860
3861	class=SpellE>worldwide</span> context.<span style='mso-spacerun:yes'>
3862
3863	</span>The CSCW session uses as a case study a proposed <span class=SpellE>groupware</span>
3864
3865	tool for facilitating the development of an HCI database with a <span
3866
3867	class=SpellE>worldwide</span> geographical distribution.<span
3868
3869	style='mso-spacerun:yes'> </span>The HCI community session focuses on
3870
3871	developing the content for such a database, in order for it to foster the
3872
3873	continued development of HCI communities.<span style='mso-spacerun:yes'>
3874
3875	</span>The afternoon session of the combined workshop involves a joint
3876
3877	discussion of the case study <span class=SpellE>groupware</span> tool, in terms
3878
3879	of its content and likely <span class=SpellE>groupware</span> facilities.</span></p>
3880
3881
3882
3883
3884
3885
3886
3887	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The position
3888
3889	papers have been grouped into those focusing on HCI communities and hence
3890
3891	content issues for a <span class=SpellE>groupware</span> database, and those focusing
3892
3893	on CSCW and <span class=SpellE>groupware</span> issues, and hence likely <span
3894
3895	class=SpellE>groupware</span> support in the proposed HCI
3896
3897	database/collaboration tools.<span style='mso-spacerun:yes'> </span>We hope
3898
3899	that you find the position papers in this proceedings offer a wide range of
3900
3901	interesting reports of HCI community development <span class=SpellE>worldwide</span> and of
3902
3903	leading CSCW system research, and that a <span class=SpellE>groupware</span>
3904
3905	tool supporting aspects of a <span class=SpellE>worldwide</span> HCI database
3906
3907	can draw upon the varied work reported.</span></p>
3908
3909
3910
3911
3912
3913
3914
3915
3916
3917
3918
3919
3920
3921
3922
3923	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/17</span></p>
3924
3925
3926
3927	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Internationalising
3928
3929	a spreadsheet for Pacific Basin languages</span></p>
3930
3931
3932
3933	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Robert <span
3934
3935	class=SpellE>Barbour</span>, Alvin <span class=SpellE>Yeo</span></span></p>
3936
3937
3938
3939	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>As people trade
3940
3941	and engage in commerce, an economically dominant culture tends to migrate
3942
3943	language into other recently contacted cultures.<span
3944
3945	style='mso-spacerun:yes'> </span>Information technology (IT) can accelerate <span
3946
3947	class=SpellE>enculturation</span> and promote the expansion of western hegemony
3948
3949	in IT.<span style='mso-spacerun:yes'> </span>Equally, IT can present a
3950
3951	culturally appropriate interface to the user that promotes the preservation of
3952
3953	culture and language with very little additional effort.<span
3954
3955	style='mso-spacerun:yes'> </span>In this paper a spreadsheet is
3956
3957	internationalised to accept languages from the Latin-1 character set such as
3958
3959	English, <span class=SpellE>Maori</span> and <span class=SpellE>Bahasa</span> <span
3960
3961	class=SpellE>Melayu</span> (Malaysia's national language).<span
3962
3963	style='mso-spacerun:yes'> </span>A technique that allows a non-programmer to
3964
3965	add a new language to the spreadsheet is described.<span
3966
3967	style='mso-spacerun:yes'> </span>The technique could also be used to
3968
3969	internationalise other software at the point of design by following the steps
3970
3971	we outline.</span></p>
3972
3973
3974
3975
3976
3977
3978
3979
3980
3981
3982
3983
3984
3985
3986
3987	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/18</span></p>
3988
3989
3990
3991	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Localising a
3992
3993	spreadsheet:<span style='mso-spacerun:yes'> </span>an <span class=SpellE>Iban</span>
3994
3995	example</span></p>
3996
3997
3998
3999	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Alvin <span
4000
4001	class=SpellE>Yeo</span>, Robert <span class=SpellE>Barbour</span></span></p>
4002
4003
4004
4005	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Presently, there
4006
4007	is little localisation of software to smaller cultures if it is not
4008
4009	economically viable.<span style='mso-spacerun:yes'> </span>We believe software
4010
4011	should also be localised to the languages of small cultures in order to sustain
4012
4013	and preserve these small cultures.<span style='mso-spacerun:yes'> </span>As an
4014
4015	example, we localised a spreadsheet from English to <span class=SpellE>Iban</span>.<span
4016
4017	style='mso-spacerun:yes'> </span>The process by which we carried out the
4018
4019	localisation can be used as a framework for the localisation of software to
4020
4021	languages of small ethnic minorities.<span style='mso-spacerun:yes'>
4022
4023	</span>Some problems faced during the localisation process are also discussed.</span></p>
4024
4025
4026
4027
4028
4029
4030
4031
4032
4033
4034
4035
4036
4037
4038
4039	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/19</span></p>
4040
4041
4042
4043	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Strategies of
4044
4045	internationalisation and localisation: a postmodernist's perspective</span></p>
4046
4047
4048
4049	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Alvin <span
4050
4051	class=SpellE>Yeo</span>, Robert <span class=SpellE>Barbour</span></span></p>
4052
4053
4054
4055	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Many software
4056
4057	companies today are developing software not only for local consumption but for
4058
4059	the rest of the world.<span style='mso-spacerun:yes'> </span>We introduce the
4060
4061	concepts of internationalisation and localisation and discuss some techniques
4062
4063	using these processes.<span style='mso-spacerun:yes'> </span>An examination of
4064
4065	<span class=SpellE>postmodern</span> critique with respect to the software
4066
4067	industry is also reported.<span style='mso-spacerun:yes'> </span>In addition,
4068
4069	we feature our proposed internationalisation technique that was inspired
4070
4071	by taking into account the research of <span class=SpellE>postmodern</span>
4072
4073	philosophers and mathematicians.<span style='mso-spacerun:yes'> </span>As illustrated
4074
4075	in our prototype, the technique empowers non-programmers to localise their own
4076
4077	software.<span style='mso-spacerun:yes'> </span>Further development of the
4078
4079	technique and its implications for user interfaces and the future of software
4080
4081	internationalisation and localisation are discussed.</span></p>
4082
4083
4084
4085
4086
4087
4088
4089
4090
4091
4092
4093	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/20</span></p>
4094
4095
4096
4097	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Language use in
4098
4099	software</span></p>
4100
4101
4102
4103	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Alvin <span
4104
4105	class=SpellE>Yeo</span>, Robert <span class=SpellE>Barbour</span></span></p>
4106
4107
4108
4109	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Much of the
4110
4111	popular software we use today is in English.<span style='mso-spacerun:yes'>
4112
4113	</span>Very few software applications are available in minority languages.<span
4114
4115	style='mso-spacerun:yes'> </span>Besides economic goals, we justify why
4116
4117	software should be made available to smaller cultures.<span
4118
4119	style='mso-spacerun:yes'> </span>Furthermore, there is evidence that people
4120
4121	learn and progress faster in software in their mother tongue (<span
4122
4123	class=SpellE>Griffiths</span> et al., 1994) (<span class=SpellE>Krock</span>,
4124
4125	1996).<span style='mso-spacerun:yes'> </span>We hypothesise that experienced
4126
4127	users of an English spreadsheet can easily migrate to a spreadsheet in their
4128
4129	native tongue i.e. <span class=SpellE>Bahasa</span> <span class=SpellE>Melayu</span>
4130
4131	(Malaysia's national language).<span style='mso-spacerun:yes'>
4132
4133	</span>Observations made in the study suggest that the native speakers of <span
4134
4135	class=SpellE>Bahasa</span> <span class=SpellE>Melayu</span> had difficulties
4136
4137	with the <span class=SpellE>Bahasa</span> <span class=SpellE>Melayu</span>
4138
4139	interface.<span style='mso-spacerun:yes'> </span>The subjects' main difficulty
4140
4141	was their unfamiliarity with computing terminology in <span class=SpellE>Bahasa</span>
4142
4143	<span class=SpellE>Melayu</span>.<span style='mso-spacerun:yes'> </span>We
4144
4145	present possible strategies to increase the use of <span class=SpellE>Bahasa</span>
4146
4147	<span class=SpellE>Melayu</span> in IT.<span style='mso-spacerun:yes'>
4148
4149	</span>These strategies may also be used to promote the use of other minority
4150
4151	languages in IT.</span></p>
4152
4153
4154
4155
4156
4157
4158
4159
4160
4161
4162
4163
4164
4165
4166
4167	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/21</span></p>
4168
4169
4170
4171	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Usability
4172
4173	testing:<span style='mso-spacerun:yes'> </span>a Malaysian study</span></p>
4174
4175
4176
4177	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Alvin <span
4178
4179	class=SpellE>Yeo</span>, Robert <span class=SpellE>Barbour</span>, Mark <span
4180
4181	class=SpellE>Apperley</span></span></p>
4182
4183
4184
4185	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>An exploratory
4186
4187	study of software assessment techniques is conducted in Malaysia.<span
4188
4189	style='mso-spacerun:yes'> </span>Subjects in the study comprised staff members
4190
4191	of a Malaysian university with a high Information Technology (IT) presence.<span
4192
4193	style='mso-spacerun:yes'> </span>The subjects assessed a spreadsheet tool with
4194
4195	a <span class=SpellE>Bahasa</span> <span class=SpellE>Melayu</span> (Malaysia’s
4196
4197	national language) interface.<span style='mso-spacerun:yes'> </span>Software
4198
4199	evaluation techniques used include the think aloud method, interviews and the
4200
4201	System Usability Scale.<span style='mso-spacerun:yes'> </span>The responses in
4202
4203	the various techniques used are reported and initial results indicate
4204
4205	idiosyncratic behaviour of Malaysian subjects.<span style='mso-spacerun:yes'>
4206
4207	</span>The implications of the findings are also discussed.</span></p>
4208
4209
4210
4211
4212
4213
4214
4215
4216
4217
4218
4219
4220
4221
4222
4223	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/22</span></p>
4224
4225
4226
4227	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Inducing
4228
4229	cost-sensitive trees via instance-weighting</span></p>
4230
4231
4232
4233	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Kai <span
4234
4235	class=SpellE>Ming</span> Ting</span></p>
4236
4237
4238
4239	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>We introduce an
4240
4241	instance-weighting method to induce cost-sensitive trees in this paper.<span
4242
4243	style='mso-spacerun:yes'> </span>It is a <span class=SpellE>generalization</span>
4244
4245	of the standard tree induction process where only the initial instance weights
4246
4247	determine the type of tree (i.e., minimum error trees or minimum cost trees) to
4248
4249	be induced.<span style='mso-spacerun:yes'> </span>We demonstrate that it can
4250
4251	be easily adapted to an existing tree learning algorithm.</span></p>
4252
4253
4254
4255
4256
4257
4258
4259	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Previous
4260
4261	research gave insufficient evidence to support the claim that the greedy
4262
4263	divide-and-conquer algorithm can effectively induce a truly cost-sensitive tree
4264
4265	directly from the training data.<span style='mso-spacerun:yes'> </span>We
4266
4267	provide this empirical evidence in this paper.<span style='mso-spacerun:yes'>
4268
4269	</span>The algorithm employing the instance-weighting method is found to be
4270
4271	comparable to or better than both C4.5 and C5 in terms of total
4272
4273	misclassification costs, tree size and the number of high cost errors.<span
4274
4275	style='mso-spacerun:yes'> </span>The instance-weighting method is also simpler
4276
4277	and more effective in implementation than a method based on altered priors.</span></p>
4278
4279
4280
4281
4282
4283
4284
4285
4286
4287
4288
4289	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/23</span></p>
4290
4291
4292
4293	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Fast convergence
4294
4295	with a greedy tag-phrase dictionary</span></p>
4296
4297
4298
4299	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Ross <span
4300
4301	class=SpellE>Peeters</span>, Tony C. Smith</span></p>
4302
4303
4304
4305	<p class=MsoBodyText><span lang=EN-US>The best general-purpose compression
4306
4307	schemes make their gains by estimating a probability distribution over all
4308
4309	possible next symbols given the context established by some number of previous
4310
4311	symbols.<span style='mso-spacerun:yes'> </span>Such context models typically
4312
4313	obtain good compression results for plain text by taking advantage of
4314
4315	regularities in character sequences.<span style='mso-spacerun:yes'>
4316
4317	</span>Frequent words and syllables can be incorporated into the model quickly
4318
4319	and thereafter used for reasonably accurate prediction.<span
4320
4321	style='mso-spacerun:yes'> </span>However, the precise context in which
4322
4323	frequent patterns emerge is often extremely varied, and each new word or phrase
4324
4325	immediately introduces new contexts which can adversely affect the compression
4326
4327	rate.</span></p>
4328
4329
4330
4331
4332
4333
4334
4335	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>A great deal of
4336
4337	the structural regularity in a natural language is given rather more by
4338
4339	properties of its grammar than by the orthographic transcription of its
4340
4341	phonology.<span style='mso-spacerun:yes'> </span>This implies that access to a
4342
4343	grammatical abstraction might lead to good compression.<span
4344
4345	style='mso-spacerun:yes'> </span>While grammatical models have been used
4346
4347	successfully for compressing computer programs [4], grammar-based compression
4348
4349	of plain text has received little attention, primarily because of the
4350
4351	difficulties associated with constructing a suitable natural language
4352
4353	grammar.<span style='mso-spacerun:yes'> </span>But even without a precise
4354
4355	formulation of the syntax of a language, there is a linguistic abstraction
4356
4357	which is easily accessed and which demonstrates a high degree of regularity
4358
4359	which can be exploited for compression purposes: namely, lexical categories.</span></p>
4360
4361
4362
4363
4364
4365
4366
4367
4368
4369
4370
4371	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/24</span></p>
4372
4373
4374
4375	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Tag based models
4376
4377	of English text</span></p>
4378
4379
4380
4381	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>W. J. <span
4382
4383	class=SpellE>Teahan</span>, John G. <span class=SpellE>Cleary</span></span></p>
4384
4385
4386
4387	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The problem of
4388
4389	compressing English text is important both because of the ubiquity of English
4390
4391	as a target for compression and because of the light that compression can shed
4392
4393	on the structure of English.<span style='mso-spacerun:yes'> </span>English
4394
4395	text is examined in conjunction with additional information about the parts of
4396
4397	speech of each word in the text (these are referred to as “tags”).<span
4398
4399	style='mso-spacerun:yes'> </span>It is shown that the tags plus the text can
4400
4401	be compressed more than the text alone.<span style='mso-spacerun:yes'>
4402
4403	</span>Essentially the tags can be compressed for nothing or even a small net
4404
4405	saving in size.<span style='mso-spacerun:yes'> </span>A comparison is made of
4406
4407	a number of different ways of integrating compression of tags and text using an
4408
4409	escape mechanism similar to PPM.<span style='mso-spacerun:yes'> </span>These
4410
4411	are also compared with standard word based and character based compression
4412
4413	programs.<span style='mso-spacerun:yes'> </span>The result is that the tag
4414
4415	character and word based schemes always outperform the character based
4416
4417	schemes.<span style='mso-spacerun:yes'> </span>Overall, the tag based schemes
4418
4419	outperform the word based schemes.<span style='mso-spacerun:yes'> </span>We
4420
4421	conclude by conjecturing that tags chosen for compression rather than
4422
4423	linguistic purposes would perform even better.</span></p>
4424
4425
4426
4427
4428
4429
4430
4431
4432
4433
4434
4435	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/25</span></p>
4436
4437
4438
4439	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Musical image
4440
4441	compression</span></p>
4442
4443
4444
4445	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>David
4446
4447	Bainbridge, Stuart <span class=SpellE>Inglis</span></span></p>
4448
4449
4450
4451	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Optical music
4452
4453	recognition aims to convert the vast repositories of sheet music in the world
4454
4455	into an on-line digital format [Bai97].<span style='mso-spacerun:yes'>
4456
4457	</span>In the near future it will be possible to assimilate music into digital
4458
4459	libraries and users will be able to perform searches based on a sung melody in
4460
4461	addition to typical text-based searching [MSW+96].<span
4462
4463	style='mso-spacerun:yes'> </span>An important requirement for such a system is
4464
4465	the ability to reproduce the original score as accurately as possible.<span
4466
4467	style='mso-spacerun:yes'> </span>Due to the huge amount of sheet music
4468
4469	available, the efficient storage of musical images is an important topic of
4470
4471	study.</span></p>
4472
4473
4474
4475
4476
4477
4478
4479	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
4480
4481	investigates whether the “knowledge” extracted from the optical music
4482
4483	recognition (OMR) process can be exploited to gain higher compression than the
4484
4485	JBIG international standard for <span class=SpellE>bi</span>-level image
4486
4487	compression.<span style='mso-spacerun:yes'> </span>We present a hybrid
4488
4489	approach where the primitive shapes of music extracted by the optical music
4490
4491	recognition process (note heads, note stems, staff lines and so forth) are fed
4492
4493	into a graphical symbol based compression scheme originally designed for images
4494
4495	containing mainly printed text.<span style='mso-spacerun:yes'> </span>Using
4496
4497	this hybrid approach the average compression rate for a single page is improved
4498
4499	by 3.5% over JBIG.<span style='mso-spacerun:yes'> </span>When multiple pages with
4500
4501	similar typography are processed in sequence, the file size is decreased by
4502
4503	4-8%.</span></p>
4504
4505
4506
4507
4508
4509
4510
4511	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Section 2
4512
4513	presents the relevant background to both optical music recognition and textual
4514
4515	image compression.<span style='mso-spacerun:yes'> </span>Section 3 describes
4516
4517	the experiments performed on 66 test images, outlining the combinations of
4518
4519	parameters that were examined to give the best results.<span
4520
4521	style='mso-spacerun:yes'> </span>The initial results and refinements are
4522
4523	presented in Section 4, and we conclude in the last section by <span
4524
4525	class=SpellE>summarizing</span> the findings of this work.</span></p>
4526
4527
4528
4529
4530
4531
4532
4533
4534
4535
4536
4537
4538
4539
4540
4541	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/26</span></p>
4542
4543
4544
4545	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Correcting English
4546
4547	text using PPM models</span></p>
4548
4549
4550
4551	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>W. J. <span
4552
4553	class=SpellE>Teahan</span>, S. <span class=SpellE>Inglis</span>, J. G. <span
4554
4555	class=SpellE>Cleary</span>, G. Holmes</span></p>
4556
4557
4558
4559	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>An essential
4560
4561	component of many applications in natural language processing is a language <span
4562
4563	class=SpellE>modeler</span> able to correct errors in the text being
4564
4565	processed.<span style='mso-spacerun:yes'> </span>For optical character recognition
4566
4567	(OCR), poor scanning quality or extraneous pixels in the image may cause one or
4568
4569	more characters to be mis-<span class=SpellE>recognized</span>; while for
4570
4571	spelling correction, two characters may be transposed, or a character may be
4572
4573	inadvertently inserted or missed out. </span></p>
4574
4575
4576
4577
4578
4579
4580
4581	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>This paper
4582
4583	describes a method for correcting English text using a PPM model.<span
4584
4585	style='mso-spacerun:yes'> </span>A method that segments words in English text
4586
4587	is introduced and is shown to be a significant improvement over previously used
4588
4589	methods.<span style='mso-spacerun:yes'> </span>A similar technique is also
4590
4591	applied as a post-processing stage after pages have been <span class=SpellE>recognized</span>
4592
4593	by a state-of-the-art commercial OCR system.<span style='mso-spacerun:yes'>
4594
4595	</span>We show that the accuracy of the OCR system can be increased from 95.9%
4596
4597	to 96.6%, a decrease of about 10 errors per page.</span></p>
4598
4599
4600
4601
4602
4603
4604
4605
4606
4607
4608
4609
4610
4611
4612
4613
4614
4615
4616
4617	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/27</span></p>
4618
4619
4620
4621	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Constraints on
4622
4623	parallelism beyond 10 instructions per cycle</span></p>
4624
4625
4626
4627	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>John G. <span
4628
4629	class=SpellE>Cleary</span>, Richard H. <span class=SpellE>Littin</span>, J. A.
4630
4631	David <span class=SpellE>McWha</span>, Murray W. Pearson</span></p>
4632
4633
4634
4635	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The problem of
4636
4637	extracting Instruction Level Parallelism at levels of 10 instructions per clock
4638
4639	and higher is considered.<span style='mso-spacerun:yes'> </span>Two different
4640
4641	architectures which use speculation on memory accesses to achieve this level of
4642
4643	performance are reviewed.<span style='mso-spacerun:yes'> </span>It is pointed
4644
4645	out that while this form of speculation gives high potential parallelism it is
4646
4647	necessary to retain execution state so that incorrect speculation can be detected
4648
4649	and subsequently squashed.<span style='mso-spacerun:yes'> </span>Simulation
4650
4651	results show that the space to store such state is a critical resource in
4652
4653	obtaining good speedup.<span style='mso-spacerun:yes'> </span>To make good use
4654
4655	of the space it is essential that state be stored efficiently and that it be
4656
4657	retired as soon as possible.<span style='mso-spacerun:yes'> </span>A number of
4658
4659	techniques for extracting the best usage from the available state storage are
4660
4661	introduced.</span></p>
4662
4663
4664
4665
4666
4667
4668
4669
4670
4671
4672
4673
4674
4675
4676
4677	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/28</span></p>
4678
4679
4680
4681	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Effects of
4682
4683	re-ordered memory operations on parallelism</span></p>
4684
4685
4686
4687	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>Richard H. <span
4688
4689	class=SpellE>Littin</span>, John G. <span class=SpellE>Cleary</span></span></p>
4690
4691
4692
4693	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>The performance
4694
4695	effect of permitting different memory operations to be re-ordered is
4696
4697	examined.<span style='mso-spacerun:yes'> </span>The available parallelism is
4698
4699	computed using a machine code simulator.<span style='mso-spacerun:yes'>
4700
4701	</span>A range of possible restrictions on the re-ordering of memory operations
4702
4703	is considered: from the purely sequential case where no re-ordering is
4704
4705	permitted; to the completely permissive one where memory operations may occur
4706
4707	in any order so that the parallelism is restricted only by data
4708
4709	dependencies.<span style='mso-spacerun:yes'> </span>A general conclusion is
4710
4711	drawn that to reliably obtain parallelism beyond 10 instructions per clock will
4712
4713	require an ability to re-order all memory instructions.<span
4714
4715	style='mso-spacerun:yes'> </span>A brief description of a feasible
4716
4717	architecture capable of this is given.</span></p>
4718
4719
4720
4721
4722
4723
4724
4725
4726
4727
4728
4729
4730
4731
4732
4733	<p class=MsoNormal style='margin-right:-.4pt'><span lang=EN-GB>97/29</span></p>
4734
4735
4736
4737	<p class=MsoNormal><span lang=EN-GB>OZCHI’96 Industry Session:<span
4738
4739	style='mso-spacerun:yes'> </span>Sixth Australian Conference on Human-Computer
4740
4741	Interaction</span></p>
4742
4743
4744
4745	<p class=MsoNormal><span lang=EN-GB>Edited by Chris Phillips, Janis <span
4746
4747	class=SpellE>McKauge</span></span></p>
4748
4749
4750
4751	<p class=MsoNormal><span lang=EN-GB>The idea for a specific industry session at
4752
4753	OZCHI was first mooted at the 1995 conference in <span class=SpellE>Wollongong</span>,
4754
4755	during questions following a session of short papers which happened
4756
4757	(serendipitously) to be presented by people from industry.<span
4758
4759	style='mso-spacerun:yes'> </span>An animated discussion took place, most of
4760
4761	which was about how OZCHI could be made more relevant to people in industry, be
4762
4763	it working as usability consultants, or working within organisations either as
4764
4765	usability professionals or as “champions of the cause”.<span
4766
4767	style='mso-spacerun:yes'> </span>The discussion raised more questions than
4768
4769	answers, about the format of such a session, about the challenges of
4770
4771	attracting industry participation, and about the best way of publishing the
4772
4773	results.<span style='mso-spacerun:yes'> </span>Although no real solutions were
4774
4775	arrived at, it was enough to place an industry session on the agenda for
4776
4777	OZCHI’96.</span></p>
4778
4779
4780
4781
4782
4783
4784
4785
4786
4787
4788
4789	<p class=MsoNormal><span lang=EN-GB>97/30</span></p>
4790
4791
4792
4793	<p class=MsoNormal><span lang=EN-GB>Adaptive models of English text</span></p>
4794
4795
4796
4797	<p class=MsoNormal><span lang=EN-GB>W. J. <span class=SpellE>Teahan</span>,
4798
4799	John G. <span class=SpellE>Cleary</span></span></p>
4800
4801
4802
4803	<p class=MsoNormal><span lang=EN-GB>High quality models of English text with
4804
4805	performance approaching that of humans are important for many applications
4806
4807	including spelling correction, speech recognition, OCR, and encryption.<span
4808
4809	style='mso-spacerun:yes'> </span>A number of different statistical models of
4810
4811	English are compared with each other and with previous estimates from human
4812
4813	subjects.<span style='mso-spacerun:yes'> </span>It is concluded that the best
4814
4815	current models are word based with part of speech tags.<span
4816
4817	style='mso-spacerun:yes'> </span>Given sufficient training text, they are able
4818
4819	to attain performance comparable to humans.</span></p>
4820
4821
4822
4823
4824
4825
4826
4827
4828
4829
4830
4831
4832
4833
4834
4835	<p class=MsoNormal><span lang=EN-GB>97/31</span></p>
4836
4837
4838
4839	<p class=MsoNormal><span lang=EN-GB>A graphical user interface for Boolean
4840
4841	query specification</span></p>
4842
4843
4844
4845	<p class=MsoNormal><span lang=EN-GB>Steve Jones, <span class=SpellE>Shona</span>
4846
4847	<span class=SpellE>McInnes</span></span></p>
4848
4849
4850
4851	<p class=MsoNormal><span lang=EN-GB>On-line information repositories commonly
4852
4853	provide keyword search facilities via textual query languages based on Boolean
4854
4855	logic.<span style='mso-spacerun:yes'> </span>However, there is evidence to
4856
4857	suggest that the syntactical demands of such languages can lead to user errors
4858
4859	and adversely affect the time that it takes users to form queries.<span
4860
4861	style='mso-spacerun:yes'> </span>Users also face difficulties because of the
4862
4863	conflict in semantics between AND <span class=SpellE>and</span> OR when used in
4864
4865	Boolean logic and English language.<span style='mso-spacerun:yes'> </span>We
4866
4867	suggest that graphical query languages, in particular Venn-like diagrams, can
4868
4869	alleviate the problems that users experience when forming Boolean expressions
4870
4871	with textual languages.<span style='mso-spacerun:yes'> </span>We describe <span
4872
4873	class=SpellE>Vquery</span>, a Venn-diagram based user interface to the New
4874
4875	Zealand Digital Library (NZDL).<span style='mso-spacerun:yes'> </span>The
4876
4877	design of <span class=SpellE>Vquery</span> has been partly motivated by
4878
4879	analysis of NZDL usage.<span style='mso-spacerun:yes'> </span>We found that
4880
4881	few queries contain more than three terms, use of the intersection operator
4882
4883	dominates and that query refinement is common.<span style='mso-spacerun:yes'>
4884
4885	</span>A study of the utility of Venn diagrams for query specification
4886
4887	indicates that with little or no training users can interpret and form
4888
4889	Venn-like diagrams which accurately correspond to Boolean expressions.<span
4890
4891	style='mso-spacerun:yes'> </span>The utility of <span class=SpellE>Vquery</span>
4892
4893	is considered and directions for future work are proposed.</span></p>
4894
4895
4896
4897
4898
4899
4900
4901
4902
4903
4904
4905	</div>
4906
4907
4908
4909
4910
4911
4912
4913
4914
4915	</Content>
4916	</Section>
4917	</Archive>
