1 | <?xml version="1.0" encoding="utf-8" standalone="no"?>
|
---|
2 | <!DOCTYPE Archive SYSTEM "http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
|
---|
3 | <Archive>
|
---|
4 | <Section>
|
---|
5 | <Description>
|
---|
6 | <Metadata name="gsdldoctype">indexed_doc</Metadata>
|
---|
7 | <Metadata name="Language">en</Metadata>
|
---|
8 | <Metadata name="Encoding">utf8</Metadata>
|
---|
9 | <Metadata name="Author">Bronwyn</Metadata>
|
---|
10 | <Metadata name="Title">biblio_for_dl_scientometrics.do</Metadata>
|
---|
11 | <Metadata name="URL">http://C:/Users/Anupama/GS307_13July2015/web/sites/localsite/collect/Word-PDF-Enhanced/tmp/1436775750/pdf03.html</Metadata>
|
---|
12 | <Metadata name="UTF8URL">http://C:/Users/Anupama/GS307_13July2015/web/sites/localsite/collect/Word-PDF-Enhanced/tmp/1436775750/pdf03.html</Metadata>
|
---|
13 | <Metadata name="gsdlsourcefilename">import\pdf03.pdf</Metadata>
|
---|
14 | <Metadata name="gsdlconvertedfilename">tmp\1436775750\pdf03.html</Metadata>
|
---|
15 | <Metadata name="OrigSource">pdf03.html</Metadata>
|
---|
16 | <Metadata name="Source">pdf03.pdf</Metadata>
|
---|
17 | <Metadata name="SourceFile">pdf03.pdf</Metadata>
|
---|
18 | <Metadata name="Plugin">PDFPlugin</Metadata>
|
---|
19 | <Metadata name="FileSize">35935</Metadata>
|
---|
20 | <Metadata name="FilenameRoot">pdf03</Metadata>
|
---|
21 | <Metadata name="FileFormat">PDF</Metadata>
|
---|
22 | <Metadata name="srcicon">_iconpdf_</Metadata>
|
---|
23 | <Metadata name="srclink_file">doc.pdf</Metadata>
|
---|
24 | <Metadata name="srclinkFile">doc.pdf</Metadata>
|
---|
25 | <Metadata name="NumPages">17</Metadata>
|
---|
26 | <Metadata name="dc.Creator">Sally Jo Cunningham</Metadata>
|
---|
27 | <Metadata name="dc.Title">Applications for Bibliometric Research in the Emerging Digital Libraries</Metadata>
|
---|
28 | <Metadata name="Identifier">HASH019c5dca7f5bb781460a6b9c</Metadata>
|
---|
29 | <Metadata name="lastmodified">1436763858</Metadata>
|
---|
30 | <Metadata name="lastmodifieddate">20150713</Metadata>
|
---|
31 | <Metadata name="oailastmodified">1436775750</Metadata>
|
---|
32 | <Metadata name="oailastmodifieddate">20150713</Metadata>
|
---|
33 | <Metadata name="assocfilepath">HASH019c.dir</Metadata>
|
---|
34 | <Metadata name="gsdlassocfile">doc.pdf:application/pdf:</Metadata>
|
---|
35 | </Description>
|
---|
36 | <Content>
|
---|
37 |
|
---|
38 |
|
---|
39 | <A name=1></a><b>Applications for Bibliometric Research</b><br>
|
---|
40 |
|
---|
41 |
|
---|
42 | <b>in the Emerging Digital Libraries</b><br>
|
---|
43 |
|
---|
44 |
|
---|
45 | Sally Jo Cunningham<br>
|
---|
46 |
|
---|
47 |
|
---|
48 | Department of Computer Science<br>
|
---|
49 |
|
---|
50 |
|
---|
51 | University of Waikato<br>
|
---|
52 |
|
---|
53 |
|
---|
54 | Hamilton, New Zealand<br>
|
---|
55 |
|
---|
56 |
|
---|
57 | email: [email protected]<br>
|
---|
58 |
|
---|
59 |
|
---|
60 | <b>Abstract:</b> Large numbers of research documents have recently become available on<br>
|
---|
61 |
|
---|
62 |
|
---|
63 | the Internet through “digital libraries”, and these collections are seeing high levels of<br>
|
---|
64 |
|
---|
65 |
|
---|
66 | use by their related research communities. A secondary use for these document<br>
|
---|
67 |
|
---|
68 |
|
---|
69 | repositories and indexes is as a platform for bibliometric research. We examine the<br>
|
---|
70 |
|
---|
71 |
|
---|
72 | extent to which the new digital libraries support conventional bibliometric analysis, and<br>
|
---|
73 |
|
---|
74 |
|
---|
75 | discuss shortcomings in their current forms. Interestingly, these electronic text<br>
|
---|
76 |
|
---|
77 |
|
---|
78 | archives also provide opportunities for new types of studies: generally the full text of<br>
|
---|
79 |
|
---|
80 |
|
---|
81 | documents is available for analysis, giving a finer grain of insight than abstract-only<br>
|
---|
82 |
|
---|
83 |
|
---|
84 | online databases; these repositories often contain technical reports or pre-prints, the<br>
|
---|
85 |
|
---|
86 |
|
---|
87 | “grey literature” that has been previously unavailable for analysis; and document<br>
|
---|
88 |
|
---|
89 |
|
---|
90 | “usage” can be measured directly by recording user accesses, rather than studied<br>
|
---|
91 |
|
---|
92 |
|
---|
93 | indirectly through document references.<br>
|
---|
94 |
|
---|
95 |
|
---|
96 | <b>1. Introduction</b><br>
|
---|
97 |
|
---|
98 |
|
---|
99 | In recent years a number of &quot;digital libraries&quot; have become available through the<br>
|
---|
100 |
|
---|
101 |
|
---|
102 | Internet. While the technology promises in the future to support large, heterogeneous<br>
|
---|
103 |
|
---|
104 |
|
---|
105 | collections, at present the most widely used of the academically-focussed digital<br>
|
---|
106 |
|
---|
107 |
|
---|
108 | libraries are generally repositories of one or two types of document (typically technical<br>
|
---|
109 |
|
---|
110 |
|
---|
111 | reports, journal articles, pre-prints, or conference proceedings), grouped by discipline.<br>
|
---|
112 |
|
---|
113 |
|
---|
114 | <hr>
|
---|
115 |
|
---|
116 |
|
---|
117 | <A name=2></a>A distinguishing characteristic of these digital libraries is that the full text of documents<br>
|
---|
118 |
|
---|
119 |
|
---|
120 | is often available for retrieval, as well as bibliographic records. The sciences are<br>
|
---|
121 |
|
---|
122 |
|
---|
123 | represented much more heavily in the present crop of digital libraries than the social<br>
|
---|
124 |
|
---|
125 |
|
---|
126 | sciences, arts, or humanities. They are maintained by professional societies,<br>
|
---|
127 |
|
---|
128 |
|
---|
129 | universities, research laboratories, and even private individuals. Access is generally<br>
|
---|
130 |
|
---|
131 |
|
---|
132 | free, both to search and to download documents.<br>
|
---|
133 |
|
---|
134 |
|
---|
135 | The emergence of these subject-specific digital libraries is particularly important<br>
|
---|
136 |
|
---|
137 |
|
---|
138 | given the pattern of access to materials presently employed by research scientists.<br>
|
---|
139 |
|
---|
140 |
|
---|
141 | Informal exchanges of preprints, reprints, and photocopies of papers passed on by<br>
|
---|
142 |
|
---|
143 |
|
---|
144 | colleagues currently are major venues for the transmission of scientific information<br>
|
---|
145 |
|
---|
146 |
|
---|
147 | between researchers in the sciences. In one study, the dependence on these sources<br>
|
---|
148 |
|
---|
149 |
|
---|
150 | ranges from 12% (for chemistry) to 39% (for mathematics) of all papers cited in<br>
|
---|
151 |
|
---|
152 |
|
---|
153 | researchers' own publications [11]. A qualitative study of how computer<br>
|
---|
154 |
|
---|
155 |
|
---|
156 | scientists locate and retrieve documents (computing is one of the domains considered<br>
|
---|
157 |
|
---|
158 |
|
---|
159 | later in this paper) indicates that for that field, technical reports and research documents<br>
|
---|
160 |
|
---|
161 |
|
---|
162 | found in various locations on the Internet are a preferred source of information [6].<br>
|
---|
163 |
|
---|
164 |
|
---|
165 | Many of the digital library systems discussed in this paper are repositories for just this<br>
|
---|
166 |
|
---|
167 |
|
---|
168 | type of literature. The documents tend to be of high quality: primarily technical<br>
|
---|
169 |
|
---|
170 |
|
---|
171 | reports or working papers from research institutions (both academic and commercial),<br>
|
---|
172 |
|
---|
173 |
|
---|
174 | as well as advance copies of work accepted for publication in conventional paper<br>
|
---|
175 |
|
---|
176 |
|
---|
177 | journals. Moreover, these digital libraries are also coming to include refereed work<br>
|
---|
178 |
|
---|
179 |
|
---|
180 | published digitally (in electronic journals). Anecdotal evidence suggests that in their<br>
|
---|
181 |
|
---|
182 |
|
---|
183 | fields, these digital libraries are coming to be the resource of choice for locating cutting<br>
|
---|
184 |
|
---|
185 |
|
---|
186 | edge work.<br>
|
---|
187 |
|
---|
188 |
|
---|
189 | For specialized subjects such as high energy physics, this dependence on<br>
|
---|
190 |
|
---|
191 |
|
---|
192 | informal or extra-library dissemination can be much higher. Ginsparg ([9], [10])<br>
|
---|
193 |
|
---|
194 |
|
---|
195 | reports that fields in physics have traditionally relied heavily on preprint exchanges, and<br>
|
---|
196 |
|
---|
197 |
|
---|
198 | the digital repositories of physics preprints begun in 1991 (the PHYSICS E-PRINT<br>
|
---|
199 |
|
---|
200 |
|
---|
201 | ARCHIVES) have to a large extent supplanted conventional publishing and physical<br>
|
---|
202 |
|
---|
203 |
|
---|
204 | <hr>
|
---|
205 |
|
---|
206 |
|
---|
207 | <A name=3></a>paper mailing of technical reports. By providing ready access to information sources<br>
|
---|
208 |
|
---|
209 |
|
---|
210 | that are already preferentially utilized by scientists, the digital libraries show potential to<br>
|
---|
211 |
|
---|
212 |
|
---|
213 | increase access to information that until recently was expensive or difficult to acquire in<br>
|
---|
214 |
|
---|
215 |
|
---|
216 | paper form. Indeed, in some fields (most notably physics) this process has already<br>
|
---|
217 |
|
---|
218 |
|
---|
219 | begun, as researchers in less developed countries report access to ongoing research<br>
|
---|
220 |
|
---|
221 |
|
---|
222 | through the Internet repositories that their local libraries could not afford to acquire<br>
|
---|
223 |
|
---|
224 |
|
---|
225 | through conventional journal subscriptions ([9], [10]).<br>
|
---|
226 |
|
---|
227 |
|
---|
228 | The primary use for new bibliographic resources is, of course, for the contents<br>
|
---|
229 |
|
---|
230 |
|
---|
231 | of the documents involved. A secondary use for emerging resources is as a basis for<br>
|
---|
232 |
|
---|
233 |
|
---|
234 | bibliometric analysis of the subject field. With the conventionally published scientific<br>
|
---|
235 |
|
---|
236 |
|
---|
237 | literature, the sheer difficulty of accumulating statistics discouraged bibliometric<br>
|
---|
238 |
|
---|
239 |
|
---|
240 | research until the advent of large bibliographic databases in the 1960s. Computerized<br>
|
---|
241 |
|
---|
242 |
|
---|
243 | bibliographic databases sparked a significant increase in the number of large-scale<br>
|
---|
244 |
|
---|
245 |
|
---|
246 | bibliographic studies, as significant portions of the collection and analysis of data could<br>
|
---|
247 |
|
---|
248 |
|
---|
249 | be automated ([12], [13]). The availability of CD-ROM versions of bibliographic<br>
|
---|
250 |
|
---|
251 |
|
---|
252 | databases has been of particular importance, since they provide a cheaper alternative to<br>
|
---|
253 |
|
---|
254 |
|
---|
255 | the online commercial databases [3].<br>
|
---|
256 |
|
---|
257 |
|
---|
258 | These computerized bibliographic resources have drawbacks, however. The<br>
|
---|
259 |
|
---|
260 |
|
---|
261 | greatest is that the full text of documents is rarely available, and even abstracts are not<br>
|
---|
262 |
|
---|
263 |
|
---|
264 | always present. This obviously limits the types of bibliometric research that can be<br>
|
---|
265 |
|
---|
266 |
|
---|
267 | conducted <i>solely</i> through these databases. In addition, these databases are generally<br>
|
---|
268 |
|
---|
269 |
|
---|
270 | limited to formally published documents (those appearing in selected books, journals,<br>
|
---|
271 |
|
---|
272 |
|
---|
273 | and conference proceedings). The &quot;grey literature&quot; of technical reports, pre-prints, and<br>
|
---|
274 |
|
---|
275 |
|
---|
276 | other works not formally published is largely ignored, and it is this absence of easy<br>
|
---|
277 |
|
---|
278 |
|
---|
279 | access to these documents that has hampered the analysis of these important forms of<br>
|
---|
280 |
|
---|
281 |
|
---|
282 | scientific communication.<br>
|
---|
283 |
|
---|
284 |
|
---|
285 | The digital libraries currently in existence complement the online and CD-ROM<br>
|
---|
286 |
|
---|
287 |
|
---|
288 | bibliographic databases. They are best suited for examinations of the &quot;physical&quot;<br>
|
---|
289 |
|
---|
290 |
|
---|
291 | characteristics of documents (for example, document length), analysis based on<br>
|
---|
292 |
|
---|
293 |
|
---|
294 | <hr>
|
---|
295 |
|
---|
296 |
|
---|
297 | <A name=4></a>bibliographic information that can be automatically extracted from the document text or<br>
|
---|
298 |
|
---|
299 |
|
---|
300 | the sometimes unevenly formatted bibliographic records (such as obsolescence<br>
|
---|
301 |
|
---|
302 |
|
---|
303 | studies), and usage studies (geographic or institutional origin of users, date/time of<br>
|
---|
304 |
|
---|
305 |
|
---|
306 | access, individual patterns of document retrieval, etc.). Because references are present<br>
|
---|
307 |
|
---|
308 |
|
---|
309 | in the document file but not identified by field, co-citation and bibliographic coupling<br>
|
---|
310 |
|
---|
311 |
|
---|
312 | research is not well-supported, and conducting these studies requires considerable<br>
|
---|
313 |
|
---|
314 |
|
---|
315 | effort on the part of the researcher.<br>
|
---|
316 |
|
---|
317 |
|
---|
318 | The variety of bibliographic repositories in the available digital libraries in itself<br>
|
---|
319 |
|
---|
320 |
|
---|
321 | has great potential in conducting bibliometric research. Sigogneau et al. [15] present a<br>
|
---|
322 |
|
---|
323 |
|
---|
324 | case study illustrating the ways in which the strengths of different databases can be<br>
|
---|
325 |
|
---|
326 |
|
---|
327 | played off each other; they conduct a fine-grained analysis of the emergence of research<br>
|
---|
328 |
|
---|
329 |
|
---|
330 | fronts in molecular and cellular biology, and demonstrate that the observations gleaned<br>
|
---|
331 |
|
---|
332 |
|
---|
333 | from two complementary bibliographic databases provide greater insight into their<br>
|
---|
334 |
|
---|
335 |
|
---|
336 | problem. Similarly, it appears that the types of bibliographic data that can be gleaned<br>
|
---|
337 |
|
---|
338 |
|
---|
339 | from the relatively unstructured digital libraries can be profitably combined with data<br>
|
---|
340 |
|
---|
341 |
|
---|
342 | from online databases, CD-ROMS, and other more conventional bibliographic<br>
|
---|
343 |
|
---|
344 |
|
---|
345 | resources.<br>
|
---|
346 |
|
---|
347 |
|
---|
348 | This paper is organized as follows: Section 2 discusses the types of indexing<br>
|
---|
349 |
|
---|
350 |
|
---|
351 | and searching available with current digital libraries; Section 3 gives examples of<br>
|
---|
352 |
|
---|
353 |
|
---|
354 | conventional bibliometric techniques applied to Internet-accessible archives; Section 4<br>
|
---|
355 |
|
---|
356 |
|
---|
357 | discusses opportunities to directly measure usage of documents and to detect<br>
|
---|
358 |
|
---|
359 |
|
---|
360 | information-seeking patterns in researchers; and Section 5 presents our conclusions.<br>
|
---|
361 |
|
---|
362 |
|
---|
363 | <b>2. Indexing and searching in current digital libraries</b><br>
|
---|
364 |
|
---|
365 |
|
---|
366 | At present, the types of indexing fields for most academically-oriented digital<br>
|
---|
367 |
|
---|
368 |
|
---|
369 | library systems are limited. Many schemes index on user-supplied document<br>
|
---|
370 |
|
---|
371 |
|
---|
372 | descriptions, abstracts, or similar document surrogates (for example, the PHYSICS E-<br>
|
---|
373 |
|
---|
374 |
|
---|
375 | PRINT ARCHIVE [10], a collection of physics pre-prints and technical reports). As will<br>
|
---|
376 |
|
---|
377 |
|
---|
378 | <hr>
|
---|
379 |
|
---|
380 |
|
---|
381 | <A name=5></a>be discussed below, the quality of this user-provided data can be highly variable, and<br>
|
---|
382 |
|
---|
383 |
|
---|
384 | may unfavorably impact the usefulness of the index for searching. Alternatively, a<br>
|
---|
385 |
|
---|
386 |
|
---|
387 | designated site librarian may maintain a catalog (e.g., the WATERS [14] system, now<br>
|
---|
388 |
|
---|
389 |
|
---|
390 | subsumed by NCSTRL (http://www.ncstrl.org/), both primarily collections of<br>
|
---|
391 |
|
---|
392 |
|
---|
393 | computer science technical reports); in this case the quality of the bibliographic<br>
|
---|
394 |
|
---|
395 |
|
---|
396 | information may be expected to be higher, but fewer sites will be likely to support<br>
|
---|
397 |
|
---|
398 |
|
---|
399 | such a librarian and therefore fewer documents are likely to be included in the digital<br>
|
---|
400 |
|
---|
401 |
|
---|
402 | library. In a “harvesting” system such as the computer science technical report<br>
|
---|
403 |
|
---|
404 |
|
---|
405 | collections supported by HARVEST [2] or the NEW ZEALAND DIGITAL LIBRARY<br>
|
---|
406 |
|
---|
407 |
|
---|
408 | computer science technical report collection ([16], [17]), documents are indexed from<br>
|
---|
409 |
|
---|
410 |
|
---|
411 | passive repositories (that may not even be aware that their documents are being<br>
|
---|
412 |
|
---|
413 |
|
---|
414 | included in the digital library). Harvesting systems therefore cannot rely on the<br>
|
---|
415 |
|
---|
416 |
|
---|
417 | presence of bibliographic data of any sort.<br>
|
---|
418 |
|
---|
419 |
|
---|
420 | Because of the relative paucity of high-quality bibliographic data available to<br>
|
---|
421 |
|
---|
422 |
|
---|
423 | many of the current academically- or research-focussed digital library collections, their<br>
|
---|
424 |
|
---|
425 |
|
---|
426 | search interfaces tend to be more primitive than those ordinarily found in online<br>
|
---|
427 |
|
---|
428 |
|
---|
429 | bibliographic databases or library catalogs. Systems such as NCSTRL can support<br>
|
---|
430 |
|
---|
431 |
|
---|
432 | author, title, and subject searching, but this more sophisticated search functionality<br>
|
---|
433 |
|
---|
434 |
|
---|
435 | comes at the expense of requiring participating repositories to use specific software. As<br>
|
---|
436 |
|
---|
437 |
|
---|
438 | a consequence, these latter systems may provide access to a smaller number of sites than<br>
|
---|
439 |
|
---|
440 |
|
---|
441 | harvesting systems. Harvesters may access a broader range of providers, but at the<br>
|
---|
442 |
|
---|
443 |
|
---|
444 | penalty of being limited to unfielded, keyword searches over the raw text of the<br>
|
---|
445 |
|
---|
446 |
|
---|
447 | document or document surrogate.<br>
|
---|
448 |
|
---|
449 |
|
---|
450 | Specifically, the indexing in existing digital libraries has a variety of shortcomings for<br>
|
---|
451 |
|
---|
452 |
|
---|
453 | bibliometric applications:<br>
|
---|
454 |
|
---|
455 |
|
---|
456 | •<br>
|
---|
457 |
|
---|
458 |
|
---|
459 | <i>lack of fielded indexing:</i> As noted above, some large and widely used digital<br>
|
---|
460 |
|
---|
461 |
|
---|
462 | libraries (such as the computer science technical report collection of the NEW<br>
|
---|
463 |
|
---|
464 |
|
---|
465 | ZEALAND DIGITAL LIBRARY) may lack formal cataloging entirely, and rely on<br>
|
---|
466 |
|
---|
467 |
|
---|
468 | <hr>
|
---|
469 |
|
---|
470 |
|
---|
471 | <A name=6></a>keyword searching over the raw document text. Obviously this makes field-<br>
|
---|
472 |
|
---|
473 |
|
---|
474 | dependent analysis more difficult (for example, locating documents produced by<br>
|
---|
475 |
|
---|
476 |
|
---|
477 | specific authors), and in the worst case may require a manual examination of all<br>
|
---|
478 |
|
---|
479 |
|
---|
480 | files in the collection in order to reliably identify a desired document subset.<br>
|
---|
481 |
|
---|
482 |
|
---|
483 | However, keyword search techniques that approximate fielded searching results<br>
|
---|
484 |
|
---|
485 |
|
---|
486 | may suffice: for example in the NEW ZEALAND DIGITAL LIBRARY computer<br>
|
---|
487 |
|
---|
488 |
|
---|
489 | science technical report collection, limiting the keyword search for “Johnson”<br>
|
---|
490 |
|
---|
491 |
|
---|
492 | to a search of first pages only is likely to retrieve documents written by Johnson<br>
|
---|
493 |
|
---|
494 |
|
---|
495 | (since for the majority of computer science technical reports, the first page<br>
|
---|
496 |
|
---|
497 |
|
---|
498 | contains little more than author, title, date, and institution details).<br>
|
---|
499 |
|
---|
500 |
|
---|
501 | A more principled approach to extracting bibliographic information is embodied<br>
|
---|
502 |
|
---|
503 |
|
---|
504 | in the CiteSeer tool [1]. This software parses raw, unfielded academic<br>
|
---|
505 |
|
---|
506 |
|
---|
507 | documents and attempts to identify such indexing information as author, title,<br>
|
---|
508 |
|
---|
509 |
|
---|
510 | reference list, etc. Obviously such a tool cannot attain 100% accuracy over a<br>
|
---|
511 |
|
---|
512 |
|
---|
513 | heterogeneous document collection, but in practice it appears useful in that it can<br>
|
---|
514 |
|
---|
515 |
|
---|
516 | make a good first pass in processing a set of documents, providing an initial set<br>
|
---|
517 |
|
---|
518 |
|
---|
519 | of parsed documents for analysis. The remaining (presumably much smaller) set<br>
|
---|
520 |
|
---|
521 |
|
---|
522 | of unparsable documents can then be dealt with manually.<br>
|
---|
523 |
|
---|
524 |
|
---|
525 | •<br>
|
---|
526 |
|
---|
527 |
|
---|
528 | <i>lack of consistency in field formatting:</i> Current digital libraries usually acquire<br>
|
---|
529 |
|
---|
530 |
|
---|
531 | bibliographic information from either the authors of submitted articles or<br>
|
---|
532 |
|
---|
533 |
|
---|
534 | automatic extraction routines (retrieving bibliographic details from catalog files<br>
|
---|
535 |
|
---|
536 |
|
---|
537 | that may or may not be in a given document site, and that may or may not be in<br>
|
---|
538 |
|
---|
539 |
|
---|
540 | an easily parsable form). Neither of these methods produces records with<br>
|
---|
541 |
|
---|
542 |
|
---|
543 | standard formatting, which causes problems with automated bibliometric<br>
|
---|
544 |
|
---|
545 |
|
---|
546 | analysis. Consider the following examples selected from entries in the hep-th<br>
|
---|
547 |
|
---|
548 |
|
---|
549 | (high energy physics) collection of the PHYSICS E-PRINT ARCHIVES:<br>
|
---|
550 |
|
---|
551 |
|
---|
552 | <hr>
|
---|
553 |
|
---|
554 |
|
---|
555 | <A name=7></a>(i)<br>
|
---|
556 |
|
---|
557 |
|
---|
558 | Authors: A. Yu. Alekseev, V. Schomerus<br>
|
---|
559 |
|
---|
560 |
|
---|
561 | (ii)<br>
|
---|
562 |
|
---|
563 |
|
---|
564 | Authors: Adel Bilal and Ian. I. Kogan<br>
|
---|
565 |
|
---|
566 |
|
---|
567 | (iii)<br>
|
---|
568 |
|
---|
569 |
|
---|
570 | Authors: Paul S. Aspinwall and David R. Morrison (with an appendix <br>
|
---|
571 |
|
---|
572 |
|
---|
573 | by Mark Gross)<br>
|
---|
574 |
|
---|
575 |
|
---|
576 | (iv)<br>
|
---|
577 |
|
---|
578 |
|
---|
579 | Authors: A. H. Chamseddine and Herbi Dreiner (ETH-Zurich)<br>
|
---|
580 |
|
---|
581 |
|
---|
582 | In this case, typical for existing digital libraries, there is no standardized format<br>
|
---|
583 |
|
---|
584 |
|
---|
585 | for authors' names (here, appearing with full names, initials plus last name, and<br>
|
---|
586 |
|
---|
587 |
|
---|
588 | a mixture of the two); no standard convention for separating author names<br>
|
---|
589 |
|
---|
590 |
|
---|
591 | (here, either a comma or &quot;and&quot; is used); and parenthetical information can<br>
|
---|
592 |
|
---|
593 |
|
---|
594 | include a variety of information such as the name of an associate author or the<br>
|
---|
595 |
|
---|
596 |
|
---|
597 | institutional affiliations of an author. Manual processing or specially crafted<br>
|
---|
598 |
|
---|
599 |
|
---|
600 | software would be required to reformat these fields for analysis.<br>
|
---|
601 |
|
---|
602 |
|
---|
603 | •<br>
|
---|
604 |
|
---|
605 |
|
---|
606 | <i>duplicate entries: </i> Digital libraries that draw documents from a variety of sources<br>
|
---|
607 |
|
---|
608 |
|
---|
609 | may inadvertently contain duplicate items. Unfortunately, the irregular<br>
|
---|
610 |
|
---|
611 |
|
---|
612 | formatting of the bibliographic information makes it difficult to automatically<br>
|
---|
613 |
|
---|
614 |
|
---|
615 | detect these duplicates.<br>
|
---|
616 |
|
---|
617 |
|
---|
618 | •<br>
|
---|
619 |
|
---|
620 |
|
---|
621 | <i>implicit field tagging:</i> In some repositories, items are not explicitly tagged with<br>
|
---|
622 |
|
---|
623 |
|
---|
624 | certain types of information – most commonly the document's date of<br>
|
---|
625 |
|
---|
626 |
|
---|
627 | publication or production. Instead, the date is implicit in the document's title<br>
|
---|
628 |
|
---|
629 |
|
---|
630 | (e.g., its numeration in a technical report series) or in the location of the document<br>
|
---|
631 |
|
---|
632 |
|
---|
633 | in the file structure of the repository (e.g., separate directories exist for each<br>
|
---|
634 |
|
---|
635 |
|
---|
636 | year). A second common piece of implicit data is the authors’ institutional<br>
|
---|
637 |
|
---|
638 |
|
---|
639 | affiliations. This may be contained in the document itself (typically on a cover<br>
|
---|
640 |
|
---|
641 |
|
---|
642 | page), or may be implicit in the document’s location (for example, a<br>
|
---|
643 |
|
---|
644 |
|
---|
645 | corporation’s technical reports are stored in its ftp repository). Again, in these<br>
|
---|
646 |
|
---|
647 |
|
---|
648 | <hr>
|
---|
649 |
|
---|
650 |
|
---|
651 | <A name=8></a>cases special processing is required to append this field information to a<br>
|
---|
652 |
|
---|
653 |
|
---|
654 | document record for bibliometric analysis. <br>
|
---|
655 |
|
---|
656 |
|
---|
657 | •<br>
|
---|
658 |
|
---|
659 |
|
---|
660 | <i>extraction of document text:</i> Few of the documents stored in the research-<br>
|
---|
661 |
|
---|
662 |
|
---|
663 | oriented digital libraries discussed in this paper are straight ascii text; instead,<br>
|
---|
664 |
|
---|
665 |
|
---|
666 | documents may appear in a variety of file formats, such as LaTeX, PostScript,<br>
|
---|
667 |
|
---|
668 |
|
---|
669 | PDF, etc. If the contents of the documents are to be automatically processed<br>
|
---|
670 |
|
---|
671 |
|
---|
672 | (for example, to count the words in a document, or to extract reference<br>
|
---|
673 |
|
---|
674 |
|
---|
675 | publication dates for an obsolescence study), then the text must be extracted.<br>
|
---|
676 |
|
---|
677 |
|
---|
678 | Utilities are available to convert most common document formats to ascii.<br>
|
---|
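The author-field irregularities shown in examples (i) to (iv) earlier in this section give a sense of what "specially crafted software" for reformatting might involve. The following is a minimal Python sketch under stated assumptions, not a tool described in this paper: the function names `split_authors` and `author_key` are hypothetical, and the heuristics (discard parenthetical notes, split on commas and the word "and", match on surname plus first initial) are illustrative only and will misfire on names or conventions they do not anticipate.

```python
import re


def split_authors(field: str) -> list[str]:
    """Split a free-text author field into individual author names.

    Heuristic only: parenthetical notes (affiliations, appendix credits)
    are discarded, then the remainder is split on commas and the word "and".
    """
    # Drop a leading "Authors:" label if present.
    text = re.sub(r"^\s*Authors?:\s*", "", field)
    # Remove parenthetical material such as "(ETH-Zurich)".
    text = re.sub(r"\([^)]*\)", "", text)
    # Split on commas or on the standalone lowercase word "and".
    parts = re.split(r",|\band\b", text)
    # Collapse runs of whitespace and discard empty fragments.
    names = [" ".join(part.split()) for part in parts]
    return [name for name in names if name]


def author_key(name: str) -> str:
    """Reduce a name to 'surname, first-initial' for rough duplicate matching."""
    tokens = name.replace(".", " ").split()
    return f"{tokens[-1].lower()}, {tokens[0][0].lower()}"


if __name__ == "__main__":
    samples = [
        "Authors: A. Yu. Alekseev, V. Schomerus",
        "Authors: Adel Bilal and Ian. I. Kogan",
        "Authors: Paul S. Aspinwall and David R. Morrison (with an appendix by Mark Gross)",
        "Authors: A. H. Chamseddine and Herbi Dreiner (ETH-Zurich)",
    ]
    for sample in samples:
        print(split_authors(sample))
```

A key such as the one returned by `author_key` could serve as a first, coarse filter when looking for duplicate entries; candidate matches would still need manual confirmation, since distinct authors can share a surname and initial.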
679 |
|
---|
680 |
|
---|
681 | It is likely that many of these problems will be addressed as the Internet-based<br>
|
---|
682 |
|
---|
683 |
|
---|
684 | document indexing systems mature. Even minor changes can greatly increase the<br>
|
---|
685 |
|
---|
686 |
|
---|
687 | usability of a bibliographic database for bibliometric research. For example, the<br>
|
---|
688 |
|
---|
689 |
|
---|
690 | addition of an explicit date tag to many online databases in 1975 sparked new<br>
|
---|
691 |
|
---|
692 |
|
---|
693 | applications in time series research [3].<br>
|
---|
694 |
|
---|
695 |
|
---|
696 | <b>3. Opportunities for applications of bibliometric techniques</b><br>
|
---|
697 |
|
---|
698 |
|
---|
699 | One type of bibliometric research concentrates on quantifying fundamental,<br>
|
---|
700 |
|
---|
701 |
|
---|
702 | structural details about a subject literature: how many items are published, how many<br>
|
---|
703 |
|
---|
704 |
|
---|
705 | authors are publishing, over what time period documents are likely to be used, etc.<br>
|
---|
706 |
|
---|
707 |
|
---|
708 | More complex studies analyze the relationships between documents, such as how<br>
|
---|
709 |
|
---|
710 |
|
---|
711 | documents cluster into subjects. The following examples give a flavour of the<br>
|
---|
712 |
|
---|
713 |
|
---|
714 | bibliometric research that is possible using the emerging digital libraries:<br>
|
---|
715 |
|
---|
716 |
|
---|
717 | <i>examining the “physical” characteristics of archived documents</i><br>
|
---|
718 |
|
---|
719 |
|
---|
720 | One relatively straightforward type of bibliometric study characterizes the<br>
|
---|
721 |
|
---|
722 |
|
---|
723 | formats of different literatures. For example, Figure 1 presents the range of sizes<br>
|
---|
724 |
|
---|
725 |
|
---|
726 | <hr>
|
---|
727 |
|
---|
728 |
|
---|
729 | <A name=9></a>of computer science technical reports as measured by their length in pages. Of the<br>
|
---|
730 |
|
---|
731 |
|
---|
732 | 45,720 documents in the CSTR collection as of April 1998, nearly 1600 did not contain<br>
|
---|
733 |
|
---|
734 |
|
---|
735 | page divisions in their files (and hence are excluded from analysis). Note that the<br>
|
---|
736 |
|
---|
737 |
|
---|
738 | number of pages in the shorter documents (&lt;50 pages) falls into an approximately<br>
|
---|
739 |
|
---|
740 |
|
---|
741 | normal distribution (slightly skewed to the left), while presumably the longer<br>
|
---|
742 |
|
---|
743 |
|
---|
744 | documents represent Masters’ and Doctoral theses. A surprising number of documents<br>
|
---|
745 |
|
---|
746 |
|
---|
747 | are very short (between one and five pages); these may represent the type of condensed<br>
|
---|
748 |
|
---|
749 |
|
---|
750 | results frequently found in the “technical notes”, “short papers”, and “poster sessions”<br>
|
---|
751 |
|
---|
752 |
|
---|
753 | of computing conferences and journals. The average number of pages per document,<br>
|
---|
754 |
|
---|
755 |
|
---|
756 | 27.5, appears to be slightly longer than the common upper bound for a computing<br>
|
---|
757 |
|
---|
758 |
|
---|
759 | journal article, although this observation must be confirmed by a similar study of the<br>
|
---|
760 |
|
---|
761 |
|
---|
762 | lengths of formally published computing articles.<br>
|
---|
763 |
|
---|
764 |
|
---|
765 | This type of analysis is of particular interest for technical reports, since they<br>
|
---|
766 |
|
---|
767 |
|
---|
768 | have not been studied in the same detail as formally published papers. A comparison of<br>
|
---|
769 |
|
---|
770 |
|
---|
771 | the physical characteristics of the formal and informal literature could provide<br>
|
---|
772 |
|
---|
773 |
|
---|
774 | supporting evidence for common beliefs about the relationship between the two types<br>
|
---|
775 |
|
---|
776 |
|
---|
777 | of documents. For example, do publishing constraints force journal and proceedings<br>
|
---|
778 |
|
---|
779 |
|
---|
780 | articles to be shorter than technical reports, and therefore presumably omit technical<br>
|
---|
781 |
|
---|
782 |
|
---|
783 | details of findings? Do technical reports contain more/less extensive reference sections?<br>
|
---|
784 |
|
---|
785 |
|
---|
786 | If reference sections of technical reports are longer than those of published articles, then<br>
|
---|
787 |
|
---|
788 |
|
---|
789 | citation links are being omitted in published works; if technical reports contain fewer<br>
|
---|
790 |
|
---|
791 |
|
---|
792 | references, then this may confirm earlier indications that computer scientists tend to<br>
|
---|
793 |
|
---|
794 |
|
---|
795 | “research first” and do literature surveys later [6].<br>
|
---|
796 |
|
---|
797 |
|
---|
798 | Figure 1. Range of sizes of CS technical reports, measured by number of pages<br>
|
---|
799 |
|
---|
800 |
|
---|
801 | <i>obsolescence studies.</i><br>
|
---|
802 |
|
---|
803 |
|
---|
804 | A document is considered obsolete when it is no longer referenced by the<br>
|
---|
805 |
|
---|
806 |
|
---|
807 | current literature. Typically, documents receive their greatest number and frequency of<br>
|
---|
808 |
|
---|
809 |
|
---|
810 | <hr>
|
---|
811 |
|
---|
812 |
|
---|
813 | <A name=10></a>citations immediately after publication, and the frequency of citation falls rapidly as time<br>
|
---|
814 |
|
---|
815 |
|
---|
816 | passes. One technique for estimating the obsolescence rate of a body of literature, the<br>
|
---|
817 |
|
---|
818 |
|
---|
819 | <i>synchronous</i> method, is to find the median date in the references of the documents.<br>
|
---|
820 |
|
---|
821 |
|
---|
822 | This median date is subtracted from the year of publication for the documents, yielding<br>
|
---|
823 |
|
---|
824 |
|
---|
825 | the <i>median citation age</i>. As would be expected, this median varies between the<br>
|
---|
826 |
|
---|
827 |
|
---|
828 | disciplines. Typically the social sciences and arts have a higher median citation age<br>
|
---|
829 |
|
---|
830 |
|
---|
831 | than the &quot;hard&quot; sciences and engineering, indicating that documents obsolesce more<br>
|
---|
832 |
|
---|
833 |
|
---|
834 | quickly for the latter fields.<br>
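As a minimal sketch of the synchronous calculation described above, the following Python function subtracts the median reference year from the publication year. The function name and the sample years are illustrative assumptions, not data from this study.

```python
from statistics import median


def median_citation_age(publication_year, reference_years):
    """Synchronous method: publication year minus the median reference year."""
    if not reference_years:
        raise ValueError("no reference dates available")
    return publication_year - median(reference_years)


# Hypothetical report published in 1995 whose references carry these years.
print(median_citation_age(1995, [1985, 1990, 1992, 1993, 1994]))  # 3
```

A lower result means the document cites more recent material, which is the pattern the text attributes to the sciences and engineering.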
|
---|
835 |
|
---|
836 |
|
---|
837 | As noted in Section 2, references are not generally explicitly tagged in existing<br>
|
---|
838 |
|
---|
839 |
|
---|
840 | digital repositories. However, reference dates can usually be extracted from the<br>
|
---|
841 |
|
---|
842 |
|
---|
843 | document text by first locating the reference section (usually delimited by a &quot;references&quot;<br>
|
---|
844 |
|
---|
845 |
|
---|
846 | or &quot;bibliography&quot; section heading), and then extracting all numbers in the appropriate<br>
|
---|
847 |
|
---|
848 |
|
---|
849 | ranges for dates for the field under study.<br>
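The two-step extraction described above (find the reference section, then collect numbers in a plausible date range) might be sketched as follows. The heading pattern, the default date range, and the sample text are assumptions made for illustration; real documents will need more tolerant heading detection, and four-digit page numbers that fall inside the range will be counted as dates.

```python
import re

# A line consisting only of a "references" or "bibliography" heading.
HEADING = re.compile(r"^[ \t]*(references|bibliography)[ \t]*$",
                     re.IGNORECASE | re.MULTILINE)
FOUR_DIGITS = re.compile(r"\b\d{4}\b")


def extract_reference_years(text, earliest=1900, latest=1999):
    """Return candidate reference years found after the last section heading."""
    headings = list(HEADING.finditer(text))
    if not headings:
        return []
    section = text[headings[-1].end():]
    numbers = [int(n) for n in FOUR_DIGITS.findall(section)]
    return [n for n in numbers if n in range(earliest, latest + 1)]


sample = (
    "Introduction written in 1996.\n"
    "References\n"
    "[1] A. Author, Some title, 1989.\n"
    "[2] B. Author, Other title, pp. 1200-1210, 1993.\n"
)
print(extract_reference_years(sample))  # [1989, 1993]
```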
|
---|
850 |
|
---|
851 |
|
---|
852 | To illustrate this process, 188 technical reports were sampled from<br>
|
---|
853 |
|
---|
854 |
|
---|
855 | Internet-accessible repositories<sup>1</sup> and used as source documents for a synchronous obsolescence<br>
|
---|
856 |
|
---|
857 |
|
---|
858 | study. Conveniently, the repositories chosen organize technical reports into<br>
|
---|
859 |
|
---|
860 |
|
---|
861 | subdirectories by their date of publication. The reference dates for each technical report<br>
|
---|
862 |
|
---|
863 |
|
---|
864 | were automatically extracted by software that scanned the document's file for numbers<br>
|
---|
865 |
|
---|
866 |
|
---|
867 | of the form 19XX, since previous studies indicate that few if any computing reports<br>
|
---|
868 |
|
---|
869 |
|
---|
870 | reference documents published in previous centuries [5]. Table 1 presents the median<br>
|
---|
871 |
|
---|
872 |
|
---|
873 | citation age calculated for these documents, broken down by repository and the year of<br>
|
---|
874 |
|
---|
875 |
|
---|
876 | publication for the source documents from which the reference dates were extracted:<br>
|
---|
877 |
|
---|
878 |
|
---|
879 | Table 1. Median citation ages for technical report repositories<br>
|
---|
880 |
|
---|
881 |
|
---|
882 | The median citation age ranges between 2 and 4 years, which is consistent with<br>
|
---|
883 |
|
---|
884 |
|
---|
885 | previous examinations of computing and information systems literature ([5], [4]).<br>
|
---|
886 |
|
---|
887 |
|
---|
888 | When graphed, the distribution of reference dates shows the exponential curve typically<br>
|
---|
889 |
|
---|
890 |
|
---|
891 | found in obsolescence studies, including the final droop due to an &quot;immediacy effect&quot;<br>
|
---|
892 |
|
---|
893 |
|
---|
894 | <hr>
|
---|
895 |
|
---|
896 |
|
---|
897 | <A name=11></a>as fewer very new documents are available for citation [7]. These types of results<br>
|
---|
898 |
|
---|
899 |
|
---|
900 | provide confirmation that references used in computer science technical reports (the<br>
|
---|
901 |
|
---|
902 |
|
---|
903 | pre-eminent &quot;grey literature&quot; of the computing field) conform to the same patterns as<br>
|
---|
904 |
|
---|
905 |
|
---|
906 | references found in the formally published literature.<br>
|
---|
907 |
|
---|
908 |
|
---|
909 | <i>co-citation and bibliographic coupling studies</i><br>
|
---|
910 |
|
---|
911 |
|
---|
912 | The rate at which documents are cited together (co-citation) or cite the same<br>
|
---|
913 |
|
---|
914 |
|
---|
915 | documents (bibliographic coupling) can be used to produce &quot;maps&quot; of a subject<br>
|
---|
916 |
|
---|
917 |
|
---|
918 | literature. These techniques rely on analysis of the references of documents, and these<br>
|
---|
919 |
|
---|
920 |
|
---|
921 | references must be in a common format. While digital libraries contain full text of<br>
|
---|
922 |
|
---|
923 |
|
---|
924 | documents, their references are not standardized, and indeed are not even tagged as<br>
|
---|
925 |
|
---|
926 |
|
---|
927 | such. To perform these studies the references must be manually extracted and<br>
|
---|
928 |
|
---|
929 |
|
---|
930 | processed, a tedious process that is only worthwhile for documents (such as technical<br>
|
---|
931 |
|
---|
932 |
|
---|
933 | reports) that are not included in existing citation databases such as the Science Citation<br>
|
---|
934 |
|
---|
935 |
|
---|
936 | Index and Social Science Citation Index.<br>
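Once references have been reduced to a common format, both measures come down to simple set operations. The sketch below assumes each reference has already been normalized to a single key, which is exactly the manual step the text describes as tedious. The document names and keys are hypothetical. Co-citation is counted here as the number of citing documents that list both members of a pair.

```python
from collections import Counter
from itertools import combinations


def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling: number of references two documents share."""
    return len(set(refs_a).intersection(refs_b))


def cocitation_counts(reference_lists):
    """Count, for each pair of cited items, how many documents cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing documents mapped to normalized reference keys.
citing = {
    "report1": ["price1970", "garfield1979", "hawkins1977"],
    "report2": ["price1970", "garfield1979"],
    "report3": ["garfield1979", "burton1988"],
}
print(coupling_strength(citing["report1"], citing["report2"]))  # 2
print(cocitation_counts(citing)[("garfield1979", "price1970")])  # 2
```

The resulting pair counts are the raw material for the "maps" mentioned above; clustering or layout would be a separate step.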
|
---|
937 |
|
---|
938 |
|
---|
939 | <i>detecting cycles or regularities in the rate of production of research</i><br>
|
---|
940 |
|
---|
941 |
|
---|
942 | Analysis of trends in the production of technical reports can give indications<br>
|
---|
943 |
|
---|
944 |
|
---|
945 | about working conditions that affect research; for example, is more research produced<br>
|
---|
946 |
|
---|
947 |
|
---|
948 | over the summer, when the teaching load is lighter? Or is research steadily produced<br>
|
---|
949 |
|
---|
950 |
|
---|
951 | throughout the year?<br>
|
---|
952 |
|
---|
953 |
|
---|
954 | Figure 2. Distribution of the number of documents submitted to hep-th, 1992-1994<br>
|
---|
955 |
|
---|
956 |
|
---|
957 | Figures 2 and 3 present statistics on document accumulation in the hep-th (high<br>
|
---|
958 |
|
---|
959 |
|
---|
960 | energy physics) e-print server, a part of the PHYSICS E-PRINT ARCHIVE. This system<br>
|
---|
961 |
|
---|
962 |
|
---|
963 | is one of the oldest formal pre-print archives, and has become the primary means for<br>
|
---|
964 |
|
---|
965 |
|
---|
966 | information dissemination in its field. Examination of these figures reveals several<br>
|
---|
967 |
|
---|
968 |
|
---|
969 | trends. Clearly the absolute number of documents deposited in the repository has<br>
|
---|
970 |
|
---|
971 |
|
---|
972 | <hr>
|
---|
973 |
|
---|
974 |
|
---|
975 | <A name=12></a>tended to increase over the time period. For all three years, research production has its<br>
|
---|
976 |
|
---|
977 |
|
---|
978 | lowest point in January and February, increases through May and June, then decreases<br>
|
---|
979 |
|
---|
980 |
|
---|
981 | until August and September. At that point the rate of production steps up, reaching a<br>
|
---|
982 |
|
---|
983 |
|
---|
984 | yearly peak in November and December. This pattern is less clear for 1992, which<br>
|
---|
985 |
|
---|
986 |
|
---|
987 | might be expected as the archive was established in mid-1991.<br>
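The per-month counts and percentages plotted in Figures 2 and 3 could be derived from submission dates along the following lines. This is a sketch under the assumption that each submission is available as a (year, month) pair. The sample data is invented and does not reproduce the hep-th figures.

```python
from collections import Counter, defaultdict


def monthly_percentages(submissions):
    """Map each year to {month: percentage of that year's submissions}."""
    counts = defaultdict(Counter)
    for year, month in submissions:
        counts[year][month] += 1
    result = {}
    for year, per_month in counts.items():
        total = sum(per_month.values())
        result[year] = {m: 100.0 * per_month[m] / total for m in range(1, 13)}
    return result


# Invented sample: four submissions in one year.
sample_submissions = [(1993, 1), (1993, 11), (1993, 11), (1993, 12)]
shares = monthly_percentages(sample_submissions)
print(shares[1993][11])  # 50.0
```

Working in percentages rather than absolute counts removes the overall growth of the archive, so the seasonal shape of different years can be compared directly.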
|
---|
988 |
|
---|
989 |
|
---|
990 | Figure 3. Distribution of the percentage of documents submitted to hep-th, 1992-1994<br>
|
---|
991 |
|
---|
992 |
|
---|
993 | <b>4. Analysis of usage data</b><br>
|
---|
994 |
|
---|
995 |
|
---|
996 | The emerging Internet-based digital libraries will permit research on scientific<br>
|
---|
997 |
|
---|
998 |
|
---|
999 | information collection and use at a much finer grain than is possible with current paper<br>
|
---|
1000 |
|
---|
1001 |
|
---|
1002 | libraries or online bibliographic databases. Current bibliometric or scientometric<br>
|
---|
1003 |
|
---|
1004 |
|
---|
1005 | research of this type must measure information use indirectly, for example through<br>
|
---|
1006 |
|
---|
1007 |
|
---|
1008 | examination of the list of references appended to published articles. However, it is well<br>
|
---|
1009 |
|
---|
1010 |
|
---|
1011 | known that authors do not necessarily include in the reference list all documents that<br>
|
---|
1012 |
|
---|
1013 |
|
---|
1014 | could have been cited, and conversely that not all references listed may have been<br>
|
---|
1015 |
|
---|
1016 |
|
---|
1017 | actually &quot;used&quot; in performing the research; citation behavior can be affected by a<br>
|
---|
1018 |
|
---|
1019 |
|
---|
1020 | number of motivating factors (Garfield lists <i>15</i> possible reasons in [8]).<br>
|
---|
1021 |
|
---|
1022 |
|
---|
1023 | Digital library transaction logs provide a powerful tool for direct analysis of<br>
|
---|
1024 |
|
---|
1025 |
|
---|
1026 | document &quot;usage&quot;: since digital libraries contain the actual document (rather than only a<br>
|
---|
1027 |
|
---|
1028 |
|
---|
1029 | document surrogate), the relative amount of &quot;use&quot; that a digital library's clients make of<br>
|
---|
1030 |
|
---|
1031 |
|
---|
1032 | a given document can be estimated from the number of times the document file is<br>
|
---|
1033 |
|
---|
1034 |
|
---|
1035 | downloaded (and, presumably, the document is read). Note that file downloading is a<br>
|
---|
1036 |
|
---|
1037 |
|
---|
1038 | much stronger statement on the part of the user than, for example, having a<br>
|
---|
1039 |
|
---|
1040 |
|
---|
1041 | bibliographic record appear in the query result set for a conventional bibliographic<br>
|
---|
1042 |
|
---|
1043 |
|
---|
1044 | system; the user downloads only <i>after</i> the document has been found potentially relevant<br>
|
---|
1045 |
|
---|
1046 |
|
---|
1047 | through examination of its document surrogate. Additionally, downloading is<br>
|
---|
1048 |
|
---|
1049 |
|
---|
1050 | frequently time-consuming and sometimes costly (depending on local pricing for<br>
|
---|
1051 |
|
---|
1052 |
|
---|
1053 | <hr>
|
---|
1054 |
|
---|
1055 |
|
---|
1056 | <A name=13></a>Internet access). Downloaded documents are therefore highly likely at least to be<br>
|
---|
1057 |
|
---|
1058 |
|
---|
1059 | scanned, if not read closely. The transaction logs for a digital library can provide a<br>
|
---|
1060 |
|
---|
1061 |
|
---|
1062 | global picture of the use of documents in the collection, since all user interactions with<br>
|
---|
1063 |
|
---|
1064 |
|
---|
1065 | the library can be automatically logged for analysis. By contrast, it is of course<br>
|
---|
1066 |
|
---|
1067 |
|
---|
1068 | impossible to track usage of print bibliographies, and very difficult to monitor usage of<br>
|
---|
1069 |
|
---|
1070 |
|
---|
1071 | bibliographic data available on CD-ROM across more than one or two sites.<br>
|
---|
1072 |
|
---|
1073 |
|
---|
1074 | Furthermore, analysis of search requests by geographic location, institution,<br>
|
---|
1075 |
|
---|
1076 |
|
---|
1077 | and sometimes even individual user is also possible. As an example, Table 2 presents<br>
|
---|
1078 |
|
---|
1079 |
|
---|
1080 | a portion of the summary of usage statistics (broken down by domain code) for queries<br>
|
---|
1081 |
|
---|
1082 |
|
---|
1083 | to the computer science technical collection of the NEW ZEALAND DIGITAL LIBRARY.<br>
|
---|
1084 |
|
---|
1085 |
|
---|
1086 | Examination of the data indicates that the heaviest use of the collection comes from<br>
|
---|
1087 |
|
---|
1088 |
|
---|
1089 | North America, Europe (particularly Germany and Finland), as well as the local New<br>
|
---|
1090 |
|
---|
1091 |
|
---|
1092 | Zealand community and nearby Australia. As expected for such a collection, a large<br>
|
---|
1093 |
|
---|
1094 |
|
---|
1095 | proportion of users are from educational (.edu) institutions; surprisingly, however, a<br>
|
---|
1096 |
|
---|
1097 |
|
---|
1098 | similar number of queries come from commercial (.com) organizations, indicating<br>
|
---|
1099 |
|
---|
1100 |
|
---|
1101 | perhaps that the documents are seeing use in commercial research and development<br>
|
---|
1102 |
|
---|
1103 |
|
---|
1104 | units.<br>
|
---|
1105 |
|
---|
1106 |
|
---|
1107 | Table 2. Accesses to the NEW ZEALAND DIGITAL LIBRARY CS collection by Domain Code<br>
|
---|
1108 |
|
---|
1109 |
|
---|
1110 | Of course, usage levels can also be further broken down by IP number<br>
|
---|
1111 |
|
---|
1112 |
|
---|
1113 | (indicating institutions), and systems requiring users to register may also be able to<br>
|
---|
1114 |
|
---|
1115 |
|
---|
1116 | analyze usage on an individual basis. Since the query strings themselves are also<br>
|
---|
1117 |
|
---|
1118 |
|
---|
1119 | recorded in the transaction logs, this domain/institution/individual activity could also be<br>
|
---|
1120 |
|
---|
1121 |
|
---|
1122 | linked to specific subjects through the query terms. Summaries of this type could be<br>
|
---|
1123 |
|
---|
1124 |
|
---|
1125 | invaluable for studies of geographic diffusion and distribution of research topics.<br>
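A domain-code summary of the kind shown in Table 2 can be sketched as a tally over the client hostnames recorded in the transaction log. The hostnames below are hypothetical, and the sketch assumes the log already holds resolved hostnames; entries recorded as raw IP numbers would need a reverse lookup first.

```python
from collections import Counter


def domain_code(hostname):
    """Return the final label of a hostname, e.g. 'nz' or 'edu'."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def accesses_by_domain(hostnames):
    """Tally log entries by the domain code of the requesting host."""
    return Counter(domain_code(h) for h in hostnames)


# Hypothetical log entries, one hostname per query.
log_hosts = [
    "cs.example.ac.nz",
    "lab.example.edu",
    "library.example.edu",
    "research.example.com",
    "uni.example.de",
]
print(accesses_by_domain(log_hosts).most_common(1))  # [('edu', 2)]
```

Grouping on a longer suffix of the hostname, or on the query terms logged with each request, would give the institution-level and subject-level breakdowns mentioned above.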
|
---|
1126 |
|
---|
1127 |
|
---|
1128 | Transaction log analysis can also indicate time-related patterns in the<br>
|
---|
1129 |
|
---|
1130 |
|
---|
1131 | information seeking behavior of digital library users. As a sample of this type of<br>
|
---|
1132 |
|
---|
1133 |
|
---|
1134 | analysis, Paul Ginsparg notes a seven-day periodicity in the number of search requests<br>
|
---|
1135 |
|
---|
1136 |
|
---|
1137 | <hr>
|
---|
1138 |
|
---|
1139 |
|
---|
1140 | <A name=14></a>made to the PHYSICS E-PRINT archives (Figure 4, reproduced from [9]). From this he<br>
|
---|
1141 |
|
---|
1142 |
|
---|
1143 | infers that many physicists do not yet have weekend access to the Internet (an<br>
|
---|
1144 |
|
---|
1145 |
|
---|
1146 | alternative, slightly more cynical hypothesis is that even high energy theoretical<br>
|
---|
1147 |
|
---|
1148 |
|
---|
1149 | physicists take the weekend off).<br>
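A weekly cycle of this kind becomes visible once requests are grouped by day of the week. The following sketch assumes each logged request carries a calendar date. The dates are invented for illustration and are not taken from the archive's logs.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def requests_per_weekday(request_dates):
    """Count requests for each day of the week (Monday first)."""
    counts = Counter(d.weekday() for d in request_dates)
    return {DAY_NAMES[i]: counts[i] for i in range(7)}


# Invented request dates: one Friday, one Saturday, two on the same Monday.
requests = [
    date(1994, 10, 14),
    date(1994, 10, 15),
    date(1994, 10, 17),
    date(1994, 10, 17),
]
print(requests_per_weekday(requests))
```

A consistent drop in the Saturday and Sunday counts across many weeks would correspond to the seven-day periodicity reported for the physics archive.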
|
---|
1150 |
|
---|
1151 |
|
---|
1152 | Figure 4. Summary of search requests to the physics pre-print archives<br>
|
---|
1153 |
|
---|
1154 |
|
---|
1155 | <b>5. Conclusion</b><br>
|
---|
1156 |
|
---|
1157 |
|
---|
1158 | This study suggests opportunities for conducting bibliometric research on the<br>
|
---|
1159 |
|
---|
1160 |
|
---|
1161 | evolving digital libraries. These repositories are suitable platforms for conventional<br>
|
---|
1162 |
|
---|
1163 |
|
---|
1164 | bibliometric techniques (such as obsolescence studies, quantification of physical<br>
|
---|
1165 |
|
---|
1166 |
|
---|
1167 | characteristics of documents comprising a subject literature, time analysis, etc.). The<br>
|
---|
1168 |
|
---|
1169 |
|
---|
1170 | ability to directly monitor access to documents in digital libraries also enables<br>
|
---|
1171 |
|
---|
1172 |
|
---|
1173 | researchers to explicitly quantify document usage, as well as to implicitly measure<br>
|
---|
1174 |
|
---|
1175 |
|
---|
1176 | usage through citations. Additional facilities could aid in the performance of<br>
|
---|
1177 |
|
---|
1178 |
|
---|
1179 | bibliographic experiments, such as: improved tagging of document fields; provision of<br>
|
---|
1180 |
|
---|
1181 |
|
---|
1182 | utilities to strip out titles, authors, etc. from common document formats; and the ability<br>
|
---|
1183 |
|
---|
1184 |
|
---|
1185 | to easily eliminate duplicate entries from downloaded library subsets. Unfortunately,<br>
|
---|
1186 |
|
---|
1187 |
|
---|
1188 | the most useful of these additional facilities, those associated with a higher degree of<br>
|
---|
1189 |
|
---|
1190 |
|
---|
1191 | cataloging, run counter to the underlying philosophy of many digital libraries: to<br>
|
---|
1192 |
|
---|
1193 |
|
---|
1194 | avoid, if possible, manual processing and formal cataloging of documents. While<br>
|
---|
1195 |
|
---|
1196 |
|
---|
1197 | adherence to this principle can limit the accuracy of fielded searching (or indeed,<br>
|
---|
1198 |
|
---|
1199 |
|
---|
1200 | preclude it altogether), it can also avoid the cataloging bottleneck and permit digital<br>
|
---|
1201 |
|
---|
1202 |
|
---|
1203 | libraries to provide access to larger numbers of documents.<br>
|
---|
1204 |
|
---|
1205 |
|
---|
1206 | The digital libraries complement the information currently available through<br>
|
---|
1207 |
|
---|
1208 |
|
---|
1209 | paper, online, and CD-ROM bibliographic resources. While these latter databases<br>
|
---|
1210 |
|
---|
1211 |
|
---|
1212 | generally have the advantage of standardized formatting of bibliographic fields, the<br>
|
---|
1213 |
|
---|
1214 |
|
---|
1215 | digital libraries are freely accessible, often contain &quot;grey literature&quot; that is otherwise<br>
|
---|
1216 |
|
---|
1217 |
|
---|
1218 | <hr>
|
---|
1219 |
|
---|
1220 |
|
---|
1221 | <A name=15></a>unavailable for analysis, and generally make the full text of documents available. The<br>
|
---|
1222 |
|
---|
1223 |
|
---|
1224 | insights gained from analysis of digital libraries will add to the store of &quot;information<br>
|
---|
1225 |
|
---|
1226 |
|
---|
1227 | about information&quot; that we have gained from older types of bibliographic repositories.<br>
|
---|
1228 |
|
---|
1229 |
|
---|
1230 | <b>References</b><br>
|
---|
1231 |
|
---|
1232 |
|
---|
1233 | [1] Bollacker, K.D., S. Lawrence, and C.L. Giles, CiteSeer: An Autonomous Web<br>
|
---|
1234 |
|
---|
1235 |
|
---|
1236 | Agent for Automatic Retrieval and Identification of Interesting Publications,<br>
|
---|
1237 |
|
---|
1238 |
|
---|
1239 | <i>Proceedings of the Second International Conference on Autonomous Agents</i><br>
|
---|
1240 |
|
---|
1241 |
|
---|
1242 | (Minneapolis/St. Paul, May 9-13), 1998.<br>
|
---|
1243 |
|
---|
1244 |
|
---|
1245 | [2] Bowman, C.M., P.B. Danzig, U. Manber, and M.F. Schwartz, Scalable Internet<br>
|
---|
1246 |
|
---|
1247 |
|
---|
1248 | resource discovery: Research problems and approaches, <i>Communications of</i><br>
|
---|
1249 |
|
---|
1250 |
|
---|
1251 | <i>the ACM 37(8)</i> (1994) 98-107.<br>
|
---|
1252 |
|
---|
1253 |
|
---|
1254 | [3] Burton, Hilary D., Use of a virtual information system for bibliometric analysis,<br>
|
---|
1255 |
|
---|
1256 |
|
---|
1257 | <i>Information Processing &amp; Management 24(1)</i> (1988) 39-44.<br>
|
---|
1258 |
|
---|
1259 |
|
---|
1260 | [4] Cunningham, S.J., An empirical investigation of the obsolescence rate for<br>
|
---|
1261 |
|
---|
1262 |
|
---|
1263 | information systems literature, <i>Library and Information Science</i><br>
|
---|
1264 |
|
---|
1265 |
|
---|
1266 | <i>Research</i>, 1996, http://library.fgcu.edu/iclc/lisrissu.htm<br>
|
---|
1267 |
|
---|
1268 |
|
---|
1269 | [5] Cunningham, S.J., and D. Bocock, Obsolescence of computing literature.<br>
|
---|
1270 |
|
---|
1271 |
|
---|
1272 | <i>Scientometrics 34(2)</i> (1995) 255-262.<br>
|
---|
1273 |
|
---|
1274 |
|
---|
1275 | [6] Cunningham, S.J. and Lynn Silipigni Connaway, Information searching<br>
|
---|
1276 |
|
---|
1277 |
|
---|
1278 | preferences and practices of computer science researchers, <i>Proceedings of</i><br>
|
---|
1279 |
|
---|
1280 |
|
---|
1281 | <i>OZCHI '96</i> (1996) 294-299.<br>
|
---|
1282 |
|
---|
1283 |
|
---|
1284 | [7] de Solla Price, D.J., Citation measures of hard science, soft science, technology,<br>
|
---|
1285 |
|
---|
1286 |
|
---|
1287 | and nonscience. In: C.E. Nelson and D.K. Pollock (eds), <i>Communication</i><br>
|
---|
1288 |
|
---|
1289 |
|
---|
1290 | <i>among scientists and engineers</i> (Heath Lexington, 1970).<br>
|
---|
1291 |
|
---|
1292 |
|
---|
1293 | [8] Garfield, E., <i>Citation Indexing: Its theory and application in Science, Technology</i><br>
|
---|
1294 |
|
---|
1295 |
|
---|
1296 | <i>and Humanities</i> (Wiley, 1979).<br>
|
---|
1297 |
|
---|
1298 |
|
---|
1299 | <hr>
|
---|
1300 |
|
---|
1301 |
|
---|
1302 | <A name=16></a>[9] Ginsparg, P., After dinner remarks: 14 Oct '94 APS meeting at LANL, 1994<br>
|
---|
1303 |
|
---|
1304 |
|
---|
1305 | (&lt;URL: http://xxx.lanl.gov/blurb&gt; ).<br>
|
---|
1306 |
|
---|
1307 |
|
---|
1308 | [10] Ginsparg, P., First steps towards electronic research communication, <i>Computers</i><br>
|
---|
1309 |
|
---|
1310 |
|
---|
1311 | <i>in Physics 8(4)</i> (1994) 390-401. <br>
|
---|
1312 |
|
---|
1313 |
|
---|
1314 | [11] Hallmark, J., Scientists' access and retrieval of references cited in their recent<br>
|
---|
1315 |
|
---|
1316 |
|
---|
1317 | journal articles, <i>College and Research Libraries 55(3)</i> (1994) 199-210.<br>
|
---|
1318 |
|
---|
1319 |
|
---|
1320 | [12] Hawkins, D.T., Unconventional uses of on-line information retrieval systems:<br>
|
---|
1321 |
|
---|
1322 |
|
---|
1323 | on-line bibliometric studies, <i>Journal of the American Society for Information</i><br>
|
---|
1324 |
|
---|
1325 |
|
---|
1326 | <i>Science 28</i> (1977) 13-18.<br>
|
---|
1327 |
|
---|
1328 |
|
---|
1329 | [13] McGhee, P.E., P.R. Skinner, K. Roberto, N.J. Ridenour, and S.M. Larson,<br>
|
---|
1330 |
|
---|
1331 |
|
---|
1332 | Using online databases to study current research trends: an online bibliometric<br>
|
---|
1333 |
|
---|
1334 |
|
---|
1335 | study, <i>Library and Information Science Research 9</i> (1987) 285-291.<br>
|
---|
1336 |
|
---|
1337 |
|
---|
1338 | [14] Maly, K., E.A. Fox, J.C. French, and A.L. Selman, Wide area technical report<br>
|
---|
1339 |
|
---|
1340 |
|
---|
1341 | server (<i>Technical Report</i>, Dept. of Computer Science, Old Dominion<br>
|
---|
1344 | University, 1994. Also available at<br>
|
---|
1368 | &lt;URL: http://www.cs.odu.edu/WATERS/WATERS-paper.ps&gt;).<br>
|
---|
1369 |
|
---|
1370 |
|
---|
1371 | [15] Sigogneau, M.J., S. Bain, J.P. Courtial, and H. Feillet, Scientific innovation in<br>
|
---|
1372 |
|
---|
1373 |
|
---|
1374 | bibliographical databases: a comparative study of the Science Citation Index<br>
|
---|
1375 |
|
---|
1376 |
|
---|
1377 | and the Pascal database, <i>Scientometrics 22(1)</i> (1991) 65-82.<br>
|
---|
1378 |
|
---|
1379 |
|
---|
1380 | [16] Witten, I.H., S.J. Cunningham, M. Vallabh, and T.C. Bell, A New Zealand<br>
|
---|
1381 |
|
---|
1382 |
|
---|
1383 | digital library for computer science research, <i>Proceedings of Digital Libraries</i><br>
|
---|
1384 |
|
---|
1385 |
|
---|
1386 | <i>'95</i> (1995) 25-30.<br>
|
---|
1387 |
|
---|
1388 |
|
---|
1389 | [17] Witten, I.H., C. Nevill-Manning, and S.J. Cunningham, A public library based<br>
|
---|
1390 |
|
---|
1391 |
|
---|
1392 | on full-text retrieval, <i>Communications of the ACM</i> 41(4), 1998, p. 71<br>
|
---|
1393 |
|
---|
1394 |
|
---|
1395 | <hr>
|
---|
1396 |
|
---|
1397 |
|
---|
1398 | <A name=17></a> <br>
|
---|
1399 |
|
---|
1400 |
|
---|
1401 | <sup>1</sup> Documents were randomly sampled from the DEC<br>
|
---|
1402 |
|
---|
1403 |
|
---|
1404 | (ftp://crl.dec.com/pub/DEC/CRL/tech-reports/), Sony<br>
|
---|
1405 |
|
---|
1406 |
|
---|
1407 | (ftp://ftp.csl.sony.co.jp/CSL/CSL-Papers), and Ohio (ftp://archive.cis.ohio-<br>
|
---|
1408 |
|
---|
1409 |
|
---|
1410 | state.edu/pub/tech-report/) technical report repositories<br>
|
---|
1411 |
|
---|
1412 |
|
---|
1413 | <hr>
|
---|
1414 |
|
---|
1415 |
|
---|
1416 |
|
---|
1417 |
|
---|
1418 |
|
---|
1419 |
|
---|
1420 |
|
---|
1421 |
|
---|
1422 | </Content>
|
---|
1423 | </Section>
|
---|
1424 | </Archive>
|
---|