source: trunk/gsdl/packages/yaz/doc/yaz-4.html@ 1860

Last change on this file since 1860 was 1343, checked in by johnmcp, 24 years ago

Added the YAZ toolkit source to the packages directory (for z39.50 stuff)

  • Property svn:keywords set to Author Date Id Revision
File size: 18.6 KB
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2<HTML>
3<HEAD>
4 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
5 <TITLE>YAZ User's Guide and Reference: Supporting Tools</TITLE>
6 <LINK HREF="yaz-5.html" REL=next>
7 <LINK HREF="yaz-3.html" REL=previous>
8 <LINK HREF="yaz.html#toc4" REL=contents>
9</HEAD>
10<BODY>
11<A HREF="yaz-5.html">Next</A>
12<A HREF="yaz-3.html">Previous</A>
13<A HREF="yaz.html#toc4">Contents</A>
14<HR>
15<H2><A NAME="s4">4. Supporting Tools</A></H2>
16
17<P>In support of the service API (primarily the ASN module, which
18provides the programmatic interface to the Z39.50 APDUs), YAZ contains
19a collection of tools that support the development of applications.
20<P>
21<H2><A NAME="ss4.1">4.1 Query Syntax Parsers</A>
22</H2>
23
24<P>Since the type-1 (RPN) query structure has no direct, useful string
25representation, every origin application needs to provide some form of
26mapping from a local query notation or representation to a
27<CODE>Z_RPNQuery</CODE> structure. Some programmers will prefer to construct
28the query manually, perhaps using <CODE>odr_malloc()</CODE> to simplify memory
29management. The YAZ distribution includes two separate,
30query-generating tools that may be of use to you.
31<P>
32<H3>Prefix Query Format</H3>
33
34<P>Since RPN or reverse polish notation is really just a fancy way of
35describing a suffix notation format (operator follows operands), it
36would seem that the confusion is total when we now introduce a prefix
37notation for RPN. The reason is one of simple laziness - it's somewhat
38simpler to interpret a prefix format, and this utility was designed
39for maximum simplicity, to provide a baseline representation for use
40in simple test applications and scripting environments (like Tcl). The
41demonstration client included with YAZ uses the PQF.
42<P>The PQF is defined by the pquery module in the YAZ library. The
43<CODE>pquery.h</CODE> file provides the declaration of the functions
44<P>
45<BLOCKQUOTE><CODE>
46<PRE>
47Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
48
49Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
50 Odr_oid **attributeSetP, const char *qbuf);
51
52int p_query_attset (const char *arg);
53</PRE>
54</CODE></BLOCKQUOTE>
55<P>The function <CODE>p_query_rpn()</CODE> takes as arguments an <B>ODR</B> stream
56(see section
57<A HREF="yaz-5.html#odr">The ODR Module</A>) to provide a memory
58source (the structure created is released on the next call to
59<CODE>odr_reset()</CODE> on the stream), a protocol identifier (one of the
60constants <CODE>PROTO_Z3950</CODE> and <CODE>PROTO_SR</CODE>), and finally a
61null-terminated string holding the query string. The prototype above takes
62no attribute set argument; the default attribute set is chosen separately.
63<P>If the parse went well, <CODE>p_query_rpn()</CODE> returns a pointer to a
64<CODE>Z_RPNQuery</CODE> structure which can be placed directly into a
65<CODE>Z_SearchRequest</CODE>.
66<P>The function <CODE>p_query_attset()</CODE> specifies which attribute set to
67use if the query doesn't specify one with the <CODE>@attrset</CODE> operator. It
68returns 0 if the argument is a valid attribute set specifier; otherwise
69it returns -1.
70<P>The grammar of the PQF is as follows:
71<P>
72<BLOCKQUOTE><CODE>
73<PRE>
74Query ::= [ AttSet ] QueryStruct.
75
76AttSet ::= string.
77
78QueryStruct ::= { Attribute } Simple | Complex.
79
80Attribute ::= '@attr' AttributeType '=' AttributeValue.
81
82AttributeType ::= integer.
83
84AttributeValue ::= integer.
85
86Complex ::= Operator QueryStruct QueryStruct.
87
88Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
89
90Simple ::= ResultSet | Term.
91
92ResultSet ::= '@set' string.
93
94Term ::= string | '"' string '"'.
95
96Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
97
98Exclusion ::= '1' | '0' | 'void'.
99
100Distance ::= integer.
101
102Ordered ::= '1' | '0'.
103
104Relation ::= integer.
105
106WhichCode ::= 'known' | 'private' | integer.
107
108UnitCode ::= integer.
109</PRE>
110</CODE></BLOCKQUOTE>
111<P>You will note that the syntax above is a fairly faithful
112representation of RPN, except for the <CODE>Attribute</CODE>, which has been
113moved a step away from the term, allowing you to associate one or more
114attributes with an entire query structure. The parser will
115automatically apply the given attributes to each term as required.
116<P>The following are all examples of valid queries in the PQF.
117<P>
118<BLOCKQUOTE><CODE>
119<PRE>
120dylan
121
122"bob dylan"
123
124@or "dylan" "zimmerman"
125
126@set Result-1
127
128@or @and bob dylan @set Result-1
129
130@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
131
132@attr 4=1 @attr 1=4 "self portrait"
133
134@prox 0 3 1 2 k 2 dylan zimmerman
135</PRE>
136</CODE></BLOCKQUOTE>
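<P>The way attributes are inherited by every term below them can be illustrated with a small, self-contained sketch. It is not the pquery module and does not use YAZ: <CODE>pqf_sketch()</CODE>, <CODE>parse_struct()</CODE> and the textual output format are hypothetical names chosen for this example. It handles only <CODE>@attr</CODE>, <CODE>@and</CODE>, <CODE>@or</CODE>, <CODE>@not</CODE> and terms; <CODE>@set</CODE>, <CODE>@prox</CODE> and a leading attribute set name are rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ATTR 8

struct attr { int type, value; };

/* Append s to the NUL-terminated buffer out (capacity cap), truncating if full. */
static void emit(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);
    if (used + 1 < cap)
        snprintf(out + used, cap - used, "%s", s);
}

/* Read the next blank-separated token; a double-quoted token may contain
   blanks and is returned without its quotes. Returns 0 at end of input. */
static int next_token(const char **pp, char *tok, size_t cap)
{
    const char *p = *pp;
    char stop = ' ';
    size_t n = 0;

    while (*p == ' ')
        p++;
    if (*p == '\0')
        return 0;
    if (*p == '"') {
        stop = '"';
        p++;
    }
    while (*p && *p != stop) {
        if (n + 1 < cap)
            tok[n++] = *p;
        p++;
    }
    if (*p == '"')
        p++;
    tok[n] = '\0';
    *pp = p;
    return 1;
}

/* Parse one QueryStruct. attrs holds the n attributes inherited from the
   enclosing structures. Returns 0 on success, -1 on a syntax error. */
static int parse_struct(const char **pp, const struct attr *attrs, int n,
                        char *out, size_t cap)
{
    struct attr local[MAX_ATTR];
    char tok[128], buf[64];
    int i;

    for (i = 0; i < n; i++)
        local[i] = attrs[i];
    if (!next_token(pp, tok, sizeof tok))
        return -1;
    while (strcmp(tok, "@attr") == 0) {
        if (n == MAX_ATTR || !next_token(pp, tok, sizeof tok) ||
            sscanf(tok, "%d=%d", &local[n].type, &local[n].value) != 2)
            return -1;
        n++;
        if (!next_token(pp, tok, sizeof tok))
            return -1;
    }
    if (!strcmp(tok, "@and") || !strcmp(tok, "@or") || !strcmp(tok, "@not")) {
        emit(out, cap, tok + 1);
        emit(out, cap, "(");
        if (parse_struct(pp, local, n, out, cap) < 0)
            return -1;
        emit(out, cap, ", ");
        if (parse_struct(pp, local, n, out, cap) < 0)
            return -1;
        emit(out, cap, ")");
        return 0;
    }
    if (tok[0] == '@')
        return -1;              /* operator not covered by this sketch */
    emit(out, cap, "term[");    /* a term carries every attribute in scope */
    for (i = 0; i < n; i++) {
        snprintf(buf, sizeof buf, "%s%d=%d", i ? "," : "",
                 local[i].type, local[i].value);
        emit(out, cap, buf);
    }
    emit(out, cap, "]\"");
    emit(out, cap, tok);
    emit(out, cap, "\"");
    return 0;
}

/* Render query as text in out. Returns 0 on success, -1 on a syntax error. */
int pqf_sketch(const char *query, char *out, size_t cap)
{
    const char *p = query;
    char extra[2];

    assert(query != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    if (parse_struct(&p, NULL, 0, out, cap) < 0)
        return -1;
    return next_token(&p, extra, sizeof extra) ? -1 : 0;
}
```

<P>Given the sixth example query above, the sketch attaches <CODE>4=1</CODE> to both terms, in addition to the use attribute written directly in front of each one. Because the tokenizer strips quotes, a quoted term that begins with <CODE>@</CODE> is rejected here; a real parser would keep that distinction.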
137<P>
138<H3>Common Command Language</H3>
139
140<P>Not all users enjoy typing in prefix query structures and numerical
141attribute values, even in a minimalistic test client. In the library
142world, the more intuitive Common Command Language (or ISO 8777) has
143enjoyed some popularity - especially before the widespread
144availability of graphical interfaces. It is still useful in
145applications that need to provide a symbolic language for expressing
146boolean query structures.
147<P>The EUROPAGATE research project working under the Libraries programme
148of the European Commission's DG XIII has, amongst other useful tools,
149implemented a general-purpose CCL parser which produces an output
150structure that can be trivially converted to the internal RPN
151representation of YAZ (the <CODE>Z_RPNQuery</CODE> structure). Since the CCL
152utility - along with the rest of the software produced by EUROPAGATE -
153is made freely available under a liberal license, it is included as a
154supplement to YAZ.
155<P>
156<H3>CCL Syntax</H3>
157
158<P>The CCL parser obeys the following grammar for the FIND argument.
159The syntax is annotated in the lines prefixed by <CODE>--</CODE>.
160<P>
161<BLOCKQUOTE><CODE>
162<PRE>
163CCL-Find ::= CCL-Find Op Elements
164 | Elements.
165
166Op ::= "and" | "or" | "not"
167-- The above means that Elements are separated by boolean operators.
168
169Elements ::= '(' CCL-Find ')'
170 | Set
171 | Terms
172 | Qualifiers Relation Terms
173 | Qualifiers Relation '(' CCL-Find ')'
174 | Qualifiers '=' string '-' string
175-- Elements is either a recursive definition, a result set reference, a
176-- list of terms, qualifiers followed by terms, qualifiers followed
177-- by a recursive definition or qualifiers in a range (lower - upper).
178
179Set ::= 'set' '=' string
180-- Reference to a result set
181
182Terms ::= Terms Prox Term
183 | Term
184-- Proximity of terms.
185
186Term ::= Term string
187 | string
188-- This basically means that a term may include a blank
189
190Qualifiers ::= Qualifiers ',' string
191 | string
192-- Qualifiers is a list of strings separated by comma
193
194Relation ::= '=' | '>=' | '&lt;=' | '&lt;>' | '>' | '&lt;'
195-- Relational operators. This really doesn't follow the ISO8777
196-- standard.
197
198Prox ::= '%' | '!'
199-- Proximity operator
200</PRE>
201</CODE></BLOCKQUOTE>
202<P>The following queries are all valid:
203<P>
204<BLOCKQUOTE><CODE>
205<PRE>
206dylan
207
208"bob dylan"
209
210dylan or zimmerman
211
212set=1
213
214(dylan and bob) or set=1
215</PRE>
216</CODE></BLOCKQUOTE>
217
218Assuming that the qualifiers <CODE>ti</CODE>, <CODE>au</CODE> and <CODE>date</CODE>
219are defined we may use:
220<BLOCKQUOTE><CODE>
221<PRE>
222ti=self portrait
223
224au=(bob dylan and slow train coming)
225
226date>1980 and (ti=((self portrait)))
227</PRE>
228</CODE></BLOCKQUOTE>
229<P>
230<H3>CCL Qualifiers</H3>
231
232<P>
233<P>Qualifiers are used to direct the search to a particular searchable
234index, such as title (ti) and author indexes (au). The CCL standard
235itself doesn't specify a particular set of qualifiers, but it does
236suggest a few short-hand notations. You can customize the CCL parser
237to support a particular set of qualifiers to reflect the current target
238profile. Traditionally, a qualifier would map to a particular
239use-attribute within the BIB-1 attribute set. However, you could also
240define qualifiers that would set, for example, the
241structure-attribute.
242<P>Consider a scenario where the target supports ranked searches in the
243title-index. In this case, the user could specify
244<BLOCKQUOTE><CODE>
245<PRE>
246ti,ranked=knuth computer
247</PRE>
248</CODE></BLOCKQUOTE>
249
250and <CODE>ranked</CODE> would map to structure=free-form-text
251(4=105), while <CODE>ti</CODE> would map to title (1=4).
252<P>A "profile" with a set of predefined CCL qualifiers can be read from a
253file. The YAZ client reads its CCL qualifiers from a file named
254<CODE>default.bib</CODE>. Each line in the file has the form:
255<P><I>qualifier-name</I> <I>type</I>=<I>val</I> <I>type</I>=<I>val</I> ...
256<P>where <I>qualifier-name</I> is the name of the qualifier to be used
257(e.g. <CODE>ti</CODE>), <I>type</I> is a BIB-1 category type and <I>val</I> is the
258corresponding BIB-1 attribute value. The <I>type</I> can be given either
259as a number or as one of the letters <CODE>u</CODE> (use), <CODE>r</CODE> (relation), <CODE>p</CODE>
260(position), <CODE>s</CODE> (structure), <CODE>t</CODE> (truncation) or <CODE>c</CODE>
261(completeness). The <I>qualifier-name</I> <CODE>term</CODE> has a special
262meaning. The types and values for this definition are used when <I>no</I>
263qualifier is present.
264<P>Consider the following definition:
265<BLOCKQUOTE><CODE>
266<PRE>
267ti u=4 s=1
268au u=1 s=1
269term s=105
270</PRE>
271</CODE></BLOCKQUOTE>
272
273Two qualifiers are defined, <CODE>ti</CODE> and <CODE>au</CODE>. They both set the
274structure-attribute to phrase (1). <CODE>ti</CODE> sets the use-attribute to
2754. <CODE>au</CODE> sets the use-attribute to 1. When no qualifiers are used
276in the query the structure-attribute is set to free-form-text (105).
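<P>As an illustration of the line format only, the following self-contained sketch splits one definition line into numeric type and value pairs. It is not the parser behind <CODE>ccl_qual_file</CODE>: the names <CODE>qual_sketch</CODE>, <CODE>type_number()</CODE> and <CODE>parse_qual_line()</CODE> are hypothetical, and the letter-to-number mapping assumes the usual BIB-1 attribute type numbers (use 1, relation 2, position 3, structure 4, truncation 5, completeness 6), which agrees with the 1=4 and 4=105 examples above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 8

struct qual_sketch {
    char name[32];          /* qualifier name, e.g. "ti" */
    int n;                  /* number of type=value pairs */
    int type[MAX_PAIRS];    /* numeric attribute type */
    int value[MAX_PAIRS];   /* attribute value */
};

/* Map a type token (a number or one letter) to an attribute type number.
   Returns 0 if the token is not recognised. */
static int type_number(const char *tok)
{
    /* Assumed order: use, relation, position, structure, truncation,
       completeness, giving types 1 to 6. */
    static const char letters[] = "urpstc";
    const char *hit;

    if (isdigit((unsigned char) tok[0]))
        return atoi(tok);
    if (tok[0] == '\0' || tok[1] != '\0')
        return 0;
    hit = strchr(letters, tok[0]);
    return hit ? (int) (hit - letters) + 1 : 0;
}

/* Parse "name type=val type=val ..." into q.
   Returns 0 on success, -1 on a malformed line. */
int parse_qual_line(const char *line, struct qual_sketch *q)
{
    char copy[256], *tok, *eq;

    assert(line != NULL && q != NULL);
    if (strlen(line) >= sizeof copy)
        return -1;
    strcpy(copy, line);
    tok = strtok(copy, " \t\r\n");
    if (tok == NULL || strlen(tok) >= sizeof q->name)
        return -1;
    strcpy(q->name, tok);
    q->n = 0;
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        eq = strchr(tok, '=');
        if (eq == NULL || q->n == MAX_PAIRS)
            return -1;
        *eq = '\0';                      /* split "u=4" into "u" and "4" */
        q->type[q->n] = type_number(tok);
        if (q->type[q->n] == 0)
            return -1;
        q->value[q->n] = atoi(eq + 1);
        q->n++;
    }
    return 0;
}
```

<P>With this sketch, the line <CODE>ti u=4 s=1</CODE> yields the pairs 1=4 and 4=1, which is the same information a PQF query would express as <CODE>@attr 1=4 @attr 4=1</CODE>.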
277<P>
278<H3>CCL API</H3>
279
280<P>
281<P>All public definitions can be found in the header file <CODE>ccl.h</CODE>.
282A profile is identified by a handle of type <CODE>CCL_bibset</CODE>. A profile
283must be created with a call to the function <CODE>ccl_qual_mk</CODE>, which
284returns such a handle.
285<P>To read a file containing qualifier definitions the function
286<CODE>ccl_qual_file</CODE> may be convenient. This function takes an already
287opened <CODE>FILE</CODE> handle pointer as argument along with a
288<CODE>CCL_bibset</CODE> handle.
289<P>To parse a simple string with a FIND query use the function
290<BLOCKQUOTE><CODE>
291<PRE>
292struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
293 int *error, int *pos);
294</PRE>
295</CODE></BLOCKQUOTE>
296
297which takes the CCL profile (<CODE>bibset</CODE>) and query (<CODE>str</CODE>) as
298input. Upon successful completion the RPN tree is returned. If an
299error occurs, such as a syntax error, the integers pointed to by
300<CODE>error</CODE> and <CODE>pos</CODE> hold the error code and the offset within
301the query string at which parsing failed.
302<P>An English description of the error may be obtained by calling
303the <CODE>ccl_err_msg</CODE> function. The error codes are listed in
304<CODE>ccl.h</CODE>.
305<P>To convert the CCL RPN tree (type <CODE>struct ccl_rpn_node *</CODE>) to the
306<CODE>Z_RPNQuery</CODE> of YAZ, the function <CODE>ccl_rpn_query</CODE> must be used. This
307function, which is part of YAZ, is implemented in <CODE>yaz-ccl.c</CODE>.
308After calling this function the CCL RPN tree is probably no longer
309needed. The function <CODE>ccl_rpn_delete</CODE> destroys the CCL RPN tree.
310<P>A CCL profile may be destroyed by calling the <CODE>ccl_qual_rm</CODE>
311function.
312<P>The token names for the CCL operators may be changed by setting the
313globals (all type <CODE>char *</CODE>) <CODE>ccl_token_and</CODE>, <CODE>ccl_token_or</CODE>,
314<CODE>ccl_token_not</CODE> and <CODE>ccl_token_set</CODE>.
315An operator may have aliases, i.e. there may be more than one name for
316the operator. To do this, separate each alias with a space character.
317<P>
318<H2><A NAME="ss4.2">4.2 Object Identifiers</A>
319</H2>
320
321<P>The basic YAZ representation of an OID is an array of integers,
322terminated with the value -1. The <B>ODR</B> module provides two
323utility-functions to create and copy this type of data element:
324<P>
325<BLOCKQUOTE><CODE>
326<PRE>
327Odr_oid *odr_getoidbystr(ODR o, char *str);
328</PRE>
329</CODE></BLOCKQUOTE>
330<P>Creates an OID based on a string-based representation using dots (.)
331to separate elements in the OID.
332<P>
333<BLOCKQUOTE><CODE>
334<PRE>
335Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
336</PRE>
337</CODE></BLOCKQUOTE>
338<P>Creates a copy of the OID referenced by the <I>o</I> parameter. Both
339functions take an <B>ODR</B> stream as parameter. This stream is used to
340allocate memory for the data elements, which is released on a
341subsequent call to <CODE>odr_reset()</CODE> on that stream.
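<P>The array representation can be made concrete with a short, self-contained sketch that turns a dotted string into a -1 terminated integer array. It is only an illustration of the data layout: <CODE>oid_from_dotted()</CODE> is a hypothetical name, and unlike <CODE>odr_getoidbystr()</CODE> it writes into a buffer supplied by the caller instead of allocating on an ODR stream.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parse a dotted string such as "1.2.840.10003.3.1" into dst, followed by
   the terminating -1. cap is the number of ints dst can hold, terminator
   included. Returns the number of OID elements, or -1 if the string is
   malformed or does not fit. Numeric overflow is not checked. */
int oid_from_dotted(const char *str, int *dst, int cap)
{
    const char *p = str;
    int n = 0;

    assert(str != NULL && dst != NULL);
    for (;;) {
        char *end;

        /* every element must start with a digit, and there must be room
           for this element plus the terminator */
        if (!isdigit((unsigned char) *p) || n + 1 >= cap)
            return -1;
        dst[n++] = (int) strtol(p, &end, 10);
        p = end;
        if (*p == '\0')
            break;
        if (*p != '.')
            return -1;
        p++;                    /* skip the dot; another element must follow */
    }
    dst[n] = -1;
    return n;
}
```

<P>For example, the BIB-1 attribute set OID <CODE>1.2.840.10003.3.1</CODE> becomes the seven integers 1, 2, 840, 10003, 3, 1, -1.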
342<P>The <B>OID</B> module provides a higher-level representation of the
343family of object identifiers which describe the Z39.50 protocol and its
344related objects. The definition of the module interface is given in
345the <CODE>oid.h</CODE> file.
346<P>The interface is mainly based on the <CODE>oident</CODE> structure. The
347definition of this structure looks like this:
348<P>
349<BLOCKQUOTE><CODE>
350<PRE>
351typedef struct oident
352{
353 oid_proto proto;
354 oid_class oclass;
355 oid_value value;
356 int oidsuffix[OID_SIZE];
357 char *desc;
358} oident;
359</PRE>
360</CODE></BLOCKQUOTE>
361<P>The <I>proto</I> field takes one of the values
362<P>
363<BLOCKQUOTE><CODE>
364<PRE>
365PROTO_Z3950
366PROTO_SR
367</PRE>
368</CODE></BLOCKQUOTE>
369<P>If you don't care about talking to SR-based implementations (few
370exist, and they may become fewer still if and when the ISO SR and ANSI
371Z39.50 documents are merged into a single standard), you can ignore
372this field on incoming packages, and always set it to PROTO_Z3950
373for outgoing packages.
374<P>The <I>oclass</I> field takes one of the values
375<P>
376<BLOCKQUOTE><CODE>
377<PRE>
378CLASS_APPCTX
379CLASS_ABSYN
380CLASS_ATTSET
381CLASS_TRANSYN
382CLASS_DIAGSET
383CLASS_RECSYN
384CLASS_RESFORM
385CLASS_ACCFORM
386CLASS_EXTSERV
387CLASS_USERINFO
388CLASS_ELEMSPEC
389CLASS_VARSET
390CLASS_SCHEMA
391CLASS_TAGSET
392CLASS_GENERAL
393</PRE>
394</CODE></BLOCKQUOTE>
395<P>corresponding to the OID classes defined by the Z39.50 standard.
396<P>Finally, the <I>value</I> field takes one of the values
397<P>
398<BLOCKQUOTE><CODE>
399<PRE>
400VAL_APDU
401VAL_BER
402VAL_BASIC_CTX
403VAL_BIB1
404VAL_EXP1
405VAL_EXT1
406VAL_CCL1
407VAL_GILS
408VAL_WAIS
409VAL_STAS
410VAL_DIAG1
411VAL_ISO2709
412VAL_UNIMARC
413VAL_INTERMARC
414VAL_CCF
415VAL_USMARC
416VAL_UKMARC
417VAL_NORMARC
418VAL_LIBRISMARC
419VAL_DANMARC
420VAL_FINMARC
421VAL_MAB
422VAL_CANMARC
423VAL_SBN
424VAL_PICAMARC
425VAL_AUSMARC
426VAL_IBERMARC
427VAL_EXPLAIN
428VAL_SUTRS
429VAL_OPAC
430VAL_SUMMARY
431VAL_GRS0
432VAL_GRS1
433VAL_EXTENDED
434VAL_RESOURCE1
435VAL_RESOURCE2
436VAL_PROMPT1
437VAL_DES1
438VAL_KRB1
439VAL_PRESSET
440VAL_PQUERY
441VAL_PCQUERY
442VAL_ITEMORDER
443VAL_DBUPDATE
444VAL_EXPORTSPEC
445VAL_EXPORTINV
446VAL_NONE
447VAL_SETM
448VAL_SETG
449VAL_VAR1
450VAL_ESPEC1
451</PRE>
452</CODE></BLOCKQUOTE>
453<P>again, corresponding to the specific OIDs defined by the standard.
454<P>The <I>desc</I> field contains a brief, mnemonic name for the OID in
455question.
456<P>The function
457<P>
458<BLOCKQUOTE><CODE>
459<PRE>
460struct oident *oid_getentbyoid(int *o);
461</PRE>
462</CODE></BLOCKQUOTE>
463<P>takes as argument an OID, and returns a pointer to a static area
464containing an <CODE>oident</CODE> structure. You typically use this function
465when you receive a PDU containing an OID, and you wish to branch out
466depending on the specific OID value.
467<P>The function
468<P>
469<BLOCKQUOTE><CODE>
470<PRE>
471int *oid_ent_to_oid(struct oident *ent, int *dst);
472</PRE>
473</CODE></BLOCKQUOTE>
474<P>takes as argument an <CODE>oident</CODE> structure - in which the <I>proto</I>,
475<I>oclass</I>, and <I>value</I> fields are assumed to be set correctly -
476and returns a pointer to the buffer given by <I>dst</I>,
477which then contains the base
478representation of the corresponding OID. The function returns
479NULL and leaves the array <I>dst</I> unchanged if no mapping could be made.
480The array <I>dst</I> should be at least of size <CODE>OID_SIZE</CODE>.
481<P>The <CODE>oid_ent_to_oid()</CODE> function can be used whenever you need to
482prepare a PDU containing one or more OIDs. The separation of the
483<I>protocol</I> element from the remainder of the OID-description makes
484it simple to write applications that can communicate with either
485Z39.50 or OSI SR-based applications.
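<P>One way to picture this separation is a table that stores only the protocol-independent tail of each OID and prepends a per-protocol prefix on demand. The sketch below is a guess at the idea, not the contents of the OID module: every <CODE>sk_</CODE> name is hypothetical, the table holds just three well-known Z39.50 entries, lookup is by name for brevity, and the SR prefix 1.0.10163 together with the reuse of the same tails under it are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_OID_SIZE 16

enum sk_proto { SK_PROTO_Z3950, SK_PROTO_SR };

struct sk_entry {
    const char *desc;           /* mnemonic name */
    int suffix[SK_OID_SIZE];    /* class and value arcs, -1 terminated */
};

static const struct sk_entry sk_table[] = {
    { "BIB-1",  { 3, 1, -1 } },     /* attribute set */
    { "USMARC", { 5, 10, -1 } },    /* record syntax */
    { "SUTRS",  { 5, 101, -1 } }    /* record syntax */
};

/* Build prefix + suffix for the named entry in dst, which must hold at
   least SK_OID_SIZE ints. Returns dst, or NULL (leaving dst untouched)
   if the name is unknown. */
int *sk_ent_to_oid(enum sk_proto proto, const char *desc, int *dst)
{
    static const int z3950_prefix[] = { 1, 2, 840, 10003, -1 };
    static const int sr_prefix[] = { 1, 0, 10163, -1 };   /* assumption */
    const int *prefix = proto == SK_PROTO_SR ? sr_prefix : z3950_prefix;
    size_t i;
    int n = 0, k;

    assert(desc != NULL && dst != NULL);
    for (i = 0; i < sizeof sk_table / sizeof sk_table[0]; i++) {
        if (strcmp(sk_table[i].desc, desc) != 0)
            continue;
        for (k = 0; prefix[k] != -1; k++)
            dst[n++] = prefix[k];
        for (k = 0; sk_table[i].suffix[k] != -1; k++)
            dst[n++] = sk_table[i].suffix[k];
        dst[n] = -1;
        return dst;
    }
    return NULL;
}
```

<P>In this sketch, switching an application from Z39.50 to SR changes only the <CODE>proto</CODE> argument; the rest of the lookup is identical.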
486<P>The function
487<P>
488<BLOCKQUOTE><CODE>
489<PRE>
490oid_value oid_getvalbyname(const char *name);
491</PRE>
492</CODE></BLOCKQUOTE>
493<P>takes as argument a mnemonic OID name, and returns the <I>value</I>
494field of the first entry in the database that contains the given name
495in its <I>desc</I> field.
496<P>Finally, the module provides the following utility functions, whose
497meaning should be obvious:
498<P>
499<BLOCKQUOTE><CODE>
500<PRE>
501void oid_oidcpy(int *t, int *s);
502void oid_oidcat(int *t, int *s);
503int oid_oidcmp(int *o1, int *o2);
504int oid_oidlen(int *o);
505</PRE>
506</CODE></BLOCKQUOTE>
507<P><I>NOTE: The <B>OID</B> module has been criticized - and perhaps rightly so
508- for needlessly abstracting the
509representation of OIDs. Other toolkits use a simple
510string-representation of OIDs with good results. In practice, we have
511found the interface comfortable and quick to work with, and it is a
512simple matter (for what it's worth) to create applications compatible
513with both ISO SR and Z39.50. Finally, the use of the <CODE>oident</CODE>
514database is by no means mandatory. You can easily create your
515own system for representing OIDs, as long as it is compatible with the
516low-level integer-array representation of the ODR module.</I>
517<P>
518<H2><A NAME="ss4.3">4.3 Nibble Memory</A>
519</H2>
520
521<P>Sometimes when you need to allocate and construct a large,
522interconnected complex of structures, it can be a bit of a pain to
523release the associated memory again. For the structures describing the
524Z39.50 PDUs and related structures, it is convenient to use the
525memory-management system of the <B>ODR</B> subsystem (see
526<A HREF="yaz-5.html#odr-use">Using ODR</A>). However, in some circumstances
527where you might otherwise benefit from using a simple nibble memory
528management system, it may be impractical to use <CODE>odr_malloc()</CODE> and
529<B>odr_reset()</B>. For this purpose, the memory manager which also
530supports the <B>ODR</B> streams is made available in the <B>NMEM</B>
531module. The external interface to this module is given in the
532<CODE>nmem.h</CODE> file.
533<P>The following prototypes are given:
534<P>
535<BLOCKQUOTE><CODE>
536<PRE>
537NMEM nmem_create(void);
538void nmem_destroy(NMEM n);
539void *nmem_malloc(NMEM n, int size);
540void nmem_reset(NMEM n);
541int nmem_total(NMEM n);
542void nmem_init(void);
543</PRE>
544</CODE></BLOCKQUOTE>
545<P>The <CODE>nmem_create()</CODE> function returns a pointer to a memory control
546handle, which can be released again by <CODE>nmem_destroy()</CODE> when no
547longer needed. The function <CODE>nmem_malloc()</CODE> allocates a block of
548memory of the requested size. A call to <CODE>nmem_reset()</CODE> or
549<CODE>nmem_destroy()</CODE> will release all memory allocated on the handle
550since it was created (or since the last call to
551<CODE>nmem_reset()</CODE>). The function <CODE>nmem_total()</CODE> returns the number
552of bytes currently allocated on the handle.
553<P>Note that the nibble memory pool is shared among threads. POSIX
554mutexes and WIN32 critical sections are used to keep the
555module thread safe. On WIN32, the function <CODE>nmem_init()</CODE> initialises
556the critical section handle and should be called once before any
557other nmem function is used.
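<P>The general technique behind such a module can be sketched in a few lines: small requests are carved out of larger blocks, and all blocks are freed together. The code below is a minimal, single-threaded illustration under that assumption, not the NMEM implementation. All <CODE>sk_</CODE> names are hypothetical, the block size is arbitrary, sizes are <CODE>size_t</CODE> here although the prototypes above use <CODE>int</CODE>, and the locking described above is left out.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_BLOCK_SIZE 4096

/* Used to keep every returned pointer suitably aligned. */
union sk_align { long l; double d; void *p; };
#define SK_ALIGN sizeof(union sk_align)

struct sk_block {
    struct sk_block *next;
    size_t used;                /* bytes handed out from data */
    size_t size;                /* capacity of data in bytes */
    union sk_align data[];      /* storage (C99 flexible array member) */
};

struct sk_nmem {
    struct sk_block *blocks;    /* most recent block first */
    size_t total;               /* bytes handed out since create/reset */
};

struct sk_nmem *sk_nmem_create(void)
{
    struct sk_nmem *n = malloc(sizeof *n);

    if (n != NULL) {
        n->blocks = NULL;
        n->total = 0;
    }
    return n;
}

/* Hand out size bytes from the current block, starting a new block when
   the current one is too full. Returns NULL if the system is out of memory. */
void *sk_nmem_malloc(struct sk_nmem *n, size_t size)
{
    struct sk_block *b;
    void *p;

    assert(n != NULL);
    b = n->blocks;
    size = (size + SK_ALIGN - 1) / SK_ALIGN * SK_ALIGN;   /* round up */
    if (b == NULL || b->size - b->used < size) {
        size_t cap = size > SK_BLOCK_SIZE ? size : SK_BLOCK_SIZE;

        b = malloc(sizeof *b + cap);
        if (b == NULL)
            return NULL;
        b->next = n->blocks;
        b->used = 0;
        b->size = cap;
        n->blocks = b;
    }
    p = (char *) b->data + b->used;
    b->used += size;
    n->total += size;
    return p;
}

/* Release everything allocated on the handle; the handle stays usable. */
void sk_nmem_reset(struct sk_nmem *n)
{
    assert(n != NULL);
    while (n->blocks != NULL) {
        struct sk_block *next = n->blocks->next;

        free(n->blocks);
        n->blocks = next;
    }
    n->total = 0;
}

size_t sk_nmem_total(const struct sk_nmem *n)
{
    return n->total;
}

void sk_nmem_destroy(struct sk_nmem *n)
{
    if (n != NULL) {
        sk_nmem_reset(n);
        free(n);
    }
}
```

<P>The appeal of the design is that a caller building a large tree of structures never tracks individual pointers: one call to the reset or destroy function releases the whole tree. The cost is that no single allocation can be freed early.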
558<P>
559<HR>
560<A HREF="yaz-5.html">Next</A>
561<A HREF="yaz-3.html">Previous</A>
562<A HREF="yaz.html#toc4">Contents</A>
563</BODY>
564</HTML>