WP12: Cross concordances of classifications and thesauri
Contact person:
Dr. Albert Schröder
Universitaetsbibliothek Regensburg
93042 Regensburg
Germany
Tel. 009 49 941/943-3903; Fax 009 49 941/943-3285
Overall goal
Initial situation:
Within the systems of libraries and specialized information
centers, different classifications and thesauri are in use. As a
result, searching across subjects or files is impeded. A person
who searches, for example, first a library catalogue in Regensburg,
then one in Lower Saxony or in the U.S., and subsequently articles
in the reference database of an information service has to work
with different search terms and with the search logic of each
system, so that an efficient search is hardly possible. As a rule,
users are familiar only with the classification or thesaurus they
normally use. The problem becomes more acute when the different
library catalogues and reference databases are technically connected
through a common user interface. The same holds when different
indexing systems are applied to metadata.
The goal is to allow an integrated subject search in distributed
data holdings with different subject emphases, using cross
concordances to take into account the conceptual differences
between the applied thesauri and classifications.
In order to achieve the overall goal it is necessary
- to examine the methodology of cross concordances between classifications
or thesauri;
- to program a procedure for representing these cross concordances
between the different classifications or thesauri available on
the Internet;
- to establish a prototype cross concordance for special subjects
and selected classifications or thesauri.
In line with the emphasis of CARMEN, mathematics and physics are
selected as the specific basis. For methodological reasons the
subject frame will also include the social sciences.
Working on classifications and thesauri in parallel makes it possible
to understand the diverging problems and solution methods of
different indexing procedures and subjects. This secures the
prototypical character of the study. The solutions shall also be
applicable to other classifications and thesauri that have not been
included in the study.
Objectives and products of the work package
The clarification of methodological questions is common to both
subfields, classification and thesaurus.
- The cross concordances refer to classifications/thesauri which
represent a closed system. Navigation between different classifications/thesauri
must be made possible by cross concordances.
- The kind of relation between related notations/descriptors
must be mapped, e.g.:
  - relation 1:1 (synonymous terms, parallel notations);
  - broader term : narrower term;
  - narrower term : broader term;
  - related terms;
  - a measure of the degree of match, and so forth.
- These relations will be evaluated within work packages 7 (retrieval),
9 (interdisciplinary information system), and 11 (treatment of
heterogeneity).
- The cross concordances will access data pools (classifications/thesauri)
that are partly local and partly decentralized and linked. Each
classification or thesaurus will be updated by the institution
responsible for it, e.g. the American Mathematical Society,
OCLC ... The subsequently required updating of the cross references
will be undertaken by UB Regensburg and Die Deutsche Bibliothek.
In doing so, the compatibility of the software products to
be developed must be considered.
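The relation kinds listed above can be illustrated with a small
Java sketch. It is not the project's data model: every class,
field, and method name is hypothetical, and the 0.0 to 1.0 scale
for the degree of match is an assumption. The sketch only shows
that a concordance entry can be read from either side if the
relation is inverted when source and target are swapped.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; only the relation kinds come from the text.
enum RelationType {
    EXACT,     // relation 1:1 (synonymous terms, parallel notations)
    BROADER,   // source is broader than target
    NARROWER,  // source is narrower than target
    RELATED;   // related terms

    // The same relation as seen from the target's side.
    RelationType inverse() {
        switch (this) {
            case BROADER:  return NARROWER;
            case NARROWER: return BROADER;
            default:       return this;
        }
    }
}

// One entry of a cross concordance between two schemes.
final class Mapping {
    final String sourceScheme;
    final String sourceNotation;
    final String targetScheme;
    final String targetNotation;
    final RelationType type;
    final double matchDegree; // assumed scale: 0.0 (no match) to 1.0 (full match)

    Mapping(String sourceScheme, String sourceNotation,
            String targetScheme, String targetNotation,
            RelationType type, double matchDegree) {
        this.sourceScheme = sourceScheme;
        this.sourceNotation = sourceNotation;
        this.targetScheme = targetScheme;
        this.targetNotation = targetNotation;
        this.type = type;
        this.matchDegree = matchDegree;
    }

    // Swap source and target, inverting the relation accordingly.
    Mapping reversed() {
        return new Mapping(targetScheme, targetNotation,
                           sourceScheme, sourceNotation,
                           type.inverse(), matchDegree);
    }
}

// A concordance that can be navigated from either side.
final class Concordance {
    private final List<Mapping> mappings = new ArrayList<>();

    void add(Mapping m) {
        mappings.add(m);
    }

    // All mappings that start at the given entry, reversing stored
    // entries where the given entry appears as the target.
    List<Mapping> lookup(String scheme, String notation) {
        List<Mapping> hits = new ArrayList<>();
        for (Mapping m : mappings) {
            if (m.sourceScheme.equals(scheme) && m.sourceNotation.equals(notation)) {
                hits.add(m);
            } else if (m.targetScheme.equals(scheme) && m.targetNotation.equals(notation)) {
                hits.add(m.reversed());
            }
        }
        return hits;
    }
}
```

With one stored entry stating that notation a1 of a scheme A is
narrower than notation b7 of a scheme B (both invented), a lookup
starting from b7 returns the same entry with the relation BROADER.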
Programming
The cross concordances will be programmed in Java on top of a
relational database system, with an abstract intermediate layer
that allows switching between the database products of different
vendors. The method of rapid prototyping will be applied. A common
software tool for both subfields, classification and thesaurus,
will be developed.
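One common way to obtain such an intermediate layer is to let the
application depend only on a storage interface. The following is
a minimal sketch under that assumption, not the project's design:
the names NotationStore, InMemoryNotationStore, and
ClassificationBrowser are hypothetical, and the in-memory class
merely stands in for a database-backed implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction: callers depend only on this interface,
// never on a particular database product.
interface NotationStore {
    void save(String scheme, String notation, String caption);

    // Returns the caption, or null if the notation is unknown.
    String findCaption(String scheme, String notation);
}

// Stand-in implementation for prototyping. A class that talks to a
// relational database could implement the same interface and replace
// this one without any change to the callers.
final class InMemoryNotationStore implements NotationStore {
    private final Map<String, String> captions = new HashMap<>();

    private static String key(String scheme, String notation) {
        return scheme + '\n' + notation;
    }

    @Override
    public void save(String scheme, String notation, String caption) {
        captions.put(key(scheme, notation), caption);
    }

    @Override
    public String findCaption(String scheme, String notation) {
        return captions.get(key(scheme, notation));
    }
}

// Application code written against the interface only.
final class ClassificationBrowser {
    private final NotationStore store;

    ClassificationBrowser(NotationStore store) {
        this.store = store;
    }

    String describe(String scheme, String notation) {
        String caption = store.findCaption(scheme, notation);
        return caption == null
                ? scheme + " " + notation + " (unknown)"
                : scheme + " " + notation + ": " + caption;
    }
}
```

Because ClassificationBrowser receives its store through the
constructor, exchanging the database product would only mean
passing in a different NotationStore implementation.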
In the field of classification, programming will be done in two
parts. Based on the project RVK-Online, a system for data maintenance,
structured search, and user-friendly presentation of classifications
will be developed. In principle this tool should be able to represent
any classification. Concretely, this will be realised for RVK and
DDC.
In addition, the cross concordance referring to these or to other
classifications not involved (MSC, PACS, and the classification
for the social sciences) will be developed.
The aim is to refer exactly to the position within a classification
that is actually applied, not just to the classification itself.
This functionality should also be usable for metadata.
It is still undecided whether this distributed, organizationally
advantageous structure will stand the test, or whether a different
structural model will prove more promising. For this decision the
experience gained in the ELVIRA project (cf. AP11) and Regensburg's
preparatory work will be taken into consideration.
The results have to be visualized both for targeted searches and
for navigation and display of the structure of the classification/thesaurus.
The development of the classification software in particular will
require a major effort. Parts of the functionality have already
been realized for the SWD.
Subarea classification
The methods for a concordance between a general classification and
a special classification shall be worked out by way of example.
DDC and RVK in the areas of mathematics and physics, as well as
MSC and PACS, seem particularly suitable.
In addition, a concordance between the classification for the social
sciences on the one hand and RVK and DDC on the other will be
created.
The problem is the overlap between the technical terms of the
special classifications (MSC, PACS) and between the very specialized
classifications and the general classifications (e.g. MSC-DDC).
In the project it has to be examined whether a single classification
will form the basis to which the other classifications will be
mapped, or whether each classification will be mapped to the others.
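The difference between the two options can be made concrete by
counting the concordances each one requires. The sketch below is
illustrative only: MappingTopology and its methods are hypothetical
names, one bidirectional concordance per pair of classifications
is assumed, and translating via the basis classification is reduced
to chaining two simple lookup tables.

```java
import java.util.Map;

final class MappingTopology {
    private MappingTopology() {}

    // Concordances needed if every classification is mapped directly to
    // every other one (one bidirectional concordance per pair).
    static int pairwiseCount(int schemes) {
        return schemes < 2 ? 0 : schemes * (schemes - 1) / 2;
    }

    // Concordances needed if one classification serves as the common basis.
    static int hubCount(int schemes) {
        return schemes < 2 ? 0 : schemes - 1;
    }

    // Translate a notation via the basis classification; returns null if
    // either of the two steps has no entry.
    static String viaHub(Map<String, String> sourceToHub,
                         Map<String, String> hubToTarget,
                         String notation) {
        String hubNotation = sourceToHub.get(notation);
        return hubNotation == null ? null : hubToTarget.get(hubNotation);
    }
}
```

For the five classifications named in this work package (RVK, DDC,
MSC, PACS, and the classification for the social sciences), direct
mapping of each to the others would need 10 concordances, a single
basis classification only 4. The price of the smaller number is
that every translation between two special classifications passes
through the basis and can be no more precise than the basis allows.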
There is also a need to find out how to implement a structured
search in concept trees of varying parallel classifications.
In addition to the search for notations in the classification
and the search within the hierarchical structure there is the
need for a verbal search. A bilingual search in all classifications
is highly desirable.
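The three kinds of search mentioned here (by notation, within the
hierarchy, and verbal in more than one language) can be sketched
on a simple in-memory structure. This is an illustration under
assumptions, not the planned software: ClassEntry and
ClassificationIndex are hypothetical names, captions are keyed by
a language code, and verbal search is reduced to a case-insensitive
substring match over the captions of all languages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// One class of a classification, with captions in several languages.
final class ClassEntry {
    final String notation;
    final String parentNotation; // null for a top-level class
    final Map<String, String> captions = new LinkedHashMap<>(); // language code -> caption

    ClassEntry(String notation, String parentNotation) {
        this.notation = notation;
        this.parentNotation = parentNotation;
    }

    // Adds a caption and returns this entry so that calls can be chained.
    ClassEntry caption(String language, String text) {
        captions.put(language, text);
        return this;
    }
}

final class ClassificationIndex {
    private final Map<String, ClassEntry> byNotation = new LinkedHashMap<>();

    void add(ClassEntry entry) {
        byNotation.put(entry.notation, entry);
    }

    // Search by notation; returns null if the notation is unknown.
    ClassEntry findByNotation(String notation) {
        return byNotation.get(notation);
    }

    // Search within the hierarchy: the direct subclasses of a notation.
    List<ClassEntry> children(String notation) {
        List<ClassEntry> result = new ArrayList<>();
        for (ClassEntry entry : byNotation.values()) {
            if (notation.equals(entry.parentNotation)) {
                result.add(entry);
            }
        }
        return result;
    }

    // Verbal search: case-insensitive substring match over the captions
    // of all languages; each entry is reported at most once.
    List<ClassEntry> searchCaptions(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<ClassEntry> result = new ArrayList<>();
        for (ClassEntry entry : byNotation.values()) {
            for (String caption : entry.captions.values()) {
                if (caption.toLowerCase(Locale.ROOT).contains(needle)) {
                    result.add(entry);
                    break;
                }
            }
        }
        return result;
    }
}
```

An entry carrying both an English and a German caption is then
found by a search word from either language, which is the behaviour
a bilingual search would need at a minimum.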
Organization
The prototype of the cross concordance of classifications will
be programmed at the University of Regensburg and integrated into
CARMEN by AP 11 and AP 7. It can be integrated into library systems
later and also be used as a stand-alone system by specialized
information centers and publishers. The concordance of classifications
will be compiled in Regensburg in cooperation with OCLC. OCLC
is an external partner.
The concordance of thesauri will be compiled jointly by
Die Deutsche Bibliothek Frankfurt and the IZ in cooperation with
DIFF and the Max-Planck-Institut für Bildungsforschung.
Leske + Budrich publishers will act in an advisory capacity for
the social sciences to ensure applicability at a publishing house.
Besides the CARMEN project there is a cooperation with the applications
of the Fachhochschule Regensburg and Leske + Budrich publishers
within Slot 1 (author systems for multimedia products).
With regard to the use of products such as DDC and SWD, which
are subject to a license fee, it will later be necessary to use
the GlobalInfo program's modules for the settlement of accounts.
Planning for exploitation
The software products will be applicable to different classifications
and thesauri and will be provided free of charge. There is great
demand for such products. The compiled concordances will be prototypes;
the institutions involved will make them available free of charge
on the Internet and will maintain them.
Subarea classification: the classifications RVK and possibly a
German edition of DDC, as well as the concordance, will be provided
permanently by the University of Regensburg. Once the system goes
into actual production, a contract will have to be concluded with
OCLC (Forest Press) for the DDC license. It is also conceivable
that a German edition of DDC will be maintained at Die Deutsche
Bibliothek.
Subarea thesaurus: the thesauri will be held permanently at the
respective institutions involved, the mediating concordance
possibly at Die Deutsche Bibliothek. The SWD copyright will not
be affected.