Information Processing & Management, Vol. 31, No. 1, pp. 15-27, 1995
Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0306-4573/95 9.50

TERMINOLOGICAL KNOWLEDGE STRUCTURE FOR INTERMEDIARY EXPERT SYSTEMS

RAYA FIDEL
Graduate School of Library and Information Science, University of Washington, Seattle, WA 98195, U.S.A.

and

EFTHIMIS N. EFTHIMIADIS
Graduate School of Library and Information Science, University of California, Los Angeles, CA 90024, U.S.A.

(Received 4 November 1993; accepted in final form 23 January 1994)

Correspondence should be addressed to Raya Fidel, Graduate School of Library and Information Science, FM-30, University of Washington, Seattle, WA 98195.

Abstract - An intermediary expert system (IES) helps both end users and professional searchers to conduct their online database searching. To provide advice about term selection and query expansion, an IES should include a terminological knowledge structure. Terminological attributes as well as other properties could provide the starting point for building a knowledge base, and knowledge acquisition could rely on knowledge-base techniques coupled with statistical techniques. The searching behavior of expert online searchers would provide one source of knowledge. The knowledge structure would include three constructs for each term: frequency data, a hedge, and a position in a classification scheme. Switching vocabularies or languages could provide a meta-schema and facilitate the interoperability of databases in similar subject domains. To develop such a knowledge structure, future research should focus on terminological attributes, word and phrase disambiguation, automated text processing, and the role of thesauri and classification schemes in indexing and retrieval. In particular, such research should develop techniques that combine knowledge-base and statistical methods and that consider user preferences.

1. INTRODUCTION

An intermediary expert system (IES) helps users, both professional searchers and end users, to conduct their searches of online bibliographic databases. Currently, most online bibliographic databases provide for searching the titles, abstracts, and sources of the bibliographic items to be retrieved, in addition to descriptors and identifiers assigned by human indexers, if these are available. This article examines the knowledge base of an IES that provides advice about the selection of search terms, or search keys. It presents a proposal for an integrated approach that would include various methods and techniques that are available today. Recognizing that these methods and techniques could be integrated in a variety of combinations, the article presents one option that focuses on terminological attributes and is based on knowledge acquired from professional searchers. This option creates a scenario that illustrates how various approaches can be used simultaneously and the effect such a combination would have on research. It examines what knowledge and information could be included in the knowledge base and how they could be organized. The article then shows what research would be required to support the development of this option.

Most commercially available search systems require the use of Boolean operators. Thus, before searching a request, a user breaks it down into concepts, the representations of which are then linked with Boolean AND operators. For actual searching, each concept is represented by one or more search keys. A search key is a string of characters to be searched in the database. A search key, which represents a concept of a request, may consist of one or more words. The selection of search keys is at times a straightforward process; however, at other times it requires knowledge and expertise.

Consider the request “attitude of students toward themselves during examination period.” The request can be broken down into three concepts: “attitudes toward themselves,” “students,” and “examinations.” A straightforward approach to searching would be to search on the keys as they appear in the request in all available fields and then to intersect the resulting sets (using the AND operator). A professional searcher, however, would likely see much more complexity in the request and would probably try a variety of other search keys that would result in better retrieval. A searcher would probably decide to express the concept “attitudes toward themselves” in a phrase such as “self-image” or “self-esteem.” Also, the searcher is likely to prefer to search the key “examination” only in the descriptor field because it is a common term; as a textword, it appears frequently in the text, often referring to concepts other than educational tests. An IES of the kind considered here would advise users of the most promising search keys to be used.

It is well established by now that relying only on the words in a request is not sufficient for satisfactory retrieval (Svenonius, 1986). Indeed, research into query expansion (the process of supplementing the original query with additional terms) has been motivated by this observation (Efthimiadis, 1991). In addition, databases that use an indexing language require users to make another decision: whether to enter the search key as a textword key, which would retrieve all bibliographic database records that include the key in any field of the record, or as a descriptor, which would retrieve only the records whose descriptor field includes the key.
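
The Boolean formulation and the textword/descriptor choice described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the article: alternative keys for one concept are combined with OR (the text states only that concepts are linked with AND), and the query syntax, including the `/de` suffix that restricts a key to the descriptor field, is hypothetical and does not correspond to any particular retrieval system. The names `SearchKey` and `build_query` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchKey:
    """One search key: a string of characters plus the field to search."""
    text: str
    descriptor_only: bool = False  # False means textword (any field)

    def render(self) -> str:
        # Hypothetical syntax: quote the key; "/de" limits it to descriptors.
        quoted = f'"{self.text}"'
        return f"{quoted}/de" if self.descriptor_only else quoted


def build_query(concepts: list[list[SearchKey]]) -> str:
    """OR the keys within each concept, then AND the concepts together."""
    groups = []
    for keys in concepts:
        if not keys:
            raise ValueError("each concept needs at least one search key")
        groups.append("(" + " OR ".join(k.render() for k in keys) + ")")
    return " AND ".join(groups)


# The example request, using the keys a professional searcher might choose.
query = build_query([
    [SearchKey("self-image"), SearchKey("self-esteem")],
    [SearchKey("students")],
    [SearchKey("examination", descriptor_only=True)],
])
print(query)
```

In this sketch the common term “examination” is restricted to the descriptor field, while the other keys are searched as textwords.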
An IES of the type considered here should be able to help users decide between textword and descriptor entry as well.

The interaction that takes place in information retrieval between users and the database searched can be described with the use of a simple two-stage model (Efthimiadis, 1991; Efthimiadis & Robertson, 1989). The model includes the end user, the intermediary mechanism, and the database. The intermediary mechanism may be a human being or some software, such as a front-end system or an expert system. Here we consider an intermediary mechanism that is a machine, as described in Fig. 1. For simplicity in the discussion, the IES is treated as part of the retrieval system. However, an IES could reside anywhere: it could reside between the retrieval system and the user, supporting initial query formulation; it could be a front end or client at the user end, a front end at the database end, or an integral part of the retrieval mechanism.

Fig. 1. The role of an IES in the two-stage model of interaction in information retrieval.

Depending on the characteristics of the particular request searched, the user’s level of expertise, and the database searched, an IES could provide help in three modes:

1. The system decides about the search key with no consultation with the user.
2. The system decides about the search key after interrogating the user.
3. The system presents options from which the user is asked to make a selection.

The decision about which mode of advice to provide is situational.

In general, we can identify two main sources which can provide knowledge to be utilized or incorporated in an IES (Efthimiadis, 1990). The first source of knowledge is the search intermediaries. Here, the approach that has been taken so far is to try to encapsulate their skills in a system, such as in PLEXUS (Vickery et al., 1987), IR-NLI (Brajnik et al., 1986, 1988), and EP-X (Shute & Smith, 1992). The second source is the knowledge structures found in databases or embodied in search aids or indexing languages, such as thesauri or classification schemes like EP-X (Smith et al., 1989a, 1989b), MENUSE (Pollitt, 1988), and UMLS (Humphreys & Lindberg, 1989).

It is promising, however, to integrate knowledge and information from both sources. For instance, terminological knowledge (i.e., knowledge about terms and their properties) acquired from professional intermediaries can point to terminological issues that are relevant to retrieval. At times, however, professional searchers may not be in a position to provide the best solution to a terminological problem because there is not enough information for them to make the most useful decisions. Given the existing techniques in information retrieval, help can come from additional sources. Associative retrieval techniques provide one of these sources. While not yet widely available, various techniques have been developed and tested over the last two decades. These techniques are not incompatible. It is possible to devise methods based on more than one associative retrieval approach, and such mixed methods may be appropriate for certain retrieval situations. Furthermore, it is also possible to combine associative and Boolean techniques to enhance both through knowledge-based retrieval techniques.

Current prototype IESs vary in the help they offer in terms of query negotiation aids, the selection of search keys, and query expansion. Because it is difficult to automate assistance offered at the query formulation stage, there have been various attempts to deal with it at the interface level (e.g., Pollitt, 1988; Thompson & Croft, 1989; Vickery, 1988; Vickery et al., 1986).
Some of the systems use thesauri and classification schemes to assist query formulation, for navigation and retrieval (e.g., Frei & Jauslin, 1983; Monarch & Carbonell, 1987; Pollitt, 1987, 1988; Shoval, 1981, 1985; Smith et al., 1989a, 1989b; Vickery, 1988). A few experimental expert systems already incorporate in their knowledge bases knowledge that is pertinent to the selection of search keys. MedIndEx at the National Library of Medicine, for instance, incorporates the Medline indexing policy (Humphrey, 1989; Humphrey & Miller, 1987), a system at the American Petroleum Institute employs knowledge acquired from professional API indexers (Brenner et al., 1984; Martinez et al., 1987), and a system at BIOSIS incorporates knowledge on biological concepts in a semantic vocabulary (Vleduts-Stokolov, 1987). Although their knowledge bases could be incorporated into IESs, these expert systems were designed initially to assist indexing rather than searching. To date, terminological knowledge as employed by searchers has not been explored.

To focus the discussion on search-key selection, it is assumed that a request is already broken into its concepts, the databases are selected, and the user is prepared to enter search keys. Further, to provide advice in the selection of search keys and the field(s) to be searched, an IES must possess expert knowledge about the particular database that is being searched. Therefore, it is assumed here that such a system would provide advice for searching a defined set of databases covering a certain subject domain.

2. KNOWLEDGE USED BY EXPERIENCED SEARCHERS

The first type of knowledge to be incorporated into an IES is knowledge about the selection of search keys acquired from experienced online searchers (Fidel, 1991b). This type of knowledge was collected through observations of searchers performing their regular, job-related searches and through interviews with them.
The study team analyzed search protocols, verbal protocols of thought processes while searching, and the transcripts of interviews, with 47 searchers performing a total of 281 searches in a variety of subject areas and library types. The analysis of the search and verbal protocols uncovered the intuitive rules that searchers used and resulted in a decision tree for the selection of search keys, which is called the selection routine.

The selection routine embodied in the decision tree describes the conditions that searchers considered and the options that each condition generated. For example, the condition “a search key is mapped to a descriptor through an exact match” generated the following options: enter the descriptor, but if recall needs to be improved, add textword synonyms to descriptors, or use generic descriptors in an inclusive mode (“explode,” or “cascade”), or add the next broader descriptor in the hierarchy. Data were gathered on the frequency with which the different options were selected. Thus, of the 228 cases in which a descriptor was an exact match and searchers wanted to increase recall, 72% of the time they entered textwords as synonyms, 25% of the time they did an inclusive search, and 3% of the time they selected a broader descriptor. Also, data were gathered on reasons associated with special conditions for each option. Thus, searchers selected textwords as synonyms because the user insisted on using the terms, because they needed to perform a multidatabase search, or because they did not trust the descriptors and/or the indexing of the database. They performed an inclusive search when the query formulation included a relatively large number of concepts, and they entered a broader descriptor when they thought the user would be interested in the broader descriptor as well.

Data of the type just described could be incorporated into the knowledge base and the inference engine of an IES for a specific set of databases. The frequencies with which options were selected could be used to handle uncertainties. The frequencies, together with other factors, could also determine the mode of advice to be given. For example, if a search key were matched to a descriptor through an exact match, the system might automatically enter the descriptor without consulting the user. This would be reasonable, given the searching behavior of the professional searchers: 100% of the time when there was an exact match they entered the descriptor; only if recall needed to be improved did they select additional keys. The next step, then, is for the system to inquire if recall is satisfactory.
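
One way such data might be encoded is sketched below in Python. Only the frequencies (100%, 72%, 25%, 3%) and the associated reasons come from the study described above; the rule-table structure, the names `Option` and `advise`, and the 0.95 threshold for acting without consultation are illustrative assumptions, not the authors' design. For brevity the sketch distinguishes only two of the three modes of advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    action: str
    frequency: float                 # share of observed cases choosing this option
    reasons: tuple[str, ...] = ()    # conditions searchers associated with it


# Condition: the search key maps to a descriptor through an exact match.
ENTER_DESCRIPTOR = Option("enter the descriptor", 1.00)

# Further options observed when recall needed to be improved (228 cases).
IMPROVE_RECALL = (
    Option("add textword synonyms", 0.72, (
        "user insists on the terms",
        "multidatabase search",
        "descriptors or indexing not trusted",
    )),
    Option("search descriptor inclusively (explode)", 0.25, (
        "query has a relatively large number of concepts",
    )),
    Option("add the next broader descriptor", 0.03, (
        "user likely interested in the broader descriptor",
    )),
)

AUTO_THRESHOLD = 0.95  # assumed cut-off; not taken from the study


def advise(recall_satisfactory: bool) -> tuple[str, list[Option]]:
    """Return (mode of advice, options ranked by observed frequency)."""
    options = [ENTER_DESCRIPTOR] if recall_satisfactory else list(IMPROVE_RECALL)
    ranked = sorted(options, key=lambda o: o.frequency, reverse=True)
    # Act without consultation only when one option clearly dominates.
    if ranked[0].frequency >= AUTO_THRESHOLD:
        return "system decides", ranked[:1]
    return "user selects", ranked


for satisfied in (True, False):
    mode, options = advise(satisfied)
    print(mode, [(o.action, o.frequency) for o in options])
```

Because no single recall-improving option dominates (the most frequent was chosen 72% of the time), this sketch falls back to presenting the ranked options to the user, whereas the exact-match descriptor is entered automatically.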
If recall needs to be improved, the system can ask further questions and give advice.

Selection routines by themselves are not sufficient for the IES to advise about the selection of search keys. Clearly, the system must include the database’s thesaurus to be able to map search keys