+ Page 1 +

----------------------------------------------------------------
The Public-Access Computer Systems Review

Volume 1, Number 3 (1990)
ISSN 1048-6542

Editor-In-Chief:
  Charles W. Bailey, Jr.
  University of Houston

Associate Editors:
  Leslie Pearse, OCLC
  Mike Ridley, McMaster University

Editorial Board:
  Walt Crawford, Research Libraries Group
  Nancy Evans, Library and Information Technology Association
  David R. McDonald, Tufts University
  R. Bruce Miller, University of California, San Diego
  Paul Evan Peters, Coalition for Networked Information
  Peter Stone, University of Sussex

Published three times a year (Winter, Summer, and Fall) by the University Libraries, University of Houston. Technical support is provided by the Information Technology Division, University of Houston.

Circulation: 1,883.
----------------------------------------------------------------

Editor's Address:
  Charles W. Bailey, Jr.
  University Libraries
  University of Houston
  Houston, TX 77204-2091
  (713) 749-4241
  LIB3@UHUPVM1

Articles are stored as files at LISTSERV@UHUPVM1. To retrieve a file, send the e-mail message given after the article abstract to LISTSERV@UHUPVM1. The file will be sent to your account.

Back issues are also stored at LISTSERV@UHUPVM1. To obtain a list of all available files, send the following message to LISTSERV@UHUPVM1: INDEX PACS-L. The name of each issue's table of contents file begins with the word "CONTENTS."

+ Page 2 +

CONTENTS

COMMUNICATIONS

Library Information System II: Progress Report and Technical Plan
  Denise A. Troll (pp. 4-29)
  To retrieve this file: GET TROLL PRV1N3

The University of Guelph Library's SearchMe Public-Access Catalogue
  George Loney (pp. 30-43)
  To retrieve this file: GET LONEY PRV1N3

SPECIAL SECTION ON THE SPIRES SYSTEM

An Overview of SPIRES and the SPIRES Consortium
  Bo Parker (pp. 44-50)
  To retrieve this file: GET PARKER PRV1N3

Mounting Commercial Databases Using the SPIRES DBMS
  Slavko Manjlovich (pp.
51-57)
  To retrieve this file: GET MANJLOVI PRV1N3

LITMSS: Princeton's SPIRES Manuscripts Database
  John Delaney (pp. 58-76)
  To retrieve this file: GET DELANEY PRV1N3

The Libraries at Rensselaer Implement Access to Information Beyond Their Walls
  Pat Molholt (pp. 77-82)
  To retrieve this file: GET MOLHOLT PRV1N3

+ Page 3 +

Mounting a Full-Text Database Using SPIRES
  Walter Piovesan (pp. 83-88)
  To retrieve this file: GET PIOVESAN PRV1N3

The WatMedia Project
  Mark Ritchie (pp. 89-95)
  To retrieve this file: GET RITCHIE PRV1N3

DEPARTMENTS

Public-Access Provocations: An Informal Column
  "Future User Interfaces and the Common Command Language"
  Walt Crawford (pp. 96-99)
  To retrieve this file: GET CRAWFORD PRV1N3

Recursive Reviews
  "Hypermedia, Interactive Multimedia, and Virtual Realities"
  Martin Halbert (pp. 100-108)
  To retrieve this file: GET HALBERT PRV1N3

Review
  "MediaTracks"
  Steve Cisler (pp. 109-115)
  To retrieve this file: GET CISLER PRV1N3

----------------------------------------------------------------
The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All rights reserved.

Copying is permitted for noncommercial use by computerized bulletin board/conference systems, individual scholars, and libraries. Libraries are authorized to add the journal to their collection at no cost. This message must appear on copied material. All commercial use requires permission.
----------------------------------------------------------------

+ Page 109 +

----------------------------------------------------------------
The Public-Access Computer Systems Review 1, No. 3 (1990): 109-115.
----------------------------------------------------------------

----------------------------------------------------------------
Review
----------------------------------------------------------------

"MediaTracks"

By Steve Cisler

The first piece of advanced library technology that I used was in 1950.
The branch librarian handed me a shoe box full of photographs, showed me how to insert one in the Stereopticon, and went on to the next person in less than a minute. The only time I needed her help after that initial session was for the storage and retrieval of the shoe box. Then libraries began using electricity for more than lighting and telephones, and the game changed completely.

The first time I used a computer was in 1984. The California State Library administered an LSCA grant to provide public-access computers in dozens of public libraries around the state. Training the staff to use them was one of the first phases of the project. Choosing hardware and software for purchase was another phase, and making it accessible to the public was the longest and most difficult phase.

Having worked in a branch that had been showered with a very rich selection of audio-visual equipment in the early seventies, I was well aware of the time it would take to train staff and public to use any one piece of equipment, whether it was an 8-mm film loop player, a videotape recorder, or a computer. Our staff was willing but felt they were overworked, even before the 128 KB Macintosh arrived with a drawing program, MacWrite, and a spreadsheet.

I rewrote the manuals, digesting the basics into eight-page pamphlets aimed at certain tasks that we expected most people to tackle. Each staff member was able to instruct a novice and have them pecking away at a word processing document after about fifteen minutes of one-on-one instruction. However, the Macintosh was being used over 100 hours a month, and many of the people were first-time users. The fifteen-minute sessions began to add up quickly, and some of the staff began to tire of explaining over and over how a mouse worked, how to open a document, and how to save (or trash) a file. Answering the same repetitious questions affects some staff more than others, and most of us can use some assistance in the form of instructional aids.

+ Page 110 +

People learn in many different ways. Sitting in a lecture hall, taking notes, and then digesting and applying them to an exercise is a classical method. Perhaps the most effective is to be tutored by an interested, sensitive teacher or friend, but for some, reading a manual (computer, unassembled toy, or software) and then struggling alone is the most productive way to master a machine or program. Self-paced tutorials can be very effective for introductions to new technology or for specific tasks such as using an interactive videodisc or logging on to a multiuser database.

MediaTracks

Making these tutorials has been very complex and time consuming, whether they are on paper or in electronic format, but a new product from Farallon Computing has changed this. It puts the production of library-specific tutorials into librarians' hands. As with many Macintosh programs, you don't spend time fiddling with the interface or learning new commands. Your time is devoted to the tasks which the computer is supposed to facilitate, not to struggling with the computer.

MediaTracks consists of several parts: a Screen Recorder (which appeared as a separate program over a year ago) that makes a virtual tape of real-time events on your Macintosh screen, an editing program for sound and graphics, and several playback options.

Screen Recorder is a Desk Accessory which runs while other applications are operating. After you name the tape file, the recording begins, and a small control panel is displayed at the bottom of the screen to record, pause, stop, load, or play the tape. All of the actions you perform will be recorded in black and white at the original speed. This demo tape can be edited, integrated into a HyperCard presentation, or turned into a stand-alone application that can be distributed without paying Farallon any license fees.

+ Page 111 +

Many activities lend themselves to Screen Recorder.
I have used it to record activities that involved a complex equipment setup such as a network of CD-ROMs, data news feeds from a satellite receiver, an online session with a high speed modem, a LAN e-mail system, or a workstation with a variety of multimedia tools. I can play the tape at a conference, workshop, or other library without hauling all the gear needed for the original. Besides eliminating a lot of equipment for demonstrations, you have the chance to make the demo work before showing it to others! Even if I am doing a live presentation, I will carry a Screen Recorder tape as a backup.

Editing

Until July 1990, Screen Recorder tapes could not be edited. Now that Screen Recorder is included in MediaTracks, anyone who knows how to use a Macintosh can modify a tape in a number of ways. Once you boot MediaTracks and choose a tape to edit, a window appears with a single frame at the left of the screen with the sprocket holes stretching to the right. Below the frames are five icons for playing, recording sound, actions, drawing, and changing the view of the tape on the editing board. At the right are two indicators that show the starting time and duration of each frame.

If you have made the Screen Recorder tape, you may have an idea of how you want to edit the session. If not, click on the play icon and think about the natural breaks in the tape where you might want to highlight important events and add sounds to explain a complex action. After watching the tape once or twice you can press the "M" key in order to divide it into clips or sections for further editing or annotation. These marks may be removed if you decide they were incorrectly placed, or if you wish to combine two clips.

+ Page 112 +

Marks are generally used to divide a demonstration or tutorial into logical parts.
If you are showing someone the basics of an online service you would have an intro, the login sequence, the help screens, a simple search for information and perhaps a more complex search and a logoff sequence.

Another use for marks is to cut out dead time and mistakes. If your system is slow to respond you can shorten the demo by cutting seconds from each clip by marking and deleting periods of inactivity. If you entered a wrong command or typo and then corrected yourself during the initial Screen Recorder session, use marks to clean up that part. If you speed up a session, let the user know the actual session may be much slower.

Adding Graphics

First you can insert a title clip at the beginning of the tape. Double-clicking on the clip opens up a screen and palette with drawing tools for annotation. You can paste in graphics in color or black and white whether it is a diagram of the library, a network map, a scanned photograph of the reference staff, or a list of choices for the user to pursue, i.e. logon, search, or any part of the ensuing demo. Close the window after you finish adding text or graphics. Because this is interactive, the user may not want to watch the whole sequence but jump to new or difficult parts of your tutorial.

Another title clip can be inserted elsewhere in the sequence. This can be useful if you want the user to branch to a variety of choices later in the demo.

Proceed to the next clip, double click on it, and use the arrows and text boxes to highlight important parts of the screen activity. Don't overwhelm the subject matter by using 36 point type pointing to 9 point type on the screen.

+ Page 113 +

Incorporating Sound Using MacRecorder

Besides the graphics, the main addition is sound. Aside from a librarian explaining how a device works, or how to navigate through some information space, sound can be used to reinforce an action or correct a mistake. Farallon makes a package called MacRecorder that works with MediaTracks.
The hardware is a bit larger than the Macintosh mouse, plugs into a serial port, and can record voice or input from a tape, VCR, CD, record, or radio in digitized format. The length depends on the amount of RAM you have, so be sure to remember who is going to play this tape. The default setting is for a ten-second recording using 256 KB of RAM and sampled at 22 kHz (about like AM-quality sound). If you have a 5 MB Mac IIcx, and all the tapes are going to run on 1 MB Mac Pluses in the public area, you will have to keep your sound files short or compressed. Spoken word suffers if it is compressed too much, but 30 seconds is not too long for a clip.

Explaining what is happening is the most common use of sound; most libraries are not going to add theme music from Wheel of Fortune while you wait for the results of a complex Boolean search, though it might be fun to try. Be sure to have someone with a lively voice do the recording. Don't put the user to sleep.

Prepare a script and storyboard once you have divided the Screen Recorder tape into clips. For each clip write the commentary, but make it brief. This can be on the Mac or on paper. You may have to make several takes and listen to each one until it sounds right.

Buttons and HyperCard

A MediaTracks file can be linked to HyperCard using buttons generated in the graphics palette when you edit individual screens. Each button can contain a HyperTalk script of 256 characters or less, so you can start a MediaTracks tape from HyperCard and then control HyperCard from within a tape. Many people have already used HyperCard for their tutorials, and will use selected tape segments from MediaTracks to augment an existing work.

+ Page 114 +

The file can also be played with MediaTracks Player, an 87 KB application that may be distributed freely with the tapes that you make. It has a control panel that is similar to a VCR.
Icons allow you to pause, stop, repeat, speed up, slow down, rewind, skip forward/backward, fast forward, step frame by frame, or hide the panel.

Finally, the file may be saved as a stand-alone tape with the player functions built in. Double-clicking on the icon begins the tape.

Wrapping it up

All of these elements (sound, graphics, clips, and text) can be cut, copied and pasted between parts of the file you are constructing, from existing MediaTracks files and from other Macintosh applications. If you have a special sound in a HyperCard tour, it can be copied into a sound clip very easily. If you have an opening title clip from one tape, it can be used in another one. This makes it very easy to share and customize library instruction done for another library.

One of the drawbacks for distribution by floppy disk is the size of the final files. Uncompressed sound chunks at 256 KB each quickly push the file size over the capacity of a floppy disk. You can break up your tapes into pieces that will fit on an 800 KB or 1.4 MB floppy. If you are going to transfer files by hard disk or tape backup, you have no limitations on size.

For all of you DOS users: by running a program called SoftPC, you can make tapes of actual DOS program sessions and then use the Mac to teach new users programs for either operating system!

The manual is well written and includes a bibliography for further reading. For advanced users there are sections that help you set up menus, multilevel presentations, and quiz clips which can take the user back to elements of your demo for reinforcement.

The Apple Library Users Group (10381 Bandley Dr. MS: 8C, Cupertino, CA 95014) has a template exchange for database management and HyperCard templates. With MediaTracks we expect to be exchanging tapes of common library activities: searching CD-ROMs, using Internet and BITNET resources, and demonstrations of OPACs.
While there are none yet, perhaps this review will help you decide to share your own efforts.

+ Page 115 +

Product Information

MediaTracks
Farallon Computing, Inc.
2000 Powell Street, Suite 600
Emeryville, CA 94608
(415) 596-9000

Prices (U.S. Dollars):

MM100 MediaTracks--295.00 (if you already own MacRecorder)
MM110 MediaTracks Multimedia Pack--495.00 (includes MacRecorder)
MM111 MediaTracks Multimedia Pack - CD ROM Version--495.00 (includes many sample MediaTracks demos)
MR200 MacRecorder Sound System 2.0--249.00 (Includes HyperSound, HyperSound Toolkit, and SoundEdit)

About the Author

Steve Cisler
Senior Scientist
10381 Bandley Drive
Cupertino, CA 95014
(408) 974-3258
SAC@APPLE.COM

-----------------------------------------------------------------
The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name.

This article is Copyright (C) 1990 by Steve Cisler. All Rights Reserved.

The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved.

Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission.
----------------------------------------------------------------

+ Page 96 +

----------------------------------------------------------------
The Public-Access Computer Systems Review 1, No. 3 (1990): 96-99.
----------------------------------------------------------------

-----------------------------------------------------------------
Public-Access Provocations: An Informal Column
-----------------------------------------------------------------

"Future User Interfaces and the Common Command Language"

by Walt Crawford

With any luck at all, 1991 will finally see adoption of ANSI/NISO Z39.58, Common Command Language (CCL). It's been a long, difficult process to nail down a standard that can provide a common means of access across many different catalogs and online systems.

But according to some people, it's too late: command languages will be irrelevant for the online catalogs of the future. These brave new catalogs will use Graphic User Interfaces (GUIs) or WIMPs (Windows, Icons, Mice, and Pull-down Menus); patrons will thus be guided painlessly and intuitively to the material they need.

Well, maybe. I'd love to see the icon for "books about Japanese baseball, published since 1980 in English." Or, more simply, the icon that will tell me whether the library has Norman Mailer's book with a title something like "Fire on the Moon" without plowing through dozens of authors and titles. (The title is "Of a Fire on the Moon," so an alphabetic browse just might take a while.) Painless? Intuitive? Plausible on a dial-up line from home at 2,400 bps (if you're really lucky)?

No, this isn't going to be a jeremiad against GUIs or an assertion that commands are the only good way to use a catalog. But I will assert that access to a command line continues to offer the fastest and most powerful way to perform complex searches (where "complex" can be defined as anything other than a one-index phrase search), and that access to direct command entry would improve the usefulness of non-command-driven catalogs for frequent users and dial-up/network users.

+ Page 97 +

CCL as a Secondary Interface?

CCL, probably the most widely-implemented not-yet-adopted standard in the history of NISO and Z39, could become the universal secondary access technique, available to power users and dial-up/network users as an alternative to the user-friendly, bandwidth-intensive, hardware-dependent, slow for complex searches, GUI interface that is so much fun to use the first time around.

Probably not all of CCL; most of the set-manipulation capabilities and macro-creation capabilities are useful for professional online searchers but overkill for patrons. Instead, I'd expect to see "secondary CCL" looking more like the partial CCL implementations that have been around (in some cases) for a decade or more: the West Coast Group--BALLOTS/RLIN (the original), Melvyl, Orion, Carlyle, and the like.

Yes, you can implement the logic of CCL in a GUI with icons, buttons and dialog boxes for the inevitable search text, and it would make an interesting design; I'd love to try one out. But it makes sense to have plain old CCL available from the keyboard as well; why penalize library users who find text comfortable?

The Return of the Command Line

I find it interesting that one significant improvement in PC Tools Deluxe 6 over PC Tools Deluxe 5.5 is that Version 6, which uses a well-designed "graphic" user interface, includes a command line within the interface window. You don't ever need to use it--but when you want the speed and power of the DOS prompt, you can mouse down to it and use it.

Amiga users have noted for some years that they have the best of both worlds: the Amiga user interface is GUI in the extreme, but a command line is immediately available for the times when it's the best, fastest way to get the job done.

+ Page 98 +

Understand, I do use GUIs. I can't imagine using Ventura Publisher as a pure command-driven system; ditto for any painting or drawing program. When I'm revising text in Microsoft Word, the mouse does come into play--and it certainly gets used in Quattro Pro.
And yes, I find PC Tools much easier and more powerful at home (with a mouse and color screen) than at work (without a mouse, and with a monochrome screen). I'm text-oriented, but I'm no bigot.

Click on the Jar, then the Anteater, then the Piano. . .

Two or three years ago, two or three of us considered designing a truly graphic online catalog interface as a joke (after you got past the icons for indexes, you'd have twenty-six icons to narrow the search: an Anteater, a Bell, a Cat, a Dog. . .on up to a Xylophone, Yacht and Zebra). We never prepared the demo for two reasons. For one thing, back then it would have been quite a bit of work. More importantly, though, we realized that people would take it seriously--after all, words are such a nuisance when you're looking for a book!

Comments?

What do you think? Does the future really omit the command line, or will mixed environments thrive? (Tried any good touch-screen catalogs lately?)

Those aren't simply rhetorical questions. I'm gearing up for another project on patron access, and your comments might help me to broaden my narrow-minded perspectives. Please send brief comments to my e-mail address and more lengthy ones to my regular mail address.

Meanwhile, whether in its pure form or embedded within a rich graphic interface, CCL offers the best chance for common entry points to diverse online systems. I hope to see it popping up in new offerings and revisions of current offerings, old-fashioned as commands may be.

+ Page 99 +

About the Author

Walt Crawford
The Research Libraries Group, Inc.
1200 Villa Street
Mountain View, CA 94041-1100
BR.WCC@RLG.BITNET

-----------------------------------------------------------------
The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET.
To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name.

This article is Copyright (C) 1990 by Walt Crawford. All Rights Reserved.

The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved.

Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission.
----------------------------------------------------------------

+ Page 58 +

-----------------------------------------------------------------
Delaney, John. "LITMSS: Princeton's SPIRES Manuscripts Database." The Public-Access Computer Systems Review 1, no. 3 (1990): 58-76.
-----------------------------------------------------------------

1.0 Introduction

LITMSS is an online database of information about modern (post-1500 A.D.) manuscript holdings of Princeton University Library's Department of Rare Books and Special Collections. It emphasizes eighteenth-, nineteenth-, and twentieth-century manuscripts whose primary language is English, though significant Spanish and French holdings are described as well. It includes, at the collection level, all of the Department's administrative units that house manuscripts--Manuscripts, Theatre, Western Americana, and Mudd Library of Public Affairs Papers--except Archives (to be added). In addition, it contains in-depth information about folder- and item-level holdings of the Manuscripts Division and several units' miscellaneous manuscripts files.
Over 15,500 individuals have been indexed in LITMSS--artists, novelists, presidents, generals, scientists, educators, etc.--and several thousand subjects (defined with Library of Congress subject headings) have been identified/associated with the 1,000 collections described in the database. In all, over 55,000 records are searchable through both find (keyword) and browse (phrase) indexes.

This article covers the evolution of the database, the scope and contents of its records, the public "face" of the database in FOLIO, searching and display capabilities, and its structure of interrelated SPIRES subfiles.

2.0 Brief History of Automation Efforts

Princeton University Library's Department of Rare Books and Special Collections consists of a group of subject-oriented administrative units, each of which has its own curator, its own staff, and its own physical location, including reading room and reference area. Some are located within Firestone Library, on different floors; others are housed in different campus buildings. All possess manuscripts and/or "special collections" of materials that are commonly part of manuscript collections, such as photographs. It has taken a decade to achieve the kind of centralized control over all of the Department's manuscript collections that exists today in the form of LITMSS.

+ Page 59 +

With the goal of producing a guide to its literary holdings, the Manuscripts Division began in 1980 to create machine-readable records for its holdings.

2.1 ISIS

The first database employed a batch mode version of ISIS (Integrated Set of Information Systems), a system developed by the International Labour Office and later updated and maintained by UNESCO. The Office of Population Research on the Princeton campus installed ISIS in 1977, and staff from that office helped the Manuscripts Division define the data elements it needed for a database of its own. Because of its literary focus, it was called LITMSS.

ISIS capabilities allowed sorting on any field and subfield, and the system could combine sort fields for multilevel sorts. For example, a primary sort could be performed on a combined set of authors and corporate bodies, and secondary, tertiary, and other sorts could be done on other elements. Searches could use Boolean logic, and text in any field (ninety-eight fields had been defined!) could be searched. In addition, a flexible formatting language permitted one to format the output of queries in virtually unlimited ways, using vertical and horizontal spacing, conditional and unconditional literals, up to four levels of headings, and columns. All of this computer "magic"--quite rudimentary in hindsight--had a profound and positive effect on departmental staff: members began to see tremendous possibilities for automation in manuscripts work.

2.2 SPIRES

By 1985, after several years of a Title II-C grant and three years of funding from the National Endowment for the Humanities, the database had grown to over 45,000 records. Support for ISIS at the campus computer center continued to wane, however, as the university introduced newer, more state-of-the-art database management systems to its computer users. SPIRES, the Stanford Public Information Retrieval System, was a powerful product attracting a good deal of attention, and by 1984 the university had already joined the consortium of sites that were using it. Supported by the computer center and backed by a network of diverse users, SPIRES offered the Department an attractive alternative to ISIS. In the spring of 1986, the manuscripts database was converted to SPIRES, beginning its online phase.

+ Page 60 +

The database was still not publicly available, but its printouts were. Periodically, usually once each year or after every 5,000 records, the database was dumped, providing a multi-volume printout of entries, sorted by author.
At a glance, the user could find the locations and descriptions of all manuscripts pertaining to a particular person that had been indexed to date. The manuscripts curator often photocopied pages of the reference work to aid in her answering of mail queries. Printed indexes, identifying collections of manuscripts by subjects and forms of material, were also available. In addition, printouts could be customized by request.

It became clear during these years that the publication of a literary guide was too limited a goal, for it would only partially represent the variety and significance of the Department's holdings. As a result, a concerted effort began in 1986 to fully describe, at the collection level, all of the manuscript collections in all of the Department's administrative units. With the publication in July 1989 of A Guide to Modern Manuscripts in the Princeton University Library (Boston: G.K. Hall & Co.), that larger goal was accomplished.

2.3 Public Access to LITMSS

Once the size of the database had reached a "critical mass", public access to it made more sense. The printouts had always been available--at least to readers that visited the Department--but now the power and convenience of the computer, staff thought, could and should be made available to anyone who had access to the university's mainframe.

During this period of gradual shift in departmental emphasis, from needing to intellectually control the Department's manuscript holdings to wanting to expand access to, and promote use of, the material, the main library was closing its card catalog and providing only online access to post-1980 (January) cataloged acquisitions. In addition, local network connections to the online library catalog were opened so that access was available from personal computers anywhere on campus.

+ Page 61 +

Into this rapidly developing environment of accessible information, LITMSS made its first public appearance in the fall of 1989 through a SPIRES interface called FOLIO, where the database is simply called "Manuscripts." In FOLIO, data is displayed line by line; hence, full-screen terminals are not needed, thereby broadening its applicability. Only searching is permitted, and only selected data elements may be seen. In addition, searches can be logged so that database owners can see how the database is being used and whether users are having any problems. Since the campus computer center is mounting other public databases (like GPO documents) in FOLIO, the Department hopes that this shared interface will promote use of LITMSS even more.

Local or remote network users can access the FOLIO database using an anonymous logon capability. Some system capabilities (i.e., saving, printing, and mailing searches) are only available to users with regular accounts on a Princeton mainframe.

3.0 Collections in LITMSS

As a departmental manuscripts database, LITMSS describes manuscript holdings of the whole Department of Rare Books and Special Collections, not just its Manuscripts Division. Other administrative units of the Department maintain manuscript collections that pertain to their particular subject orientations, and these collections are represented in LITMSS. Excluded, however, are manuscripts in non-Romance languages, such as Persian and Arabic, medieval codices, papyri, and cuneiform tablets. The emphasis is on post-1500 ("modern") manuscripts in English, with lesser amounts in Spanish and French.

Below is a brief summary of each unit's covered "manuscript" [1] holdings and the names of some representative collections.
+ Page 62 +

3.1 Manuscripts Division

The Manuscripts Division has over 650 collections, ranging in size from one box of documents of the signers of the Declaration of Independence to hundreds of boxes in the Archives of Charles Scribner's Sons, the New York publisher. Its strengths are in American and English history and literature. It includes the F. Scott Fitzgerald Papers, the M. L. Parrish Collection of Victorian Novelists, the records of Henry Holt & Co., the archives of Story Magazine and Story Press, several Ernest Hemingway collections, the Janet Camp Troxell Collection of Rossetti Manuscripts, the Mario Vargas Llosa Papers, a Woodrow Wilson collection of personal and family papers, and the Andre de Coppet Collection of Americana, including manuscripts of all the presidents from Washington to Truman.

3.2 Seeley G. Mudd Manuscript Library

The Seeley G. Mudd Manuscript Library has over 150 collections, ranging in size from one box of documents relating to Adolf Hitler to hundreds of boxes of the American Civil Liberties Union. Its strengths are in twentieth-century statecraft and public affairs papers. It includes the John Foster Dulles Papers, the David E. Lilienthal Papers, the Albert Einstein Duplicate Archive (photocopies), the Fight For Freedom, Inc., Archives, the Council on Books in Wartime Archives, and the James Forrestal Papers.

3.3 Theatre Collection

The Theatre Collection has over 100 collections, ranging in size from one box of material relating to calypso music to hundreds of boxes in the Warner Bros. archive, which contains only business records. Its strengths are in performing arts and popular entertainment. It includes the William Seymour Family Papers, the McCaddon Collection of the Barnum and Bailey Circus, manuscripts of Woody Allen, the McCarter Theatre (Princeton) Archives, and correspondence of Luigi Pirandello.
+ Page 63 +

3.4 Western Americana Division

The Western Americana Division has over 50 collections, ranging in size from a portfolio of photographs of Eskimos to hundreds of boxes of the Association on American Indian Affairs. Its strengths are in overland narratives, Mormon material, indigenous American languages, and twentieth-century American Indian affairs. It includes the Philip Ashton Rollins Collection, cattle ranch account books, the Herbert S. Auerbach Collection on Mormons and Indians, and San Juan Pueblo records.

4.0 LITMSS Records

Of the more than 55,000 records in LITMSS, only about 1,000 are collection records (for the 1,000 collections in the Department); the rest are indexing records. Each collection record describes a manuscript collection (as defined before), and includes such elements as main entry (if appropriate), collection name, range of dates of the material, scope and contents, physical size (in cubic feet), arrangement (the organization of the manuscripts and any series names), subject/title/form headings appropriate to the material, and any restrictions that may pertain to the collection. Acquisition and other in-house information are present in the collection record and are available to departmental staff, but such elements are purposefully omitted from the FOLIO displays.

Indexing records, which constitute the bulk of the records in LITMSS, describe folder- or item-level holdings of manuscripts of specific individuals. The purpose of this indexing is to make known the whereabouts (i.e., non-obvious locations) of manuscripts of "significant" [2] individuals and to provide the Department an additional measure of security over its holdings.

A JOHN DOE collection of manuscripts would be described in LITMSS in a collection record. Manuscripts of others in the collection--his correspondents, for example--would be described in indexing records. (Note: nothing of John Doe would be indexed for him in his own collection.)
To date, the manuscripts of approximately 15,500 individuals, representing many academic disciplines and vocations, have been indexed. + Page 64 + Each indexing record contains the following elements: (1) main entry; (2) collection name; (3) series name (if appropriate); (4) box; (5) folder; and (6) a manuscripts "structure" (a SPIRES name for a group of related elements that always occur together) that describes the number of manuscripts, the type of manuscript(s), the inclusive date(s) of the manuscript(s), and the manuscripts themselves. Depending on the specific location (collection/series/box/folder), an indexing record may describe a single item or many. To date, only about 350 of the Manuscripts Division's collections have been indexed. In addition, each of the units' "miscellaneous" manuscripts files, into which single accessions are placed (for example, one George Washington letter donated by an alumnus), have been indexed, as well as the modern manuscript holdings of the Department's Robert H. Taylor Library of English and American literature. Among the many authors amply represented in the library are Richard Brinsley Sheridan, Max Beerbohm, members of the Trollope family, Bernard Shaw, Virginia Woolf, Henry James, the Bronte sisters, and Thomas Hardy. 5.0 Searching LITMSS LITMSS contains two sets of indexes for retrieving records, FIND (keyword) and BROWSE (phrase) indexes. 5.1 FIND Searches The FIND indexes are word indexes that take the user's search terms and respond with records whose specified elements contain those words. LITMSS makes eight FIND indexes available through FOLIO. + Page 65 + ----------------------------------------------------------------- Table 1. 
FIND Indexes
-----------------------------------------------------------------
Index Name   Description                        Sample Search Term(s)

AUTHOR       Creator of a manuscript            Ernest Hemingway
NAT          Nationality of the author          French, Italian, Swiss [3]
ID           Brief identity of the author       Journalist, poet, senator
DISC         Author's discipline / field        Biology, history, government
COLL         Name of a ms. collection           Allen Tate Papers
YEAR         Year date of a manuscript          1955, 1778, 1812, 1920s
MS           General type of manuscript         Letter, document, volume
STF          Collection subjects/titles/forms   Civil War, bills of lading
-----------------------------------------------------------------

Boolean operators (AND, OR, NOT) are permitted with all FIND indexes. As a result of this flexibility, rather sophisticated searches are possible. For example, it is possible to do a search for letters written by Italian poets during the 1920s.

The basic format of a search command using a FIND index is

     find [index name] [search term]

Here are some examples:

     find nat japanese
     find id historian and date 1920
     fin aut mark twain

FIND and index names can be abbreviated.

+ Page 66 +

Truncation (with the # sign) searching is also allowed on all of these indexes:

     fin year 192#
     fin stf indian#

Except for the YEAR index, which can only retrieve indexing records, and the STF index, which only retrieves collection records, the FIND indexes make no distinction between the two types of records in LITMSS: both may be displayed in a search result, depending on the extent of Princeton's holdings. The Department may have a JOHN DOE collection, several JOHN DOE letters distributed among a few collections, or both. A search for JOHN DOE material would find all of these records.

For example, a search for Aaron Burr material would produce the following screen on the user's terminal or PC.

-----------------------------------------------------------------
Figure 1.
Search for Aaron Burr
-----------------------------------------------------------------
Manuscripts / Search: Find AUTHOR AARON BURR
Result: 42 records

1) Burr, Aaron, 1716-1757 / [Collection *], Aaron Burr (1716-1757) Collection / Consists of Burr manuscripts, correspondence, and documents dating from the period (1748-1757) he was president of the College of New Jersey, now Princeton. Included are original manuscripts of sermons, a Latin oration, and letters and documents, as... / Date(s): 1750-1761 / Size: 1 box

2) Burr, Aaron, 1716-1757 / [Collection *], General Manuscripts [Bound] / 1 volume(s), 1753-1758

3) Burr, Aaron, 1716-1757 / [Collection *], General Manuscripts [Misc.] / 1 document(s), 1755

...
-----------------------------------------------------------------

By default, FOLIO displays retrieved records in a brief display and numbers them on the left side for reference (see Section 6.0 for information about the display format and what it reveals). In the above example, the first record is a collection record (i.e., an Aaron Burr collection) and the other two are indexing records.

+ Page 67 +

If one scanned the rest of the 42 records retrieved in this search, one would see that Aaron Burr's son, Aaron Burr (1756-1836), the famous duelist with Alexander Hamilton, is also represented because both share the same name. To find only the father's records, one would have to add a date in the search phrase: "find author aaron burr 1716."

In the AUTHOR index, real names and pseudonyms are indexed together so that a search under one name will retrieve the same records as a search under the other. For example, searching under "Mark Twain" will find the same records as searching under "Samuel Langhorne Clemens." (How this works is described in Section 7.0.)

5.2 BROWSE Searches

The BROWSE indexes are phrase indexes that attempt to match the user's whole search phrase with headings in the database's records.
The system responds with an alphabetical listing of headings drawn from records that include the phrase or, that failing, contain headings which alphabetically precede and follow the user's phrase. In this way, the user can browse through headings as if he/she were using the library's card catalog. There are two BROWSE indexes available in FOLIO for LITMSS: name and subject.

-----------------------------------------------------------------
Table 2. BROWSE Indexes
-----------------------------------------------------------------
Index Name   Description                        Sample Search Phrase

NAME         Phrase of author's inverted name   Hemingway, Ernest, 1899-1961
SUBJECT      Added entry for a collection       United States--Civil War...
-----------------------------------------------------------------

+ Page 68 +

The basic format of a search command using a BROWSE index is

     browse [index name] [phrase]

Here are some examples:

     browse name twain, mark
     bro sub tammany hall

BROWSE and index names can be abbreviated.

Truncation (without using the # sign) is automatic:

     bro name james, h
     bro sub united states--history--civil

The BROWSE feature of FOLIO is an especially useful one because the user, as he/she browses, also sees the number of records associated with each heading. For example, browsing in the name index for "burr, aaron" would retrieve the following result.

-----------------------------------------------------------------
Figure 2. Example BROWSE Search
-----------------------------------------------------------------
Manuscripts / Search: Browse NAME BURR, AARON
Result filed under the following headings:

-3) Name: Burnshaw, Stanley, 1906- (5 records)
-2) Name: Burnside, Ambrose E. (Ambrose Everett), 1824-1881 (3 records)
-1) Name: Burpee, Lawrence J. (Lawrence Johnston), 1873-1946 (1 record)
 0) Name: Burr, Aaron, 1716-1757 (14 records)
 1) Name: Burr, Aaron, 1756-1836 (28 records)
 2) Name: Burr, Amelia Josephine, 1878- (1 record)
 3) Name: Burr, Anna Robeson, 1873-1941 (4 records)
-----------------------------------------------------------------

+ Page 69 +

For reference, FOLIO numbers the headings on the left, forward and backward from 0, which identifies the first heading that best matches the search phrase. One can see from this result that the two groups of Aaron Burr records together equal the number of records retrieved in the FIND search described above (14 + 28 = 42). In other words, there are two ways to get the same author information.

Similarly, there are two ways to retrieve subject information about Princeton's manuscript collections: the STF FIND index and the SUBJECT BROWSE index. [4] Browsing subjects, however, is more successful if one is familiar with Library of Congress Subject Headings since they are used in the collection records. FOLIO recognizes the dash ("--") in search phrases, and thus its presence or absence can make a difference in the results one obtains. A search for Civil War collections could be phrased "fin stf civil war" for the STF index, but in the SUBJECT index one would have to know that the appropriate subject heading for the Civil War is "United States--History--Civil War, 1861-1865." Omitting the first dash in the latter phrase would produce very different results. (In the BROWSE indexes the system always attempts to find a match character by character, starting from left to right.)

6.0 Displaying Records

Users can see LITMSS records in either brief or full displays.

6.1 Collection Records

For collection records, the brief display consists of the name of the main entry (if the collection has one), the name of the collection, the first 250 characters of the record's scope note, the inclusive dates of the collection, and its size (number of boxes, containers).
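The assembly of a brief display for a collection record can be sketched in a few lines of Python. This is only an illustration, not the SPIRES or FOLIO format definition: the dictionary keys, the `brief_display` function, the " / " separator handling, and the use of "..." to mark a cut-off scope note are assumptions made for the example. Only the order of the elements and the 250-character limit come from the description above, and the sample scope note is shortened.

```python
SCOPE_LIMIT = 250  # from the text: first 250 characters of the scope note


def brief_display(record):
    """Build a brief display string for a collection record.

    `record` is a plain dict with hypothetical keys; the main entry
    is optional because not every collection has one.
    """
    parts = []
    if record.get("main_entry"):
        parts.append(record["main_entry"])
    parts.append(record["collection_name"])

    scope = record["scope_note"]
    if len(scope) > SCOPE_LIMIT:
        # Assumed convention: mark a truncated note with an ellipsis.
        scope = scope[:SCOPE_LIMIT] + "..."
    parts.append(scope)

    parts.append("Date(s): " + record["dates"])
    parts.append("Size: " + record["size"])
    return " / ".join(parts)


# Sample record; the scope note is abbreviated for the example.
example = {
    "main_entry": "Burr, Aaron, 1716-1757",
    "collection_name": "[Collection *], Aaron Burr (1716-1757) Collection",
    "scope_note": "Consists of Burr manuscripts, correspondence, and documents.",
    "dates": "1750-1761",
    "size": "1 box",
}

print(brief_display(example))
```

A record without a main entry simply starts with the collection name, which matches the "if the collection has one" condition in the description.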
+ Page 70 + The full display for collection records has three parts--Name, Location, and Description--each of which can be displayed independently if desired. The Name section provides the main entry's full name (AACR2 form), a brief biographical phrase about him/her/it, and any "disciplines," or occupational fields, for which the main entry is known. The Location section identifies the administrative unit of the Department that houses the collection, providing the collection's name, dates, and physical characteristics. In the Description section, the display provides the record's complete scope note, arrangement (if the collection is greater than one box in size), and list of related subject, title, and form headings. A brief display of a collection record is shown below. ----------------------------------------------------------------- Figure 3. Brief Display of a Collection Record ----------------------------------------------------------------- Burr, Aaron, 1716-1757 / [Collection *], Aaron Burr (1716-1757) Collection / Consists of Burr manuscripts, correspondence, and documents dating from the period (1748-1757) he was president of the College of New Jersey, now Princeton. Included are original manuscripts of sermons, a Latin oration, and letters and documents, as... / Date(s): 1750-1761 / Size: 1 box ----------------------------------------------------------------- + Page 71 + A full display of a collection record is shown below. ----------------------------------------------------------------- Figure 4. 
Full Display of a Collection Record
-----------------------------------------------------------------
Name
     Burr, Aaron, 1716-1757
     American Presbyterian clergyman, president of the College of New Jersey (Princeton)
     Discipline(s): religion, education

Location
     Manuscripts Division
     [Collection *], Aaron Burr (1716-1757) Collection
     Date(s): 1750-1761
     Size (in cubic feet): 0.25
     Container count: 1 box

Description
     Consists of Burr manuscripts, correspondence, and documents from the period (1748-1757) he was president of the College of New Jersey, now Princeton. Included are original manuscripts of sermons, a Latin oration, and letters and documents, as well as photostats and copies of additional material. There are also a contemporary silhouette of Burr and a letter, dated 1761, presenting a bill to his estate.

     Subjects/Titles/Forms of the Manuscripts:
          American orations--Colonial period, ca. 1600-1775
          Burr, Aaron, 1716-1757--Silhouettes
          Clergy--United States--18th century--Letters
          College presidents--New Jersey--Princeton--18th century--Letters
          Presbyterian Church in the U.S.A.--Clergy--18th century--Letters
          Princeton University--History--Colonial period, ca. 1600-1775--Sources
          Sermons, American--18th century
          Silhouettes--United States--18th century
-----------------------------------------------------------------

An asterisk in the collection name signifies that the collection has been processed and indexed.

+ Page 72 +

6.2 Indexing Records

For indexing records, the brief display consists of the main entry, the name of the collection, the number of manuscripts described in the record, the type of manuscript(s) described, and inclusive dates. The full display contains the same three parts (Name, Location, Description) offered for a collection record, but the Location and Description elements are different.
In an indexing record, the Location section identifies the specific address of the manuscript(s) being described: administrative unit, collection name, series name, box number, and folder title or number. The Description section expands the information in the brief display by adding a full description element.

A brief indexing record is shown below.

-----------------------------------------------------------------
Figure 5. Brief Display of an Indexing Record
-----------------------------------------------------------------
Hemingway, Ernest, 1899-1961 / [Papers *], Sylvia Beach Papers / 1 document(s), 1923
-----------------------------------------------------------------

A full indexing record is shown below.

-----------------------------------------------------------------
Figure 6. Full Display of an Indexing Record
-----------------------------------------------------------------
Name
     Hemingway, Ernest, 1899-1961
     American novelist, journalist, storywriter
     Discipline(s): literature

Location
     Manuscripts Division
     [Papers *], Sylvia Beach Papers
     Box: 171
     Folder: Corres. re Illustrations

Description
     Number of original manuscripts: 1
     Manuscript type: document
     Date(s): 1923
     Description: photograph of Hemingway in Sylvia Beach's bookshop (Paris), SHAKESPEARE AND COMPANY
-----------------------------------------------------------------

+ Page 73 +

6.3 Other LITMSS Output Features

To display LITMSS records in FOLIO, one uses the reference numbers on the left side of the search display to specify which records are wanted. For example, in the Aaron Burr search described previously that resulted in 42 records found, one could issue the command "display full 35" (abbreviated "df 35") to see a full display of the 35th record, or one could ask to see a range of records ("display full 20-24").
With a large search result, one can use the SCAN command to move back and forth through the records; for example, typing the command "scan 30" would cause the system to start its display over beginning at the 30th record.

FOLIO also permits the user to print search results on a system printer or any other named printer, to save results in computer files, and to "mail" results over electronic networks to other accounts; the records can be in either brief or full form.

7.0 LITMSS Subfiles

Besides the main MANUSCRIPTS subfile [5] in which collection and indexing records reside, LITMSS consists of several other linked subfiles, including an AUTHORS subfile and a COLLECTIONS subfile. While they are invisible to the user of LITMSS in FOLIO, they contain indexes that are indirectly used in some of the FOLIO searches. The linkages are provided by code numbers: an author code number and a collection code number. The use of these numbers in collection and indexing records makes inputting and updating of records easy and efficient.

A typical author record in the AUTHORS subfile looks like this.

-----------------------------------------------------------------
Figure 7. Example Author Record
-----------------------------------------------------------------
AUTHOR.CODE    00797
AUTHOR         Twain, Mark, 1835-1910
ALIAS          Clemens, Samuel Langhorne
NATIONALITY    American
IDENTITY       novelist, humorist, storywriter
DISCIPLINE     literature
REFERENCES     OxAm AmA&B DcLEnL
-----------------------------------------------------------------

+ Page 74 +

The main entry of all collection and indexing records contains the particular author's five-digit code number, not his/her name or pseudonym. For example, when inputting Mark Twain records, the cataloger only has to specify "00797" in the author element.
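The code-number linkage can be pictured with a small Python sketch. It is an illustration under stated assumptions, not SPIRES code: the dictionaries, the field names, and the functions `author_code_for`, `records_for`, and `heading` are hypothetical, the prefix matching on inverted names is a simplification of the real indexes, and the two manuscript records are invented placeholders rather than LITMSS data. Only the author record's values are taken from Figure 7.

```python
# AUTHORS subfile: one record per author, keyed by the five-digit author code.
AUTHORS = {
    "00797": {
        "author": "Twain, Mark, 1835-1910",
        "alias": "Clemens, Samuel Langhorne",
        "nationality": "American",
    },
}

# MANUSCRIPTS subfile: records carry only the author code, never the name.
# These two records are invented placeholders for illustration.
MANUSCRIPTS = [
    {"author_code": "00797", "collection": "Example Collection A", "type": "letter"},
    {"author_code": "00797", "collection": "Example Collection B", "type": "document"},
]


def author_code_for(name):
    """Return the code whose author or alias heading starts with `name`."""
    wanted = name.lower()
    for code, rec in AUTHORS.items():
        if rec["author"].lower().startswith(wanted):
            return code
        if rec["alias"].lower().startswith(wanted):
            return code
    return None


def records_for(name):
    """Look up the author code first, then collect the records that carry it."""
    code = author_code_for(name)
    if code is None:
        return []
    return [m for m in MANUSCRIPTS if m["author_code"] == code]


def heading(record):
    """Resolve a manuscript record's code to the author heading on file."""
    return AUTHORS[record["author_code"]]["author"]


# Real name and pseudonym reach the same code, hence the same records.
print(records_for("twain, mark") == records_for("clemens, samuel"))
```

Because the manuscript records store only the code, the name and the pseudonym both resolve to the same set of records, which is the behavior Section 5.1 describes for the AUTHOR index.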
If, at a later date, new information becomes available, such as a death date or the addition of a middle name, only the AUTHORS record has to be modified--all of the associated collection and indexing records can remain untouched because they are still linked by the author code number, which never changes. When author information is actually provided in FOLIO, the author record from the AUTHORS subfile is called up by the full display format. Searches that use the AUTHOR, NAT, ID, and DISC indexes are actually using AUTHORS subfile indexes to retrieve the appropriate author codes, which are then searched in the MANUSCRIPTS subfile to find the associated collection and indexing records. In the same way, collection code numbers used in the COLLECTIONS subfile simplify cataloging and updating for the Department's processing staff. And, in users' searches, the subfile becomes a "lookup table." For example, the record below is for the Aaron Burr Collection in the COLLECTIONS subfile. ----------------------------------------------------------------- Figure 8. Example Collection Record ----------------------------------------------------------------- COLLECTION.CODE C0090 COLLECTION.NAME [Collection *], Aaron Burr (1716-1757) Collection ----------------------------------------------------------------- A search for all of its records, both the collection record and associated indexing records, "looks up" the collection name ("Aaron Burr Collection") in the COLLECTIONS subfile to find its specific collection code (C0090) and then uses that number in the MANUSCRIPTS subfile. + Page 75 + 8.0 Conclusion LITMSS continues to grow. In the course of a year, approximately 3,000 to 5,000 indexing records are added to the database, representing an additional 500-600 "authors" that have not been established in the AUTHORS subfile before. 
Ideally, the Department would like to have all of its manuscript collections indexed and described in LITMSS and to be able to stay current with new acquisitions. On the collection level, this last goal has been achieved, for a temporary collection record is created at the time each new manuscript collection is acquired. The record is updated after processing, which may or may not include indexing depending on departmental priorities and staffing. LITMSS collection records are also input into the AMC (Archives and Manuscripts Control) file of RLIN, the online bibliographic database of the Research Libraries Group. Given the backlog of unprocessed collections, however, which have been described in the 1989 Guide, the Department will probably always have to approach the first goal like an asymptote. While work continues at the campus computer center to ease access to all of the Princeton FOLIO databases, the Department is trying to arrange a more equal distribution of responsibility for inputting and updating LITMSS records--an arrangement whereby each administrative unit would manage its own records, the collection and indexing records that describe its manuscripts holdings. At the moment, all of that responsibility resides in the Manuscripts Division. Notes [1] Some of the collections in the Theatre Collection and Western Americana units of the Department are not "manuscript" collections. They are really "special collections" of non- manuscript material--photographs, posters, and playbills--that are unified by subject and place. The archival sense of the word collection, however, pertains to all of the units represented here: (1) an artificial accumulation of materials devoted to a specific subject, person, place, event, or type of material; or (2) a body of materials having a common source, created by a person or corporate body as a natural function of the activities he, she, or it pursues. 
[2] Generally, only a few series of a manuscripts collection are targeted for indexing, usually the correspondence series or author series likely to contain the manuscripts of "others" (i.e., other than the main entry). Even then, only those people are indexed for whom there are good, authoritative biographical reference sources. Given the time-intensive nature of authority work, this indexing remains selective, not exhaustive. + Page 76 + [3] There are so many English and American authors indexed in LITMSS that searches using these nationalities without Boolean qualification are not fruitful. [4] Both indexes only retrieve collection records. The subjects of manuscripts described in indexing records are not analyzed because of the obvious amount of work that would be required of processing staff. In effect, every indexed letter would have to be read and interpreted according to LCSH subject headings. [5] "Subfile" is a SPIRES term for a set of goal records, the indexes to those goal records, and the access and update restrictions that apply to the data elements of those records. In essence, a subfile is a database. About the Author John Delaney Leader, Rare Books and Manuscripts Cataloging Team Department of Rare Books and Special Collections Princeton University One Washington Road Princeton, NJ 08544 BITNET: Q3784@PUCC ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by John Delaney. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. 
Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission. ----------------------------------------------------------------- + Page 100 + ----------------------------------------------------------------- The Public-Access Computer Systems Review 1, No. 3 (1990): 100-108. ----------------------------------------------------------------- ----------------------------------------------------------------- Recursive Reviews ----------------------------------------------------------------- Hypermedia, Interactive Multimedia, and Virtual Realities by Martin Halbert When I fly to a conference I always find the moment of takeoff exciting. After the tedium of airport lines and the slow process of boarding, the engines rev up and you are suddenly thrust back into your seat as the whole aircraft seems to strain trying to vault into the sky. Then, in another moment, the perspective dramatically changes as the ground is left below, the vast tangle of roads and locations becomes abruptly apparent, as if one were looking down at a map. You know in your bones then that you are going somewhere, not just wasting time in Kafkaesque delays. Today, technologies like hypermedia and interactive multimedia are like a plane ready to take off, gathering momentum for a jump that promises to take us to a new information environment. Reading about these new computer tools one feels that we are heading to an exciting, but unknown destination. No one knows what the landscape of information technology in the 21st century will look like, but there are many sources that will sketch the most prominent features. This column will direct the reader to the best "guidebooks" to new interactive computer technologies like hypermedia and virtual reality simulations. 
In the spirit of Recursive Reviews, I won't try to limit the discussion artificially to "just" hypermedia, or "just" interactive multimedia. Instead, the aim will be to point out: (1) practical sources that orient the reader to the newest computer media technologies, and (2) new journals that discuss the possibilities of the media. + Page 101 + It may be objected that buzzwords like "hypermedia" and "interactive multimedia" are not much more than hype currently. The terms have been bandied about so much in the last few years that it would be easy to conclude that they are nothing but empty phrases that the industry has been using for impressive ad campaigns. I don't agree with this. I think these concepts represent a host of human-computer interaction ideas that the most innovative thinkers have been developing for years, and which are only now beginning to enter the mainstream. These concepts are being embodied in the best new computer applications, which will have dramatic impact on the work of all information professionals in the 1990's. Now is the time to become familiar with the issues surrounding these technologies. But aside from the practical impacts on our jobs, following the development of new computer technologies is refreshing, and re-inspires us in our work. Innovations in computer media are exciting news for libraries, which have only dealt with one information medium for millennia. As We May Think There have been many seminal works that touched on the idea of automated handling of large bodies of different kinds of media. No discussion of the area would be complete without at least mentioning Vannevar Bush's article "As We May Think" (Atlantic Monthly, July 1945) which discussed a device called the MEMEX that could retrieve and manipulate large quantities of microfilm, audio recordings, and other media that would be of use to a researcher. 
Bush had all the right ideas (e.g., multimedia and automated links between pieces of information), but his article is outdated because of the obsolete technological framework that he uses to discuss his ideas.

+ Page 102 +

Computer Lib/Dream Machines

The first fully developed exposition of the idea of computer manipulated media was the seminal book that introduced the term "hypermedia," Computer Lib/Dream Machines by Theodore Nelson (first published privately in 1974, later reprinted with extensive updates in 1987 by Microsoft Press). If you read only one new book this year, read Computer Lib. It is the most insightful (and inciting!) book on computers that I know of.

Computer Lib was one of the most influential early works that promoted the idea of personal computers. It had several themes: (1) everybody should understand computers; (2) computer systems are difficult to use only because they are designed poorly; and (3) computers can be wonderfully empowering and enjoyable tools when designed well. The book is written in an engagingly chatty tone (the book was consciously modeled after Stewart Brand's Whole Earth Catalog and resembles it in many ways), and is full of tongue-in-cheek pronouncements like "Computers are just as oppressive [in the 1980s] as before, but smaller and cheaper and more widespread. Now you can be oppressed by computers in your living room."

Despite (or perhaps because of) all the humor in it, Computer Lib is an illuminating survey of the major issues of making computers usable. The flip side of the book (literally flip side, the book is printed back to back with its sister title), Dream Machines, canvasses the most important ongoing developments in graphical computer systems. If you want an entertaining, opinionated, informative book on the fundamental issues of user interfaces, read Nelson's book.

Hypertext Hands-On!

For a more sedate and neutral treatment of hypertext issues, turn to Ben Shneiderman.
Shneiderman is currently the most prominent researcher in the field of human/computer interaction. His book Hypertext Hands-On! is an excellent introduction to the topic that lives up to its title by including a hypertext version of the text on floppy disks. + Page 103 + The book and hypertext are written in a very clear and concise style. Hyperlinks in both the electronic and print versions are easy to follow and logically arranged (unlike many hypertexts I've run across which are tangled and confusing). The Hyperties software runs fine on any PC-compatible, but, if you have a Hercules monochrome monitor, it's difficult to spot most of the text embedded hyperlinks. Because of this drawback, I preferred using the print version of the work (sigh). Shneiderman covers both theory and implementations of hypertext systems. In his chapter on "Systems" he gives neutral descriptions of all major hypermedia products that are currently on the market. Also included in the work are examples of possible hypertext applications and a review of major personalities in the history of hypertext. Hypertext Hands-On! could easily be used as a textbook introducing the subject of hypermedia, and it is worth reading by anyone interested in the field. BYTE For those interested in the nitty-gritty of current computer systems and what they can offer, there is no better source than the many trade journals and tabloids of the computer industry. I offer up BYTE as a good one stop source for following personal computer technologies. It is not particularly biased toward one brand of computer, and is a monthly, so you will not be deluged by the amount of reading entailed in following weekly tabloids. The February 1990 issue had a particularly good in-depth section that analyzed what interactive multimedia means to different computer firms, what the pros and cons were of each company's system, and what new technical issues were raised by interactive media. 
My favorite article in the issue was "The Birth of the BLOB" by Tim Shetler, which discussed data storage implications of BLOBs (Binary Large OBjects, the nodes of multimedia databases). If you want to know why DVI is important to IBM, or why the Agnus blitter makes the Amiga display so good, read this issue of BYTE, and future ones too. + Page 104 + CD-ROM Professional A magazine that falls somewhere between the trade magazine and the academic journal is CD-ROM Professional. Subtitled "The Magazine for CD-ROM Publishers and Users," it is aimed at information professionals like librarians who want practical advice articles. It has many product reviews, how-to columns, and technology feature articles in each issue. Oriented specifically to optical storage topics, it is one of the best sources for following interactive multimedia products, since most of these products currently come out on CD-ROM. The September 1990 issue is a good example of this journal. It had an interview with Sony's chief multimedia spokesman, Takashi Sugiyama, about where Sony is headed with the technology. The same issue had articles on problems encountered in CD-ROM technical support and on how to back up CD-ROM workstations. ACM Journals The Association for Computing Machinery generates a plethora of journals on all aspects of computer technology. Three ACM journals that are worth following regularly are the Communications of the ACM, Computer Graphics, and SIGIR Forum. Communications of the ACM features a special issue on interactive technologies like multimedia and hypertext roughly once a year. The July 1989 issue was devoted to interactive technologies and had several good articles on digital video. Computer Graphics (put out by ACM SIGGRAPH) is traditionally the place where the hottest, glitziest new research projects in computer graphics technology appear in living color.
The March 1990 issue constituted the proceedings of the 1990 Symposium on Interactive 3D Graphics, and showed amazing new levels of sophistication. The issue is packed with project reports of the newest technological buzzword, "virtual realities." Also called microworlds, these are computer simulated environments. They may be close simulations of physical reality (useful for simulating physical systems), or they may be dazzlingly abstract environments like the higher-dimensional "hyperworlds" viewable with Columbia University's n-Vision system. + Page 105 + The SIGIR Forum, a publication of ACM's SIG on Information Retrieval, is an excellent journal for the information scientist in all of us. The Fall 87/Winter 88 issue had an article by Robin Hanson called "Toward Hypertext Publishing: Issues and Choices in Database Design" that is the best piece on the theoretical and practical concepts of hypertext systems that I have seen yet. The best feature of Hanson's article is the concise discussion of the various ways that one might run the fee structure on a commercial hypertext network. There are many other ACM publications that could be mentioned, but these three are particularly valuable sources. New interactive computer technologies are often dramatically different from the standard office software that we are accustomed to. I find it useful to follow journals that analyze the possible uses of new computer media. Two new journals, Hypermedia and Multimedia Review, feature scholarly discussions of next generation information technology. Hypermedia Hypermedia regularly reviews an eclectic variety of conferences and books related to hypermedia topics. Interestingly enough, its first issue had a review of William Gibson's seminal science fiction book Neuromancer in addition to more standard fare. In my opinion, this was entirely appropriate, considering the fact that many of Gibson's colorful SF concepts have been embraced wholeheartedly by software designers. 
My favorite Hypermedia article appeared in the Volume 1, Number 3 issue. It was a piece entitled "A Similarity-Based Hypertext Browser for Reading the Unix Network News," by Michael H. Anderson, Jakob Nielsen, and Henrik Rasmussen. The article described a prototype user interface called HyperNews that organizes incoming network news postings with hyperlinks following discussion streams and an automatic similarity/relevance rating feature (somewhat like fuzzy logic information retrieval systems). Although the system described was a prototype created solely for concept study, the need for systems like this to follow the colossal amount of electronic mail and forum postings is obvious (I often wish I had a working system like the HyperNews prototype to handle all the PACS-L messages I get every day). + Page 106 + Multimedia Review Multimedia Review is a fascinating journal that pledges "to acquire the kind of articles that give inspiration for reflection--for metacognitive understanding." Don't let the fancy language scare you off; this is a great journal to promote deeper understanding of the possibilities of multimedia. The articles often have catchy titles (my favorite title in the Summer 1990 issue was "Elements of a Cyberspace Playhouse" by Randal Walser), and are written by industry and academic experts in the field of multimedia systems. If the decade of the nineteen eighties was the era when the "personal computer" revolution came about, then the nineties may be the decade of the "personal simulator" revolution, and Multimedia Review may be its harbinger. Articles like Scott S. Fisher's "Virtual Environments: Personal Simulations & Telepresence" (also in the Summer 1990 issue) discuss current state-of-the-art systems in the historical context of what the designers are aiming for in the long run.
As fact follows fancy we may all one day find ourselves working in virtual workspaces like William Gibson imagined in fiction, and Autodesk corporation has now implemented in actuality. Bringing It All Home A final anecdote may bring multimedia closer to home for you, as it did for me. As I was preparing to leave work today (eager to get home and finally finish this overdue column!) I took a break to try out a new computer that had appeared in the evaluation center of our campus computing center. It was a Silicon Graphics workstation, and as I logged on to the machine and explored some of its demo software packages I was staggered by the real-time animation capabilities of the machine. In twenty minutes, I had run through a fractal display system, an amazingly realistic flight simulator (it makes the latest version of Microsoft's Flight simulator look sick), a hilariously real looking interactive simulation of a Jello icosahedron bouncing around a room, a design tool for studying wave oscillation phenomena in surfaces, and a dazzling graphical visualization of a mechanical insect that obediently crawled after my cursor wherever I led it. + Page 107 + The image animation windows in all these applications were razor sharp, the kind of crispness that one sees in computer generated movie sequences like The Last Starfighter and Tron. The insect automaton moved realistically and cast a shadow. The illusion of depth and reality was dramatic. My point is that within this decade simulation technology like this will be on all our desktops! Interactive multimedia and hypermedia are technologies of the near future, and we librarians had better become accustomed to them and think about them before we are caught off guard. Besides, they are fun. I know I want another crack at that F-15 flight simulator. Perhaps next time I'll remember to bring up my landing gear so they don't get torn off at Mach 2. Books Reviewed: Nelson, Theodor H. Computer Lib/Dream Machines (Rev. Ed.). 
Redmond, Washington: Microsoft Press, 1987. (ISBN 0-914845-49-7) Shneiderman, Ben, and Greg Kearsley. Hypertext Hands-On!: An Introduction to a New Way of Organizing and Accessing Information. New York: Addison-Wesley, 1989. (ISBN 0-201-15171-5) Journals Reviewed: BYTE 15, No. 2 (February 1990). (ISSN 0360-5280) CD-ROM Professional 3, No. 5 (September 1990). (ISSN 1049-0833) Communications of the ACM 32, No. 7 (July 1989). (ISSN 0001-0782) Computer Graphics 24, No. 2 (March 1990). (ISSN 0097-8930) Hypermedia 1, No. 3 (1989). (ISSN 0955-8543) Multimedia Review 1, No. 2 (Summer 1990). (ISSN 1046-3550) SIGIR Forum 22, No. 1-2 (Fall 1987/Winter 1988). (ISSN 0163-5840) + Page 108 + About the Author Martin Halbert Automation and Reference Librarian Fondren Library Rice University Houston, TX 77251-1892 (713) 527-8181, ext. 2577 BITNET: HALBERT@RICEVM1.RICE.EDU ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by Martin Halbert. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission. ---------------------------------------------------------------- + Page 30 + ----------------------------------------------------------------- Loney, George. "The University of Guelph Library's SearchMe Public-Access Catalogue." 
The Public-Access Computer Systems Review 1, No. 3 (1990): 30-43. ---------------------------------------------------------------- 1.0 Introduction The University of Guelph is a medium-sized university located in southwestern Ontario about 100 kilometers from Toronto. The library has been automating its various systems since the mid-1960s, starting with electronic data collection devices for a batch-oriented circulation system. The systems that followed included a batch cataloguing system called Scope, the CODOC system, and the Geac online circulation system (co-developed with Geac). The Geac circulation system was expanded to include online public access, acquisitions, and cataloguing, all running on the Geac mini-computers. In 1987, the University of Guelph Library began a pilot project to determine the viability of individual CD-ROM workstations as a replacement for its centralized online catalogue. This storage medium for the nearly 900,000-record bibliographic database was chosen because it offered an extremely cost-effective method of distributing the 500-megabyte database to what is projected to be a network of over 100 workstations. The original version of the search software and database was the product of a commercial vendor. The pilot project determined that while CD-ROM was an acceptable medium for storing and retrieving the data, the software used during the pilot project was not desirable for the long term, and the inability to change the database would require frequent and costly remasterings. As a result, a database design was developed and tested that would allow the library to write its own search software, prepare its own database, deal directly with the CD-ROM manufacturers at a greatly reduced cost, and add changes to the CD-ROM data. This software project was started in May 1988, and the new system was installed in October 1988 on 25 workstations throughout the library.
Since then, the system has completely replaced the old, centralized online public access system and is running on 85 workstations in the two library branches and on a few additional workstations in some academic departments.

+ Page 31 +

This article will examine some of the issues surrounding the development of the SearchMe software relating to the user interface and implications of the use of the CD-ROM as the major storage medium.

2.0 User Survey

Prior to the development of SearchMe, a survey of library users was conducted by the systems staff with the help of the reader service staff. Patrons were approached while they were using one of the publicly available search tools: the card catalogue (we still had one at the time), the online public access system, or the circulation inquiry system. Questions were asked to determine what information patrons had when they started a search, what it was they were looking for, and how well or poorly the current search tools satisfied them. A number of conclusions were inescapable:

1. Patrons learn how to use the library systems through different means, but self-teaching is the most usual method.

2. While patrons are concerned if system response time is slow, they become very frustrated when response time is inconsistent, e.g., when use is heavy.

3. Patrons migrate very easily from the card catalogue to computer-supported search tools. The only difficulty with the search tools is that many terminals or workstations are needed to prevent lineups.

4. Patrons using the automated search tools perceived that they had found most or all the available information. We were never able to establish how they knew that they had found "all" the information, but it was indicative of their perception that they were being adequately helped.

As a result of this and other knowledge sources, we set design goals under which the new system would attempt to:

1. Provide highly consistent response times no matter how high the user load or how many terminals were in the overall system.

2. Provide high functionality first and high speed second.

3. Be very consistent in its user interface and as intuitive as possible in its control functions.

+ Page 32 +

4. Provide context-sensitive help at all stages of system use.

5. Allow the novice user to become familiar with the search system with minimum formal instruction and permit the more experienced user to perform more complex searches.

6. Be very accurate in its information delivery and highly tolerant of user input error.

We believe that SearchMe is very successful at meeting these goals.

3.0 Consistent Response Time

SearchMe operates in a functionally distributed environment. Each workstation at the University of Guelph Library consists of a PC/XT clone with a 10 or 12.5 MHz 8088 chip, 640 KB of memory, an Ethernet card, one floppy drive, a 40 MB hard drive, an internal CD-ROM player, a monochrome monitor, and a rugged keyboard. There is a custom, lockable front panel that covers the hard disk and CD-ROM player openings and blocks off the reset and turbo switches. The minimum hardware requirement for SearchMe is an XT with 640 KB of memory and a single floppy drive. The software will take advantage of colour monitors if they are present, and it will alter certain display characteristics for colour monitors.

To reduce dependence on server or other response-time bottlenecks in the LAN, we make little use of the local area network. Changes to the catalogue database are transported automatically to the workstations during the night via the LAN. The workstation detects and transports software changes on startup, and requests for circulation information about patron or bibliographic records are handled by the LAN. If the LAN or the server is inoperative, the software recognizes this condition, and the affected functions are simply declared unavailable.
+ Page 33 + The key to consistent response times is the fact that each workstation contains the entire library catalogue database and its indexes on one resident CD-ROM. There is a limit of about 600-650 MB of data that can be put on a CD-ROM. We have our entire collection of about 900,000 bibliographic records on one CD-ROM disc, and we believe we can expand our database to about 1.2 million records without adding a second CD-ROM. If the database did outgrow a single disc (a rather remote possibility given current acquisitions budgets), we would have several options: the text data could be compressed to reduce the amount of space required, machines could be twinned to share CD-ROM players, or machines could be clustered around a data server. Another advantage of a self-contained system is that functions that could previously be provided to users only with large (and expensive) centralized processors are now possible with a microcomputer-based CD-ROM system, since the computing resource is not shared with anyone else. Boolean searches on large collections of data can be provided with no penalty to the rest of the system. 4.0 Functionality As many functions as possible were considered in the design of SearchMe. Functions were rejected only if they were too complicated or were useful to only a very small group of users. As a result, the types of searches available on the system are: (1) full title search, (2) full author search, (3) full call number search, and (4) subject search. Subject search allows patrons to access data using: (1) titles; (2) corporate and personal authors; (3) call numbers; (4) Library of Congress Subject Headings; (5) material type names in the detailed holdings statements; (6) location names from the detailed holdings statements; (7) collection names; (8) any word from either the title, author, or subject heading fields; and (9) any word from most places in the record. These access points can be combined using the Boolean operators "AND," "OR," or "NOT."
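As an illustration only, combining access points with Boolean operators can be pictured as set operations on record numbers. The article does not describe SearchMe's internal data structures, so everything in this sketch is an assumption: the function name combine, the two small dictionaries standing in for indexes, and the record numbers themselves are all hypothetical.

```python
# A minimal sketch, not SearchMe's code (the real system is written in C).
# Assumption: each index maps a term to the set of record numbers that
# contain it. All names and numbers here are hypothetical.

def combine(current, operator, postings):
    """Combine the running result set with the records for a new term."""
    if operator == "AND":
        return current & postings   # records matching both
    if operator == "OR":
        return current | postings   # records matching either
    if operator == "NOT":
        return current - postings   # records matching the first but not the second
    raise ValueError(f"unknown operator: {operator}")


# Two hypothetical indexes: all keywords, and keywords from the author field.
keyword_index = {"space": {101, 102, 103, 250}, "time": {102, 250, 300}}
author_keyword_index = {"hawking": {102, 410}}

# (keyword: space) AND (keyword from author: hawking)
result = combine(keyword_index["space"], "AND", author_keyword_index["hawking"])
print(sorted(result))
```

Because each step takes the running result and one more term, a search strategy of any length can be built up one key press at a time, and the result count is simply the size of the current set.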
The full title, author, and call number searches allow a simple, single phrase search that our survey showed most people use to find much of the material they want. A further feature allows users to shelf browse forward and backward from any record they find. This capability closely corresponds to browsing the actual shelf because the database is organized in shelf sequence. + Page 34 + Users may also display search results on the screen, request a printout of the results, or save them as an ASCII file on a floppy diskette. Users may customize the output as they wish, and they may print, display, or save any result record. In addition, the system can link directly to current circulation status information so that users may request display of their own current biographical information, including items on loan, overdue fines, outstanding holds, and available holds. The system allows patrons to place holds on items and will automatically transfer them to the Circulation System. 5.0 Consistent User Interface In keeping with current user interface practice, a highly consistent interface has been implemented. The top of the screen is used to display messages about the current status of the search in progress; the middle of the screen is used to display index lists, search strategies, search results, and detailed help; and the bottom of the screen contains short directions to the user and error messages. No key is used for two different kinds of function command, and a special set of coloured key caps has been installed with customized legends (e.g., find by title, find by author, help, next record, and previous record). Assigning custom key caps has freed the screen for anecdotal directions (e.g., "Press one of the blue keys to start your search") instead of messages that are concerned solely with keyboard use. The largest key on the keyboard, coloured bright red, is the help key. 
When a user presses this key, a window pops up containing a description of the current screen. The amount of text that can be displayed in this window is unrestricted.

+ Page 35 +

6.0 Learning the System

As our survey found, there are many ways that people learn to use the system. At the beginning of each semester, the library provides orientation classes that cover all the facilities available to our patrons. However, many users simply sit down at a terminal and start to use the system. As a result, we have made specific provisions for this type of approach.

Use of the system itself is largely intuitive; the commands are printed right on the key caps. Using these and the screen prompts, many patrons can start doing simple searches without any previous instruction.

Located at the various workstations are one-page instruction sheets that explain the purpose of the function keys and the contents of the index access points. Also available are scripts that lead the user through a sample search.

7.0 Searching

To perform a simple search, the user presses the Find By Title key, the Find By Author key, or the Find By Call Number key (see Figure 1).

-----------------------------------------------------------------
Figure 1. Main Screen
-----------------------------------------------------------------

                The University of Guelph Library
                    Catalogue Access System

Press one of the blue keys to do a simple search.

Press the subject search key to perform combined term searches
or to access indexes other than the title, author, or call
number.

-----------------------------------------------------------------

+ Page 36 +

The system then prompts the user to enter the title (see Figure 2), author, or call number and press Enter.

-----------------------------------------------------------------
Figure 2. User Is Prompted to Enter a Title
-----------------------------------------------------------------

                The University of Guelph Library
                    Catalogue Access System

Enter Title:

Type in the text that you wish to find and press the enter key.
The system will search for the closest match to the text that
you have entered.

-----------------------------------------------------------------

When the Enter key is pressed, the system uses the search term entered to place the user as close as possible to the desired index entry (see Figure 3). Users can use the cursor control keys to move around the list of index entries until they have located the correct title, author, or call number. Then they can press the Display Result, Save Result, or Print Result keys to view, dump to diskette, or print the records.

-----------------------------------------------------------------
Figure 3. List of Titles; Second Line Highlighted
-----------------------------------------------------------------

                The University of Guelph Library
                    Catalogue Access System

Enter Title: the large scale structure of space

+-Index List-------------------------------------------------+
 Large-Scale Sharing of Computer Resources                  1
 Large-Scale Structure of Space-Time                        1
 Large-Scale Structure of the Universe                      3
 Large-Scale Structures in the Universe                     1
 Large-Scale Superimposed Folds in Precambrian Rocks of...  1
 Large-Scale Systems Modelling
+------------------------------------------------------------+

Press the up, down, PgUp, PgDn keys to manipulate the index
list. Display a highlighted record with the display result key.
Press help for more information.

-----------------------------------------------------------------

+ Page 37 +

While displaying records (see Figure 4), users can press the Page Up and Page Down keys to view multi-screen records.
The Next Record and Previous Record keys are used to display other records in the result, and the Browse Forward and Browse Backward keys allow users to shelf browse around a specific result record. The red Undo/Esc key moves the user back one step at a time, and the Start Again key cancels everything and returns the screen to the beginning (see Figure 1). Any of the Find By keys also stops everything and starts a new search.

-----------------------------------------------------------------
Figure 4. Selected Record
-----------------------------------------------------------------

                The University of Guelph Library
                    Catalogue Access System         record 1 of 1

+-Bibliographic Window----------------------------------------+
 Call Number    QC 173.59.S65 H38
 Title          The Large Scale Structure of Space-Time
 Author         Hawking, S. W.
 Edition        Cambridge (Eng.) University Press, 1973
 Contents       Bibliography: p.373-380
 Series Title   Cambridge Monographs on Mathematical Physics

 Detailed Holdings:
 Cpy   Location   Mat'l Type   Call Number
 1     Science    Book         QC 173.59.S65 H38
+-------------------------------------------------------------+

Press cursor key to see more of the record. Press next or
previous record to look at other records in the set. Press a
browse key to browse forward or backward from this record.

-----------------------------------------------------------------

The Subject Search key initiates complex searches (see Figure 5). The user is asked to choose an index from a list of All Keywords, Title Keywords, Author Keywords, Subject Heading Keywords, Titles, Authors, Subject Headings, Call Numbers, Material Types, Locations, and Collection Names. The system then prompts the user to enter the appropriate text and press the Enter key. After the search is conducted, the user is shown a list of index entries with the closest entry highlighted. The user selects a specific entry by moving the highlight around with the Up, Down, Page Up, and Page Down keys and then presses the Enter key.
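The idea of dropping the user onto the closest index entry can be sketched with a binary search over a sorted list. This is only an illustration under stated assumptions: the article does not say how SearchMe defines "closest," so the sketch assumes it means the first entry that sorts at or after the typed text, and the function name closest_entry and the in-memory list are hypothetical. The titles are the ones shown in Figure 3, written in lower case with punctuation replaced by spaces.

```python
import bisect

# A minimal sketch, not SearchMe's index code. Assumption: "closest" means
# the first entry that sorts at or after the search text, falling back to
# the last entry when the text sorts past the end of the index.

def closest_entry(sorted_entries, term):
    """Return the position in sorted_entries where the highlight should land."""
    if not sorted_entries:
        raise ValueError("index is empty")
    pos = bisect.bisect_left(sorted_entries, term)
    return min(pos, len(sorted_entries) - 1)


titles = [
    "large scale sharing of computer resources",
    "large scale structure of space time",
    "large scale structure of the universe",
    "large scale structures in the universe",
    "large scale systems modelling",
]

pos = closest_entry(titles, "large scale structure of space")
print(titles[pos])
```

Under this assumption an incomplete or slightly wrong entry still lands the user somewhere useful in the list, from which the cursor keys take over.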
+ Page 38 +

-----------------------------------------------------------------
Figure 5. Select Initial or New Access Point
-----------------------------------------------------------------

                The University of Guelph Library
                    Catalogue Access System

+-Select Access Point-----------------------+
 All Keywords
 Title Keywords
 Author Keywords
 LCSH Keywords
 Material Type
 Location
 Collection Name
 Title
 Author
 Library of Congress Subject Heading
+-------------------------------------------+

First, select an access point by pressing a cursor key to move
through the list and pressing the enter key.

-----------------------------------------------------------------

At this point, the procedure differs from the simple searches. A "Search Criteria" window opens, and the selected index entry moves into it. The system also builds a list of current result records and shows the user how many records are in it. Users can view, print, or save the results at any time, or they can continue to refine their results.

To refine their results, patrons can enter another term and combine it with the previous terms by pressing the AND, OR, or NOT keys. Or, they can press the Change Index key to select any access point, enter another search term, and combine it with the previous results. The system maintains a continuous display of the search strategy and the result count (see Figure 6). Users can remove terms from the search by pressing the Undo key or delete the search by pressing the Start Again key.

+ Page 39 +

-----------------------------------------------------------------
Figure 6. Multiple Access Point Combined Term Search
-----------------------------------------------------------------

                The University of Guelph Library     current result: 1
                    Catalogue Access System

Enter Keyword from Author: Hawking

+-Index List-------------------+ +-Search Window-------------+
 Hawkin                      1    (keyword: space) AND
 Hawking                     6    (keyword from author:
 Hawkings                  352     hawking
 Hawkins                     1
 Hawkins-Whitehead           1
 Hawkinson                   3
 Hawkridge                   2
 Hawks                      37
 Hawksley                    3
 Hawksworth                 35
+------------------------------+ +---------------------------+

ENTER to add the highlighted entry to your search.
Press OR, AND, or NOT to combine terms.
DISPLAY RESULT to see current result of search.
CHANGE INDEX to switch index.

-----------------------------------------------------------------

8.0 Current Information

The major problem with systems that use CD-ROM as their data storage medium has been the inability to update the databases. As a result, CD-ROMs have tended to be used only in widely distributed, static database applications. At first glance, it would seem that a library catalogue is relatively static; after all, only about five per cent of the database changes in one year. However, that five per cent represents over 40,000 records for a medium-sized university collection--a large number of changes by any measure.

In our case, SearchMe was replacing a true online system. As changes were made to the database, they were immediately available to library patrons at the public terminals. The new system therefore had to be updatable on a regular, timely basis. SearchMe meets this objective through the design of its index system and its hardware configuration.

+ Page 40 +

Each workstation is connected to an Ethernet LAN. Periodically, when the workstation is otherwise inactive, it checks the central data server to see if there are any changes to the database. If so, the changes are copied onto the workstation's hard disk.
These changes are logically merged into the original CD-ROM resident database so that the library patron never actually knows whether the information is being delivered from the database changes or the original database. 9.0 Error Tolerance There are two aspects to the system's ability to be tolerant of user errors: (1) how does it deal with incorrect control function commands, and (2) how does it react when search text is misspelled? In the first case, the system generates error messages that attempt to inform users that they have made an error and why. In the second case, the data retrieval software converts upper-case characters to lower case in both the entered text and the indexed text. Any punctuation (except for the call number) is changed to a space, and multiple occurrences of spaces are compressed to one space. In the title index, certain leading words (i.e., "the," "le," "la," and "les") are dropped unless they are the only word that was entered. Quite often, misspelled words will still result in the correct index entry display since the index mechanism attempts to find the entry that is "close to" the search term. 10.0 Technical Details Workstation software is written in the C language. We currently use Borland's Turbo C version 2.0. The screen management, text manipulation, and indexes were all written by our staff. The database generation software is also written in C and runs in the Unix environment. It, too, was written entirely by our staff. We currently send the prepared data to Discovery Systems in Columbus, Ohio, to have the CD-ROM discs made. + Page 41 + The indexing scheme, also designed by our staff, is very efficient in its use of space and provides excellent response times. Our current bibliographic database uses 331 MB of space. The total space used by all the indexes is 211 MB.
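The keys held in these indexes, and the text a patron types, pass through the normalization rules given in Section 9.0. Those rules can be sketched as follows. This is an illustration only, not the Turbo C code the library wrote: the function name normalize, the index labels "title" and "call_number", and the choice to treat ASCII punctuation as the punctuation set are all assumptions.

```python
import string

# A minimal sketch of the Section 9.0 rules, under stated assumptions:
# ASCII punctuation only, and hypothetical index labels.

LEADING_WORDS = {"the", "le", "la", "les"}
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalize(text, index="title"):
    """Fold case, blank out punctuation, and squeeze spaces in search text."""
    text = text.lower()
    if index != "call_number":
        # Punctuation becomes a space everywhere except in call numbers.
        text = text.translate(PUNCT_TO_SPACE)
    words = text.split()  # split() also collapses runs of spaces
    # In the title index a leading article is dropped, unless it is the
    # only word that was entered.
    if index == "title" and len(words) > 1 and words[0] in LEADING_WORDS:
        words = words[1:]
    return " ".join(words)


print(normalize("The Large-Scale  Structure of Space-Time"))
print(normalize("QC 173.59.S65 H38", index="call_number"))
```

Applying the same function to both the indexed text and the entered text is what makes the comparison tolerant of differences in case, hyphenation, and spacing.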
Data indexed includes (1) 1,424,000 titles; (2) 602,500 authors; (3) 286,000 subject headings; (4) 829,000 call numbers; (5) 651,000 keywords; (6) 71,000 ISBNs and ISSNs; (7) 206,000 L.C. Card Numbers; (8) location names; (9) material types; and (10) collection names. We do not use character compression, although if space became a problem, we could. The CD-ROM disc is a very good device for serial access. It transfers data at much the same rate as a good hard disk; however, it is a slow random access device. For instance, the average seek time of a hard disk is about 30 ms whereas the CD-ROM needs between 270 and 340 ms. For this reason, the indexing scheme is optimized for the peculiarities of the CD-ROM medium (fewer than two disc seeks are required to go from search term entry to the closest occurrence of the term). Another two seeks are required to access the complete bibliographic record and display it on the screen. The system also attempts to predict user behaviour and pre-read data in order to speed the process even more. When run with a hard disk for storage, the software works very well and has extremely good response time. Because of the ability to update data on the CD-ROM, we normally create a new CD-ROM version only every eight months or so. This process costs us about $2,000 US for 300 copies of the disc. Every possible part of the SearchMe software was put into parameters. The parameters are pre-loaded and optimized so that SearchMe does not have to interpret the data. The parameters are loaded by a programme that runs under MS-DOS and checks that they are accurate and viable. Some of the features that are controlled by parameter are: o Size and location of the data display windows, plus the kind of outline and title of the window (if any). o Colour of the window outline, background, and text, and the colour and other attributes (e.g., flash and reverse video) of highlighted text.
The programme alters these values if the system is using a monochrome monitor. o Prompts, error messages, field names, and help messages. o Format of the bibliographic record display. + Page 42 + o Whether or not commands will be entered using the special keyboard or pull-down menus. o The content of the pull-down menus. SearchMe supports multiple databases. The databases can be stored on CD-ROM, internal hard disk, or centralized (or distributed) data server. If there are multiple databases, users are given a menu of available databases and asked which one they wish to access. Each database uses its own parameter file so it is possible to configure each one quite differently from the others. Using multiple parameter files (which contain the prompts and other instructional text), it is possible to support multilingual applications by creating a parameter file for each language, where they all reference the same bibliographic data. 11.0 The Future SearchMe is only the first phase of the complete rewrite of our online library system. In December 1989, the cataloguing system was installed using the same type of architecture--distributed microcomputers accessing the main catalogue from a centralized server. With this approach, a highly sophisticated set of tools is available to the cataloguer, such as full-screen editing, interactive error detection, online coding manual, and online syntax checking. As with the SearchMe catalogue access, the system is highly reliable because it is not necessary for the central server to be available for work to continue. We are about to add a binding module to the system. The basis of our authority control system is already included in the system, and this will be fully implemented later this year. Work is just starting on the development of our new circulation system after which we will add acquisitions and serials control. 
We have also experimented with a low-cost optical scanner that we will use to scan and translate contents pages of incoming journals. From this, a SearchMe database of our journals, indexed by title, author, and keyword, will be maintained. + Page 43 + 12.0 Summary The advent of high-capacity, inexpensive, personal storage devices such as CD-ROM has made the development of practical, large database workstations possible. The movement away from a centralized super-mini or mainframe computer to functionally distributed microprocessor workstations has allowed the University of Guelph Library to provide a highly functional, cost-effective, flexible catalogue access system. Ultimately, it will offer us the ability to move much more quickly to take advantage of technological changes that benefit our user community. About the Author George Loney Staff Analyst University of Guelph Library Guelph, Ontario N1G 2W1 Canada BITNET: GLONWY@COSY.UOGUELPH.CA ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by George Loney. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission. 
---------------------------------------------------------------- + Page 51 + ----------------------------------------------------------------- Manojlovich, Slavko. "Mounting Commercial Databases Using the SPIRES DBMS." The Public-Access Computer Systems Review 1, No. 3 (1990): 51-57. ----------------------------------------------------------------- 1.0 Introduction Commercial databases like ERIC, DISSERTATION ABSTRACTS, and INSPEC have been publicly accessible through the various online search services for over 20 years. A relatively small number of universities and other institutions have acquired and mounted some of these databases on their local database management system (DBMS) for at least as long a period of time. A fairly recent phenomenon is the general belief and/or demand that universities should be locally mounting a variety of commercial databases. For those institutions with integrated library systems, the demand for locally accessible commercial databases is going one step further with the demand that access to these databases somehow be integrated with access to the library's catalogue. Integration can mean either the use of a common interface for searching both the catalogue and other databases or the creation of a link between the commercial databases and the library's serial holdings as reflected in the catalogue. The vendors of integrated library systems are beginning to respond to this new demand by offering their customers pre-loaded commercial databases which can reside along with the library's catalogue and be accessed using a common interface. Pre-loaded databases are similar to CD-ROM databases in that the data have been prepackaged by the vendor for consumer use. Issues surrounding the packaging of the data, such as the number and type of access points (i.e., indexing) and the data output formats, are important only when comparing databases from different vendors. 
The customer typically has no control over the manner in which a commercial database is accessible through a vendor's integrated system. Commercial databases on CD-ROM or pre-loaded by a vendor may not be suitable for many institutions because of expensive licensing fees, limited access, or just poor packaging of the data. An alternative to acquiring commercial databases on CD-ROM or from an integrated library system vendor is to purchase the databases on magnetic tape and mount them using a DBMS such as SPIRES, BRS, or BASIS. + Page 52 + Stanford University, Rensselaer Polytechnic Institute, and Memorial University of Newfoundland use the SPIRES DBMS (developed by Stanford University) to provide access to both the library catalogue and to commercial databases. Princeton University, Syracuse University, University of British Columbia, Simon Fraser University, and other institutions use SPIRES to provide access to GPO, ERIC, COMPUSTAT, PSYCINFO, GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA, and other commercial databases. The remainder of the article will describe various issues associated with the local mounting of commercial databases and how SPIRES addresses and accommodates these issues. 2.0 Analyzing and Loading a Commercial Database Except for the U.S. MARC Communications Format there are no existing standards for the dissemination of commercial databases. A survey of a small number of commercial databases reveals that databases distributed on magnetic tape are written using either the ASCII or EBCDIC character set. They may be composed of fixed or variable length records, and they may or may not represent diacritics following the American Library Association's standard. Given that these databases can be characterized as containing full-text, numeric, bibliographic, or other types of data, even the identification of a "record" or a "field" is not that straightforward. For example, what constitutes a record in ISI's CURRENT CONTENTS database?
Is it the journal issue or the article within the issue? In the GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA database a paragraph of an article and not the article constitutes a record. The loading of the database is the transformation of the original data into a format required by the DBMS. During the initial examination of the data the analyst is formulating a model of how the data will be represented in the DBMS. The primary factor determining how the data are stored is the DBMS's ability to accommodate the data. For example, MARC records contain the hexadecimal character code '1F' to indicate the start of a subfield or may contain hexadecimal characters representing diacritics. If the DBMS cannot store these characters, some form of data transformation must take place. The same is true of graphic images. + Page 53 + Ideally, the DBMS should preserve the original content of the data as supplied by the database vendor. The SPIRES load procedure is designed to accommodate the broad spectrum of data types supplied by commercial database vendors. Following the creation of a description of the database for SPIRES (i.e., the "file definition") there are two ways to "batch" load a database into SPIRES: writing a computer program to convert the data to the SPIRES input format or writing an input load procedure using the SPIRES formats language. 2.1 Writing a Computer Program to Convert the Data The first method of loading data is to write a computer program that will convert the original data into SPIRES "input format." SPIRES input format identifies the start and end of a record, field, subfield, etc. A sample entry for the 245 MARC tag would be as follows: 245 = (10 aGone with the Wind.); In this example, "245" is the field name, the parentheses surround the value of the field, and the semi-colon is the end-of-field terminator. SPIRES will load anything found within the parentheses including the hexadecimal code "1F," which is stored after the "0" in the above example.
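The conversion step in section 2.1 can be pictured with a short Python sketch that emits one field in the form shown in the sample entry. It is an illustration only: the function name and argument layout are hypothetical, and it does not attempt record delimiters or any escaping of parentheses inside a value, since the article does not describe how SPIRES handles either.

```python
# MARC subfield delimiter (hexadecimal 1F), stored as-is inside the value.
SUBFIELD_DELIMITER = "\x1f"


def spires_field(tag, indicators, subfields):
    """Build one 'tag = (value);' entry from a MARC-style field.

    subfields is a list of (code, text) pairs, e.g. [("a", "Title.")].
    Hypothetical helper; parentheses inside the text are not escaped.
    """
    value = indicators + "".join(
        SUBFIELD_DELIMITER + code + text for code, text in subfields
    )
    return f"{tag} = ({value});"


if __name__ == "__main__":
    entry = spires_field("245", "10", [("a", "Gone with the Wind.")])
    # repr() makes the non-printing 1F character visible as \x1f.
    print(repr(entry))
```

In the printed sample entry the delimiter appears as a blank after the "0" because hexadecimal 1F has no printable form.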
2.2 Writing a Load Procedure Using the SPIRES Formats Language The second method of loading data is to write an input load procedure using the SPIRES formats language. This load procedure will read in data from an external file and parse it into records, fields, subfields, etc. For an application which requires a lot of coding or parsing (e.g., a MARC record) it is probably easier to write a computer program using PL/1 than to do the equivalent using the SPIRES formats language. + Page 54 + 3.0 Indexing SPIRES provides the entire range of indexing options available in most DBMSs, including keyword, phrase, date, and coded indexes. SPIRES also provides a "personal name index" which is designed to accommodate simultaneously both a "first name surname" and "surname, first name" name search. Searches for "John Smith" and "Smith, John" both retrieve the same records in a personal name index search. Index names can have aliases associated with them. For example, someone accustomed to always using "FIND NAME" to search for individuals in every database can have "NAME" added as an alias for a "FIND ARTIST" search in a fine arts slides database or as an alias for a "FIND FONDS" search in an archival and manuscripts database. ("FONDS" is the equivalent of "MAIN ENTRY" for archivists.) In the creation of an index, you specify to SPIRES the fields which will be included in the index. You also specify through actions called "PASSPROCS" how the index term will be created from the input data. For example, you can specify a list of stop words (terms which will not be indexed), or indicate that you don't want to include punctuation in the index term. Another important feature of SPIRES involves the ability to transform an index file into a separate database and associate additional information with each index record entry.
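The indexing behaviours described in this section can be imitated in a few lines of Python. This is a sketch of the ideas only, not of SPIRES syntax or internals: the stop list, the function names, and the alias table are hypothetical, and the name handling assumes a single-word surname when no comma is present.

```python
import re

# Hypothetical stop list; a real one would be supplied with the index definition.
STOP_WORDS = {"a", "an", "and", "of", "the"}

# Alias -> real index name, following the fine arts slides example.
INDEX_ALIASES = {"NAME": "ARTIST"}


def keyword_terms(value):
    """Index-term creation: strip punctuation, lower-case, drop stop words."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return [word for word in words if word not in STOP_WORDS]


def personal_name_key(name):
    """Reduce 'First Surname' and 'Surname, First' to the same key."""
    if "," in name:
        surname, _, given = name.partition(",")
    else:
        # Assumption: the last word is the surname.
        given, _, surname = name.strip().rpartition(" ")
    return (surname.strip().lower(), given.strip().lower())


def resolve_index(name):
    """Map an alias such as NAME onto the index it stands for."""
    name = name.upper()
    return INDEX_ALIASES.get(name, name)


if __name__ == "__main__":
    print(keyword_terms("Gone with the Wind."))
    print(personal_name_key("John Smith") == personal_name_key("Smith, John"))
    print(resolve_index("name"))
```

Because both forms of a name reduce to one key before lookup, a single index can answer either form of the search.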
In addition, SPIRES uses action statements called SEARCHPROCS that allow you to take a search term and process it through, for example, a thesaurus file, to determine the proper form of the search term. The SPIRES $REPARSE SEARCHPROC will then take this converted search expression and execute it. The use of SEARCHPROCS and $REPARSE to process and transform search statements is one of the methods of creating database linkages in SPIRES. Database linkages result in the delivery of value-added packaging of information. + Page 55 + Consider the following example of the implementation of the EXPLODE command on a sample MEDLINE file at Memorial University of Newfoundland. The EXPLODE command enables you to retrieve all the subordinate subject entries associated with a Medical Subject Heading (MeSH) term. MeSH terms are part of a hierarchical subject classification. An index is created from the MeSH database with the heading being the key of the record. Each index record also contains a concatenated list of MeSH tree numbers associated with the heading. When a patron performs an EXPLODE search (e.g., "FIND EXPLODE ABO FACTOR") on the MEDLINE bibliographic database SPIRES first looks up the heading in the MeSH heading index, retrieves a list of MeSH tree numbers, and appends a truncated search character to each tree number. This OR'd list of tree numbers is passed back to SPIRES, which then re-executes a new search on the tree number index which is built from the MEDLINE database. The above model of database linkages can be applied to any commercial database which has an associated machine-readable thesaurus or classification system (e.g., ERIC and PSYCINFO). It is also useful in multilingual database applications where a multilingual dictionary could be used by SPIRES to transform a search term into an OR'd set of corresponding search terms for each language. 
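The EXPLODE sequence described above (heading lookup, truncation of each tree number, OR'd re-search) can be traced with a small Python sketch. Everything in it is illustrative: the two dictionaries stand in for the MeSH heading index and the tree number index, the tree numbers and record numbers are invented, and "#" is only a placeholder for the truncation character, which the article does not name.

```python
TRUNCATION = "#"  # placeholder; the actual truncation character is not given

# Stand-in for the MeSH heading index: heading -> tree numbers (invented values).
HEADING_INDEX = {
    "ABO FACTOR": ["X01.020", "X07.005"],
}

# Stand-in for the tree number index built from the bibliographic file:
# tree number -> record numbers (invented values).
TREE_INDEX = {
    "X01.020": {101},
    "X01.020.003": {102, 103},
    "X07.005": {104},
    "X09.001": {200},
}


def explode_expression(heading):
    """Steps 1 and 2: look up the heading, truncate and OR its tree numbers."""
    tree_numbers = HEADING_INDEX[heading.upper()]
    return " OR ".join(number + TRUNCATION for number in tree_numbers)


def search_tree_index(expression):
    """Step 3: run the rewritten expression against the tree number index."""
    hits = set()
    for term in expression.split(" OR "):
        prefix = term.rstrip(TRUNCATION)
        for tree_number, records in TREE_INDEX.items():
            if tree_number.startswith(prefix):
                hits |= records
    return hits


def find_explode(heading):
    """The whole sequence, as a patron's single EXPLODE search would trigger it."""
    return search_tree_index(explode_expression(heading))


if __name__ == "__main__":
    print(explode_expression("ABO FACTOR"))
    print(sorted(find_explode("ABO FACTOR")))
```

The same two-step shape would serve the multilingual case just mentioned, with a bilingual dictionary standing in for the heading index and the OR'd list holding one search term per language.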
For example, a "FIND SUBJECT SOCIAL SCIENCES" search in the MICROLOG (Canadian Research and Report Literature) database would also retrieve all of the French records with the term "SCIENCES SOCIALES." 4.0 Data Output SPIRES data output, as with indexing and searching, has associated with it a range of actions which enable you to transform the data as per your requirements. SPIRES provides an almost unlimited variety of ways to output your data, including formatting reports with statistical calculations. Within the SPIRES FOLIO environment, the patron simply specifies the type of output by including a "format name" following the DISPLAY command. + Page 56 + SPIRES formats can do much more than simply provide brief, full, or MARC output. If the patron's workstation on a network can accommodate the display of diacritics, the user can specify a format which includes these characters. A format can also look up and display information from a database other than the one being searched. This ability provides the framework for linking journal holdings information to commercial databases. As part of displaying a citation, the format looks up the journal title, ISSN, or other key in a file containing a list of journals held by the library and adds a holdings status message. The SPIRES SAVE command allows you to write the formatted results of a search to a file. The SAVE command enables a patron to search a numeric database (e.g., COMPUSTAT) and output the data for input to a statistical package. Similarly, it allows users of a full-text database to output a true reproduction of an article, in contrast to obtaining a copy of the article using the screen dump procedure. Finally, it can be used to output bibliographic records for input to a micro-based DBMS. 5.0 Conclusion The SPIRES DBMS has served librarians for over a decade. It is now used primarily to create local databases and to mount commercial ones.
Because of SPIRES's ability to handle MARC records, institutions like Rensselaer Polytechnic Institute and Memorial University of Newfoundland are developing fully functional integrated library systems with linkages to commercial databases. The functionality and versatility of SPIRES, as illustrated in this article, ensure that SPIRES will continue to meet the evolving needs of the library community. + Page 57 + About the Author Slavko Manojlovich Assistant to the University Librarian for Systems and Planning Memorial University of Newfoundland St. John's, Newfoundland A1B 3Y1 Canada BITNET Address: SLAVKO@KEAN.UCS.MUN.CA ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by Slavko Manojlovich. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission. ---------------------------------------------------------------- + Page 77 + ----------------------------------------------------------------- Molholt, Pat. "The Libraries at Rensselaer Implement Access to Information Beyond Their Walls." The Public-Access Computer Systems Review 1, No. 3 (1990): 77-82. ----------------------------------------------------------------- 1.0 Introduction Rensselaer Polytechnic Institute began automating its libraries some ten years ago.
The choice of SPIRES was driven both by its functionality and its cost. With no increased funding available for automation, the library administration sought a tool that afforded maximum control over the development of systems and, at the same time, had a manageable price tag. Currently, our system, which has the trademarked name "InfoTrax," has nine subsystems. SPIRES has successfully handled every challenge we have put to it in this complex system development effort. These accomplishments were shepherded through the design, implementation, and evaluation processes by a design team of four librarians and a programmer/analyst. One programmer/analyst has been entirely responsible for the programming and maintenance of our system. Three individuals have held that position over the years with no loss to our progress in the transitions. 2.0 InfoTrax Subsystems InfoTrax has the following subsystems: (1) Acquisitions, (2) Catalog, (3) Circulation, (4) Commercial Index and Abstracts, (5) Library News, (6) Message, (7) Reserves, (8) Serials Check-In, and (9) Campus Information (this is described in section 3.0). Although the general system is freely accessible and requires no passwords, several of the files do require Rensselaer affiliation. When users access a restricted file they are prompted for an authorization code. The commercial index and abstract files, IEEE and Current Contents, fall in this category. + Page 78 + 2.1 Acquisitions Subsystem The Acquisitions subsystem includes fund accounting and an interface to Rensselaer Polytechnic Institute's accounts payable system. Orders are generated by the system and records for items on order are listed in the catalog. 2.2 Catalog Subsystem The Catalog Subsystem merges all MARC record types in one file.
This file can be searched with full Boolean logic applied to numerous fields, including author, title, subject, publisher, date, collection, call number, material type (e.g., journal, conference, and software), and status (in circulation or available). 2.3 Circulation Subsystem In the Circulation Subsystem, item level records are linked to the catalog with real-time updating of circulation activity, including relocating items to the reserve collection and the transfer of whole call number ranges to a different library. In addition, the floor and sub-collection are noted for each item in the collection. 2.4 Commercial Abstract and Index Subsystem The Commercial Abstract and Index subsystem contains citation files that are linked by call number to the Catalog subsystem. Patrons can use "Photocopy" and "Interlibrary Loan" commands to electronically route their requests for materials found in citation files to the appropriate library unit. 2.5 Library News Subsystem The Library News subsystem contains the library's hours and service announcements. + Page 79 + 2.6 Message Subsystem The Message subsystem is used for acquisitions recommendations, reference questions, and other types of patron requests. Users from around the United States and several foreign countries have used MESSAGE to offer critiques of the system or ask for user assistance. Fortunately, no one has tried to use it for direct borrowing requests. We'd have to say no to them at this point. 2.7 Reserves Subsystem The Reserves subsystem records class lists of both library and non-library materials that are searchable by course name or number, course nickname, and instructor. Non-library materials are organized by folders with the contents listed for easy identification by users. 2.8 Serials Check-In Subsystem The Serials Check-in subsystem interfaces between the MicroLinx system and the catalog, providing issue level availability information in the catalog.
Each night the day's check-in activity is automatically transferred between the networked microcomputer and the mainframe-based InfoTrax system. 3.0 Campus Information Campuses are rife with information that is critical to students, faculty, and staff. Good access to that information has been a long standing problem for many of us. Campus-Wide Information Systems (CWIS) are springing up in an effort to bring both control and organization to a wide range of internal information. Librarians have not typically taken a leadership role in these efforts even though, among campus professionals, librarians are singular in their training in the organization of information. In this context, for the past eighteen months the library's design team has turned its attention to the concept of a library without walls by opening up the definition of "library information." Specifically, the group has begun working with several campus units to bring existing information of broad campus interest into the InfoTrax system for dissemination. + Page 80 + 3.1 Telephone Directory File Our first project was the campus student, faculty, and staff telephone directory. Compiled from the Registrar's, Human Resources', and Telecommunications' files, the Telephone Directory File is searchable by name, department, building, and rank or school year. Individuals will be able to "update" their own records in this file. In actuality, the requested changes will move electronically, field by field, to the office responsible for maintaining the authoritative file for that information segment. The actual corrections will be fed back into a central file for the campus to draw on as needed. No more changing your address in six different places. 3.2 Undergraduate Research Program File The next file we mounted takes its structure from the Telephone Directory File. The Undergraduate Research Program File contains the research interests of faculty who would like to have undergraduates on their research teams. 
This file can be searched by subject area, department, and faculty name. 3.3 Contracts and Grants File The Contracts and Grants unit found it necessary to cease publication of its newsletter, which announced funding opportunities compiled from many sources. The library has designed an electronic version in its place. The Contracts and Grants File will be augmented with direct downloads from the commercial Legi-Slate database, subscribed to by yet another office on campus. As with all of the cooperative files, the content of the file is "owned" by the contributing unit, which is also responsible for maintaining the file. The library provides some basic mechanisms for the units to facilitate the updating and editing of their files. Cooperation is becoming a watchword in the information environment of Rensselaer. + Page 81 + 3.4 Office of News and Communications File In the Fall of 1990, the Office of News and Communications file will make available the full text of all Rensselaer Polytechnic Institute's press releases. This file will be searchable by any word in the text as well as by a standardized list of units, departments, and schools within the university. It is anticipated that some local newsrooms will choose to obtain their press releases by accessing InfoTrax. 4.0 Conclusion We also have plans to provide electronic access to the undergraduate and graduate catalogs, the student handbook, class hour course schedule, bookstore holdings, and other similar files. However, for the moment, we will be concentrating on mounting the first campus-wide link ever installed by United Press International. Our agreement with UPI provides all of their national, international, business, finance, and sports news simultaneous with their broadcast to newsrooms and other commercial customers around the world. We are excited about developing a SPIRES program allowing users to design their own "newspapers." 
As with all of our files, UPI will be on the campus mainframe and available throughout the campus on the variety of networks supported at Rensselaer. I mentioned that cooperation was a key word at Rensselaer. There is another term that is important to the design team--fun. The group truly enjoys the process of design and has yet to find a challenge it cannot handle. We intend to keep looking! + Page 82 + About the Author Pat Molholt, Associate Director of Libraries Folsom Library Rensselaer Polytechnic Institute Troy, NY 12180-3590 (518) 276-8300 Pat Molholt has been responsible for Rensselaer Libraries' automation since 1978. In addition to her library duties, she is a doctoral student in artificial intelligence and lexicography. She is co-editor of the newly released work, Beyond the Book: Extending MARC for Subject Access. ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by Pat Molholt. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission. ---------------------------------------------------------------- + Page 44 + ----------------------------------------------------------------- Parker, Bo. "An Overview of SPIRES and the SPIRES Consortium." The Public-Access Computer Systems Review 1, No. 
3 (1990): 44-50. ----------------------------------------------------------------- 1.0 Introduction SPIRES is the Stanford Public Information REtrieval System, a sophisticated information retrieval and database management system. It has been used at Stanford and over forty other research centers and academic institutions within the SPIRES Consortium for more than 15 years. Applications that have been written in SPIRES range from library catalogs to electronic messaging systems. It is the principal database management system in use on the central computer system at Stanford for research, instruction, and administration. 2.0 The SPIRES Consortium Written and developed initially at Stanford University, SPIRES has subsequently been licensed for use at over 40 other university, research, and government institutions. Together with Stanford, these institutions comprise the SPIRES Consortium, a non-profit association created expressly for the maintenance and development of the SPIRES software, consulting and installation support, user forums, and training and instruction. Membership in the Consortium provides access to SPIRES--a tool comparable in power to database management systems costing over 10 times more than the membership fee--and access to shared applications from the members. Sharing is one of the great success stories in the Consortium. For example, Memorial University of Newfoundland is in the final stages of creating an integrated library system, which incorporates modules borrowed from Stanford University, Princeton University, and Rensselaer Polytechnic Institute. Memorial was up and running with an online catalog only six months after joining the Consortium. A circulation module obtained from RPI was added later; the modular nature of SPIRES made it easy for Memorial to modify the OPAC to include circulation information. + Page 45 + 3.0 Capabilities of the SPIRES Software SPIRES is a general-purpose DBMS.
You may create flat files, hierarchical files, relational, or network files. You may retrieve information sequentially, or through an index. You may store information in any form, and enter or display it in a different form. SPIRES is flexible. You may build into your files the restrictions that different groups of users must follow. You may define many different views (schemas) for your files, either for convenience or security. The users of your files may change or refine their view of the file at their convenience, within the bounds of the restrictions you place on the file. SPIRES is integrated. You define and create your database interactively, without intervention from a database administrator. You define the user views and user dialogs for your files through SPIRES, not through COBOL or PL/I programs. You create sophisticated reports from any of your files with the SPIRES report writer, and refine them interactively. You set up a full-screen dialog for data input and inquiry with the SPIRES screen definer. 4.0 Creating Databases Using SPIRES A file definition describes each SPIRES file. The definition divides the database into different logical record types, and names the elements in each record. (An element to SPIRES is a field in a record to other systems.) Elements are known by their names to SPIRES, not by position or cryptic mnemonic. The definition for each element also includes its relationship with other elements; the encoding, decoding, and validation to be performed on its contents; and any restrictions on who may see, search, and update the element. This sort of record is called a goal record, since it is often the goal of a search. + Page 46 + You can also use SPIRES to define index records. These records have the same general form as goal records, but contain information that SPIRES extracts from elements in the goal record that you have designated. 
SPIRES can use these records to locate goal records very efficiently, usually with as few as five records read from disk to retrieve a goal record from a seven million record file. Moreover, since index records have the same form as goal records, you can treat them as such, and examine and manipulate data in them. An index record can even be another goal record in the file, allowing you to build relationships between different files. SPIRES can set up simple databases with little more information than the names of the elements. You can exercise complete control over the level of detail contained in the file definition. You need only learn as much of it as you need to fit the complexity of your application. The entire process is interactive; you can define, test, refine, and implement a simple database in less than an hour. Once your database is loaded, you can still make many changes to it. You can add additional elements, change or add validation rules, or add or remove indices, all without reloading the data. 5.0 Entry and Display of Information You control the entry and display of information in SPIRES with the FORMATS language. Formats give you flexible control over the form of your input and output, and are used to provide or enforce different user views of your file. Some sophisticated "system formats" can be used with any file to give you this flexibility, with little or no time invested in design and implementation. Some examples are the SPIRES report writer, the prompting input format, and the screen definer. Your input can be in SPIRES standard format, columnar format, or free form. Your input may come from disk or tape files, from a line-by-line prompt at your terminal, or from a full-screen menu on a display terminal. Special tools are available for building extremely large databases quickly and efficiently in batch mode. 
You may arrange your output in any form for a printed report, a disk or tape file, a display on a full-screen terminal, or as input to another processor (e.g., SCRIPT or SAS). + Page 47 + 6.0 User Interfaces SPIRES provides four user-interface environments: (1) the native SPIRES command language; (2) the Prism environment for transaction processing, searching and report writing; (3) the Folio environment for search, browse, and display of textual data; and (4) Remote SPIRES for access to SPIRES databases over networks such as BITNET or the Internet. The rich SPIRES native command language is made up of English words, such as SELECT, FIND, SHOW, and EXPLAIN. The database owner and the end user alike use these commands to: (1) select a database and have its contents and organization explained; (2) search a database, either using an index or sequentially; (3) display records retrieved by a search; (4) choose among input, output, and report formats; (5) create, update, or delete records in the database; and (6) ask for online assistance with HELP, EXPLAIN, and TUTORIAL commands. Either the database owner or the end user can tailor a particular application. A procedural language is provided so that a packaged set of SPIRES commands can implement new, higher-level commands for the user, or carry on a dialog with the user and issue SPIRES commands to carry out the requests. The Prism and Folio environments are designed to allow end-user access to applications without heavy investments in training. Both environments have rich built-in help facilities, options for guided (inexperienced user) versus command (experienced user) modes, and the ability to chain a series of commands together to bypass screens. 6.1 Prism Prism is a full-screen application support tool designed for major transaction processing applications. Examples of how Prism is used at Stanford include the following applications.
o NSI (Network for Student Information) Users may look up course, classroom, and student information for their own department. Selected course and classroom information may also be entered into Prism. + Page 48 + o SMAS (Salary Management Administrative System) Authorized staff may look up salaries and other job-related information, or enter proposed salary information and produce salary setting reports. o SNAP (Stanford Network for Accounting and Purchasing) In SNAP files, users may enter purchase requisitions electronically (rather than on paper), look up requisition and payment status information, and look up vendor information. o SUFIN (Stanford University Financial Information Network) The SUFIN files provide a variety of reporting functions for university accounts and expenditure data. 6.2 Folio Folio is the backbone of the online public access catalog in the Stanford University Libraries, where over two million volumes have been added to the library holdings database. Folio is also used to provide public access to general interest applications like JOBS (job openings at Stanford), HOUSING (available housing in the local community), ODYSSEY (research opportunities for students), and special bibliographies like TECHNICAL REPORTS and the MARTIN LUTHER KING BIBLIOGRAPHY. Folio is simple enough for first-time users to walk up to public terminals and successfully complete searches and comprehensive enough to support downloading of data to workstations. 6.3 Remote SPIRES Remote SPIRES is being used at various universities to make local databases accessible to individuals at other institutions without requiring logon to the local host. For example, the HEP (High Energy Physics Preprints) database at the Stanford Linear Accelerator is accessed by physicists from over 100 institutions around the world. Simple, one-line mail messages comprise the "dialog" between the remote user and the Remote SPIRES database. 
Interactive messages and search results are sent by e-mail to the user. + Page 49 + 7.0 Technical Information SPIRES currently runs on IBM System/370 or plug-compatible mainframe computers under VM/CMS (SP and XA), MVS/TSO, and the less-well-known MVS/WYLBUR/ORVYL and MTS operating systems. A project is currently in progress to convert SPIRES to the C programming language. This effort will position SPIRES to participate in the distributed, client/server environments of the future, as well as expand the range of hardware platforms on which SPIRES will run. 8.0 Conclusion SPIRES is a powerful, flexible database management system that libraries can use to build a wide variety of public-access computer systems. In addition to its native command mode, it provides system developers with three other user interface tools--Prism, Folio, and Remote SPIRES. For more information on the SPIRES Consortium contact the SPIRES Consortium Office at 415-725-1308, or HQ.CON@STANFORD.BITNET. + Page 50 + About the Author Bo Parker Associate General Manager SPIRES Consortium Office Jordan Quadrangle Stanford University Stanford, CA 94305-4136 BITNET: GA.SBP@STANFORD.BITNET ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by Bo Parker. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. 
This message must appear on all copied material. All commercial use requires permission. ---------------------------------------------------------------- + Page 83 + ---------------------------------------------------------------- Piovesan, Walter. "Mounting a Full-Text Database Using SPIRES." The Public-Access Computer Systems Review 1, no. 3 (1990): 83-88. ---------------------------------------------------------------- 1.0 Introduction The demand for enhanced online services has led many libraries to provide users with access to machine-readable indexes and other products in addition to the online catalogue. The proliferation of networks and the merging of two heretofore separate service bureaus--the library and computer services--have facilitated the emergence of new partnerships providing new, improved services. This article describes how the Library and Computer Services of Simon Fraser University worked together to select and mount the GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA database on a mainframe using the SPIRES system. 2.0 Database Selection In the summer of 1986, the Vice President for Research and Information Services at Simon Fraser University, who was responsible for both the Library and Computing Services, called together staff from both units. The Vice President had just returned from the 1986 Education Conference held at Carnegie Mellon University, and he had been impressed with the emerging new library information systems that were being demonstrated there. He requested that a working group be formed to investigate what new types of databases we could provide to the campus, such as index, encyclopedia, dictionary, and directory databases. As Head of the Research Data Library, I was responsible for the collection and maintenance of machine-readable data for the campus community. Consequently, I was asked to head the project and to report back with a list of databases that would be feasible to load onto the campus mainframe.
The databases that were identified as being suitable for the initial phase of the project were CURRENT CONTENTS, ERIC, GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA, MEDLINE, and PSYCHINFO. A working team of Wolfgang Richter, a Database Administrator from Computing Services, and myself was formed. We were asked to load the ERIC, GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA, and PSYCHINFO databases on the campus mainframe. All of these databases were subsequently loaded. + Page 84 + The Database Administrator had already designed a menu-driven user interface to a number of applications on our central mainframe: e-mail, word processing, CS Newsletter, and the exam schedule. These services were part of EASYMTS (MTS being our operating system). We decided that we would add an additional level of menus--InfoServe--which would contain an array of library-based services. 3.0 Selection of SPIRES Prior to ordering the Grolier database, we contacted Nancy Evans of Carnegie Mellon University, who provided some key bits of information on how they had approached the task of loading the Grolier database into their STAIRS system. The main point that Ms. Evans stressed was the need for full-text indexing. The Database Manager and I then met to decide which of the two campus database management systems--SPIRES or ORACLE--we would load the Grolier database into. After an examination of the pros and cons of each system, we settled on SPIRES. The main reasons for this decision were that SPIRES had: (1) the ability to easily index on individual words; (2) high-performance characteristics; (3) superior and flexible report generation capabilities; (4) the ability to easily handle large data files; and (5) superiority in handling multiple users on our IBM mainframe computer. 4.0 Characteristics of the Grolier Database The GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA, which is approximately 170 megabytes in size, comes in the form of a single file on magnetic tape.
The cost of subscribing to the database is based on the size of the institution. There are quarterly updates. 5.0 Pre-Load Activities In late 1986, we ordered a sample copy of the GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA database. The Database Administrator designed a SPIRES database definition (called a FILEDEF) and a report definition (called a FORMATS definition) for displaying search results. The FILEDEF would allow for indexing on every word. We realized that this would make for a lengthy process in loading the full database, but we knew that if the product was to be successful with users it had to be fully indexed. + Page 85 + After giving a demonstration of the Grolier database, we received approval to purchase the full database and proceed to make it available via the expanded EASYMTS service as a part of the InfoServe menu. Once we started to load the full database, we had to make a minor change to the existing FILEDEF. In the initial FILEDEF, each item in the database corresponded to an article; however, this proved problematic with large encyclopedia articles. The FILEDEF was modified so that we would have smaller units of information: paragraphs. The database was indexed on four principal fields: (1) article number (this is mostly useful for the database manager and is used for checking for duplicate articles), (2) article name, (3) text type (e.g., bibliographic, tables, and see also references), and (4) word (this is every word in the encyclopedia, excluding common words like "as," "is," and "to"). 6.0 Loading the Database To ensure that any database errors were identified prior to loading the database into SPIRES, the Database Administrator wrote a series of utility programs. The programs scan the data on tape to ensure that: (1) all the fields are present, (2) fields are properly delineated, (3) there are no duplicate article numbers and the numbers are of the correct length, and (4) the information is in the proper sequence as specified by the vendor.
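As a rough illustration of the four checks listed above, the following Python sketch validates a batch of records before loading. It is not the author's MTS utility code. The record layout (one `number|name|type|text` record per line), the six-digit article-number length, and the reading of "proper sequence" as ascending article number are all assumptions made for the example; the article does not describe the actual tape format.

```python
# Hypothetical record layout: "number|name|type|text", one record per line.
# The real Grolier tape format is not described in the article.
FIELDS = ("number", "name", "type", "text")
NUMBER_LENGTH = 6  # assumed length of an article number


def validate(lines):
    """Return a list of (line_index, message) problems found in the batch."""
    problems = []
    seen = set()
    previous = None
    for i, line in enumerate(lines):
        parts = line.rstrip("\n").split("|")
        # Check 2: fields are properly delimited.
        if len(parts) != len(FIELDS):
            problems.append((i, "wrong number of fields"))
            continue
        record = dict(zip(FIELDS, parts))
        # Check 1: all fields are present (non-empty).
        for name in FIELDS:
            if not record[name].strip():
                problems.append((i, f"missing {name}"))
        number = record["number"]
        # Check 3: article numbers have the right length and are unique.
        if len(number) != NUMBER_LENGTH or not number.isdigit():
            problems.append((i, "bad article number"))
        if number in seen:
            problems.append((i, "duplicate article number"))
        seen.add(number)
        # Check 4: records arrive in the expected (here: ascending) order.
        if previous is not None and number < previous:
            problems.append((i, "out of sequence"))
        previous = number
    return problems
```

Collecting every problem in one pass, rather than stopping at the first, matches the workflow described in the article, where errors are identified and corrected before any data reaches SPIRES.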
(Interested SPIRES users can contact the author to obtain copies of these utility programs, which tend to be specific to the MTS operating system.) There were some initial problems with the database, such as errors in format and improperly delimited fields. We were able to easily identify the errors and correct them prior to loading. Processing the database through our error checking programs added a couple of extra steps to the process, but we found that the extra time spent is well worthwhile as it saves us time in the long run. Although we found errors during the initial database load, the database has been very stable for the past two years. + Page 86 + 7.0 Processing Quarterly Updates The quarterly updates for the Grolier database are processed as follows. First, we copy the tape data to disk and run the above-mentioned checking programs, which alert us to errors that need correcting. This checking is done via utility programs specific to our MTS operating system. Second, we correct any errors and run a FORTRAN program to convert the data into the SPIRES batch-load format. This "tags" the database for loading into SPIRES, somewhat like adding MARC tags for loading bibliographic data into an OPAC. Third, we batch load the data into a test subfile using the SPIBILD program. We briefly check the data with SPIRES for glaring errors, such as duplicate article numbers. Fourth, we run a utility program that: (1) dumps out the data from the test subfile, (2) checks the main database for articles with the same name (the Grolier people do not flag updated material as such--we have to deduce it), and (3) automatically generates the appropriate set of SPIRES REMOVE and ADD commands for SPIBILD. Finally, we run an overnight job so that SPIBILD can process the REMOVE and ADD commands generated in the previous step. We process half of the Grolier database at one time in order to reduce down time as much as possible. 
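The fourth step, deducing which incoming articles replace existing ones, can be sketched as follows. This is a minimal Python illustration, not the utility the author describes: the dictionary keyed by article name and the `("REMOVE", ...)` and `("ADD", ...)` tuples are hypothetical stand-ins, and the actual SPIBILD command syntax is not shown in the article.

```python
def plan_update(main_db, incoming):
    """Build an ordered list of operations for a quarterly update.

    main_db  -- dict mapping article name -> article number already loaded
    incoming -- list of (article_number, article_name) from the test subfile
    Both shapes are assumptions made for this sketch.
    """
    operations = []
    for number, name in incoming:
        old_number = main_db.get(name)
        if old_number is not None:
            # An article with this name is already loaded, so treat the new
            # record as a revision and remove the old one first.
            operations.append(("REMOVE", old_number))
        operations.append(("ADD", number))
    return operations
```

Matching on article name is what the text describes, since the vendor does not flag updated material; an article that has been renamed would be added as new under this scheme.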
It takes approximately 3 hours of CPU time on our 3091 IBM mainframe to process half of the database (the elapsed clock time comes to about 14 hours). SPIRES spends most of the processing time updating the article text index, which is based on individual words used in articles. At the time that we update the database, we insert an edition statement so that when users select the database they will know how current the information in it is. + Page 87 + 8.0 Reactions to the Grolier Database During our initial investigation of the products that we wanted to offer on the InfoServe service there was some skepticism on the part of librarians who felt that students would not be able to properly search the databases and that the GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA would not meet the needs of university students. After three years of using the service and hearing from students that they really find the encyclopedia useful and use it regularly, the librarians have come to appreciate the need for self-serve reference information and are encouraging us to find other products to load, such as dictionaries. There are on average 1,200 searches per month on the GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA database. It has also proved to be very successful with the Education department, which uses the encyclopedia in their courses on computers and information that they give to high school students. These students have no problem in using the service. 9.0 Conclusion Using the SPIRES software, Simon Fraser University has successfully mounted the full-text GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA and other databases. The encyclopedia database has received a warm reception from the university community, and it has proven itself to be a valuable information resource. + Page 88 + About the Author Walter Piovesan Head, Research Data Library W.A.C. 
Bennett Library Simon Fraser University Burnaby, British Columbia, CANADA BITNET: USERVINO@SFU.BITNET Internet: walter_piovesan@cc.sfu.ca ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by Walter Piovesan. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission. ---------------------------------------------------------------- + Page 89 + ---------------------------------------------------------------- Ritchie, Mark. "The WatMedia Project." The Public-Access Computer Systems Review 1, No. 3 (1990): 89-95. ----------------------------------------------------------------- 1.0 Introduction The WatMedia Project utilizes the SPIRES software to provide users with access to information about nonprint materials in the collections of 22 members of the Interfilm Group. The WatMedia system is available to authorized users on BITNET and other networks. 2.0 The Need for the WatMedia Project The WatMedia Project was begun by the Media Library of the University of Waterloo in 1974 in order to improve both patron and staff access to information about non-print resources. 
A brief analysis of the access problem showed that the prime area of difficulty was in the information retrieval interface, both between user and library and between library staff and source collections. Media resources tend to be invisible in the sense that they have no index or table of contents through which the potential user may browse. The obvious answer was a catalogue of some sort. A detailed analysis of user statistics revealed that most materials were being used for purposes far different than those for which the materials were originally produced. The wide scope of these uses was such that it persuaded us that any cataloguing system adopted must consider and cater to these uses as well as more traditional uses. The problem of browsing became a prime consideration during our investigations, as the standard bibliographic information was judged to be inadequate by our users and the costs in time and labour for users to view many different titles in order to find the one title applicable to their needs were exorbitant. We decided that the system should, in effect, create a table of contents and an index for each item, in addition to the information found in the standard bibliographic reference. A cataloguing system needed to be developed to act as an access point to the collection. In deciding on a system there were several factors that needed to be considered. These were: (1) ease of use for faculty and students; (2) impact on the organization of the collection; (3) impact on staffing levels in the media library; (4) currency of information and ease of updating; (5) comprehensiveness of entries and indexes; and (6) cost. + Page 90 + Our conclusions were that an analytical catalogue should be devised and that this new catalogue be based largely on an existing cataloguing system if possible. The system which was finally chosen was the one in use, at that time, by the British National Film Archives.
The Waterloo Media Cataloguing System is based on this system, with extensive modifications to permit efficient use in a computerized environment. However, the basic philosophy behind the two systems is the same; only the means of recording the data and the means of accessing it are essentially different. 3.0 Selection of SPIRES The next step was choosing an appropriate method for retrieving the information from the computer. The existing retrieval methods available on our campus were primarily of the sequential search variety. We did not want to use this method due to the high costs involved when searching large databases. Therefore, we endeavored to find a system using some form of tree-structured indexing. We also determined at an early stage that the primary access point to the system would be online and that hardcopy catalogues would be of secondary importance. We came to this conclusion because of the unusually high availability of computer access at the University of Waterloo. The results of our search indicated that the best choice was the Stanford Public Information REtrieval System, or SPIRES for short. SPIRES was initially chosen by the University of Waterloo for the WatMedia project because nothing else was available that had the potential to handle our projected requirements. One of the main factors which influenced our decision was the ability of SPIRES to handle large multiple indexes efficiently, something that competing systems could not do. SPIRES also allowed the system designer to modify the file definition for the database without necessarily having to rebuild the whole database. Despite this "Hobson's Choice" we have never regretted the decision. Only now are some of SPIRES's features being implemented by other systems, and some features, like remote access capability, have yet to be implemented by these systems.
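To make the cost argument for indexed over sequential retrieval concrete, the sketch below contrasts a sequential scan with a lookup in a sorted index. It uses binary search from Python's standard `bisect` module as a simple stand-in for tree-structured indexing; it does not reproduce how SPIRES organizes its indexes, and the function names are invented for the example.

```python
from bisect import bisect_left


def sequential_find(records, title):
    """Scan every record; returns (position or None, comparisons made)."""
    for comparisons, record in enumerate(records, start=1):
        if record == title:
            return comparisons - 1, comparisons
    return None, len(records)


def indexed_find(sorted_titles, title):
    """Binary search over a sorted list of titles (O(log n) comparisons)."""
    pos = bisect_left(sorted_titles, title)
    if pos < len(sorted_titles) and sorted_titles[pos] == title:
        return pos
    return None
```

With n records, a sequential scan needs n comparisons in the worst case, while the sorted lookup needs roughly log2(n), which is about 17 for 100,000 titles. That gap is the reason a sequential method becomes costly as a database grows.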
+ Page 91 + 4.0 The WatMedia Database The original WatMedia database has been expanded to become a union catalogue for the twenty-two universities, colleges and institutes of the Interfilm Group. It also contains extensive listings of the holdings of commercial distributors and libraries. The catalogue is basically a title main entry format with a number of classed and alphabetical indexes: (1) title catalogue; (2) subject indexes; and (3) biographic index and analytic index. These form the permanent catalogue, but there is also preliminary catalogue data maintained. In the preliminary catalogue, whatever information about an item can be readily obtained--accurate or inaccurate--is immediately entered. As soon as possible the item is viewed, further information is obtained, the existing information verified, and the record is modified and placed in the permanent catalogue. The rules governing entry are the same for both catalogues. In a sense an entry is never complete. As more information on a particular item or person becomes necessary, records that may not have been touched for years sometimes need to be updated. This is particularly true when persons who may have been involved in a production in a minor role become important as their careers develop and adjustments must be made to update the indexes to make it possible to retrieve as complete a filmography as possible on that person. The fundamental difference between most other published rules and the rules we use is the way title entries are handled. Since nonprint materials are most commonly identified by title, we feel that they should be entered under title. The preliminary rules of the Library of Congress and UNESCO recommended that each language version of an item should be entered under the title of the version in hand, which follows the recognized procedure for book cataloguing.
However, it is felt that the title and credit frames of nonprint items (film and video in particular) cannot be treated with the respect traditionally accorded to the title page of a book, since they may be in any language and subject to no recognized principles of accuracy. Therefore we enter all materials under the original title of release and, so far as possible, in the language of origin. This principle has also been adopted by the Aslib committee and is recognized by the International Federation of Film Archives (FIAF). + Page 92 + In order to have a system which is largely compatible with a recognized international standard, these rules have been developed from those used by the British Film Institute's National Film Archives, which have been adopted by many other national film archives around the world. Philosophically, our rules remain substantially unaltered from the original British rules, but they still cannot be considered as definitive. Further revision may be necessary as new technical developments appear. Since the first prototype version was produced in 1975, many procedures that were originally designed for manual systems have been rethought for the computer's online environment. Discussions with librarians and archivists in some 23 countries have resulted in a major change in the handling of items. Title main entries are still used; however, instead of making separate entries for each copy of each title or version of a title, we make a generic entry covering the original version, with separate collations for each copy of each version in the same record. It should be noted that WatMedia was designed for a university academic and research library situation and, as such, is much more elaborate and comprehensive than any such system required by a public library or lower school application. 5.0 Searching WatMedia To search the WatMedia database, the user sends a "find" command to the system. 
The basic syntax of this command is:

     find [index name] [value]

For example, to find the film Citizen Kane, the user would enter:

     find title Citizen Kane

+ Page 93 +

Table 1 shows the basic indexes that are available.

-----------------------------------------------------------------
Table 1.  Some Selected Indexes
-----------------------------------------------------------------
Index               Valid Index Names

Title               T, TI, TIT, TITL, TITLE
Subject             SB, SUB, SUBJ, SUBJECT
(Synonym)
Dewey Decimal       DC, DCL, DCLASS
Person              NAME, PERSON
Place               COUNTRY, PL, PLACE
Distributor         D, DIST, DISTR, DISTRIBUTOR
Sponsor             SP, SPON, SPONSOR
Audience            AUDIENCE, LEVEL, TARGET
Language            LANG, LANGUAGE
-----------------------------------------------------------------

The system has a rich assortment of other searching capabilities; however, this topic is beyond the scope of the current article. Users can send the "manual" command to retrieve WatMedia's user's guide.

6.0 Access to WatMedia via Remote SPIRES

A relatively recent addition to SPIRES is a utility called Remote SPIRES. This tool was originally a set of CMS execs and XEDIT macros; however, once its viability had been demonstrated, it was recoded in SPIRES' own procedural language. From a developer's perspective, Remote SPIRES is relatively straightforward to implement. Any application can be installed as a remotely accessible database. The inquiry language used for remote queries is the same as the inquiry language used for local queries, so anyone familiar with SPIRES should be able to use a remote application with little or no training. Those not already familiar with SPIRES should be pleased to see that the inquiry language is not terribly arcane. Remote SPIRES implementers are encouraged to support at least the two standard views of the data, brief and full. If this suggested standard is adhered to (as WatMedia does), a user familiar with one Remote SPIRES application can easily use a different application.
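The `find [index name] [value]` syntax from Section 5.0 and the alternate index names in Table 1 can be modelled with a small parser. The Python sketch below is illustrative only: the function name `parse_find` and the lowercase canonical labels are assumptions, it assumes index names are matched without regard to case, and it does not reflect how SPIRES itself parses commands.

```python
# Alternate index names copied from Table 1, grouped under a canonical label.
INDEX_NAMES = {
    "title": ["T", "TI", "TIT", "TITL", "TITLE"],
    "subject": ["SB", "SUB", "SUBJ", "SUBJECT"],
    "dewey decimal": ["DC", "DCL", "DCLASS"],
    "person": ["NAME", "PERSON"],
    "place": ["COUNTRY", "PL", "PLACE"],
    "distributor": ["D", "DIST", "DISTR", "DISTRIBUTOR"],
    "sponsor": ["SP", "SPON", "SPONSOR"],
    "audience": ["AUDIENCE", "LEVEL", "TARGET"],
    "language": ["LANG", "LANGUAGE"],
}
# Reverse lookup: every alias points at its canonical index label.
ALIASES = {alias: index for index, names in INDEX_NAMES.items() for alias in names}


def parse_find(command):
    """Split 'find <index name> <value>' into (canonical index, value)."""
    parts = command.split(None, 2)
    if len(parts) != 3 or parts[0].lower() != "find":
        raise ValueError("expected: find [index name] [value]")
    index = ALIASES.get(parts[1].upper())
    if index is None:
        raise ValueError(f"unknown index name: {parts[1]}")
    return index, parts[2]
```

For example, `parse_find("find title Citizen Kane")` and `parse_find("find TI Citizen Kane")` both resolve to the title index with the value `Citizen Kane`, which is the kind of abbreviation Table 1 allows.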
+ Page 94 + For a SPIRES application to become remotely accessible, it is necessary that it be installed in a server. WatMedia currently has a dedicated server, named, appropriately enough, "WatMedia." While it is possible for the system administrator to authorize all users at all nodes, we do not do that. Such authorization disables the transaction accounting features of the remote server and, at this point in time, we wish to have these statistics. Using wildcards, everyone at a given network node can be authorized, but this limitation does mean that anyone seeking access must first have their machine made known to the server. Anyone needing Remote SPIRES access to WatMedia can contact the author for authorization. Authorized users with an account on a computer connected to BITNET can search WatMedia either with interactive messages or e-mail messages. Users on other networks can use e-mail messages. If the user sends an e-mail message, the message should only contain a one-line command, such as "find title Blade Runner." It should be noted that the server's default method of returning information to the requester is via an e-mail message. 7.0 Conclusion The WatMedia system provides its users with dramatically improved access to information about the nonprint holdings of the members of the Interfilm Group. The system was developed using the SPIRES software, and this software has proven itself capable of meeting the evolving software development needs of the WatMedia Project. + Page 95 + About the Author Mark Ritchie University of Waterloo Library 200 University Ave. W Waterloo, Ontario N2L 3G1 Canada (519) 888-4070 BITNET: avfilm@watdcs.UWaterloo.ca ----------------------------------------------------------------- The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET.
To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name. This article is Copyright (C) 1990 by Mark Ritchie. All Rights Reserved. The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved. Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission. ---------------------------------------------------------------- + Page 4 + ----------------------------------------------------------------- Troll, Denise A. "Library Information System II: Progress Report and Technical Plan." The Public-Access Computer Systems Review 1, No. 3 (1990): 4-29. ----------------------------------------------------------------- ----------------------------------------------------------------- Note from the Editor: This article has been condensed from a Carnegie Mellon University Libraries technical report--Library Information System II: Progress Report and Technical Plan, Mercury Technical Report Series, Number 3. To obtain a copy of the full printed report, send a check for $5 to: Mercury Documents Coordinator, Administrative Offices, Carnegie Mellon University Libraries, Frew Street, Pittsburgh, PA 15213. ----------------------------------------------------------------- Abstract This article describes the work at Carnegie Mellon University in library automation and information retrieval systems. 
Specific projects include: broadening the range of electronic bibliographic resources by adding databases and expanding the range of stand-alone CD-ROM databases; deepening access to book resources by enhancing catalog records, and adding contents information for scientific and technical proceedings and book reviews to the online catalog; designing a new library information system (LIS II) on a hardware and software platform that demonstrates the feasibility of distributed library systems running on UNIX workstations; and building image databases for the delivery of full-text documents. The Library Information System II provides for retrieval from several DEC VAX servers using Z39.50 layered on TCP/IP, a search engine from OCLC called Newton, a pilot user interface in OSF X.11 Motif, and an authentication system based on Kerberos and Hesiod developed at MIT. The system is being built to existing and proposed standards, and it is designed to be machine independent. A system which distributes databases over a number of file servers will thus be affordable to a wide range of libraries. This article addresses a number of technical and design issues and concludes with an outline of the research and development agenda for the coming year. + Page 5 + 1.0 Background In 1988, Carnegie Mellon proposed building the Library Information System II, a state-of-the-art electronic library capable of delivering a broad range of bibliographic and textual information to students and scholars. LIS II would be a second-generation system of the highly successful Library Information System currently in place in the University Libraries. In addition to support from the Pew Memorial Trust, the LIS II project also receives support from the Digital Equipment Corporation, the American Association for Artificial Intelligence, the Online Computer Library Center (OCLC), and Carnegie Mellon University.
1.1 General Goals The four major goals for LIS II are: (1) to expand the breadth and depth of library information available over the campus network, focusing first on expanded coverage of bibliographic information and later on the delivery of the full text of documents; (2) to provide more information about the contents of books by indexing and retrieving the table of contents; (3) to use the capabilities of advanced workstations to improve retrieval and interfaces and to reduce the cost of a large-scale retrieval system; and (4) to document and disseminate the results of our work so that if we are successful, our innovations can be diffused within academia. This report discusses progress toward each of these goals. 1.2 General Architecture Moving information retrieval from a mainframe computer to multiple server machines requires considerable planning and changes in hardware and software. A special computer will be used to build LIS II databases, and special machines will be used as database or retrieval servers. All computers on the campus network or with access to the campus network will have access to LIS II. Workstations and X Windows terminals in the University Libraries and workstations in offices and public computing clusters on campus will run the graphical interface currently being built. Users of other personal computers, like the IBM PC and Apple Macintosh, will run a terminal interface similar to the current LIS I interface. + Page 6 + 2.0 Improving Electronic Resources As the new hardware and software platform is being designed and developed, we are making significant improvements in our electronic resources. We are expanding resources in the existing Library Information System, adding stand-alone databases on CD-ROM, and providing more information about the contents of books we acquire. To expand the breadth of our electronic collection, we have purchased databases from commercial vendors, and are exploring the production of databases from local resources.
We are also negotiating with publishers to acquire machine-readable journals and technical reports. To expand the depth of our collection, we have designed and implemented several projects to enhance our catalog records for books and technical reports. Each of these developments is discussed briefly below. Whenever possible, additions to the collection are made available to campus promptly through the current Library Information System, LIS I, so that usage and impact can be monitored and thus contribute to the design of LIS II. 2.1 Expanding the Breadth of the Electronic Collection We have broadened the scope of our electronic collection by purchasing commercial databases, by acquiring machine-readable text to be mounted locally as databases, and by designing a system architecture that will facilitate the integration of locally produced databases, e.g., Carnegie Mellon administrative databases, into LIS II. + Page 7 + 2.1.1 Commercial Databases To make the best use of our human resources, while developing the distributed retrieval architecture detailed in Section 3, we limited the addition of commercial databases available through the Library Information System to those needed for user tests and planning. We purchased INSPEC (Information Services for Physics, Electronics, and Computing), 1987-present, on magnetic tape and released it to campus (LIS I) in November 1989. INSPEC corresponds to four printed publications: Physics Abstracts, Electrical and Electronics Abstracts, Computer and Control Abstracts, and Update on Information Technology (IT Focus). INSPEC was well received by the physics and engineering communities at Carnegie Mellon. More than 1,700 searches were conducted in this database in May 1990, with an average of 1,900 searches per month since January. Transaction logs of INSPEC searches were used to construct a model of how users search a large, complex database (see Section 3.1.1.4 "Search Complexity and Performance" for details).
In the interest of immediate improvements in resource availability and recognizing that not all databases need to be online on the campus network, we expanded our electronic resources by acquiring a number of CD-ROM products. Eventually we want to provide network access to CD-ROM databases, with the delivery mechanism transparent to the user. The following CD-ROMs have been added to the University Libraries' collection since July 1988.

+ Page 8 +

-----------------------------------------------------------------
Table 1. CD-ROM Databases Added Since July 1988
-----------------------------------------------------------------
CIRR (May 1990)
     Bibliographic citations and abstracts of company and industry research reports provided by securities and investment firms.
Art Index (April 1990)
     Bibliographic citations of journal articles, yearbooks, and museum bulletins in all areas of art.
Compact Disclosure (April 1990)
     Financial and management information on public companies.
COMPENDEX (April 1990)
     Citations of articles, conference papers, and monographs in all aspects of engineering and related areas.
PAIS (April 1990)
     Bibliographic citations of journal articles, books, and government documents in public affairs.
COMPUSTAT (March 1990)
     Financial and statistical information on public companies.
CD-MARC (October 1989)
     Library of Congress subject authority file and subject headings.
MathSci (October 1989)
     Reviews and citations of the world's research literature in mathematics and related areas.
NTIS (September 1989)
     Bibliographic citations and abstracts of government-sponsored research and development reports.
-----------------------------------------------------------------

+ Page 9 +

The following CD-ROMs are also available.

-----------------------------------------------------------------
Table 2. Other CD-ROM Databases
-----------------------------------------------------------------
CIS Masterfile (Test Copy)
     Bibliographic citations and abstracts of congressional publications.
Statistical Masterfile (Test Copy)
     Bibliographic citations and abstracts of statistical information from various publishers.
Social Science Citation Index (Test Copy)
     Bibliographic citations of journal articles in the social sciences.
PsycLit (March 1988)
     Journal article citations and abstracts in all areas of psychology.
ABI/Inform (January 1988)
     Journal article citations and abstracts on business.
Dissertation Abstracts OnDisc (August 1987)
     Bibliographic citations and abstracts of dissertations in all subject areas.
Books In Print Plus (July 1987)
     Bibliographic citations of books (in print and forthcoming) in all subject areas.
ERIC (July 1987)
     Bibliographic citations and abstracts of journal articles and research reports in education.
-----------------------------------------------------------------

+ Page 10 +

2.1.2 Machine-Readable Text In preparation for experiments with the delivery of full-text documents, we are acquiring machine-readable journals and technical reports in the subject field of computer science. We have negotiated with several leading publishers to include their materials online. Elsevier, Pergamon, and the Association for Computing Machinery (ACM) are willing to give us access to their materials. The ACM has committed to providing machine-readable versions of four of its publications: Computing Reviews (10 years), Collected Algorithms (25 years), Communications (2 years), and Guide to Computing Literature (10 years). We have been approached by the Institute of Electrical and Electronics Engineers (IEEE) to provide storage and access to their entire collection of journal page images, over 30 CD-ROMs per year, indexed through INSPEC, and are working on electronic publishing with the American Association for Artificial Intelligence (AAAI).
In addition, we are working with MIT, Stanford University, University of Illinois, and the University of California to collect machine-readable computer science technical reports (see Section 3.4 "Developing Standards and Sharing Resources" for details). These materials will be mounted locally as databases. 2.1.3 Local Databases The success of the Library Information System (LIS I) has stimulated the demand for more online access to campus information. In response to this need, the University Libraries have set the goal of becoming a general electronic publisher for Carnegie Mellon. We intend to provide online full-text databases of campus information and online ordering of specific services (e.g., ordering textbooks or audio-visual equipment and putting books on reserve) to create an infrastructure for improving support for instruction in the University. As a first step in this direction, we mounted the Faculty/Staff Directory and the C-Book (the student directory) as a database called Who's Who at CMU and released it to campus (LIS I) in February 1989; during the academic year, Who's Who accounts for approximately 8-11% of all searches in LIS, or 5,000 to 8,000 searches per month. Plans to mount additional full-text databases are discussed in Section 3.2.2 "Full-Text Databases." + Page 11 + 2.2 Expanding the Depth of the Electronic Record Bibliographic records, originally designed for card catalog use, continue to be the primary access to book collections for users of online catalogs. However, research indicates that the new technology has changed information-seeking behavior, with the result that users are essentially using new search strategies with old information structures.
For example, users do more subject searching in online catalogs than they did in card catalogs, and are finding the information in bibliographic records inadequate to their needs--it is often insufficient to retrieve the record or to judge the book's relevance even if the record is retrieved. According to Richard Van Orden, enriching catalog records with information about the content of books may be the next major improvement in information retrieval. Enhanced information can expedite both the remote selection of material and document delivery. The ultimate purpose of catalog enhancements is "the timely provision of selected full-text materials to individuals when and where they need them." [1] Adding information about the content of books to our online catalog will increase the number of records retrieved and allow users to make better judgments about the value of a book for their particular query. University Libraries have several projects underway to expand the depth of content information available in the online catalog. Some record enhancements have been done entirely in-house and released to campus in LIS I. Two other enhancements have been acquired from commercial vendors and implemented but not yet released to campus: book reviews from Choice, and analytics for books and conference proceedings from ISI (the Institute for Scientific Information). 2.2.1 In-House Catalog Enhancements Barbara Richards, Alice Bright, and Terry Hurlbert implemented the Online Catalog Enhancements Project in the spring of 1989. The first stage of the project thoroughly examined sample contents pages to determine which kinds of material and how many of each kind should be included in an enhancement project, and to assess the problems that might occur. Based on this review, the cataloging staff established criteria for enhancing books using definitions of works to be included and works to be excluded; the criteria are discussed below. 
The review suggested that, provided scientific and technical conference proceedings were excluded, only 25-30% of the new books purchased would qualify for adding table of contents information. + Page 12 + 2.2.1.1 Criteria for Enhancement of Catalog Records o If the contents of a book can be cited separately, then the record is enhanced. Anthologies of plays, collections of critical essays written by different authors, and separately authored chapter titles are three categories of enhanced books. However, proceedings of scientific and technical conferences are excluded from this enhancement for two reasons. First, the length of the tables of contents may exceed a hundred titles, requiring extensive inputting of data, and second, alternative electronic sources, like INSPEC, can provide this information. However, we are placing a flag in conference proceedings catalog records to indicate that the items could be enhanced. o If the chapter titles within a book provide valuable information about the contents that is not already provided by keywords in the title or subject headings, then the record is enhanced. This category includes chapter titles that delineate historical time periods. Books for which words in the title and supplied subject headings already provide appropriate and sufficient access are excluded. If no unique keywords exist in the contents to improve the description of the monograph beyond the standard cataloging information, then the record is not enhanced; this decision is made by the cataloger. o If a monograph is an exhibition catalog, then the record is enhanced for each exhibitor whose work is included in the exhibition, with the exception that any exhibition catalog containing more than 25 artists is not enhanced. We are placing a flag in records of exhibition catalogs with more than 25 artists to indicate that the items could be enhanced. 
o If a Carnegie Mellon computer science or EDRC (Engineering Design Research Center) technical report has an author-supplied abstract less than one page in length, then the record is enhanced by adding the abstract. If the abstract is longer than one page, then the record is not enhanced. + Page 13 + 2.2.1.2 Catalog Enhancement Projects Three enhancement projects were undertaken in-house. The first project, the only review of existing catalog records, is a special service for the Drama and English departments at Carnegie Mellon, which have a great demand for plays. This project is adding contents notes (MARC field 505) or added entries (MARC fields 700 and 740) for plays in collections with different authors or the same author. The project was begun by reviewing catalog records for American and English drama; 3,857 catalog records were reviewed and 635 works of collected plays were enhanced. The project is continuing with review of Scandinavian, Italian, Latin, Spanish and French drama. The second project is adding contents notes (MARC field 505) to the records of newly acquired books with separately authored chapters or chapter titles with valuable keyword information (not provided in the title or subject headings), and to art exhibition catalogs with 25 or fewer artists. To date, 1,187 records have been enhanced. We are flagging records that should be enhanced but are currently not being enhanced, e.g., art exhibition catalogs with more than 25 artists, conference proceedings, and unanalyzed series. Enhancing recently purchased books that meet the criteria for enhancement is an ongoing project. The third enhancement project is adding abstracts (MARC field 520) to CMU computer science and EDRC (Engineering Design Research Center) technical reports. To date, 1,649 of the total 1,832 technical reports cataloged have been enhanced. The technical reports that were cataloged but not enhanced either had no abstract or the abstract exceeded one printed page in length.
The Online Catalog Enhancements Project has enhanced a total of 3,471 catalog records since October of 1989. Though the project is ongoing, a sufficient number of records have been enhanced and made available online in LIS I to begin studying the effects of these records on retrieval and browsing, i.e., on users' access to information and their ability to discriminate between relevant and irrelevant information. We are collaborating with OCLC to investigate the effects of these catalog enhancements (see Section 4.3 "Research Plans" for a brief overview of our plans). + Page 14 + 2.2.1.3 Sharing Enhanced Catalog Records At the present time, the contents information input by Carnegie Mellon Library staff is only useful to our clientele. The enhanced records created in this project, although created on the OCLC system, are not available to other libraries. Discussions led by Tom Michalak at the February and May 1990 Users Council meetings at OCLC suggest that while many libraries are interested in the potential of enhanced catalog records, support for including records "enhanced" with contents information in the OCLC cataloging system is not yet widespread. However, it seems reasonable that OCLC should allow the contents information input by member libraries to be made available to other libraries that may wish to add such information to their catalog records. Unquestionably there will be technical problems which will have to be solved if libraries are to share enhanced records, and Carnegie Mellon will continue to raise the issue of sharing enhanced records in national databases. 2.2.2 Commercial Catalog Enhancements Though our in-house record enhancement projects address certain information needs, technical and financial constraints limit what we can do in-house.
For example, works with several hundred author-title entries, like conference proceedings, are too costly for an individual library to catalog, and the resulting records with contents notes are too large for current systems to handle. One alternative is to purchase analytic records for these items from a commercial vendor and merge these with the Library Catalog. We have two projects of this type underway; the effects of these enhancements will be evaluated along with the in-house record enhancements (see Section 4.3). + Page 15 + 2.2.2.1 CHOICE Catalog Enhancements Choice is a basic book reviewing service for academic and public libraries, emphasizing scholarly titles in its reviews. Choice reviews are available in machine-readable form. Current plans are to make selected records from the Choice database, specifically those that review books in our collection, searchable in our Library Catalog. These records will be searchable along with the catalog records, so that a search for a book title, for example, will retrieve two records--the catalog record for the book and the Choice record with the book review. We modified the Choice records slightly for inclusion in the Catalog. For example, we removed the prices for hardback and paperback purchases, and appended the HOLDINGS field from the Library Catalog record for the book being reviewed to the Choice record reviewing that book. At present, we estimate the addition of 4,000 records to the Catalog using this enhancement for the past three years of the Choice database. The decision to provide searchable book review records, rather than a hypertext link between the Catalog and Choice records that could be traversed once the bibliographic record was displayed, was a conscious one; its impact on retrieval will have to be measured. We assume that searching book review records with catalog records will facilitate recall of materials, but we do not know if it will facilitate precision and relevance judgments.
We will do a cost-benefit analysis after releasing the Choice records to campus. Perhaps later, as an additional test of usage, we will release the entire Choice database as a separate database in LIS II. 2.2.2.2 ISI Catalog Enhancements Similar to the Choice enhancement project, we plan to include selected ISI (Institute for Scientific Information) analytic and full records, for books and conference proceedings in science and engineering, in the Library Catalog. Again, we appended the HOLDINGS field from the Library Catalog record for the item indexed in ISI to the ISI record for that item. The analytic records will be searchable, and have a hypertext link to the associated full record with table of contents, which will be displayable from any analytic record. In contrast to the Choice project, where the review record was searchable along with the catalog records, we chose not to make the ISI full table of contents records searchable because all of the information they contain is available in the individual analytic records. We estimate the addition of 15,000 analytic records to the Library Catalog using this enhancement, indexing approximately 1,000 scientific and technical conference proceedings. + Page 16 + 3.0 Retrieval System Development The technical goal of LIS II is to produce an affordable library information system for networked campuses, which are evolving across the nation and the world. Realistically, if libraries are to deliver documents to scholars at their desks, the storage, retrieval, and delivery of information must be cost effective. Furthermore, if libraries are to share electronic resources like enhanced records, we need a communication protocol that supports shared access to information. The goal is to build, not an experimental system, but a hardware and software platform that demonstrates the affordability and usability of the system for campuses of any size. Success depends on establishing standards. 
The LIS II development team is committed to using established standards, and when development mandates changing or extending standards, to do so within the proper forum for implementing such standards. See Section 3.4 "Developing Standards and Sharing Resources" for details. LIS II is based on the Andrew system at Carnegie Mellon, developed in a partnership with IBM. Named for both Andrew Carnegie and Andrew Mellon, Andrew encompasses the campus network--in reality a network of more than fifty local area networks, a distributed file system with hundreds of file servers, and thousands of high-function workstations. Workstations facilitate working with multiple applications by providing a window for each application and a window manager to manipulate the application windows, which can be tiled or stacked to produce a two- or three-dimensional workspace, or iconified (shrunk to a graphic) to clear the electronic desktop. These features provide a common user interface to network services, including electronic mail and bulletin boards, printing, and access to the Library Information System. Users can also access the Internet from Andrew, extending their research and collaborative efforts beyond Carnegie Mellon. + Page 17 + The Open Software Foundation (OSF), a non-profit research and development company sponsored by many of the world's major computer firms, recently incorporated the Andrew File System (AFS) into its Distributed Computing Environment (DCE), indicating the acceptance of AFS as a distributed file system standard. OSF distributes a software toolkit and interface style guide that, packaged with the mwm (X.11) window manager, comprise the graphical user interface standard called Motif. Motif has achieved wide acceptance as a standard among hardware and software vendors, and the body of applications implemented with the Motif toolkit, running under mwm, and conforming to the Motif style specifications is growing. 
Carnegie Mellon has adopted Motif as the campus standard, and the Motif Window Manager (mwm) will be the default window manager for workstations in Fall 1990. The LIS II development team has adopted Motif as the library standard for user interface design. The result will be a single interface that brings together local applications and services with new third-party software, running across a wide range of machines. The following two paragraphs provide an overview of our current status and future plans. The rationale and details of each phase of the project are discussed in the sections that follow. To date, we have created a reasonable model for libraries to share resources under a common interface and demonstrated that the OSI Z39.50 protocol can work across separate servers. The Z39.50 information retrieval protocol allows an application on one computer to query a database on another computer; it specifies procedures and structures for submitting searches, transmitting database records, and access and resource control. An alpha version of basic software components for LIS II was demonstrated at the EDUCOM conference in October 1989. This demonstration included retrieval from several servers across the NSFnet using Z39.50 layered on TCP/IP, a new retrieval system from OCLC called Newton, and a pilot user interface for workstations written in DECwindows. Since then we have added a generalized authentication scheme based on the Kerberos system, converted the user interface to OSF X.11/Motif, and begun name service using Hesiod. + Page 18 + Meanwhile, work has continued on the next phase of the project. By the 1990 EDUCOM conference, we hope to be able to demonstrate storage, retrieval and display of bitmapped images using Fax Group 4 formats. The first work with compound documents, using SGML (Standard Generalized Markup Language) and CDA (Compound Document Architecture), will follow shortly thereafter.
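The search-then-retrieve pattern attributed to Z39.50 earlier in this section can be illustrated with a small sketch. It is only a toy model in Python, not the Z39.50 wire protocol: the class ToyServer, the function search_all, and the sample records are all hypothetical names invented for illustration, and the servers are in-memory objects rather than machines on a network. What it is meant to show is the division of labor: a search leaves a result set on the server and returns only a hit count, and the client then asks for as many records as it wants to display.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyServer:
    """Hypothetical stand-in for one retrieval server; records live in memory."""
    name: str
    records: list[str]
    _result_sets: dict[int, list[str]] = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def search(self, term: str) -> tuple[int, int]:
        """Run a query, keep the hits on the server, return (set id, hit count)."""
        hits = [r for r in self.records if term.lower() in r.lower()]
        set_id = next(self._ids)
        self._result_sets[set_id] = hits
        return set_id, len(hits)

    def present(self, set_id: int, start: int, number: int) -> list[str]:
        """Return `number` records from a stored set; `start` is 1-based (an assumption)."""
        hits = self._result_sets[set_id]
        return hits[start - 1 : start - 1 + number]


def search_all(servers, term, first_n=2):
    """Send one query to every server and fetch only the first few records from each."""
    results = {}
    for server in servers:
        set_id, hit_count = server.search(term)
        results[server.name] = (hit_count, server.present(set_id, 1, first_n))
    return results


catalog = ToyServer("catalog", ["Distributed systems", "Library automation",
                                "Distributed databases"])
inspec = ToyServer("inspec", ["Distributed retrieval on UNIX", "Optical storage"])
results = search_all([catalog, inspec], "distributed")
```

Because the full result set stays on the server, the client pays only for the records it actually displays, which is one reason a single interface can reasonably query several servers at once.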
During the next year, we will implement LIS II on a new generation of small RISC servers supported by major vendors. This will bring the price of a minimal campus retrieval system to below $100,000, which is considerably less than the cost of running information retrieval on a mainframe. The same technology can be extended to CD-ROM if CD-ROM producers accept networking standards. Though many vendors are still reluctant to support standards, and licensing restrictions limit networking, we expect to integrate some of our CD-ROM databases into campus networking by the end of 1991. Future work also includes the development of a simple user interface for other personal computers, and a method of statistically monitoring usage. 3.1 User Interface Design A quality user interface is critical to the success of LIS II. Quality storage, indexing, and retrieval will only enable users to access the breadth and depth of our electronic collection if the user interface supports the tasks they want to do. This phase of LIS II development focuses on building a single workstation interface following OSF's Motif Style Guide. Developing a graphical interface for workstations using the Motif toolkit enables us to overcome some of the problems in interface design encountered with LIS I. For example, users sometimes lost their context when they were working with the VT100 display of LIS I--the only interface available, which responded to each user action by displaying a panel that replaced the panel that prompted the action. Motif offers multiple windows, one for each conceptual task, enabling users to keep their context and build a better conceptual model of information search and retrieval online. 
+ Page 19 + 3.1.1 User Studies In conjunction with implementing a dynamic user interface in Motif, we have analyzed transaction logs, done protocol studies, and conducted lengthy interviews in a wide range of research areas to understand the human factors involved in online information retrieval. The remainder of this section discusses several of these projects, specifically the requirements for journal information, the sequence in which information fields are displayed, the problem of library jargon, and search complexity and performance. Plans for future user studies are included in Section 4, "Research and Development Agenda." 3.1.1.1 Requirements for Journal Information We have spent considerable time exploring the special requirements of journal and conference information. Journals and conference proceedings often have long titles that, when truncated for the one-line-per-record display, become meaningless, e.g., "International Journal of." Furthermore, journal titles often change over time as the journal is re-named to better identify its contents in a changing discipline or to reflect a merger with another publication; these name changes create considerable problems for users and are difficult to track in systems without cross referencing and linked records. Indications of journal holdings are also problematic because subscriptions are sometimes intermittent, issues are sometimes missing, and information about the most recent issue is often not entered into the system in a timely way. Additionally, since our journals are for the most part shelved alphabetically by main entry (which is not necessarily the same as the title) rather than by assigned call number, users often have trouble locating the journal even when they know we have it in our collection. 
To complicate matters still further, LIS I transaction logs indicate that users clearly want to search the contents of journals for author, title, and subject information, not just search a database of journal records to see if we have a journal. + Page 20 + The results of our research on journals to date indicate that LIS II should provide the following: o a one-line-per-record display that includes meaningful (usable) information o a brief record display that includes variant journal titles and Carnegie Mellon holdings o a full record display o an item- or issue-level display that includes real-time updates of latest issues o a table of contents display accessible from the issue-level display o a simple way to track journal title changes o a display for browsing variations of journal titles o links between records in other databases (e.g., INSPEC) and associated journal records o a simple way to request a photocopy or FAX or to submit an interlibrary loan request 3.1.1.2 Sequence of Displayed Fields Since many database records are several screens long (in LIS I) and research indicates that users often do not display more than the first screen, the sequence in which information fields are displayed is very important to user satisfaction. Traditionally, our catalog records have displayed information in a sequence suitable for librarians or system designers, but not necessarily suitable for patrons of the electronic library. For example, esoteric information fields like 008, CODES, ACQNUM, and DOCNUM are displayed at the top of the record, while the information fields that users typically use are displayed farther down the record, often interspersed with more esoteric or less important fields, e.g., LC-CARD and LANGUAGE (usually English). This sequence results in users having to scan the full records for relevant information, which may be displayed on subsequent screens. 
Our goal is to reorganize the sequence of fields so that those typically used by library patrons are at the top of the record and thus appear on the first screen when the full record is displayed. + Page 21 + 3.1.1.3 Library Jargon Another study examined jargon in library handouts and reference interviews (in preparation for online searching). The results of the study reveal that patrons misunderstand library terms approximately half of the time. The implications for LIS II are far reaching, not only in terms of the language to be used in the online help and on the buttons and menus, but in terms of what tags or labels to attach to the different information fields in the records themselves. For example, in a multiple choice test, only 35 out of 100 test subjects (CMU freshmen) selected the correct definition for the term "citation"; most subjects drew on their knowledge of parking or speeding violations and defined "citation" in the library context as a notice of overdue books. At present, "citation" is a tag we use to identify a field in our Library Catalog records; obviously this tag does not communicate effectively to everyone. 3.1.1.4 Search Complexity and Performance Using transaction logs for INSPEC, we created a model of user searches to use as a baseline for preparing LIS II. We examined the logs from one of the busiest afternoons of the academic year to determine the following:
o the number of searches issued per minute
o the number of users on the system simultaneously
o the complexity of user searches, defined as a function of the number of terms per search; the use of Boolean and proximity operators, field restrictors and truncation; and instances of browsing or scanning the index
LIS II will handle 25 simultaneous users generating searches at the same rate and complexity found in LIS I. The goal is to provide performance that exceeds current LIS I performance on 70% of real searches.
Users entering searches that exceed performance guidelines by 50% will be given a resource control option to cancel or proceed; if the search exceeds the guidelines by 100%, the user will be given an option to cancel or browse the index to narrow the search. Resource control is discussed in Section 3.3 "Distributed Retrieval Architecture."

+ Page 22 +

3.2 Database and Document Types

One of the goals of LIS II is the delivery of complex documents over the network. While the current implementation of LIS II supports only ASCII text (both the full text of documents and structured information such as bibliographic records), the formats and sources of data available in LIS II will increase over the next few years. The focus of our research in this area is on image databases, full-text databases, and personal databases.

3.2.1 Image Databases

Our first priority is to extend the architecture of our entire computing environment so that it supports bitmapped images as well as ASCII text. Information from paper sources, such as journal articles, will be made available in bitmap format (see Section 2.1.2 "Machine-Readable Text"). We will use CCITT Fax Group 4 format to store the compressed images and will provide software decompression and display tools on the individual workstation. This area calls for a wide range of research on storing and displaying images with high resolution, gray scales, and color. Reasonable display performance of bitmaps depends on the speed of the decompression algorithm, the caching of data, and the ability of the decompression algorithm to work ahead of the user interface.

The retrieval of bitmap data has implications for the retrieval protocol and requires changes in Z39.50. For example, the application level flow control in Z39.50 is record oriented, but the size of records containing bitmapped images may exceed 50 KB, making it necessary either to retrieve partial records or to retrieve bitmapped images from a secondary server.
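The first of the two options just mentioned, retrieving partial records, can be pictured with a short sketch. This is not the LIS II design or any part of Z39.50: the function names are hypothetical, and the 50 KB segment size is borrowed from the figure in the text purely for illustration, since the article does not state a protocol limit.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: segment size chosen to match the 50 KB figure quoted in the text.
MAX_SEGMENT = 50 * 1024  # bytes


def segments(record: bytes, size: int = MAX_SEGMENT) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs that cover the record in order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for offset in range(0, len(record), size):
        yield offset, record[offset:offset + size]


def reassemble(parts: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the record from (offset, chunk) pairs received in any order."""
    return b"".join(chunk for _, chunk in sorted(parts))
```

Carrying the byte offset with each segment lets a workstation request or display the first part of a page image before the rest arrives, which fits the stated need for decompression to work ahead of the user interface.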
The format of the data likewise requires special handling by the user interface.

+ Page 23 +

3.2.2 Full-Text Databases

In the future, full-text databases with very different indexing schemes from bibliographic databases will be added to LIS II. As electronic publishers for Carnegie Mellon, the University Libraries intend to provide online full-text databases of the following campus information:

o software licensing and availability information
o career resources information
o Carnegie Mellon policies and procedures manual
o the undergraduate catalog
o Macintosh and Andrew system user help files
o faculty and staff publications and research profiles
o indexes to student and faculty newspapers--perhaps with full text

Additional full-text databases will include research materials as well as standard office reference materials, such as phone books, encyclopedias and dictionaries.

Because unpublished working papers and postings on bulletin boards are of vital importance in some disciplines, e.g., computer science, LIS II will merge published and unpublished information. We will provide indexed access to Carnegie Mellon working papers, and make use of work that is being carried out on automatic indexing of Arpanet bulletin boards, so that selected bulletin board postings can also be added to the retrieval servers.

+ Page 24 +

3.2.3 Personal Databases

The original conception of a library information system was to bring a search index to a user as a single isolated tool. Our investigations and interviews with users led to a new conception based on the knowledge that documents are rarely used alone. The new understanding is that retrieval technology is an adjunct to desktop management; a library information system must therefore be integrated into the larger work environment.
There is a growing tendency among users to want to leave the library connection active all day rather than log in and out of the application repeatedly; this trend will have a significant impact on established system designs, which commit actual hardware to each connection.

With this in mind, we intend to use emerging standards to link LIS II documents to word processors, databases, electronic mail, and similar applications. We will provide toolkits for individual users to make databases available through LIS II. Using the toolkits, personal database creators will be able to access their databases through LIS II, or provide their colleagues with access to their databases through LIS II.

The next challenge in handling document types is storing the source of a document, e.g., author-contributed text in machine-readable form. We are acquiring source documents for future research and development. We will use SGML and CDA to describe the intellectual structure and content of the document and to guide the format of the display. An example of the problems to be solved in this area is the relationship between spreadsheets and tables for display and page layout. A major area for future research, but beyond the current plans for LIS II, is the handling of dynamic documents. PostScript is another format for non-revisable documents, and we are planning support for it.

+ Page 25 +

3.3 Distributed Retrieval Architecture

The distributed architecture of LIS II requires a range of support services. The first is a mechanism to identify and describe databases on the network. Our total Database Information Service requires a number of features in addition to those traditionally provided. The long term goal of the Database Information Service is for users to be able to find information without knowing which database to search. In conjunction with this service, the system requires authentication, access control, and resource control.
We need a password and security (authentication) system for multiple reasons. The primary reason is to control access to licensed databases, but we must also limit access to sensitive data within databases, for example, to social security numbers in the Who's Who at CMU database. Additionally, authentication of individual users enables us to collect meaningful statistics about the behavior of different classes of users, e.g., in different disciplines. The Kerberos authentication scheme is used as the basis for this service.

Resource control is the final major service required by LIS II. A distributed architecture designed to be used across institutions must include a mechanism for limiting the amount of resources that can be consumed by a remote user. This protects against abuse, makes it possible to provide subscription services for licensed databases, and protects users from potentially costly mistakes by notifying them of expensive requests.

3.4 Developing Standards and Sharing Resources

We expect that LIS II, as an affordable platform for sharing library information, will be expanded in the future. To this end, we are working with other groups to develop standards that all libraries can use. This section briefly discusses several projects in this area. See also the earlier discussion of sharing enhanced catalog records (Section 2.2.1.3).

+ Page 26 +

Members of the LIS II development team participate in the Z39.50 Implementors Group. We are lobbying for extensions to the protocol based on our work with LIS II, where we found it necessary to extend the protocol by devising local conventions:

o for representing Boolean queries
o for using Z39.50 element set names to provide alternate views of retrieved records
o for sorting retrieved records on both the retrieval server and the user's workstation
o for browsing indexes

Further extensions to the protocol may also be necessary, e.g., to handle retrieving image data.
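As an illustration of what a convention for representing Boolean queries might look like, the sketch below holds a query as a small tree and flattens it into postfix (reverse Polish) order, a common way to send nested Boolean expressions over a wire. The article does not describe the LIS II conventions, so the class names, the field=value term syntax, and the operator set here are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Term:
    """A single search term against a hypothetical index name, e.g. 'author'."""
    field: str
    value: str


@dataclass(frozen=True)
class Op:
    """A binary Boolean operator: 'AND', 'OR', or 'NOT' (read as AND-NOT)."""
    name: str
    left: "Node"
    right: "Node"


Node = Union[Term, Op]

ALLOWED_OPS = {"AND", "OR", "NOT"}


def to_postfix(node: Node) -> List[str]:
    """Flatten a query tree into postfix order: operands first, then operator."""
    if isinstance(node, Term):
        return [f"{node.field}={node.value}"]
    if node.name not in ALLOWED_OPS:
        raise ValueError(f"unknown operator: {node.name}")
    return to_postfix(node.left) + to_postfix(node.right) + [node.name]
```

A query for author "simon" AND (title "bounded" OR title "rationality") flattens to the operands followed by OR and then AND, so a server can evaluate it with a simple stack and no parenthesis handling.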
Two other projects for testing shared resources are in the planning stages. The first project, with MIT, Stanford University, the University of Illinois, and the University of California, is to build a distributed collection of computer science technical reports and working papers; the result will be a full-text database with the items held at separate locations but with a shared index. Searchable bibliographic records with abstracts will be provided at each site, with the full text stored as page images in an image database at the home site. The second project, with the University of California and Pennsylvania State University, will test extensions of the Z39.50 protocol by sharing library catalog records; this project is sponsored by Digital Equipment Corporation.

Additionally, we are working with Andrew system administrators to implement standards for Motif applications and window management at Carnegie Mellon. This involves collaboration on user testing and document preparation so that interactions and terminology are identical across applications.

4.0 Research and Development Agenda

In conclusion, our LIS II plans for the next year include work in development, implementation, and research. Each of these is discussed briefly below, with the items in each section listed in order of priority.

+ Page 27 +

4.1 Development Plans

o Test the graphical user interface--the number, placement and design of the windows; the text of error messages, buttons, menus, online help; the interactions between searching and browsing; the number and type of indexes to provide for each database; the information to include in the one-line-per-record displays; and the sequence of displayed fields in database records. Several research methods will be used, including protocol analysis, structured interviews, and user questionnaires. The results of these studies will affect the design of the user interface.
o Build a terminal interface for personal computers like the IBM PC and Apple Macintosh. Because of the popularity of the Macintosh at Carnegie Mellon, long-term plans include building a Macintosh interface to LIS II.

o Instrument the system to monitor user behavior based on a profile of significant characteristics--like college, department and status (e.g., Fine Arts, Drama, undergraduate); location where search was issued (e.g., office, public cluster, or library); database selection; search terms (including operators and restrictors); browse terms; instances of opening and closing windows; the number of short (one line per record) and full records viewed, and the number and sequence of page images viewed, etc.

o Handle complex documents--using SGML and CDA to describe their form and content.

4.2 Implementation Plans

o Implement LIS II on distributed file servers and release to campus.

o Provide training and documentation for library staff and patrons--to facilitate the shift from a terminal emulation interface (LIS I) to a workstation interface (LIS II).

o Broaden the range of bibliographic databases available in LIS II.

+ Page 28 +

o Provide full-text databases--both searchable ASCII text of campus information and reference works, as discussed in Section 3.2.2 "Full-Text Databases," and displayable page images, as discussed in Section 3.2.1 "Image Databases." We are focusing on image databases and will continue experiments with different scanning, scaling, and compression-decompression algorithms.

4.3 Research Plans

o Evaluate the effects of catalog enhancements on recall and precision--preliminary results from a pilot study of the current system (LIS I), planned for Fall 1990, will be used to design a more rigorous evaluation of catalog enhancements in the new system (LIS II).
We want to assess the number of additional access points made available in the enhancements, the effects on retrieval, the effects on relevance judgments, the impact on the size of the catalog, and the cost per enhancement. The results of this evaluation should facilitate sharing enhanced catalog records.

o Evaluate and document the transition from LIS I to LIS II.

o Evaluate user behavior and preferences with LIS II--how skills develop over time; how acceptance is influenced by user characteristics, such as social group (student, faculty, staff, alumni) and discipline (engineering vs. social sciences), and by various features of the system itself, e.g., multiple windows, databases, indexes. Results from studies of user characteristics and skill levels will contribute to the ongoing design of the system.

o Study how users use full-text databases--for example, given page images of journal articles or technical reports, do users read the pages sequentially or skip around in the text? This study will entail instrumenting the system to monitor user behavior and running user protocols to better understand why users do what they do. The results of the study will help us develop suitable navigational tools and caching procedures for full-text databases.

+ Page 29 +

References

Van Orden, Richard. "Content Enriched Access to Electronic Information: Summaries of Selected Research," Library Hi Tech 8, No. 3 (1990): 28.

About the Author

Denise Troll
Carnegie Mellon University Libraries
Frew Street
Pittsburgh, PA 15213.
BITNET: troll+@andrew.cmu.edu

-----------------------------------------------------------------

The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name.
This article is Copyright (C) 1990 by Carnegie Mellon University. All Rights Reserved.

The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved.

Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission.

----------------------------------------------------------------