Welcome to MultiText



Digital Library Collection: The Barren Lands
The Barren Lands site documents two exploratory surveys of the Barren Lands region west of Hudson Bay, in northern Manitoba and Saskatchewan and the area now known as Nunavut. Drawing on materials from the J.B. Tyrrell, James Tyrrell and related collections at the Thomas Fisher Rare Book Library, University of Toronto, it includes over 5,000 images from original field notebooks, correspondence, photographs, maps and published reports.

Search Early Canadian Literature
Early Canadiana Online: Our search system holds a growing collection of searchable books and pamphlets from the 18th and 19th centuries. The text of each page is searched, and images of the pages that match the query are returned. Over 3,500 documents and over 650,000 page images are available.


The Collection of Computer Science Bibliographies

MultiText provides an up-to-date mirror and a unique structured search index to the Computer Science Bibliography Collection.


Shodouka

We were the temporary home of Shodouka - the award-winning program by Ka-Ping Yee to render Japanese Web pages.


The People


The Project

The MultiText Project is concerned with developing techniques for the indexing and retrieval of very large electronic collections of text. By "very large" we are not referring merely to collections such as the Complete Works of William Shakespeare or the Encyclopedia Britannica that might fit on one or more CD-ROM disks and be purchasable by the owner of a personal computer. Rather, we are concerned with techniques for collections many times larger --- all issues of a large newspaper for several decades, all journals in a subject area, or, ultimately, a significant fraction of all text available electronically.

In developing these techniques we are considering the many unique requirements of very large text collections:

Multiple Users

It is not possible for each user to have a copy of the text and indexing information on his or her own personal computer. Our techniques allow many thousands of users to simultaneously query a text collection across a computer network. Incoming requests are scheduled to minimize the impact users have on one another.
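One simple way to limit the impact users have on one another is to serve pending requests in round-robin order across users, so that a user who submits many queries cannot starve the others. The sketch below illustrates that idea only; it is not a description of MultiText's actual scheduler, and the class and method names (RoundRobinScheduler, submit, next_request) are hypothetical.

```python
from collections import OrderedDict, deque


class RoundRobinScheduler:
    """Illustrative fair scheduler: one pending-query queue per user."""

    def __init__(self):
        # Maps user -> deque of that user's pending queries.
        # Insertion order doubles as the service order.
        self._queues = OrderedDict()

    def submit(self, user, query):
        """Add a query to the end of the given user's queue."""
        self._queues.setdefault(user, deque()).append(query)

    def next_request(self):
        """Return (user, query) for the next user in turn, or None if idle."""
        if not self._queues:
            return None
        user, queue = next(iter(self._queues.items()))
        query = queue.popleft()
        # Remove the user, then re-add at the back if work remains,
        # so every other waiting user is served before this one again.
        del self._queues[user]
        if queue:
            self._queues[user] = queue
        return user, query
```

With three queries from one user and one from another, the second user's query is served second rather than last.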

Multiple Server Machines

Several computers must work in cooperation to provide storage and indexing for collections of this size. It is not feasible to store all information on a single computer or even at a single site. Our techniques allow effective and efficient communication of information between users' machines and the various machines indexing and storing the text.

Continuous Availability

The text collection must be updated, reorganized and extended while remaining available to users. The individual computers storing and indexing the text must be maintained and repaired with only a minimal reduction in performance. An unexpected failure of one of the individual computers must have no effect on availability and only a minimal effect on performance.

Multiple Query Languages

A variety of query languages and graphical user interfaces must be simultaneously supported, accommodating differences in users' tastes and abilities.

Multiple Text Formats

Documents in different formats must be stored in the same collection. Despite differences in format, users may still formulate queries that refer to document structure --- title or author, for example.
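A small sketch can make the idea of format-independent structural queries concrete. It assumes each format has its own rule for locating a structural field such as title or author, so that one query can run over a mixed collection. The formats chosen (HTML and LaTeX), the regular expressions, and the names EXTRACTORS, field and search_field are all assumptions made for illustration, not part of MultiText.

```python
import re

# Hypothetical per-format rules: each maps a structural field name
# to a regular expression that captures the field's text.
EXTRACTORS = {
    "html": {
        "title": re.compile(r"<title>(.*?)</title>", re.I | re.S),
        "author": re.compile(r'<meta name="author" content="(.*?)"', re.I | re.S),
    },
    "latex": {
        "title": re.compile(r"\\title\{(.*?)\}", re.S),
        "author": re.compile(r"\\author\{(.*?)\}", re.S),
    },
}


def field(doc_format, text, name):
    """Return the named structural field of a document, or None if absent."""
    pattern = EXTRACTORS.get(doc_format, {}).get(name)
    if pattern is None:
        return None
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def search_field(collection, name, term):
    """Return ids of documents whose named field contains the term.

    `collection` maps a document id to a (format, text) pair.
    Matching is case-insensitive and ignores the document's format.
    """
    term = term.lower()
    return [
        doc_id
        for doc_id, (doc_format, text) in collection.items()
        if term in (field(doc_format, text, name) or "").lower()
    ]
```

The same call, for example search_field(collection, "title", "retrieval"), then applies to HTML and LaTeX documents alike; supporting a further format means adding one more entry to the rule table.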


Technical Reports and Software

The MultiText project takes place at the University of Waterloo and the University of Toronto.

The MultiText project is sponsored by CITO.


Send mail about this page or this server to: multitext@isagn.com.