MultiText provides an up-to-date mirror and a unique structured search index to the Computer Science Bibliography Collection.
We were the temporary home of Shodouka, the award-winning program by Ka-Ping Yee for rendering Japanese Web pages.
The MultiText Project is concerned with developing techniques for the indexing and retrieval of very large electronic collections of text. By "very large" we are not referring merely to collections such as the Complete Works of William Shakespeare or the Encyclopedia Britannica that might fit on one or more CD-ROM disks and be purchasable by the owner of a personal computer. Rather, we are concerned with techniques for collections many times larger --- all issues of a large newspaper for several decades, all journals in a subject area, or, ultimately, a significant fraction of all text available electronically.
In developing these techniques we are considering the many unique requirements of very large text collections:
It is not possible for each user to have a copy of the text and indexing information on his or her own personal computer. Our techniques allow many thousands of users to simultaneously query a text collection across a computer network. Incoming requests are scheduled to minimize the impact users have on one another.
Several computers must work in cooperation to provide storage and indexing for collections of this size. It is not feasible to store all information on a single computer or even at a single site. Our techniques allow effective and efficient communication of information between users' machines and the various machines indexing and storing the text.
The text collection must be updated, reorganized and extended while remaining available to users. The individual computers storing and indexing the text must be maintained and repaired with only a minimal reduction in performance. An unexpected failure of one of the individual computers must have no effect on availability and only a minimal effect on performance.
A variety of query languages and graphical user interfaces must be simultaneously supported, accommodating variations in users' tastes and abilities.
Documents in different formats must be stored in the same collection. Despite differences in format, users may still formulate queries that refer to document structure --- title or author, for example.
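As a rough illustration of the last requirement, the sketch below stores documents in two different formats in one collection and answers a query on a structural field such as title or author. It is not the MultiText implementation: the two formats, the parser functions, and the `search` helper are all hypothetical names chosen for this example, and a real system would use a prebuilt index rather than parsing at query time.

```python
import re

def parse_tagged(text):
    """Extract fields from a document marked up as <name>value</name>."""
    return {m.group(1).lower(): m.group(2).strip()
            for m in re.finditer(r"<(\w+)>(.*?)</\1>", text, re.S)}

def parse_headers(text):
    """Extract fields from a document that starts with 'Name: value' lines."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields

# Hypothetical format registry: each format maps to a function that
# exposes the same structural fields (title, author, ...).
PARSERS = {"tagged": parse_tagged, "headers": parse_headers}

def search(collection, field, word):
    """Return sorted ids of documents whose given field contains the word."""
    hits = []
    for doc_id, (fmt, text) in collection.items():
        value = PARSERS[fmt](text).get(field, "")
        if word.lower() in value.lower().split():
            hits.append(doc_id)
    return sorted(hits)

# One collection holding documents in both formats.
collection = {
    "a": ("tagged", "<title>Indexing large text</title><author>Smith</author>"),
    "b": ("headers", "Title: Retrieval of large collections\nAuthor: Jones\n\nBody text."),
    "c": ("tagged", "<title>Small notes</title><author>Large</author>"),
}

print(search(collection, "title", "large"))   # documents a and b
print(search(collection, "author", "large"))  # document c only
```

The query names only the structural field, so the user does not need to know which format a given document was stored in.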
The MultiText Project is sponsored by CITO.