Abstract
This dissertation investigates pattern-oriented access to collections of unstructured text documents. A pattern-oriented information search differs from a more traditional record-oriented search just as the study of an entire forest differs from the inspection of specific trees. For example, to enjoy Abraham Lincoln's eloquence, we might look up a particular speech such as the Gettysburg Address (a tree perspective); to understand the evolution of Lincoln's ideas, we must seek trends across the collection of his public statements (a forest perspective). Data mining seeks this forest perspective by finding statistical patterns in data. Unfortunately, data mining is applied only to highly structured data, and therefore ignores much, if not most, of the world's information, which exists as unstructured text.

Evidence from the Information Retrieval, Information Visualization, Bibliometrics, and Library Science literatures demonstrates that pattern-oriented access to document collections is a critically important task, one in which people often engage even if they do not have tools designed for this purpose. Informed by these literatures, a prototype pattern-discovery system named Homer is introduced and applied in two empirical studies. The first study required subjects to answer specific questions about the prose of a photographer's captions; the second study required subjects to respond to open-ended medical questions based on a collection of emergency room medical reports. Results show that Homer users learned more and took less time, on average, than users of more traditional record-oriented systems. These results, combined with evidence from the literature, argue strongly that pattern-oriented access to document collections is possible, and that it can potentially tap vast, previously unavailable sources of knowledge by helping us find the stories hidden within our document collections.