Friday, November 27, 2015

Reading Notes For Week 13

IIR (Introduction to Information Retrieval)

  • Most language-modeling work in IR has used unigram language models. IR is not an application where complex language models are immediately needed: unigram models are often sufficient to judge the topic of a text.
  • Language modeling is a quite general formal approach to IR, with many variant realizations. The original and basic method for using language models in IR is the query likelihood model; a minimal scoring sketch appears after this list.
  • Vector space systems have generally preferred more lenient matching, though recent web search developments have tended more toward searches with conjunctive (Boolean AND) semantics; the inverted-index sketch after this list shows both styles.
  • Group-average agglomerative clustering avoids the pitfalls of the single-link and complete-link criteria, which equate cluster similarity with the similarity of a single pair of documents; see the clustering sketch after this list.
  • Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other. Hierarchical clustering creates a hierarchy of clusters.
  • The inverted index supports fast nearest-neighbor search for the standard IR setting. However, sometimes we may not be able to use an inverted index efficiently (see the sketch after this list).
  • Feature selection makes training and applying a classifier more efficient by decreasing the size of the effective vocabulary; a mutual-information sketch appears after this list.
  • Differential cluster labeling selects cluster labels by comparing the distribution of terms in one cluster with that of other clusters; a labeling sketch appears after this list.
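
To make the query likelihood model concrete, here is a minimal sketch in Python (the function name, the Jelinek-Mercer weight lam=0.5, and the toy data are my own illustrative choices). It scores a document for a query by multiplying smoothed unigram probabilities of the query terms:

    from collections import Counter

    def query_likelihood(query, doc, collection, lam=0.5):
        # Unigram query likelihood P(q | M_d), smoothing each document
        # probability with the collection model (Jelinek-Mercer):
        # P(t|d) = lam * P_ml(t|M_d) + (1 - lam) * P_ml(t|M_c)
        doc_counts, coll_counts = Counter(doc), Counter(collection)
        score = 1.0
        for t in query:
            score *= (lam * doc_counts[t] / len(doc)
                      + (1 - lam) * coll_counts[t] / len(collection))
        return score

    collection = "the quick brown fox jumps over the lazy dog".split()
    print(query_likelihood("brown fox".split(), "the quick brown fox".split(), collection))
    print(query_likelihood("brown fox".split(), "the lazy dog".split(), collection))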
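
The contrast between conjunctive (Boolean AND) matching and more lenient ranked matching, and the way an inverted index speeds up nearest-neighbor search, fit in one sketch (again Python; the names and toy documents are assumptions for illustration):

    from collections import defaultdict

    def build_index(docs):
        # Map each term to the set of IDs of documents containing it.
        index = defaultdict(set)
        for doc_id, terms in enumerate(docs):
            for t in set(terms):
                index[t].add(doc_id)
        return index

    def conjunctive(index, query):
        # Boolean AND semantics: intersect the postings of every query term.
        postings = [index.get(t, set()) for t in query]
        return set.intersection(*postings) if postings else set()

    def lenient(index, query):
        # Lenient matching: rank any document sharing at least one query term.
        # Only the postings of the query terms are touched, which is what
        # makes inverted-index nearest-neighbor search fast for sparse queries.
        hits = defaultdict(int)
        for t in query:
            for doc_id in index.get(t, ()):
                hits[doc_id] += 1
        return sorted(hits, key=hits.get, reverse=True)

    docs = [d.split() for d in ["fast fox", "lazy dog", "fast lazy fox"]]
    idx = build_index(docs)
    print(conjunctive(idx, ["fast", "fox"]))  # {0, 2}
    print(lenient(idx, ["fast", "lazy"]))     # [2, 0, 1]: doc 2 matches both terms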
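
For the group-average criterion, a naive O(n^3) sketch (Python/NumPy; it averages cosine similarities between the two candidate clusters, a common simplification of IIR's exact GAAC definition, which averages over all pairs in the merged cluster):

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def group_avg(ci, cj, vecs):
        # Cluster similarity = average over ALL cross-cluster document pairs,
        # never just the single closest or farthest pair.
        return sum(cosine(vecs[a], vecs[b]) for a in ci for b in cj) / (len(ci) * len(cj))

    def gaac(vecs, k):
        clusters = [[i] for i in range(len(vecs))]
        while len(clusters) > k:
            # Greedily merge the pair with the highest group-average similarity.
            pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
            i, j = max(pairs, key=lambda p: group_avg(clusters[p[0]], clusters[p[1]], vecs))
            clusters[i] += clusters.pop(j)
        return clusters

    docs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
    print(gaac(docs, 2))  # [[0, 1], [2, 3]]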
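
Mutual information is a standard criterion for feature selection: score each term by the information it carries about a class and keep the top-scoring terms. A sketch of the MI computation from the 2x2 document counts (Python; the example counts are the export/poultry figures from IIR's worked example, as I recall them):

    import math

    def mutual_information(n11, n10, n01, n00):
        # n11: docs containing the term and in the class
        # n10: term present, class absent; n01: class present, term absent
        # n00: neither
        n = n11 + n10 + n01 + n00
        n1_, n0_ = n11 + n10, n01 + n00  # term present / absent
        n_1, n_0 = n11 + n01, n10 + n00  # in class / not in class
        def part(nij, ni, nj):
            return 0.0 if nij == 0 else (nij / n) * math.log2(n * nij / (ni * nj))
        return (part(n11, n1_, n_1) + part(n10, n1_, n_0)
                + part(n01, n0_, n_1) + part(n00, n0_, n_0))

    print(mutual_information(49, 27652, 141, 774106))  # ~0.0001105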
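
Differential cluster labeling admits a similarly small sketch: rank a cluster's terms by how much more frequent they are inside the cluster than outside it (a plain frequency ratio here for brevity; IIR's treatment uses statistics such as mutual information or chi-square for the comparison):

    from collections import Counter

    def differential_labels(cluster_docs, other_docs, k=3):
        inside = Counter(t for d in cluster_docs for t in d)
        outside = Counter(t for d in other_docs for t in d)
        n_in, n_out = sum(inside.values()), sum(outside.values())
        def score(t):
            # Relative frequency inside the cluster vs. outside it,
            # with add-one smoothing so unseen-outside terms don't blow up.
            return (inside[t] / n_in) / ((outside[t] + 1) / (n_out + 1))
        return sorted(inside, key=score, reverse=True)[:k]

    cluster = [d.split() for d in ["stock market falls", "market rally stalls"]]
    rest = [d.split() for d in ["fox hunts rabbit", "rabbit eats grass"]]
    print(differential_labels(cluster, rest))  # ['market', 'stock', 'falls']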
