Reading Notes For Week 5

Djoerd Hiemstra and Arjen de Vries

The paper shows that efficient retrieval algorithms exist that use only the matching terms in their computation. It concludes that the language models can be interpreted as belonging to the family of tf.idf term weighting algorithms.
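To make the "matching terms only" point concrete, here is a minimal sketch. It assumes a unigram language model smoothed by linear interpolation with weight `lam`, which is one common formulation and is not spelled out in these notes. Dividing each query-term probability by its collection-only part leaves a factor of 1 for every non-matching term, so only matching terms add to the log score. The function name `lm_score`, the toy documents, and the use of collection term frequencies for the background model are all illustrative assumptions.

```python
import math
from collections import Counter


def lm_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.5):
    """Rank-equivalent log score of a linearly interpolated unigram model.

    Assumed form: P(t|d) = (1 - lam) * P(t|collection) + lam * P(t|document).
    Dividing by (1 - lam) * P(t|collection), which is the same for every
    document, gives a sum that runs over matching terms only.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # non-matching term contributes log(1) = 0
        p_doc = tf / doc_len                           # tf-like component
        p_coll = collection_tf[term] / collection_len  # rare terms weigh more, like idf
        score += math.log(1.0 + (lam * p_doc) / ((1.0 - lam) * p_coll))
    return score


# Hypothetical toy collection, for illustration only.
lm_docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "boolean retrieval with boolean algebra".split(),
    "d3": "cooking pasta at home".split(),
}
collection_tf = Counter(t for terms in lm_docs.values() for t in terms)
collection_len = sum(collection_tf.values())

lm_query = "language retrieval".split()
lm_scores = {
    name: lm_score(lm_query, terms, collection_tf, collection_len)
    for name, terms in lm_docs.items()
}
lm_ranking = sorted(lm_scores, key=lm_scores.get, reverse=True)
print(lm_ranking)
```

The term frequency in the numerator and the collection probability in the denominator play roughly the roles of tf and idf, which is the sense in which the score resembles tf.idf weighting.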
It introduces three traditional retrieval models: the vector space model (ranks documents by the similarity between the query and each document), the probabilistic model (ranks documents by the probability of relevance given a query), and the Boolean model (uses the operations of Boolean algebra for query formulation).
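As a small illustration of the Boolean model, the sketch below maps AND, OR, and NOT onto Python set operations over a toy inverted index. The helper `build_index` and the three sample documents are hypothetical and exist only for this example.

```python
def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical toy collection, for illustration only.
bool_docs = {
    1: "language models for retrieval",
    2: "boolean retrieval",
    3: "vector space model",
}
bool_index = build_index(bool_docs)
all_ids = set(bool_docs)

# language AND retrieval
both = bool_index.get("language", set()) & bool_index.get("retrieval", set())
# language OR boolean
either = bool_index.get("language", set()) | bool_index.get("boolean", set())
# retrieval AND NOT boolean
without = bool_index.get("retrieval", set()) & (all_ids - bool_index.get("boolean", set()))

print(both, either, without)
```

The result of a Boolean query is an unordered set: a document either satisfies the query or it does not, in contrast to the two ranking models.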
The vector space model and the probabilistic model stand for different approaches to information retrieval: the former is based on the similarity between query and document, while the latter is based on the probability of relevance, using the distribution of terms over relevant and non-relevant documents.
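The similarity-based ranking of the vector space model can be sketched as follows. This is a minimal example that assumes raw term counts as vector weights and cosine similarity as the measure; the notes do not fix a particular weighting scheme, and the function name `cosine_similarity` and the sample documents are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine of the angle between two raw term-count vectors."""
    q = Counter(query_terms)
    d = Counter(doc_terms)
    # Only terms shared by query and document contribute to the dot product.
    dot = sum(q[t] * d[t] for t in q if t in d)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)


# Hypothetical toy collection, for illustration only.
vsm_docs = {
    "d1": "information retrieval models".split(),
    "d2": "retrieval of stored information and information storage".split(),
    "d3": "cooking pasta at home".split(),
}
vsm_query = "information retrieval".split()
vsm_scores = {name: cosine_similarity(vsm_query, terms) for name, terms in vsm_docs.items()}
vsm_ranking = sorted(vsm_scores, key=vsm_scores.get, reverse=True)
print(vsm_ranking)
```

A probabilistic ranking would additionally need estimates of how terms are distributed over relevant and non-relevant documents, which this sketch does not attempt.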
The paper differs considerably from other publications that also compare retrieval models within one framework, because its aim is not to show that the language modelling approach to information retrieval is so flexible that it can be used to model or implement many other approaches to information retrieval.
As a side effect of introducing language models for retrieval, the paper also offers new ways of thinking about two popular information retrieval tools: the use of stop words and the use of a stemmer.

2140 - Information Storage and Retrieval