Saturday, October 31, 2015

Reading Notes For Week 9

MIR Chapter 10

  1. The design principles for information access interfaces are: offer informative feedback, reduce working-memory load, and provide alternative interfaces for novice and expert users.
  2. Precision and recall measures have been widely used for comparing the ranking results of non-interactive systems, but are less appropriate for assessing interactive systems. The standard evaluations emphasize high recall levels.
  3. The user interface should also support methods for monitoring the status of the current strategy in relation to the user's current task and high-level goals.
  4. Studies show that users tend to start out with very short queries, inspect the results, and then modify those queries in an incremental feedback cycle.
The Design of Search User Interfaces
  1. The most understandable and transparent way to order search results is according to how recently they appeared.
  2. Another important issue in the tradeoff between system cleverness and user control lies with query transformations.
  3. Keyboard shortcuts can save time and effort when the user is typing, as the shortcuts remove the need to move hands away from the keyboard to the mouse. But there is a barrier to using shortcuts, as they require memorization.
Information Visualization For Text Analysis
  1. One of the most common strategies used in text mining is to identify important entities within the text and attempt to show connections among those entities.
  2. Standard data graphics can be an effective tool for understanding frequencies of usage of terms within documents.
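
The term-frequency data that such standard graphics visualize can be sketched in a few lines; this is a toy illustration with made-up text, not an example from the book:

```python
# A toy sketch of the data behind standard data graphics: raw term
# frequencies computed with Python's Counter (illustrative text only).
from collections import Counter

text = "the cat sat on the mat and the cat slept"
freqs = Counter(text.split())

# The most frequent terms would drive the heights of a bar chart.
print(freqs.most_common(2))   # [('the', 3), ('cat', 2)]
```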

Friday, October 16, 2015

Reading Notes For Week 7

IIR Chapter 8

  1. The chapter begins with a discussion of measuring the effectiveness of IR systems and the test collections most often used for this purpose. It then presents the straightforward notion of relevant and nonrelevant documents, and the formal evaluation methodology that has been developed for evaluating unranked retrieval results.
  2. The relevance of retrieval results is the most important factor: blindingly fast, useless answers do not make a user happy. However, user perceptions do not always coincide with system designers’ notions of quality.
  3. The standard way to measure human satisfaction is through various kinds of user studies. These might include quantitative measures, both objective (such as time to complete a task) and subjective (such as a satisfaction score for the search engine), as well as qualitative measures, such as user comments on the search interface.
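
The precision and recall measures for unranked retrieval results mentioned above can be sketched as simple set operations; this is a toy illustration of the standard definitions, not code from the chapter:

```python
# Hedged sketch: precision and recall for an unranked result set,
# following the standard definitions (toy example, not from IIR).

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of the 4 retrieved docs are relevant; 6 relevant docs exist.
p, r = precision_recall({1, 2, 3, 4}, {1, 2, 3, 5, 6, 7})
# p = 3/4 = 0.75, r = 3/6 = 0.5
```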

Improving the Effectiveness of Information Retrieval with Local Context Analysis

  1. The paper describes experiments on collections of different sizes and languages, comparing a no-expansion baseline with conventional local feedback expansion. The results show that the expansion technique is helpful across all of these collections.
  2. The comparison between local context analysis, a form of pseudo-relevance feedback, and real relevance feedback is also interesting: the two strategies suit different situations.

A study of methods for negative relevance feedback & Relevance feedback revisited

  1. The paper conducts a study of methods for negative relevance feedback, comparing representative negative feedback methods based on both vector-space models and language models.
  2. The authors also discuss how to evaluate negative feedback, which requires a test set with sufficiently difficult topics. Judging from the results in the paper, I think language-model-based negative feedback methods are generally more effective than those based on vector-space models.
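
The vector-space side of this comparison can be sketched with a Rocchio-style update that includes a negative component; the alpha/beta/gamma weights and the zero-clipping below are common textbook choices, not the exact settings used in these papers:

```python
# Hedged sketch of Rocchio-style feedback in a vector-space model,
# with a negative (non-relevant) component. Weights are illustrative.
from collections import defaultdict

def rocchio(query, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Vectors are dicts mapping term -> weight; returns the modified query."""
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for doc in relevant_docs:                      # positive feedback
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant_docs)
    for doc in nonrelevant_docs:                   # negative feedback
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant_docs)
    # Terms pushed below zero are usually dropped rather than kept negative.
    return {t: w for t, w in new_q.items() if w > 0}

# Pure negative feedback: only a non-relevant document is available.
updated = rocchio({"jaguar": 1.0}, [], [{"car": 1.0, "jaguar": 0.5}])
# "car" is suppressed entirely; "jaguar" keeps a slightly reduced weight.
```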

Muddiest Point For Lecture 6

How can we balance efficiency against ranking quality in the real world?

Friday, October 2, 2015

Reading Notes For Week 5

Djoerd Hiemstra and Arjen de Vries
  1. The paper shows the existence of efficient retrieval algorithms that use only the matching terms in their computation, and that language models can be interpreted as belonging to the family of tf.idf term-weighting algorithms.
  2. It introduces three traditional retrieval models: the vector space model (rank documents by their similarity to the query), the probabilistic model (rank documents by the probability of relevance given the query), and the Boolean model (use the operations of Boolean algebra for query formulation).
  3. The vector space model and the probabilistic model stand for different approaches to information retrieval. The former is based on the similarity between query and document, the latter is based on the probability of relevance, using the distribution of terms over relevant and non-relevant documents.
  4. The paper differs considerably from other publications that also compare retrieval models within one framework, because its goal is not to show that the language modelling approach to information retrieval is so flexible that it can be used to model or implement many other approaches to information retrieval.
  5. As a side effect of the introduction of language models for retrieval, this paper introduced new ways of thinking about two popular information retrieval tools: the use of stop words and the use of a stemmer.
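
The vector space model with tf.idf weighting summarized above can be sketched as follows; this is a toy example with made-up documents, not the paper's implementation:

```python
# Hedged sketch of the vector space model: tf.idf term weighting plus
# cosine similarity between the query and each document (toy data only).
import math
from collections import Counter

docs = ["the cat sat", "the dog barked", "the cat and the dog"]
N = len(docs)
tokenized = [d.split() for d in docs]

def idf(term):
    """Inverse document frequency: rarer terms get higher weight."""
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(N / df) if df else 0.0

def tfidf_vector(tokens):
    counts = Counter(tokens)
    return {t: tf * idf(t) for t, tf in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

query_vec = tfidf_vector("cat".split())
scores = [cosine(query_vec, tfidf_vector(d)) for d in tokenized]
# Documents containing "cat" get positive scores; "the dog barked" scores 0.
```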

Muddiest Point For Lecture 4

What's the effect of idf on ranking for one term queries?