Reading Notes For Week 11

IIR Chapters 19 & 21

Despite these words being consequently invisible to the human user, a search engine indexer would parse the invisible words out of the HTML representation of the web page and index these words as being present in the page.
Web search engines frown on this business of attempting to decipher and adapt to their proprietary ranking techniques and indeed announce policies on forms of SEO behavior they do not tolerate.
Current search engines follow precisely this model: they provide pure search results (generally known as algorithmic search results) as the primary response to a user’s search, together with sponsored search results displayed separately and distinctively to the right of the algorithmic results.

Authoritative Sources in a Hyperlinked Environment

The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics.
The motivation of the algorithm is highly intuitive and is, in itself, an interesting and insightful contribution.
The formulation of this paper has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.

The Anatomy of a Large-Scale Hypertextual Web Search Engine

The final design goal was to build an architecture that can support novel research activities on large-scale web data.
PageRank can be thought of as a model of user behavior. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.
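
To make the random surfer picture concrete, here is a minimal sketch of PageRank by repeated iteration. It is my own illustration, not code from the paper, and it rests on a few assumptions: the toy graph `toy_web` and the function name `pagerank` are hypothetical; d is set to 0.85 and read as the probability that the surfer keeps following links, so the "bored" jump happens with probability 1 - d (this matches the 1 - d term in the paper's formula); the jump term is divided by the number of pages so the scores sum to 1 and can be read as visit probabilities; and pages with no outgoing links are assumed to send the surfer to a random page.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively estimate PageRank scores.

    links maps each page to the list of pages it links to.
    d is assumed to be the probability of following a link;
    with probability 1 - d the surfer jumps to a random page.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its out-links.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no out-links sends the
                # surfer to a uniformly random page.
                share = d * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: D links out but nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, C collects links from three pages and ends up with the highest score, while D, which no page links to, keeps only the random-jump share.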
While a complete user evaluation is beyond the scope of this paper, our own experience with Google has shown it to produce better results than the major commercial search engines for most searches.

2140 - Information Storage and Retrieval