LinkCommand User Manual
Consider the search term "web ring". Are links from nature sites relevant? What about "closed cell"? Are
links from websites about our judicial system relevant? Perhaps the user is referring to special research
about living cells or a terrorist cell?
Many of the words we use have several meanings. It is context that determines the meaning or `sense' of
specific words. Perhaps this is the reason `personalized search' is such a hot topic these days. For it is
`personalized search' that relates keywords with website classifications, and website classifications allow the
search engines to qualify related links.
Google is already knee deep in TSPR (Topic Sensitive PageRank). Google has launched two beta projects:
Each of these beta programs was introduced in the first half of 2004 and gives us a glimpse of the power of
TSPR. In order to understand how TSPR works, we need a little background on PR (PageRank).
The equation for PageRank is:
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
Where:
't1 ... tn' are the pages linking to `A'
'C(t)' is the number of outbound links on page 't'
'd' is a damping factor, typically set to 0.85.
In order to run the equation, you must know all the pages linking to `A' and the
PageRank of each of these linking pages. The only way to perform this calculation is
to iterate: run the equation many times, typically 20-40 times. In order to determine
the true PageRank of any page in Google's index, you must be Google: have an index
of how all website pages are connected to each other.
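The iteration described above can be sketched in a few lines of Python. This is
only an illustration of the published formula, not Google's implementation. The
function name `pagerank`, the three-page link map, and the starting value of 1.0
for every page are assumptions made for the example.

```python
def pagerank(links, d=0.85, iterations=40):
    """Iterate PR(A) = (1-d) + d * sum(PR(t)/C(t)) over a small link map.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Assumed starting point: every page begins with a PageRank of 1.0.
    pr = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(t)/C(t) over every page t that links to this page.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this toy web, page C ends up with a higher score than page B because C
receives a link from both A and B, while B only receives half of A's vote.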
In layman's terms, PageRank is a measure of the interconnection or popularity of any
one website page as measured by links from other website pages. It is commonly
thought of as `votes'. The more links or `votes' from other website pages, the higher
the PageRank score. But there is one more element to the equation. The value of a
`vote' from a website page is divided by the number of outbound links (total `votes')
on that page. So links or `votes' from pages that have fewer outbound links pass more
of their voting power to each page they link to.
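As a small worked example of how a `vote' is divided, the sketch below computes
the d * PR(t) / C(t) term of the equation for a single link. The function name
`vote_value` and the PageRank value of 4.0 are made up for illustration.

```python
def vote_value(page_rank, outbound_links, d=0.85):
    """Return the share of PageRank one link passes on: d * PR(t) / C(t)."""
    if outbound_links < 1:
        raise ValueError("a voting page needs at least one outbound link")
    return d * page_rank / outbound_links


# Two hypothetical pages with the same PageRank but different link counts.
few_links = vote_value(page_rank=4.0, outbound_links=2)
many_links = vote_value(page_rank=4.0, outbound_links=20)
print(few_links, many_links)
```

With the same PageRank, the page with 2 outbound links passes about 1.7 through
each link, while the page with 20 outbound links passes only about 0.17.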
The important thing to note is the use of the damping factor of 0.85. We don't know the exact value that
Google uses, but there must be a value here, perhaps within +/- 0.05 of 0.85. If we assume that 0.85 is
the actual value, then we can say that PageRank only accounts for 85% of the possible value. So what
about the remaining 15%? It does not matter much as long as all website pages are calculated in the same
manner. What is important is that there is a 15% component that can be added to each website page, and
this component can be the TS (Topic Sensitive) portion of TSPR. In other words, 85% of the
calculated value is PR and 15% is TS.
So how is TS figured out? You can read the original author's paper here:
The theory of Topic Sensitivity starts with the assumption that there are authority websites for a specific
subject or keyword phrase. These authority websites link out to other websites and other websites link to
more websites. Every time there is an outbound link, a component of that link has its origins in
the authority websites. The topic sensitivity that is passed through the link is defined by where the upstream
websites got their links.
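One way to picture this flow is to take the PageRank equation and hand the 15%
component only to the authority pages for a topic, so that every other page can
earn topic score only through links that trace back to them. The sketch below
does exactly that. It is an assumption-laden illustration, not Google's method:
the function name `topic_sensitive_pagerank`, the `topic_weight` table, and the
four-page link map are all hypothetical.

```python
def topic_sensitive_pagerank(links, topic_weight, d=0.85, iterations=40):
    """Iterate TS(A) = (1-d) * weight(A) + d * sum(TS(t)/C(t)).

    links maps each page to the pages it links to (hypothetical data).
    topic_weight gives each authority page its share of the 15% component;
    pages missing from the table get a weight of 0 (an assumption).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    score = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_score = {}
        for page in pages:
            # Topic score arriving through inbound links.
            incoming = sum(
                score[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            # Only authority pages receive the (1-d) component directly.
            base = (1 - d) * topic_weight.get(page, 0.0)
            new_score[page] = base + d * incoming
        score = new_score
    return score


# Two separate clusters: A and B link to each other, C and D link to each other.
links = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"]}
# Assume A is the only authority page for the topic.
topic_weight = {"A": 4.0}
scores = topic_sensitive_pagerank(links, topic_weight)
for page in sorted(scores):
    print(page, round(scores[page], 3))
```

Page B has no topic weight of its own, yet it keeps a high score because its
only inbound link comes from the authority page A. Pages C and D have no link
path back to an authority page, so their topic scores decay toward zero.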
It may be easier to think of Topic Sensitivity as a `bloodline'. Some dogs are purebreds, but most dogs are
mutts, a combination of bloodlines. Mutts may have blood from a few or many different dog types just as
websites have links from different website categories. If a dog has a lot of bloodline from golden retrievers,
Revision A 0505