Latent Semantic Indexing

"For all their problems, online search engines have come a long way. Sites like Google are pioneering the use of sophisticated techniques to help distinguish content from drivel, and the arms race between search engines and the marketers who want to manipulate them has spurred innovation."
Source: http://www.knowledgesearch.org/lsi/lsa_intro.htm

Earlier, more simplistic search engine indexing algorithms functioned primarily by counting things: keywords, pages, links. Because that led inevitably to keyword spamming, search engines had to find more sophisticated means of detecting relevance. Latent semantic indexing is one solution to this challenge.

Latent semantic indexing (LSI) is a remarkable technique that examines a very large collection of documents as a whole to learn which words frequently occur together in individual documents. LSI appears to be "smart" enough to recognize that "Saddam Hussein" is closely related to "Iraq", "Gulf War" and "terrorism". But that apparent "intelligence" is derived only from a mathematical analysis of how often "Iraq", "Gulf War" and "terrorism" appear in the same documents as "Saddam Hussein" on the internet.
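To make the counting idea concrete, here is a minimal sketch in Python. The four-document corpus and the function name cooccurrence_counts are invented for illustration, and simple pair counting is only the raw signal described above, not the full LSI method.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration; each string stands in for one document.
documents = [
    "saddam hussein iraq gulf war",
    "iraq gulf war oil",
    "saddam hussein iraq terrorism",
    "football season opens",
]

def cooccurrence_counts(docs):
    """Count, for every unordered word pair, how many documents contain both words."""
    counts = Counter()
    for doc in docs:
        vocabulary = sorted(set(doc.split()))  # count each word once per document
        for pair in combinations(vocabulary, 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(documents)
print(counts[("iraq", "saddam")])    # 2: the pair shares two documents
print(counts[("football", "iraq")])  # 0: the pair never shares a document
```

Pairs are stored in alphabetical order, so look them up that way. Nothing here knows what the words mean; "iraq" and "saddam" look related only because they keep turning up in the same documents.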

When you consider that LSI has no understanding whatsoever of the actual meaning of the words it is indexing and relies only on mathematical algorithms, you can imagine how complex the resulting relational matrix must be. The set of words whose relatedness it has to measure is the entire English language, plus the entire French, Spanish and German languages, and so on. It is quite mind-boggling.

Latent Semantic Indexing Summary

If you want to understand in graphic detail how LSI actually manages its herculean task, take a look at the surprisingly readable "Patterns in Unstructured Data", which explains how a term-document matrix is generated.

Following is a brief summary of latent semantic indexing. If you're not interested in these details, feel free to skip directly to What Latent Semantic Indexing Means To You.

Something approximating meaning results from "singular value decomposition" (SVD). Keeping only the largest singular values eliminates "noise" from the original term-document matrix, revealing word similarities that were "latent" in the document collection. This reduced mapping is what gives LSI its seemingly intelligent ability to correlate semantically related terms.
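The effect is easier to see on a tiny example. The sketch below assumes NumPy is available and uses an invented four-term, three-document matrix; the names truncate_svd and cosine are hypothetical helpers, and the choice of k = 2 is arbitrary. It is meant as an illustration of the idea, not as how any search engine actually implements it.

```python
import numpy as np

# Toy term-document matrix, invented for illustration.
# Rows are terms, columns are documents, entries are raw counts.
terms = ["iraq", "war", "gulf", "football"]
A = np.array([
    [1, 0, 0],  # iraq: document 0 only
    [1, 1, 0],  # war: documents 0 and 1
    [0, 1, 0],  # gulf: document 1 only
    [0, 0, 2],  # football: document 2 only
], dtype=float)

def truncate_svd(matrix, k):
    """Return the k largest singular triplets of the matrix."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

U_k, s_k, Vt_k = truncate_svd(A, k=2)
A_k = U_k @ np.diag(s_k) @ Vt_k  # rank-2 approximation of A
term_vectors = U_k * s_k         # each row places one term in the reduced space

# "iraq" and "gulf" never share a document, so their raw rows are orthogonal.
print(cosine(A[0], A[2]))                        # 0.0
# In the reduced space they line up, because both co-occur with "war".
print(cosine(term_vectors[0], term_vectors[2]))  # close to 1.0
# "football" stays unrelated to "iraq".
print(cosine(term_vectors[0], term_vectors[3]))  # close to 0.0
```

This is the "latent" part: the similarity between "iraq" and "gulf" was never stated in any single document, but it falls out once the smallest singular value is discarded.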

While singular value decomposition produces acceptable results, LSI goes a step further and applies two common-sense insights:

You can study the specifics in this discussion of the term-document matrix, but the bottom line is that applying weighting factors (local weight, global weight and a normalization factor) leads to much better results because it takes into account the relative importance of potential search terms.
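As a rough illustration of the three factors, the sketch below applies one common combination to an invented count matrix. The specific formulas are assumptions, not necessarily the ones the referenced discussion uses: local weight log(1 + count), global weight log(N / document frequency), and normalization of each document column to unit length. The function name weight_matrix is hypothetical.

```python
import math

# Toy counts, invented for illustration. Rows are terms, columns are documents.
terms = ["the", "iraq", "war", "football"]
counts = [
    [3, 2, 4],  # the: appears in every document
    [2, 0, 0],  # iraq
    [1, 1, 0],  # war
    [0, 0, 1],  # football
]

def weight_matrix(counts):
    """Apply local weight, global weight and column normalization (assumed formulas)."""
    n_docs = len(counts[0])
    weighted = []
    for row in counts:
        doc_freq = sum(1 for c in row if c > 0)
        # Global weight: terms found in fewer documents count for more.
        global_w = math.log(n_docs / doc_freq) if doc_freq else 0.0
        # Local weight: repeated use within a document counts, with diminishing returns.
        weighted.append([math.log(1 + c) * global_w for c in row])
    # Normalization: scale each document column to unit length so that
    # long documents do not dominate short ones.
    for j in range(n_docs):
        norm = math.sqrt(sum(row[j] ** 2 for row in weighted))
        if norm > 0:
            for row in weighted:
                row[j] /= norm
    return weighted

weighted = weight_matrix(counts)
for term, row in zip(terms, weighted):
    print(term, [round(w, 3) for w in row])
```

With these assumed formulas, "the" ends up with zero weight everywhere because it appears in every document, while "iraq" outweighs "war" in the first document because it is both repeated and rarer across the collection.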

What Latent Semantic Indexing Means To You

ROI-SEO specializes in Architectural SEO™. We select semantically related keywords from your target market's vocabulary that apply to your products and services, along with the vocabulary an authority in your industry would use. From this set we choose the keywords for SEO by researching search frequency and keyword competition. From the entire set of semantically related keywords we build an internal linking structure that exploits the latent semantic indexing of all of these words.

Learn more about how you will achieve maximum ROI when we build Architectural SEO™ into your website.
