
January 22, 2008

SEO and LSI: How to Use Latent Semantic Indexing

People get worried when they hear terms such as SEO and LSI, and when they try to find out how to use latent semantic indexing, the conflicting messages worry them even more. Some say that LSI doesn’t exist and therefore can’t be used, while others state that it is critical to your website’s success with the search engines.

We all know what computer people are like, the way they try to make acronyms of everything. LSI is one of these, although not quite what you would call an acronym. LSI does exist, but not in the form that Google would have us believe, and not in any form that you can use to make your website ‘LSI compliant’. Anybody claiming that they can do that is simply playing Google’s trick and using a big name for what is a very simple thing to do.

LSI or LSA?

Without going into any detail as to the mathematical background of latent semantic analysis (LSA), it can be used, and is used, to determine the relevance of a passage of text to any given topic based upon a keyword or multiple word search term. LSA was, incidentally, patented by a group of people in 1988, although the basics were known prior to this. LSI is nothing more than the use of LSA in the indexing and retrieval of information. It is therefore a concept, and you cannot make a ‘concept compliant’ web page.

However, it is all semantics (ha-ha!) and the point of this article is not to knock holes in the way the terms LSA and LSI are being wrongly used by SEO experts, but to explain how you can make your web page more likely to be considered relevant to the main keyword for which you want your page indexed. This is very simple and does not warrant all the books now being offered on the subject.
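For anybody who does want a glimpse of the mathematics, here is a minimal sketch of textbook LSA in Python, assuming NumPy is installed. It is not Google’s method or anything taken from the patent. The function name `lsa_scores`, the four tiny ‘documents’ and the choice of two latent dimensions are all made up for illustration. It builds a term-document matrix, reduces it with a truncated singular value decomposition, and scores each document against a search term by cosine similarity.

```python
import numpy as np

def lsa_scores(docs, query, k=2):
    """Score each document against a query in a k-dimensional latent space."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix: rows are terms, columns are documents.
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1

    # Truncated SVD keeps only the k strongest co-occurrence patterns.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]

    # Project the documents and the query into the same latent space.
    doc_vecs = (Uk.T @ A).T
    q = np.zeros(len(vocab))
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1
    q_vec = Uk.T @ q

    # Cosine similarity between the query and each document.
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    sims = (doc_vecs @ q_vec) / np.where(norms == 0, 1, norms)
    return [float(x) for x in sims]

# Hypothetical corpus: two canal-lock pages and two door-lock pages.
docs = [
    "lock canal boat water",
    "lock canal water gate",
    "lock door key security",
    "lock door security bolt",
]
scores = lsa_scores(docs, "boat")
```

The second document never mentions ‘boat’, yet it scores as highly as the first and well above the two door-lock documents, because it shares ‘canal’ and ‘water’ with a document that does. That is the ‘latent’ part of latent semantic analysis.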

The Development of Adsense

Latent semantic analysis is used by Google primarily to detect spam: the excessive repetition of keywords in order to fool the search engines into providing a high listing for that keyword. There was a time when smart people could indeed achieve this simply by writing a meaningless template with rotating synonyms, into which software could insert any keyword many times over. Thousands of pages could be generated in minutes, each targeting a different keyword. Some were making thousands of dollars daily from Adsense using this method.

In fact the principles of LSA were used to determine the content of web pages by a small company called Oingo, which later changed its name to Applied Semantics. It developed a search system to determine the relevance of page content for specific advert placement, and called this Adsense. The company was in turn bought by Google in April 2003, and Adsense was used to replace Google’s own system, which was still under development. Adsense, then, was not developed by Google, but purchased by them.

BigDaddy and Character String Analysis

The principles were also applied to determine the relevance of on-page text to specific search terms and used in the web indexing algorithm called BigDaddy, used by the Googlebot to index your web pages. BigDaddy appears to view links and relevance as the two major factors among many others that determine your listing position in the index for any specific search term as used by a Google customer.

Back to spam. Your web page content is now analyzed by the statistical mathematical analysis tool known as LSA/LSI and indexed according to the meaning of the words in your text. It goes further than just checking for the excessive use of specific words, and no longer searches only for occurrences of your stated keywords. LSA informs Google of the true meaning of your text, and you cannot hide this by repetitions of a single key phrase. Let’s call it LSI because that’s what Google calls it. LSI analyzes the character strings in your text and compares them to a large database of words, the meanings of which have been defined.

Same Words – Different Meanings

LSI is used to determine the true meaning of homonyms, heteronyms and polysemes. Homonyms are spelled and pronounced the same, but have different meanings, such as lock, which can be a canal lock, a security lock or a lock of hair. A heteronym is a word spelled the same as another, but with a different pronunciation and meaning, such as lead: a metal or to be in front. Polysemes are words spelled the same, and from the same root, but used differently, such as a mole – a burrowing animal, or a mole – a spy deliberately placed in an organization. Both moles have the same root, but the words are used in different contexts. LSI or LSA can be used to determine the difference by means of analysis of the other words in the text.

If your page has been written around the keyword lock (my usual example of a homonym), without any decent content the reader would find it difficult to tell what type of locks you were writing about. The LSA algorithm would be looking for words such as canals, keys or hair to tell the difference and know where to list it.
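To make the idea concrete, here is a rough Python sketch of telling the three kinds of lock apart from the words around them. It is a plain word-overlap count, far cruder than LSA and certainly not what Google runs. The `SENSES` table and the `guess_sense` function are invented for this illustration.

```python
import re

# Hypothetical context vocabulary for three meanings of "lock".
SENSES = {
    "canal lock": {"canal", "canals", "boat", "water", "barge", "narrow"},
    "security lock": {"key", "keys", "door", "bolt", "burglar", "padlock"},
    "lock of hair": {"hair", "curl", "blonde", "braid", "ringlet"},
}

def guess_sense(text, senses=SENSES):
    """Return the sense whose cue words overlap the text most, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    overlap = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(overlap, key=overlap.get)
    # With no cue words at all, the meaning of the page cannot be determined.
    return best if overlap[best] > 0 else None

canal_page = "The lock raises each boat as water fills the canal chamber."
thin_page = "Buy a lock today."
```

Here `guess_sense(canal_page)` picks the canal meaning, while `guess_sense(thin_page)` returns `None`: a page with nothing but the keyword gives the algorithm, like the reader, nothing to go on.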

All you need do is look up thesaurus.com, and then use plenty of alternative vocabulary in your content that explains its meaning precisely. You can also use the tilde (~) in a Google search for your keyword. While Google does not highlight exactly correct synonyms, it will give you an indication of what vocabulary it regards as being equivalent. If you do that with ‘locks’ all you get are ‘lock’ and ‘locks’, and all are security locks. Interestingly, when you do it with ‘canal locks’, Google also highlights ‘narrow’. This indicates that if your topic is canal locks, using the word ‘narrow’ will be to your advantage.

Semantics and Vocabulary

If you keep in mind that the main purpose of the LSI component of BigDaddy is to detect keyword spamming, and to determine for what search term the page should be indexed, then it should be obvious to you that the use of contextually related vocabulary will reveal the semantics of your page. Semantics is nothing more than the meaning of the words you are using, and where your keywords could have more than one meaning, you have to make the meaning clear through the use of related text. Nothing more than that.

If you write naturally, as you would if you were talking to somebody, and trying to explain your subject, then you will not have any problems with the LSI algorithm. There is no need to use an SEO expert, since they are not necessarily qualified in their knowledge and use of language. A thesaurus will do the job fine.

Keyword Density is not What it Was

Do not overuse your keywords. The old adage that you should have between 1% and 3% keyword density on your page no longer applies. Use your keyword often enough to stress its importance, which means in the page title, in the heading in H1 tags, in the first 100 characters and in the last paragraph. Google will check all four of these, and will regard any words it finds there, other than fillers and stop words, as being important. Use it again every 300 words or so and that is enough.

SEO and LSI are not really related since the term LSI is used in the wrong context here. However, in the way that it is used, if you use good vocabulary, contextually relevant to your keywords, then you will also be using good SEO. Strictly speaking, the question of how to use latent semantic indexing is beside the point, since you can’t use it on your web page. Google can use it in their algorithm, and you should make your vocabulary as understandable as possible by means of simple words that express the meaning of your text.
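If you would rather check a page than count by hand, a short Python sketch along these lines will do it. The function name `keyword_report`, its parameters and the sample page are made up for illustration, and it assumes paragraphs in the body text are separated by blank lines. It reports the four placements described above and how many words you have per use of the keyword.

```python
import re

def keyword_report(keyword, title, h1, body):
    """Check the four keyword placements and the spacing between uses."""
    kw = keyword.lower()
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in body.split("\n\n") if p.strip()]
    words = re.findall(r"[a-z0-9']+", body.lower())
    kw_len = len(kw.split())

    # Count whole-phrase occurrences of the keyword in the body text.
    hits = sum(
        1 for i in range(len(words) - kw_len + 1)
        if " ".join(words[i:i + kw_len]) == kw
    )
    return {
        "in_title": kw in title.lower(),
        "in_h1": kw in h1.lower(),
        "in_first_100_chars": kw in body[:100].lower(),
        "in_last_paragraph": bool(paragraphs) and kw in paragraphs[-1].lower(),
        "occurrences": hits,
        "words_per_occurrence": len(words) / hits if hits else None,
    }

# Hypothetical page content for a quick check.
body = (
    "Canal locks raise and lower boats.\n\n"
    "Most are narrow.\n\n"
    "Visit our canal locks guide."
)
report = keyword_report("canal locks", "Canal Locks Explained", "A Guide", body)
```

In this sample the report flags the H1 heading as the one placement that is missing the keyword. On a real page, a `words_per_occurrence` figure far below 300 suggests you are repeating the keyword more often than you need to.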

Author:  Pete has his own theories of the way that SEO and LSA can be used to improve your web page listing positions, and more information is available on his website SEOcious and his blog SEOscopy where you will find how to use these concepts to their maximum effect.
