One of the standard elements of web page optimization is
keyword density: until very recently, the ratio of keywords
to the rest of the body text was generally deemed to be one of
the most important factors employed by search engines to
determine a web site's ranking.
However, this basically linear approach is gradually changing
now. As mathematical linguistics and automatic content
recognition technology progress, the major search engines
are shifting their focus towards "theme"-based algorithms
that no longer rely on analysis of individual web pages
but, rather, evaluate whole web sites to determine their
topical focus or "theme" and its relevance in relation to
users' search requests.
This is not to say that keyword density is losing
importance; quite the contrary. However, it is turning into a
far more complex matter than a simple computation of word
frequency per web page can handle.
Context analysis now draws on a number of auxiliary linguistic
disciplines and technologies. For example:
* semantic text analysis
* lexical database technology
* distribution analysis of lexical components (such as nouns,
  adjectives, verbs)
* evaluation of distance between semantic elements
* pattern recognition based on AI and data mining technology
* term vector database technology, etc.
All these are now contributing to the increasing
sophistication of the relevance determination process. If you
feel this is beginning to sound too much like rocket science
for comfort, you may not be very far from the truth. It seems
that the future of search engine optimization will be
determined by what the industry is fond of calling the "word
gurus".
A sound knowledge of fundamental linguistic methodology plus
more than a mere smattering of statistical calculus will most
probably be essential to achieving successful search engine
rankings in the foreseeable future. Merely repeating the
well-worn mantra "content is king!", as some of the less
qualified SEO professionals and very many amateurs are
currently doing, may admittedly have a welcome sedative effect
by creating a feeling of fuzzy warmth and comfort. But for all
practical purposes, it is tantamount to whistling in the dark
and fails miserably to do justice to the overall complexity
of the process involved.
It should be noted that we are talking present AND future
here: many of the classical techniques of search engine
optimization are still working more or less successfully, but
there is little doubt that they are rapidly losing their
cutting edge and will probably be as obsolete in a few months'
time as spamdexing or invisible text - both optimization
techniques well worth their while throughout the 90s - have
become today.
So where does keyword density come into this equation? And how
is it determined anyway?
There's the rub: the term "keyword density" is by no means as
objective and clear-cut as many people (some SEO experts
included) will have it! The reason for this is the inherent
structure of hypertext markup language (HTML) code. As text
content elements are embedded in clear text command tags
governing display and layout, it is not easy to determine what
should or should not be factored into any keyword density
calculation.
The matter is complicated further by the fact that the meta
tags inside an HTML page's header may contain keywords and
description content: should these be added to the total word
count or not? Seeing that some search engines will ignore meta
tags altogether (e.g. Lycos, Excite and Fast/Alltheweb),
whereas others are still considering them (at least
partially), it gets even more confusing. What may qualify for
a keyword density of 2% under one frame of reference (e.g.
including meta tags, graphics ALT tags, comment tags, etc.)
may easily be reduced to 1% or less under another.
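
To make the "frame of reference" problem concrete, here is a minimal Python sketch that computes the density of one keyword twice for the same page: once over the text between tags only, and once with meta keywords/description content and image ALT text counted in. The sample page, the class name TextCollector and the helper names are all made up for illustration, and the word-splitting rule is an assumption; no real search engine is claimed to work this way.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text between tags and, separately, meta/ALT text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.visible = []  # text found between tags
        self.extras = []   # meta keywords/description content and image ALT text
        self._skip = 0     # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.extras.append(attrs.get("content") or "")
        elif tag == "img" and attrs.get("alt"):
            self.extras.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.visible.append(data)


def tokenize(text):
    # Assumption: a "word" is any run of letters or digits, lowercased.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_density(words, keyword):
    """Percentage of words equal to the keyword."""
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


def densities(html, keyword):
    """Return (density over body text only, density with meta/ALT text included)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    body = tokenize(" ".join(parser.visible))
    everything = body + tokenize(" ".join(parser.extras))
    return keyword_density(body, keyword), keyword_density(everything, keyword)


# Made-up sample page for illustration only.
PAGE = """<html><head>
<meta name="keywords" content="widgets, blue widgets, cheap widgets">
</head><body>
<p>Our widgets ship worldwide. Order today and save on every purchase.</p>
<img src="w.png" alt="widgets">
</body></html>"""

body_only, with_extras = densities(PAGE, "widgets")
print(f"body text only:         {body_only:.1f}%")
print(f"with meta and ALT text: {with_extras:.1f}%")
```

On this tiny page the two figures differ by a factor of about three, which is the same effect as the 2% versus 1% example above, only exaggerated by the small word count.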
Further questions arise. Will meta tags following the Dublin
Core convention ("DC tags") be counted or not? And what about
HTTP-EQUIV tags? Would you really bet the ranch that TITLE
attributes in tables, forms or DIV elements will be ignored?
And so on.
Another fundamental factor generating massive fuzziness left,
right and center is the issue of semantic delimiters. What's
a "word" and what isn't? Determining a lexical unit (aka a
"word") by punctuation is a common though pretty low tech
method which may lead to some rather unexpected results.
Say you are featuring an article by an author named "John Doe"
who happens to sport a master's degree in arts, commonly
abbreviated as "M.A.". While most algorithms will correctly
count "John" and "Doe" as separate words, the "M.A." string is
quite another story. Some algorithms will actually count this
as two words ("M" and "A") because the period (dot) is
considered a delimiter - whereas others (surprise!) will not.
But how would you know which search engines are handling it in
which way? Answer: you don't, and that's exactly where the
problems start.
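
The "M.A." effect is easy to reproduce. The following Python sketch tokenizes the same sentence under two hypothetical delimiter rules (both function names and both regular expressions are illustrative assumptions, not the rules of any actual search engine) and shows how the word count, and with it the density of "Doe", shifts.

```python
import re

TEXT = "An article by John Doe, M.A."


def split_on_punctuation(text):
    # Rule 1: every non-alphanumeric character, including the period, ends a word.
    return re.findall(r"[A-Za-z0-9]+", text)


def keep_inner_periods(text):
    # Rule 2: a period directly between letters or digits stays inside the word.
    return re.findall(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*", text)


for rule in (split_on_punctuation, keep_inner_periods):
    tokens = rule(TEXT)
    density = 100.0 * tokens.count("Doe") / len(tokens)
    print(f"{rule.__name__}: {len(tokens)} words, density of 'Doe' = {density:.1f}%")
    print("  ", tokens)
```

Under the first rule the sentence has seven words, under the second only six, so the very same text yields two different densities for the same keyword.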
The only feasible approach to mastering this predicament is
trial and error. The typical beginner's inquiry "What's the best
keyword density for AltaVista?", understandable and basically
rational as it may be, is best answered with the fairly
frustrating but ultimately precise: "It all depends - your
mileage may vary." It is only by experimenting with keyword
densities under standardized, comparable conditions yourself
that you will be able to come to significant and viable
conclusions.
To get going, here are some links to pertinent programs that
will help you determine (and, in one case, even generate)
keyword densities.
KeyWord Density Analyzer (KDA)
An all-time classic of client-based keyword density software
is Roberto Grassi's powerful KeyWord Density Analyzer (KDA).
It is immensely configurable and offers a fully featured free
evaluation version for download.
Find it here
(Expect to pay approx. $99 for the registered version.)
Concordance
Concordance is a powerful client-based text analysis tool for
making word lists and concordances from electronic texts. A
trial version can be downloaded here
(Expect to pay approx. $89 for the registered version.)
fantomas keyMixer(TM)
Our own fantomas keyMixer(TM) is the world's first automatic
keyword density generator, enabling you to create web pages
with ultra precise densities to the first decimal digit. Read
more about this server-based Perl/CGI application by clicking
on the above link.
(Expect to pay approx. $99 for the registered version.)