
July 20, 2007

Aspects of SEO that can Prevent Indexing

I am often asked to find reasons why a site is not ranking. That’s a pretty broad question that requires a lot of investigation to get to the bottom of. The best way to start is to determine which page is being optimized for the term in question. Take that page's URL and enter it into the search engine as a query. If no match comes up, the page is not indexed by the search engine and therefore cannot rank for that term.

In situations like this, I start to go through my mental checklist of things that could be preventing the crawling and indexation of a page. Here is my checklist in order of importance:

Robots.txt Protocol Exclusion
Pages and directories excluded via robots.txt will not be crawled, because the file tells the crawlers not to read them. The robots.txt file is the first file a search engine bot requests, and if it excludes a page, the bot will not read or index that page.
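To see whether a robots.txt rule is the culprit, the file can be tested against the URL in question. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_crawl_allowed` helper, and the example.com URLs are all made up for illustration; a real check would read the site's own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, standing in for the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/page.html
"""

def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Example URLs are assumptions, not taken from any real site.
blocked = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "http://www.example.com/private/report.html"
)
allowed = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "http://www.example.com/products.html"
)
print(blocked, allowed)
```

If the page you are trying to rank comes back as not allowed, there is no point looking further down the checklist until the rule is removed.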

“No Index” Meta Tag
If a page has the following Meta tag: <meta name="robots" content="noindex" />, it essentially does the same job as a robots.txt exclusion. The page will not be indexed.
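A stray noindex tag is easy to miss when it is buried in a template, so it helps to check for it mechanically. The sketch below uses Python's built-in `html.parser`; the `RobotsMetaParser` class and `has_noindex` function are names I made up for this example, and it only looks at the generic "robots" Meta tag, not at engine-specific variants or HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots Meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex, follow" /></head></html>'
print(has_noindex(page))
```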

Duplicate Content
Two pages with the same content on unique URLs are often filtered out of search results, and sometimes one is removed from the index, for the simple reason that a search engine will not let its result quality suffer by showing the same information one after the other. This is often why people cannot get a specific page indexed.

Multiple Canonical URL Versions
Multiple URL versions of the same page behave much like duplicate content, because each version looks like a separate page carrying identical content. While search engines are constantly improving their algorithms to avoid this issue, it may still cause indexation problems.
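One way to spot this problem is to reduce every URL found on the site to a single normalized form and see which ones collapse together. This is only a sketch: the rules in `canonical_form` (lowercasing the host, dropping "www.", and trimming common index file names) and the `INDEX_FILES` list are my assumptions about typical variants, not rules any search engine publishes.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; adjust to match the server's configuration.
INDEX_FILES = ("index.html", "index.htm", "index.php", "default.aspx")

def canonical_form(url: str) -> str:
    """Reduce common variants of a URL to one normalized form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

variants = [
    "http://example.com",
    "http://www.example.com/",
    "http://WWW.Example.com/index.html",
    "http://example.com/index.php",
]
unique_forms = {canonical_form(u) for u in variants}
print(unique_forms)
```

When several live URLs reduce to the same form, those are the versions competing with each other for a single spot in the index.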

Dynamic URLs
Dynamic URLs tend to serve the same content under many different URL strings, which can trigger duplicate content filters. This works in the same way as direct duplication of content and multiple canonical URL versions.
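To illustrate, the sketch below strips query parameters that change the URL without changing the content, then sorts what is left, so that equivalent dynamic URLs compare equal. The `IGNORED_PARAMS` set is purely hypothetical; which parameters are safe to ignore depends entirely on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change page content on this example site.
IGNORED_PARAMS = {"sessionid", "sid", "sort", "ref"}

def strip_noise_params(url: str) -> str:
    """Drop ignorable query parameters and sort the remaining ones."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

first = strip_noise_params("http://example.com/product.php?id=7&sessionid=abc123")
second = strip_noise_params("http://example.com/product.php?sessionid=xyz&id=7&sort=price")
print(first == second)
```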

Algorithm Compliance Problems
Simply put, if you do something the search engines don’t like, such as using black hat techniques, they can easily de-index a page or remove your entire domain from the index.

Replicating Title and Meta Tags
I have often noticed that when two pages have similar content, and the title, Meta description, and Meta keywords tags are replicated between them, one of the pages is often not indexed because it has triggered a duplicate content filter.
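Replicated tags are simple to find once the pages are in hand. The sketch below groups pages whose title, description, and keywords are all identical. `HeadTagParser`, `find_replicated_heads`, and the sample pages are made up for illustration, and exact matching is my simplification; near-duplicates would need a fuzzier comparison.

```python
from collections import defaultdict
from html.parser import HTMLParser

class HeadTagParser(HTMLParser):
    """Extracts the <title> text and the description/keywords Meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {"description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in self.meta:
                self.meta[name] = (attr.get("content") or "").strip().lower()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def find_replicated_heads(pages: dict) -> list:
    """Return groups of URLs that share the same title and Meta tags."""
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = HeadTagParser()
        parser.feed(html)
        parser.close()
        key = (parser.title.strip().lower(),
               parser.meta["description"], parser.meta["keywords"])
        groups[key].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

widget_head = ('<head><title>Widgets</title>'
               '<meta name="description" content="Buy widgets online">'
               '<meta name="keywords" content="widgets"></head>')
sample_pages = {
    "/red.html": widget_head,
    "/blue.html": widget_head,
    "/about.html": '<head><title>About us</title></head>',
}
replicated = find_replicated_heads(sample_pages)
print(replicated)
```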

Programming Language Usage
Search engines generally only read HTML. They are making advances into indexing JavaScript and Flash, but they still have a long way to go. As a result, a page composed only of JavaScript, Flash, or images may have a hard time being indexed because the search engines don’t have anything to read.
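A rough way to see what a crawler has to work with is to count the words left on a page once scripts and styles are ignored. This is a sketch only: `VisibleTextParser` and `readable_word_count` are names invented for this example, and a real crawler's view of a page is certainly more involved than this.

```python
from html.parser import HTMLParser

class VisibleTextParser(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def readable_word_count(html: str) -> int:
    """Count the words of plain HTML text available on a page."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())

script_only = ('<html><body><script>document.write("Hello world");</script>'
               '<img src="banner.gif"></body></html>')
plain_html = '<html><body><h1>Blue widgets</h1><p>Hand-made blue widgets.</p></body></html>'
print(readable_word_count(script_only), readable_word_count(plain_html))
```

A page that scores zero or close to it gives the search engine nothing to index, no matter how good it looks in a browser.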

Page File Size
Google recommends a page file size of under 99k. I try to abide by this and have noticed that larger pages can cause indexation problems.
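Checking a set of pages against that limit takes only a few lines. In this sketch I am assuming "99k" means 99 × 1024 bytes of HTML, measured as UTF-8 and not counting images or other assets; `SIZE_LIMIT_BYTES` and `oversized_pages` are names made up for the example.

```python
# Assumption: "99k" is read as 99 * 1024 bytes of HTML source.
SIZE_LIMIT_BYTES = 99 * 1024

def oversized_pages(pages: dict, limit: int = SIZE_LIMIT_BYTES) -> dict:
    """Return {url: size_in_bytes} for pages that are not under the limit."""
    sizes = {url: len(html.encode("utf-8")) for url, html in pages.items()}
    return {url: size for url, size in sizes.items() if size >= limit}

sample_pages = {
    "/small.html": "<html><body>Short page</body></html>",
    "/big.html": "x" * (100 * 1024),  # 100 KiB of filler as a stand-in
}
too_big = oversized_pages(sample_pages)
print(too_big)
```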

Table-based Layout
Nested tables often cause search engine crawlers to skip over sections of a page and not index certain content. If they miss your important stuff, they may not index you at all.

Navigational and Linking Structure
It is vital to get the crawler to the page. This must be done with standard HTML text links, preferably without rel="nofollow" attributes on those links. Further, on pages with more than 100 navigational links, some of those links may not be crawled or followed. Every page should be no more than 3 clicks away from the homepage. This can often be accomplished using sitemaps, CSS navigation bars, breadcrumb links, and cross-content interlinking. Even then, search engines may still have a hard time indexing some pages. In such cases, I recommend feeding the pages to the search engines directly through a sitemap, for example in XML format.
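The 3-click rule can be checked with a breadth-first walk over the site's internal links. The sketch below assumes the link structure has already been collected into a dictionary mapping each page to the pages it links to; `click_depths`, `pages_too_deep`, and the sample site are hypothetical.

```python
from collections import deque

def click_depths(links: dict, home: str) -> dict:
    """Breadth-first search giving the fewest clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def pages_too_deep(links: dict, home: str, max_clicks: int = 3) -> list:
    """List pages deeper than max_clicks, plus pages not reachable at all."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(
        page for page in all_pages
        if page not in depths or depths[page] > max_clicks
    )

# Hypothetical site: a chain of links four levels deep plus an orphan page.
site_links = {
    "/": ["/a"],
    "/a": ["/b"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/orphan": [],
}
problem_pages = pages_too_deep(site_links, "/")
print(problem_pages)
```

Pages that show up in this list are the ones I would pull closer to the homepage with breadcrumbs or cross-links, or submit through a sitemap.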

I often see people simply being too anxious to see indexation. Search engine resources are limited, and the engines devote them in direct proportion to how important a site is. Importance is usually determined by link profiles and fresh content. If a site is deemed less important, the crawler may come around less often and not dig down through as many links to find new pages. It may simply not index all of your pages because your site does not provide enough value to justify it.

I’m sure these are not all of the possible reasons why a page might not get indexed. Can anyone comment on other possible reasons?

Author: Gennady Lager is the Senior SEO Specialist for SendTraffic, a full service online marketing firm specializing in Search Engine Marketing and Search Engine Optimization. In addition to the SEM and SEO offerings, SendTraffic provides Paid Inclusion and Feed Management, Website, and Phone Sales Analytics. Our People, Process, Proprietary Technologies, and Pricing make SendTraffic the leader in providing Online Marketing solutions.