Page Not Found: Search Engines Not Indexing Pages
Why isn't all the content you've put online generating better search engine rankings and website traffic? One explanation could be that the search engines haven't spidered (scanned) all of your site. It's easy to assume that finding some of your content in Google or Yahoo! means that your entire site is represented in their index. Often this is not the case.
Finding Out What Has Been Indexed
It's easy to find out how many pages of your site have been indexed because most major search engines let you restrict a search to a particular domain, typically via a "site:" operator or an "advanced search" page.
Pick a word that appears on every page of your site, such as text in the footer, and search within your domain for that word. For instance, the pages of this site that have been indexed can be found by searching for "marketing" within the web1marketing.com domain. Note that these queries often don't return the entire list of pages; they may show an abbreviated listing due to similarity of pages. The search engines usually display the total number of results above the results themselves.
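On Google, for example, such a query can be typed directly into the search box using the "site:" operator:

```text
site:web1marketing.com marketing
```

The number of results reported is a rough count of how many of your pages containing that word have made it into the index.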
Search Robot Directives
There are several reasons that some pages may not be indexed. The most obvious and easiest to fix is the presence of directives that tell search engines not to index some content. These directives can reside either in a robots.txt file or robots META tags. More information on both can be found at The Web Robots Pages.
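As a quick illustration (the directory name here is hypothetical), a robots.txt file at the root of your domain can block spiders from entire directories, while a robots META tag in a page's head section can block a single page; either will keep the affected content out of the index:

```text
# robots.txt -- tells all spiders to skip the /private/ directory
User-agent: *
Disallow: /private/

<!-- robots META tag, placed inside a page's <head> section -->
<meta name="robots" content="noindex, nofollow">
```

It's worth checking both places: a directive added during development and forgotten at launch is a common cause of missing pages.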
Another major culprit is the use of dynamic content. Pop-up menus and other forms of dynamic links usually cannot be followed by search engine spiders. Likewise, many sites, especially in e-commerce, use ASP, PHP, and other dynamic techniques to generate page links, and the structure of those links can prevent search spiders from reaching your content.
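To make the distinction concrete, here are two hypothetical links to the same product page. Search spiders generally do not execute JavaScript, so only the plain HTML anchor gives them a path to the content:

```text
<!-- A spider usually cannot follow this JavaScript pop-up link -->
<a href="javascript:openWindow('products.asp?id=42')">Products</a>

<!-- A plain HTML link that any spider can follow -->
<a href="products.asp?id=42">Products</a>
```

If your navigation relies entirely on the first style, consider adding plain-link equivalents (a text footer or a site map page works well).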
Orphaned content is content that has no links pointing to it. It's rare for someone to create a page without linking to it from other pages within the same site; more commonly, the links are removed during site changes. Confirming a complete absence of links to a page is an exhaustive process unless you have good web editing software and access to the entire site source.
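If you do have the full site source on disk, even a small script can do the brute-force work of checking every page for inbound links. The sketch below is a simplified, assumption-laden example: it only reads static .html/.htm files, only recognizes plain relative href attributes, and treats index.html as the crawl entry point. It reports files that no other page links to:

```python
import os
import re

def find_orphans(site_root):
    """Return HTML files under site_root that no other page links to."""
    pages, linked = set(), set()
    for dirpath, _, filenames in os.walk(site_root):
        for name in filenames:
            if not name.endswith((".html", ".htm")):
                continue
            full = os.path.join(dirpath, name)
            pages.add(os.path.relpath(full, site_root))
            with open(full, encoding="utf-8") as f:
                html = f.read()
            # Collect plain href targets from anchor tags.
            for href in re.findall(r'href="([^"#?]+)"', html):
                if ":" in href:
                    # Skip javascript:, mailto:, and absolute http:// links,
                    # much as a simple spider would.
                    continue
                target = os.path.normpath(os.path.join(dirpath, href))
                linked.add(os.path.relpath(target, site_root))
    # The home page is the crawl entry point, so it never counts as orphaned.
    return sorted(pages - linked - {"index.html"})
```

A real site would also need to account for dynamically generated pages and links created by scripts, which is exactly why dedicated web editing software handles this job better.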
Site Depth and Breadth
There is one more factor that affects many sites with few inbound links and lots of content. Some search engine spiders won't plumb the entire breadth and depth of a large site that has yet to achieve a high enough ranking. The thresholds for partial spidering are fuzzy, but it's good practice to keep each folder to fewer than 50 pages and to make every page reachable within three clicks of the home page. (Site maps are handy for this.)
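A site map in this context can be as simple as one HTML page, linked from the home page, that links out to every section. A hypothetical sketch:

```text
<!-- sitemap.html: linked from the home page, this keeps every
     page on the site within two clicks of the front door -->
<a href="about.html">About Us</a>
<a href="products/index.html">Products</a>
<a href="products/widgets.html">Widgets</a>
<a href="contact.html">Contact</a>
```

Because every entry is a plain HTML link, the site map also sidesteps the dynamic-link problems described above.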
As we've seen, there are several ways a site can inadvertently keep search engines from including all of its content in their indices. Fortunately, most of these problems can be fixed quickly and easily.