One of the concerns facing most website owners today is how to make search engines find all the pages of their website. Search engines have fairly smart spiders that can crawl through your entire website and extract all its links. However, on large websites consisting of several hundred or a few thousand web pages, search engines might miss some deeper-level pages, especially pages that are linked only from other inner pages and do not appear in your main navigation menu tree.
Hence, it is always a good idea to present your entire list of links to search engines in a simple form, so that a search engine can find all your links in one place. Of course, there is more to a sitemap than merely presenting a list of links, as you will learn as you read on.
A sitemap for a website is analogous to the index of a book. Normally, when you build your website you provide an easy-to-navigate multi-level menu bar at the top, so that visitors can quickly find what they are looking for and jump to that page by clicking the appropriate link in your menu tree.
So you may ask: "If I have created a nice multi-level menu tree for ease of navigation, why do I need a second index in the form of a sitemap?" The answer is that while your menu tree is useful for your human visitors, a sitemap file is more meaningful to search engine crawlers.
Normally, a sitemap is just a single file containing your entire list of links, along with other information useful to the crawler, such as when each page was last modified. Naturally, this file must be written in a machine-readable format, and the standard format is XML. The file is conventionally named sitemap.xml (all lower case). Nearly all major search engine crawlers today support the XML sitemap format, so one file serves all search engines.
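To illustrate why a machine-readable format matters, here is a minimal Python sketch of how a program might read a sitemap, using only the standard library's xml.etree.ElementTree. The sample XML, its www.yoursite.tld URLs and lastmod date, and the function name read_sitemap are illustrative placeholders; the namespace URI and the urlset, url, loc and lastmod element names come from the Sitemaps protocol published at sitemaps.org. Real crawlers are far more elaborate than this.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the Sitemaps protocol (schema version 0.9)
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap; www.yoursite.tld is a placeholder domain
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yoursite.tld/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>http://www.yoursite.tld/articles/deep-page.html</loc>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples; lastmod is None when absent."""
    root = ET.fromstring(xml_text)
    entries = []
    # Every <url> element is one page the site owner wants crawled
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in read_sitemap(SAMPLE):
    print(loc, lastmod)
```

Note how the deep page is discovered directly from the file, without following any links from other pages.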
Note that providing an XML sitemap does not guarantee that search engines will index all your pages. Ultimately, it is the crawler's prerogative to decide which pages to ignore, based on several other factors that are the subject matter of SEO.
Before we delve into this further, let us first take a look at a typical sitemap file. Check the sitemap.xml file of this website to get an idea of a real sitemap file.
Below is an example of a very basic sitemap file with just 3 links. Note that the file must be saved with UTF-8 encoding.
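A basic sitemap like this can also be produced with a short script rather than by hand. The following is a minimal Python sketch, assuming a hypothetical helper named build_sitemap and placeholder URLs on www.yoursite.tld. It emits only the required loc element for each page and lets ElementTree take care of the UTF-8 encoding and of escaping characters such as &.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a minimal sitemap document, UTF-8 encoded, as bytes."""
    # Declare the Sitemaps namespace on the root <urlset> element
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in page_urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as & to &amp; automatically
        ET.SubElement(url, "loc").text = page
    # xml_declaration=True (Python 3.8+) prepends the <?xml ... encoding='UTF-8'?> line
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Three hypothetical placeholder pages
pages = [
    "http://www.yoursite.tld/",
    "http://www.yoursite.tld/about.html",
    "http://www.yoursite.tld/contact.html",
]
print(build_sitemap(pages).decode("utf-8"))
```

Writing the returned bytes to sitemap.xml in binary mode (for example with open("sitemap.xml", "wb")) keeps the file in UTF-8. Optional elements such as lastmod could be added the same way with further ET.SubElement calls.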
The sitemap.xml file should reside in the root (top-level web) directory of your website so that it can cover all of your pages. On most hosting accounts this is the public_html directory on a Linux server and the httpdocs directory on a Windows server.
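The reason for placing the sitemap in the root is the scope rule of the Sitemaps protocol: a sitemap may list only URLs on the same protocol and host, at or below the directory that contains it, so a file at /catalog/sitemap.xml cannot list pages under /images/. The Python sketch below expresses that basic rule. The helper name in_sitemap_scope is hypothetical, and the sketch deliberately ignores exceptions such as the protocol's cross-submission mechanism via robots.txt.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url may be listed in the sitemap at sitemap_url.

    Basic Sitemaps-protocol rule: same scheme and host, and a path at or
    below the directory that holds the sitemap file.
    """
    s = urlsplit(sitemap_url)
    p = urlsplit(page_url)
    # urlsplit lowercases the scheme; host names are case-insensitive too
    if (s.scheme, s.netloc.lower()) != (p.scheme, p.netloc.lower()):
        return False
    # "/catalog/sitemap.xml" -> "/catalog/", "/sitemap.xml" -> "/"
    base_dir = s.path.rsplit("/", 1)[0] + "/"
    return (p.path or "/").startswith(base_dir)


print(in_sitemap_scope("http://www.yoursite.tld/sitemap.xml",
                       "http://www.yoursite.tld/articles/deep-page.html"))
print(in_sitemap_scope("http://www.yoursite.tld/catalog/sitemap.xml",
                       "http://www.yoursite.tld/images/logo.png"))
```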
Tell all search engines the location of your XML sitemap by adding an entry like the following to your robots.txt file:
Sitemap: http://www.yoursite.tld/sitemap.xml
Here is a typical example of a robots.txt file with the sitemap entry. The robots.txt file must also reside in the root directory of your website.
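To check that a robots.txt file exposes the sitemap entry correctly, Python's standard urllib.robotparser module can be used: its RobotFileParser.site_maps() method (available since Python 3.8) returns the URLs found on Sitemap lines. The robots.txt contents below, including the Disallow rule for /cgi-bin/, are a hypothetical example for www.yoursite.tld, and sitemaps_from_robots is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; www.yoursite.tld is a placeholder domain
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.yoursite.tld/sitemap.xml
"""


def sitemaps_from_robots(robots_text):
    """Return the Sitemap URLs that the standard parser finds in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns a list of URLs, or None when there is no Sitemap line
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))
```

The Sitemap directive is independent of the User-agent groups, so it can appear anywhere in the file. To test a live site instead, calling RobotFileParser.set_url() and then read() fetches the file over the network before site_maps() is called.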
Rajeev Kumar is the primary author of How2Lab. He holds a B.Tech. from IIT Kanpur and has several years of experience in IT education and software development. He has taught a wide spectrum of people, including fresh young talent, students of premier engineering colleges and management institutes, and IT professionals.
Rajeev founded Computer Solutions & Web Services Worldwide. He has hands-on experience building a variety of websites and business applications, including SaaS-based ERP and e-commerce systems, and cloud-deployed operations management software for health-care, manufacturing and other industries.