Google - Crawling Vs Indexing
Google crawls and indexes websites, and the two processes are different from each other.
If Google has trouble crawling or indexing your website, its pages will not make it into the search results.
- Crawling is the process by which Google follows links to find new pages, then follows the links on those new pages to find still more pages. This process continues until there are no more new pages or links to crawl. In SEO the word crawling means ‘following the links’.
- Web crawlers are known by different names: crawlers, spiders, robots or bots. Google's web crawler is called Googlebot.
- Crawling has to begin somewhere, so it begins with a ‘seed list’, which is a list of trusted websites. These websites link to other websites. Crawlers also use the list of websites they have crawled in the past and the sitemaps that websites provide.
- Crawling is a process that never really stops. Google has to find newly published pages and the updates made to old pages.
- Google prioritises the crawling of pages that are linked to often (popular), of high quality and updated frequently. The websites that publish new and quality content get a higher priority.
- The number of pages that Google will crawl for a website over a period of time is called the ‘Crawl Budget’. The crawl budget depends on the size, popularity, updates, quality and the website speed.
- If your website serves too many low quality pages to the crawler, your crawl budget will shrink and your website will be crawled less often. This can hurt your rankings, because new and updated pages take longer to be picked up.
- You can opt your website out of crawling, or restrict the crawling of parts of it, by using directives in the ‘robots.txt’ file. The directives tell search engine web crawlers which pages of a website they are allowed to crawl and which they are not.
- Indexing is the process of storing and organising the information found on a page. Google analyses the information on each page and saves the page in the index. Once a page is indexed it is eligible to show up in the Google search results.
- Indexing builds the index by using every significant word found on the web page: in the title, headings, meta tags, subtitles, alt text and elsewhere. It is a process of placing a page in the index. The process identifies every word on the page and adds the web page to the entry for every word or phrase it contains.
- Indexing takes place after the search engine crawlers have crawled the web page. The search engines take a note of the signals they find like contextual clues, links, behavioural data and determine how relevant the page is for each word it contains.
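The crawl-then-index flow described above can be sketched in a few lines of Python. This is only a toy illustration, not how Googlebot is implemented: the in-memory `PAGES` "web", the `ROBOTS_TXT` rules, the `ExampleBot` user agent and the `crawl` and `build_index` helpers are all made up for the example. Only `urllib.robotparser` is a real standard-library module.

```python
import re
from collections import defaultdict, deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "https://example.com/": (
        "Welcome to our SEO blog",
        ["https://example.com/blog", "https://example.com/private/admin"],
    ),
    "https://example.com/blog": (
        "Crawling and indexing explained",
        ["https://example.com/", "https://example.com/blog/crawl-budget"],
    ),
    "https://example.com/blog/crawl-budget": ("What is a crawl budget", []),
    "https://example.com/private/admin": ("Admin login", []),
}

# Hypothetical robots.txt that keeps crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(seeds, robots, user_agent="ExampleBot"):
    """Follow links breadth-first from the seed list, skipping disallowed URLs."""
    queue = deque(seeds)
    seen = set(seeds)
    crawled = []
    while queue:
        url = queue.popleft()
        if url not in PAGES or not robots.can_fetch(user_agent, url):
            continue  # unknown page, or blocked by robots.txt
        crawled.append(url)
        for link in PAGES[url][1]:
            if link not in seen:  # only queue pages we have not met before
                seen.add(link)
                queue.append(link)
    return crawled


def build_index(urls):
    """Map every word to the set of crawled pages that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in re.findall(r"[a-z0-9]+", PAGES[url][0].lower()):
            index[word].add(url)
    return index


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

crawled = crawl(["https://example.com/"], robots)
index = build_index(crawled)

print(crawled)
print(sorted(index["crawl"]))
```

The page under /private/ is discovered through a link but never crawled, so none of its words reach the index. A real search engine also weighs signals such as links and context when deciding relevance, which this sketch leaves out.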
How to check for crawling and indexing issues?
- Google Search: You can use a command to see how Google indexes your website. In the Google search box, enter the ‘site:’ command. Use it as site:yourdomain.com. To check all the pages that are in the same directory, use site:yourdomain.com/blog
- Google Search Console: In this tool the Index Coverage reports list any of the errors that Google has found. The Crawl Stats report can tell you how Google is crawling your website.
- Web crawlers: Use web crawlers to better understand how your website is crawled by the search engine. Screaming Frog and Sitebulb are some of the popular ones. Xenu’s Link Sleuth is one of the older crawlers but it can quickly crawl large websites.
- Server log analysis: One of the best ways to understand how Google is crawling your website is through server logs. A web server can be configured to save log files that record the requests made by every user agent, including Googlebot.
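As a rough sketch of server log analysis, the Python below counts which paths Googlebot requested and which status codes it received. It assumes the log uses the common "combined" access-log format; the `SAMPLE_LOG` lines, the `LINE_RE` pattern and the `googlebot_hits` helper are made up for illustration, and your server's log format may differ. Note that a user agent string can be spoofed, so matching on "Googlebot" alone does not prove the request came from Google.

```python
import re
from collections import Counter

# Hypothetical sample lines in the "combined" access-log format.
SAMPLE_LOG = """\
66.249.66.1 - - [10/Jan/2024:06:25:11 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [10/Jan/2024:06:25:14 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
66.249.66.1 - - [10/Jan/2024:06:26:02 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [10/Jan/2024:06:27:40 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""

# One log line: ip, identity, user, [time], "request", status, size, "referer", "agent".
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def googlebot_hits(log_text):
    """Count requested paths and response codes for lines whose agent mentions Googlebot."""
    paths = Counter()
    statuses = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and "Googlebot" in match.group("agent"):
            paths[match.group("path")] += 1
            statuses[match.group("status")] += 1
    return paths, statuses


paths, statuses = googlebot_hits(SAMPLE_LOG)
print(paths.most_common())
print(statuses.most_common())
```

A high share of 404 responses in a report like this suggests the crawler is spending its crawl budget on pages that no longer exist.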
For more useful tips on SEO, please check out the other blog posts as well.