Three things search engines do: crawling, indexing, and ranking.
As mentioned in Chapter 1, search engines are answer machines. Their job is to discover and organize content found on the internet so they can show results that are relevant to searchers’ queries and needs. In that sense, search engines have evolved into answer engines, and very sophisticated ones.
How do search engines work?
They do these three things: crawling, indexing, and ranking.
- Crawling. Search engines scour the internet for content, scanning the code and content of each URL they find.
- Indexing. Search engines store and organize the content found during crawling. Once a page is in the index, it is eligible to be displayed as a result for relevant queries.
- Ranking. Search engines provide the pieces of content that best answer a searcher’s query, so results are ordered from most relevant to least relevant.
Visualizing the crawling process of search engines.
Crawling. This is the discovery process in which search spiders are deployed to find new and updated content. Content can be a webpage, an image, a video, a PDF, a product sheet, a link, a Word file, etc.
Crawling: Can search engines find your web content?
Search engine crawlers (also known as bots and spiders) start by fetching a few web pages, following specific algorithms, and scan their content to find new URLs. By hopping along these links, a crawler is able to find new content and add it to the index (Google’s is called Caffeine), a massive database of discovered URLs, to be retrieved later when a searcher is seeking information that the content on that URL is a good match for.
Did you know? You can actually block or restrict crawlers. Search engine crawlers can be instructed to crawl the entire site, or they can be blocked or restricted from crawling and indexing specific pages.
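The link-hopping idea can be sketched in a few lines of Python. This is a minimal illustration rather than how any real search engine is built: the TOY_WEB dictionary and the example.com URLs are hypothetical stand-ins for pages that a real crawler would fetch over HTTP and parse.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real crawler would download each page and extract its links instead.
TOY_WEB = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
    "https://example.com/blog/post-1": [],
    "https://example.com/orphan": [],  # no other page links to this one
}


def crawl(seed_urls, web):
    """Discover URLs breadth-first by following links from the seed pages."""
    discovered = []
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        discovered.append(url)
        for link in web.get(url, []):
            if link not in seen:  # skip URLs that are already queued or visited
                seen.add(link)
                queue.append(link)
    return discovered


found = crawl(["https://example.com/"], TOY_WEB)
```

In this sketch the orphan page never shows up in found, because no crawled page links to it. That is the same reason unlinked pages can stay invisible to a crawler.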
As mentioned earlier, making sure the website gets crawled and indexed is a prerequisite to showing up in the SERPs.
If you already have a website, it might be a good idea to start off by seeing how many of your pages are in the index. One way to check your indexed pages is “site:yourdomain.com”, an advanced search operator. Head to Google and type “site:yourdomain.com” into the search bar. This will return the results Google has in its index for the site specified.
The number of results displayed may not be exact, but Google Search Console can provide you with a more accurate index report. You can sign up for a free Google Search Console account if you don’t currently have one.
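If you check several domains regularly, the search URL can be built in code. The sketch below only constructs the URL for the “site:” operator using Python’s standard library. The function name site_query_url is made up for this example, and the result is meant to be opened in a browser.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def site_query_url(domain):
    """Build a Google search URL that uses the site: operator (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


url = site_query_url("yourdomain.com")

# Decode the query string again to confirm what will be searched for.
query = parse_qs(urlsplit(url).query)["q"][0]
```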
Possible reasons why your site is not showing on the SERPs
- The site is new and hasn’t been crawled yet.
- The site is not linked to from any external websites.
- The site’s navigation is difficult to crawl.
- The site has code that is blocking the crawlers.
- The site has been penalized for spammy tactics.
Fixing code to satisfy crawlers.
SEO specialists make sure that a website is crawlable by search engine spiders, but sometimes there are special pages that you don’t want search engines to index. That is when you add code to block the crawlers.
To direct Googlebot and other search engine bots away from certain pages and sections of your site, use robots.txt.
What is the role of robots.txt?
Robots.txt files are located in the root directory of websites (ex. remar.me/robots.txt) and suggest which parts of your site search engines should and shouldn’t crawl, as well as the speed at which they crawl your site, via specific robots.txt directives.
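To see how a crawler might interpret these directives, here is a small sketch using Python’s standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the “ExampleBot” user agent are assumptions made for illustration. Real crawlers differ in which directives they honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep all bots out of /private/
# and ask them to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text instead of fetching a URL

can_crawl_home = parser.can_fetch("ExampleBot", "https://example.com/")
can_crawl_private = parser.can_fetch("ExampleBot", "https://example.com/private/page.html")
delay = parser.crawl_delay("ExampleBot")
```

Here can_crawl_home is allowed, can_crawl_private is refused, and delay holds the requested pause in seconds. Keep in mind that robots.txt is a suggestion to well-behaved bots and not an access control.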
Can crawlers find all your important content?
Yes and no. Yes, if crawlers have the chance to reach all of your web pages. No, if they can’t. Sometimes a search engine will be able to find parts of your site by crawling, but other pages or sections might be obscured for one reason or another. It’s important to make sure that search engines are able to discover all the content you want to be indexed and not just your homepage.
Common navigation mistakes that keep crawlers away.
- Navigation menu items that are not in HTML format.
- Forgetting to link to a primary page on the website.
- Different versions of navigation on mobile vs desktop.
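The first mistake in the list can be illustrated with a simplified link extractor. The sketch below models a crawler that only reads plain HTML anchors and does not execute JavaScript. Some crawlers do render JavaScript, so treat this as a worst-case view. The navigation markup is a made-up example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, the links a simple crawler can follow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical navigation: two HTML links and one JavaScript-only menu item.
NAV_HTML = """
<nav>
  <a href="/products">Products</a>
  <a href="/contact">Contact</a>
  <span onclick="window.location='/pricing'">Pricing</span>
</nav>
"""

collector = LinkCollector()
collector.feed(NAV_HTML)
nav_links = collector.links
```

Only /products and /contact end up in nav_links. The /pricing page is reachable for a person clicking the menu but leaves no link for this kind of crawler to follow.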
Clean and clear information architecture
Information architecture is the practice of organizing and labeling content on a website to improve efficiency and findability for users. The best information architecture is intuitive, meaning that users shouldn’t have to think very hard to flow through your website or to find something.
Importance of sitemap.
A sitemap is just what it sounds like: a list of URLs on your site that crawlers can use to discover and index your content. Submitting one is among the easiest ways to ensure Google is finding your highest priority pages.
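A sitemap is usually an XML file that follows the sitemaps.org protocol. As a rough sketch, the Python below builds a minimal one from a list of URLs. The function name build_sitemap and the example.com URLs are placeholders, and optional fields such as last-modified dates are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/services",
])
```

The resulting string can be saved as sitemap.xml at the site root and submitted through Google Search Console.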
Getting acquainted with indexing.
Once crawling is done, search engines extract the page content and store the information found on the web in a huge database of everything the spiders discovered. This information is organized so that it is ready to serve to searchers.
How do search engines store and interpret web pages?
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval.
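One common way to get this speed-up is an inverted index, which maps each word to the documents that contain it. The text above does not name a specific structure, so take the following as an illustrative assumption with made-up page names and content, not as a description of any particular search engine.

```python
from collections import defaultdict

# Hypothetical corpus: document id -> page text.
DOCS = {
    "page-a": "search engines crawl the web",
    "page-b": "engines rank indexed pages",
    "page-c": "crawl budget and indexed pages",
}


def build_inverted_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_inverted_index(DOCS)
matches = lookup(index, "indexed pages")
```

Building the index costs time and storage up front, but each lookup then touches only the entries for the query words and never rescans the documents. That is the trade-off described above.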
Can I see how a Googlebot crawler sees my pages?
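Partly. For many years, the cached version of a page showed a snapshot of the last time Googlebot crawled it, but Google has since retired its cached links. The URL Inspection tool in Google Search Console shows how Googlebot sees a page.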
Are pages ever removed from the index?
Yes. Pages can be removed from the search engine index. Below are some of the main reasons why pages might be removed:
- URL is returning a “not found” error (4XX) or server error (5XX)
- URL had a noindex meta tag added
- URL has been flagged and manually penalized
- URL is blocked from crawling
If you believe that a page on your website that was previously in Google’s index is no longer showing up, you can use the URL Inspection tool in Google Search Console to learn the status of the page. The tool also has a “Request Indexing” feature to submit individual URLs to the index.
Instruct search engines through robots meta directives
Meta directives (or “meta tags”) are instructions you can give to search engines regarding how you want your web page to be treated. They tell search engine crawlers things like “do not index this page” or “don’t pass any link equity to any on-page links”. These instructions are executed via robots meta tags in the <head> of your HTML pages (most commonly used) or via the X-Robots-Tag in the HTTP header.
Robots meta tag
Robots meta tags can be placed within the <head> of the HTML webpage. They can exclude all or specific search engines. The following are the most common meta directives.
index/noindex tells search engines whether the page should be crawled and kept in a search engine’s index for retrieval. If you opt to use “noindex,” you’re telling crawlers that you want the page excluded from search results. By default, search engines assume they can index all pages, so using the “index” value is unnecessary.
follow/nofollow tells search engines whether links on the page should be followed or nofollowed. “Follow” results in bots following the links on your page and passing link equity through to those URLs. If you elect to employ “nofollow,” the search engines will not follow or pass any link equity through to the links on the page. By default, all pages are assumed to have the “follow” attribute.
Here’s an example of a meta robots noindex, nofollow tag: <meta name="robots" content="noindex, nofollow">
Understanding the different ways you can influence crawling and indexing will help you avoid the common pitfalls that can prevent your important pages from getting found.
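To check which robots meta directives a page actually carries, the tag can be read with Python’s built-in HTML parser. This is a simplified sketch: it handles only the noindex and nofollow values described above and ignores the X-Robots-Tag header and bot-specific tags. The class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_policy(html):
    """Return whether a page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


# Hypothetical page that opts out of both indexing and link following.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
policy = robots_policy(PAGE)
```

A page without any robots meta tag comes back as indexable and followable, matching the defaults described above.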
Ranking: how do search engines rank webpages?
This is the process where search engines order the indexed pages by relevance in hopes of solving the searcher’s query. Generally, we can assume that the higher a website is ranked, the more relevant the search engine believes that site is to the query.
Search engines ensure that when someone types a query into the search bar, they get relevant results. This process is known as ranking: the ordering of search results from most relevant to least relevant for a particular query.
To determine the relevance of content to a searcher’s query, search engines use algorithms, which are processes or formulas by which stored information is retrieved and ordered in meaningful ways. Search engine algorithms have gone through many changes over the years in order to improve the quality of search results.
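As a toy illustration of ordering by relevance, the sketch below scores each page by how often the query words appear in its text and sorts the pages by that score. Real ranking algorithms weigh far more signals. The scoring formula, the page paths, and the page texts are assumptions chosen to keep the example short.

```python
def relevance(query, text):
    """Share of words in the text that match a query word (a deliberately naive score)."""
    words = text.lower().split()
    if not words:
        return 0.0
    matches = sum(words.count(term) for term in query.lower().split())
    return matches / len(words)


def rank(query, pages):
    """Return page URLs ordered from most to least relevant for the query."""
    return sorted(pages, key=lambda url: relevance(query, pages[url]), reverse=True)


# Hypothetical indexed pages: URL path -> page text.
PAGES = {
    "/about": "about our company",
    "/tea-guide": "tea brewing guide",
    "/coffee-guide": "coffee brewing guide for coffee lovers",
}

ranked = rank("coffee brewing", PAGES)
```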
What do search engines want?
Google and Bing share a common goal: to provide useful answers to searchers’ questions in the most helpful formats.
Links play an important role in SEO
When we talk about links, we could mean two things. Backlinks or “inbound links” are links from other websites that point to your website, while internal links are links on your own site that point to your other pages (on the same site).
Links have long been part of SEO. In the past, search engines placed no restrictions on links: the more links pointing to your site, the higher you would rank. Today, search engines are intelligent enough to determine whether a link is toxic or untrusted.
Quality links over quantity links.
The better the quality of the links pointing to your website, the better your chances of earning a higher spot on the SERP.
The reasoning behind backlinks explained
- Referrals from others = good sign of authority
- Referrals from yourself = biased, so not a good sign of authority
- Referrals from irrelevant or low-quality sources = not a good sign of authority and could even get you flagged for spam
- No referrals = unclear authority / low authority
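Part of this reasoning can be expressed in code: links from your own site should not count as outside referrals. The sketch below counts distinct external referring domains from a list of backlink URLs. It does not attempt to judge relevance or quality, and the domain names are invented for the example.

```python
from urllib.parse import urlsplit


def referring_domains(target_domain, backlink_urls):
    """Return the distinct external domains that link to target_domain."""
    domains = set()
    for url in backlink_urls:
        host = urlsplit(url).hostname
        if host and host != target_domain:  # ignore self-referrals
            domains.add(host)
    return domains


# Hypothetical pages that link to mysite.example.
BACKLINKS = [
    "https://news.example.org/story",
    "https://blog.example.net/review",
    "https://mysite.example/internal-page",  # a referral from yourself
    "https://news.example.org/another-story",
]

external = referring_domains("mysite.example", BACKLINKS)
```

Two links from the same outside domain count once here, since repeated links from one source say less about authority than links from several independent sources.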
Why is content vital in SEO?
To ensure visibility and exposure to searchers, you need to create web content that search engine spiders can find.
Actionable marketer Heidi Cohen describes content as:
“High quality, useful information that conveys a story presented in a contextually relevant manner with the goal of soliciting an emotion or engagement. Delivered live or asynchronously, content can be expressed using a variety of formats including text, images, video, audio, and/or presentations.”
The reason optimized content is important is simple: you won’t rank in search engines without it. Without search engine friendly content, you can’t get the full benefits of SEO. Having relevant and useful content on your website encourages visitors to stay longer, and this can positively impact your search rankings.
Today, with hundreds or even thousands of ranking signals, the top three have stayed fairly consistent: links to your website (which serve as third-party credibility signals), on-page content (quality content that fulfills a searcher’s intent), and RankBrain.
What is RankBrain?
RankBrain is Google’s name for a machine-learning artificial intelligence system that’s used to help process its search results. RankBrain is a component of Google’s core algorithm to determine the most relevant results to search engine queries.
Engagement metrics: what does Google say?
According to Google’s former Chief of Search Quality, Udi Manber:
“The ranking itself is affected by the click data. If we discover that, for a particular query, 80% of people click on #2 and only 10% click on #1, after a while we figure out probably #2 is the one people want, so we’ll switch it.”
Another comment from former Google engineer Edmond Lau corroborates this:
“It’s pretty clear that any reasonable search engine would use click data on their own results to feed back into ranking to improve the quality of search results. The actual mechanics of how click data is used is often proprietary, but Google makes it obvious that it uses click data with its patents on systems like rank-adjusted content items.”
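The idea in the first quote can be mimicked with a few lines of Python. This is purely a toy model: how Google actually uses click data is proprietary, as the second quote notes. The result paths and click shares are made up to mirror the 80% and 10% figures.

```python
def rerank_by_clicks(results, click_share):
    """Reorder results so that the ones attracting more clicks come first.

    sorted() is stable, so results with equal click share keep their order.
    """
    return sorted(results, key=lambda url: click_share.get(url, 0.0), reverse=True)


# Hypothetical current ranking and the share of clicks each result receives.
results = ["/result-1", "/result-2", "/result-3"]
click_share = {"/result-1": 0.10, "/result-2": 0.80, "/result-3": 0.10}

reranked = rerank_by_clicks(results, click_share)
```

After re-ranking, /result-2 moves to the top, which is the switch described in the quote.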
The evolution of search results
Years ago, when search engines were not yet as advanced as they are today, the term “10 blue links” was coined to describe the flat structure of the SERP. Any time a search was performed, Google would return a page with 10 organic results, each in the same format. Holding the number 1 spot was therefore the holy grail of SEO.
That has since changed. Google began adding results in new formats on its search result pages, called SERP features, which you learned about in Chapter 1, SEO 101. Some of these SERP features include:
- Paid advertisements
- Featured snippets
- People Also Ask boxes
- Local (map) pack
- Knowledge panel
As search engines get more sophisticated and intelligent, they become more sensitive to user behaviors and preferences. Search algorithms have advanced and are much harder to fool with keyword stuffing and artificial link schemes.
Search engines continue to be obsessed with their primary goal: “To deliver the best possible result for every query”.