How Google Search Operates

1. Overview of the Four Stages of Google Search

The workflow of Google Search is divided into four stages, and not every webpage goes through all four:

1.1 Crawling

Google uses automated programs called “crawlers” to discover web pages on the Internet and download their text, images, and videos. (The crawl rate settings tool, once used to limit how fast Google crawled a site, has been discontinued.)

Regardless of the type of page, whether an article, news, image, video, or music page, Google relies on the textual information obtained from the page to understand it. Once a URL has been discovered, the crawler sends a GET request to fetch the HTML for that URL. This is the crawler’s primary task. Note that this step only retrieves the HTML; it does not execute any of the code in it.
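The fetch step described above can be sketched in a few lines of Python. This is only an illustration of "send a GET request and keep the raw HTML", not Google's actual crawler; the user-agent string and the function names are hypothetical.

```python
from urllib.request import Request, urlopen

# Hypothetical user-agent string, used only for this illustration.
USER_AGENT = "example-crawler/0.1"


def build_request(url: str) -> Request:
    """Build a plain GET request for the page's HTML."""
    return Request(url, headers={"User-Agent": USER_AGENT}, method="GET")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML. Scripts in the response are not executed."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("https://example.com/") returns whatever markup the server sends for that URL. Any content that the page's JavaScript would add later in a browser is absent from this string.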

If your webpage relies on front-end (client-side) rendering, this fetch returns only an empty HTML template with no content, which leaves nothing for Google to rank.
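To make the difference concrete, here is a minimal sketch that extracts the visible text from two hypothetical pages without running any JavaScript: one rendered on the front end and one rendered on the server. The sample markup and the names VisibleText and visible_text are made up for this example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical front-end-rendered page: an empty mount point plus a script.
CLIENT_RENDERED = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

# Hypothetical server-rendered page: the content is already in the HTML.
SERVER_RENDERED = (
    '<html><body><div id="app"><h1>Image Compressor</h1>'
    "<p>Shrink PNG files online.</p></div></body></html>"
)

print(repr(visible_text(CLIENT_RENDERED)))  # ''
print(repr(visible_text(SERVER_RENDERED)))  # 'Image Compressor Shrink PNG files online.'
```

The first page yields an empty string because its content only exists after the script runs in a browser.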

Generally, publishing a page is enough for it to be discovered and indexed. You can also submit a sitemap and build backlinks to help Google find it.
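As a rough sketch of what a sitemap contains, the following builds a minimal XML sitemap in the sitemaps.org format from a list of URLs. The function name and the example URLs are assumptions; a real sitemap may also carry optional fields such as lastmod.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages of a small site.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/tools/png-compressor",
])
```

The resulting text would typically be saved as sitemap.xml at the site root and submitted through Search Console.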

1.2 HTML Code Parsing

After obtaining the HTML code, Google’s crawler uses a program to parse it. The purpose of parsing is to extract the main content of the webpage, while also identifying and extracting the header, footer, and meta information. Additionally, all links within the current HTML page will be extracted.
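A simplified version of this parsing step might look like the sketch below. It covers only the title, the meta tags, and the links; separating main content from header and footer is left out. The class name PageParser and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Extract the title, named meta tags, and absolute link URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the URL of the current page.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used to exercise the parser.
SAMPLE = (
    "<html><head><title>Docs</title>"
    '<meta name="description" content="API docs"></head>'
    '<body><a href="/guide">Guide</a>'
    '<a href="https://other.example/x">X</a></body></html>'
)

page = PageParser("https://example.com/docs/")
page.feed(SAMPLE)
page.close()
```

The extracted links are what lets a crawler discover further URLs to fetch.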

1.3 Indexing

Google analyzes the text, images, and video files on web pages and stores the information in its vast database, the Google Index. Along the way it performs tokenization, topic extraction, and semantic understanding, and builds forward and inverted (reverse) indexes. This preprocessing lets relevant content be returned quickly when users search.
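The two index types can be illustrated with a toy example. A forward index maps each document to its tokens, and an inverted index maps each token to the documents that contain it. The tokenizer here is a deliberately naive one (lowercase alphanumeric runs), and the document IDs and texts are made up; real search indexes are far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(docs):
    """Return (forward, inverted) indexes for a {doc_id: text} mapping."""
    forward = {}
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        forward[doc_id] = tokens          # document -> tokens
        for token in tokens:
            inverted[token].add(doc_id)   # token -> documents
    return forward, dict(inverted)


def search(inverted, query):
    """Return the IDs of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(inverted.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= inverted.get(token, set())
    return result


# Hypothetical documents.
docs = {
    "a": "Compress PNG images online",
    "b": "Convert PNG to JPG",
}
forward, inverted = build_indexes(docs)
print(search(inverted, "compress png"))  # {'a'}
```

Because the inverted index is built ahead of time, answering a query becomes a handful of set lookups rather than a scan over every document.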

1.4 Displaying Search Results

When users search on Google, it returns information relevant to their queries. For a website to appear in these results, its content must match the query well enough. This leads to two key requirements: first, use server-side rendering so that Google’s crawlers can read the content; second, even on a tool site, provide substantial text content so that Google understands what the tools are for.
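Both requirements can be sketched together: the server assembles the full HTML, including descriptive text about the tool, before sending it. The function name, page structure, and sample copy below are assumptions for illustration; a real site would normally use its framework's template engine.

```python
from html import escape


def render_tool_page(title, description, steps):
    """Return complete HTML for a tool page, with its text already included."""
    items = "\n".join(f"      <li>{escape(step)}</li>" for step in steps)
    return (
        "<!doctype html>\n"
        "<html>\n"
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        f'    <meta name="description" content="{escape(description)}">\n'
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"    <p>{escape(description)}</p>\n"
        "    <ol>\n"
        f"{items}\n"
        "    </ol>\n"
        "  </body>\n"
        "</html>\n"
    )


# Hypothetical tool page content.
html_page = render_tool_page(
    "PNG Compressor",
    "Shrink PNG files & keep quality.",
    ["Upload a file", "Download the result"],
)
```

A crawler that only fetches this response already sees the heading, the description, and the usage steps.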

Q: Why not choose front-end rendering?
A: Because the HTML that the crawler fetches for a front-end-rendered page contains no text content for Google to read.

Q: Doesn’t Google run our JS code?
A: Running code takes additional computing power, which Google reserves for a small number of large websites whose content matters most to it. For smaller sites like ours, Google won’t spend those resources, so we need to rely on back-end (server-side) rendering.

Q: Why not use JS switching for multilingual pages?
A: Because JS switching doesn’t change the URL, crawlers can only access the content in one language. The page then competes in search rankings for that language alone, and the effort spent on the other language versions is wasted.
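One common alternative is to give every language its own URL and link the versions to each other with hreflang annotations. The sketch below assumes a language-subdirectory layout such as /en/ and /de/, which is only one possible scheme; the function names and example domain are hypothetical.

```python
from html import escape


def localized_url(base_url, lang, path):
    """Build a per-language URL, assuming a /<lang>/<path> layout."""
    return f"{base_url.rstrip('/')}/{lang}/{path.lstrip('/')}"


def hreflang_links(base_url, path, languages):
    """Return one <link rel="alternate"> tag per language version."""
    return [
        '<link rel="alternate" hreflang="{}" href="{}">'.format(
            escape(lang), escape(localized_url(base_url, lang, path))
        )
        for lang in languages
    ]


# Hypothetical site with English and German versions of a pricing page.
for tag in hreflang_links("https://example.com/", "/pricing", ["en", "de"]):
    print(tag)
```

With separate URLs, each language version can be crawled and ranked on its own.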