How Google Search Works: A Deep Dive

Speaker: Cherry Prommawin

TL;DR

Discover new URLs on the internet
If you add a new sub-page or sub-domain, linking to it internally from sites that are already indexed is important
If a page has multiple internal links pointing to it, the bot checks that page more often
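As a rough illustration of discovery through internal links, the sketch below pulls same-host links out of one page's HTML using only the Python standard library. The `LinkCollector` class, the `internal_links` helper, and the sample HTML are hypothetical names and data made up for this example; it is not how Googlebot itself is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return absolute same-host URLs found in html, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


# Hypothetical page content for illustration.
sample_html = """
<a href="/new-page">New page</a>
<a href="https://example.com/blog/">Blog</a>
<a href="https://other.example.org/">External</a>
"""
links = internal_links("https://example.com/", sample_html)
print(links)
```

A new sub-page only shows up in `links` if some already-known page links to it, which is the point made above.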
Crawler: the program that does the collecting
1. a.k.a. Googlebot
2. Factors that affect crawling
  1. speed of loading
  2. quality of content
  3. potential server error
  4. other signals
3. How to change how much the crawler crawls?
  - LESS: return HTTP status 500, 503, or 429
  - MORE: avoid returning errors and improve site quality
    - make the site one that people consider important
    - feature popular pages in internal links
4. How to keep certain pages from being crawled?
  1. robots.txt
  2. robots meta tag in the HTML head
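To make the robots.txt option concrete, here is a minimal sketch that checks URLs against a robots.txt file with Python's standard `urllib.robotparser`. The rules and URLs are invented for illustration. For the second option, a tag such as `<meta name="robots" content="noindex">` goes in the page's head; note that `noindex` asks search engines not to index the page, and the page still has to be fetched for the tag to be seen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a given user agent may fetch each URL under these rules.
allowed_public = parser.can_fetch("Googlebot", "https://example.com/index.html")
allowed_private = parser.can_fetch(
    "Googlebot", "https://example.com/private/report.html"
)
print(allowed_public, allowed_private)
```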
Process of crawling
1. Fetching and rendering
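The status codes listed earlier (500, 503, 429) are what the crawler sees at fetch time. The toy policy below shows the general idea of slowing down after such responses. The function name `next_crawl_delay`, the doubling and halving factors, and the 1-second floor are all assumptions for this sketch, not documented Googlebot behaviour.

```python
# Status codes the notes list as making the crawler crawl less.
SLOW_DOWN_STATUSES = {500, 503, 429}


def next_crawl_delay(current_delay, status):
    """Toy policy: double the delay on a slow-down status, else ease back.

    The factors and the 1-second floor are assumptions for illustration.
    """
    if status in SLOW_DOWN_STATUSES:
        return current_delay * 2
    return max(1.0, current_delay / 2)


delay = 1.0
# Hypothetical sequence of responses from one site.
for status in [200, 503, 503, 429, 200]:
    delay = next_crawl_delay(delay, status)
print(delay)
```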

Definition: deciding whether a crawled page's information is saved into the database (the index)
Process
1. Parsing the HTML
2. Understanding the content: understand the page and compute signals
How many keywords should each page contain? → More is not better
Meta tag → keyword meta tags are not used
Will images be understood by Googlebot?
1. Put descriptive text in the alt attribute of the img tag
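A small sketch of how one might audit a page for images without alt text, using the standard `html.parser` module. The `AltAuditor` class and the sample markup are hypothetical. An empty `alt=""` can be legitimate for purely decorative images, so treat the result as a list to review rather than a list of errors.

```python
from html.parser import HTMLParser


class AltAuditor(HTMLParser):
    """Records the src of every <img> that lacks non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt") or ""
        if not alt.strip():
            self.missing_alt.append(attributes.get("src"))


# Hypothetical markup for illustration.
sample_html = """
<img src="/img/cat.jpg" alt="A grey cat sleeping on a sofa">
<img src="/img/banner.png">
<img src="/img/logo.png" alt="">
"""
auditor = AltAuditor()
auditor.feed(sample_html)
print(auditor.missing_alt)
```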
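Canonical: avoid duplication by clustering
1. Group duplicate pages into a cluster
2. Choose one canonical (standard) page for the cluster; the Chinese term is 建立標準頁面
3. Declare it with a rel="canonical" link in the head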
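The sketch below reads the declared rel="canonical" link from each page's head and groups URLs by it. This is a deliberate simplification: it clusters only by the declared canonical, whereas a search engine also compares content and treats the declaration as a hint. The `CanonicalFinder` class, `declared_canonical` helper, and sample pages are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Finds the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attributes = dict(attrs)
            rel = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.canonical = attributes.get("href")


def declared_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical pages: two URL variants of one page, plus a distinct page.
shoes_head = '<head><link rel="canonical" href="https://example.com/shoes"></head>'
hats_head = '<head><link rel="canonical" href="https://example.com/hats"></head>'
pages = {
    "https://example.com/shoes?color=red": shoes_head,
    "https://example.com/shoes?utm_source=mail": shoes_head,
    "https://example.com/hats": hats_head,
}

clusters = defaultdict(list)
for url, html in pages.items():
    # Fall back to the page's own URL when no canonical is declared.
    clusters[declared_canonical(html) or url].append(url)
print(dict(clusters))
```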
Index selection
1. Pages of good quality are chosen, based on the bot's understanding of them
2. How to check whether a site is indexed?
  1. Type site:example.com in the search bar
  2. Search Console is more accurate
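For the quick check, the `site:` operator is just part of the search query. A minimal sketch of building that search URL, with `site_query_url` as a hypothetical helper name:

```python
from urllib.parse import urlencode


def site_query_url(domain):
    """Build a Google search URL for the query site:<domain>."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


url = site_query_url("example.com")
print(url)
```

Open the resulting URL in a browser to see the results; as noted above, Search Console gives the more accurate answer.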