Speaker: Cherry Prommawin

TL;DR

Steps of Result Generation

crawling

  1. Discover new URLs on the internet
  2. For any new sub-page or sub-domain, it is important to add internal links to it from pages that are already indexed
  3. If a page is linked from multiple internal pages, the bot checks it more often
  4. Crawler: the program that does the collecting
    1. a.k.a. Googlebot
    2. Factors that affect crawling
      1. speed of loading
      2. quality of content
      3. potential server errors
      4. other signals
    3. How to make the crawler crawl more or less?
      • LESS: return HTTP status 500, 503, or 429
      • MORE: avoid returning errors and improve site quality
        • make people feel the site is important
        • use popular pages as internal links
    4. How to keep certain pages from being crawled?
      1. robots.txt
      2. robots meta tag in the HTML head
  5. Process of crawling
    1. Fetching and rendering
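
The robots.txt item and the LESS status codes above can be sketched in a few lines. This is a minimal illustration using Python's standard `urllib.robotparser`, not a description of how Googlebot itself is implemented. The robots.txt content, the example.com URLs, and the helper names `is_crawl_allowed` and `slows_crawling` are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Status codes the notes list as making the crawler crawl LESS.
SLOW_DOWN_STATUSES = {429, 500, 503}

# Hypothetical robots.txt for an example site (an assumption, not from the talk).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def slows_crawling(status_code: int) -> bool:
    """Return True for the error statuses that reduce crawling per the notes."""
    return status_code in SLOW_DOWN_STATUSES
```

With this hypothetical file, a URL under `/private/` is reported as blocked for any user agent, while other paths stay crawlable.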

indexing

  1. Definition: deciding whether to save a crawled page's info in the database
  2. Process
    1. Parsing the HTML
    2. Understanding the content: understand the page and compute signals
  3. How many keywords should each page contain? → More is not better
  4. Meta tag → keyword meta tags are not used
  5. Will images be understood by Googlebot?
    1. Use alt text: give each img an alt attribute with descriptive text
  6. Canonical: avoid duplication by clustering
    1. Group duplicate pages into a cluster
    2. Chinese term: 建立標準頁面 ("establish the canonical page")
    3. rel canonical in head
  7. Index selection
    1. Chooses pages of good quality based on the bot's understanding
    2. How to know whether a site is indexed?
      1. Search for site:example.com in the search bar
      2. Search Console is more accurate
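
The alt-text and canonical items above can be illustrated with a small parser. This is a rough sketch using Python's standard `html.parser`; it only reads the declared rel canonical link and groups pages by it, whereas the real clustering in a search engine uses many more signals. The class `PageSignals`, the function `cluster_by_canonical`, and the sample pages are all hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collects the declared canonical URL and images lacking alt text."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.images_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")
        elif tag == "img" and not (attr.get("alt") or "").strip():
            self.images_missing_alt.append(attr.get("src"))


def cluster_by_canonical(pages):
    """Group page URLs by declared canonical URL, else by their own URL."""
    clusters = defaultdict(list)
    for url, html in pages.items():
        signals = PageSignals()
        signals.feed(html)
        clusters[signals.canonical or url].append(url)
    return dict(clusters)


# Hypothetical pages: two URLs serving the same product listing.
PAGES = {
    "https://example.com/shoes?color=red": (
        '<head><link rel="canonical" href="https://example.com/shoes"></head>'
        '<body><img src="red.jpg" alt="Red running shoes"></body>'
    ),
    "https://example.com/shoes": (
        '<head><link rel="canonical" href="https://example.com/shoes"></head>'
        '<body><img src="shoe.jpg"></body>'
    ),
}
```

Calling `cluster_by_canonical(PAGES)` puts both URLs in one cluster keyed by `https://example.com/shoes`, and feeding the second page to `PageSignals` flags `shoe.jpg` as an image without alt text.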

serving search results