Having learned how the search engine's analysis system works, the author believes our search engine optimization work should take the following into account.
1. The first and second steps of the analysis system make clear which information on a page needs to be retained
fourth. The text segmentation module splits the page text into a set of lexical units.
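The article does not say which segmentation method the engine uses. As a rough illustration only, the sketch below uses forward maximum matching, a simple dictionary-based technique often applied to text written without spaces. The function name `forward_max_match` and the tiny vocabulary are hypothetical, chosen for the example.

```python
def forward_max_match(text, vocabulary, max_len=12):
    """Greedily take the longest vocabulary entry starting at each position.

    Illustrative only: real segmenters use far larger dictionaries and
    usually statistical models as well.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary for the example.
vocabulary = {"search", "engine", "searchengine", "analysis"}
units = forward_max_match("searchengineanalysis", vocabulary)
print(units)
```

Because the longest match wins, "searchengine" is kept as one unit rather than split into "search" and "engine". A different dictionary would give a different segmentation.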
Today's mainstream search engines can be divided by function into four systems: download, analysis, indexing, and query. Within this architecture, the analysis system is mainly responsible for four basic tasks: structuring web pages, eliminating duplicate pages, segmenting text, and calculating page importance (such as Google PR). The analysis system can be said to play a decisive role in how a search engine ranks websites, so studying it can better guide our own optimization work. Here are the author's own views.
fifth. Finally, the analysis results are handed to the index module, which stores them in the index.
first. Read the original web pages, downloaded by the crawler, from the download system's page database.
third. Discard redundant pages, keeping only one copy of any similar or identical pages; this is the "deduplication" module, or duplicate page elimination.
2. The third step of the analysis system tells us to pay attention to how page content is built
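The article does not describe how the engine decides that two pages are similar. One common textbook approach, shown here purely as an assumption, compares sets of word shingles using Jaccard similarity. The names `shingles`, `jaccard`, and `deduplicate`, and the 0.8 threshold, are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0  # two empty pages are treated as identical
    return len(a & b) / len(a | b)


def deduplicate(pages, threshold=0.8):
    """Keep the first page of every group of near-duplicates.

    pages is a list of (url, text) pairs; threshold is an assumed cutoff.
    """
    kept = []  # (url, shingle set) for each page retained so far
    for url, text in pages:
        current = shingles(text)
        if all(jaccard(current, other) < threshold for _, other in kept):
            kept.append((url, current))
    return [url for url, _ in kept]


pages = [
    ("a", "the quick brown fox jumps over the lazy dog"),
    ("b", "the quick brown fox jumps over the lazy dog"),
    ("c", "search engines remove duplicate pages before analysis"),
]
unique_urls = deduplicate(pages)
print(unique_urls)
```

This pairwise comparison is quadratic in the number of pages, so it only illustrates the idea. At web scale, engines rely on fingerprinting schemes that avoid comparing every pair.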
"is a HTML language and object 1.5 structured, which will be valuable information, such as title and text will be retained, and discarded the useless information, such as HTML tags, mainly through the structure of the web, generally speaking, TITLE tag, MEAT tag, H tag search engine is that information on the web page the most important. For example, for the TITLE label, the search engine spider crawling in the process of <, TITLE and < > /TITLE; > the content is often the first to obtain the spiders web page text. In addition, anchor text, web page text is valuable information to be retained and attention.
second. Build a tag tree from the web page and extract the attributes of value, completing the conversion from the original page into a web object; this is the "structuring" process.
First, the author briefly introduces the steps of the search engine's analysis system:
Storing and processing the hundreds of millions of pages on the web is a difficult task, and many of those pages are identical or similar to one another. So the first thing the search engine's analysis system does, before any formal analysis, is remove duplicate web pages. Search engines treat pages as the same or similar in 4 cases: two pages whose content and format are completely