High Performance Computing Web Search System Based on Computerized Big Data

Jun Ma

Abstract


This paper presents a high-performance web search system built on big data technology. The system adopts a heterogeneous architecture and a parallel distributed computing model to improve efficiency, scalability, and reliability. Its storage management scheme combines cloud storage and grid computing to provide efficient storage of, and fast access to, massive data. The paper describes the principle and architecture of the web search system and explains how the MapReduce framework, an inverted index structure, a vector space model, a semantic analysis model, and related technologies implement the functions of the data layer, logic layer, and display layer. An experimental environment is built on the Microsoft Azure cloud platform and tested with the Common Crawl dataset. Finally, the system is evaluated on three indicators (response time, accuracy, and stability) and compared with two other systems to demonstrate the effectiveness of the proposed method.
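The abstract does not give implementation details, so the following is only a minimal single-process sketch of how a map step, a reduce step, an inverted index, and vector space ranking can fit together. The function names (`map_phase`, `reduce_phase`, `build_index`, `search`), the whitespace tokenizer, and the TF-IDF weighting are assumptions made for illustration, not the author's design.

```python
import math
from collections import Counter, defaultdict


def map_phase(doc_id, text):
    """Map step: emit (term, (doc_id, term_frequency)) pairs for one document."""
    counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
    return [(term, (doc_id, tf)) for term, tf in counts.items()]


def reduce_phase(pairs):
    """Reduce step: group postings by term to form the inverted index."""
    index = defaultdict(dict)
    for term, (doc_id, tf) in pairs:
        index[term][doc_id] = tf
    return dict(index)


def build_index(docs):
    """Run the map step over every document, then reduce the combined output."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_phase(doc_id, text))
    return reduce_phase(pairs)


def search(index, num_docs, query):
    """Rank documents by cosine similarity between TF-IDF vectors.

    The query vector's norm is the same for every document, so it is
    omitted; this does not change the ranking order.
    """
    def idf(term):
        return math.log(num_docs / len(index[term]))

    # Squared length of each document vector in TF-IDF space.
    doc_norm_sq = defaultdict(float)
    for term, postings in index.items():
        weight = idf(term)
        for doc_id, tf in postings.items():
            doc_norm_sq[doc_id] += (tf * weight) ** 2

    # Dot product of the query vector with each candidate document vector.
    scores = defaultdict(float)
    for term, query_tf in Counter(query.lower().split()).items():
        if term not in index:
            continue  # terms absent from the index contribute nothing
        weight = idf(term)
        for doc_id, tf in index[term].items():
            scores[doc_id] += (query_tf * weight) * (tf * weight)

    ranked = sorted(
        ((score / math.sqrt(doc_norm_sq[doc_id]), doc_id)
         for doc_id, score in scores.items()
         if doc_norm_sq[doc_id] > 0),
        reverse=True,
    )
    return [doc_id for _, doc_id in ranked]
```

In a distributed deployment the map and reduce steps would run on separate workers and the index would be partitioned across storage nodes; here everything runs in memory so that the data flow stays visible.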



DOI: https://doi.org/10.31449/inf.v48i20.6776

This work is licensed under a Creative Commons Attribution 3.0 License.