Improving Big Data Recommendation System Performance using NLP techniques with multi attributes
Abstract
Due to the wide availability of big data, institutions and companies are concentrating on developing highly effective recommender systems for their users. Traditional recommender systems use standard information such as users, items, and ratings; however, these data may not be sufficient for precise results. To improve accuracy, additional information such as textual data can be included in the recommendation system, and when the textual data are large, Natural Language Processing (NLP) techniques are essential for effective analysis. This paper therefore proposes a novel big data recommender system that enhances collaborative filtering (CF) results by leveraging NLP techniques and handling multiple attributes. The study constructs two big data recommendation models, both using the Alternating Least Squares (ALS) machine learning algorithm within the Apache Spark big data framework. The first model does not incorporate NLP techniques, while the second applies the proposed NLP techniques to users' review comments. A dataset of more than 3 million ratings and reviews, totalling 3.1 GB, was collected from the Amazon website. The results demonstrated a significant improvement after incorporating the proposed NLP-based techniques with multiple attributes.
DOI: https://doi.org/10.31449/inf.v48i5.5255
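For readers unfamiliar with ALS, the sketch below illustrates the alternating update that underlies the CF baseline. It is a minimal single-machine illustration, not the authors' code, and it does not include their NLP step. Every function name here (`als`, `update`, `solve`, `predict`) and the toy data are hypothetical. With item factors held fixed, each user vector is the exact solution of a small regularized least-squares problem over that user's observed ratings, and then the roles are swapped. Because each half-step exactly minimizes the objective with respect to one block of factors, the regularized loss cannot increase from one iteration to the next.

```python
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def update(fixed, entries_by_row, k, lam):
    """Recompute every row's factors with the other side held fixed.

    For row u with observed (col, rating) pairs, solve
    (sum_i y_i y_i^T + lam * I) x_u = sum_i r_ui * y_i.
    """
    new = {}
    for u, entries in entries_by_row.items():
        a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for i, r in entries:
            y = fixed[i]
            for p in range(k):
                b[p] += r * y[p]
                for q in range(k):
                    a[p][q] += y[p] * y[q]
        new[u] = solve(a, b)
    return new

def loss(ratings, users, items, lam):
    """Squared error on observed ratings plus L2 penalty on all factors."""
    err = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)
    reg = sum(dot(v, v) for v in users.values()) + sum(dot(v, v) for v in items.values())
    return err + lam * reg

def als(ratings, k=2, lam=0.1, iters=10, seed=0):
    """Factorize (user, item, rating) triples into rank-k user and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    users = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u in by_user}
    items = {i: [rng.gauss(0, 0.1) for _ in range(k)] for i in by_item}
    history = []
    for _ in range(iters):
        users = update(items, by_user, k, lam)  # fix items, solve for users
        items = update(users, by_item, k, lam)  # fix users, solve for items
        history.append(loss(ratings, users, items, lam))
    return users, items, history

def predict(users, items, u, i):
    return dot(users[u], items[i])

if __name__ == "__main__":
    # Hypothetical toy triples; not data from the study.
    toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
    users, items, history = als(toy, k=2, lam=0.1, iters=15)
    print([round(h, 4) for h in history])
    print(round(predict(users, items, 0, 2), 2))  # score for an unrated pair
```

In Apache Spark, the same model is provided by `pyspark.ml.recommendation.ALS`, where `rank`, `regParam`, and `maxIter` play the roles of `k`, `lam`, and `iters`. Spark distributes the per-user and per-item solves across the cluster. Its documentation also notes that it scales the regularization parameter by each user's or item's number of ratings, whereas this sketch uses a plain, unweighted `lam`.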
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.