Big Data Processing Frameworks for Handling Huge Data Efficiencies and Challenges: A Survey
International Journal of Data Science and Big Data Analytics, 2(1), 1-9. doi: 10.51483/IJDSBDA.2.1.2022.1-9.
9 Pages Posted: 23 Jun 2022 Last revised: 27 Oct 2022
Date Written: June 22, 2022
The rapid growth of digital data collected from many sources is rendering traditional storage, processing, and analysis methods obsolete. To overcome these limitations, new technologies for storing and processing very large datasets have been developed. Big data must be processed before relevant information can be extracted from it, and processing here means transforming raw data into information and knowledge. Big data processing is therefore the handling of massive amounts of data and its conversion from raw form into usable, more understandable information. As a result, numerous big data processing execution frameworks have emerged, but determining and selecting the appropriate framework for a given big data application is a significant challenge. This paper therefore investigates the potential impact of big data challenges and discusses in depth the best-known approaches to big data processing, which are divided into five classes: batch processing, stream processing, real-time processing, interactive processing, and hybrid processing. It also covers the most popular frameworks associated with these classes, such as Apache Hadoop, Dryad, Samza, IBM InfoSphere, Storm, Amazon Kinesis, Drill, Impala, Flink, and Spark. Furthermore, the study compares the features of these frameworks, highlighting their strengths and drawbacks. It can thus serve as a guideline for choosing the most suitable framework for IT analytics applications and help business users make faster decisions.
Keywords: Big data, Challenges in big data, Big data processing, Big data frameworks
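To make the difference between the first two processing classes concrete, the following minimal Python sketch contrasts batch and stream processing on a toy word count. It does not use any of the surveyed frameworks. The function names `batch_word_count` and `streaming_word_count` and the sample records are hypothetical. Real engines such as Hadoop MapReduce, Flink, or Spark add distribution, fault tolerance, and windowing, none of which is modeled here.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(records: List[str]) -> Counter:
    """Batch style (hypothetical helper): the whole bounded dataset is
    available before processing starts, and one final result is produced."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def streaming_word_count(records: Iterable[str]) -> Iterator[Counter]:
    """Streaming style (hypothetical helper): records arrive one at a time,
    state is updated incrementally, and a result snapshot is emitted after
    every record instead of waiting for the input to end."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
        # Emit a copy so later updates do not mutate snapshots already emitted.
        yield Counter(counts)


if __name__ == "__main__":
    data = ["spark flink storm", "spark samza", "flink spark"]
    print("batch result:", dict(batch_word_count(data)))
    for i, snapshot in enumerate(streaming_word_count(iter(data)), start=1):
        print(f"after record {i}:", dict(snapshot))
```

Both functions reach the same final counts; the difference is that the streaming version exposes intermediate results as data arrives, which is the property that stream and real-time frameworks are built around.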