A Review on Storage and Large-Scale Processing of Data-Sets Using Map Reduce, YARN, SPARK, AVRO, MongoDB

7 Pages Posted: 14 Jun 2019


Monika Monu

Baba Mast Nath University

Sat Pal

Baba Mast Nath University

Date Written: April 4, 2019

Abstract

This paper focuses on the Hadoop Distributed File System (HDFS), a file system designed to store very large data sets reliably and to stream them to user applications at high bandwidth. In such a cluster, thousands of servers both host attached storage and execute user tasks. The Hive data warehouse facilitates querying and managing large datasets that reside in distributed storage. MapReduce moves computation to the data stored in HDFS: tasks are processed on the physical node where the data resides, which reduces network I/O, keeps input and output on local disk, and provides high aggregate read/write bandwidth. HBase is a column-oriented database management system that runs on top of HDFS. Sqoop is a well-known tool designed to transfer data between Hadoop and relational databases. Pig is used to analyze large data sets and provides a high-level language for expressing data analysis programs. Avro, on the other hand, provides an easy way to represent complex data structures in a Hadoop MapReduce job. Spark is a cluster computing framework for data analytics that can run some applications up to 100 times faster than Hadoop MapReduce. A single MongoDB deployment can host many databases. Finally, YARN extends the power of Hadoop to incumbent and newer technologies available in the data center.
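The MapReduce idea summarized in the abstract, running map tasks where the data already resides and moving only intermediate results, can be illustrated with a small in-memory word count. This is a minimal sketch in plain Python under stated assumptions, not the Hadoop API: the node names, block contents and function names are hypothetical, and a real job would instead be written against Hadoop's `Mapper`/`Reducer` classes (or Hadoop Streaming), with HDFS supplying the actual block locations.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical cluster layout (assumption): each node stores some input
# blocks locally. These names and strings are illustrative, not HDFS APIs.
BLOCKS_BY_NODE = {
    "node-1": ["big data on hdfs", "hdfs stores big files"],
    "node-2": ["spark and hadoop process big data"],
}


def map_phase(block):
    """Map: emit (word, 1) for every word in one input block."""
    return [(word, 1) for word in block.split()]


def run_local_maps(blocks_by_node):
    """Run each map task on the node that holds its block (data locality).

    In a real cluster only the resulting (word, 1) pairs, not the raw
    blocks, would need to cross the network.
    """
    return {
        node: list(chain.from_iterable(map_phase(b) for b in blocks))
        for node, blocks in blocks_by_node.items()
    }


def shuffle(intermediate):
    """Shuffle: group all values emitted for the same key across nodes."""
    groups = defaultdict(list)
    for pairs in intermediate.values():
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(blocks_by_node):
    """Chain the three phases: local map, shuffle, reduce."""
    return reduce_phase(shuffle(run_local_maps(blocks_by_node)))


if __name__ == "__main__":
    counts = word_count(BLOCKS_BY_NODE)
    print(sorted(counts.items()))
```

Keeping the map step per node mirrors the locality principle described above: each node reads only its own blocks from local disk, and the shuffle is the only step that would move data between machines.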

Suggested Citation

Monu, Monika and Pal, Sat, A Review on Storage and Large-Scale Processing of Data-Sets Using Map Reduce, YARN, SPARK, AVRO, MongoDB (April 4, 2019). Proceedings of International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur - India, February 26-28, 2019. Available at SSRN: https://ssrn.com/abstract=3365415 or http://dx.doi.org/10.2139/ssrn.3365415

Monika Monu (Contact Author)

Baba Mast Nath University

Rohtak
India

Sat Pal

Baba Mast Nath University

Rohtak
India


Paper statistics

Downloads: 16
Abstract Views: 110