Category: Big Data

0

Features of RDD & its Operations

Lets look at some of the more appealing features of apache spark and RDD. Apache Spark performs in-memory computation, also it evaluates RDDs lazily i.e. they do not compute their results right away. Instead,...

0

Resilient Distributed Dataset (RDD)

Before we discuss Resilient Distributed Dataset , lets see how do we launch Spark? A Spark shell executable file is usually present in Spark version folder which in turn is present under the “opt”...

0

Apache Spark Architecture

From the image shown above one can easily understand the huge dynamics of spark. The section on the left hand side of the image depicts all the different sources which provides the input data...

0

What is Apache Spark?

Apache Spark is a powerful open source processing engine, with a cluster computing framework. Spark is designed in such a way to ensure lightening fast data processing of large datasets. this includes Batch processing...

0

MapReduce Introduction

To begin, lets start with the most basic question of what is hadoops Mapreduce? MapReduce is a Simple programming model for data processing. MapReduce is an inherently parallel processing unit. Which essentially consists of...

0

Big Data Developer

Big Data Developer – Most Exciting IT Job of the Century This century marks a new era, an era of Data. The data today is growing stupendously with each passing second. Hence, the role...

0

Scheduling and types of scheduler in YARN

In an ideal world, the requests that a YARN application makes would be granted immediately. In the real world, however, resources are limited, and on a busy cluster, an application will often need to...

0

Yarn Comparison to MapReduce

Major Difference between map reduce 1  and mapreduce2 i.e YARN. In MapReduce 1, there are two types of daemon that control the job execution process: a jobtracker and one or more tasktrackers. The jobtracker...

0

YARN Introduction

Apache YARN introduction: it is short for Yet Another Resource Negotiator. As the name indicates it is a Hadoop’s cluster resource management system. YARN was introduced in Hadoop version 2 to improve the MapReduce...

0

Hadoop File System and Operations

Hadoop has an abstract notion of filesystems, of which HDFS is just one implementation. First we see local, now it’s a filesystem for a locally connected disk with client-side checksums. Then hdfs i.e. Hadoop’s...