Author: Suresh Nambiar


What is Apache Spark?

Apache Spark is a powerful open-source processing engine built on a cluster-computing framework. Spark is designed for lightning-fast processing of large datasets. This includes batch processing...


MapReduce Introduction

To begin, let’s start with the most basic question: what is Hadoop’s MapReduce? MapReduce is a simple programming model for data processing. It is inherently parallel, and essentially consists of...
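A MapReduce job runs in a map phase and a reduce phase, and between them the framework groups the intermediate values by key. The following is a minimal, single-process Python sketch of the classic word-count job under that model. It does not use Hadoop’s actual API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical illustrations:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key, here by summing counts.
    return key, sum(values)


def word_count(lines):
    # Each map_phase call depends only on its own line, so a real
    # framework could run these calls on different machines.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["the quick fox", "the lazy dog"]))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

This sketch runs every map call sequentially in one process. The point is that no map call needs to see another’s input, and that independence is what makes the model parallel.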


Big Data Developer

Big Data Developer: The Most Exciting IT Job of the Century. This century marks a new era, an era of data. The amount of data is growing enormously with each passing second. Hence, the role...


Scheduling and Types of Schedulers in YARN

In an ideal world, the requests that a YARN application makes would be granted immediately. In the real world, however, resources are limited, and on a busy cluster, an application will often need to...


YARN Comparison to MapReduce

The major difference between MapReduce 1 and MapReduce 2, which runs on YARN: in MapReduce 1, two types of daemon control the job execution process, a jobtracker and one or more tasktrackers. The jobtracker...


YARN Introduction

Apache YARN is short for Yet Another Resource Negotiator. As the name indicates, it is Hadoop’s cluster resource management system. YARN was introduced in Hadoop 2 to improve the MapReduce...


Data Scientist | An in-demand occupation

A data scientist is someone who extracts information and valuable insights from data. To understand what a data scientist is and what they do, let’s begin by understanding what data...


Hadoop File System and Operations

Hadoop has an abstract notion of filesystems, of which HDFS is just one implementation. The first is Local, a filesystem for a locally connected disk with client-side checksums. Then comes HDFS, i.e., Hadoop’s...


Reading and Writing Files in Hadoop

Failover and fencing are two very important properties of HDFS, which together keep the ecosystem highly available. The transition from the active namenode to the standby is managed by a new...


Name Node and Data Node

HDFS works on a master-slave architecture: it consists of a single NameNode, referred to as the master node, and many DataNodes, referred to as slave nodes. The master node holds all the metadata of the...