Author: Suresh Nambiar

What is Hadoop?

What is Hadoop? Hadoop is a complete ecosystem of open-source frameworks from Apache. It is used to store, process, and analyze data that is very large in...

Big Data

As the name implies, “Big Data” is a huge amount of data with a complex structure that grows exponentially over time. Such data is so large and complex that none of...

Sentiment Analysis using Python

Sentiment Analysis: The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative,...

Deep learning

Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be...

Features of RDD & its Operations

Let's look at some of the more appealing features of Apache Spark and RDDs. Apache Spark performs in-memory computation and evaluates RDDs lazily, i.e., they do not compute their results right away. Instead,...

Resilient Distributed Dataset (RDD)

Before we discuss the Resilient Distributed Dataset, let's see how to launch Spark. A Spark shell executable file is usually present in the Spark version folder, which in turn is present under the “opt”...

Apache Spark Architecture

From the image shown above, one can easily understand the broad dynamics of Spark. The section on the left-hand side of the image depicts all the different sources that provide the input data...

Python installations, tutorials, and cheat sheets

Installing the Anaconda distribution: The Anaconda distribution includes Python 2, Python 3, JupyterHub, and many common data science packages. The Continuum page has the latest Anaconda distribution. Download Anaconda and follow the installation instructions...