What is Hadoop?
Hadoop is a complete ecosystem of open-source frameworks from Apache. It is used to store, process, and analyze data that is very large in...
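Hadoop's classic processing layer is MapReduce. A map phase emits key/value pairs, the framework groups (shuffles) them by key, and a reduce phase aggregates each group. The sketch below simulates that flow in plain Python on an in-memory list of lines. It is a single-process illustration of the model, not Hadoop's Java API, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value by its key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate all values emitted for a single key.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

On a real cluster the map and reduce calls run in parallel on different machines, and the shuffle moves data over the network. The logical steps are the same.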
Transforming Business
Sentiment Analysis: The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative,...
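A minimal way to see "computationally categorizing opinions" is a lexicon-based classifier. It counts words from a positive list and a negative list and labels the text by the sign of the difference. The word lists here are a tiny, hypothetical sample, and the "neutral" label for ties is an assumption of this sketch. Practical systems use large curated lexicons or trained models.

```python
import re

# Hypothetical, hand-picked lexicons for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}


def sentiment(text: str) -> str:
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed label when positive and negative hits cancel out


if __name__ == "__main__":
    for review in ["I love this great phone", "Terrible battery, bad screen"]:
        print(review, "->", sentiment(review))
```

Counting words ignores negation ("not good") and sarcasm. This is the main reason real sentiment analysis moves on to statistical or deep learning models.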
Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be...
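To make "learning data representations" concrete, here is a minimal two-layer network trained with backpropagation on the XOR problem, in plain Python. Its hidden layer is the representation the network learns, and deep networks stack many such layers. The layer size, learning rate, epoch count, and seed are arbitrary assumptions. Whether this tiny net fits XOR exactly depends on the random initialization, so the sketch only reports the loss before and after training.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden: int = 4, lr: float = 0.5, epochs: int = 5000, seed: int = 0):
    """Train a 2-hidden-1 sigmoid network on XOR; return (loss_before, loss_after)."""
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden layer: the learned intermediate representation of the input.
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    loss_before = mse()
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in data:
            h, y = forward(x)
            # Backpropagation: gradient of the MSE at the output pre-activation...
            dz_out = 2.0 * (y - t) * y * (1.0 - y) / len(data)
            gb2 += dz_out
            for j in range(hidden):
                gw2[j] += dz_out * h[j]
                # ...pushed back through w2 to each hidden unit's pre-activation.
                dz_h = dz_out * w2[j] * h[j] * (1.0 - h[j])
                gw1[j][0] += dz_h * x[0]
                gw1[j][1] += dz_h * x[1]
                gb1[j] += dz_h
        # Full-batch gradient-descent step on every weight and bias.
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            w1[j][0] -= lr * gw1[j][0]
            w1[j][1] -= lr * gw1[j][1]
            b1[j] -= lr * gb1[j]
        b2 -= lr * gb2
    return loss_before, mse()


if __name__ == "__main__":
    before, after = train_xor()
    print(f"loss before: {before:.4f}, after: {after:.4f}")
```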
Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to “learn” (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed....
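The phrase "progressively improve performance on a specific task with data" can be seen in a few lines. The sketch below fits a straight line by gradient descent on mean squared error, and the recorded loss shrinks as the model sees the data again and again. The toy data (points on y = 2x + 1), learning rate, and epoch count are assumptions chosen for illustration.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [2 * x + 1 for x in xs]
    w, b, losses = fit_line(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.2f}, last loss={losses[-1]:.2e}")
```

No rule like "multiply by 2 and add 1" is written into the program. The parameters are recovered from the examples alone, which is the sense of "without being explicitly programmed".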
A step-by-step guide to getting PySpark working with Jupyter Notebook on an Amazon EC2 instance. This article assumes some basic familiarity with the command line and the AWS console. Step 1: Create an...
Let's look at some of the more appealing features of Apache Spark and RDDs. Apache Spark performs in-memory computation, and it evaluates RDDs lazily, i.e. RDDs do not compute their results right away. Instead,...
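Lazy evaluation can be illustrated without a cluster. The toy class below records `map` and `filter` steps as a plan and only runs them when the action `collect` is called. The method names mirror the RDD methods of the same name, but `LazyDataset` is a hypothetical single-machine stand-in, not PySpark.

```python
class LazyDataset:
    """Toy stand-in for an RDD's lazy evaluation (hypothetical, single machine)."""

    def __init__(self, source, ops=None):
        self._source = source
        self._ops = ops or []  # recorded transformations, not yet executed

    def map(self, fn):
        # Transformation: record the step and return a new dataset; nothing runs yet.
        return LazyDataset(self._source, self._ops + [("map", fn)])

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyDataset(self._source, self._ops + [("filter", pred)])

    def collect(self):
        # Action: only now is the recorded pipeline applied to each element.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


if __name__ == "__main__":
    ds = LazyDataset(range(5)).map(lambda x: x * x).filter(lambda v: v % 2 == 0)
    print(ds.collect())
```

Deferring work this way lets an engine see the whole pipeline before running it, so steps can be fused into one pass over the data instead of materializing each intermediate result.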
Before we discuss Resilient Distributed Datasets (RDDs), let's see how to launch Spark. The Spark shell executable is usually present in the Spark version folder, which in turn is present under the “opt”...
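The exact folder depends on how Spark was installed, so a small helper can look for the shell script instead of relying on a hard-coded path. This sketch assumes the standard Spark binary-distribution layout, where launch scripts such as `spark-shell` and `pyspark` sit in a `bin/` subfolder. It falls back to the `SPARK_HOME` environment variable when no folder is passed. The function name `find_spark_shell` is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def find_spark_shell(spark_home: Optional[str] = None) -> Optional[Path]:
    """Return bin/spark-shell under a Spark folder, or None if it is not there."""
    # Prefer an explicit folder; otherwise use SPARK_HOME if it is set.
    home = spark_home or os.environ.get("SPARK_HOME")
    if not home:
        return None
    candidate = Path(home) / "bin" / "spark-shell"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    shell = find_spark_shell()
    print(shell if shell else "spark-shell not found; set SPARK_HOME or pass a folder")
```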
From the image shown above, one can easily understand the overall dynamics of Spark. The left-hand side of the image depicts all the different sources that provide the input data...
Installing the Anaconda distribution
The Anaconda distribution includes Python 2, Python 3, JupyterHub, and many common data science packages. The Continuum page has the latest Anaconda distribution. Download Anaconda and follow the installation instructions...
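After installing, a quick check can confirm which Python interpreter is active and whether common packages are importable. The package names below are an assumed example list. `importlib.util.find_spec` reports whether a top-level module can be found without actually importing it.

```python
import importlib.util
import sys


def report_environment(packages=("numpy", "pandas", "jupyter")):
    """Return the running Python version and which of `packages` are importable."""
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return sys.version.split()[0], found


if __name__ == "__main__":
    version, found = report_environment()
    print("Python", version)
    for name, ok in found.items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

Run this with the interpreter you intend to use, for example the one inside the Anaconda installation, to make sure it is not a different Python found first on the PATH.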