Lesson 1: Data Science Overview

• Data Science

• Data Scientist

• Examples of Data Science

• Python for Data Science

Lesson 2: Data Analytics Overview

• Introduction to Data Visualization

• Process in Data Science

• Data Wrangling, Data Exploration, and Model Selection

• Exploratory Data Analysis (EDA)

• Data Visualization

• Plotting

• Hypothesis Building and Testing

Lesson 3: Statistical Analysis and Business Applications

• Introduction to Statistics

• Statistical and Non-Statistical Analysis

• Common Terms Used in Statistics

• Data Distribution: Central Tendency, Percentiles, Dispersion

• Histogram

• Bell Curve

• Hypothesis Testing

• Chi-Square Test

• Correlation Matrix

• Inferential Statistics
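
A minimal sketch of several Lesson 3 ideas (central tendency, percentiles, dispersion, correlation, and the chi-square statistic), using only Python's standard library. The scores and the 2x2 contingency table are hypothetical values chosen for illustration. The helper names `pearson_r` and `chi_square_statistic` are also illustrative, not taken from any course material. For the table below, the statistic is 4.0 with 1 degree of freedom, which exceeds the 5% critical value of about 3.841. A test at that level would therefore reject independence.

```python
import math
import statistics

# Hypothetical exam scores (illustrative data only)
scores = [62, 70, 70, 75, 80, 85, 91]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Percentiles: quartiles via the default "exclusive" method
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Dispersion
value_range = max(scores) - min(scores)
iqr = q3 - q1
sample_sd = statistics.stdev(scores)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical 2x2 table: rows = group A/B, columns = outcome yes/no
table = [[20, 30], [30, 20]]
chi2 = chi_square_statistic(table)
dof = (len(table) - 1) * (len(table[0]) - 1)

print(mean, median, mode, (q1, q2, q3), value_range, iqr, round(sample_sd, 2))
print("chi2 =", chi2, "dof =", dof)
```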

Lesson 4: Python: Environment Setup and Essentials

• Introduction to Anaconda

• Installation of the Anaconda Python Distribution – For Windows, macOS, and Linux

• Jupyter Notebook Installation

• Jupyter Notebook Introduction

• Variable Assignment

• Basic Data Types: Integer, Float, String, None, and Boolean; Typecasting

• Creating, accessing, and slicing tuples

• Creating, accessing, and slicing lists

• Creating, viewing, accessing, and modifying dicts

• Creating and using operations on sets

• Basic Operations: ‘in’, ‘+’, ‘*’

• Functions

• Control Flow
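
The Lesson 4 essentials can be sketched in one short script. It covers assignment, the basic types, typecasting, tuples, lists, dicts, sets, the ‘in’, ‘+’ and ‘*’ operators, a function, and a loop. All values are made up for illustration. The function name `classify` is hypothetical.

```python
# Variable assignment and basic data types
count = 3            # int
price = 19.99        # float
name = "pandas"      # str
missing = None       # NoneType
is_ready = True      # bool

# Typecasting between types
as_int = int("42")        # str -> int
as_float = float(count)   # int -> float (3.0)
as_str = str(price)       # float -> str ("19.99")
truncated = int(price)    # float -> int truncates toward zero (19)

# Tuples: immutable, support indexing and slicing
point = (10, 20, 30, 40)
first = point[0]          # 10
middle = point[1:3]       # (20, 30)

# Lists: mutable, same indexing and slicing rules
nums = [5, 1, 4, 2, 3]
nums.append(6)
last_two = nums[-2:]      # [3, 6]
reversed_nums = nums[::-1]

# Dicts: key -> value mappings
ages = {"alice": 30, "bob": 25}
ages["carol"] = 35        # add a key
ages["bob"] += 1          # modify an existing value
keys = list(ages.keys())  # insertion order is preserved

# Sets: unique elements and set algebra
a = {1, 2, 3}
b = {3, 4}
union, intersection, difference = a | b, a & b, a - b

# Basic operations: 'in', '+', '*'
has_four = 4 in nums
combined = [1, 2] + [3]   # concatenation
repeated = "ab" * 3       # repetition


# Functions and control flow
def classify(n):
    """Return 'even' or 'odd' for an integer n."""
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


labels = []
for n in range(4):
    labels.append(classify(n))
```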

Lesson 5: Mathematical Computing with Python (NumPy)

• NumPy Overview

• Properties, purpose, and types of ndarray

• Class and Attributes of the ndarray Object

• Basic Operations: Concept and Examples

• Accessing Array Elements: Indexing, Slicing, Indexing with Boolean Arrays

• Copies and Views

• Universal Functions (ufuncs)

• Shape Manipulation

• Broadcasting

• Linear Algebra
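
A minimal NumPy sketch touching each Lesson 5 topic: ndarray attributes, element-wise operations, indexing, slicing, and Boolean masks. It also shows copies versus views, a ufunc, shape manipulation, broadcasting, and solving a linear system. The arrays are arbitrary illustrative values. One detail worth noticing is that a basic slice is a view, so writing through it changes the original array.

```python
import numpy as np

# ndarray and its attributes
a = np.arange(12).reshape(3, 4)    # 3x4 array holding 0..11
print(a.shape, a.ndim, a.dtype, a.size)

# Basic element-wise operations
b = a * 2 + 1

# Indexing, slicing, and indexing with a Boolean array
row = a[1]              # second row
col = a[:, 2]           # third column
evens = a[a % 2 == 0]   # Boolean mask selects even elements (returns a copy)

# Copies vs views: a basic slice shares memory with the original
view = a[0, :2]
view[0] = 100           # also sets a[0, 0] to 100
copy = a[0, :2].copy()
copy[1] = -1            # leaves a unchanged

# Universal functions (ufuncs) apply element-wise
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Shape manipulation
flat = a.ravel()
transposed = a.T        # shape (4, 3)

# Broadcasting: shape (3, 1) combined with shape (4,) gives shape (3, 4)
col_vec = np.array([[0], [10], [20]])
row_vec = np.array([1, 2, 3, 4])
grid = col_vec + row_vec

# Linear algebra: solve A @ x = y
A = np.array([[3.0, 1.0], [1.0, 2.0]])
y = np.array([9.0, 8.0])
x = np.linalg.solve(A, y)
```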

Lesson 6: Scientific Computing with Python (SciPy)

• SciPy and its Characteristics

• SciPy sub-packages

• SciPy sub-packages – Integration

• SciPy sub-packages – Optimize

• SciPy sub-packages – Linear Algebra

• SciPy sub-packages – Statistics

• SciPy sub-packages – Weave

• SciPy sub-packages – IO
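
A short sketch of the SciPy sub-packages named above: `scipy.integrate`, `scipy.optimize`, `scipy.linalg`, `scipy.stats`, and `scipy.io`. The functions, matrix, and sample values are illustrative assumptions. Weave is not shown because `scipy.weave` is no longer shipped with current SciPy releases.

```python
from io import BytesIO

import numpy as np
from scipy import integrate, linalg, optimize, stats
from scipy import io as sio

# Integration: area under sin(x) from 0 to pi (exact value is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: minimize f(x) = (x - 3)^2 + 1, whose minimum is at x = 3
res = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

# Linear algebra: determinant and inverse of a 2x2 matrix
M = np.array([[1.0, 2.0], [3.0, 4.0]])
det = linalg.det(M)            # -2.0 up to rounding
M_inv = linalg.inv(M)

# Statistics: standard normal CDF and a one-sample t-test on hypothetical data
p_below = stats.norm.cdf(1.96)  # roughly 0.975
sample = np.array([5.1, 4.9, 5.3, 5.0, 5.2])
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

# IO: round-trip an array through MATLAB .mat format in an in-memory buffer
buf = BytesIO()
sio.savemat(buf, {"v": np.array([1.0, 2.0, 3.0])})
buf.seek(0)
loaded = sio.loadmat(buf)      # 1-D arrays come back as 2-D row vectors
```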

Lesson 7: Data Manipulation with Python (pandas)

• Introduction to pandas

• Data Structures

• Series

• DataFrame

• Missing Values

• Data Operations

• Data Standardization

• Pandas File Read and Write Support

• SQL Operations
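
A minimal pandas sketch of the Lesson 7 topics. It builds a Series and a DataFrame, fills missing values with column means, and filters and groups the data. It then computes z-score standardization, writes CSV, and runs a SQL round trip. The CSV content is hypothetical, and mean imputation is only one of several reasonable ways to handle missing values. The SQL step uses an in-memory SQLite database from Python's `sqlite3` module.

```python
import sqlite3
from io import StringIO

import pandas as pd

# Series: a labeled 1-D array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame read from CSV text (hypothetical data); read_csv also accepts file paths
csv_text = """city,temp,sales
Oslo,5,120
Lima,,80
Pune,30,
Oslo,7,100
"""
df = pd.read_csv(StringIO(csv_text))

# Missing values: count them, then fill each numeric column with its mean
n_missing = df.isna().sum().sum()
filled = df.fillna(df[["temp", "sales"]].mean())

# Data operations: filter rows, then group and aggregate
hot = filled[filled["temp"] > 6]
sales_by_city = filled.groupby("city")["sales"].sum()

# Standardization (z-score): (x - mean) / standard deviation
filled["sales_z"] = (filled["sales"] - filled["sales"].mean()) / filled["sales"].std()

# File write support: serialize back to CSV text
out = filled.to_csv(index=False)

# SQL: store the frame in an in-memory SQLite table and query it
conn = sqlite3.connect(":memory:")
filled.to_sql("shops", conn, index=False)
top = pd.read_sql("SELECT city, sales FROM shops ORDER BY sales DESC LIMIT 1", conn)
conn.close()
```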

Lesson 8: Machine Learning with Python (Scikit-Learn)

• Introduction to Machine Learning

• Machine Learning Approach

• How Supervised and Unsupervised Learning Models Work

• Scikit-Learn

• Supervised Learning Models: Linear Regression

• Supervised Learning Models: Logistic Regression

• K-Nearest Neighbors (K-NN) Model

• Unsupervised Learning Models: Clustering

• Unsupervised Learning Models: Dimensionality Reduction

• Pipeline

• Model Persistence

• Model Evaluation: Metric Functions
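
A compact scikit-learn sketch of the Lesson 8 models. It uses the bundled iris dataset and a made-up linear relationship. Covered are linear and logistic regression, k-NN, k-means clustering after PCA dimensionality reduction, a `Pipeline`, persistence with Python's `pickle`, and two metric functions. The hyperparameters (`max_iter=1000`, `n_neighbors=5`, `n_clusters=3`) are illustrative choices, not recommendations.

```python
import pickle

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Supervised: linear regression on a hypothetical relation y = 2x + 1
lin = LinearRegression().fit([[0], [1], [2], [3]], [1, 3, 5, 7])

# Supervised classification on iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pipeline: scaling followed by logistic regression, fitted as one estimator
clf = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression(max_iter=1000))])
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Model evaluation with metric functions
acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)

# K-NN classifier in the same pipeline pattern
knn = Pipeline([("scale", StandardScaler()), ("model", KNeighborsClassifier(n_neighbors=5))])
knn.fit(X_train, y_train)
knn_acc = knn.score(X_test, y_test)

# Unsupervised: reduce to 2 dimensions with PCA, then cluster with k-means (labels unused)
X_2d = PCA(n_components=2).fit_transform(X)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

# Model persistence: serialize the fitted pipeline and load it back
blob = pickle.dumps(clf)
restored = pickle.loads(blob)
```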

Lesson 9: Natural Language Processing with Scikit-Learn

• NLP Overview

• NLP Approach for Text Data

• NLP Environment Setup

• NLP Sentence Analysis

• NLP Applications

• Major NLP Libraries

• Scikit-Learn Approach

• Scikit-Learn Approach: Built-in Modules

• Scikit-Learn Approach: Feature Extraction

• Bag of Words

• Extraction Considerations

• Scikit-Learn Approach: Model Training

• Scikit-Learn: Grid Search and Multiple Parameters

• Pipeline
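
A minimal sketch of the scikit-learn text workflow outlined above. It shows bag-of-words counts with `CountVectorizer` and two extraction considerations, namely stop-word removal and n-grams. It then trains a `Pipeline` with a TF-IDF step and multinomial naive Bayes, and tunes several parameters with `GridSearchCV`. The eight-document corpus and its labels are hypothetical. Naive Bayes is one common choice of classifier here, not necessarily the one the course uses.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; labels: 1 = positive, 0 = negative
docs = [
    "great movie, loved it",
    "what a great film",
    "loved the acting",
    "wonderful and great story",
    "terrible movie, hated it",
    "what a boring film",
    "hated the acting",
    "awful and boring story",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Bag of words: each document becomes a sparse vector of token counts
bow = CountVectorizer()
counts = bow.fit_transform(docs)              # shape: (8 documents, vocabulary size)
great_total = counts[:, bow.vocabulary_["great"]].sum()

# Extraction considerations: drop English stop words, add bigrams
filtered = CountVectorizer(stop_words="english", ngram_range=(1, 2))
filtered.fit(docs)

# Model training: vectorize -> TF-IDF weighting -> classifier in one Pipeline
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])

# Grid search over multiple parameters, addressed as <step name>__<parameter>
param_grid = {"vect__ngram_range": [(1, 1), (1, 2)], "clf__alpha": [0.1, 1.0]}
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(docs, labels)

predictions = search.predict(["a great story", "a boring film"])
```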

Lesson 10: Data Visualization in Python using Matplotlib

• Introduction to Data Visualization

• Python Libraries

• Plots

• Matplotlib Features:

a. Line Properties: Plotting with plot(x, y)

b. Controlling Line Patterns and Colors

c. Set Axis, Labels, and Legend Properties

d. Alpha and Annotation

e. Multiple Plots

f. Subplots

• Types of Plots and Seaborn
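
A minimal Matplotlib sketch of the features listed above. It sets line properties and patterns, draws multiple lines on one axes, and sets axis limits, labels, and a legend. It also applies alpha and adds an annotation, then places a histogram in a second subplot. The data and styling are arbitrary. The non-interactive "Agg" backend is an assumption that lets the script run without a display. Seaborn, which builds on Matplotlib, is not shown.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files/buffers, no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# Subplots: one figure with two axes side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line properties, patterns, and colors; multiple plots on the same axes
ax1.plot(x, np.sin(x), color="tab:blue", linestyle="-", linewidth=2, label="sin(x)")
ax1.plot(x, np.cos(x), "r--", alpha=0.5, label="cos(x)")  # red dashed line, 50% opacity

# Axis, label, and legend properties
ax1.set_xlim(0, 2 * np.pi)
ax1.set_xlabel("x (radians)")
ax1.set_ylabel("value")
ax1.set_title("Line plots")
ax1.legend(loc="upper right")

# Annotation with an arrow pointing at the peak of sin(x)
ax1.annotate("peak", xy=(np.pi / 2, 1.0), xytext=(2.5, 0.8),
             arrowprops=dict(arrowstyle="->"))

# Another plot type on the second subplot
ax2.hist(np.random.default_rng(0).normal(size=500), bins=20, alpha=0.7)
ax2.set_title("Histogram")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")  # save to memory instead of showing a window
plt.close(fig)
```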

Lesson 11: Data Science with Python – Web Scraping

• Web Scraping

• Common Data/Page Formats on the Web

• The Parser

• Importance of the Objects

• Understanding the Tree

• Searching the Tree

• Navigating Options

• Modifying the Tree

• Parsing Only Part of the Document
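
The outline does not name a library, but its topics (parser choice, objects, the tree, and partial parsing) match Beautiful Soup (`bs4`). The sketch below assumes that library. It parses a hypothetical HTML snippet and walks through the main object types. It then searches, navigates, and modifies the tree, and finally uses `SoupStrainer` to parse only the `<a>` tags.

```python
from bs4 import BeautifulSoup, Comment, SoupStrainer

# Hypothetical page content
html = """
<html><head><title>Catalog</title></head>
<body>
  <div id="books">
    <p class="book"><a href="/b/1">Python 101</a> <span class="price">10</span></p>
    <p class="book"><a href="/b/2">Data Science</a> <span class="price">25</span></p>
  </div>
  <!-- end of list -->
</body></html>
"""

# The parser: "html.parser" is Python's built-in; lxml or html5lib are alternatives
soup = BeautifulSoup(html, "html.parser")

# Objects: BeautifulSoup (whole document), Tag, NavigableString, Comment
title_text = soup.title.string
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

# Searching the tree
links = soup.find_all("a")
hrefs = [a["href"] for a in links]
prices = [int(span.get_text()) for span in soup.find_all("span", class_="price")]

# Navigating: parent and sibling relationships
first_book = soup.find("p", class_="book")
container_id = first_book.parent["id"]
next_book = first_book.find_next_sibling("p")

# Modifying the tree: replace text and append a new tag
next_book.a.string = "Data Science, 2nd ed."
stock = soup.new_tag("span")
stock["class"] = "stock"
stock.string = "in stock"
first_book.append(stock)

# Parsing only part of the document: keep just the <a> tags
only_links = BeautifulSoup(html, "html.parser", parse_only=SoupStrainer("a"))
```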

Lesson 12: Python Integration with Hadoop, MapReduce, and Spark

• Need for Integrating Python with Hadoop

• Big Data Hadoop Architecture

• MapReduce

• Cloudera QuickStart VM Setup

• Apache Spark

• Resilient Distributed Dataset (RDD)

• PySpark

• Spark Tools

• PySpark Integration
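
The MapReduce model can be illustrated with the classic word count, simulated in a single Python process. The sketch has a map phase that emits (word, 1) pairs, a shuffle/sort step that groups pairs by key, and a reduce phase that sums each group. This is a local simulation only; it does not talk to Hadoop. With Hadoop Streaming, the mapper and reducer would instead be separate scripts that read lines from stdin and write tab-separated key/value lines to stdout, and Hadoop would perform the sort between them. The function names `mapper`, `reducer`, and `run_local` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum counts per word; input must already be sorted by key."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    mapped = mapper(lines)
    shuffled = sorted(mapped, key=itemgetter(0))  # stands in for Hadoop's shuffle and sort
    return dict(reducer(shuffled))


text = ["the quick brown fox", "the lazy dog", "The fox"]
counts = run_local(text)
```

In PySpark, the same job is usually expressed on an RDD as a chain of `flatMap` (split lines into words), `map` (pair each word with 1), and `reduceByKey` (sum the counts). Spark then distributes the work across the cluster.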