Lesson 1: Data Science Overview
• Data Science
• Data Scientist
• Example of Data Science
• Python for Data Science
Lesson 2: Data Analytics Overview
• Introduction to Data Visualization
• Process in Data Science
• Data Wrangling, Data Exploration, and Model Selection
• Exploratory Data Analysis (EDA)
• Data Visualization
• Plotting
• Hypothesis Building and Testing
Lesson 3: Statistical Analysis and Business Applications
• Introduction to Statistics
• Statistical and Non-Statistical Analysis
• Common Terms Used in Statistics
• Data Distribution: Central Tendency, Percentiles, Dispersion
• Histogram
• Bell Curve
• Hypothesis Testing
• Chi-Square Test
• Correlation Matrix
• Inferential Statistics
Lesson 4: Python: Environment Setup and Essentials
• Introduction to Anaconda
• Installation of Anaconda Python Distribution – For Windows, Mac OS, and Linux
• Jupyter Notebook Installation
• Jupyter Notebook Introduction
• Variable Assignment
• Basic Data Types: Integer, Float, String, None, and Boolean; Typecasting
• Creating, accessing, and slicing tuples
• Creating, accessing, and slicing lists
• Creating, viewing, accessing, and modifying dicts
• Creating and using operations on sets
• Basic Operators: ‘in’, ‘+’, ‘*’
• Functions
• Control Flow
Lesson 5: Mathematical Computing with Python (NumPy)
• NumPy Overview
• Properties, purpose, and types of ndarray
• Class and Attributes of ndarray Object
• Basic Operations: Concept and Examples
• Accessing Array Elements: Indexing, Slicing, Indexing with Boolean Arrays
• Copies and Views
• Universal Functions (ufunc)
• Shape Manipulation
• Broadcasting
• Linear Algebra
Lesson 6: Scientific Computing with Python (SciPy)
• SciPy and its Characteristics
• SciPy sub-packages
• SciPy sub-packages – Integration
• SciPy sub-packages – Optimize
• Linear Algebra
• SciPy sub-packages – Statistics
• SciPy sub-packages – Weave
• SciPy sub-packages – IO
Lesson 7: Data Manipulation with Python (Pandas)
• Introduction to pandas
• Data Structures
• Series
• DataFrame
• Missing Values
• Data Operations
• Data Standardization
• Pandas File Read and Write Support
• SQL Operations
Lesson 8: Machine Learning with Python (Scikit-Learn)
• Introduction to Machine Learning
• Machine Learning Approach
• How Supervised and Unsupervised Learning Models Work
• Scikit-Learn
• Supervised Learning Models: Linear Regression
• Supervised Learning Models: Logistic Regression
• K Nearest Neighbors (K-NN) Model
• Unsupervised Learning Models: Clustering
• Unsupervised Learning Models: Dimensionality Reduction
• Pipeline
• Model Persistence
• Model Evaluation: Metric Functions
Lesson 9: Natural Language Processing with Scikit-Learn
• NLP Overview
• NLP Approach for Text Data
• NLP Environment Setup
• NLP Sentence Analysis
• NLP Applications
• Major NLP Libraries
• Scikit-Learn Approach
• Scikit-Learn Approach: Built-in Modules
• Scikit-Learn Approach: Feature Extraction
• Bag of Words
• Extraction Considerations
• Scikit-Learn Approach: Model Training
• Scikit-Learn Grid Search and Multiple Parameters
• Pipeline
Lesson 10: Data Visualization in Python using Matplotlib
• Introduction to Data Visualization
• Python Libraries
• Plots
• Matplotlib Features:
a. Line Properties: Plot with (x, y)
b. Controlling Line Patterns and Colors
c. Set Axis, Labels, and Legend Properties
d. Alpha and Annotation
e. Multiple Plots
f. Subplots
• Types of Plots and Seaborn
Lesson 11: Data Science with Python: Web Scraping
• Web Scraping
• Common Data/Page Formats on the Web
• The Parser
• Importance of the Objects
• Understanding the Tree
• Searching the Tree
• Navigation Options
• Modifying the Tree
• Parsing Only Part of the Document
Lesson 12: Python Integration with Hadoop, MapReduce, and Spark
• Need for integrating Python with Hadoop
• Big Data Hadoop Architecture
• MapReduce
• Cloudera QuickStart VM Setup
• Apache Spark
• Resilient Distributed Dataset (RDD)
• PySpark
• Spark Tools
• PySpark Integration