**Introductory Session**: 28th May 2017

**When**: 3rd June 2017 - 25th June 2017 (8 Sessions, 3 hours each inclusive of Hands-on Project Session)

**Support & Assessments**: 2 Internal Assessments, 2 Doubt Sessions

**Where**: Weekendr Training Centre, 2230, 1st Floor, Outram Lines, Delhi (5 mins walk from Exit 2 of GTB Metro Station, Yellow Line)

**Fee**: Rs 8,500 (10% group discount for groups of 3 or more)

NITIKA MALHOTRA - Data Scientist @Zomato | R faculty at Weekendr.

Professional Skills: Probability, Statistics, Data Structures, PostgreSQL, R, SPSS, Pentaho, SAS, Machine Learning, Hive.

Analytics Experience: 1. Data Scientist: Zomato (working on data science and machine learning projects) 2. Analytics Specialist: Transorg (providing analytical solutions for a telecom client) 3. Research Associate: IIIT-Delhi 4. Research Intern: MOSPI (Ministry of Statistics and Programme Implementation)

Training Experience: Part of the Analytics Faculty Panel at Weekendr; conducted Data Science training at IBM; part of analytics trainings at Transorg.

**1. Getting Started with R**

• 1.1. About the Software - History and Overview

• 1.2. Installation

• 1.3. Getting Familiar with R Environment

**2. Programming in R : Part 1**

• 2.1. R Nuts and Bolts

2.1.0. Essentials

2.1.1. Entering Input

2.1.2. Evaluation

2.1.3. R Objects

2.1.4. Numbers

2.1.5. Attributes

2.1.6. Creating Vectors

2.1.7. Mixing Objects

2.1.8. Explicit Coercion

2.1.9. Matrices

2.1.10. Lists

2.1.11. Factors

2.1.12. Missing Values

2.1.13. Data Frames

2.1.14. Names

2.1.15. Summary

• 2.2. Getting Data In and Out of R

2.2.1. Reading Data Files with read.table()

2.2.2. Reading Larger Datasets with read.table()

2.2.3. Using Textual and Binary formats for Storing Data

2.2.4. Interfaces to Outside World

2.2.5. Reading Lines of a Text File

2.2.6. Reading Data from Internet and URL Connections

**3. Programming in R : Part 2**

• 3.1. Subsetting R Objects

3.1.1. Subsetting a Vector

3.1.2. Subsetting a Matrix

3.1.3. Subsetting Lists

• 3.2. Vectorized Operations

• 3.3. Dates and Times

3.3.1. Dates in R

3.3.2. Times in R

3.3.3. Operations on Dates and Times

• 3.4. Control Structures

3.4.1. if-else

3.4.2. for Loops

3.4.3. Nested for Loops

3.4.4. while Loops

3.4.5. repeat Loops

3.4.6. next, break

• 3.5. apply Family of Functions

3.5.1. lapply

3.5.2. sapply

3.5.3. apply

3.5.4. tapply

3.5.5. split

3.5.6. mapply

• 3.6. Sampling in R

3.6.1. Simulation

3.6.2. Random Sampling

**4. Exploratory Data Analysis (EDA)**

• 4.1. Basics of Distribution of Data

• 4.2. EDA for Individual Variables:

4.2.1. Summarization: Measures of Central Tendency, Dispersion, Skewness and Kurtosis

4.2.2. Data Visualization: Histogram/Bar Chart, Box Plot, Stem and Leaf Display

4.2.3. Missing Value Imputation

4.2.4. Outlier Detection

4.2.5. Testing for Normality: Histogram, QQ Plot, KS Test and SW Test

• 4.3. EDA for Multiple Variables:

4.3.1. Pairwise Scatter Plots

4.3.2. Correlation Analysis

• 4.4. Case Study: EDA for Motor Trend Car Road Tests Dataset

**5. Statistical Inference**

• 5.1. Parameter Estimation

5.1.1. Parametric Estimation

5.1.2. Non-Parametric Estimation

• 5.2. Parametric Testing of Hypothesis

5.2.1. Testing for Hypothetical Value of Population Mean

5.2.2. Testing for Equality of Two Population Means

5.2.3. Testing for Hypothetical Value of Population Variance

5.2.4. Testing for Equality of Two Population Variances

5.2.5. Testing for Equality of Several Population Means

• 5.3. Non-Parametric Testing of Hypothesis

5.3.1. Testing for Hypothetical Value of Population Median

5.3.2. Testing for Equality of Two Populations

5.3.3. Testing for Equality of Several Populations

5.3.4. Testing for Goodness of Fit

5.3.5. Testing for Independence of Attributes

• 5.4. Case Study: Parametric and Non-Parametric Tests

**6. Linear Regression Analysis**

• 6.1. Model Building

6.1.1. Fitting a Linear Regression Model

6.1.2. Testing the Significance of Individual Regressors and Overall Regression

6.1.3. Goodness of the Model: R Square and Adjusted R Square

• 6.2. Multicollinearity

6.2.1. Problem and its Consequences

6.2.2. Detection and Removal of Multicollinearity using Correlation Analysis

6.2.3. Detection and Removal of Multicollinearity using Variance Inflation Factors (VIFs)

• 6.3. Parsimonious Modelling or Model Selection

6.3.1. Forward Selection

6.3.2. Backward Elimination

6.3.3. Stepwise Selection

• 6.4. Validation of Assumptions and Residual Analysis

6.4.1. Linearity of Regression

6.4.2. Autocorrelation

6.4.3. Heteroscedasticity

6.4.4. Normality of Errors

6.4.5. Outlier Detection

• 6.5. Case Study: Regression Analysis for Motor Trend Car Road Tests Dataset

**7. Logistic Regression Analysis**

• 7.1. Fitting a Logistic Regression Model

• 7.2. Testing the Significance of Individual Regressors and Overall Regression

• 7.3. Goodness of the Model: Confusion Matrix, Sensitivity and Specificity

• 7.4. Odds Ratio

• 7.5. Multiclass Classification

• 7.6. Case Study: Logistic Regression Analysis for Students’ Admission Dataset

**8. Random Forest**

**9. Forecasting and Time Series Analysis**

• 9.1. Estimating and Eliminating the Deterministic Components, if Present in the Model

9.1.1. Testing for Presence of Trend - Relative Ordering Test

9.1.2. Estimation and Elimination of Trend - Small Trend Method, Least Squares Method, Moving Averages Method

9.1.3. Testing for Presence of Seasonality - Friedman (JASA) Test

9.1.4. Estimation and Elimination of Seasonality - Small Trend Method, Large Trend Method

• 9.2. Modeling the residual using Auto Regressive Integrated Moving Average (ARIMA) model

9.2.1. Testing for ‘stationarity’ using Augmented Dickey Fuller (ADF) Test

9.2.2. Identifying the ‘order’ of the ARMA model using Correlogram, Partial Correlogram and Akaike Information Criterion (AIC)

9.2.3. ‘Fitting’ the model using Least Squares (LSE) and/or Maximum Likelihood Estimation (MLE)

9.2.4. ‘Forecasting’ or predicting future values using Naive, Moving Average, Growth, and Random Walk with Drift forecasts

• 9.3. Case Study - Forecasting and Time Series Analysis for Air Passengers Data

**10. Unsupervised Learning - Cluster and Factor Analysis**

Across more than 80 Analytics (SAS, SPSS, Excel, SQL & R) engagements conducted in the last 12 months, we have had more than 1,000 participants from some of the most prominent colleges and courses at Delhi University (DSE, SRCC, Stephen's, DRC, SSCBS, Miranda, Dept. of OR, Stats and many more), JNU, Jamia, Ambedkar University, IIT Delhi, etc., as well as working professionals from companies such as American Express, RMS, Mercer Consulting, WNS, Koncept Analytics, KBR and many more, with an average rating of 9/10 across all our engagements.