Statistics and Analytics: Data Preparation and Cleaning for Analytics

Books on data mining (my own included) usually focus on the statistical and machine learning algorithms used to make predictions, find associations, and so on. Real-world data miners, however, spend most of their time preparing and cleaning the data. This potentially overwhelming task becomes easier if you can learn from the experience of hundreds of other data miners and break the task down into a standard set of steps and procedures. Learn how in Dr. Robert Nisbet's "Data Prep and Cleaning for Analytics" at Statistics.com. For more details, please visit http://www.statistics.com/data-prep/.

"Data Preparation and Cleaning for Analytics" covers joining and merging tables, recoding data, detecting outliers, dealing with missing data, deriving new variables, and more. The course culminates in a data mining project in which you will bring data through the cleaning and preparation stages to the point where you implement a data mining model.

Who Should Take This Course:

Anyone involved in the specification or preparation of data for data mining or predictive modeling applications.

Course Program:

Lesson 1 - Introduction

Introduction to the course elements
Introduction to the major elements of a data mining project
The iterative nature of data mining
Perform several common data description analyses
Submit a data description report

Lesson 2 - Data Integration, Cleaning, Standardization

Metadata analysis of multiple data sources
Merge multiple tables/files with the same structure
Join multiple tables/files with different structures
Data lookup operation
Assemble the Customer Analytic Record (CAR)
"Dirty data" analysis and deletion
Data recoding
Outlier analysis and deletion
Missing data imputation by multiple regression or decision tree
Data standardization and normalization
Reverse Pivoting
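
To give a flavor of two of the Lesson 2 topics, joining files with different structures and standardizing a variable, here is a minimal sketch in plain Python. It is an illustration only and not taken from the course materials; the records, field names, and the helper functions left_join and standardize are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical customer and spending records (illustrative data only).
customers = [
    {"cust_id": 1, "region": "East"},
    {"cust_id": 2, "region": "West"},
    {"cust_id": 3, "region": "East"},
]
spend = [
    {"cust_id": 1, "amount": 100.0},
    {"cust_id": 2, "amount": 200.0},
    {"cust_id": 3, "amount": 300.0},
]

def left_join(left, right, key):
    """Join two lists of dicts with different structures on a shared key.

    Assumes at most one row per key in `right`; unmatched left rows are kept.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]

def standardize(rows, field):
    """Add a z-score column: (value - mean) / population standard deviation.

    Assumes the field is present in every row and is not constant.
    """
    values = [row[field] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**row, field + "_z": (row[field] - mu) / sigma} for row in rows]

# One combined record per customer, with a standardized spending column.
car = standardize(left_join(customers, spend, "cust_id"), "amount")
```

Each row of car now carries the customer fields, the spending amount, and an amount_z column centered on zero, which is roughly what assembling a small analytic record looks like.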

Lesson 3 - Operations on Variables

Assign variable weights
Balance data sets with rare target values
Create data abstractions for categorical variables
Create temporal abstraction (lag) variables
Perform a data de-duplication operation
Perform a data filtering operation
Perform a simple random sampling operation
Perform a stratified random sampling operation
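
As a rough illustration of two Lesson 3 operations, stratified random sampling and lag variables, the following plain-Python sketch may help. It is not the course's own procedure; the function names stratified_sample and add_lag, the "target" field, and the sample data are assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, stratum_key, fraction, seed=0):
    """Draw the same fraction from each stratum (at least one row per stratum)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[stratum_key]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def add_lag(values, lag=1):
    """Return the series shifted by `lag` steps, padding the start with None."""
    return ([None] * lag + list(values))[:len(values)]

# Hypothetical data set with a rare target value (10 positives in 100 rows).
rows = [{"id": i, "target": 1 if i < 10 else 0} for i in range(100)]
sample = stratified_sample(rows, "target", fraction=0.2)

# Hypothetical monthly series and its one-month lag.
monthly_spend = [10, 20, 30, 40]
lagged_spend = add_lag(monthly_spend, lag=1)
```

Sampling within each stratum keeps the rare target value represented in the same proportion as in the full data, which simple random sampling does not guarantee.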

Lesson 4 - Operations on Variables, cont.

Perform a data binning operation for continuous variables
Understand how to use data bins
Create "dummy" variables for categorical variables
Derive new continuous variables for data mining
Derive new categorical variables for data mining
Perform feature selection using simple correlation coefficients
Perform feature selection using various advanced methods
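
The Lesson 4 ideas of binning, dummy variables, and correlation-based feature screening can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the methods taught in the course; the function names, the equal-width binning scheme, and the example values are hypothetical.

```python
from math import sqrt

def equal_width_bin(value, low, high, n_bins):
    """Map a continuous value in [low, high] to a bin index 0..n_bins-1.

    Assumes low <= value <= high; the top edge falls in the last bin.
    """
    width = (high - low) / n_bins
    return min(int((value - low) / width), n_bins - 1)

def dummy_variables(value, categories):
    """One 0/1 indicator per category, dropping the first as the reference level."""
    return {f"is_{c}": int(value == c) for c in categories[1:]}

def pearson(xs, ys):
    """Simple Pearson correlation coefficient between two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

age_bin = equal_width_bin(25, low=0, high=100, n_bins=4)
region = dummy_variables("West", ["East", "West", "North"])
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
```

A simple screening rule would then keep candidate variables whose absolute correlation with the target exceeds some chosen threshold, though the threshold itself is a judgment call.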

Dr. Robert Nisbet has over 35 years’ experience in analytics and modeling as a college professor, researcher, and data miner in telecommunications, retail, membership clubs (AAA), insurance, and banking. He is the lead author of the "Handbook of Statistical Analysis and Data Mining Applications." He is also skilled in the use of Extract-Transform-Load (ETL) tools for building dependent data marts designed for management reporting and data mining.

You will be able to ask questions and exchange comments with Dr. Robert Nisbet via a private discussion board throughout the course. The course takes place online at statistics.com in a series of four weekly lessons and assignments, and requires about 15 hours per week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or to concentrate your work into just a couple of days.

For Indian participants, statistics.com accepts registration for its courses at special prices in Indian Rupees through its partner, the Center for eLearning and Training (C-eLT), Pune.

For India registration and pricing, please visit www.india.statistics.com.

Email: info@c-elt.com

Call: 020 66009116


Thursday, 20 September 2012
