Books on data mining (my own included) usually
focus on the statistical and machine learning algorithms used to make
predictions, associations, etc. Real-world data miners, however,
spend most of their time preparing and cleaning the data. This potentially
overwhelming task is easier, though, if you can learn from the experience of
hundreds of other data miners and break the task down into a standard set of
steps and procedures. Learn how in Dr. Robert Nisbet's "Data Prep
and Cleaning for Analytics" at Statistics.com. For more details, please
visit http://www.statistics.com/data-prep/.
"Data Preparation and Cleaning for
Analytics" covers joining and merging tables, recoding data, detecting
outliers, dealing with missing data, deriving new variables, and more.
The course culminates in a data mining project in which you will bring data
through the cleaning and preparation stages, and to the point where you
implement a data mining model.
Who Should Take This Course:
Anyone involved in the specification or
preparation of data mining or predictive modeling applications.
Course Program:
Lesson 1 - Introduction
- Introduction to the course elements
- Introduction to the major elements of a data mining project
- The iterative nature of data mining
- Perform several common data description analyses
- Submit a data description report
Lesson 2 - Data Integration, Cleaning, Standardization
- Metadata analysis of multiple data sources
- Merge multiple tables/files with the same structure
- Join multiple tables/files with different structures
- Data lookup operation
- Assemble the Customer Analytic Record (CAR)
- "Dirty data" analysis and deletion
- Data recoding
- Outlier analysis and deletion
- Missing data imputation by multiple regression or decision tree
- Data standardization and normalization
- Reverse Pivoting
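To make the difference between "merge" and "join" in the Lesson 2 outline concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the table contents, the field names (`cust_id`, `amount`, `region`) and the helper functions `merge_rows` and `left_join` are all invented for illustration, and a real project would more likely use a database or a data-frame library.

```python
# Hypothetical records; every field name here is invented for illustration.
jan_sales = [{"cust_id": 1, "amount": 20.0}, {"cust_id": 2, "amount": 35.0}]
feb_sales = [{"cust_id": 1, "amount": 15.0}, {"cust_id": 3, "amount": 50.0}]
customers = [
    {"cust_id": 1, "region": "West"},
    {"cust_id": 2, "region": "East"},
]


def merge_rows(*tables):
    """Merge tables that share the same structure by stacking their rows."""
    merged = []
    for table in tables:
        merged.extend(dict(row) for row in table)
    return merged


def left_join(left, right, key):
    """Join tables with different structures on a shared key (left join)."""
    lookup = {row[key]: row for row in right}
    right_fields = {field for row in right for field in row if field != key}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        combined = dict(row)
        for field in sorted(right_fields):
            # Unmatched keys get None, which surfaces missing data early.
            combined[field] = match.get(field)
        joined.append(combined)
    return joined


all_sales = merge_rows(jan_sales, feb_sales)
sales_with_region = left_join(all_sales, customers, "cust_id")
```

Customer 3 has no entry in `customers`, so the joined record carries `region = None`. That is exactly the kind of gap the missing-data steps in this lesson are meant to handle.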
Lesson 3 - Operations on Variables
- Assign variable weights
- Balance data sets with rare target values
- Create data abstractions for categorical variables
- Create temporal abstraction (lag) variables
- Perform a data de-duplication operation
- Perform a data filtering operation
- Perform a simple random sampling operation
- Perform a stratified random sampling operation
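Two of the Lesson 3 operations, stratified random sampling and lag variables, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the course's own method: the `churned` field, the 20% sampling fraction and the function names `stratified_sample` and `lag_variable` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_key, fraction, seed=0):
    """Draw the same fraction from every stratum so rare groups stay represented."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[row[stratum_key]].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one row per stratum
        sample.extend(rng.sample(group, k))
    return sample


def lag_variable(values, lag=1):
    """Shift a time-ordered series by `lag` periods; the first periods get None."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [None if i < lag else values[i - lag] for i in range(len(values))]


# Hypothetical data: 100 customers, 10 of whom churned (the rare target value).
rows = [{"id": i, "churned": i % 10 == 0} for i in range(100)]
sample = stratified_sample(rows, "churned", fraction=0.2)

monthly_spend = [10, 12, 9, 15]
previous_month_spend = lag_variable(monthly_spend, lag=1)  # [None, 10, 12, 9]
```

A simple random sample of 20 rows could easily contain no churners at all; sampling within each stratum yields 2 churners and 18 non-churners, preserving the 10% rate.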
Lesson 4 - Operations on Variables, cont.
- Perform a data binning operation for continuous variables
- Understand how to use data bins
- Create "dummy" variables for categorical variables
- Derive new continuous variables for data mining
- Derive new categorical variables for data mining
- Perform feature selection using simple correlation coefficients
- Perform feature selection using various advanced methods
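For a feel of what binning and "dummy" variables look like in practice, here is a small hedged sketch. It assumes equal-width bins, which is only one of several binning schemes (the outline does not say which the course uses), and the function names, the `is_` prefix and the sample values are made up for the example.

```python
def equal_width_bins(values, n_bins):
    """Assign each continuous value to one of n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def dummy_variables(values, drop_first=False):
    """Turn a categorical variable into 0/1 indicator columns, one per level."""
    levels = sorted(set(values))
    if drop_first:
        # Dropping one level avoids redundant columns in regression-type models.
        levels = levels[1:]
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


ages = [18, 25, 33, 41, 58, 78]
age_bins = equal_width_bins(ages, n_bins=3)  # [0, 0, 0, 1, 2, 2]

regions = ["East", "West", "East", "North"]
region_dummies = dummy_variables(regions)
# region_dummies[0] == {"is_East": 1, "is_North": 0, "is_West": 0}
```

Equal-width bins are easy to explain but can leave some bins nearly empty when the data are skewed; equal-frequency (quantile) bins are a common alternative.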
Dr. Robert Nisbet has over 35 years’ experience in analytics and modeling as a college professor, researcher, and data miner in telecommunications, retail, membership clubs (AAA), insurance, and banking. He is the lead author of the "Handbook of Statistical Analysis and Data Mining Applications." He is also skilled in the use of Extract-Transform-Load (ETL) tools for building dependent data marts designed for management reporting and data mining.
You will be able to ask questions and exchange comments with Dr. Robert Nisbet via a private discussion board throughout the course.
The course takes place online at statistics.com in a series of 4 weekly lessons
and assignments, and requires about 15 hours/week. Participate at your own
convenience; there are no set times when you must be online. You have the
flexibility to work a bit every day, if that is your preference, or concentrate
your work in just a couple of days.
For Indian participants, statistics.com accepts registration for its courses at special
prices in Indian Rupees through its partner, the Center for eLearning and
Training (C-eLT), Pune.
Email: info@c-elt.com
Call: 020 66009116
Websites: