Tuesday 7 January 2014

Data Mining in R

Data mining is a fast-growing area in statistics, but if you lack institutional access to standard data mining software from SAS and IBM/SPSS, your options are limited. If you are interested in data mining, Dr. Luis Torgo's online course "Data Mining in R-Learning with Case Studies" introduces you to both R and data mining.

This course teaches you how to do data mining in the increasingly dominant open-source R software. It follows a "learn by doing" strategy, where data mining topics are introduced as needed when addressing a series of real-world data mining case studies. Join Dr. Luis Torgo in his online course "Data Mining in R-Learning with Case Studies" at statistics.com. For more details, please visit http://www.statistics.com/data-mining-r.

Aim of Course:
The main goal of this course is to teach users how to perform data mining tasks using R. The course follows a learn-by-doing strategy, where data mining topics are introduced as needed when addressing a series of real-world data mining case studies.

Who can take this course:
  • R users who want to learn how to apply R to data mining
  • Data mining analysts in search of new tools
  • Students in statistics.com's PASS program in Data Mining seeking an affordable data mining tool

Note that working in R will be more involved than using a specially designed interface for data mining, such as those found in major commercial data mining programs.

Course Program:

The course is structured as follows:
SESSION 1: Predicting Algae Blooms (Case Study 1)
  • Descriptive statistics
  • Data visualization
  • Strategies to handle unknown variable values
  • Regression tasks
  • Evaluation metrics for regression tasks

SESSION 2: Predicting Algae Blooms (Continuation of Case Study 1)
  • Multiple linear regression
  • Regression trees
  • Model selection/comparison through k-fold cross-validation

SESSION 3: Detecting Fraudulent Transactions (Case Study 2)
  • Clustering methods
  • Classification methods
  • Imbalanced class distributions and methods for handling this type of problem
  • Naive Bayes classifiers
  • Precision/recall and precision/recall curves

SESSION 4: Classifying Microarray Samples (Case Study 3)
  • Feature selection methods for problems with a very large number of predictors
  • Random forests
  • k-Nearest neighbors

The instructor, Dr. Luis Torgo, is an Associate Professor in the Department of Computer Science of the Faculty of Sciences at the University of Porto and a researcher at the Laboratory of Artificial Intelligence and Data Analysis (LIAAD), which belongs to INESC Porto LA. He is the author of Data Mining With R, as well as a number of scholarly articles and other publications. He teaches R at different levels and has given courses in the use of R for data mining in several countries. Dr. Torgo will be available to course participants throughout the course period, taking comments and questions on a private discussion forum.

Schedule:
January 10, 2014 to February 07, 2014
June 27, 2014 to July 25, 2014

You will be able to ask questions and exchange comments with the instructor via a private discussion board throughout the course. The course takes place online at statistics.com in a series of four weekly lessons and assignments and requires about 15 hours per week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or to concentrate your work in just a couple of days.

For Indian participants, statistics.com accepts registration for its courses at special prices in Indian Rupees through its partner, the Center for eLearning and Training (C-eLT), Pune.

For India registration and pricing, please visit www.india.statistics.com.

Call: 020 6680 0300
