Wednesday, 11 April 2012

Statistical Analysis of Microarray Data with R

The "Southern blot," commonly regarded as the predecessor of today's genetic microarray analysis, originated in the 1970s (the name refers to its creator, Edwin Southern). Since then, the field has become a massive data-generating machine, and statistical analysis is a key component of that machine. You can get a statistical introduction to the topic in Dr. Sudha Purohit's online course "Statistical Analysis of Microarray Data Using R."

In "Statistical Analysis of Microarray Data Using R", Dr. Purohit will cover the statistical methods used to analyze microarray data, how to apply them using R software, and how to interpret the results meaningfully.

This course walks through the process of analyzing microarray data. You will learn how to preprocess the data, shortlist the differentially expressed genes, carry out principal component analysis to reduce dimensionality and detect interesting gene expression patterns, and cluster genes and samples. The statistical issues that arise at each stage of the analysis will be illustrated with real data sets from DNA microarray experiments.

Dr. Sudha Purohit is a visiting lecturer in statistics at the University of Pune and, before her retirement in 2000, was Head of the Department of Statistics at A. G. College, Pune, India. She is a co-author of three books: "Life-Time Data: Statistical Models and Methods", "Introduction to Biometry", and (with Dr. Shailaja Deshmukh) "Microarray Data: Statistical Analysis Using R". She is also a co-author, with Prof. Shailaja Deshmukh and Dr. Sharad Gore, of "Statistics Using R". Her areas of interest are survival analysis, reliability, programming with R, and analysis of microarray data. She has published a number of research papers in various peer-reviewed journals. Participants can ask questions and exchange comments with Dr. Purohit on a private discussion board throughout the course.

Aim of the course:
In this course, participants will learn the statistical tools required for the analysis of microarray data, how to apply them using R software, and how to interpret the results meaningfully. We will review the biology relevant to microarray data, then cover the microarray experimental setup, quantification of the information generated by the experiment, preprocessing of the data (including statistical tools for between-array and within-array normalization), statistical inference procedures to identify genes differentially expressed under two different conditions, and their extension to situations involving more than two conditions. The course will also introduce multivariate statistical tools such as principal component analysis and cluster analysis. These tools help identify differentially expressed genes and sets of co-regulated genes, which in turn helps assign functions to genes.

Who Should Take This Course:
Biologists and geneticists who need to use statistical methods to analyze microarray data, as well as computer scientists and statisticians involved in microarray analysis projects. The course is designed to bridge the gap between several disciplines by providing the necessary information to participants with varied backgrounds.

Course Program:

Course outline: The course is structured as follows:


SESSION 1: Background of Microarrays and Normalization
  • Microarray experimental setup and quantification of the information available from microarray experiments.
  • Data cleaning.
  • Transformation of data.
  • Between-array and within-array normalization.
  • Concordance coefficients and their use in normalization.
  • Numerical illustration of data transformation, normalization, and concordance coefficients, with a complete set of annotated R commands.

SESSION 2: Statistical Inference Procedures in Comparative Experiments
  • Basics of statistical hypothesis testing.
  • Two-sample t-test.
  • Paired t-test.
  • Tests for validating the assumptions of the t-test.
  • Welch test.
  • Wilcoxon rank-sum test and signed-rank test.
  • Adjustments for multiple hypothesis testing, including the false discovery rate.
  • Numerical illustration of the tests and adjustments above, with a complete set of annotated R commands.
  • One-way ANOVA.

SESSION 3: Multivariate Techniques
  • Principal component analysis.

SESSION 4: Clustering
  • Cluster analysis.


You will be able to ask questions and exchange comments with the instructors via a private discussion board throughout the course. The course takes place online in a series of four weekly lessons and assignments, and requires about 15 hours per week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or to concentrate your work in just a couple of days.

Indian participants can register for the course at special prices in Indian Rupees through a partner organization, the Center for eLearning and Training (C-eLT), Pune.

For registration and pricing in India, please visit us at

If you have any queries, please feel free to call or write to me.

For more details, contact:
Call: 020 66009116

