Statistics and Analytics

Statistics, Analytics, Calculus, Modeling, Bayesian, Logistic Regression, Spatial Analysis, GIS, Clinical Trials, Data Mining, Microarray, R Programming, STATISTICA, Natural Language Processing, Sentiment Analysis, Text Mining, Rasch, Sample Size, Survey, Simulation, Life Science, Biostatistics, Pharmacokinetics, Bioequivalence, Epidemiology, Bootstrap, Meta Analysis, Inference, Survival Analysis, Forecasting, Linear Models, Quantitative Risk, Sampling, ANOVA

Monday 24 September 2012

Analysis of Survey Data from Complex Sample Designs

When you first took statistics, surveys such as presidential opinion polls were probably prominent in learning inference for proportions. Unfortunately, the "simple random sample" from your textbook is more myth than reality. Most surveys nowadays are complex, using stratification, multi-stage sampling, cluster sampling, and so on. Analysis via a simple "confidence interval for a proportion" is rarely suitable.

In "Analysis of Survey Data from Complex Sample Designs," taught by Dr. Brady T. West and Ms. Patricia Berglund at Statistics.com, you'll learn how to estimate variances for complex surveys and how to model the results using linear regression, logistic regression, and other generalized linear models. For more details, please visit http://www.statistics.com/surveycomplex.

Participants can use R, WesVar, or IVEware (free packages) or SAS, Stata, SUDAAN, or SPSS (commercial packages; SPSS users must purchase the Complex Samples module).

Aim of Course:

In order to extract maximum information at minimum cost, sample designs are typically more complex than simple random samples. Cluster sampling and stratified designs are common. But how do you analyze the resulting data - in particular, how do you determine margins of error? This course teaches you how to estimate variances when analyzing survey data from complex samples, and also how to fit linear and logistic regression models to complex sample survey data.
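
To make the margin-of-error question concrete, here is a minimal Python sketch of one common design-based approach: Taylor series linearization with the with-replacement "ultimate cluster" approximation, applied to a weighted mean. It is an illustration under stated assumptions, not the course's own material. The function name, the record layout (stratum, PSU, weight, value), and the toy data are invented for this example, and finite population corrections are ignored.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(records):
    """Weighted mean and linearized standard error.

    records: iterable of (stratum, psu, weight, y) tuples (hypothetical layout).
    Assumes at least two PSUs (primary sampling units) per stratum.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized scores for a ratio mean, summed up to the PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Between-PSU variability within each stratum, added across strata.
    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        z = list(psus.values())
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in z)
    return mean, sqrt(variance)


# Toy data: 2 strata, 2 PSUs per stratum, 2 respondents per PSU, equal weights.
sample = [
    (1, "A", 1.0, 1.0), (1, "A", 1.0, 3.0),
    (1, "B", 1.0, 5.0), (1, "B", 1.0, 7.0),
    (2, "C", 1.0, 2.0), (2, "C", 1.0, 4.0),
    (2, "D", 1.0, 6.0), (2, "D", 1.0, 8.0),
]
mean, se = weighted_mean_with_se(sample)
print(f"mean = {mean:.2f}, design-based SE = {se:.3f}")
```

In this toy data the design-based standard error is about 1.41, while the simple random sampling formula applied to the same eight values would give about 0.87, because respondents in the same PSU resemble each other.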

Who Should Take This Course:

Anyone designing surveys or analyzing survey data.

Course Program:

SESSION 1: Overview

§ Applied Survey Data Analysis: An Overview

Important terms, concepts, and notation
Software Overview

§ Getting to Know the Complex Sample Design

Classification of Sample Designs
Target Populations and Survey Populations
Simple Random Sampling
Complex Sample Design Effects
Complex Samples: Clustering and Stratification
Weighting in Analysis of Survey Data
Multi-stage Area Probability Sample Designs
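
As a rough illustration of the design effect and weighting topics listed above, the following Python sketch computes two widely used approximations attributed to Kish: the clustering design effect 1 + (m - 1) * rho and the unequal-weighting design effect n * sum(w^2) / (sum w)^2. The function names and the example numbers are assumptions made for this sketch only.

```python
def cluster_design_effect(avg_cluster_size, icc):
    """Kish approximation: deff = 1 + (m - 1) * rho for equal-sized clusters."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def weighting_design_effect(weights):
    """Kish approximation for unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def effective_sample_size(n, deff):
    """Size of a simple random sample with roughly the same precision."""
    return n / deff


# Hypothetical survey: 1000 respondents, 20 per cluster, intraclass correlation 0.05.
deff = cluster_design_effect(20, 0.05)
print(f"deff = {deff:.2f}, effective n = {effective_sample_size(1000, deff):.0f}")
```

Even a small intraclass correlation can cut the effective sample size substantially when clusters are large, which is why the simple random sampling formulas tend to understate the margin of error.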

SESSION 2: Overview continued

§ Foundations and Techniques for Design-based Estimation and Inference

Finite Populations and Superpopulation Models
Confidence Intervals for Population Parameters
Weighted Estimation of Population Parameters
Probability Distributions and Design-based Inference
Variance Estimation
Hypothesis Testing in Survey Data Analysis
Total Survey Error
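
Replication is another family of variance estimation methods. The sketch below shows a delete-one-PSU jackknife for a stratified design (often called JKn) applied to a weighted mean. It is a simplified illustration, not taken from the course: the record layout (stratum, PSU, weight, value), the function names, and the toy data are assumptions.

```python
from collections import defaultdict


def replicate_weight(record, stratum, dropped_psu, n_h):
    """Weight of one record in the replicate that drops `dropped_psu`."""
    rec_stratum, rec_psu, weight, _ = record
    if rec_stratum != stratum:
        return weight                 # other strata are left unchanged
    if rec_psu == dropped_psu:
        return 0.0                    # the dropped PSU contributes nothing
    return weight * n_h / (n_h - 1)   # remaining PSUs are scaled up


def jackknife_mean_variance(records):
    """Weighted mean and its JKn variance; needs >= 2 PSUs per stratum."""
    records = list(records)
    psus = defaultdict(set)
    for stratum, psu, _, _ in records:
        psus[stratum].add(psu)

    def mean_with(weights):
        return sum(w * r[3] for w, r in zip(weights, records)) / sum(weights)

    full = mean_with([r[2] for r in records])
    variance = 0.0
    for stratum, members in psus.items():
        n_h = len(members)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        for dropped in members:
            weights = [replicate_weight(r, stratum, dropped, n_h) for r in records]
            variance += (n_h - 1) / n_h * (mean_with(weights) - full) ** 2
    return full, variance


# Toy data: 2 strata, 2 PSUs per stratum, equal weights.
toy = [
    (1, "A", 1.0, 1.0), (1, "A", 1.0, 3.0),
    (1, "B", 1.0, 5.0), (1, "B", 1.0, 7.0),
    (2, "C", 1.0, 2.0), (2, "C", 1.0, 4.0),
    (2, "D", 1.0, 6.0), (2, "D", 1.0, 8.0),
]
print(jackknife_mean_variance(toy))
```

Each replicate re-estimates the statistic after dropping one PSU and reweighting the rest of its stratum, so the same loop works for statistics that have no convenient linearization formula.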

§ Preparation for Complex Sample Survey Data Analysis

Analysis Weights: Review by the Data User
Understanding and Checking the Sampling Error Calculation Model
Addressing Item Missing Data in Analysis Variables
Preparing to Analyze Data from Sample Subclasses
A Final Checklist for Data Users

SESSION 3: Descriptive Statistics

§ Descriptive Analysis for Continuous Variables

Special Considerations in Descriptive Analysis of Complex Sample Survey Data
Simple Statistics for Univariate Continuous Distributions
Bivariate Relationships between Two Continuous Variables
Descriptive Statistics for Subpopulations
Linear Functions of Descriptive Estimates and Differences of Means

§ Categorical Data Analysis

A Framework for Analysis of Categorical Survey Data
Univariate Analysis of Categorical Data
Bivariate Analysis of Categorical Data
Analysis of Multivariate Categorical Data

SESSION 4: Regression Models

§ Linear Regression Models

The Linear Regression Model
Fitting Linear Regression Models to Survey Data
Four Steps in Linear Regression Analysis
Some Practical Considerations and Tools
Application: Modeling Diastolic Blood Pressure with the NHANES Data

§ Logistic Regression and Generalized Linear Models for Binary Survey Variables

Generalized Linear Models (GLMs) for Binary Survey Responses
Building the Logistic Regression Model: Stage 1-Model Specification
Building the Logistic Regression Model: Stage 2-Estimation of Model Parameters and Standard Errors
Building the Logistic Regression Model: Stage 3-Evaluation of the Fitted Model
Building the Logistic Regression Model: Stage 4-Interpretation and Inference
Analysis Application
Comparing the Logistic, Probit, and Complementary-Log-Log (C-L-L) GLMs for Binary Dependent Variables

The instructors are Dr. Brady West (Lead Statistician, Center for Statistical Consultation and Research, Univ. of Michigan) and Ms. Patricia Berglund (Senior Research Associate in the Youth and Social Indicators Program and the Survey Methodology Program at the University of Michigan-Institute for Social Research). Brady West is the lead author of "Linear Mixed Models: A Practical Guide using Statistical Software" (Chapman & Hall/CRC) and a co-author of "Applied Survey Data Analysis" (Chapman & Hall/CRC).

You will be able to ask questions and exchange comments with Dr. Brady West and Ms. Patricia Berglund via a private discussion board throughout the course. The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.

For Indian participants statistics.com accepts registration for its courses at special prices in Indian Rupees through its partner, the Center for eLearning and Training (C-eLT), Pune.

For India Registration and pricing, please visit us at www.india.statistics.com.

Email: info@c-elt.com

Call: 020 66009116

Websites:

www.india.statistics.com

www.c-elt.com

Thursday 20 September 2012

Data Preparation and Cleaning for Analytics

Books on data mining (my own included) usually focus on the statistical and machine learning algorithms used to make predictions, associations, etc. Real-world data miners, however, spend most of their time preparing and cleaning the data. This potentially overwhelming task is easier, though, if you can learn from the experience of hundreds of other data miners and break the task down into a standard set of steps and procedures. Learn how in Dr. Robert Nisbet's "Data Prep and Cleaning for Analytics" at Statistics.com. For more details, please visit http://www.statistics.com/data-prep/.

"Data Preparation and Cleaning for Analytics" covers joining and merging tables, recoding data, detecting outliers, dealing with missing data, deriving new variables, and more. The course culminates in a data mining project in which you will bring data through the cleaning and preparation stages, and to the point where you implement a data mining model.

Who Should Take This Course:

Anyone involved in the specification or preparation of data mining or predictive modeling applications.

Course Program:

Lesson 1 - Introduction

Introduction to the course elements
Introduction to the major elements of a data mining project
The iterative nature of data mining
Perform several common data description analyses
Submit a data description report

Lesson 2 - Data Integration, Cleaning, Standardization

Metadata analysis of multiple data sources
Merge multiple tables/files with the same structure
Join multiple tables/files with different structures
Data lookup operation
Assemble the Customer Analytic Record (CAR)
"Dirty data" analysis and deletion
Data recoding
Outlier analysis and deletion
Missing data imputation by multiple regression or decision tree
Data standardization and normalization
Reverse Pivoting
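
Several of the cleaning steps in this lesson can be sketched in a few lines of Python. The example below flags outliers with the common 1.5 x IQR rule, fills missing values, and standardizes to z-scores. It is a simplified illustration only: the function names and data are invented, and mean imputation is used as a stand-in for the regression or decision tree imputation named in the syllabus.

```python
from statistics import mean, pstdev, quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences from the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_outliers_as_missing(raw, k=1.5):
    """Replace values outside the IQR fences with None; keep existing None."""
    low, high = iqr_bounds([v for v in raw if v is not None], k)
    return [v if v is not None and low <= v <= high else None for v in raw]


def impute_mean(values):
    """Fill None with the mean of the observed values (a deliberately simple choice)."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Convert to z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


raw_column = [10, 12, None, 11, 13, 12, 11, 95]   # one missing value, one suspect value
cleaned = impute_mean(flag_outliers_as_missing(raw_column))
print(cleaned)
print([round(z, 2) for z in standardize(cleaned)])
```

Treating outliers as missing and then imputing them is only one possible policy; whether to delete, cap, or keep extreme values depends on the data and the modeling goal.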

Lesson 3 - Operations on Variables

Assign variable weights
Balance data sets with rare target values
Create data abstractions for categorical variables
Create temporal abstraction (lag) variables
Perform a data de-duplication operation
Perform a data filtering operation
Perform a simple random sampling operation
Perform a stratified random sampling operation
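
Three of the operations in this lesson (lag variables, de-duplication, and stratified random sampling) are sketched below in plain Python. The helper names, the row layout, and the proportional allocation rule are assumptions made for illustration; a real project would more likely use a data frame library or the course's own tools.

```python
import random
from collections import defaultdict


def lag(series, k=1):
    """Shift a series back by k periods, padding the start with None."""
    n = len(series)
    k = min(k, n)
    return [None] * k + list(series[:n - k])


def deduplicate(rows, key):
    """Keep the first row seen for each key, preserving the original order."""
    seen, unique = set(), []
    for row in rows:
        row_key = key(row)
        if row_key not in seen:
            seen.add(row_key)
            unique.append(row)
    return unique


def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Draw the same fraction (at least one row) from every stratum."""
    rng = random.Random(seed)          # fixed seed so the draw is repeatable
    groups = defaultdict(list)
    for row in rows:
        groups[stratum_of(row)].append(row)
    sample = []
    for stratum in sorted(groups):
        members = groups[stratum]
        size = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, size))
    return sample


print(lag([100, 120, 90, 130]))                       # previous-period values
customers = [("gold", i) for i in range(10)] + [("silver", i) for i in range(4)]
print(stratified_sample(customers, lambda row: row[0], 0.5))
```

Sampling within strata guarantees that a small segment (here the hypothetical "silver" group) still appears in the sample, which a simple random sample of the same size does not guarantee.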

Lesson 4 - Operations on Variables, cont.

Perform a data binning operation for continuous variables
Understand how to use data bins
Create "dummy" variables for categorical variables
Derive new continuous variables for data mining
Derive new categorical variables for data mining
Perform feature selection using simple correlation coefficients
Perform feature selection using various advanced methods
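
The binning, dummy variable, and correlation-based feature selection steps can be sketched as follows. This is a minimal illustration with invented function names and data: it uses equal-width bins, reference-level dummy coding, and the absolute Pearson correlation as the ranking score, which are assumptions rather than the specific methods taught in the course.

```python
from math import sqrt
from statistics import mean


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def dummy_variables(categories):
    """0/1 indicator columns for every level except the first (reference) level."""
    levels = sorted(set(categories))
    return {lvl: [int(c == lvl) for c in categories] for lvl in levels[1:]}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_features(features, target):
    """Feature names ordered by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(equal_width_bins([0, 5, 10], 2))
print(dummy_variables(["a", "b", "a", "c"]))
print(rank_features({"weak": [1, -1, 1, -1], "strong": [2, 4, 6, 8]}, [1, 2, 3, 4]))
```

Simple correlation only detects linear, one-variable-at-a-time relationships, which is one reason more advanced selection methods are covered separately.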

Dr. Robert Nisbet has over 35 years’ experience in analytics and modeling as a college professor, researcher, and data miner in telecommunications, retail, membership clubs (AAA), insurance, and banking. He is the lead author of the "Handbook of Statistical Analysis and Data Mining Applications." He is also skilled in the use of Extract-Transform-Load (ETL) tools for building dependent data marts designed for management reporting and data mining.

You will be able to ask questions and exchange comments with Dr. Robert Nisbet via a private discussion board throughout the course. The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.

Tuesday 18 September 2012

Advanced Optimization

Optimization methods such as network modeling and integer programming are among the most engaging analytics techniques - almost magical in the way they deliver cost savings (or revenue boosts) in problems that feature both transparency and complexity. Our online course "Advanced Optimization" covers:

- Network Flow Problems

- Integer Programming

- Problems with Multiple Goals

- Nonlinear Programming

Learn more in the online course "Advanced Optimization," taught by Dr. Cliff Ragsdale at Statistics.com. For more details, please visit http://www.statistics.com/optimization-advanced/.

Aim of Course:

Many business problems involve flows through a network: transportation, stages of an industrial process, routing of data. Students taking this course will learn to specify and implement optimization models that solve network problems (for example, what is the shortest path through a network, or what is the least-cost way to route material through a network with multiple supply nodes and multiple demand nodes?). Students will also learn how to solve Integer Programming (IP) problems, which are constrained optimization problems in which one or more decision variables must take integer values (e.g., a firm setting up a wi-fi hotspot could use 2 routers or 3 routers, but not 2.5 routers), and Nonlinear Programming (NLP) problems. Students will use spreadsheet-based software to specify and implement models.
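
For readers who want to see a shortest-path problem outside a spreadsheet, here is a small Python sketch. Note that it uses Dijkstra's algorithm, a swapped-in technique chosen for brevity, rather than the spreadsheet optimization formulation the course teaches. The network, node names, and arc costs are invented, and arc costs are assumed to be non-negative.

```python
import heapq
from math import inf


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative arc costs.

    Returns (total_cost, path); the path is empty if the target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, inf):
            continue                      # stale heap entry, already improved
        for neighbour, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbour, inf):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in dist:
        return inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical network: arcs with shipping costs.
network = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}
print(shortest_path(network, "A", "D"))
```

In this toy network the cheapest route from A to D goes through C and B at a total cost of 8, beating both direct-looking alternatives.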

Who Should Take This Course:

Business analysts with responsibility for specifying, creating, deploying or interpreting quantitative decision models. Users of optimization software who need to attain a more solid grounding in network optimization, integer programming, non-convex optimization, and multi-criteria optimization.

Course Program:

Course outline: The course is structured as follows:

SESSION 1: Network Flow Problems

Characteristics (nodes, arcs, decision variables)
The objective function & constraints
Modeling in a spreadsheet

SESSION 2: Integer Linear Programming

Integrality condition, relaxation
Rounding
Stopping rules
Binary variables
Implementing/solving the model
Branch & bound
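
The ideas of relaxation, rounding, and branch and bound can be illustrated on a tiny 0/1 knapsack problem. The Python sketch below is an assumption-laden teaching example, not the solver used in the course: it bounds each branch with the linear programming relaxation (items may be taken fractionally) and prunes branches whose bound cannot beat the best integer solution found so far. Function names and data are invented, and item weights are assumed to be positive.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Maximize total value of chosen items subject to a weight capacity."""
    n = len(values)
    # Sort by value density so the greedy fractional fill solves the relaxation.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    best = {"value": 0, "choice": [0] * n}

    def relaxation_bound(k, value, room):
        """Upper bound: fill remaining room greedily, allowing one fractional item."""
        bound = value
        for i in range(k, n):
            if w[i] <= room:
                room -= w[i]
                bound += v[i]
            else:
                bound += v[i] * room / w[i]
                break
        return bound

    def branch(k, value, room, choice):
        if value > best["value"]:
            best["value"] = value
            best["choice"] = choice + [0] * (n - k)
        if k == n or relaxation_bound(k, value, room) <= best["value"]:
            return                                   # leaf reached or branch pruned
        if w[k] <= room:
            branch(k + 1, value + v[k], room - w[k], choice + [1])   # take item k
        branch(k + 1, value, room, choice + [0])                      # skip item k

    branch(0, 0, capacity, [])
    selected = [0] * n
    for position, original_index in enumerate(order):
        selected[original_index] = best["choice"][position]
    return best["value"], selected


print(knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50))
```

In this example the relaxation at the root has value 240 using a fraction of the third item; rounding that fraction down gives only 160, while branch and bound finds the integer optimum of 220.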

SESSION 3: Multiple goals

Soft/hard constraints
Defining the objective
Analysis/solution
Tradeoffs & goal revision
Multiple objective linear programming (MOLP)
Minimax

SESSION 4: Nonlinear Programming (NLP)

Generalized reduced gradient (GRG) overview
Local vs. Global optimality
Economic Order Quantity (EOQ) problem
Location problem
Evolutionary Optimization
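
The Economic Order Quantity problem makes a handy small NLP example. The Python sketch below uses the textbook EOQ cost model (annual ordering cost plus annual holding cost) and compares its closed-form optimum, Q* = sqrt(2DS/H), with a numerical search. The symbols D (annual demand), S (cost per order), and H (holding cost per unit per year), the demand figures, and the golden-section search are assumptions made here for illustration; the search is a simple stand-in for a nonlinear solver, not the GRG algorithm itself.

```python
from math import sqrt


def eoq_cost(q, demand, order_cost, holding_cost):
    """Annual ordering cost plus annual holding cost for order quantity q."""
    return demand / q * order_cost + q / 2 * holding_cost


def eoq_closed_form(demand, order_cost, holding_cost):
    """Classic EOQ formula: Q* = sqrt(2 * D * S / H)."""
    return sqrt(2 * demand * order_cost / holding_cost)


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


D, S, H = 1200, 100, 6     # hypothetical demand and cost figures
q_formula = eoq_closed_form(D, S, H)
q_search = golden_section_minimize(lambda q: eoq_cost(q, D, S, H), 1, 1000)
print(q_formula, round(q_search, 3), eoq_cost(q_formula, D, S, H))
```

Because this cost function has a single minimum for positive order quantities, a local search finds the global optimum; the local versus global distinction in the outline matters for problems where that is not the case.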

The instructor, Dr. Cliff T. Ragsdale, is Bank of America Professor of Business Information Technology at Virginia Tech, and author of the #1 optimization text "Spreadsheet Modeling and Decision Analysis: A Practical Introduction to Management Science," now in its sixth edition. Dr. Ragsdale has served as a financial, statistical and information systems consultant for General Mills and the public accounting firm of Deloitte and Touche.

Software: Risk Solver Platform for Education, the Excel add-in from Frontline Systems that performs risk analysis, simulation, optimization, decision trees and more. With the purchase or rental of the course text, you will have a course code that will enable you to download and install the software for 140 days.

You will be able to ask questions and exchange comments with Dr. Cliff T. Ragsdale via a private discussion board throughout the course. The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.

Thursday 13 September 2012

Modeling Count Data

Do you work with binary, categorical, or count data? You'd be a rare analyst if you didn't. There are lots of details to consider: when to do exact tests, what they are, how to handle ordinal data, the role of modeling as opposed to significance testing, what to do when the assumptions required for Poisson regression fail, and much more. Learn more in the online course "Modeling Count Data" (Dr. Joseph Hilbe) at Statistics.com.

"Modeling Count Data" deals with regression models where the response or dependent variable is a count or rate. A count is understood as the number of times an event occurs; a rate as how many events occur within a specific area or time interval. The course will cover Poisson regression, the foundation for modeling counts, as well as extensions and modifications to the basic model. Extensions are required when the assumptions underlying the Poisson model are violated. Negative binomial regression is the foremost method used to extend the Poisson model. Since Poisson assumptions are rarely met in practice, substantial attention will be devoted to the negative binomial model and its variants. For more details, please visit http://www.statistics.com/count/.

Who Should Take This Course:

Analysts and researchers in a wide variety of fields who are concerned with modeling counts and rates.

Course Program:

Course outline: The course is structured as follows:

SESSION 1: Overview of Count Models and Methods of Estimation

Varieties of count model
History of count models
Derivation of GLM-based algorithm
Derivation of maximum likelihood count models
Methods of assessing fit for count models
Residual analysis
The nature of risk and risk ratios

SESSION 2: Poisson Regression and the Problem of Overdispersion

Poisson regression
Creating synthetic models; simulation
Predicting counts
Effect plots
Marginal effects/Discrete change
Parameterization as a rate model
Defining extra-dispersion: varieties
Problem of overdispersion: apparent vs real
Tests for handling overdispersion
Negative binomial extra-dispersion
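
To give a feel for Poisson regression and the overdispersion check, here is a minimal Python sketch. It fits a one-predictor Poisson model with a log link by Newton-Raphson and then computes the Pearson dispersion statistic (Pearson chi-square divided by residual degrees of freedom); values well above 1 suggest overdispersion. The function names and the tiny data sets are invented for illustration, the predictor is assumed to take at least two distinct values, and real analyses would normally use Stata, R, or a similar package.

```python
from math import exp, log


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood fit of log(mu) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = log(sum(y) / len(y)), 0.0      # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries.
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def pearson_dispersion(x, y, b0, b1, n_params=2):
    """Pearson chi-square divided by residual degrees of freedom."""
    mu = [exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical counts in two groups with far more spread than a Poisson allows.
group = [0, 0, 1, 1]
counts = [0, 10, 0, 20]
b0, b1 = fit_poisson(group, counts)
print(exp(b0), exp(b1), pearson_dispersion(group, counts, b0, b1))
```

Here the fitted group means are 5 and 10, but the dispersion statistic is 15 rather than roughly 1, the kind of result that would motivate a negative binomial model.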

SESSION 3: Negative Binomial Regression and Alternative Parameterizations

Negative Binomial Regression: varieties, derivation, and distributions
Synthetic data modeling
Marginal effects/Discrete change: NB models
Binomial vs Count models
Geometric regression: canonical and log
Alternative parameterizations: NB-1, NB-C, NB-H, NB-P
Generalized Poisson and negative binomial models
Extended Poisson models: bivariate; Poisson-inverse Gaussian; double Poisson
Extended negative binomial models: bivariate; others

SESSION 4: Problem with Zero Counts; Censored and Truncated Models, Latent Models

Zero-truncated models
Zero-inflated models
Zero-altered models
Hurdle models
Censored count models
Finite Mixture models
Quantile count models
Exact Poisson and negative binomial regression
Project preparation
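
The differences among zero-truncated, zero-inflated, and hurdle models are easiest to see in their probability functions. The Python sketch below writes them out for a Poisson base distribution. The function names and parameter values are assumptions for illustration, and no covariates are included, so this shows the distributions only, not the regression models themselves.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def zero_truncated_pmf(k, lam):
    """Poisson conditioned on Y > 0: zeros cannot occur."""
    if k == 0:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - exp(-lam))


def zero_inflated_pmf(k, lam, pi):
    """Mixture: with probability pi a structural zero, otherwise Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, lam, p_zero):
    """All zeros come from one process; positive counts are zero-truncated Poisson."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * zero_truncated_pmf(k, lam)


lam, pi = 2.0, 0.3          # hypothetical parameter values
print(poisson_pmf(0, lam), zero_inflated_pmf(0, lam, pi), hurdle_pmf(0, lam, 0.3))
```

In the zero-inflated model some zeros still come from the Poisson part, so the probability of a zero exceeds pi, whereas in the hurdle model the zero probability is exactly the hurdle parameter.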

Dr. Joseph Hilbe is President of the International Astrostatistics Association, an Emeritus Professor at the University of Hawaii, Solar System Ambassador with NASA's Jet Propulsion Laboratory at California Institute of Technology, and Adjunct Professor of Statistics at Arizona State University. Dr. Hilbe has authored some twelve books on statistics, over one hundred journal articles, and various packages and functions for Stata and R, and is the author of the COUNT package in R, located on the CRAN website. Dr. Hilbe is Editor-in-Chief of the Springer Series in Astrostatistics.

You will be able to ask questions and exchange comments with Dr. Joseph Hilbe via a private discussion board throughout the course. The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.
