Monday 24 September 2012

Analysis of Survey Data from Complex Sample Designs


When you first took statistics, surveys such as presidential opinion polls were probably prominent in learning inference for proportions.  Unfortunately, that "simple random sample" from your textbook is more a creature of myth than an actual reality.  Most surveys nowadays are complex, with stratification, multi-stage sampling, cluster sampling, etc.  Analysis via a simple "confidence interval for a proportion" is rarely suitable.

In "Analysis of Survey Data from Complex Sample Designs," taught by Dr. Brady T. West and Ms. Patricia Berglund at Statistics.com, you'll learn how to estimate variances for complex surveys and how to model the results using linear regression, logistic regression, and other generalized linear models. For more details, please visit http://www.statistics.com/surveycomplex.

Participants can use R, WesVar, or IVEware (free packages) or SAS, Stata, SUDAAN, or SPSS (commercial packages; SPSS users must purchase the Complex Samples Module).

Aim of Course:
In order to extract maximum information at minimum cost, sample designs are typically more complex than simple random samples. Cluster sampling and stratified designs are common. But how do you analyze the resulting data - in particular, how do you determine margins of error? This course teaches you how to estimate variances when analyzing survey data from complex samples, and also how to fit linear and logistic regression models to complex sample survey data.
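
To give a rough sense of why the textbook margin of error can mislead, the sketch below compares the simple-random-sample margin of error for a proportion with one inflated by a design effect. It assumes the Kish approximation deff = 1 + (m - 1) * rho for equal-size clusters; the sample size, cluster size, and intraclass correlation are made-up illustrative values, not figures from the course.

```python
import math

def srs_margin_of_error(p, n, z=1.96):
    """Margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

def kish_design_effect(cluster_size, rho):
    """Approximate design effect for equal-size clusters (Kish): 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * rho

def complex_margin_of_error(p, n, deff, z=1.96):
    """SRS margin of error inflated by the square root of the design effect."""
    return srs_margin_of_error(p, n, z) * math.sqrt(deff)

# Hypothetical values chosen only for illustration.
p_hat = 0.5   # estimated proportion
n = 1000      # total sample size
m = 20        # respondents per cluster (assumed equal)
rho = 0.05    # assumed intraclass correlation

deff = kish_design_effect(m, rho)
moe_srs = srs_margin_of_error(p_hat, n)
moe_complex = complex_margin_of_error(p_hat, n, deff)

print(f"design effect: {deff:.2f}")
print(f"SRS margin of error: {moe_srs:.4f}")
print(f"design-adjusted margin of error: {moe_complex:.4f}")
```

Even a modest intraclass correlation widens the interval noticeably here, which is why treating clustered data as a simple random sample tends to understate the margin of error.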

Who Should Take This Course:
Anyone designing surveys or analyzing survey data.

Course Program:

SESSION 1: Overview

§   Applied Survey Data Analysis: An Overview
  • Important terms, concepts, and notation
  • Software Overview

§   Getting to Know the Complex Sample Design
  • Classification of Sample Designs
  • Target Populations and Survey Populations
  • Simple Random Sampling
  • Complex Sample Design Effects
  • Complex Samples: Clustering and Stratification
  • Weighting in Analysis of Survey Data
  • Multi-stage Area Probability Sample Designs


SESSION 2: Overview continued

§   Foundations and Techniques for Design-based Estimation and Inference
  • Finite Populations and Superpopulation Models
  • Confidence Intervals for Population Parameters
  • Weighted Estimation of Population Parameters
  • Probability Distributions and Design-based Inference
  • Variance Estimation
  • Hypothesis Testing in Survey Data Analysis
  • Total Survey Error

§   Preparation for Complex Sample Survey Data Analysis
  • Analysis Weights: Review by the Data User
  • Understanding and Checking the Sampling Error Calculation Model
  • Addressing Item Missing Data in Analysis Variables
  • Preparing to Analyze Data from Sample Subclasses
  • A Final Checklist for Data Users

SESSION 3: Descriptive Statistics

§   Descriptive Analysis for Continuous Variables
  • Special Considerations in Descriptive Analysis of Complex Sample Survey Data
  • Simple Statistics for Univariate Continuous Distributions
  • Bivariate Relationships between Two Continuous Variables
  • Descriptive Statistics for Subpopulations
  • Linear Functions of Descriptive Estimates and Differences of Means

§   Categorical Data Analysis
  • A Framework for Analysis of Categorical Survey Data
  • Univariate Analysis of Categorical Data
  • Bivariate Analysis of Categorical Data
  • Analysis of Multivariate Categorical Data

SESSION 4: Regression Models

§   Linear Regression Models
  • The Linear Regression Model
    • Fitting linear regression models to survey data
  • Four Steps in Linear Regression Analysis
  • Some Practical Considerations and Tools
  • Application: Modeling Diastolic Blood Pressure with the NHANES Data

§   Logistic Regression and Generalized Linear Models for Binary Survey Variables
  • Generalized Linear Models (GLMs) for Binary Survey Responses
  • Building the Logistic Regression Model: Stage 1-Model Specification
  • Building the Logistic Regression Model: Stage 2-Estimation of Model Parameters and Standard Errors
  • Building the Logistic Regression Model: Stage 3-Evaluation of the Fitted Model
  • Building the Logistic Regression Model: Stage 4-Interpretation and Inference
  • Analysis Application
  • Comparing the Logistic, Probit, and Complementary-Log-Log (C-L-L) GLMs for Binary Dependent Variables

The instructors are Dr. Brady West (Lead Statistician, Center for Statistical Consultation and Research, Univ. of Michigan) and Ms. Patricia Berglund (Senior Research Associate in the Youth and Social Indicators Program and the Survey Methodology Program at the University of Michigan-Institute for Social Research).  Brady West is the lead author of "Linear Mixed Models: A Practical Guide using Statistical Software" (Chapman Hall/CRC) and a co-author of "Applied Survey Data Analysis" (Chapman Hall/CRC).

You will be able to ask questions and exchange comments with Dr. Brady West and Ms. Patricia Berglund via a private discussion board throughout the course.  The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week.  Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.

For Indian participants, statistics.com accepts registration for its courses at special prices in Indian Rupees through its partner, the Center for eLearning and Training (C-eLT), Pune.

For India registration and pricing, please visit www.india.statistics.com.

Call: 020 66009116


Thursday 20 September 2012

Data Preparation and Cleaning for Analytics


Books on data mining (my own included) usually focus on the statistical and machine learning algorithms used to make predictions, find associations, etc.  Real-world data miners, however, spend most of their time preparing and cleaning the data.  This potentially overwhelming task is easier, though, if you can learn from the experience of hundreds of other data miners and break the task down into a standard set of steps and procedures.  Learn how in Dr. Robert Nisbet's "Data Preparation and Cleaning for Analytics" at Statistics.com. For more details, please visit http://www.statistics.com/data-prep/.

"Data Preparation and Cleaning for Analytics" covers joining and merging tables, recoding data, detecting outliers, dealing with missing data, deriving new variables, and more.  The course culminates in a data mining project in which you will bring data through the cleaning and preparation stages to the point where you implement a data mining model.

Who Should Take This Course:
Anyone involved in the specification or preparation of data mining or predictive modeling applications.

Course Program:
Lesson 1 - Introduction
  • Introduction to the course elements
  • Introduction to the major elements of a data mining project
  • The iterative nature of data mining
  • Perform several common data description analyses
  • Submit a data description report

Lesson 2 - Data Integration, Cleaning, Standardization
  • Metadata analysis of multiple data sources
  • Merge multiple tables/files with the same structure
  • Join multiple tables/files with different structures
  • Data lookup operation
  • Assemble the Customer Analytic Record (CAR)
  • "Dirty data" analysis and deletion
  • Data recoding
  • Outlier analysis and deletion
  • Missing data imputation by multiple regression or decision tree
  • Data standardization and normalization
  • Reverse Pivoting

Lesson 3 - Operations on Variables
  • Assign variable weights
  • Balance data sets with rare target values
  • Create data abstractions for categorical variables
  • Create temporal abstraction (lag) variables
  • Perform a data de-duplication operation
  • Perform a data filtering operation
  • Perform a simple random sampling operation
  • Perform a stratified random sampling operation

Lesson 4 - Operations on Variables, cont.
  • Perform a data binning operation for continuous variables
  • Understand how to use data bins
  • Create "dummy" variables for categorical variables
  • Derive new continuous variables for data mining
  • Derive new categorical variables for data mining
  • Perform feature selection using simple correlation coefficients
  • Perform feature selection using various advanced methods

Dr. Robert Nisbet has over 35 years’ experience in analytics and modeling as a college professor, researcher, and data miner in telecommunications, retail, membership clubs (AAA), insurance, and banking. He is the lead author of the "Handbook of Statistical Analysis and Data Mining Applications."  He is also skilled in the use of Extract-Transform-Load (ETL) tools for building dependent data marts designed for management reporting and data mining.

You will be able to ask questions and exchange comments with Dr. Robert Nisbet via a private discussion board throughout the course.  The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week.  Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.


Tuesday 18 September 2012

Advanced Optimization


Optimization methods such as network modeling and integer programming are among the most engaging analytics techniques - almost magical in the way they deliver cost savings (or revenue boosts) in problems that feature both transparency and complexity.  Our online course "Advanced Optimization" covers:

- Network Flow Problems
- Integer Programming
- Problems with Multiple Goals
- Nonlinear Programming

Learn more in the online course "Advanced Optimization," taught by Dr. Cliff Ragsdale at Statistics.com. For more details, please visit http://www.statistics.com/optimization-advanced/.

Aim of Course:
Many business problems involve flows through a network - transportation, stages of an industrial process, routing of data.  Students taking this course will learn to specify and implement optimization models that solve network problems (for example, finding the shortest path through a network, or the least-cost way to route material through a network with multiple supply nodes and multiple demand nodes).  Students will also learn how to solve Integer Programming (IP) problems (constrained optimization problems in which one or more decision variables must take integer values; e.g., a firm setting up a wi-fi hotspot could use 2 routers or 3 routers, but not 2.5 routers) and Nonlinear Programming (NLP) problems.  Students will use spreadsheet-based software to specify and implement models.
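
As a small illustration of the shortest-path question mentioned above, here is a minimal sketch using Dijkstra's algorithm in Python. This is not the spreadsheet-based linear programming formulation the course works with; it is a swapped-in standard algorithm, and the four-node network and its arc costs are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative arc costs.

    Returns (total_cost, list_of_nodes); (inf, []) if target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, cost in graph.get(node, {}).items():
            new_dist = d + cost
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                prev[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

# Hypothetical network: nodes are locations, values are arc costs.
network = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

cost, route = shortest_path(network, "A", "D")
print(cost, route)
```

In this toy network the direct-looking route A-B-D costs 9, while the route through C costs 8, which is the kind of non-obvious saving that network models are meant to surface.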

Who Should Take This Course:
Business analysts with responsibility for specifying, creating, deploying or interpreting quantitative decision models.  Users of optimization software who need to attain a more solid grounding in network optimization, integer programming, non-convex optimization, and multi-criteria optimization.

Course Program:

Course outline: The course is structured as follows
SESSION 1: Network Flow Problems
  • Characteristics (nodes, arcs, decision variables)
  • The objective function & constraints
  • Modeling in a spreadsheet

SESSION 2: Integer Linear Programming
  • Integrality condition, relaxation
  • Rounding
  • Stopping rules
  • Binary variables
  • Implementing/solving the model
  • Branch & bound

SESSION 3: Multiple goals
  • Soft/hard constraints
  • Defining the objective
  • Analysis/solution
  • Tradeoffs & goal revision
  • Multiple objective linear programming (MOLP)
  • Minimax

SESSION 4: Nonlinear Programming (NLP)
  • Generalized reduced gradient (GRG) overview
  • Local vs. Global optimality
  • Economic Order Quantity (EOQ) problem
  • Location problem
  • Evolutionary Optimization

The instructor, Dr. Cliff T. Ragsdale, is Bank of America Professor of Business Information Technology at Virginia Tech, and author of the #1 optimization text "Spreadsheet Modeling and Decision Analysis: A Practical Introduction to Management Science," now in its sixth edition. Dr. Ragsdale has served as a financial, statistical and information systems consultant for General Mills and the public accounting firm of Deloitte and Touche.

Software: Risk Solver Platform for Education, the Excel add-in from Frontline Systems that performs risk analysis, simulation, optimization, decision trees, and more.  With the purchase or rental of the course text, you will receive a course code that enables you to download and install the software for 140 days.

You will be able to ask questions and exchange comments with Dr. Cliff T. Ragsdale via a private discussion board throughout the course.  The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week.  Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.


Thursday 13 September 2012

Modeling Count Data


Do you work with binary, categorical, or count data?  You'd be a rare analyst if you didn't.  There are lots of details to consider - when to do exact tests, what they are, how to handle ordinal data, the role of modeling as opposed to significance testing, what to do when the assumptions required for Poisson regression fail, and much more.  Learn more in the online course "Modeling Count Data" (Dr. Joseph Hilbe) at Statistics.com.

"Modeling Count Data" deals with regression models where the response or dependent variable is a count or rate. A count is understood as the number of times an event occurs; a rate as how many events occur within a specific area or time interval. The course will cover Poisson regression, the foundation for modeling counts, as well as extensions and modifications to the basic model. Extensions are required when the assumptions underlying the Poisson model are violated. Negative binomial regression is the foremost method used to extend the Poisson model. Since Poisson assumptions are rarely met in practice, substantial attention will be devoted to the negative binomial model and its variants. For more details, please visit http://www.statistics.com/count/.

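One Poisson assumption that often fails is that the variance of the counts equals their mean. The sketch below is a rough, assumption-laden check rather than the course's method: it computes the variance-to-mean ratio for a made-up set of counts, which for a model with no predictors equals the Pearson chi-square statistic divided by n - 1. A ratio well above 1 hints at overdispersion.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    Under a Poisson model with no predictors this should be close to 1;
    values well above 1 suggest overdispersion.
    """
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # uses the n - 1 denominator
    return variance / mean

# Hypothetical counts, e.g. number of events per subject.
counts = [0, 0, 1, 1, 2, 3, 5, 8, 12, 18]

ratio = dispersion_ratio(counts)
print(f"mean = {statistics.mean(counts):.2f}, dispersion ratio = {ratio:.2f}")
```

With predictors in the model, the analogous check uses the fitted means in place of the overall mean; a large ratio is one common reason to consider a negative binomial model instead.
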
Who Should Take This Course:
Analysts and researchers in a wide variety of fields who are concerned with modeling counts and rates.

Course Program:

Course outline: The course is structured as follows
SESSION 1: Overview of Count Models and Methods of Estimation
  • Varieties of count model
  • History of count models
  • Derivation of GLM-based algorithm
  • Derivation of maximum likelihood count models
  • Methods of assessing fit for count models
  • Residual analysis
  • The nature of risk and risk ratios

SESSION 2: Poisson Regression and the Problem of Overdispersion
  • Poisson regression
  • Creating synthetic models; simulation
  • Predicting counts
  • Effect plots
  • Marginal effects/Discrete change
  • Parameterization as a rate model
  • Defining extra-dispersion: varieties
  • Problem of overdispersion: apparent vs real
  • Tests for handling overdispersion
  • Negative binomial extra-dispersion

SESSION 3: Negative Binomial Regression and Alternative Parameterizations
  • Negative Binomial Regression: varieties, derivation, and distributions
  • Synthetic data modeling
  • Marginal effects/Discrete change: NB models
  • Binomial vs Count models
  • Geometric regression: canonical and log
  • Alternative parameterizations: NB-1, NB-C, NB-H, NB-P
  • Generalized Poisson and negative binomial models
  • Extended Poisson models: bivariate; Poisson-inverse Gaussian; double Poisson
  • Extended negative binomial models: bivariate; others

SESSION 4: Problem with Zero Counts; Censored and Truncated Models, Latent Models
  • Zero-truncated models
  • Zero-inflated models
  • Zero-altered models
  • Hurdle models
  • Censored count models
  • Finite Mixture models
  • Quantile count models
  • Exact Poisson and negative binomial regression
  • Project preparation

Dr. Joseph Hilbe is President of the International Astrostatistics Association, an Emeritus Professor at the University of Hawaii, Solar System Ambassador with NASA's Jet Propulsion Laboratory at California Institute of Technology, and Adjunct Professor of Statistics at Arizona State University. Dr. Hilbe has authored some twelve books on statistics, over one hundred journal articles, and various packages and functions for Stata and R, and is the author of the COUNT package in R, located on the CRAN website. Dr. Hilbe is Editor-in-Chief of the Springer Series in Astrostatistics.

You will be able to ask questions and exchange comments with Dr. Joseph Hilbe via a private discussion board throughout the course.  The course takes place online at statistics.com in a series of 4 weekly lessons and assignments, and requires about 15 hours/week.  Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.
