Several statistical analyses can be applied to data to answer different business questions. It is important to know which model to apply in which situation and what its limitations are. Beyond that, two principles matter in any statistical analysis:
- Context: this means understanding the data, the goal of the model, the precision needed for the results, whether we need to use the whole data set or just a portion of it, etc.
- Segmentation: this refers to splitting the data into segments so that the model fits each one better.
In the following chapters I will start by explaining how to perform a descriptive statistical analysis of a variable, and then I will explain several analysis techniques, which can be grouped into three main categories:
- Regressions: the analysis of the relationship between different variables and the creation of models in which unknown values of an outcome variable are estimated using one or more predictor variables;
- Hypothesis testing: the analysis of the differences between groups based on one or more variables of interest;
- Classification models: models whose outcome is membership of a group.
Sometimes several models can be used for the same business issue, and the choice will depend on the kind of variables that we are using (categorical, ordinal, or quantitative) and their distribution.
Once the model has been chosen, the next steps are to check that its conditions are met, carry out the analysis, check the significance and fit of the results, validate the model, and refine it.
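The fit and validation steps above can be sketched for the simplest case, a regression with one predictor. This is only an illustration under stated assumptions: the data (`ad_spend`, `sales`) are made up, the helper names are hypothetical, the validation is a plain holdout of the last two observations, and checking the conditions and the significance is left out.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data for illustration: advertising spend (x) and sales (y).
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Carry out the analysis on the first six observations only.
train_x, train_y = ad_spend[:6], sales[:6]
test_x, test_y = ad_spend[6:], sales[6:]
slope, intercept = fit_line(train_x, train_y)

# Check the fit on the data used to build the model.
fit_train = r_squared(train_x, train_y, slope, intercept)

# Validate on the two held-out observations (mean absolute error).
holdout_mae = mean(
    abs(y - (intercept + slope * x)) for x, y in zip(test_x, test_y)
)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"R^2 (train)={fit_train:.3f} holdout MAE={holdout_mae:.2f}")
```

If the holdout error were much larger than the error on the training data, that would be a signal to refine the model, for example by segmenting the data or transforming a variable.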
An important part of statistical analysis consists of manipulating data to prepare it for the model implementation:
- Outliers can significantly bias results, so we have three options:
  - Maintain the outlier, if we verify that the value is correct;
  - Eliminate it, if the outlier is an error;
  - Transform it, if we can correct the error and replace it with the correct value.
- Records with missing data cannot be used in some statistical methods, such as regressions, so we have two options:
  - Leave the missing values if we are using a model that is not affected by them, if the number of affected records is small, or if we cannot replace them with appropriate values;
  - Replace the missing values using a prediction method (we can use a simple average or a more complex method).
- Data binning is necessary when a variable has too many categories and we need to group them, for example to analyze their distribution;
- Variable transformation can be necessary either to meet the requirements for the implementation of a specific model or to improve the model outcome.
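The four preparation steps above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the `orders` values are invented, outliers are flagged with the 1.5 × IQR rule (one common convention among several), the flagged value is assumed to have been verified as an error, missing values are replaced with a simple average, the band cut points (130 and 140) are arbitrary, and a logarithm stands in for whatever transformation a given model might require.

```python
import math
from collections import Counter
from statistics import mean, quantiles

# Invented order values; None marks a record with missing data.
orders = [120, 135, None, 128, 950, 142, 131, None, 125, 138]
observed = [v for v in orders if v is not None]

# Outliers: flag values outside 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < low or v > high]

# Elimination: assume the flagged value was checked and found to be an error.
kept = [v for v in orders if v is None or low <= v <= high]

# Missing data: replace each gap with the average of the remaining values.
fill = mean(v for v in kept if v is not None)
imputed = [fill if v is None else v for v in kept]


# Binning: group the values into three bands to look at their distribution.
def band(value):
    if value < 130:
        return "low"
    if value < 140:
        return "medium"
    return "high"


bins = Counter(band(v) for v in imputed)

# Transformation: a log transform, often used to reduce right skew.
logged = [math.log(v) for v in imputed]

print("outliers:", outliers)
print("bins:", dict(bins))
```

Which option to take at each step is a judgment call that depends on the context of the analysis; the code only shows the mechanics.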
There is a useful website that shows step by step how to create statistical analysis models in Excel: http://blog.excelmasterseries.com/.