Statistical Analysis

Several statistical analyses can be applied to data to answer different business questions. It is important to know when to apply each model and what its limitations are. Two principles apply to any statistical analysis:
- Context: understanding the data, the goal of the model, the precision required of the results, whether we need the whole data set or only a portion of it, and so on.
- Segmentation: dividing the data into segments so that the model fits better.
In the following chapters I will first explain how to perform a descriptive statistical analysis of a variable, and then present several analysis techniques, which can be grouped into three main categories:

- Regressions: the analysis of the relationship between different variables and the creation of models in which unknown values of an outcome variable are estimated using one or more predictor variables;
- Hypothesis testing: the analysis of the differences between groups based on one or more variables of interest;
- Classification models: models whose outcome is membership of a group.
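As a starting point for the descriptive analysis of a variable mentioned above, here is a minimal sketch in Python using only the standard library. The `describe` helper and the `sales` figures are hypothetical, invented purely for illustration; they do not come from the chapters that follow.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }

# Hypothetical monthly sales figures, invented for illustration.
sales = [120, 135, 150, 110, 160, 145, 130, 155]
summary = describe(sales)
```

Comparing the mean with the median, and the quartiles with the minimum and maximum, gives a first impression of the distribution before any model is chosen.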

Sometimes several models can be used for the same business issue, and the choice will depend on the kind of variables that we are using (categorical, ordinal, or quantitative) and their distribution.
Once the model has been chosen, the next steps are to check the conditions, carry out the analysis, check the significance and fit, validate, and refine.
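The steps of carrying out the analysis, checking the fit, and validating can be sketched for the simplest case, a linear regression with one predictor. This is only an illustration under stated assumptions: the data are hypothetical, the function names `fit_line` and `r_squared` are my own, and the significance check and the check of the model conditions are not shown.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = statistics.mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: advertising spend (x) and sales (y).
spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [12, 15, 19, 22, 24, 29, 31, 35]

# Hold out the last two observations for validation.
train_x, train_y = spend[:6], sales[:6]
test_x, test_y = spend[6:], sales[6:]

# Carry out the analysis on the training portion and check the fit.
intercept, slope = fit_line(train_x, train_y)
fit = r_squared(train_x, train_y, intercept, slope)

# Validate: prediction errors on the observations the model has not seen.
validation_errors = [
    yi - (intercept + slope * xi) for xi, yi in zip(test_x, test_y)
]
```

If the validation errors are much larger than the errors on the training data, the model probably needs refining, for example by segmenting the data or transforming a variable.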
An important part of statistical analysis consists of manipulating data to prepare it for the model implementation:

- Outliers can significantly bias results, so we have three options:
    o Keep the outlier, if we verify that the value is correct;
    o Eliminate it, if the outlier is an error;
    o Transform it, if we can correct the error and replace it with the right value;
- Records with missing data cannot be used in some statistical methods, such as regressions, so we can either:
    o Leave the missing values, if we are using a model that is not affected by them, if the number of affected records is small, or if we cannot replace them with appropriate values;
    o Replace the missing values using a prediction method (we can use a simple average or a more complex method);
- Data binning is necessary when a variable has too many categories and we need to group them, for example to analyze their distribution;
- Variable transformation can be necessary either to meet the requirements of a specific model or to improve the model outcome.
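The four preparation steps above can be sketched in Python as follows. Everything here is an assumption made for illustration: the 1.5 × IQR rule is one common rule of thumb for flagging outliers, not the only one; the mean is the simplest possible replacement for missing values; the bin edges and the `orders` data are hypothetical.

```python
import math
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value.

    `labels` has one more entry than `edges`; values above the last
    edge fall into the last label.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]

# Hypothetical order values: None marks a missing record, 900 looks suspicious.
orders = [20, 25, None, 30, 22, 28, 900, 26]

# 1. Outliers: flag them among the observed values.
outliers = iqr_outliers([v for v in orders if v is not None])

# Assumption: we checked the flagged value and found it to be an error,
# so we eliminate it (a verified correct value would be kept instead).
cleaned = [v for v in orders if v not in outliers]

# 2. Missing data: replace with a simple average.
imputed = impute_mean(cleaned)

# 3. Binning: group the values into three hypothetical categories.
binned = [bin_value(v, [22, 27], ["low", "medium", "high"]) for v in imputed]

# 4. Transformation: a log transform, often used to reduce right skew.
log_orders = [math.log(v) for v in imputed]
```

The order of the steps matters in this sketch: the outlier is removed before the average is computed, otherwise the replacement value for the missing record would itself be biased by the erroneous 900.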

There is a useful website that shows step by step how to create statistical analysis models in Excel: http://blog.excelmasterseries.com/.
