Thursday, December 29, 2016

33. SCORING MODELS

OBJECTIVE
Define the priority of action concerning customers, employees, products, and so on.


DESCRIPTION
Scoring models help to decide which elements to act on as a priority, based on the score that they obtain. For example, we can create a scoring model to prevent employees from leaving the company, in which the score depends on both the probability of leaving and performance (we will act first on those employees who have a higher probability of leaving and are important to the company). Scoring models are also quite useful in marketing; for example, we can score customers based on their probability of responding positively to a telemarketing call and, depending on our resources, call just the first “X” customers.

The model that I will propose concerns a scoring model of customers’ value based on the probability of purchasing a product and on the amount that they are likely to spend. This model is the result of two sub-models:

- Purchase probability: We will use a logistic regression to estimate the purchase probability of a customer in the next period (see 26. RFM MODEL and 60. LOGISTIC REGRESSION);
- Amount: We will use a linear regression to estimate the amount that each customer is likely to spend on his or her next purchase (see 38. LINEAR REGRESSION).

The first step is to choose the predictor variables. In our case I suggest using recency, first purchase, frequency, average amount, and maximum amount of year -2, but we could try additional or different variables. The target variable is a binary variable indicating whether the client made a purchase during the following period (year -1). We then run a logistic regression, transforming variables where necessary and after verifying that all the required assumptions are met (see 36. INTRODUCTION TO REGRESSIONS and 60. LOGISTIC REGRESSION).
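As a rough illustration of this first sub-model (a sketch in Python, not the template itself), the logistic regression can be fitted with statsmodels; the file name and the column names (customers.csv, recency, first_purchase, frequency, avg_amount, max_amount, purchased_y1) are hypothetical placeholders for your own customer table.

```python
# Purchase-probability sub-model: logistic regression on year -2 predictors.
import pandas as pd
import statsmodels.api as sm

customers = pd.read_csv("customers.csv")              # hypothetical customer table
predictors = ["recency", "first_purchase", "frequency",
              "avg_amount", "max_amount"]              # year -2 variables
X = sm.add_constant(customers[predictors])             # adds the intercept term
y = customers["purchased_y1"]                          # 1 = purchased in year -1

logit_model = sm.Logit(y, X).fit()
print(logit_model.summary())                           # coefficients of the logistic regression
customers["purchase_prob"] = logit_model.predict(X)    # estimated purchase probability
```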


Coefficients of the Logistic and Linear Regressions

In the second part of the model we can use, for example, only the average amount and the maximum amount of year -2 as predictors, with the total amount spent in year -1 as the target variable. We run a multivariate linear regression, transforming variables where necessary and after verifying that all the required assumptions are met (see 36. INTRODUCTION TO REGRESSIONS, 38. LINEAR REGRESSION, and 39. OTHER REGRESSIONS). It is important to note that this regression does not use the whole customer database but only those customers who made a purchase in year -1.
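Continuing the same sketch, the second sub-model is fitted only on the customers who bought in year -1; amount_y1 (the total amount spent in year -1) is again a hypothetical column name.

```python
# Amount sub-model: linear regression on customers who purchased in year -1.
import statsmodels.api as sm

buyers = customers[customers["purchased_y1"] == 1]
X_amt = sm.add_constant(buyers[["avg_amount", "max_amount"]])
ols_model = sm.OLS(buyers["amount_y1"], X_amt).fit()   # amount_y1 = spend in year -1
print(ols_model.summary())                             # coefficients of the linear regression
```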

The last step is to put together the two regressions to score customers based on both their purchase probability and the amount that they are likely to spend. We use the regression coefficients to compute the estimates for each customer. In the linear regression, we simply add the intercept to the sum of the variables’ coefficients multiplied by each customer’s actual values (Figure below) to estimate the amount.[1] In the logistic regression, however, we must apply the logistic function to turn the linear combination into a purchase probability:

Probability = 1 / (1 + exp(-(intercept + coefficient 1 * variable 1 + ... + coefficient n * variable n)))


Result Table with the Purchase Probability, Estimated Amount, and Final Score

Now that we have two more columns in our database (the purchase probability and the estimated amount), we just need to add a third one for the final score, which is the purchase probability times the estimated amount (Figure above). With this indicator we can either rank our customers (to prioritize marketing and resource allocation for some of them) or estimate next-period revenues.
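Putting the two hypothetical sub-models above together, the score and the ranking take only a few more lines (logit_model.predict already applies the logistic formula shown above, so we do not need to compute the exponential by hand):

```python
# Final score = purchase probability x estimated amount.
X_all = sm.add_constant(customers[["avg_amount", "max_amount"]])
customers["est_amount"] = ols_model.predict(X_all)      # expected spend for every customer
customers["score"] = customers["purchase_prob"] * customers["est_amount"]

# Rank customers by score (highest priority first) and estimate next-period revenues.
ranking = customers.sort_values("score", ascending=False)
expected_revenue = customers["score"].sum()
```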


Download the Scoring Models Template




[1] Estimated amount = Intercept + Coefficient 1 * Variable 1 + Coefficient 2 * Variable 2.
Be aware that, if we have transformed some of the variables, we cannot simply multiply by the coefficients but must perform some additional calculations to reverse the transformations.

Tuesday, December 13, 2016

5. COMPETITIVE MAP

OBJECTIVE

Analyze the positioning of a company or a product in comparison with other companies or products by evaluating several attributes.


DESCRIPTION

A competitive map is a tool that compares a company or product with several competitors according to the most important attributes. We can also add the relative importance of each attribute to the map. From this map we can understand how a company is positioned on each attribute and identify the key issues for strategic decisions. For example, we can decide to focus our communication and promotion on the quality of our product if we are well positioned on that attribute and it is important to customers.

Competitive Map

The data for this map are usually gathered through surveys. If the selection of attributes is not clear or the number of attributes is large, we can ask a preselection question whereby interviewees rank the most important attributes. Then they will be asked about the importance and performance of the selected attributes for each company or product. I suggest dividing the question into two parts:
- Importance: give a score from 1 to 5 for each attribute;
- Performance: give a score from 1 to 5 for each combination of attribute and company (or product).
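As a small sketch of how such answers could be aggregated into the map (assuming the survey export is a long table with respondent, attribute, company, importance, and performance columns, which is an assumption about your data rather than a prescription):

```python
# Aggregate 1-5 survey answers into a competitive map table.
import pandas as pd

survey = pd.read_csv("survey.csv")   # columns: respondent, attribute, company, importance, performance

# Average importance of each attribute across respondents.
importance = survey.groupby("attribute")["importance"].mean()

# Average performance of each company (or product) on each attribute.
performance = survey.pivot_table(index="attribute", columns="company",
                                 values="performance", aggfunc="mean")

competitive_map = performance.assign(importance=importance)
print(competitive_map.round(2))       # rows and columns of the competitive map
```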




TEMPLATE

Tuesday, December 6, 2016

45. A/B TESTING

OBJECTIVE

Test two or more items/objects and identify the one with the best performance.


DESCRIPTION

A/B testing is part of a broader group of methods used for statistical hypothesis testing in which two data sets are compared. Having defined a probability threshold (significance level), we can determine statistically whether to reject the null hypothesis or not. Usually, the null hypothesis is that there is no significant difference between the two data sets.
A/B testing is a randomized experiment with two variants (two-sample hypothesis testing), although more samples can be added. The difference from multivariate testing is that in A/B testing only one element varies, whereas in multivariate testing several elements vary and multiple combinations of them must be tested. These tests are used in several sectors and for different business issues, but nowadays they are particularly popular in online marketing and website design.

A/B Testing of Conversion Rate

Output of Conversion Rate A/B Testing

Usually the steps to follow are:
- Identify the goal: for example “improving the conversion rate of our website”;
- Generate hypotheses: for example “a bigger BUY button will convert more”;
- Create variations: in our example the element to be modified is the BUY button, and the variation of the website can be created with a double-size BUY button;
- Run the experiment:
  - Establish a sample size: depending on the expected conversion rate, the acceptable margin of error, the confidence level, and the population, the minimum sample size can be calculated (see the template and the sketch after this list);
  - The two versions must be shown to visitors during the same period, and the visitors must be chosen randomly (we are interested in testing the effect of a larger button; if we do not choose visitors randomly or show the two versions during different periods, the results will probably be biased);
- Analyze the results:
  - Significance: depending on the confidence level chosen for the test (usually 90%, 95%, or 99%), we can be X% confident that the two versions convert differently;
  - Confidence intervals: depending on the confidence level chosen, there will be a probable range of conversion rates (we will be X% confident that the conversion rate ranges from X to Y);
  - Effect size: the effect size represents the magnitude of the difference between the two versions.
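As a rough Python counterpart to the calculator in the template (a sketch under the usual normal-approximation assumptions; the visitor and conversion numbers are made up), the minimum sample size and the significance of the observed conversion rates can be computed with statsmodels:

```python
# Sample size and significance test for a conversion-rate A/B test.
import numpy as np
from statsmodels.stats.proportion import proportion_effectsize, proportions_ztest
from statsmodels.stats.power import NormalIndPower

# Minimum visitors per version to detect a lift from 10% to 12%
# with alpha = 0.05 and power = 0.80.
effect = proportion_effectsize(0.10, 0.12)
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                           power=0.80, alternative="two-sided")
print(round(n_per_group))

# Two-proportion z-test on (made-up) observed results.
conversions = np.array([120, 145])   # version A, version B
visitors = np.array([1200, 1210])
z_stat, p_value = proportions_ztest(conversions, visitors)
print(z_stat, p_value)               # p below alpha -> the versions convert differently
```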

The proposed template provides a simple calculator for the necessary sample size and for testing the significance of a conversion-rate A/B test. However, a considerable amount of information about A/B testing is available online.[1] The template of chapter 44. TEST OF PROPORTIONS shows the same test in more statistical detail, including the calculation of the confidence interval of the difference in means, while the A/B testing template presents confidence intervals for each of the two sample means.

In the proposed example, the data are obtained using a web analytics tool (for example Google Analytics), but they can come from any experiment that we decide to run.


TEMPLATE


Wednesday, November 30, 2016

41. INTRODUCTION TO HYPOTHESIS TESTING

OBJECTIVE

Verify whether two (or more) groups are significantly different from each other, usually by comparing their means or medians.


DESCRIPTION
Generally speaking, statistical hypothesis testing concerns all the techniques that test a null hypothesis versus an alternative hypothesis. Although it also includes regressions, I will only focus on the testing performed on samples.
There are three main steps in hypothesis testing:
- Definition: identify the problem, study it, and formulate hypotheses;
- Experiment: choose and define the data collection technique and the sampling method;
- Results and conclusion: check the data, choose the most appropriate test, analyze the results, and draw conclusions.


DEFINITION

The first step in hypothesis testing is to identify the problem and analyze it. The three main categories of hypothesis testing are:
- to test whether two samples are significantly different: for example, after conducting a survey in two hotels of the same hotel chain, we want to check whether the difference in average satisfaction is significant or not;
- to test whether a change in a factor has a significant impact on the sample by conducting an experiment (for example to check whether a new therapy has better results than the traditional one);
- to test whether a sample taken from a population truly represents it (if the population's parameters, e.g. the mean, are known): for example, if a production line is expected to produce objects with a specific weight, we can check it by taking random samples and weighing them. If the average difference from the expected weight is statistically significant, it means that the machines should be serviced.

After defining and studying the problem, we need to define the null hypothesis (H0) and alternative hypothesis (Ha), which are mutually exclusive and represent the whole range of possibilities. We usually compare the means of the two samples or the sample mean with the expected population mean. There are three possible hypothesis settings:
- To test any kind of difference (positive or negative), the H0 is that there is no difference in the means (H0: μ = μ0 and Ha: μ ≠ μ0);
- To test just one kind of difference:
  - positive (H0: μ ≤ μ0 and Ha: μ > μ0);
  - negative (H0: μ ≥ μ0 and Ha: μ < μ0).


EXPERIMENT

The sampling technique is extremely important: we must make sure that the sample is chosen randomly (in general) and, in the case of an experiment, that the participants do not know which group they have been placed in. Depending on the problem to be tested and the test to be performed, different techniques are used to calculate the required sample size (check www.powerandsamplesize.com, which allows the calculation of the sample size for different kinds of tests).

RESULTS AND CONCLUSIONS
Once the data have been collected, it is necessary to check for outliers and missing data (see 36. INTRODUCTION TO REGRESSIONS) and choose the most appropriate test depending on the problem studied, the kind of variables, and their distribution. There are two main approaches to testing hypotheses:
- The frequentist approach: this makes assumptions about the population distribution and uses a null hypothesis and a p-value to draw conclusions (almost all the methods presented here are frequentist);
- The Bayesian approach: this approach needs prior knowledge about the population or the sample, and the result is a probability for the hypothesis (see 42. BAYESIAN APPROACH TO HYPOTHESIS TESTING).

Depending on the type of dependent variable and on the sample characteristics (independent variables), the most common tests are:
- DICHOTOMOUS: test of proportions (one or two independent samples); McNemar test (two dependent samples); Cochran's Q (more than two dependent samples); phi coefficient and contingency tables (correlation);
- CATEGORICAL / ORDINAL: Mann-Whitney U test (two independent samples); Wilcoxon signed-rank test (two dependent samples); Kruskal-Wallis test and Wilcoxon rank sum test (more than two independent samples); Scheirer-Ray-Hare test (two-way) and Friedman test (one-way) (more than two dependent samples); Spearman's correlation (correlation);
- INTERVAL OR RATIO: one-sample z-test or t-test (one sample); two-sample t-test (two independent samples); paired t-test (two dependent samples); one-way and two-way ANOVA (more than two independent samples); repeated measures ANOVA (more than two dependent samples); Pearson's correlation (correlation).

Summary of Parametric and Non-parametric Tests
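For the most common interval/ratio and ordinal cases in this summary, ready-made functions exist; here is a short illustration with made-up satisfaction scores from two hotels (the numbers are invented for the example):

```python
# Two of the tests from the summary, applied to made-up satisfaction scores.
import numpy as np
from scipy import stats

hotel_a = np.array([4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7])
hotel_b = np.array([3.6, 3.9, 3.5, 3.8, 3.4, 3.7, 3.3, 3.6])

# Interval/ratio data, two independent samples -> two-sample t-test.
t_stat, p_t = stats.ttest_ind(hotel_a, hotel_b)

# Ordinal (or clearly non-normal) data -> Mann-Whitney U test instead.
u_stat, p_u = stats.mannwhitneyu(hotel_a, hotel_b, alternative="two-sided")

print(p_t, p_u)   # reject H0 (no difference) when the p-value is below alpha
```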

Tests usually analyze the difference in means, and the result is whether or not the difference is significant. When we make these conclusions, we have two types of possible errors:
- α: the null hypothesis is true (there is no difference) but we reject it (false positive);
- β: the null hypothesis is false (there is a difference) but we do not reject it (false negative).

There are four possible outcomes:
- if the null hypothesis is true: not rejecting it is the correct decision (probability 1 − α), while rejecting it is a type I error (probability α);
- if the null hypothesis is false: not rejecting it is a type II error (probability β), while rejecting it is the correct decision (probability 1 − β).

Possible Outcomes of Hypothesis Testing

The significance of the test depends on the size of α, that is, the probability of rejecting the null hypothesis when it is true. Usually we use 0.05 or 0.01 as the critical value and reject the null hypothesis when the p-value is smaller than α (the p-value is the probability, assuming that the null hypothesis is true, of observing a result at least as extreme as the one that we have, i.e. the actual mean difference).
It is important to remember that, if we are running several tests, the likelihood of committing a type I error (false positive) increases. For this reason we should use a corrected α, for example by applying the Bonferroni correction (divide α by the number of tests).[1]
In addition, it is necessary to remember that, with an equal sample size, the smaller the α chosen, the larger the β will be (false negative).

If the test is significant, we should also compute the effect size: it is important to know not only whether the difference is significant but also how large it is. The effect size can be calculated by dividing the difference between the means by the standard deviation of the control group (to be precise, we should use a pooled standard deviation, but this requires some extra calculation). As a rule of thumb, an effect size of 0.2 is considered small, 0.5 medium, and above 0.8 large. However, in other contexts the effect size can be given by other statistics, such as the odds ratio or the correlation coefficient.
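The effect-size calculation described above (difference in means divided by a pooled standard deviation, i.e. Cohen's d) takes only a few lines; the two groups below are placeholders:

```python
# Cohen's d: mean difference divided by the pooled standard deviation.
import numpy as np

def cohens_d(group_a, group_b):
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = np.var(group_a, ddof=1), np.var(group_b, ddof=1)
    pooled_sd = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (np.mean(group_a) - np.mean(group_b)) / pooled_sd

print(cohens_d([4.1, 3.8, 4.5, 4.0], [3.6, 3.9, 3.5, 3.8]))   # ~0.2 small, 0.5 medium, 0.8 large
```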

Confidence intervals are also usually calculated to obtain a probable range of values from which to draw a conclusion: for example, we are 95% confident that the true value of the parameter lies within the confidence interval X‒Y. The confidence interval reflects a specific significance level; for example, a 95% interval corresponds to a significance level of 5% (or 0.05). When comparing the difference between two means, if 0 is within the confidence interval, the test is not significant.


ALTERNATIVE METHODS

In the following chapters I will present several methods for hypothesis testing, some of which have specific requirements or assumptions (type of variables, distribution, variance, etc.). However, there is also an alternative that we can use when we have numerical variables but are not sure about the population distribution or variance. This alternative relies on two simulation techniques:

- Shuffling (an alternative to the significance test): we randomize the groups' elements (we mix the elements of the two groups randomly, each time creating a new pair of groups) and compute the mean difference in each simulation. After several iterations we calculate the percentage of trials in which the difference in the means is larger than the one calculated between the two original groups. This can be compared with the significance test; for example, if fewer than 5% of the iterations show a larger difference, the test is significant with p < 0.05.

- Bootstrapping (an alternative to confidence intervals): we resample each of our groups by drawing randomly with replacement from the group's elements. In other words, from the members of a group we create new groups that can contain an element multiple times and not contain another one at all. (An alternative resampling method is to resample the original groups into smaller subgroups: jackknifing.) After calculating the difference in means of the new pairs of samples, we have a distribution of mean differences and can compute our confidence interval (i.e. 95% of the computed mean differences lie between X and Y).
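Both techniques fit in a short NumPy simulation; the two groups below are made-up numbers and 10,000 iterations is an arbitrary choice:

```python
# Shuffling (permutation test) and bootstrapping for the difference in means.
import numpy as np

rng = np.random.default_rng(42)
group_a = np.array([4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7])
group_b = np.array([3.6, 3.9, 3.5, 3.8, 3.4, 3.7, 3.3, 3.6])
observed_diff = group_a.mean() - group_b.mean()
pooled = np.concatenate([group_a, group_b])

# Shuffling: re-split the pooled data at random and count how often the
# shuffled mean difference is at least as large as the observed one.
perm_diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    perm_diffs.append(shuffled[:len(group_a)].mean() - shuffled[len(group_a):].mean())
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))

# Bootstrapping: resample each group with replacement and take the middle 95%
# of the simulated mean differences as the confidence interval.
boot_diffs = [rng.choice(group_a, len(group_a)).mean() -
              rng.choice(group_b, len(group_b)).mean()
              for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(p_value, ci_low, ci_high)
```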






[1] There are also other methods that can be more or less conservative, for example the Šidák correction or the false discovery rate controlling procedure.