Wednesday, February 19, 2020

24. Choice-Based Conjoint Analysis


Identify customers and potential customers’ preferences for specific attributes of a product. It can also be used to define the willingness to pay and market share of different products.


This method is preferred to conjoint analysis because it represents a more realistic purchase situation and, in the case of having a large number of possible combinations, because it is sufficient to show only a certain number of combinations to each respondent. Then the responses are analyzed together and the utility is defined at the aggregated level (not at the individual level as in conjoint analysis).

For this method it is also very important to choose carefully the attributes (as a rule of thumb, no more than seven including the price) and the product profiles to present, that is, the combinations of attributes. Once the attributes and product profiles have been defined, choice scenarios are designed. A scenario is a combination of several products that is presented to the respondents. When defining scenarios, these recommendations should be followed:

  • A “none” choice should be included among the products presented in each purchase scenario;
  • Each scenario should not have more than 5 products;
  • Between 12 and 18 scenarios are usually presented to each respondent.

Usually, all the combinations cannot be presented in the same scenario, and a good practice is to show from two to five products in each scenario. When choosing the combination of products for each scenario, it is important that all the products are shown an equal number of times and that each product is compared equally with other alternatives.
Once the data have been collected, utilities are estimated at the aggregate level. The market share of each product can be calculated using the “share of preferences”:
  • Products’ utilities are calculated by summing all the attributes’ utilities;
  • Products’ utilities are exponentiated;
  • The market share is calculated as the product’s exponentiated utility divided by the sum of all the exponentiated utilities.

To obtain utilities at the individual level, a method called “hierarchical Bayes” is used. This method enables us to calculate a more reliable market share based on the choice of each respondent using three main techniques:
  • First choice: each respondent chooses the product that maximizes her utility (this technique is suggested for expensive products that imply a careful evaluation, such as houses and cars);
  • Share of preference: each respondent purchases a share of each product based on the share of utilities (suggested when a product is purchased several times during a certain period);
  • Randomized first choice: each respondent chooses one product with a probability proportional to its utility.

This method is also useful for predicting variations in the market share compared with competitors by creating simulations in which prices or other products’ attributes are changed. For example, we can analyze whether a discount can attract a big enough market share to compensate for the reduction in price. In this kind of simulation, we assume that competitors are not modifying both attributes and price, but in reality this could not be the case. This is why we should at least simulate several scenarios including possible competitors’ reactions. A more complex approach would be to include a game theory model (see 76. GAME THEORYMODELS).


Due to the complexity of choice-based conjoint analysis, I suggest using specific tools (several of these can be found by searching on the net) or using Excel complements, such as the one offered by XLSTAT,[1] from which an example can be downloaded in Excel with the results of a choice-based conjoint analysis.


Monday, February 17, 2020

52. Wilcoxon Signed-Rank Test


Verify whether the difference in two groups is significantly different (in the case that the assumptions for parametric tests are not met).


It is a non-parametric substitute of the t-test in the case that normality cannot be verified or the sample is too small (smaller than 30), but, differing from the MannWhitney test, it is used for paired samples (the groups are not independent). The example presented here is a “signed-rank Wilcoxon test” (for paired samples and as a counterpart of the parametric paired t-test).

The assumptions for this test are: the samples are from the same population, the pairs are chosen independently, the data are measured at least on an ordinal scale (continuous, discrete, or ordinal), and the distribution is not especially skewed (that is to say it is approximately symmetrical).

Wilcoxon Signed-Ranks Test Excel

Wilcoxon Signed-Ranks Test

To perform the test, the difference between observations is calculated and ranked. When we have samples with fewer than 25 observations, we use a table with a t-critical value to verify the significance; specifically, we reject the null hypothesis (there is no difference in the samples) if T is lower than T-critical. If we have a larger sample, we can use the p-value for the assessment of the significance.


Friday, February 14, 2020



Verify whether two or more groups are significantly different.


While with a t-test we can only test two groups, with ANOVA we can test several groups and decide whether the means of these samples are significantly different. The simplest analysis is one-way ANOVA, in which the variance depends on one factor. Suppose that we want to know whether the age of people buying three different products is significantly different to focus the promotional campaigns better. In this case we have just one factor (type of product); therefore, we run a one-way ANOVA after checking the normality assumption (see 36. INTRODUCTION TO REGRESSIONS).

One-way ANOVA excel

Output of a One-way ANOVA Analysis

Once the test is concluded, we check the p-value (< 0.05) and F-value (larger than the F-critical value) to reject the null hypothesis and infer that the populations are not equal. In the proposed example, there is a significant age difference among products. However, we should perform a t-test of each pair of groups to determine where the difference lies.
In a two-way ANOVA we have two factors to be tested. For example, we are selling products A, B, and C in three countries (1, 2, and 3). In the proposed example, we use a two-way ANOVA “without replication,” since only one observation is recorded for each factor’s combination (we will use an ANOVA “with replication” if there are more observations recorded for each combination). In a two-way ANOVA, there are two null hypotheses to be tested, one for each factor, and it is possible that the hypothesis will be rejected for one factor but not for the other one.

In our example we reject the null hypothesis for the factor “type of product” (rows), since its p-value is smaller than 0.05, but we cannot reject the null hypothesis for the factor “country” (columns).


Output of a Two-Way ANOVA Analysis

We can also perform an ANOVA with repeated measures when we have repeated measures within the same group. In our example a company decided to start a four-week training program for five employees to diminish the number of errors made at work. In this case the repeated measures are the errors of each employee in the same week of training.

Single-factor ANOVA Excel

Results of a Single-Factor ANOVA with Repeated Measures

The template contains the calculations for a single-factor repeated-measures ANOVA. In our example, since the p-value is lower than 0.05 (our chosen alpha), we reject the null hypothesis of no difference among the week’s means and infer that the training has had an impact on the number of errors.

We can use the Excel Data Analysis complement to perform a two-factor repeated-measures ANOVA, choosing the test “Anova: Two-Factor With Replication.” In the proposed example, we are selling different product versions in different markets, and we want to test whether either the product or the market (or both) have an impact on the number of products sold. The results show that, while the kind of product (rows) affects the sales (p-value < 0.05), the market (columns) does not.

Two-factor ANOVA Excel

Results of a Two-Factor ANOVA with Repeated Measures

An extension of ANOVA is MANOVA, which allows the test to be run with more than one dependent variable. For example, it is possible to run a MANOVA using “level of education” as the categorical independent variable and “test score” and “yearly income” as the continuous dependent variables.


Tuesday, February 11, 2020



Verify whether the difference between the observed frequencies and the expected frequencies is significant or not.


A chi-square test is used to test the frequencies of independent observations (not suitable for paired testing) with two main purposes:
  • Test of independence: to determine the association between two categorical variables, for example if the kind of civil status does not affect the kind of service that customers buy;
  • Test of goodness of fit: to determine the difference between the observed values and the expected values (for example whether a sample taken from a population follows the expected population distribution or a theoretical distribution).

In either case the method used is the same; specifically, we apply a chi-square test using the observed values and the expected values. When using a chi-square test, it is important to bear in mind that this test is sensitive to the sample size (with fewer than 50 this test is not appropriate) and needs to have a minimum frequency in each bin or class (at least 5). If these conditions are not met, we should consider using Fisher’s exact test.


In the example we are testing the independence of the variables “civil status” and “service level” chosen by customers.

chi-square excel

Observed Values vs Expected Values in a Chi-Square Test

For this test we assume that the probability of having a specific service level and the probability of being married, single, or divorced are independent events. With these assumptions, we compare the actual distribution of the products sold in each country and the expected distribution based on the independent probabilities of the two events (see the template). In other words, the test compares the expected frequencies with the actual frequencies. The Excel formula “=CHITEST” is then applied, and if the resulting p-value is smaller than 0.05 (or a different alpha), we reject the null hypothesis (the two variables are independent) and we can infer that the two variables are associated. In other words, we can say that civil status affects the level of service purchased.


Goodness of fit can be calculated in Excel using the same formula (=CHITEST), which is applied to a column (or row) with observed values and a column with expected values. For example, if we throw a die, we expect that each number has the same probability of appearing (1/6) but suspect the die to have been loaded. In our example we throw the die 60 times and expect to produce each number 10 times. If the p-value is lower than 0.05, we reject the null hypothesis, which is that the variables are independent (as in the chi-square independence test). Since the p-value is larger than 0.05, we conclude that the die is not loaded.

chi-square goodness of fit Excel

Data and Results of a Goodness of-Fit Chi-Square Test

We can also compare our observed values with a theoretical distribution.


The chi-square test can also be used instead of the z-test in the test of proportions (see 44. TEST OF PROPORTIONS) when the assumptions for using the parametric test are not met. In this case we will compare the observed proportion with the expected proportion with a double-entry contingency table (the same table used in the independence test but with only two categories per row and two categories per column).


Download the Chi-square Excel Template

Sunday, February 9, 2020



Define the optimal price range for a product or service.


This method is useful in taking general pricing decisions. Data are collected through surveys in which each respondent is asked about his intention to purchase and shown several prices that move up or down depending on the previous answers. Alternatively, prices can be shown randomly or in a fixed series. The highest price at which a respondent reports that he would buy is considered to be his WTP. Once we have a specific price limit (WTP) for each interviewee, we can draw an accumulated demand curve.

Gabor–Granger Pricing Method

GaborGranger Pricing Method

Since we have the information about demand and WTP available, we can calculate the revenue curve in the graph and establish the optimal price at which revenues are maximized.

Donwload the Gabor-Granger Excel Template

Thursday, April 18, 2019


Verify whether two proportions are significantly different.


This test has the same purpose as a t-test but is applied to compare proportions, that is, when elements can take the value of 0 or 1. For example, we can compare the conversion rate of two advertising campaigns (two-sample test of proportions) or compare the improvement in patients who received a new drug compared with the average improvement of these patients.

As in the t-test, we need to specify our hypothesis and, based on that, run either a two-tailed t-test or a one-tailed t-test (see 43. t-TEST); the only difference is that instead of using means we use proportions. We then need to specify the required significance level (α) and the hypothesized difference between the proportions.
Other differences from the t-test is that we resolve the test by calculating the z-critical value instead of the t-critical value and that the calculation is slightly different because we use proportions instead of means, variances, and standard deviations. In addition, we have to check that the number of events (conversions) and the number of “no events” (users who did not buy) are at least 10.


The boss of a company claims that 80% of people are very satisfied with their working conditions. Our null hypothesis is that the satisfaction is equal to 80%, and our alternative hypothesis is that the satisfaction rate is different from 80%. The HR department decides to survey 100 employees, with a result of 73%. At the 0.05 level of significance, we fail to reject the null hypothesis, so we cannot state that the satisfaction of our survey is significantly different from 80%. In Figure 47 we can see that the confidence interval (95%) includes 80%.

Results of a One-Sample Test of Proportions


A company has started a new online marketing campaign and wants to compare it with a standard online marketing campaign. The goal is to increase the conversion rate of online users, and the conversion rates of the two campaigns are compared using a proportion test. The results are not presented here, since this test is explained in more detail in chapter 45. A/B TESTING.


Thursday, February 15, 2018



Identify customers’ likelihood of making recommendations.


Usually the value of customers is calculated using only the variables amount spent and frequency of purchases. However, customers can create value in several other ways, one of which is by recommending the service to other potential buyers. The so-called “word-of-mouth” phenomenon is nowadays empowered by social networks, metasearch websites, or portals that provide customer feedback on products.

Positive recommendations can not only increase sales but also allow companies to save money on advertising. The Net Promoter Score® is an indicator that estimates the probability of recommendation of a group of customers based on the recommendation intention of single customers. It was developed and registered as a trademark by Fred Reichheld, Bain & Company, and Satmetrix.

The data are collected through surveys, and the respondents are asked to score from 0 to 10 the likelihood of recommending the product or service. Those who respond 9 or 10 are the real promoters, while those who respond 6 or lower are the detractors. The respondents whose score is from 7 to 8 are considered passive, since, even if they say that they would recommend the product or service, in reality they do not recommend it. The NPS® is the percentage of promoters minus the percentage of detractors.


Thursday, December 7, 2017



Define the most appropriate business strategy based on the existence of markets and products.


Ansoff’s matrix is usually performed after a SWOT analysis, in which strengths, weaknesses, opportunities, and threats can be transformed into business strategies:

  • -          Market penetration: the organization decides to use existing products in the existing market by improving tactics and strategies to push sales, for example through advertising, promotion, and pricing;
  • -          Product development: the organization decides to develop new products for an existing market or to add new features;
  • -          Market development: the organization decides to sell existing products to new markets, for example exporting to new countries;
  • -          Diversification: the organization decides to take a more radical approach by creating a new product for a new market. This can be the result of an opportunity caused by a new trend, identified in the SWOT analysis.

Ansoff's matrix

Figure 14: Ansoff’s Matix

Once the strategy has been defined using Ansoff’s matrix, the objectives, strategies, and tactics can be revised in the VMOST model. The information needed for this matrix can usually be found in previous internal analysis, external analysis, and SWOT analysis.


Sunday, October 29, 2017



Analyze the interrelations among several variables and explain them with a reduced number of variables.

A principal component analysis (PCA) analyzes the interrelations among a large number of variables to find a small number of variables (components) that explain the variance of the original variables. This method is usually performed as the first step in a series of analyses; for example, it can be used when there are too many predictor variables compared with the number of observations or to avoid multicollinearity.

Suppose that a company is obtaining responses about many characteristics of a product, say a new shampoo: color, smell, cleanliness, and shine. After a PCA it finds out that the four original variables can be reduced to two components[1]:
  • -          Component “quality”: color and smell;
  • -          Component “effect on hair”: cleanliness and shine.

Even though it is possible to run a PCA in Excel with complex calculations or special complements,[2] I suggest using a proper statistical tool. Here I will only explain some guidelines when performing a PCA.

First of all, the analysis starts with a covariance or correlation matrix. I suggest using a correlation matrix, since we cannot use a covariance matrix if the variables have different scales or the variances are too different. Then, eigenvectors (the direction of the variance) and eigenvalues (the degree of variance in a certain direction) are calculated. Now we have a number of components that is equal to the number of variables, each one with a specific eigenvalue.

Results of Principal Component Analysis
Figure 74: Results of a PCA

The more variance (eigenvalue) that a component explains, the more important it is. There are several approaches that we can use to choose the number of components to retain:
-          Defining a threshold before the analysis:

  •   choose all the components with a certain eigenvalue (usually > 1);
  •    choose a priori a specific number of components (then verify the total variance       explained and other validity tests);
  •    choose the first x components that explain at least X% of the variance, for example      80% if using the results for descriptive purposes or higher if the results will be used      in other statistical analysis (Figure 74);

-          Use a scree plot (Figure 75) and “cut” the line at the main inflexion point or at one of the main inflexion points where there is an acceptable total variance explained (for example, in Figure 75 the first four components can be chosen, since there is an important inflexion point, but they just explain 60% of the variance).

Scree Plot Principal Component Analysis

 Figure 75: Scree Plot

The next step is to analyze the principal components’ correlation coefficients in a matrix with variables and components. Ideally we want one variable to have a high correlation with one component to define each component conceptually (smell and color = component “quality”). However, even if we cannot explain the resulting components conceptually, we have to bear in mind that the main objective of a PCA is to reduce a large number of variables to a manageable number of components, while interpreting the component is not strictly necessary. In chapter 64. EXPLORATORY FACTOR ANALYSIS, PCA analysis will be used as the method for a factor analysis, and I will introduce optimization methods, factor scoring, and validity tests.


[1] In spite of this example, PCA is usually performed when we have a larger number of variables.