Example of a Db-efficient RUM design with ChoiceDesign
This notebook illustrates how to use ChoiceDesign to generate a Db-efficient (Bayesian) experimental design for a Random Utility Maximisation (RUM) model.
What is a Db-efficient design?
A standard Dp-efficient design minimises the D-error at fixed prior parameter values. The problem is that priors are always uncertain: if the assumed values are wrong, the resulting design can be suboptimal for estimation.
A Db-efficient design (Sandor & Wedel, 2001) addresses this by treating the researcher’s priors as random variables that follow a probability distribution. Instead of minimising the D-error at a single point, it minimises the Db-error, the expected D-error over the prior distribution with density \(f(\beta)\):
\[ D_b\text{-error} = \int D\text{-error}(\beta)\, f(\beta)\, d\beta \]
This is estimated by Monte Carlo: draw \(R\) samples from the prior distribution, evaluate the D-error at each draw, and take the mean. The result is a design that performs well on average across the range of plausible true parameter values.
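The Monte Carlo estimate can be sketched in a few lines of NumPy. This is an illustration of the idea only, not ChoiceDesign's implementation: the function names `mnl_d_error` and `db_error`, the `(S, J, K)` array layout and the toy two-attribute design are all assumptions made for this example, and the D-error is taken to be the determinant of the inverse MNL Fisher information raised to the power \(1/K\).

```python
import numpy as np

def mnl_d_error(X, beta):
    """D-error of an MNL design at one parameter vector.

    X: array (S, J, K) = choice situations x alternatives x attributes.
    beta: array (K,) of generic coefficients.
    """
    v = X @ beta                                    # utilities, shape (S, J)
    v = v - v.max(axis=1, keepdims=True)            # guard against overflow
    p = np.exp(v)
    p = p / p.sum(axis=1, keepdims=True)            # MNL choice probabilities
    xbar = np.einsum('sj,sjk->sk', p, X)            # probability-weighted mean attributes
    z = X - xbar[:, None, :]                        # centred attributes, (S, J, K)
    info = np.einsum('sj,sjk,sjl->kl', p, z, z)     # Fisher information, (K, K)
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:                                   # singular information matrix
        return np.inf
    return float(np.exp(-logdet / X.shape[2]))      # det(info^-1)^(1/K)

def db_error(X, prior_mean, prior_std, n_draws=500, seed=0):
    """Mean D-error over R = n_draws draws from independent normal priors."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(prior_mean, prior_std, size=(n_draws, len(prior_mean)))
    return float(np.mean([mnl_d_error(X, b) for b in draws]))

# Hypothetical toy design: 3 choice situations, 2 alternatives, 2 attributes.
X = np.array([
    [[1.0, 10.0], [2.0, 15.0]],
    [[2.0, 15.0], [1.0, 10.0]],
    [[3.0, 10.0], [3.0, 15.0]],
])
prior_mean = np.array([-0.1, -0.2])
prior_std = np.array([0.01, 0.03])

print('Point D-error:', mnl_d_error(X, prior_mean))
print('Db-error     :', db_error(X, prior_mean, prior_std))
```

Setting every standard deviation to zero collapses all draws onto the prior means, so the Db-error reduces to the point D-error.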
Note: This is different from a Random Parameters (Mixed Logit) design. In a Db design the uncertainty is the researcher’s uncertainty about fixed true parameters. The underlying estimation model is still plain MNL.
Step 1: Load modules, define design parameters and set attributes
The following lines load:
EffDesign: the class of efficient designs,
Attribute and Parameter: the classes of attributes and parameters, respectively.
[1]:
from choicedesign.design import EffDesign
from choicedesign.expressions import Attribute, Parameter
Each attribute is defined by the Attribute class. The arguments of this class are:
name: a string with the attribute name,
levels: a list of levels of the attribute.
Each attribute is alternative-specific. The following lines define 2 alternatives, named alt1 and alt2, and 4 attributes named from \(A\) to \(D\):
[2]:
alt1_A = Attribute('alt1_A', [1, 2, 3])
alt1_B = Attribute('alt1_B', [10, 15, 15.5])
alt1_C = Attribute('alt1_C', [0, 3, 5])
alt1_D = Attribute('alt1_D', [0, 1, 2])
alt2_A = Attribute('alt2_A', [1, 2, 3])
alt2_B = Attribute('alt2_B', [10, 15, 15.5])
alt2_C = Attribute('alt2_C', [0, 3, 5])
alt2_D = Attribute('alt2_D', [0, 1, 2])
Step 2: Construct efficient design object and generate initial design matrix
The second step consists of constructing the experimental design object, which requires the following parameters:
X: a list of Attribute class elements,
ncs: the number of choice situations.
[3]:
design = EffDesign(
    X=[alt1_A, alt1_B, alt1_C, alt1_D,
       alt2_A, alt2_B, alt2_C, alt2_D],
    ncs=18)
[4]:
init_design = design.gen_initdesign(seed=42)
init_design
[4]:
| | alt1_A | alt1_B | alt1_C | alt1_D | alt2_A | alt2_B | alt2_C | alt2_D |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 10.0 | 5.0 | 1.0 | 2.0 | 15.5 | 5.0 | 2.0 |
| 1 | 2.0 | 15.0 | 3.0 | 2.0 | 3.0 | 15.0 | 5.0 | 0.0 |
| 2 | 3.0 | 10.0 | 5.0 | 1.0 | 2.0 | 15.5 | 5.0 | 1.0 |
| 3 | 3.0 | 15.5 | 0.0 | 1.0 | 1.0 | 10.0 | 3.0 | 2.0 |
| 4 | 1.0 | 15.5 | 3.0 | 0.0 | 1.0 | 15.5 | 5.0 | 0.0 |
| 5 | 2.0 | 15.0 | 3.0 | 2.0 | 3.0 | 15.0 | 3.0 | 0.0 |
| 6 | 2.0 | 10.0 | 3.0 | 2.0 | 2.0 | 10.0 | 0.0 | 1.0 |
| 7 | 1.0 | 10.0 | 0.0 | 0.0 | 3.0 | 15.5 | 0.0 | 0.0 |
| 8 | 3.0 | 15.5 | 5.0 | 0.0 | 1.0 | 10.0 | 5.0 | 2.0 |
| 9 | 3.0 | 10.0 | 0.0 | 2.0 | 2.0 | 15.0 | 0.0 | 0.0 |
| 10 | 1.0 | 10.0 | 0.0 | 0.0 | 1.0 | 15.5 | 0.0 | 0.0 |
| 11 | 3.0 | 15.0 | 5.0 | 1.0 | 3.0 | 10.0 | 3.0 | 1.0 |
| 12 | 2.0 | 15.0 | 5.0 | 2.0 | 1.0 | 15.0 | 3.0 | 1.0 |
| 13 | 1.0 | 15.5 | 3.0 | 2.0 | 3.0 | 15.0 | 0.0 | 2.0 |
| 14 | 2.0 | 15.5 | 3.0 | 0.0 | 3.0 | 15.5 | 0.0 | 1.0 |
| 15 | 2.0 | 15.0 | 5.0 | 1.0 | 2.0 | 15.0 | 5.0 | 2.0 |
| 16 | 3.0 | 15.5 | 0.0 | 1.0 | 2.0 | 10.0 | 3.0 | 2.0 |
| 17 | 1.0 | 15.0 | 0.0 | 0.0 | 1.0 | 10.0 | 3.0 | 1.0 |
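In the table above every level of every attribute appears exactly six times, which suggests a level-balanced random start. How gen_initdesign() builds its matrix internally is not documented here, so the following is only a sketch of one way to obtain such a design; `balanced_initial_design` and the `levels` dictionary are hypothetical names introduced for this example.

```python
import random

def balanced_initial_design(levels, ncs, seed=None):
    """Return a dict of columns in which each level appears equally often."""
    rng = random.Random(seed)
    design = {}
    for name, attr_levels in levels.items():
        reps, remainder = divmod(ncs, len(attr_levels))
        if remainder:
            raise ValueError(
                f"{name}: ncs={ncs} is not a multiple of {len(attr_levels)} levels"
            )
        column = list(attr_levels) * reps   # each level repeated ncs / n_levels times
        rng.shuffle(column)                 # random order within the column
        design[name] = column
    return design

# Two of the attributes from Step 1, for both alternatives.
levels = {
    'alt1_A': [1, 2, 3],
    'alt1_B': [10, 15, 15.5],
    'alt2_A': [1, 2, 3],
    'alt2_B': [10, 15, 15.5],
}
toy_design = balanced_initial_design(levels, ncs=18, seed=42)
for name, column in toy_design.items():
    print(name, column)
```

Shuffling each column independently keeps the levels balanced while leaving the pairing of levels across attributes random, which is what the optimisation step later improves.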
Step 3: Set the utility functions with uncertain priors
For a Db-efficient design, each parameter can carry a prior_std argument that specifies the standard deviation of its prior distribution. The prior value is interpreted as the mean.
Parameters with prior_std set are treated as uncertain: draws are taken from \(\mathcal{N}(\text{prior}, \text{prior\_std})\), where the second argument is the standard deviation, during Db-error computation.
Parameters without prior_std (or with prior_std=None) are fixed: they contribute no uncertainty.
Here, attributes \(A\) and \(B\) have uncertain priors; \(C\) and \(D\) are treated as known:
[5]:
beta_A = Parameter('beta_A', -0.1, prior_std=0.01) # uncertain: N(-0.1, 0.01)
beta_B = Parameter('beta_B', -0.2, prior_std=0.03) # uncertain: N(-0.2, 0.03)
beta_C = Parameter('beta_C', 0.1) # fixed prior
beta_D = Parameter('beta_D', 0.15) # fixed prior
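The distinction between uncertain and fixed parameters can be illustrated with a small standard-library sketch. It mimics the behaviour described above rather than reproducing ChoiceDesign's internals; `draw_priors` and the `(mean, std)` tuples are assumptions made for this example.

```python
import random

def draw_priors(priors, n_draws, seed=None):
    """Draw parameter vectors; entries with std=None stay at their prior value."""
    rng = random.Random(seed)
    return [
        {
            name: mean if std is None else rng.gauss(mean, std)
            for name, (mean, std) in priors.items()
        }
        for _ in range(n_draws)
    ]

# Same priors as the Parameter objects above: (mean, standard deviation).
priors = {
    'beta_A': (-0.1, 0.01),   # uncertain
    'beta_B': (-0.2, 0.03),   # uncertain
    'beta_C': (0.1, None),    # fixed
    'beta_D': (0.15, None),   # fixed
}
draws = draw_priors(priors, n_draws=500, seed=1)
mean_A = sum(d['beta_A'] for d in draws) / len(draws)
print('First draw:', draws[0])
print('Mean of beta_A draws:', round(mean_A, 4))
```

Only beta_A and beta_B vary from draw to draw; beta_C and beta_D are identical in every draw, so they add nothing to the spread of the D-error.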
The utility functions use the same linear MNL structure as any other example. There are no random draws inside the utility expression — the Bayesian averaging happens at the criterion level, not inside the utility tree:
[6]:
V1 = beta_A * alt1_A + beta_B * alt1_B + beta_C * alt1_C + beta_D * alt1_D
V2 = beta_A * alt2_A + beta_B * alt2_B + beta_C * alt2_C + beta_D * alt2_D
V = {1: V1, 2: V2}
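To make the utility expressions concrete, the sketch below evaluates V1 and V2 by hand for the first row of the initial design from Step 2, using the prior means. It uses plain dictionaries instead of ChoiceDesign objects; `utility` and `mnl_probabilities` are helper names invented for this example.

```python
import math

# Prior means from Step 3 and row 0 of the initial design from Step 2.
beta = {'A': -0.1, 'B': -0.2, 'C': 0.1, 'D': 0.15}
alt1 = {'A': 1.0, 'B': 10.0, 'C': 5.0, 'D': 1.0}
alt2 = {'A': 2.0, 'B': 15.5, 'C': 5.0, 'D': 2.0}

def utility(x, beta):
    """Linear-in-parameters utility: sum of beta_k * x_k."""
    return sum(beta[k] * x[k] for k in beta)

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for a list of utilities."""
    top = max(utilities)                          # subtract the max for stability
    expv = [math.exp(v - top) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

v1, v2 = utility(alt1, beta), utility(alt2, beta)
p1, p2 = mnl_probabilities([v1, v2])
print(f'V1 = {v1:.2f}, V2 = {v2:.2f}')
print(f'P(alt1) = {p1:.3f}, P(alt2) = {p2:.3f}')
```

Here V1 = -1.45 and V2 = -2.50, so alternative 1 is chosen with a probability of roughly 0.74. In a Db design these probabilities are recomputed for every prior draw, while the utility expressions themselves stay unchanged.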
Step 4: Optimise the design minimising the Db-error
The optimise() method accepts a bayes_draws argument. When set, the swapping algorithm minimises the Db-error (expected D-error over prior draws) instead of the point D-error.
bayes_draws: number of Monte Carlo draws from the prior distributions per D-error evaluation. Higher values reduce Monte Carlo noise at the cost of longer run times. Values between 200 and 1000 are typically sufficient.
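As a rough guide to that trade-off (a standard Monte Carlo result rather than anything specific to ChoiceDesign), let \(s\) denote the standard deviation of the D-error across prior draws. Assuming independent draws, the standard error of the Db-error estimate based on \(R\) draws is
\[ \mathrm{SE}\left(\widehat{D_b}\right) = \frac{s}{\sqrt{R}} \]
so moving from 200 to 1000 draws cuts the noise by a factor of \(\sqrt{5} \approx 2.2\) while multiplying the number of D-error evaluations by five.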
All other stopping criteria work as usual:
[7]:
optimal_design, init_perf, final_perf, final_iter, ubalance_ratio = design.optimise(
    init_design=init_design,
    V=V,
    bayes_draws=500,
    time_lim=1,
    verbose=True
)
Evaluating initial design
Optimization complete 0:00:59 / D-error: 0.042138
Elapsed time: 0:01:00
D-error of initial design: 0.099028
D-error of last stored design: 0.042138
Utility Balance ratio: 82.87 %
Algorithm iterations: 471
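The general idea of a swapping algorithm driven by the Db-error can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm that optimise() actually runs: it handles only two alternatives with generic coefficients, uses a fixed number of iterations instead of a time limit, and reuses one fixed set of prior draws for every comparison. The names `d_error`, `db_error` and `swap_optimise` are hypothetical.

```python
import numpy as np

def d_error(design, beta):
    """MNL D-error for two alternatives; columns are [alt1 attrs | alt2 attrs]."""
    k = len(beta)
    diff = design[:, :k] - design[:, k:]                   # attribute differences (S, K)
    p1 = 1.0 / (1.0 + np.exp(-(diff @ beta)))              # P(alt1) per choice situation
    info = (diff * (p1 * (1.0 - p1))[:, None]).T @ diff    # Fisher information (K, K)
    sign, logdet = np.linalg.slogdet(info)
    return float(np.exp(-logdet / k)) if sign > 0 else np.inf

def db_error(design, draws):
    """Mean D-error over a fixed set of prior draws."""
    return float(np.mean([d_error(design, b) for b in draws]))

def swap_optimise(design, draws, n_iter=300, seed=0):
    """Swap two rows within a random column; keep the swap if the Db-error drops."""
    rng = np.random.default_rng(seed)
    best = design.astype(float).copy()
    best_err = db_error(best, draws)
    for _ in range(n_iter):
        col = rng.integers(best.shape[1])
        i, j = rng.choice(best.shape[0], size=2, replace=False)
        if best[i, col] == best[j, col]:
            continue                                       # swap would change nothing
        cand = best.copy()
        cand[[i, j], col] = cand[[j, i], col]              # swap preserves level balance
        cand_err = db_error(cand, draws)
        if cand_err < best_err:
            best, best_err = cand, cand_err
    return best, best_err

rng = np.random.default_rng(42)
levels = [[1, 2, 3], [10, 15, 15.5], [0, 3, 5], [0, 1, 2]] * 2   # alt1 then alt2
init = np.column_stack([rng.permutation(np.repeat(lv, 6)) for lv in levels])
prior_mean = np.array([-0.1, -0.2, 0.1, 0.15])
prior_std = np.array([0.01, 0.03, 0.0, 0.0])                     # 0.0 = fixed prior
draws = rng.normal(prior_mean, prior_std, size=(50, 4))

init_err = db_error(init, draws)
final, final_err = swap_optimise(init, draws)
print(f'Db-error before: {init_err:.6f}  after: {final_err:.6f}')
```

Holding the draws fixed across iterations means two candidate designs are always compared on the same parameter values, so an accepted swap reflects a real improvement on those draws rather than sampling noise.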
Step 5: Compare Db-error vs point D-error
The evaluate() method also accepts bayes_draws. Calling it with and without this argument shows the difference between the Db-error (what the design was optimised for) and the point D-error (what a Dp design would report at the prior means).
A well-calibrated Db design will typically have a slightly higher point D-error than a Dp design optimised at the same prior means, because it is not tuned to that single point; that is the cost of robustness. The benefit is a design that degrades more gracefully when the true parameters differ from the assumed priors.
[8]:
db_error, ubalance_db = design.evaluate(optimal_design, V, bayes_draws=500)
dp_error, ubalance_dp = design.evaluate(optimal_design, V)
print(f'Db-error (robust, averaged over prior draws): {db_error:.6f}')
print(f'Dp-error (point, evaluated at prior means): {dp_error:.6f}')
print(f'Utility balance: {ubalance_dp:.2f} %')
Db-error (robust, averaged over prior draws): 0.042240
Dp-error (point, evaluated at prior means): 0.042150
Utility balance: 82.87 %
(optional) Block the design
The optimal design can be blocked using gen_blocks(). This method randomly creates candidate blocks and keeps the one with the minimum correlation between the blocking column and all attributes:
[9]:
optimal_design_blocked, corr_history = design.gen_blocks(optimal_design, n_blocks=3)
optimal_design_blocked
[9]:
| | CS | alt1_A | alt1_B | alt1_C | alt1_D | alt2_A | alt2_B | alt2_C | alt2_D | Block |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 10.0 | 5.0 | 1.0 | 2.0 | 15.0 | 0.0 | 2.0 | 2 |
| 1 | 2.0 | 2.0 | 15.5 | 0.0 | 2.0 | 1.0 | 10.0 | 5.0 | 0.0 | 1 |
| 2 | 3.0 | 3.0 | 10.0 | 0.0 | 0.0 | 1.0 | 15.0 | 5.0 | 2.0 | 3 |
| 3 | 4.0 | 2.0 | 15.0 | 0.0 | 1.0 | 2.0 | 10.0 | 5.0 | 1.0 | 3 |
| 4 | 5.0 | 3.0 | 15.0 | 5.0 | 0.0 | 1.0 | 15.5 | 0.0 | 2.0 | 2 |
| 5 | 6.0 | 2.0 | 15.0 | 5.0 | 2.0 | 2.0 | 15.5 | 0.0 | 0.0 | 3 |
| 6 | 7.0 | 3.0 | 15.5 | 3.0 | 1.0 | 2.0 | 10.0 | 0.0 | 1.0 | 2 |
| 7 | 8.0 | 1.0 | 10.0 | 5.0 | 1.0 | 3.0 | 15.0 | 3.0 | 1.0 | 1 |
| 8 | 9.0 | 3.0 | 15.0 | 0.0 | 1.0 | 1.0 | 15.0 | 3.0 | 1.0 | 1 |
| 9 | 10.0 | 1.0 | 10.0 | 0.0 | 2.0 | 3.0 | 15.5 | 5.0 | 0.0 | 3 |
| 10 | 11.0 | 1.0 | 15.0 | 0.0 | 0.0 | 3.0 | 15.5 | 5.0 | 2.0 | 1 |
| 11 | 12.0 | 2.0 | 15.5 | 5.0 | 2.0 | 3.0 | 10.0 | 0.0 | 0.0 | 3 |
| 12 | 13.0 | 3.0 | 10.0 | 3.0 | 2.0 | 1.0 | 15.5 | 5.0 | 0.0 | 2 |
| 13 | 14.0 | 1.0 | 15.0 | 3.0 | 0.0 | 3.0 | 15.0 | 3.0 | 1.0 | 2 |
| 14 | 15.0 | 1.0 | 15.5 | 3.0 | 0.0 | 3.0 | 10.0 | 0.0 | 2.0 | 2 |
| 15 | 16.0 | 2.0 | 10.0 | 3.0 | 1.0 | 2.0 | 15.5 | 3.0 | 1.0 | 1 |
| 16 | 17.0 | 2.0 | 15.5 | 3.0 | 2.0 | 2.0 | 10.0 | 3.0 | 0.0 | 1 |
| 17 | 18.0 | 3.0 | 15.5 | 5.0 | 0.0 | 1.0 | 15.0 | 3.0 | 2.0 | 3 |
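The random-search idea behind blocking can be sketched as below. The exact criterion used by gen_blocks() is not spelled out above, so this example assumes equally sized blocks and scores each candidate by its largest absolute correlation with any attribute column; `block_design` and its arguments are hypothetical.

```python
import numpy as np

def block_design(design, n_blocks, n_candidates=500, seed=0):
    """Try random balanced block columns; keep the least correlated one."""
    n_rows, n_cols = design.shape
    if n_rows % n_blocks:
        raise ValueError('number of rows must be a multiple of n_blocks')
    rng = np.random.default_rng(seed)
    base = np.repeat(np.arange(1, n_blocks + 1), n_rows // n_blocks)
    best_blocks, best_score = None, np.inf
    for _ in range(n_candidates):
        blocks = rng.permutation(base)                    # candidate blocking column
        score = max(
            abs(np.corrcoef(blocks, design[:, c])[0, 1])  # correlation with attribute c
            for c in range(n_cols)
        )
        if score < best_score:
            best_blocks, best_score = blocks, score
    return best_blocks, best_score

# Hypothetical 18-row design with four balanced attribute columns.
rng = np.random.default_rng(7)
levels = [[1, 2, 3], [10, 15, 15.5], [0, 3, 5], [0, 1, 2]]
toy = np.column_stack([rng.permutation(np.repeat(lv, 6)) for lv in levels])

blocks, score = block_design(toy, n_blocks=3)
print('Blocks:', blocks)
print(f'Largest absolute correlation: {score:.4f}')
```

A low correlation between the block column and the attributes means each respondent group sees a similar mix of attribute levels, so block membership is unlikely to be confounded with any single attribute.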
References
[1] Sandor, Z., & Wedel, M. (2001). Designing conjoint choice experiments using managers’ prior beliefs. Journal of Marketing Research, 38(4), 430–444.
[2] Quan, W., Rose, J. M., Collins, A. T., & Bliemer, M. C. (2011). A comparison of algorithms for generating efficient choice experiments.