Example of a D-efficient RUM design with dummies in ChoiceDesign

This notebook illustrates how to use ChoiceDesign to generate a D-efficient experimental design in which some attributes are dummy-coded. Given a set of attributes and prior parameters, ChoiceDesign uses a variation of the random swapping algorithm [1] to minimise the D-error of a Multinomial Logit (MNL) model, which is computed from the model's information matrix.

Step 1: Load modules, define design parameters and set attributes

The following lines load:

EffDesign: the class of efficient designs,
Attribute and Parameter: the classes of attributes and parameters, respectively.

[1]:

from choicedesign.design import EffDesign
from choicedesign.expressions import Attribute, Parameter

Each attribute is defined by the Attribute class. The arguments of this class are:

name: a string with the attribute name,
levels: a list with the levels of the attribute.

Each attribute is alternative-specific. Hence, attributes must be defined for each alternative that contains them.

The following lines define 2 alternatives, named alt1 and alt2, and 4 attributes named from \(A\) to \(D\):

[2]:

alt1_A = Attribute('alt1_A',[1,2,3])
alt1_B = Attribute('alt1_B',[1,2,3])
alt1_C = Attribute('alt1_C',[0,3,5])
alt1_D = Attribute('alt1_D',[0,1,2])

alt2_A = Attribute('alt2_A',[1,2,3])
alt2_B = Attribute('alt2_B',[1,2,3])
alt2_C = Attribute('alt2_C',[0,3,5])
alt2_D = Attribute('alt2_D',[0,1,2])

Step 2: Construct efficient design object and generate initial design matrix

The second step consists of constructing the experimental design object, which requires the following parameters:

X: A list of Attribute class elements,
ncs: The number of choice situations.

The following lines use EffDesign to define an object named design with 18 choice situations:

[3]:

design = EffDesign(
    X = [alt1_A,alt1_B,alt1_C,alt1_D,
         alt2_A,alt2_B,alt2_C,alt2_D],
    ncs=18)

After the design object is defined, the method gen_initdesign() generates the initial design matrix. This method accepts the following optional parameters:

cond: A list of conditions that the final design must satisfy. Each element is a string containing a single condition. Conditions can be binary relations (e.g., X > Y, where X and Y are attributes of a specific alternative) or conditional relations (e.g., if X > a then Y < b, where a and b are values). When the if operator is used, multiple conditions can be combined with the & operator.
seed: The random seed.
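To make the two kinds of condition concrete, the sketch below checks them on three illustrative rows with plain Python functions. It does not use or reproduce ChoiceDesign's string parser: the function names and the row dictionaries are hypothetical, and the conditions are written as Python predicates rather than as cond strings.

```python
# Three illustrative rows (only the columns used by the conditions below).
rows = [
    {"alt1_A": 1, "alt1_D": 0, "alt2_A": 3},
    {"alt1_A": 2, "alt1_D": 2, "alt2_A": 1},
    {"alt1_A": 3, "alt1_D": 2, "alt2_A": 2},
]


def binary_relation(row):
    """Binary relation 'alt1_A > alt2_A'."""
    return row["alt1_A"] > row["alt2_A"]


def conditional_relation(row):
    """Conditional relation 'if alt1_A > 2 then alt1_D < 2'.

    An implication only fails when the premise holds and the conclusion does not.
    """
    premise = row["alt1_A"] > 2
    conclusion = row["alt1_D"] < 2
    return (not premise) or conclusion


for index, row in enumerate(rows):
    print(index, binary_relation(row), conditional_relation(row))
# prints:
# 0 False True
# 1 True True
# 2 True False
```

A row is admissible only if every condition evaluates to True, so in this toy example only row 1 would be kept under both conditions.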

For this example, neither cond nor seed is used:

[4]:

init_design = design.gen_initdesign()
init_design

[4]:

	alt1_A	alt1_B	alt1_C	alt1_D	alt2_A	alt2_B	alt2_C	alt2_D
0	1	1	5	0	3	2	5	2
1	2	2	5	2	1	1	5	0
2	3	2	0	2	2	3	0	0
3	3	1	5	0	3	2	3	1
4	3	2	3	1	3	3	0	1
5	1	3	0	1	1	3	5	2
6	1	3	3	0	1	3	0	0
7	1	3	5	1	2	3	5	1
8	2	2	0	2	2	1	3	2
9	1	3	5	0	2	2	3	0
10	3	3	3	2	2	3	5	1
11	2	1	5	0	3	1	0	2
12	1	1	0	2	1	2	5	1
13	2	1	0	1	2	1	3	1
14	3	3	3	0	3	1	0	0
15	3	2	3	1	1	2	3	0
16	2	2	0	1	1	1	0	2
17	2	1	3	2	3	2	3	2
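In the output above, every level appears exactly six times in each column, so the initial design is level-balanced. The sketch below shows one simple way such a design could be drawn with the standard library: repeat each attribute's levels until the column has ncs entries, then shuffle each column independently. This is an illustration under that assumption, not the code behind gen_initdesign(); the helper balanced_random_design and the dictionary levels_by_attr are hypothetical names.

```python
import random


def balanced_random_design(levels_by_attr, ncs, seed=None):
    """Draw a level-balanced random design as a list of row dictionaries.

    Hypothetical helper for illustration; not part of ChoiceDesign.
    """
    rng = random.Random(seed)
    columns = {}
    for name, levels in levels_by_attr.items():
        if ncs % len(levels) != 0:
            raise ValueError(f"ncs={ncs} is not a multiple of the number of levels of {name}")
        # Repeat the levels so each one appears equally often, then shuffle.
        column = list(levels) * (ncs // len(levels))
        rng.shuffle(column)
        columns[name] = column
    return [{name: columns[name][row] for name in columns} for row in range(ncs)]


levels_by_attr = {
    "alt1_A": [1, 2, 3], "alt1_B": [1, 2, 3], "alt1_C": [0, 3, 5], "alt1_D": [0, 1, 2],
    "alt2_A": [1, 2, 3], "alt2_B": [1, 2, 3], "alt2_C": [0, 3, 5], "alt2_D": [0, 1, 2],
}

toy_design = balanced_random_design(levels_by_attr, ncs=18, seed=42)
for row in toy_design[:3]:
    print(row)
```

Shuffling columns independently keeps level balance by construction, which is also why swapping two values within a column is a natural move for the optimisation step.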

Step 3: Set the utility functions

ChoiceDesign uses a native expression system to define utility functions. Parameters and attributes are combined using standard arithmetic operators. For this, we use the Parameter class, which requires the following arguments:

name: The parameter name
prior: The prior value

We will assume that attributes A and B are dummy-coded, with level 1 as the baseline. Each non-baseline level therefore needs its own parameter, and the dummy indicators are created directly with the == operator on an Attribute.

The following lines define six parameters, two for each dummy-coded attribute (levels 2 and 3) and one each for C and D:

[5]:

beta_A_2 = Parameter('beta_A_2',-0.1)
beta_A_3 = Parameter('beta_A_3',-0.4)

beta_B_2 = Parameter('beta_B_2',-0.02)
beta_B_3 = Parameter('beta_B_3',-0.01)

beta_C = Parameter('beta_C',0.1)
beta_D = Parameter('beta_D',0.15)

Then, the utility functions are defined using standard arithmetic operators. The == operator on an Attribute returns an indicator (1 where the condition holds, 0 otherwise), which is used here for dummy coding.

[6]:

V1 = beta_A_2 * (alt1_A==2) + beta_A_3 * (alt1_A==3) + beta_B_2 * (alt1_B==2) + beta_B_3 * (alt1_B==3) + beta_C * alt1_C + beta_D * alt1_D
V2 = beta_A_2 * (alt2_A==2) + beta_A_3 * (alt2_A==3) + beta_B_2 * (alt2_B==2) + beta_B_3 * (alt2_B==3) + beta_C * alt2_C + beta_D * alt2_D
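As a numerical check of what the dummy-coded utilities imply, the sketch below evaluates both utility functions in plain Python for choice situation 0 of the initial design (alt1: A=1, B=1, C=5, D=0; alt2: A=3, B=2, C=5, D=2) and derives the MNL choice probability of alternative 1. It bypasses ChoiceDesign's expression system entirely; the dictionary priors and the function utility are hypothetical names used only for this illustration.

```python
import math

# Prior values copied from the parameter definitions above.
priors = {
    "beta_A_2": -0.1, "beta_A_3": -0.4,
    "beta_B_2": -0.02, "beta_B_3": -0.01,
    "beta_C": 0.1, "beta_D": 0.15,
}


def utility(a, b, c, d):
    """Utility of one alternative, with A and B dummy-coded (level 1 is the baseline)."""
    return (
        priors["beta_A_2"] * int(a == 2) + priors["beta_A_3"] * int(a == 3)
        + priors["beta_B_2"] * int(b == 2) + priors["beta_B_3"] * int(b == 3)
        + priors["beta_C"] * c + priors["beta_D"] * d
    )


v1 = utility(1, 1, 5, 0)  # baseline levels of A and B contribute nothing
v2 = utility(3, 2, 5, 2)  # picks up beta_A_3 and beta_B_2
p1 = math.exp(v1) / (math.exp(v1) + math.exp(v2))

print(f"V1 = {v1:.2f}, V2 = {v2:.2f}, P(alt1) = {p1:.3f}")
# prints: V1 = 0.50, V2 = 0.38, P(alt1) = 0.530
```

Because level 1 is the baseline, an alternative with A=1 and B=1 gets its utility from C and D alone, which is the case for alternative 1 here.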

The utility functions must be stored in a dictionary object. In this dictionary, each key is a consecutive number from 1 to the number of alternatives, and each value is the corresponding utility function:

[7]:

V = {1: V1, 2: V2}

Step 4: Optimise the initial design, given the utility functions and priors

The method optimise() starts the D-error minimisation routine, given the initial design matrix and the utility functions. This method requires the following parameters:

init_design: The initial design matrix to optimise
V: The dictionary object with utility functions
model: The base model of the efficient design. The default is 'mnl', for a Multinomial Logit model.

In addition, optimise() admits the following optional parameters:

iter_lim: Number of iterations before the algorithm stops,
noimprov_lim: Number of iterations without improvement before the algorithm stops,
time_lim: Time (in minutes) before the algorithm stops,
seed: Random seed,
verbose: Whether status messages and progress are shown.

The outputs of optimise() are:

optimal_design: The optimised design matrix
init_perf: The initial D-error
final_perf: The D-error of the last stored design
final_iter: The last iteration number
ubalance_ratio: The utility balance ratio. A 0% value indicates strict dominance of an alternative, whereas 100% indicates equal market shares.
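The internals of optimise() are not shown in this notebook. As a rough illustration of the random swapping idea, the sketch below swaps two values within a randomly chosen column, keeps the swap only if the D-error decreases, and stops after a fixed number of iterations or of consecutive non-improving iterations. It is a toy under stated assumptions: two alternatives, two linearly coded attributes, the textbook definition D-error = det(I^-1)^(1/K), and a 2x2 determinant in closed form. The names d_error, random_swap, BETA and start are hypothetical; iter_lim and noimprov_lim only borrow the option names listed above, and their exact semantics in ChoiceDesign may differ.

```python
import math
import random

BETA = (0.1, 0.15)  # toy priors for two linearly coded attributes


def d_error(design):
    """MNL D-error for 2 alternatives and 2 parameters.

    Each row is [x1_alt1, x2_alt1, x1_alt2, x2_alt2]. With two alternatives the
    information matrix is the sum over rows of P1 * P2 * d d', where d is the
    attribute difference between the alternatives.
    """
    i11 = i12 = i22 = 0.0
    for x11, x21, x12, x22 in design:
        v1 = BETA[0] * x11 + BETA[1] * x21
        v2 = BETA[0] * x12 + BETA[1] * x22
        p1 = 1.0 / (1.0 + math.exp(v2 - v1))
        weight = p1 * (1.0 - p1)
        d1, d2 = x11 - x12, x21 - x22
        i11 += weight * d1 * d1
        i12 += weight * d1 * d2
        i22 += weight * d2 * d2
    det = i11 * i22 - i12 * i12
    if det <= 1e-12:
        return math.inf  # singular information matrix
    return (1.0 / det) ** 0.5  # det(I^-1)^(1/K) with K = 2


def random_swap(design, iter_lim=2000, noimprov_lim=500, seed=0):
    """Minimise the D-error by swapping pairs of values within a column."""
    rng = random.Random(seed)
    best = [list(row) for row in design]
    best_err = d_error(best)
    stale = 0
    for _ in range(iter_lim):
        if stale >= noimprov_lim:
            break
        col = rng.randrange(len(best[0]))
        r1, r2 = rng.sample(range(len(best)), 2)
        best[r1][col], best[r2][col] = best[r2][col], best[r1][col]
        err = d_error(best)
        if err < best_err:
            best_err, stale = err, 0
        else:
            # Undo the swap and count a non-improving iteration.
            best[r1][col], best[r2][col] = best[r2][col], best[r1][col]
            stale += 1
    return best, best_err


# Level-balanced starting design: levels [0, 3, 5] and [0, 1, 2], each used twice per column.
start = [
    [0, 0, 5, 2], [3, 1, 3, 1], [5, 2, 0, 0],
    [0, 1, 3, 0], [3, 2, 5, 1], [5, 0, 0, 2],
]
start_err = d_error(start)
final, final_err = random_swap(start)
print(f"D-error before: {start_err:.4f}, after: {final_err:.4f}")
```

Because values are only exchanged within a column, every candidate design keeps the level balance of the starting design.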

The following line runs the optimisation routine for 1 minute:

[8]:

optimal_design, init_perf, final_perf, final_iter, ubalance_ratio = design.optimise(init_design=init_design,V=V,model='mnl',time_lim = 1, verbose = True)

Evaluating initial design
Optimization complete 0:00:59 / D-error: 0.183009
Elapsed time: 0:01:00
D-error of initial design:  0.419515
D-error of last stored design:  0.183009
Utility Balance ratio:  94.31 %
Algorithm iterations:  27064

Blocking the design

The optimal design can be blocked using the method gen_blocks(). This method randomly creates candidate blocks and keeps the one with the minimum correlation between the blocking column and all the attributes. The method allows for the following arguments:

optimal_design: the experimental design
n_blocks: number of blocks.
n_iter (optional): number of iterations of the blocking algorithm
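The blocking idea described above can be sketched with the standard library as follows: shuffle a balanced column of block labels n_iter times and keep the candidate whose correlation with the attribute columns is smallest. This is an illustration, not the implementation of gen_blocks(). In particular, aggregating by the sum of absolute Pearson correlations is an assumption, and pearson and random_blocks are hypothetical helpers. The two columns are alt1_A and alt1_C from the optimised design of this notebook.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def random_blocks(columns, n_blocks, n_iter=1000, seed=0):
    """Keep the random block column least correlated with the attribute columns."""
    rng = random.Random(seed)
    n_rows = len(next(iter(columns.values())))
    # Balanced labels 1..n_blocks, reshuffled at every iteration.
    candidate = [row % n_blocks + 1 for row in range(n_rows)]
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        rng.shuffle(candidate)
        # Assumed criterion: sum of absolute correlations with each attribute.
        score = sum(abs(pearson(candidate, col)) for col in columns.values())
        if score < best_score:
            best, best_score = candidate[:], score
    return best, best_score


columns = {
    "alt1_A": [1, 3, 3, 3, 2, 2, 2, 1, 1, 1, 3, 1, 2, 3, 1, 3, 2, 2],
    "alt1_C": [3, 5, 5, 3, 0, 3, 5, 0, 0, 5, 0, 5, 3, 5, 3, 0, 3, 0],
}
blocks, score = random_blocks(columns, n_blocks=3)
print(blocks, round(score, 4))
```

With 18 choice situations and 3 blocks, each respondent would face 6 choice situations.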

The following line creates 3 blocks in the optimal design:

[9]:

optimal_design_blocked = design.gen_blocks(optimal_design,n_blocks=3)

Lastly, the optimal design, which now includes the Block column, can be printed:

[10]:

optimal_design

[10]:

	CS	alt1_A	alt1_B	alt1_C	alt1_D	alt2_A	alt2_B	alt2_C	alt2_D	Block
0	1	1	1	3	2	3	3	3	0	2
1	2	3	3	5	2	1	2	0	0	2
2	3	3	2	5	2	2	3	0	0	1
3	4	3	2	3	0	2	1	3	2	2
4	5	2	2	0	2	1	1	5	0	3
5	6	2	3	3	1	3	1	3	1	1
6	7	2	3	5	0	1	2	0	2	3
7	8	1	1	0	1	2	2	5	1	3
8	9	1	2	0	0	3	1	5	2	2
9	10	1	2	5	0	2	3	0	2	2
10	11	3	3	0	2	2	2	5	0	1
11	12	1	3	5	1	3	2	0	1	1
12	13	2	1	3	2	3	3	3	0	1
13	14	3	1	5	0	1	3	0	2	1
14	15	1	3	3	1	2	1	3	1	3
15	16	3	1	0	1	1	3	5	1	3
16	17	2	2	3	1	1	1	3	1	3
17	18	2	1	0	0	3	2	5	2	2

(optional) Evaluate the design

The method evaluate() evaluates a design stored in a data frame, under the specification provided when EffDesign was initialised. evaluate() requires the following parameters:

optimal_design: The design matrix to evaluate
V: The dictionary object with utility functions
model: The base model of the efficient design. The default is 'mnl', for a Multinomial Logit model.

[11]:

perf, ubalance = design.evaluate(optimal_design,V,model='mnl')

print(perf, ubalance)

0.1830092370281435 94.31092661140309
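The two printed values match the D-error and utility balance ratio reported by optimise() above. For readers who want to see what a D-error computation involves, the sketch below recomputes it from scratch for the optimised design, using the textbook MNL information matrix and D-error = det(I)^(-1/K) with K = 6 parameters. This is an independent illustration with hypothetical helper names (code, determinant, mnl_d_error), not ChoiceDesign's implementation; if the library uses a different normalisation, the number will differ from the one printed above.

```python
import math

# Priors in the order beta_A_2, beta_A_3, beta_B_2, beta_B_3, beta_C, beta_D.
PRIORS = [-0.1, -0.4, -0.02, -0.01, 0.1, 0.15]

# Optimised design from this notebook: alt1 (A, B, C, D) followed by alt2 (A, B, C, D).
DESIGN = [
    (1, 1, 3, 2, 3, 3, 3, 0), (3, 3, 5, 2, 1, 2, 0, 0), (3, 2, 5, 2, 2, 3, 0, 0),
    (3, 2, 3, 0, 2, 1, 3, 2), (2, 2, 0, 2, 1, 1, 5, 0), (2, 3, 3, 1, 3, 1, 3, 1),
    (2, 3, 5, 0, 1, 2, 0, 2), (1, 1, 0, 1, 2, 2, 5, 1), (1, 2, 0, 0, 3, 1, 5, 2),
    (1, 2, 5, 0, 2, 3, 0, 2), (3, 3, 0, 2, 2, 2, 5, 0), (1, 3, 5, 1, 3, 2, 0, 1),
    (2, 1, 3, 2, 3, 3, 3, 0), (3, 1, 5, 0, 1, 3, 0, 2), (1, 3, 3, 1, 2, 1, 3, 1),
    (3, 1, 0, 1, 1, 3, 5, 1), (2, 2, 3, 1, 1, 1, 3, 1), (2, 1, 0, 0, 3, 2, 5, 2),
]


def code(a, b, c, d):
    """Map one alternative's levels to six regressors (A and B dummy-coded)."""
    return [float(a == 2), float(a == 3), float(b == 2), float(b == 3), float(c), float(d)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for col in range(i, n):
                m[r][col] -= factor * m[i][col]
    return det


def mnl_d_error(design):
    """D-error of an MNL model: det(information matrix) ** (-1 / K)."""
    k = len(PRIORS)
    info = [[0.0] * k for _ in range(k)]
    for row in design:
        alts = [code(*row[:4]), code(*row[4:])]
        exp_v = [math.exp(sum(b * x for b, x in zip(PRIORS, alt))) for alt in alts]
        probs = [e / sum(exp_v) for e in exp_v]
        mean = [sum(p * alt[i] for p, alt in zip(probs, alts)) for i in range(k)]
        # Accumulate sum_j P_j (x_j - mean)(x_j - mean)' for this choice situation.
        for p, alt in zip(probs, alts):
            dev = [alt[i] - mean[i] for i in range(k)]
            for i in range(k):
                for j in range(k):
                    info[i][j] += p * dev[i] * dev[j]
    det = determinant(info)
    return math.inf if det <= 0 else det ** (-1.0 / k)


d_err = mnl_d_error(DESIGN)
print(d_err)
```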

References

[1] Quan, W., Rose, J. M., Collins, A. T., & Bliemer, M. C. (2011). A comparison of algorithms for generating efficient choice experiments.