{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example of a D-efficient RUM design with dummies in ChoiceDesign\n",
"\n",
"This notebook illustrates how to use **ChoiceDesign** to generate a D-efficient experimental design in which some attributes are dummy coded. Given a set of attributes and prior parameter values, ChoiceDesign uses a variation of the random swapping algorithm [1] to minimise the D-error of a Multinomial Logit (MNL) model, a scalar summary of the asymptotic variance-covariance matrix (the inverse of the information matrix) of the parameter estimates.\n",
"\n",
"## Step 1: Load modules, define design parameters and set attributes\n",
"\n",
"The following lines load:\n",
"- `EffDesign`: the class of efficient designs,\n",
"- `Attribute` and `Parameter`: the classes of attributes and parameters, respectively."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from choicedesign.design import EffDesign\n",
"from choicedesign.expressions import Attribute, Parameter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each attribute is defined by the `Attribute` class. The arguments of this class are:\n",
"\n",
"* `name`: a string with the attribute name,\n",
"* `levels`: a list with the levels of the attribute.\n",
"\n",
"Attributes are alternative-specific: an attribute that appears in several alternatives must be defined once for each of them (e.g., `alt1_A` and `alt2_A`).\n",
"\n",
"The following lines define 2 alternatives, named `alt1` and `alt2`, and 4 attributes named from $A$ to $D$:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"alt1_A = Attribute('alt1_A',[1,2,3])\n",
"alt1_B = Attribute('alt1_B',[1,2,3])\n",
"alt1_C = Attribute('alt1_C',[0,3,5])\n",
"alt1_D = Attribute('alt1_D',[0,1,2])\n",
"\n",
"alt2_A = Attribute('alt2_A',[1,2,3])\n",
"alt2_B = Attribute('alt2_B',[1,2,3])\n",
"alt2_C = Attribute('alt2_C',[0,3,5])\n",
"alt2_D = Attribute('alt2_D',[0,1,2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Construct efficient design object and generate initial design matrix\n",
"\n",
"The second step consists of constructing the experimental design object, which requires the following parameters:\n",
"\n",
"- `X`: A list of `Attribute` class elements,\n",
"- `ncs`: The number of choice situations.\n",
"\n",
"The following lines define an object named `design` using `EffDesign`, with 18 choice situations:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"design = EffDesign(\n",
" X = [alt1_A,alt1_B,alt1_C,alt1_D,\n",
" alt2_A,alt2_B,alt2_C,alt2_D],\n",
" ncs=18)"
]
},
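{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check on the choice of `ncs`, the sketch below (plain Python, independent of ChoiceDesign) applies two common rules of thumb for MNL designs. First, with $J$ alternatives and $S$ choice situations, the design provides $S(J-1)$ degrees of freedom, which must be at least the number of parameters $K$ (six, as defined in Step 3). Second, each level of an attribute can appear equally often only if $S$ is a multiple of that attribute's number of levels. These are general design heuristics assumed here, not checks performed by `EffDesign`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rule-of-thumb checks on the number of choice situations (illustrative sketch).\n",
"levels = {\n",
"    'A': [1, 2, 3],\n",
"    'B': [1, 2, 3],\n",
"    'C': [0, 3, 5],\n",
"    'D': [0, 1, 2],\n",
"}\n",
"n_alts = 2    # J: alt1 and alt2\n",
"ncs = 18      # S: choice situations passed to EffDesign\n",
"n_params = 6  # K: two dummies each for A and B, one parameter each for C and D\n",
"\n",
"# Degrees of freedom: S * (J - 1) should be at least K.\n",
"dof = ncs * (n_alts - 1)\n",
"print('Degrees of freedom:', dof, '| at least K:', dof >= n_params)\n",
"\n",
"# Level balance is only possible if ncs is a multiple of the number of levels.\n",
"for name, lv in levels.items():\n",
"    print(name, 'can be level-balanced:', ncs % len(lv) == 0)"
]
},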
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the design object is defined, the method `gen_initdesign()` generates the initial design matrix. This method accepts the following optional parameters:\n",
"\n",
"* `cond`: List of conditions that the final design must satisfy. Each element is a string containing a single condition. Conditions can be binary relations (e.g., `X > Y`, where `X` and `Y` are attributes of a specific alternative) or conditional relations (e.g., `if X > a then Y < b`, where `a` and `b` are values). When a condition uses the `if` operator, multiple conditions can be combined with the `&` operator.\n",
"\n",
"* `seed`: Random seed\n",
"\n",
"For this example, neither of the arguments above will be used:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" alt1_A alt1_B alt1_C alt1_D alt2_A alt2_B alt2_C alt2_D\n",
"0 1 1 5 0 3 2 5 2\n",
"1 2 2 5 2 1 1 5 0\n",
"2 3 2 0 2 2 3 0 0\n",
"3 3 1 5 0 3 2 3 1\n",
"4 3 2 3 1 3 3 0 1\n",
"5 1 3 0 1 1 3 5 2\n",
"6 1 3 3 0 1 3 0 0\n",
"7 1 3 5 1 2 3 5 1\n",
"8 2 2 0 2 2 1 3 2\n",
"9 1 3 5 0 2 2 3 0\n",
"10 3 3 3 2 2 3 5 1\n",
"11 2 1 5 0 3 1 0 2\n",
"12 1 1 0 2 1 2 5 1\n",
"13 2 1 0 1 2 1 3 1\n",
"14 3 3 3 0 3 1 0 0\n",
"15 3 2 3 1 1 2 3 0\n",
"16 2 2 0 1 1 1 0 2\n",
"17 2 1 3 2 3 2 3 2"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"init_design = design.gen_initdesign()\n",
"init_design"
]
},
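{
"cell_type": "markdown",
"metadata": {},
"source": [
"`gen_initdesign()` builds the starting design internally. Purely as an illustration, the sketch below shows one common way to build such a starting point with NumPy. Each attribute column is a shuffled, level-balanced sequence in which each level appears `ncs / n_levels` times. This is consistent with the 6 occurrences per level visible in columns such as `alt1_A` and `alt1_C` above. The whole design is then redrawn until a conditional restriction holds. The condition `if alt1_A > 2 then alt1_C < 5` and the helper names are hypothetical, and this is not ChoiceDesign's implementation. Redrawing the whole design, rather than single rows, keeps every column level-balanced."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative sketch, not ChoiceDesign's implementation.\n",
"attr_levels = {\n",
"    'alt1_A': [1, 2, 3], 'alt1_B': [1, 2, 3],\n",
"    'alt1_C': [0, 3, 5], 'alt1_D': [0, 1, 2],\n",
"    'alt2_A': [1, 2, 3], 'alt2_B': [1, 2, 3],\n",
"    'alt2_C': [0, 3, 5], 'alt2_D': [0, 1, 2],\n",
"}\n",
"\n",
"def random_balanced_design(attr_levels, ncs, rng):\n",
"    # Each level appears ncs // n_levels times per column, in random order.\n",
"    return {name: rng.permutation(np.repeat(lv, ncs // len(lv)))\n",
"            for name, lv in attr_levels.items()}\n",
"\n",
"def satisfies(d):\n",
"    # Hypothetical condition: if alt1_A > 2 then alt1_C < 5.\n",
"    return bool(np.all((d['alt1_A'] <= 2) | (d['alt1_C'] < 5)))\n",
"\n",
"rng = np.random.default_rng(123)\n",
"for attempt in range(1000):\n",
"    sketch_design = random_balanced_design(attr_levels, 18, rng)\n",
"    if satisfies(sketch_design):\n",
"        break\n",
"else:\n",
"    raise RuntimeError('No design satisfying the condition was found')\n",
"\n",
"print('Feasible design found after', attempt + 1, 'draw(s)')"
]
},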
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Set the utility functions\n",
"\n",
"`ChoiceDesign` uses a native expression system to define utility functions.\n",
"Parameters and attributes are combined using standard arithmetic operators.\n",
"For this, we use the `Parameter` class, which requires the following arguments:\n",
"\n",
"* `name`: The parameter name\n",
"* `prior`: The prior value\n",
"\n",
"We will assume that attributes `A` and `B` are dummy coded, with level 1 as the baseline.\n",
"Each of these attributes therefore requires one parameter per non-baseline level (levels 2 and 3). Dummy indicators are created directly using\n",
"the `==` operator on an `Attribute`.\n",
"\n",
"The following lines define six parameters: two dummy parameters each for `A` and `B`, and one linear parameter each for `C` and `D`:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"beta_A_2 = Parameter('beta_A_2',-0.1)\n",
"beta_A_3 = Parameter('beta_A_3',-0.4)\n",
"\n",
"beta_B_2 = Parameter('beta_B_2',-0.02)\n",
"beta_B_3 = Parameter('beta_B_3',-0.01)\n",
"\n",
"beta_C = Parameter('beta_C',0.1)\n",
"beta_D = Parameter('beta_D',0.15)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, the utility functions are defined using standard arithmetic operators. The `==` operator on an `Attribute` returns an indicator (1 where the condition holds, 0 otherwise), which is used here for dummy coding."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"V1 = beta_A_2 * (alt1_A==2) + beta_A_3 * (alt1_A==3) + beta_B_2 * (alt1_B==2) + beta_B_3 * (alt1_B==3) + beta_C * alt1_C + beta_D * alt1_D\n",
"V2 = beta_A_2 * (alt2_A==2) + beta_A_3 * (alt2_A==3) + beta_B_2 * (alt2_B==2) + beta_B_3 * (alt2_B==3) + beta_C * alt2_C + beta_D * alt2_D"
]
},
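{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the `==` operator contributes numerically, the sketch below reproduces `V1` with plain NumPy for the first three choice situations of the initial design printed above (rows 0 to 2 of `alt1_A` to `alt1_D`). It assumes, as stated above, that an expression such as `(alt1_A==2)` evaluates to 1 where the condition holds and 0 otherwise. The array names are illustrative. In row 0, `A` and `B` are at their baseline level 1, so only `C` and `D` contribute to the utility."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# alt1 attributes in choice situations 0-2 of the initial design.\n",
"A = np.array([1, 2, 3])\n",
"B = np.array([1, 2, 2])\n",
"C = np.array([5, 5, 0])\n",
"D = np.array([0, 2, 2])\n",
"\n",
"# Dummy indicators: 1 where the level matches, 0 otherwise (level 1 is the baseline).\n",
"A2, A3 = (A == 2).astype(float), (A == 3).astype(float)\n",
"B2, B3 = (B == 2).astype(float), (B == 3).astype(float)\n",
"\n",
"# Same specification and priors as V1.\n",
"V1_values = (-0.1 * A2 - 0.4 * A3 - 0.02 * B2 - 0.01 * B3\n",
"             + 0.1 * C + 0.15 * D)\n",
"print(V1_values)"
]
},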
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The utility functions must be stored in a dictionary. Each key is an integer from 1 to the number of alternatives, and each value is the utility function of the corresponding alternative:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"V = {1: V1, 2: V2}"
]
},
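{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under an MNL model, the utilities collected in `V` map to choice probabilities through the logit formula `P_j = exp(V_j) / sum_k exp(V_k)`. The sketch below evaluates it with NumPy for choice situation 0 of the initial design, where the priors give V1 = 0.5 and V2 = 0.38 (computed by hand from the attribute levels). It illustrates the model only and is not a ChoiceDesign call. Subtracting the maximum utility before exponentiating leaves the probabilities unchanged but avoids overflow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def mnl_probabilities(utilities):\n",
"    # Logit formula; subtracting the max is a standard overflow guard.\n",
"    u = np.asarray(utilities, dtype=float)\n",
"    expu = np.exp(u - u.max(axis=-1, keepdims=True))\n",
"    return expu / expu.sum(axis=-1, keepdims=True)\n",
"\n",
"# Choice situation 0 of the initial design: V1 = 0.5, V2 = 0.38.\n",
"probs = mnl_probabilities([0.5, 0.38])\n",
"print(probs)"
]
},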
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Optimise the initial design, given the utility functions and priors\n",
"\n",
"The method `optimise()` starts the D-error minimisation routine, given the initial design matrix and the utility functions. This method requires the following parameters:\n",
"\n",
"* `init_design`: The initial design matrix to optimise\n",
"* `V`: The dictionary object with utility functions\n",
"* `model`: The base model of the efficient design. Defaults to `'mnl'` for a Multinomial Logit model.\n",
"\n",
"In addition, `optimise()` accepts the following optional parameters:\n",
"\n",
"* `iter_lim`: Number of iterations before the algorithm stops,\n",
"* `noimprov_lim`: Number of iterations without improvement before the algorithm stops,\n",
"* `time_lim`: Time (in minutes) before the algorithm stops,\n",
"* `seed`: Random seed,\n",
"* `verbose`: Whether status messages and progress are shown.\n",
"\n",
"The outputs of `optimise()` are:\n",
"\n",
"* `optimal_design`: The optimised design matrix\n",
"* `init_perf`: The D-error of the initial design\n",
"* `final_perf`: The D-error of the last stored design\n",
"* `final_iter`: The last iteration number\n",
"* `ubalance_ratio`: The utility balance ratio, in %. A value of 0% indicates strict dominance of an alternative, whereas 100% indicates equal market shares.\n",
"\n",
"The following line runs the optimisation routine for 1 minute:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Evaluating initial design\n",
"Optimization complete 0:00:59 / D-error: 0.183009\n",
"Elapsed time: 0:01:00\n",
"D-error of initial design: 0.419515\n",
"D-error of last stored design: 0.183009\n",
"Utility Balance ratio: 94.31 %\n",
"Algorithm iterations: 27064\n",
"\n"
]
}
],
"source": [
"optimal_design, init_perf, final_perf, final_iter, ubalance_ratio = design.optimise(init_design=init_design, V=V, model='mnl', time_lim=1, verbose=True)"
]
},
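{
"cell_type": "markdown",
"metadata": {},
"source": [
"For readers who want to see what is being minimised, the sketch below computes an MNL D-error with NumPy. It uses textbook definitions, which we assume (but have not verified) match ChoiceDesign's. In each choice situation, the attribute vectors are centred on their probability-weighted mean. The information matrix then sums `P_sj * z_sj z_sj'` over alternatives and choice situations, and the D-error is `det(inverse(information)) ** (1/K)` with K = 6 parameters. The utility balance ratio is assumed to be the average over choice situations of the product of the choice probabilities divided by its maximum `(1/J) ** J`, in %. The data are the optimal design obtained in this run, as printed further below. Comparing the result with the D-error reported by `optimise()` is a quick way to check whether these assumed definitions agree with the library's."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Optimal design from this run (attribute columns of the table printed below).\n",
"alt1 = {'A': [1, 3, 3, 3, 2, 2, 2, 1, 1, 1, 3, 1, 2, 3, 1, 3, 2, 2],\n",
"        'B': [1, 3, 2, 2, 2, 3, 3, 1, 2, 2, 3, 3, 1, 1, 3, 1, 2, 1],\n",
"        'C': [3, 5, 5, 3, 0, 3, 5, 0, 0, 5, 0, 5, 3, 5, 3, 0, 3, 0],\n",
"        'D': [2, 2, 2, 0, 2, 1, 0, 1, 0, 0, 2, 1, 2, 0, 1, 1, 1, 0]}\n",
"alt2 = {'A': [3, 1, 2, 2, 1, 3, 1, 2, 3, 2, 2, 3, 3, 1, 2, 1, 1, 3],\n",
"        'B': [3, 2, 3, 1, 1, 1, 2, 2, 1, 3, 2, 2, 3, 3, 1, 3, 1, 2],\n",
"        'C': [3, 0, 0, 3, 5, 3, 0, 5, 5, 0, 5, 0, 3, 0, 3, 5, 3, 5],\n",
"        'D': [0, 0, 0, 2, 0, 1, 2, 1, 2, 2, 0, 1, 0, 2, 1, 1, 1, 2]}\n",
"# Priors from Step 3, in the order A2, A3, B2, B3, C, D.\n",
"beta = np.array([-0.1, -0.4, -0.02, -0.01, 0.1, 0.15])\n",
"\n",
"def coded(alt):\n",
"    # Dummy-code A and B (level 1 is the baseline); keep C and D linear.\n",
"    a, b = np.asarray(alt['A']), np.asarray(alt['B'])\n",
"    return np.column_stack([a == 2, a == 3, b == 2, b == 3,\n",
"                            alt['C'], alt['D']]).astype(float)\n",
"\n",
"def mnl_d_error(X, beta):\n",
"    # X has shape (S, J, K): choice situations, alternatives, parameters.\n",
"    S, J, K = X.shape\n",
"    V = X @ beta                                  # utilities, shape (S, J)\n",
"    P = np.exp(V - V.max(axis=1, keepdims=True))\n",
"    P /= P.sum(axis=1, keepdims=True)             # MNL choice probabilities\n",
"    info = np.zeros((K, K))\n",
"    for s in range(S):\n",
"        Z = X[s] - P[s] @ X[s]                    # centre on weighted mean\n",
"        info += Z.T @ (P[s][:, None] * Z)         # Fisher information\n",
"    d_error = np.linalg.det(np.linalg.inv(info)) ** (1.0 / K)\n",
"    ubalance = 100 * np.mean(P.prod(axis=1) / (1.0 / J) ** J)\n",
"    return d_error, ubalance\n",
"\n",
"X = np.stack([coded(alt1), coded(alt2)], axis=1)  # shape (18, 2, 6)\n",
"d_err, ub = mnl_d_error(X, beta)\n",
"print('D-error:', d_err, '| utility balance (%):', ub)"
]
},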
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Blocking the design\n",
"\n",
"The optimal design can be blocked using the method `gen_blocks()`. This method randomly generates candidate blocks and keeps the assignment with the minimum correlation between the blocking column and all the attributes. The method accepts the following arguments:\n",
"\n",
"- `optimal_design`: the experimental design,\n",
"- `n_blocks`: number of blocks,\n",
"- `n_iter` (optional): number of iterations of the blocking algorithm.\n",
"\n",
"The following line divides the optimal design into 3 blocks:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"optimal_design_blocked = design.gen_blocks(optimal_design,n_blocks=3)"
]
},
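{
"cell_type": "markdown",
"metadata": {},
"source": [
"The blocking strategy described above can be sketched as a random search. The search draws many balanced assignments of choice situations to blocks and keeps the one whose block column is least correlated with the attribute columns. The sketch below summarises correlation as the sum of absolute Pearson correlations, which is an assumption; `gen_blocks()` may use another summary. It applies the search to a randomly generated, level-balanced stand-in design with 18 choice situations and 8 three-level attributes rather than to `optimal_design`. The function `block_design` is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def block_design(design, n_blocks, n_iter=1000, seed=0):\n",
"    # design: array of shape (S, n_attributes), with S divisible by n_blocks.\n",
"    rng = np.random.default_rng(seed)\n",
"    S = design.shape[0]\n",
"    base = np.repeat(np.arange(1, n_blocks + 1), S // n_blocks)\n",
"    best_blocks, best_score = None, np.inf\n",
"    for _ in range(n_iter):\n",
"        blocks = rng.permutation(base)  # equal-sized blocks, random order\n",
"        score = sum(abs(np.corrcoef(blocks, design[:, k])[0, 1])\n",
"                    for k in range(design.shape[1]))\n",
"        if score < best_score:\n",
"            best_blocks, best_score = blocks, score\n",
"    return best_blocks, best_score\n",
"\n",
"# Level-balanced stand-in design: 8 columns, each level 1-3 appearing 6 times.\n",
"rng_demo = np.random.default_rng(1)\n",
"demo = np.column_stack([rng_demo.permutation(np.repeat([1, 2, 3], 6))\n",
"                        for _ in range(8)])\n",
"blocks, score = block_design(demo, n_blocks=3)\n",
"print(blocks, round(score, 3))"
]
},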
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, the optimal design can be printed. In this run, `optimal_design` also contains the `CS` (choice situation) and `Block` columns:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" CS alt1_A alt1_B alt1_C alt1_D alt2_A alt2_B alt2_C alt2_D Block\n",
"0 1 1 1 3 2 3 3 3 0 2\n",
"1 2 3 3 5 2 1 2 0 0 2\n",
"2 3 3 2 5 2 2 3 0 0 1\n",
"3 4 3 2 3 0 2 1 3 2 2\n",
"4 5 2 2 0 2 1 1 5 0 3\n",
"5 6 2 3 3 1 3 1 3 1 1\n",
"6 7 2 3 5 0 1 2 0 2 3\n",
"7 8 1 1 0 1 2 2 5 1 3\n",
"8 9 1 2 0 0 3 1 5 2 2\n",
"9 10 1 2 5 0 2 3 0 2 2\n",
"10 11 3 3 0 2 2 2 5 0 1\n",
"11 12 1 3 5 1 3 2 0 1 1\n",
"12 13 2 1 3 2 3 3 3 0 1\n",
"13 14 3 1 5 0 1 3 0 2 1\n",
"14 15 1 3 3 1 2 1 3 1 3\n",
"15 16 3 1 0 1 1 3 5 1 3\n",
"16 17 2 2 3 1 1 1 3 1 3\n",
"17 18 2 1 0 0 3 2 5 2 2"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"optimal_design"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## (optional) Evaluate the design\n",
"The method `evaluate()` evaluates a design stored in a data frame under the specification provided when `EffDesign` was initialised. `evaluate()` requires the following parameters:\n",
"\n",
"* `optimal_design`: The design matrix to evaluate\n",
"* `V`: The dictionary object with utility functions\n",
"* `model`: The base model of the efficient design. Defaults to `'mnl'` for a Multinomial Logit model.\n",
"\n",
"It returns the D-error and the utility balance ratio (in %) of the design. For the optimal design, both match the values reported by `optimise()`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.1830092370281435 94.31092661140309\n"
]
}
],
"source": [
"perf, ubalance = design.evaluate(optimal_design,V,model='mnl')\n",
"\n",
"print(perf, ubalance)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n\n[1] Quan, W., Rose, J. M., Collins, A. T., & Bliemer, M. C. (2011). A comparison of algorithms for generating efficient choice experiments.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "choicedesign-oSBhddzi-py3.13",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}