{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example of a D-efficient RUM design with dummies in ChoiceDesign\n", "\n", "This notebook illustrates how to use **ChoiceDesign** to generate a D-efficient experimental design with some attributes coded as dummies. Given a set of attributes and prior parameters, ChoiceDesign uses a variant of the random swapping algorithm [1] to minimise the D-error of the information matrix of a Multinomial Logit (MNL) model.\n", "\n", "## Step 1: Load modules, define design parameters and set attributes\n", "\n", "The following lines load:\n", "- `EffDesign`: the class of efficient designs,\n", "- `Attribute` and `Parameter`: the classes of attributes and parameters, respectively." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from choicedesign.design import EffDesign\n", "from choicedesign.expressions import Attribute, Parameter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each attribute is defined by the `Attribute` class. The arguments of this class are:\n", "\n", "* `name`: a string with the attribute name,\n", "* `levels`: a list of levels of the attribute.\n", "\n", "Each attribute is alternative-specific. 
Hence, attributes must be defined for each alternative that contains them.\n", "\n", "The following lines define 2 alternatives, named `alt1` and `alt2`, and 4 attributes named from $A$ to $D$:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "alt1_A = Attribute('alt1_A',[1,2,3])\n", "alt1_B = Attribute('alt1_B',[1,2,3])\n", "alt1_C = Attribute('alt1_C',[0,3,5])\n", "alt1_D = Attribute('alt1_D',[0,1,2])\n", "\n", "alt2_A = Attribute('alt2_A',[1,2,3])\n", "alt2_B = Attribute('alt2_B',[1,2,3])\n", "alt2_C = Attribute('alt2_C',[0,3,5])\n", "alt2_D = Attribute('alt2_D',[0,1,2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Construct efficient design object and generate initial design matrix\n", "\n", "The second step consists of constructing the experimental design object, which requires the following parameters:\n", "\n", "- `X`: A list of `Attribute` class elements,\n", "- `ncs`: The number of choice situations.\n", "\n", "The following lines define an object named `design` using `EffDesign` with 18 choice situations:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "design = EffDesign(\n", " X = [alt1_A,alt1_B,alt1_C,alt1_D,\n", " alt2_A,alt2_B,alt2_C,alt2_D],\n", " ncs=18)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the design object is defined, the method `gen_initdesign()` generates the initial design matrix. This method accepts the following optional parameters:\n", "\n", "* `cond`: List of conditions that the final design must satisfy. Each element is a string that contains a single condition. Conditions can take the form of binary relations (e.g., `X > Y` where `X` and `Y` are attributes of a specific alternative) or conditional relations (e.g., `if X > a then Y < b` where `a` and `b` are values). 
In a conditional relation (one that uses the `if` operator), multiple conditions can be combined with the `&` operator.\n", "\n", "* `seed`: Random seed\n", "\n", "For this example, neither of the arguments above will be used:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
alt1_Aalt1_Balt1_Calt1_Dalt2_Aalt2_Balt2_Calt2_D
011503252
122521150
232022300
331503231
432313301
513011352
613301300
713512351
822022132
913502230
1033322351
1121503102
1211021251
1321012131
1433303100
1532311230
1622011102
1721323232
\n", "
" ], "text/plain": [ " alt1_A alt1_B alt1_C alt1_D alt2_A alt2_B alt2_C alt2_D\n", "0 1 1 5 0 3 2 5 2\n", "1 2 2 5 2 1 1 5 0\n", "2 3 2 0 2 2 3 0 0\n", "3 3 1 5 0 3 2 3 1\n", "4 3 2 3 1 3 3 0 1\n", "5 1 3 0 1 1 3 5 2\n", "6 1 3 3 0 1 3 0 0\n", "7 1 3 5 1 2 3 5 1\n", "8 2 2 0 2 2 1 3 2\n", "9 1 3 5 0 2 2 3 0\n", "10 3 3 3 2 2 3 5 1\n", "11 2 1 5 0 3 1 0 2\n", "12 1 1 0 2 1 2 5 1\n", "13 2 1 0 1 2 1 3 1\n", "14 3 3 3 0 3 1 0 0\n", "15 3 2 3 1 1 2 3 0\n", "16 2 2 0 1 1 1 0 2\n", "17 2 1 3 2 3 2 3 2" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "init_design = design.gen_initdesign()\n", "init_design" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Set the utility functions\n", "\n", "`ChoiceDesign` uses a native expression system to define utility functions.\n", "Parameters and attributes are combined using standard arithmetic operators.\n", "For this, we use the `Parameter` class, which requires the following arguments:\n", "\n", "* `name`: The parameter name\n", "* `prior`: The prior value\n", "\n", "We will assume that attributes `A` and `B` are coded as dummies in which level 1 is the baseline.\n", "Therefore, we must define additional parameters. Dummy indicators are created directly using\n", "the `==` operator on an `Attribute`:\n", "\n", "The following lines define six parameters:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "beta_A_2 = Parameter('beta_A_2',-0.1)\n", "beta_A_3 = Parameter('beta_A_3',-0.4)\n", "\n", "beta_B_2 = Parameter('beta_B_2',-0.02)\n", "beta_B_3 = Parameter('beta_B_3',-0.01)\n", "\n", "beta_C = Parameter('beta_C',0.1)\n", "beta_D = Parameter('beta_D',0.15)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, the utility functions are defined using standard arithmetic operators. 
The `==` operator on an `Attribute` returns an indicator (1 where the condition holds, 0 otherwise), which is used here for dummy coding." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "V1 = beta_A_2 * (alt1_A==2) + beta_A_3 * (alt1_A==3) + beta_B_2 * (alt1_B==2) + beta_B_3 * (alt1_B==3) + beta_C * alt1_C + beta_D * alt1_D\n", "V2 = beta_A_2 * (alt2_A==2) + beta_A_3 * (alt2_A==3) + beta_B_2 * (alt2_B==2) + beta_B_3 * (alt2_B==3) + beta_C * alt2_C + beta_D * alt2_D" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The utility functions must be stored in a dictionary object. In this dictionary, each key is a consecutive number from 1 to the number of alternatives. The value for each key is the corresponding utility function:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "V = {1: V1, 2: V2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Optimise the initial design, given the utility functions and priors\n", "\n", "The method `optimise()` starts the D-error minimisation routine, given the initial design matrix and the utility functions. This method requires the following parameters:\n", "\n", "* `init_design`: The initial design matrix to optimise\n", "* `V`: The dictionary object with utility functions\n", "* `model`: The base model of the efficient design. 
The default is `'mnl'`, a Multinomial Logit model.\n", "\n", "In addition, `optimise()` accepts the following optional parameters:\n", "\n", "* `iter_lim`: Number of iterations before the algorithm stops.\n", "* `noimprov_lim`: Number of iterations without improvement before the algorithm stops.\n", "* `time_lim`: Time (in minutes) before the algorithm stops.\n", "* `seed`: Random seed.\n", "* `verbose`: Whether status messages and progress are shown.\n", "\n", "The outputs of `optimise()` are:\n", "\n", "* `optimal_design`: The optimised design matrix\n", "* `init_perf`: The initial D-error\n", "* `final_perf`: The D-error of the last stored design\n", "* `final_iter`: The last iteration number\n", "* `ubalance_ratio`: The utility balance ratio. A 0% value indicates strict dominance of an alternative, whereas 100% indicates equal market shares.\n", "\n", "The following line runs the optimisation routine for 1 minute:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating initial design\n", "Optimization complete 0:00:59 / D-error: 0.183009\n", "Elapsed time: 0:01:00\n", "D-error of initial design: 0.419515\n", "D-error of last stored design: 0.183009\n", "Utility Balance ratio: 94.31 %\n", "Algorithm iterations: 27064\n", "\n" ] } ], "source": [ "optimal_design, init_perf, final_perf, final_iter, ubalance_ratio = design.optimise(init_design=init_design,V=V,model='mnl',time_lim = 1, verbose = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Blocking the design\n", "\n", "The optimal design can be blocked using the method `gen_blocks()`. This method randomly creates candidate blocks and keeps the one with the minimum correlation between the blocking column and all the attributes. 
The method accepts the following arguments:\n", "\n", "- `optimal_design`: the experimental design\n", "- `n_blocks`: number of blocks\n", "- `n_iter` (optional): number of iterations of the blocking algorithm\n", "\n", "The following line creates 3 blocks in the optimal design:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "optimal_design_blocked = design.gen_blocks(optimal_design,n_blocks=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, the optimal design can be displayed:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CSalt1_Aalt1_Balt1_Calt1_Dalt2_Aalt2_Balt2_Calt2_DBlock
01113233302
12335212002
23325223001
34323021322
45220211503
56233131311
67235012023
78110122513
89120031522
910125023022
1011330222501
1112135132011
1213213233301
1314315013021
1415133121313
1516310113513
1617223111313
1718210032522
\n", "
" ], "text/plain": [ " CS alt1_A alt1_B alt1_C alt1_D alt2_A alt2_B alt2_C alt2_D Block\n", "0 1 1 1 3 2 3 3 3 0 2\n", "1 2 3 3 5 2 1 2 0 0 2\n", "2 3 3 2 5 2 2 3 0 0 1\n", "3 4 3 2 3 0 2 1 3 2 2\n", "4 5 2 2 0 2 1 1 5 0 3\n", "5 6 2 3 3 1 3 1 3 1 1\n", "6 7 2 3 5 0 1 2 0 2 3\n", "7 8 1 1 0 1 2 2 5 1 3\n", "8 9 1 2 0 0 3 1 5 2 2\n", "9 10 1 2 5 0 2 3 0 2 2\n", "10 11 3 3 0 2 2 2 5 0 1\n", "11 12 1 3 5 1 3 2 0 1 1\n", "12 13 2 1 3 2 3 3 3 0 1\n", "13 14 3 1 5 0 1 3 0 2 1\n", "14 15 1 3 3 1 2 1 3 1 3\n", "15 16 3 1 0 1 1 3 5 1 3\n", "16 17 2 2 3 1 1 1 3 1 3\n", "17 18 2 1 0 0 3 2 5 2 2" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "optimal_design" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (optional) Evaluate the design\n", "\n", "The method `evaluate()` evaluates a design stored in a data frame, under the specification provided when `EffDesign` was initialised. `evaluate()` requires the following parameters:\n", "\n", "* `optimal_design`: The design matrix to evaluate\n", "* `V`: The dictionary object with utility functions\n", "* `model`: The base model of the efficient design. The default is `'mnl'`, a Multinomial Logit model." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.1830092370281435 94.31092661140309\n" ] } ], "source": [ "perf, ubalance = design.evaluate(optimal_design,V,model='mnl')\n", "\n", "print(perf, ubalance)" ] }, { "cell_type": "markdown", "id": "a5dff0bc", "metadata": {}, "source": [ "## Export the design\n", "\n", "Export the optimised design to Excel. The dummy coding used in the utility functions does not affect the export; the raw attribute levels are written as-is." 
] }, { "cell_type": "code", "execution_count": null, "id": "127e60dc", "metadata": {}, "outputs": [], "source": [ "attr_names = {\n", " 'alt1_A': 'Attribute A', 'alt2_A': 'Attribute A',\n", " 'alt1_B': 'Attribute B', 'alt2_B': 'Attribute B',\n", " 'alt1_C': 'Attribute C', 'alt2_C': 'Attribute C',\n", " 'alt1_D': 'Attribute D', 'alt2_D': 'Attribute D',\n", "}\n", "design.export_design(optimal_design, attr_names, 'rum_dummy_design.xlsx')" ] }, { "cell_type": "markdown", "id": "54b20673", "metadata": {}, "source": [ "## Save the optimisation summary\n", "\n", "After calling `optimise()`, the method `export_output()` writes a plain-text summary of the optimisation run — design configuration, stopping criteria, criterion values, utility balance, elapsed time, and iteration count — to a file." ] }, { "cell_type": "code", "execution_count": null, "id": "b0688825", "metadata": {}, "outputs": [], "source": [ "design.export_output('rum_dummy_output.txt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "[1] Quan, W., Rose, J. M., Collins, A. T., & Bliemer, M. C. (2011). A comparison of algorithms for generating efficient choice experiments.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "choicedesign-oSBhddzi-py3.13", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 2 }