---
title: "Basic Usage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Basic Usage}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE, message=FALSE, warning=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.retina = 3,
  comment = "#>"
)
```

```{r, child='../man/rmdchunks/header.Rmd'}
```

## Data format

```{r, child='../man/rmdchunks/dataFormat.Rmd'}
```

The ["Data Formatting and Encoding"](data_formatting.html) vignette has more details about the required data format.

## Model specification interface

Models are specified and estimated using the `logitr()` function. The `data` argument should be set to the data frame containing the data, and the `outcome` and `obsID` arguments should be set to the column names in the data frame that correspond to the dummy-coded outcome (choice) variable and the observation ID variable, respectively. All variables to be used as model covariates should be provided as a vector of column names to the `pars` argument. Each variable in the vector is additively included as a covariate in the utility model, with the interpretation that they represent utilities in preference space models and WTPs in a WTP space model.

For example, consider a preference space model where the utility for yogurt is given by the following utility model:

```{r, child='../man/rmdchunks/yogurtUtilityPref.Rmd'}
```

\noindent
where $p_{j}$ is `price`, $x_{j1}$ is `feat`, and $x_{j2-4}$ are dummy-coded variables for each `brand` (with the fourth brand representing the reference level). This model can be estimated using the `logitr()` function as follows:

```{r}
library("logitr")

mnl_pref <- logitr(
    data    = yogurt,
    outcome = "choice",
    obsID   = "obsID",
    pars    = c("price", "feat", "brand")
)
```

The equivalent model in the WTP space is given by the following utility model:

```{r, child='../man/rmdchunks/yogurtUtilityWtp.Rmd'}
```

\noindent
To specify this model, simply move `"price"` from the `pars` argument to the `scalePar` argument:

```{r}
mnl_wtp <- logitr(
    data     = yogurt,
    outcome  = "choice",
    obsID    = "obsID",
    pars     = c("feat", "brand"),
    scalePar = "price"
)
```

\noindent
In the above model, the variables in `pars` are marginal WTPs, whereas in the preference space model they are marginal utilities. Price is separately specified with the `scalePar` argument because it acts as a scaling term in WTP space models. While price is the most typical scaling variable, other continuous variables can also be used, such as time.

Interactions between covariates can be entered in the `pars` vector separated by the `*` symbol. For example, an interaction between `price` with `feat` in a preference space model could be included by specifying `pars = c("price", "feat", "brand", "price*feat")`, or even more concisely just `pars = c("price*feat", "brand")` as the interaction between `price` and `feat` will produce individual parameters for `price` and `feat` in addition to the interaction parameter.

Both of these examples are multinomial logit models with fixed parameters. See the ["Estimating Multinomial Logit Models"](mnl_models.html) vignette for more details.

## Parallelized multi-start estimation

Since WTP space models are non-linear and have non-convex log-likelihood functions, it is recommended to use a multi-start search to run the optimization loop multiple times to search for different local minima. This is implemented using the `numMultiStarts` argument, e.g.:

```{r}
mnl_wtp <- logitr(
    data     = yogurt,
    outcome  = "choice",
    obsID    = "obsID",
    pars     = c("feat", "brand"),
    scalePar = "price",
    numMultiStarts = 10
)
```

The multi-start is parallelized by default for faster estimation, and the number of cores to use can be manually set using the `numCores` argument. If `numCores` is not provide, then the number of cores is set to `parallel::detectCores() - 1`. For models with larger data sets, you may need to set `numCores = 1` to avoid memory overflow issues.

## Mixed logit models

> See the ["Estimating Mixed Logit Models"](mxl_models.html) vignette for more details.

To estimate a mixed logit model, use the `randPars` argument in the `logitr()` function to denote which parameters will be modeled with a distribution. The current package version supports normal (`"n"`), log-normal (`"ln"`), and zero-censored normal (`"cn"`) distributions.

For example, assume the observed utility for yogurts was $v_{j} = \alpha p_{j} + \beta_1 x_{j1} + \beta_2 x_{j2} + \beta_3 x_{j3} + \beta_4 x_{j4}$, where $p_{j}$ is `price`, $x_{j1}$ is `feat`, and $x_{j2-4}$ are dummy-coded variables for `brand`. To model `feat` as well as each of the brands as normally-distributed, set `randPars = c(feat = "n", brand = "n")`:

```{r, eval=FALSE}
mxl_pref <- logitr(
    data     = yogurt,
    outcome  = 'choice',
    obsID    = 'obsID',
    pars     = c('price', 'feat', 'brand'),
    randPars = c(feat = 'n', brand = 'n'),
    numMultiStarts = 10
)
```

Since mixed logit models also have non-convex log-likelihood functions, it is recommended to use a multi-start search to run the optimization loop multiple times to search for different local minima.

## Viewing results

> See the ["Summarizing Results"](summarizing_results.html) vignette for more details.

Use the `summary()` function to print a summary of the results from an estimated model, e.g.

```{r}
summary(mnl_pref)
```

Use `statusCodes()` to print a description of each status code from the `nloptr` optimization routine.

You can also extract other values of interest at the solution, such as:

**The estimated coefficients**

```{r}
coef(mnl_pref)
```

**The coefficient standard errors**

```{r}
se(mnl_pref)
```

**The log-likelihood**

```{r}
logLik(mnl_pref)
```

**The variance-covariance matrix**

```{r}
vcov(mnl_pref)
```

## Computing and comparing WTP

For models in the preference space, a summary table of the computed WTP based on the estimated preference space parameters can be obtained with the `wtp()` function. For example, the computed WTP from the previously estimated fixed parameter model can be obtained with the following command:

```{r}
wtp(mnl_pref, scalePar = "price")
```

The `wtp()` function divides the non-price parameters by the negative of the `scalePar` parameter (usually "price"). Standard errors are estimated using the Krinsky and Robb parametric bootstrapping method [@Krinsky1986]. Similarly, the `wtpCompare()` function can be used to compare the WTP from a WTP space model with that computed from an equivalent preference space model:

```{r}
wtpCompare(mnl_pref, mnl_wtp, scalePar = "price")
```

## Predicting probabilities and outcomes

Estimated models can be used to predict probabilities and outcomes for a set (or multiple sets) of alternatives based on an estimated model. As an example, consider predicting probabilities for two of the choice observations from the `yogurt` dataset:

```{r}
data <- subset(
  yogurt, obsID %in% c(42, 13),
  select = c('obsID', 'alt', 'choice', 'price', 'feat', 'brand')
)

data
```

In the example below, the probabilities for these two sets of alternatives are computed using the fixed parameter `mnl_pref` model using the `predict()` method:

```{r}
probs <- predict(
  mnl_pref,
  newdata = data,
  obsID   = "obsID",
  ci      = 0.95
)

probs
```

The resulting `probs` data frame contains the expected probabilities for each alternative. The lower and upper predictions reflect a 95% confidence interval (controlled by the `ci` argument), which are estimated using the Krinsky and Robb parametric bootstrapping method [@Krinsky1986]. The default is `ci = NULL`, in which case no CI predictions are made.

WTP space models can also be used to predict probabilities:

```{r}
probs <- predict(
  mnl_wtp,
  newdata = data,
  obsID   = "obsID",
  ci      = 0.95
)

probs
```

You can also use the `predict()` method to predict outcomes by setting `type = "outcome"` (the default value is `"prob"` for predicting probabilities). If no new data are provided for `newdata`, then outcomes will be predicted for every alternative in the original data used to estimate the model. In the example below the `returnData` argument is also set to `TRUE` so that the predicted outcomes can be compared to the actual ones.

```{r}
outcomes <- predict(
  mnl_pref,
  type = "outcome",
  returnData = TRUE
)

head(outcomes[c('obsID', 'choice', 'predicted_outcome')])
```

You can quickly compute the accuracy by dividing the number of correctly predicted choices by the total number of choices:

```{r}
chosen <- subset(outcomes, choice == 1)
chosen$correct <- chosen$choice == chosen$predicted_outcome
sum(chosen$correct) / nrow(chosen)
```

See the ["Predicting Probabilities and Choices from Estimated Models"](predict.html) vignette for more details about making predictions.