Chapter 3 EDINA: Exploratory Deterministic Input, Noisy “And” Gate

3.1 Methodology

3.2 Single Model Estimation

When working with a single \(K\) dimension, the easiest way to proceed is to use the edina function. This function requires:

data: Item Matrix
k: Number of Traits associated with the Q Matrix.
burnin: Amount of iterations to discard
chain_length: Amount of iterations to keep

model = edina(Y, k = 3, burnin = 10000, chain_length = 20000)

Let’s take the fraction-subtraction data that we loaded earlier and perform an estimation with it using k = 2. Please note it will take about 3 minutes to complete.

edina_fractions_k2 = edina(data = items_fractions, k = 2)

3.2.1 Structure of EDINA

Underneath the edina_fractions_k2 variable is a wealth of information regarding the model fit. This information can be used in subseqent analysis. To aide in this endeavor, we’ve crafted a series of helper functions that will be discussed next. In the interim, please feel free to look at the underlying structure of edina_fractions_k2 using str().

str(edina_fractions_k2)

## List of 15
##  $ coefficients  : num [1:20, 1:4] 0.0712 0.0757 0.027 0.2211 0.3048 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "Item1" "Item2" "Item3" "Item4" ...
##   .. ..$ : chr [1:4] "Guessing" "SD(Guessing)" "Slipping" "SD(Slipping)"
##  $ loglike_summed: num -93840906
##  $ loglike_pmean : num -4671
##  $ pi_classes    : num [1:4, 1] 0.3858 0.141 0.0301 0.4431
##  $ avg_q         : num [1:20, 1:2] 0 0 0 1 0 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "Item1" "Item2" "Item3" "Item4" ...
##   .. ..$ : chr [1:2] "Trait1" "Trait2"
##  $ est_q         : num [1:20, 1:2] 0 0 0 1 0 0 0 1 1 1 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "Item1" "Item2" "Item3" "Item4" ...
##   .. ..$ : chr [1:2] "Trait1" "Trait2"
##  $ or_tested     : num [1:20, 1:20] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sample_or     : num [1:20, 1:20] 0 0 0 0 0 0 0 0 0 0 ...
##  $ n             : int 536
##  $ j             : int 20
##  $ k             : int 2
##  $ burnin        : num 10000
##  $ chain_length  : num 20000
##  $ timing        : Named num [1:3] 62.282 0.015 62.3
##   ..- attr(*, "names")= chr [1:3] "user.self" "sys.self" "elapsed"
##  $ dataset_name  : chr "items_fractions"
##  - attr(*, "class")= chr "edina"

The details on the return values contained in the EDINA model can be found in ?edina.

3.2.2 Extracting the Q Matrix

As the Q matrix is estimated, there are two ways to extract the Q matrix. The first way involves looking directly at the average and the second way involves looking at the dichotomous state of the Q matrix (default). The latter is constructed by treating element-wise entries with values greater than 0.5 as being 1 and values less than 0.5 as being 0.

Extracting the Q matrix from an estimated model can be done using

extract_q_matrix(x, binary_q = TRUE)

Let’s view both forms of the estimated Q Matrix for edina_fractions_k2

extract_q_matrix(edina_fractions_k2)

##         Trait1  Trait2
## Item1  0.00000 1.00000
## Item2  0.00000 1.00000
## Item3  0.00000 1.00000
## Item4  1.00000 0.13255
## Item5  0.00000 1.00000
## Item6  0.00000 1.00000
## Item7  0.00000 1.00000
## Item8  0.97990 0.99595
## Item9  0.99420 0.99985
## Item10 1.00000 0.02235
## Item11 1.00000 0.00000
## Item12 0.29850 0.70215
## Item13 1.00000 1.00000
## Item14 0.00000 1.00000
## Item15 0.00000 1.00000
## Item16 0.06445 0.93745
## Item17 1.00000 0.00000
## Item18 1.00000 0.00005
## Item19 1.00000 0.99995
## Item20 1.00000 0.00000

extract_q_matrix(edina_fractions_k2, binary_q = TRUE)

##        Trait1 Trait2
## Item1       0      1
## Item2       0      1
## Item3       0      1
## Item4       1      0
## Item5       0      1
## Item6       0      1
## Item7       0      1
## Item8       1      1
## Item9       1      1
## Item10      1      0
## Item11      1      0
## Item12      0      1
## Item13      1      1
## Item14      0      1
## Item15      0      1
## Item16      0      1
## Item17      1      0
## Item18      1      0
## Item19      1      1
## Item20      1      0

An alternative way to view the estimated Q matrix is to plot it on a graph. Graphing a Q matrix is done using a heatmap to show areas strength of the estimation for a given item and trait. The underlying graphs for ecdm are constructed using the ggplot2 library and, thus, can be further manipulated by adding new layers to the plot.

q_graph(x, binary_q = TRUE, ...)

How does the average Q matrix plot differ from the dichotomous Q matrix plot?

q_graph(edina_fractions_k2)

q_graph(edina_fractions_k2, binary_q = FALSE)

3.2.3 Extracting the Model Coefficients

Outside of the estimated Q matrix, you also have the estimated slipping and guessing parameters for the EDINA model.

Recall:

Guessing represents guessing or the probability of correctly answering item \(j\) when at least one attribute is lacking, e.g. \(g_j=P\left(Y_{ij}=1|\eta_{ij}=0\right)\)
Slipping represents slipping or the probability of an incorrect response for individuals with all of the required attributes, e.g. \(s_j=P\left(Y_{ij}=0|\eta_{ij}=1\right)\).

These coefficients can be retrieved using either coef() or coefficients() akin to base R.

coef(edina_fractions_k2) # or: coefficients(edina_fractions_k2)

##          Guessing SD(Guessing)   Slipping SD(Slipping)
## Item1  0.07122652  0.018789967 0.14108199  0.020152254
## Item2  0.07569449  0.020013270 0.06824275  0.015039251
## Item3  0.02698653  0.014703219 0.13807849  0.019806591
## Item4  0.22112037  0.026061723 0.12578350  0.021945289
## Item5  0.30476626  0.031127407 0.21850436  0.023712028
## Item6  0.53652583  0.033753208 0.02510464  0.008884901
## Item7  0.02517658  0.011554679 0.33296921  0.027177836
## Item8  0.58404755  0.029383234 0.05382223  0.015911676
## Item9  0.51316566  0.029305936 0.18846420  0.025805495
## Item10 0.03385231  0.011589874 0.22674090  0.027093081
## Item11 0.06855694  0.015388724 0.07440054  0.017605533
## Item12 0.48304034  0.044582849 0.07406747  0.020937753
## Item13 0.01748661  0.008138668 0.35542825  0.030842162
## Item14 0.41547980  0.033385357 0.06322286  0.014426010
## Item15 0.05460958  0.016031512 0.26546315  0.025597122
## Item16 0.41039489  0.037750229 0.09799984  0.019094943
## Item17 0.05058086  0.014311418 0.14062367  0.022298586
## Item18 0.13095644  0.020758565 0.15592299  0.023078870
## Item19 0.02451213  0.009345748 0.33160007  0.030629486
## Item20 0.01533473  0.008277803 0.18736865  0.024921173

3.2.4 Check Identifiability

Any Q matrix can be checked to ensure that the identifiability conditions are met. In particular, we have:

For a \(J\times J\) permutation matrix \(\mathbf P\), \(\mathbf Q\) can be expressed as, \[ \mathbf P\mathbf Q = \left[\begin{array}{c} \mathbf I_K \\ \mathbf I_K\\ \widetilde{\mathbf Q} \end{array}\right],\] where \(\mathbf I_K\) is a \(K\times K\) identity matrix and \(\widetilde{\mathbf Q}\) is a \(\left(J-2K\right)\times K\) sub-matrix of \(\mathbf Q\) with column \(k\) denoted by \(\widetilde{\mathbf Q}_k\).
Each skill loads onto at least three items, which implies \(\mathbf Q_k^\prime \mathbf 1_J \geq 3\) where \(\mathbf 1_J\) is a \(J\) dimensional vector of ones. Similarly, if \(c_k\) is the \(k\)th column margin for \(\widetilde{\mathbf Q}\) (i.e., \(c_k = \widetilde{\mathbf Q}_k^\prime \mathbf 1_{J-2K}\)) this condition is equivalent to \(c_k>0\).
Each item loads onto at least one skill such that \(\mathbf q_j^\prime \mathbf1_K>0\).

To verify a Q matrix is identifiable, we can use check_identifiability().

example_Q = extract_q_matrix(edina_fractions_k2)

# Todo: switch to is_identifiable
# check_identifiability(example_Q)

3.3 Comparing Multiple Models

With the ability to estimate a variety of model under the exploratory framework, there is interest in being able to select which model and, subsequently, Q matrix is preferred. To aide in this endeavor, there exists:

auto_edina(data, k = 2:4, burnin = 10000, chain_length = 20000,
  save_results = FALSE, save_filename = "edina_model_data")

This function is slightly different than edina in the sense that it takes a range of dimensions in the k parameter. Furthermore, it offers the ability to save model objects independently of one another. This is useful for estimating higher dimensions.

Let’s estimate the models for items_fractions that have been 2 and 4 traits.

many_edina_fractions = auto_edina( items_fractions , k = 2:4)

## Starting the estimation procedure ...

## Working on k = 2 ...

## Time Elapsed: 63.648

## Working on k = 3 ...

## Time Elapsed: 109.915

## Working on k = 4 ...

## Time Elapsed: 198.309

many_edina_fractions

## The results of searching Q-matrices between 2 and 4 ...
##  k   bic  dic heuristic
##  2 10119 9672    0.4895
##  3  9726 9135    0.4526
##  4  9705 8939    0.4526

3.3.1 Best Model

As the auto_edina() model contains a set of models, we need an efficient way of deciding which model to use. For performing model selection, the package implements three different information criterion:

deviance information criterion (DIC),
bayesian information criterion (BIC), and
computing posterior predictive probabilities (PPPs) of the item means and odds ratios for each pair of items. PPPs smaller than 0.05 or greater than 0.95 to be extreme and evidence of misfit.

\[DIC = -2\left({\log p\left( {\mathbf{y}| \mathbf{\hat{\theta}} } \right) - 2\left( {\log p\left( {\mathbf{y}| \mathbf{\hat{\theta}} } \right) - \frac{1}{N}\sum\limits_{n = 1}^N {\log p\left( {\mathbf{y}|{\mathbf{\theta} _s}} \right)} } \right)} \right)\]

\[BIC = -2 \log p\left( {\mathbf{y}| \mathbf{\hat{\theta}} } \right) + (k+j)\log(n) \]

\(PPP\) Procedure:

simulating observed responses \(\mathbf Y^{(r)}\) using model parameters from iteration \(r\) of the MCMC sampler
computing the odds ratio for each pair of items at iteration \(r\) as \(OR^{(r)} = n_{11}^{(r)}n_{00}^{(r)}/\left(n_{10}^{(r)}n_{01}^{(r)}\right)\) here \(n_{11}^{(r)}\) is the frequency of ones on both variables at iteration \(r\), \(n_{10}^{(r)}\) is the frequency of ones on the first item and zeros on the second at iteration \(r\), etc.; and
computing PPPs for each item pair as the proportion of generated \(OR^{(r)}\)’s that exceeded elements of the observed odds ratios.

We can individually apply these methods to an edina object. By default, auto_edina() computes and saves the result.

DIC(edina_fractions_k2)

## [1] 9426.435

BIC(edina_fractions_k2)

## [1] 9869.614

model_heuristic(edina_fractions_k2, alpha = 0.05) ## PPP

## [1] 0.4052632

When viewing the variable containing the results of auto_edina(), the model selection information will be displayed.

many_edina_fractions

## The results of searching Q-matrices between 2 and 4 ...
##  k   bic  dic heuristic
##  2 10119 9672    0.4895
##  3  9726 9135    0.4526
##  4  9705 8939    0.4526

To extract the “best model”, we can use the best_model() function with an appropriate ic selection of either: “heuristic” (PPP), “bic”, or “dic”.

best_edina_fractions = best_model(many_edina_fractions)
best_edina_fractions

## The EDINA model for data with K = 3 
## 
## The model fit is as follows:
##  k      bic      dic heuristic
##  3 9726.208 9134.803 0.4526316
## 
## The estimated coefficients for the EDINA model are:
##        Guessing SD(Guessing) Slipping SD(Slipping)
## Item1  0.042310     0.016233  0.13641      0.01968
## Item2  0.052490     0.017322  0.06862      0.01490
## Item3  0.009615     0.009018  0.14216      0.02022
## Item4  0.205512     0.027935  0.12391      0.02110
## Item5  0.309724     0.031307  0.21319      0.02440
## Item6  0.311496     0.047629  0.04187      0.01076
## Item7  0.034765     0.015716  0.33795      0.02857
## Item8  0.580603     0.030456  0.05366      0.01584
## Item9  0.346158     0.044797  0.24968      0.02234
## Item10 0.027836     0.010947  0.22889      0.02704
## Item11 0.069236     0.015571  0.07181      0.01729
## Item12 0.204721     0.040968  0.09183      0.01571
## Item13 0.018518     0.008336  0.35122      0.03137
## Item14 0.099343     0.038027  0.06550      0.01412
## Item15 0.059198     0.016587  0.25131      0.02553
## Item16 0.124585     0.035715  0.10890      0.01719
## Item17 0.049741     0.013981  0.13736      0.02206
## Item18 0.131878     0.020573  0.15445      0.02293
## Item19 0.025475     0.009584  0.32668      0.03077
## Item20 0.015159     0.008012  0.18518      0.02478
## 
## The estimated Q matrix is:
##        Trait1  Trait2 Trait3
## Item1  0.0000 1.00000 0.0000
## Item2  0.0000 1.00000 0.0000
## Item3  0.0000 1.00000 0.0000
## Item4  0.1755 0.01135 1.0000
## Item5  0.7986 1.00000 0.0000
## Item6  1.0000 0.00000 0.0000
## Item7  0.4016 1.00000 0.0000
## Item8  0.1925 0.92820 0.9866
## Item9  0.9997 0.00350 0.0034
## Item10 0.2403 0.00145 1.0000
## Item11 0.9352 0.00000 1.0000
## Item12 1.0000 0.00000 0.0000
## Item13 0.8915 1.00000 1.0000
## Item14 1.0000 0.00000 0.0000
## Item15 1.0000 1.00000 0.0000
## Item16 1.0000 0.00000 0.0000
## Item17 0.8017 0.00000 1.0000
## Item18 0.8312 0.00000 1.0000
## Item19 0.8972 0.99995 1.0000
## Item20 0.8076 0.00000 1.0000

The output reverts to what is shown during a traditional edina estimation. This will similarly be the case if you use extract_q_matrix() on an auto_edina() object.

3.3.2 Visually Comparing Models

To better understand the set of models that was estimated, we can take a look at the collection of their values. There two graphs of particular interest here:

model selection criterion changes over time.
parameter evolution over time of slipping and guessing

parameter_evolution_graph(many_edina_fractions)

model_selection_graph(many_edina_fractions)