Title: | Derivation of Regression-Based Normative Data |
---|---|
Description: | Normative data are often used to estimate the relative position of a raw test score in the population. This package allows for deriving regression-based normative data. It includes functions that enable the fitting of regression models for the mean and residual (or variance) structures, test the model assumptions, derive the normative data in the form of normative tables or automatic scoring sheets, and estimate confidence intervals for the norms. This package accompanies the book Van der Elst, W. (2024). Regression-based normative data for psychological assessment. A hands-on approach using R. Springer Nature. |
Authors: | Wim Van der Elst [aut, cre] |
Maintainer: | Wim Van der Elst <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2025-03-12 04:01:30 UTC |
Source: | https://github.com/cran/NormData |
The function Stage.2.NormScore() can be used to convert a raw test score of a tested person into a percentile rank (taking into account specified values of the independent variables). The function Bootstrap.Stage.2.NormScore() can be used to obtain a confidence interval (CI) around the point estimate of the percentile rank. A non-parametric bootstrap is used to compute this CI (for details, see Chapter 8 in Van der Elst, 2024).
Bootstrap.Stage.2.NormScore(Stage.2.NormScore, CI=.99, Number.Bootstraps=2000, Seed=123, Rounded=FALSE, Show.Fitted.Boot=FALSE, verbose=TRUE)
Stage.2.NormScore |
A fitted object of class |
CI |
The desired CI around the percentile rank for the raw test score at hand. Default CI=.99. |
Number.Bootstraps |
The number of bootstrap samples that are taken. Default Number.Bootstraps=2000. |
Seed |
The seed to be used in the bootstrap (for reproducibility). Default Seed=123. |
Rounded |
Logical. Should the percentile rank be rounded to a whole number? Default Rounded=FALSE. |
Show.Fitted.Boot |
Logical. Should the fitted Stage 1 models for the bootstrap samples be printed? Default Show.Fitted.Boot=FALSE. |
verbose |
A logical value indicating whether verbose output should be generated. |
For details, see Chapter 8 in Van der Elst (2024).
An object of class Stage.2.NormScore
with components,
CI.Percentile |
The bootstrapped CI around the estimated percentile rank. |
CI |
The CI used. |
All.Percentiles |
All bootstrapped percentile ranks for the raw test score at hand. |
Assume.Homoscedasticity |
Logical. Was homoscedasticity assumed in the normative conversion? For details, see |
Assume.Normality |
Logical. Was normality assumed in the normative conversion? For details, see |
Stage.2.NormScore |
The fitted |
Percentile.Point.Estimate |
The point estimate for the percentile rank (based on the original dataset). |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Time-intensive part
# Replicate the bootstrap results that were obtained in
# Case study 1 of Chapter 8 in Van der Elst (2024)
# -----------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset
# Fit the Stage 1 model
Model.1.GCSE <- Stage.1(Dataset=GCSE, Model=Science.Exam~Gender)
# Stage 2: Convert a science exam score = 30 obtained by a
# female into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(Stage.1.Model=Model.1.GCSE,
  Score=list(Science.Exam=30, Gender="F"), Rounded = FALSE)
summary(Normed_Score)
# Derive the 99pc CI around the point estimate
# using a bootstrap procedure
Bootstrap_Normed_Score <- Bootstrap.Stage.2.NormScore(
  Stage.2.NormScore=Normed_Score)
summary(Bootstrap_Normed_Score)
plot(Bootstrap_Normed_Score)

# Replicate the bootstrap results that were obtained in
# Case study 2 of Chapter 8 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
# Make the new variable Age.C (= Age centered) that is
# needed to fit the final Stage 1 model,
# and add it to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50
# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
summary(Substitution.Model.9)
# Convert an LDST score = 40 obtained by a
# 20-year-old test participant with LE=Low
# into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(
  Stage.1.Model=Substitution.Model.9,
  Score=list(LDST=40, Age.C=20-50, LE = "Low"),
  Rounded = FALSE)
# Derive the 99pc CI around the point estimate
# using a bootstrap
Bootstrap_Normed_Score <- Bootstrap.Stage.2.NormScore(
  Stage.2.NormScore = Normed_Score)
summary(Bootstrap_Normed_Score)
plot(Bootstrap_Normed_Score)
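The idea behind a non-parametric bootstrap CI for a percentile rank can be sketched in base R. This is a minimal illustration and not the code used by Bootstrap.Stage.2.NormScore(): it assumes a linear mean function with homoscedastic, normally distributed residuals, and the simulated data, the helper percentile_rank(), and all variable names are hypothetical.

```r
set.seed(123)

# Simulated normative sample (hypothetical data, for illustration only)
n <- 500
norm_data <- data.frame(Age = runif(n, min = 20, max = 80))
norm_data$Score <- 60 - 0.3 * norm_data$Age + rnorm(n, sd = 8)

# Hypothetical helper: percentile rank of a raw score given Age, assuming
# normal and homoscedastic residuals around a linear mean function
percentile_rank <- function(data, score, age) {
  fit <- lm(Score ~ Age, data = data)
  predicted <- predict(fit, newdata = data.frame(Age = age))
  z <- (score - predicted) / sigma(fit)
  unname(100 * pnorm(z))
}

# Point estimate based on the original sample
point_estimate <- percentile_rank(norm_data, score = 40, age = 50)

# Non-parametric bootstrap: resample rows with replacement, refit the
# model, and recompute the percentile rank in every bootstrap sample
number_bootstraps <- 2000
boot_percentiles <- replicate(number_bootstraps, {
  rows <- sample(nrow(norm_data), replace = TRUE)
  percentile_rank(norm_data[rows, ], score = 40, age = 50)
})

# 99% percentile-based CI around the point estimate
ci_99 <- quantile(boot_percentiles, probs = c(0.005, 0.995))
point_estimate
ci_99
```

The width of the interval reflects the sampling uncertainty in the fitted regression parameters; larger normative samples give narrower intervals.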
The function Stage.2.NormTable() is used to derive a normative table that shows the percentile ranks that correspond to a wide range of raw test scores (stratified by the relevant independent variables). The function Bootstrap.Stage.2.NormTable() can be used to obtain confidence intervals (CIs) around the point estimates of the percentile ranks in the normative table. A non-parametric bootstrap is used to compute these CIs (for details, see Chapter 8 in Van der Elst, 2024).
Bootstrap.Stage.2.NormTable(Stage.2.NormTable, CI=.99, Number.Bootstraps=2000, Seed=123, Rounded=FALSE, Show.Fitted.Boot=FALSE, verbose=TRUE)
Stage.2.NormTable |
A fitted object of class |
CI |
The desired CI around the percentile ranks. Default CI=.99. |
Number.Bootstraps |
The number of bootstrap samples that are taken. Default Number.Bootstraps=2000. |
Seed |
The seed to be used in the bootstrap (for reproducibility). Default Seed=123. |
Rounded |
Logical. Should the percentile ranks that are shown in the normative table be rounded to a whole number? Default Rounded=FALSE. |
Show.Fitted.Boot |
Logical. Should the fitted Stage 1 models for the bootstrap samples be printed? Default Show.Fitted.Boot=FALSE. |
verbose |
A logical value indicating whether verbose output should be generated. |
For details, see Chapter 8 in Van der Elst (2024).
An object of class Stage.2.NormTable
with components,
NormTable.With.CI |
The normative table with the bootstrapped CI. |
CI |
The CI used. |
Assume.Homoscedasticity |
Logical. Was homoscedasticity assumed in the normative conversion? For details, see |
Assume.Normality |
Logical. Was normality assumed in the normative conversion? For details, see |
NormTable.With.CI.Min |
A table with the lower bounds of the CIs. |
NormTable.With.CI.Max |
A table with the upper bounds of the CIs. |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Time-intensive part
# Replicate the bootstrap results that were obtained in
# Case study 1 of Chapter 8 in Van der Elst (2024)
# -----------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset
# Fit the Stage 1 model
Model.1.GCSE <- Stage.1(Dataset=GCSE, Model=Science.Exam~Gender)
# Normative table with CIs
NormTable.GCSE <- Stage.2.NormTable(
  Stage.1.Model=Model.1.GCSE,
  Test.Scores=seq(from=10, to=85, by=5),
  Grid.Norm.Table=data.frame(Gender=c("F", "M")),
  Rounded = FALSE)
summary(NormTable.GCSE)
# Bootstrap the CIs
Bootstrap_NormTable.GCSE <- Bootstrap.Stage.2.NormTable(
  Stage.2.NormTable = NormTable.GCSE)
summary(Bootstrap_NormTable.GCSE)

# Replicate the bootstrap results that were obtained in
# Case study 2 of Chapter 8 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
# Make the new variable Age.C (= Age centered) that is
# needed to fit the final Stage 1 model,
# and add it to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50
# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
summary(Substitution.Model.9)
# Make the normative table
NormTable.LDST <- Stage.2.NormTable(
  Stage.1.Model=Substitution.Model.9,
  Test.Scores=seq(from=25, to=40, by=5),
  Grid.Norm.Table=expand.grid(
    Age.C=seq(from=-30, to=30, by = 1),
    LE=c("Low", "Average", "High")),
  Rounded = FALSE)
# Bootstrap the CIs
Bootstrap_NormTable.LDST <- Bootstrap.Stage.2.NormTable(
  Stage.2.NormTable = NormTable.LDST)
summary(Bootstrap_NormTable.LDST)
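To make the structure of a normative table with bootstrapped CIs concrete, the following base R sketch builds a small table for simulated data. It is an illustration under stated assumptions (normal, homoscedastic residuals; percentile-based bootstrap limits) and does not reproduce the internals of Bootstrap.Stage.2.NormTable(). The data and the helper table_percentiles() are hypothetical.

```r
set.seed(123)

# Simulated normative sample (hypothetical data)
n <- 400
norm_data <- data.frame(
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)))
norm_data$Score <- 50 + 5 * (norm_data$Gender == "M") + rnorm(n, sd = 10)

# Grid: every combination of raw test score and subgroup
grid <- expand.grid(Score = seq(from = 30, to = 70, by = 10),
                    Gender = c("F", "M"))

# Hypothetical helper: percentile ranks for all grid cells, assuming
# normal and homoscedastic residuals
table_percentiles <- function(data) {
  fit <- lm(Score ~ Gender, data = data)
  z <- (grid$Score - predict(fit, newdata = grid)) / sigma(fit)
  unname(100 * pnorm(z))
}

# Point estimates based on the original sample
grid$Percentile <- table_percentiles(norm_data)

# Non-parametric bootstrap: one column of percentile ranks per resample
boot <- replicate(1000, {
  rows <- sample(nrow(norm_data), replace = TRUE)
  table_percentiles(norm_data[rows, ])
})

# 99% CI limits for every cell of the normative table
grid$Lower <- apply(boot, 1, quantile, probs = 0.005)
grid$Upper <- apply(boot, 1, quantile, probs = 0.995)
print(grid, digits = 3)
```

Each row of the result is one cell of the normative table: the point estimate of the percentile rank together with its lower and upper CI limits.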
Helper function to check the validity of the homoscedasticity and normality assumptions for a fitted Stage 1 model
Check.Assum(Stage.1.Model)
Stage.1.Model |
The fitted |
For details, see Van der Elst (2024).
An object of class Check.Assum with components,
Assume.Homo.S2 |
Is the homoscedasticity assumption valid? |
Assume.Normality.S2 |
Is the normality assumption valid? |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
library(NormData) # load the NormData package
data("Substitution")
# Fit a model with a linear mean prediction function
Fit <- Stage.1(Dataset = Substitution, Model = LDST~Age)
Check.Assum(Fit)
# Output shows that the homoscedasticity and normality
# assumptions are both violated
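As a rough illustration of what checking these two assumptions involves, the base R sketch below tests homoscedasticity with a Breusch-Pagan type statistic (computed by hand) and normality with the Shapiro-Wilk test. The choice of tests is an assumption made for this example; the tests that Check.Assum() applies internally may differ. The data are simulated and hypothetical.

```r
set.seed(1)

# Simulated data in which the residual SD increases with Age, so the
# homoscedasticity assumption is violated by construction
n <- 300
dat <- data.frame(Age = runif(n, min = 20, max = 80))
dat$Score <- 60 - 0.3 * dat$Age + rnorm(n, sd = 2 + 0.15 * dat$Age)

fit <- lm(Score ~ Age, data = dat)

# Breusch-Pagan type test (studentized version): regress the squared
# residuals on the predictor; n * R^2 is approximately chi-square
# distributed under homoscedasticity (df = number of predictors)
aux <- lm(residuals(fit)^2 ~ Age, data = dat)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Shapiro-Wilk test on the standardized residuals
sw_p <- shapiro.test(rstandard(fit))$p.value

c(Breusch.Pagan.p = bp_p, Shapiro.Wilk.p = sw_p)
```

A small p-value for the first test indicates that the residual variance depends on the predictor; a small p-value for the second indicates non-normal residuals.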
The function CheckFit() allows for evaluating the fit of the mean structure of a regression model by comparing sample means and model-predicted means. If the model fits the data well, there should be a good agreement between the sample means and the predicted mean test scores in the relevant subgroups. When the model only contains (binary and/or non-binary) qualitative independent variables, the subgroups correspond to all possible combinations of the different levels of the qualitative variables. When there are quantitative independent variables in the model, these have to be discretized first.
CheckFit(Stage.1.Model, Means, CI=.99, Digits=6)
Stage.1.Model |
The fitted |
Means |
A formula in the form of |
CI |
The required confidence limits. Default CI=.99. |
Digits |
The number of digits used when showing the results. Default Digits=6. |
For details, see Van der Elst (2024).
An object of class CheckFit with components,
Results.Observed |
A table with the means, SDs, and N for the observed test score, for each combination of independent variable levels. |
Results.Predicted |
A table with the mean predicted test scores, for each combination of independent variable levels. |
Miss |
The number of missing observations in the dataset. |
Dataset |
The dataset used in the analysis. |
Model |
The specified model for the mean. |
CI |
The requested CI around the mean. |
N |
The sample size of the specified dataset. |
Stage.1.Model |
The fitted |
Saturated |
Is the fitted |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the fit plot that was obtained in
# Case study 1 of Chapter 7 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
head(Substitution) # have a look at the first datalines in
# the Substitution dataset
# Final Stage 1 model
Substitution$Age.C <- Substitution$Age - 50
# Add Age_Group (that discretizes the quantitative variable Age
# into 6 groups with a span of 10 years in the dataset for use
# by the CheckFit() function later on)
Substitution$Age_Group <- cut(Substitution$Age,
  breaks=seq(from=20, to=80, by=10))
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
# Examine fit
Fit.LDST <- CheckFit(Stage.1.Model=Substitution.Model.9,
  Means=LDST~Age_Group+LE)
summary(Fit.LDST)
plot(Fit.LDST)

# Replicate the fit plot that was obtained in
# Case study 2 of Chapter 7 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(VLT) # load the VLT dataset
head(VLT) # have a look at the first datalines in
# the VLT dataset
# Fit the final Stage 1 model
VLT$Age.C <- VLT$Age - 50
VLT$Age.C2 <- (VLT$Age - 50)**2
# Add Age_Group (that discretizes the quantitative variable Age
# into 6 groups with a span of 10 years in the dataset for use
# by the CheckFit() function later on)
VLT$Age_Group <- cut(VLT$Age, breaks=seq(from=20, to=80, by=10))
VLT.Model.4 <- Stage.1(Dataset = VLT, Alpha = .005,
  Model = Total.Recall ~ Age.C+Age.C2+Gender+LE+Age.C:Gender)
# Examine fit using fit plots for the Age Group by
# LE by Gender subgroups
Fit.Means.Total.Recall <- CheckFit(Stage.1.Model=VLT.Model.4,
  Means=Total.Recall~Age_Group+LE+Gender)
summary(Fit.Means.Total.Recall)
plot(Fit.Means.Total.Recall)
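The comparison of sample means with model-predicted means can be sketched in base R as follows. This is only an illustration of the principle using simulated, hypothetical data and an ordinary lm() fit; it is not the implementation of CheckFit().

```r
set.seed(42)

# Simulated data (hypothetical): one quantitative and one qualitative
# independent variable
n <- 600
dat <- data.frame(
  Age = runif(n, min = 20, max = 80),
  LE = factor(sample(c("Low", "Average", "High"), n, replace = TRUE),
              levels = c("Low", "Average", "High")))
dat$Score <- 55 - 0.25 * (dat$Age - 50) + 4 * (as.integer(dat$LE) - 1) +
  rnorm(n, sd = 6)

# Fit the model for the mean and store the predicted scores
fit <- lm(Score ~ Age + LE, data = dat)
dat$Predicted <- fitted(fit)

# Discretize the quantitative variable Age into 10-year groups
dat$Age_Group <- cut(dat$Age, breaks = seq(from = 20, to = 80, by = 10),
                     include.lowest = TRUE)

# Observed and predicted means in every Age group by LE subgroup
observed <- aggregate(Score ~ Age_Group + LE, data = dat, FUN = mean)
predicted <- aggregate(Predicted ~ Age_Group + LE, data = dat, FUN = mean)
comparison <- merge(observed, predicted, by = c("Age_Group", "LE"))
comparison$Difference <- comparison$Score - comparison$Predicted
comparison
```

Differences that are small relative to the sampling variability of the subgroup means suggest that the mean structure is adequately specified.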
This function checks the coding of a variable, e.g., the dummy-coding scheme that will be used for binary or qualitative variables.
Coding(x, verbose=TRUE)
x |
The variable to be evaluated. |
verbose |
A logical value indicating whether verbose output should be generated. |
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
library(NormData) # load the NormData package
data(Substitution)
Coding(Substitution$LE)
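For readers who want to see what a dummy-coding scheme looks like, base R exposes it through contrasts() and model.matrix(). The sketch below uses a small hypothetical factor and assumes R's default treatment contrasts; it is independent of the Coding() function.

```r
# Hypothetical qualitative variable with three levels; the first level
# ("Low") acts as the reference category
LE <- factor(c("Low", "Average", "High", "Low", "High"),
             levels = c("Low", "Average", "High"))

# Dummy (treatment) coding: one 0/1 column per non-reference level
coding <- contrasts(LE)
coding

# The same coding as it enters the design matrix of a regression model
design <- model.matrix(~ LE)
design
```

Changing the order of the factor levels changes the reference category, and therefore the interpretation of the regression coefficients.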
Plot densities for an outcome for different subgroups.
Densities(Dataset, Test.Score, IV, Color=TRUE, Size.Legend=1, xlab="Test score", main, ...)
Dataset |
The name of the dataset. |
Test.Score |
The name of the outcome variable (e.g., a raw test score). |
IV |
The name of the stratification variable, that defines for which subgroups density plots should be provided. If |
Color |
Logical. Should densities for different subgroups be depicted in color? Default Color=TRUE. |
Size.Legend |
The size of the legend in the plot. Default Size.Legend=1. |
xlab |
The label on the X-axis. Default xlab="Test score". |
main |
The title of the plot. |
... |
Other arguments to be passed to the |
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
library(NormData) # load the NormData package
# Plot Gender-specific densities of the raw science exam
# scores in the GCSE dataset
data(GCSE)
Densities(Dataset = GCSE, Test.Score = Science.Exam, IV=Gender)
# Plot LE-specific densities of the residuals of a model
# where the Openness scale score is regressed on LE
data(Personality)
Fit <- Stage.1(Dataset = Personality, Model = Openness~LE)
summary(Fit)
Data.With.Residuals <- data.frame(Personality,
  Fit$HomoNorm$Residuals)
Densities(Dataset = Data.With.Residuals,
  Test.Score = Fit.HomoNorm.Residuals, IV = LE)
This function provides summary statistics of a test score (i.e., the mean, SD, N, standard error of the mean, and CI of the mean), stratified by the independent variable(s) of interest. The independent variables should be factors (i.e., binary or non-binary qualitative variables).
ExploreData(Dataset, Model, CI=.99, Digits=6)
Dataset |
A dataset. |
Model |
A formula in the form of |
CI |
The CI for the mean. Default CI=.99. |
Digits |
The number of digits used when showing the results. Default Digits=6. |
For details, see Van der Elst (2024).
An object of class ExploreData with components,
Results |
A table with the summary statistics. |
Miss |
The number of missing observations in the dataset. |
Dataset |
The dataset used in the analysis. |
Model |
The specified model. |
CI |
The requested CI around the mean. |
N |
The sample size of the specified dataset. |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the exploratory analyses that were conducted
# in Case study 1 of Chapter 5 in Van der Elst (2024)
# ------------------------------------------------------
library(NormData) # load the NormData package
data(Personality) # load the Personality dataset
Explore_Openness <- ExploreData(Dataset=Personality,
  Model=Openness~LE)
summary(Explore_Openness)
plot(Explore_Openness,
  main="Mean Openness scale scores and 99pc CIs")

# Replicate the exploratory analyses that were conducted
# in Case study 1 of Chapter 7 in Van der Elst (2024)
# ------------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
head(Substitution) # have a look at the first datalines in
# the Substitution dataset
# First make a new variable Age_Group, that discretizes the
# quantitative variable Age into 6 groups with a span of 10 years
Substitution$Age_Group <- cut(Substitution$Age,
  breaks=seq(from=20, to=80, by=10))
# Compute descriptives of the LDST score for different Age Group
# by LE combinations
Explore.LDST.Age.LE <- ExploreData(Dataset=Substitution,
  Model=LDST~Age_Group+LE)
summary(Explore.LDST.Age.LE)
# Make a plot of the results.
plot(Explore.LDST.Age.LE,
  main="Mean (99pc CI) LDST scores by Age group and LE")
# Compute descriptives of the LDST score for different
# Age Group by Gender combinations
Explore.LDST.Age.Gender <- ExploreData(Dataset=Substitution,
  Model=LDST~Age_Group+Gender)
# Plot the results
plot(Explore.LDST.Age.Gender,
  main="Mean (99pc CI) LDST scores by Age group and Gender")
# Compute descriptives of the LDST score for different
# LE by Gender combinations
Explore.LDST.LE.Gender <- ExploreData(Dataset=Substitution,
  Model=LDST~LE+Gender)
# Plot the results
plot(Explore.LDST.LE.Gender,
  main="Mean (99pc CI) LDST scores by LE and Gender")
# Compute summary statistics of the LDST score in the
# Age Group by LE by Gender combinations
Explore.LDST <- ExploreData(Dataset=Substitution,
  Model=LDST~Age_Group+LE+Gender)
# Plot the results
plot(Explore.LDST)
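The summary statistics listed above (mean, SD, N, standard error of the mean, and CI of the mean) can be computed per subgroup in base R as sketched below. The data are simulated and hypothetical, and the CI is based on the t-distribution, which is an assumption of this example; ExploreData() may compute its limits differently.

```r
set.seed(7)

# Simulated data (hypothetical): a test score and one factor
dat <- data.frame(
  LE = factor(sample(c("Low", "Average", "High"), 300, replace = TRUE),
              levels = c("Low", "Average", "High")))
dat$Score <- 50 + 3 * (as.integer(dat$LE) - 1) + rnorm(300, sd = 8)

ci_level <- 0.99

# Split the scores by subgroup and summarize each subgroup
groups <- split(dat$Score, dat$LE)
results <- do.call(rbind, lapply(names(groups), function(g) {
  x <- groups[[g]]
  n <- length(x)
  se <- sd(x) / sqrt(n)                      # standard error of the mean
  crit <- qt(1 - (1 - ci_level) / 2, df = n - 1)  # two-sided critical value
  data.frame(LE = g, Mean = mean(x), SD = sd(x), N = n, SE = se,
             CI.Lower = mean(x) - crit * se,
             CI.Upper = mean(x) + crit * se)
}))
results
```

Non-overlapping CIs between subgroups give a first, informal indication that the independent variable is related to the test score.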
This dataset contains the scores of the Fruits Verbal Fluency Test. The test participants were instructed to generate as many words as possible that belong to the category ‘fruits’ (e.g., apple, orange, banana, etc.) within
seconds. These are simulated data based on the results described in Rivera et al. (2019).
data(Fluency)
A data.frame
with observations on
variables.
Id
The Id number of the test participant.
Country
The country where the test participant lives, coded as a factor.
Fruits
The number of correctly generated fruit names. Higher score is better.
Rivera et al. (2019). Normative Data For Verbal Fluency in Healthy Latin American Adults: Letter M, and Fruits and Occupations Categories. Neuropsychology, 33, 287-300.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
Fit a fractional polynomial model with terms of the form X^p, where the exponents p are selected from a small predefined set S of both integer and non-integer values. This function can be useful to model the mean or variance prediction function in a more flexible way than by using linear, quadratic or cubic polynomials.
Fract.Poly(IV, Outcome, S=c(-3, -2.5, -2.0, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, 2.5, 3), Max.M=3)
IV |
The Independent Variable to be considered in the model. |
Outcome |
The outcome to be considered in the model. |
S |
The set |
Max.M |
The maximum order |
All.Results |
The results (powers and AIC values) of the fractional polynomials. |
Lowest.AIC |
Table with the fractional polynomial model that has the lowest AIC. |
Best.Model |
The best fitted model ( |
IV |
The IV that was considered in the model. |
Outcome |
The outcome that was considered in the model. |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
library(NormData) # load the NormData package
data(VLT)
# Fit fractional polynomials of orders 1 to 2
FP <- Fract.Poly(IV = VLT$Age, Outcome = VLT$Total.Recall, Max.M=2)
FP$Lowest.AIC
FP$Best.Model
# Model with lowest AIC: 127.689 + (-190.731 * (Age**(-0.5))) +
# (-7.586 * (Age**(0.5)))
# Make plot
plot(x=VLT$Age, y=VLT$Total.Recall, col="grey")
# add best fitted fractional polynomial
Age.Vals.Plot <- 20:80
Pred.Vals <- 127.689 + (-190.731 * (Age.Vals.Plot**(-0.5))) +
  (-7.586 * (Age.Vals.Plot**(0.5)))
lines(x=Age.Vals.Plot, y=Pred.Vals, lwd=2, col="red", lty=2)
legend("topright", lwd=2, col="red", lty=2,
  legend="Mean Prediction Function, Fractional Polynomial")
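The selection of powers by AIC can be illustrated with a simplified grid search in base R. This sketch is not the implementation of Fract.Poly(): it only considers combinations of distinct powers (no repeated powers and no log terms), and the simulated data and the helper fit_fp() are hypothetical.

```r
set.seed(11)

# Simulated data (hypothetical) with a non-linear age effect
n <- 400
age <- runif(n, min = 20, max = 80)
outcome <- 120 - 180 * age^(-0.5) - 7 * age^(0.5) + rnorm(n, sd = 4)

# Candidate powers and maximum order of the fractional polynomial
S <- c(-3, -2.5, -2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, 2.5, 3)
max_m <- 2

# Candidate models: every combination of 1 to max_m distinct powers
candidates <- unlist(
  lapply(seq_len(max_m), function(m) combn(S, m, simplify = FALSE)),
  recursive = FALSE)

# Hypothetical helper: fit a linear model on the transformed predictor(s)
fit_fp <- function(powers) {
  X <- sapply(powers, function(p) age^p)
  lm(outcome ~ X)
}

# AIC for all candidate models; the model with the lowest AIC is retained
aic_values <- sapply(candidates, function(powers) AIC(fit_fp(powers)))
best_powers <- candidates[[which.min(aic_values)]]
best_fit <- fit_fp(best_powers)

best_powers
coef(best_fit)
```

With 12 candidate powers and a maximum order of 2, the search covers 12 first-order and 66 second-order models.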
This dataset contains the scores on a written science exam (General Certificate of Secondary Education; GCSE) that is taken by students in
schools in England. The exam is taken at the end of compulsory schooling, when students are typically
years old. The actual score maximum is
, but here a rescaled score (with max value
) is provided. The data originally come from the package mlmRev, dataset Gcsemv.
data(GCSE)
A data.frame
with observations on
variables.
Id
The Id number of the student.
Gender
The gender of the student, coded as M = male and F = female.
Science.Exam
The science exam score.
The function GLT() fits two nested linear regression models (that are referred to as the unrestricted and the restricted models), and evaluates whether or not the fit of both models differs significantly.
GLT(Dataset, Unrestricted.Model, Restricted.Model, Alpha=0.05, Alpha.Homosc=0.05, Assume.Homoscedasticity=NULL)
Dataset |
A |
Unrestricted.Model |
The unrestricted regression model to be fitted. A formula should be provided using the syntax of the |
Restricted.Model |
The restricted regression model to be fitted. |
Alpha |
The significance level that should be used in the GLT procedure. Default Alpha=0.05. |
Alpha.Homosc |
The significance level to conduct the homoscedasticity test. If the unrestricted model only contains qualitative independent variables, the Levene test is used. If the model contains at least one quantitative independent variable, the Breusch-Pagan test is used. If the homoscedasticity assumption is violated, a heteroscedasticity-robust |
Assume.Homoscedasticity |
Logical. The |
For details, see Van der Elst (2024).
An object of class GLT
with components,
F.Test.Stat.Results |
The result of the GLT procedure, i.e., the SSEs and DFs of the fitted unrestricted and restricted models, and the |
Fit.Unrestricted.Model |
The fitted unrestricted model. |
Fit.Restricted.Model |
The fitted restricted model. |
Alpha |
The significance level that was used. |
p.val.homoscedasticity |
The p-value that was used in the homoscedasticity test for the unrestricted model. |
F.Test.Hetero.Robust |
The result of the heteroscedasticity-robust |
Alpha.Homoscedasticity |
The significance level that was used to conduct the homoscedasticity test. Default |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the GLT results that were obtained in
# Case study 1 of Chapter 5 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Personality)
GLT.Openness <- GLT(Dataset=Personality,
  Unrestricted.Model=Openness~LE, Restricted.Model=Openness~1)
summary(GLT.Openness)

# Replicate the GLT results that were obtained in
# Case study 2 of Chapter 5 in Van der Elst (2024)
# ------------------------------------------------
data(Fluency)
GLT.Fruits <- GLT(Dataset=Fluency,
  Unrestricted.Model=Fruits~LE, Restricted.Model=Fruits~1)
summary(GLT.Fruits)
The function ICC computes the intraclass correlation (ICC). The ICC corresponds to the proportion of the total variance in the residuals that is accounted for by the clustering variable at hand (Kutner et al., 2005).
ICC(Cluster, Test.Score, Dataset, CI = 0.95)
Cluster |
The name of the clustering variable in the dataset. |
Test.Score |
The name of the outcome variable in the dataset (e.g., a test score). |
Dataset |
A dataset. |
CI |
The required confidence limits around the ICC. Default |
This function is a modification of the ICCest
function from the ICC
package (v2.3.0), with minimal changes. For details of the original function, see https://cran.r-project.org/web/packages/ICC/ICC.pdf. The author of the original function is Matthew Wolak.
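As an illustration of the quantity involved, the sketch below computes a one-way ANOVA estimate of the ICC in base R. It is a simplified sketch on simulated, balanced data (10 hypothetical clusters of 20 observations each) and is not claimed to reproduce the exact computations or the confidence limits of ICC() or ICCest.

```r
# One-way ANOVA estimator of the ICC on simulated balanced data
# (assumption: equal cluster sizes; all names are hypothetical).
set.seed(123)
a <- 10   # number of clusters
k <- 20   # observations per cluster
cluster <- factor(rep(LETTERS[1:a], each = k))
# Cluster effects (SD = 2) plus within-cluster noise (SD = 5)
score <- 50 + rep(rnorm(a, sd = 2), each = k) + rnorm(a * k, sd = 5)

tab <- anova(lm(score ~ cluster))
msb <- tab[["Mean Sq"]][1]  # between-cluster mean square
msw <- tab[["Mean Sq"]][2]  # within-cluster mean square

# ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
icc_est <- (msb - msw) / (msb + (k - 1) * msw)
print(icc_est)
```

An estimate close to 0 suggests that cluster membership explains little of the variance in the scores.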
An object of class ICC
with components,
ICC |
The intraclass correlation coefficient. |
LowerCI |
The lower bound of the CI around the ICC. |
UpperCI |
The upper bound of the CI around the ICC. |
Num.Clusters |
The number of clusters in the dataset. |
Mean.Cluster.Size |
The mean number of observations per cluster. |
Data |
The dataset used in the analysis (observations with missing values are excluded). |
N.Dataset |
The sample size of the full dataset. |
N.Removed |
The number of observations that are removed due to missingness. |
alpha |
The specified |
Labels.Cluster |
The labels of the clustering variable. |
Original function: Matthew Wolak (with some small modifications by Wim Van der Elst)
https://cran.r-project.org/web/packages/ICC/ICC.pdf
Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005). Applied linear statistical models (5th edition). New York: McGraw Hill.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Compute ICC in Substitution dataset, using Test.Administrator as
# clustering unit
library(NormData) # load the NormData package
data(Substitution)
# Add administrator to the dataset (just randomly allocate labels
# as Test.Administrator, so ICC should be approx. 0)
Substitution$Test.Administrator <- NA
Substitution$Test.Administrator <- sample(LETTERS[1:10],
  replace = TRUE, size = length(Substitution$Test.Administrator))
Substitution$Test.Administrator <-
  as.factor(Substitution$Test.Administrator)
ICC_LDST <- ICC(Cluster = Test.Administrator, Test.Score = LDST,
  Dataset = Substitution)
# Explore results
summary(ICC_LDST)
plot(ICC_LDST)
Gives the levels of a variable.
Levels(x)
x |
A variable for which the different levels should be printed. |
For details, see Van der Elst (2024).
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
data(Substitution)
Levels(Substitution$Gender)
These are the data of the Openness subscale of the International Personality Item Pool (ipip.ori.org). This subscale consists of 5 items: 1 = I am full of ideas, 2 = I avoid difficult reading material, 3 = I carry the conversation to a higher level, 4 = I spend time reflecting on things, and 5 = I will not probe deeply into a subject. Each item is scored on a 6-point response scale with answer categories 1 = very inaccurate, 2 = moderately inaccurate, 3 = slightly inaccurate, 4 = slightly accurate, 5 = moderately accurate, and 6 = very accurate. The Openness scale score corresponds to the sum of the individual item scores, with items 2 and 5 being reverse scored. The raw Openness scale score ranges between 5 and 30. A higher score is indicative of higher levels of curiosity, intellectualism, imagination, and aesthetic interests (McCrae, 1994).
The data were collected as part of the Synthetic Aperture Personality Assessment (SAPA; http://sapa-project.org) web-based personality assessment project.
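The scoring rule described above (sum of 5 items on a 6-point scale, with items 2 and 5 reverse scored) can be sketched as follows. The item responses are invented for illustration, and the item names are hypothetical; the Personality dataset itself only contains the resulting scale score.

```r
# Hypothetical responses of one participant to the 5 Openness items
items <- c(Item1 = 5, Item2 = 2, Item3 = 4, Item4 = 6, Item5 = 1)

# Items 2 and 5 are reverse scored; on a 1-6 scale this is 7 - response
reverse <- c("Item2", "Item5")
scored <- items
scored[reverse] <- 7 - scored[reverse]

# The scale score is the sum of the (re)scored items
openness <- sum(scored)
print(scored)
print(openness)
```

With these responses the rescored items are 5, 5, 4, 6 and 6, giving a scale score of 26, which lies within the possible range of 5 to 30.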
data(Personality)
A data.frame
with 2137 observations on 3 variables.
Id
The Id number of the participant.
LE
The Level of Education (LE) of the participant, coded as 1 = less than high school, 2 = finished high school, 3 = some college but did not graduate, 4 = college graduate, and 5 = graduate degree.
Openness
Level of Openness.
McCrae, R. R. (1994). Openness to Experience: expanding the boundaries of factor V. European Journal of Personality, 8, 251-272.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
This function plots the bootstrap distribution and the percentile bootstrap CI for a test score based on a Bootstrap.Stage.2.NormScore
object. A non-parametric bootstrap is used to compute a confidence interval (CI) around the estimated percentile rank (for details, see Chapter 8 in Van der Elst, 2024).
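The general idea of a percentile bootstrap CI can be sketched in base R. The example below is a simplified illustration and not the procedure implemented in Bootstrap.Stage.2.NormScore(): it uses simulated scores without covariates, and it assumes that the percentile rank is obtained from a normal approximation (the helper pr() is hypothetical).

```r
# Percentile bootstrap CI for a percentile rank (simplified sketch)
set.seed(123)
scores <- rnorm(500, mean = 35, sd = 8)  # simulated normative sample
raw <- 30                                # raw score to be normed

# Hypothetical helper: percentile rank under a normal approximation
pr <- function(x, raw) 100 * pnorm((raw - mean(x)) / sd(x))

point_est <- pr(scores, raw)

# Resample the data with replacement and recompute the statistic
B <- 2000
boot_pr <- replicate(B, pr(sample(scores, replace = TRUE), raw))

# 99 percent percentile CI: the 0.5th and 99.5th percentiles
ci_level <- 0.99
ci <- quantile(boot_pr,
               probs = c((1 - ci_level) / 2, 1 - (1 - ci_level) / 2))
print(point_est)
print(ci)

# Bootstrap distribution with the CI limits marked
hist(boot_pr, main = "Bootstrap distribution", xlab = "Percentile rank")
abline(v = ci, lty = 2)
```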
## S3 method for class 'Bootstrap.Stage.2.NormScore'
plot(x, cex.axis=1, cex.main=1, cex.lab=1, ...)
x |
A fitted object of class |
cex.axis |
The magnification to be used for axis annotation. |
cex.main |
The magnification to be used for the main label. |
cex.lab |
The magnification to be used for X and Y labels. |
... |
Other arguments to be passed to the |
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Time-intensive part
# Replicate the bootstrap results that were obtained in
# Case study 1 of Chapter 8 in Van der Elst (2024)
# -----------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset
# Fit the Stage 1 model
Model.1.GCSE <- Stage.1(Dataset=GCSE, Model=Science.Exam~Gender)
# Stage 2: Convert a science exam score = 30 obtained by a
# female into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(Stage.1.Model=Model.1.GCSE,
  Score=list(Science.Exam=30, Gender="F"), Rounded = FALSE)
summary(Normed_Score)
# Derive the 99pc CI around the point estimate
# using a bootstrap procedure
Bootstrap_Normed_Score <- Bootstrap.Stage.2.NormScore(
  Stage.2.NormScore=Normed_Score)
summary(Bootstrap_Normed_Score)
plot(Bootstrap_Normed_Score)

# Replicate the bootstrap results that were obtained in
# Case study 2 of Chapter 8 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
# Make the new variable Age.C (= Age centered) that is
# needed to fit the final Stage 1 model,
# and add it to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50
# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
summary(Substitution.Model.9)
# Convert an LDST score = 40 obtained by a
# 20-year-old test participant with LE=Low
# into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(
  Stage.1.Model=Substitution.Model.9,
  Score=list(LDST=40, Age.C=20-50, LE = "Low"),
  Rounded = FALSE)
# Derive the 99pc CI around the point estimate
# using a bootstrap
Bootstrap_Normed_Score <- Bootstrap.Stage.2.NormScore(
  Stage.2.NormScore = Normed_Score)
summary(Bootstrap_Normed_Score)
plot(Bootstrap_Normed_Score)
The function CheckFit()
allows for evaluating the fit of the mean structure of a regression model by comparing sample means and model-predicted means. This function plots the sample means (with CIs) and the means of the model-predicted values. If the model fits the data well, there should be a good agreement between the sample means and the predicted mean test scores in the relevant subgroups. When the model only contains (binary and/or non-binary) qualitative independent variables, the subgroups correspond to all possible combinations of the different levels of the qualitative variables. When there are quantitative independent variables in the model, these have to be discretized first.
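The comparison of sample means and model-predicted means per subgroup can be sketched in base R as follows. This is an illustration of the idea on simulated data, not the code of CheckFit(); the variables Score, Age and LE are made up, and the quantitative variable Age is discretized into 10-year groups before the means are computed.

```r
# Sample means versus model-predicted means per subgroup (sketch)
set.seed(1)
n <- 600
Age <- runif(n, 20, 80)
LE <- factor(sample(c("Low", "Average", "High"), n, replace = TRUE))
Score <- 60 - 0.3 * (Age - 50) + 3 * (LE == "High") -
  3 * (LE == "Low") + rnorm(n, sd = 8)
dat <- data.frame(Score, Age, LE)

# Centered age for the model, discretized age for the subgroups
dat$Age.C <- dat$Age - 50
dat$Age_Group <- cut(dat$Age, breaks = seq(20, 80, by = 10),
                     include.lowest = TRUE)

# Fit the mean structure and store the predicted scores
fit <- lm(Score ~ Age.C + LE, data = dat)
dat$Predicted <- fitted(fit)

# Mean observed and mean predicted score in each Age_Group by LE cell
means <- aggregate(cbind(Score, Predicted) ~ Age_Group + LE,
                   data = dat, FUN = mean)
print(means)
```

When the model fits well, the Score and Predicted columns should be close to each other in every subgroup.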
## S3 method for class 'CheckFit'
plot(x, Color, pch, lty, Width.CI.Lines=.125, Size.symbol = 1,
  No.Overlap.X.Axis=TRUE, xlab, ylab="Test score", main = " ",
  Legend.text.size=1, Connect.Means, cex.axis=1, cex.main=1.5,
  cex.lab=1.5, ...)
x |
A fitted object of class |
Color |
The colors to be used for the means. If not specified, the default colors are used. |
pch |
The symbols to be used for the means. If not specified, dots are used. |
lty |
The line types to be used for the means. If not specified, solid lines are used. |
Width.CI.Lines |
The width of the horizontal lines that are used to depict the CI around the mean. Default |
Size.symbol |
The size of the symbol used to depict the mean test score. Default |
No.Overlap.X.Axis |
Logical. When a plot is constructed using two IVs (i.e., 2 or more lines of the mean and CIs in the plot), it is possible that the plot is unclear because the different means and CIs can no longer be distinguished. To avoid this, the levels of IV1 (plotted on the X-axis) can be assigned slightly different values for each level of IV2. For example, the mean for the subcategory males in age range [20; 40] will be shown at value X=0.9 (rather than 1) and the mean for the subcategory females in age range [20; 40] will be shown at value X=1.1 (rather than 1). In this way, the different means and CIs can be more clearly distinguished. Default |
xlab |
The label that should be added to the X-axis. |
ylab |
The label that should be added to the Y-axis. Default |
main |
The title of the plot. Default |
Legend.text.size |
The size of the text of the label for IV2. Default |
Connect.Means |
Logical. Should the symbols depicting the mean test scores be connected? If not specified, |
cex.axis |
The size of the labels on the X- and Y-axis. Default |
cex.main |
The magnification to be used for the main label. |
cex.lab |
The magnification to be used for X and Y labels. |
... |
Extra graphical parameters to be passed to |
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the fit plot that was obtained in
# Case study 1 of Chapter 7 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
head(Substitution) # have a look at the first datalines in
# the Substitution dataset
# Final Stage 1 model
Substitution$Age.C <- Substitution$Age - 50
# Add Age_Group (that discretizes the quantitative variable Age
# into 6 groups with a span of 10 years in the dataset for use
# by the CheckFit() function later on)
Substitution$Age_Group <- cut(Substitution$Age,
  breaks=seq(from=20, to=80, by=10))
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
# Examine fit
Fit.LDST <- CheckFit(Stage.1.Model=Substitution.Model.9,
  Means=LDST~Age_Group+LE)
summary(Fit.LDST)
plot(Fit.LDST)

# Replicate the fit plot that was obtained in
# Case study 2 of Chapter 7 in Van der Elst (2024)
# ------------------------------------------------
library(NormData) # load the NormData package
data(VLT) # load the VLT dataset
head(VLT) # have a look at the first datalines in
# the VLT dataset
# Fit the final Stage 1 model
VLT$Age.C <- VLT$Age - 50
VLT$Age.C2 <- (VLT$Age - 50)^2
# Add Age_Group (that discretizes the quantitative variable Age
# into 6 groups with a span of 10 years in the dataset for use
# by the CheckFit() function later on)
VLT$Age_Group <- cut(VLT$Age, breaks=seq(from=20, to=80, by=10))
VLT.Model.4 <- Stage.1(Dataset = VLT, Alpha = .005,
  Model = Total.Recall ~ Age.C+Age.C2+Gender+LE+Age.C:Gender)
# Examine fit using fit plots for the Age Group by
# LE by Gender subgroups
Fit.Means.Total.Recall <- CheckFit(Stage.1.Model=VLT.Model.4,
  Means=Total.Recall~Age_Group+LE+Gender)
summary(Fit.Means.Total.Recall)
plot(Fit.Means.Total.Recall)
Plot the means (and CIs) for the test scores, stratified by the independent variable(s) of interest. The independent variables should be factors (i.e., binary or non-binary qualitative variables).
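The kind of summary that underlies such a plot can be sketched in base R. The example below computes the mean and a t-based 99 percent CI of a simulated test score for each level of a factor. Both the data and the choice of a t-based interval are assumptions made for illustration; the sketch does not claim to reproduce how ExploreData() computes its intervals.

```r
# Group means with t-based CIs on simulated data (sketch)
set.seed(2)
LE <- factor(sample(1:5, 300, replace = TRUE))
Openness <- round(pmin(30, pmax(5,
  rnorm(300, mean = 20 + as.numeric(LE), sd = 4))))

ci_level <- 0.99
summ <- do.call(rbind, lapply(split(Openness, LE), function(y) {
  n <- length(y)
  m <- mean(y)
  se <- sd(y) / sqrt(n)
  half <- qt(1 - (1 - ci_level) / 2, df = n - 1) * se
  data.frame(N = n, Mean = m, Lower = m - half, Upper = m + half)
}))
print(summ)

# Plot the means with error bars, one position per factor level
x <- seq_len(nrow(summ))
plot(x, summ$Mean, ylim = range(summ$Lower, summ$Upper), xaxt = "n",
     xlab = "LE", ylab = "Test score", pch = 19)
axis(1, at = x, labels = rownames(summ))
segments(x, summ$Lower, x, summ$Upper)
```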
## S3 method for class 'ExploreData'
plot(x, Width.CI.Lines=.125, Size.symbol = 1,
  No.Overlap.X.Axis=TRUE, xlab, ylab="Test score", main,
  Color, pch, lty, Black.white=FALSE, Legend.text.size=1,
  Connect.Means = TRUE, Error.Bars = "CI", cex.axis=1,
  cex.main=1, cex.lab=1, ...)
x |
A fitted object of class |
Width.CI.Lines |
The width of the horizontal lines that are used to depict the CI around the mean. Default |
Size.symbol |
The size of the symbol used to depict the mean test score. Default |
No.Overlap.X.Axis |
Logical. When a plot is constructed using multiple IVs (specified in the |
xlab |
The label that should be added to the X-axis. |
ylab |
The label that should be added to the Y-axis. Default |
main |
The title of the plot. |
Color |
The colors that should be used for the means. If not specified, the default colors are used. |
pch |
The symbols to be used for the means. If not specified, dots are used. |
lty |
The line types to be used for the means. If not specified, solid lines are used (i.e., |
Black.white |
Logical. Should the plot be in black and white (rather than in color)? Default |
Legend.text.size |
The size of the text of the label for IV2. Default |
Connect.Means |
Logical. Should the symbols depicting the mean test scores be connected? Default |
Error.Bars |
The type of error bars around the means that should be added in the plot: confidence intervals ( |
cex.axis |
The magnification to be used for axis annotation. |
cex.main |
The magnification to be used for the main label. |
cex.lab |
The magnification to be used for X and Y labels. |
... |
Extra graphical parameters to be passed to |
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the exploratory analyses that were conducted
# in Case study 1 of Chapter 5 in Van der Elst (2024)
# ------------------------------------------------------
library(NormData) # load the NormData package
data(Personality) # load the Personality dataset
Explore_Openness <- ExploreData(Dataset=Personality,
  Model=Openness~LE)
summary(Explore_Openness)
plot(Explore_Openness,
  main="Mean Openness scale scores and 99pc CIs")

# Replicate the exploratory analyses that were conducted
# in Case study 1 of Chapter 7 in Van der Elst (2024)
# ------------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
head(Substitution) # have a look at the first datalines in
# the Substitution dataset
# First make a new variable Age_Group, that discretizes the
# quantitative variable Age into 6 groups with a span of 10 years
Substitution$Age_Group <- cut(Substitution$Age,
  breaks=seq(from=20, to=80, by=10))
# Compute descriptives of the LDST score for different Age Group
# by LE combinations
Explore.LDST.Age.LE <- ExploreData(Dataset=Substitution,
  Model=LDST~Age_Group+LE)
summary(Explore.LDST.Age.LE)
# Make a plot of the results.
plot(Explore.LDST.Age.LE,
  main="Mean (99pc CI) LDST scores by Age group and LE")
# Compute descriptives of the LDST score for different
# Age Group by Gender combinations
Explore.LDST.Age.Gender <- ExploreData(Dataset=Substitution,
  Model=LDST~Age_Group+Gender)
# Plot the results
plot(Explore.LDST.Age.Gender,
  main="Mean (99pc CI) LDST scores by Age group and Gender")
# Compute descriptives of the LDST score for different
# LE by Gender combinations
Explore.LDST.LE.Gender <- ExploreData(Dataset=Substitution,
  Model=LDST~LE+Gender)
# Plot the results
plot(Explore.LDST.LE.Gender,
  main="Mean (99pc CI) LDST scores by LE and Gender")
# Compute summary statistics of the LDST score in the
# Age Group by LE by Gender combinations
Explore.LDST <- ExploreData(Dataset=Substitution,
  Model=LDST~Age_Group+LE+Gender)
# Plot the results
plot(Explore.LDST)
The ICC corresponds to the proportion of the total variance in the residuals that is accounted for by the clustering variable at hand (Kutner et al., 2005). This function visualizes the extent to which there is clustering in the dataset.
## S3 method for class 'ICC'
plot(x, X.Lab="Cluster", Y.Lab="Test score", Main="",
  Add.Jitter=0.2, Size.Points=1, Size.Labels=1,
  Add.Mean.Per.Cluster=TRUE, Col.Mean.Symbol="red",
  Seed=123, ...)
x |
A fitted object of class |
X.Lab |
The label that should be added to the X-axis. |
Y.Lab |
The label that should be added to the Y-axis. |
Main |
The title of the plot. Default |
Add.Jitter |
The amount of jitter (random noise) that should be added in the horizontal direction of the plot (i.e., along the X-axis, which shows the clusters). Adding a bit of jitter is useful to show the individual data points more clearly. The specified value |
Size.Points |
The size of the points in the plot. Default |
Size.Labels |
The size of the labels of the X-axis in the plot. Default |
Add.Mean.Per.Cluster |
Logical. Should the means per cluster be shown? |
Col.Mean.Symbol |
The color of the symbol that is used to indicate the mean (for each of the clusters). Default |
Seed |
The random seed that is used to add jitter. Default |
... |
Other arguments to be passed to the plot function. |
No return value, called for side effects.
Wim Van der Elst
Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005). Applied linear statistical models (5th edition). New York: McGraw Hill.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Compute ICC in Substitution dataset, using Test.Administrator as
# clustering unit
library(NormData) # load the NormData package
data(Substitution)
# Add administrator to the dataset (just randomly allocate labels
# as Test.Administrator, so ICC should be approx. 0)
Substitution$Test.Administrator <- NA
Substitution$Test.Administrator <- sample(LETTERS[1:10],
  replace = TRUE, size = length(Substitution$Test.Administrator))
Substitution$Test.Administrator <-
  as.factor(Substitution$Test.Administrator)
ICC_LDST <- ICC(Cluster = Test.Administrator, Test.Score = LDST,
  Dataset = Substitution)
# Explore results
summary(ICC_LDST)
plot(ICC_LDST)
# Reduce the size of the points in the plot and of the
# labels on the X-axis (initials test administrators)
plot(ICC_LDST, Size.Labels = .5, Size.Points=.5)
This function provides several plots that are useful to evaluate model assumptions. When the plot()
function is applied to a fitted Stage.1
object, three panels are generated. These panels show plots that can be used (i) to evaluate the homoscedasticity assumption, (ii) to evaluate the normality assumption, and (iii) to evaluate the presence of outliers.
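The three kinds of checks can be approximated with base R graphics. The sketch below is not the plot method of the package: it uses simulated data, standardizes the residuals by their overall standard deviation (i.e., it assumes homoscedasticity), and flags observations with an absolute standardized residual above 3 as potential outliers, which is an arbitrary threshold chosen for illustration.

```r
# Base R sketch of homoscedasticity, normality and outlier checks
set.seed(3)
Gender <- factor(sample(c("F", "M"), 400, replace = TRUE))
Score <- 50 + 4 * (Gender == "M") + rnorm(400, sd = 10)
fit <- lm(Score ~ Gender)

res <- resid(fit)
z <- res / sd(res)  # standardized residuals (homoscedasticity assumed)

op <- par(mfrow = c(1, 3))
# (i) Homoscedasticity: residuals against predicted scores, with jitter
plot(jitter(fitted(fit), amount = 0.2), res,
     xlab = "Predicted scores", ylab = "Residuals",
     main = "Homoscedasticity")
abline(h = 0, lty = 2)
# (ii) Normality: QQ-plot of the standardized residuals
qqnorm(z, main = "Normality")
qqline(z)
# (iii) Outliers: standardized residuals with illustrative cut-offs
plot(z, xlab = "Observation", ylab = "Standardized residuals",
     main = "Outliers")
abline(h = c(-3, 3), lty = 2)
par(op)

outliers <- which(abs(z) > 3)
print(outliers)
```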
## S3 method for class 'Stage.1'
plot(x, Homoscedasticity=TRUE, Normality=TRUE, Outliers=TRUE,
  Assume.Homoscedasticity, Add.Jitter=0, Seed=123,
  Confidence.QQ.Normality=.99, Plots.Together=TRUE,
  Y.Lim.ResVarFunction, Group.Spec.Densities.Delta=FALSE,
  Main.Homosced.1, Main.Homosced.2, Main.Norm.1, Main.Norm.2,
  Main.Norm.3, Main.Outliers, cex.axis.homo=1, cex.main.homo=1,
  cex.lab.homo=1, cex.axis.norm=1.6, cex.main.norm=1.5,
  cex.lab.norm=1.5, cex.axis.outl=1, cex.main.outl=1,
  cex.lab.outl=1, Color="red", Loess.Span=0.75, verbose=TRUE, ...)
x |
A fitted object of class |
Homoscedasticity |
Logical. Should plots to evaluate homoscedasticity be shown? |
Normality |
Logical. Should plots to evaluate the normality assumption be shown? The normality plots are based on the standardized residuals in the normative dataset, which are computed as explained in the |
Outliers |
Logical. Should plots to evaluate outliers be shown? The outlier plot is based on the standardized residuals in the normative dataset, which are computed as explained in the |
Assume.Homoscedasticity |
By default, the standardized residuals |
Add.Jitter |
The amount of jitter (random noise) that should be added to the X-axis of the homoscedasticity plots (which show the model-predicted mean values). Adding a bit of jitter is useful to show the data more clearly (especially when there are only a few unique predicted values, e.g., when a binary or non-binary qualitative independent variable is considered in the mean structure of the model), i.e., to avoid overlapping data points. The specified value |
Seed |
The seed that is used when adding jitter. Default |
Confidence.QQ.Normality |
Specifies the desired confidence level for the confidence band around the line of perfect agreement/normality in the QQ-plot that is used to evaluate normality. Default |
Plots.Together |
The different homoscedasticity and normality plots are grouped together in a panel by default. For example, the three normality plots are shown together in one panel. If it is preferred to have the different plots in separate panels (rather than grouped together), the argument |
Y.Lim.ResVarFunction |
The min, max limits of the Y-axis that should be used for the variance function plot. By default, the limit of the Y-axis is set between |
Group.Spec.Densities.Delta |
Logical. Should a plot with the group-specific densities of the standardized residuals be shown? Default |
Main.Homosced.1 |
The title of the first panel of the homoscedasticity plot (i.e., the scatterplot of the residuals against the predicted scores). |
Main.Homosced.2 |
The title of the second panel of the homoscedasticity plot (i.e., the variance function plot). |
Main.Norm.1 |
The title of the first panel of the normality plot (i.e., the histogram of the standardized residuals). |
Main.Norm.2 |
The title of the second panel of the normality plot (i.e., the density of the standardized residuals and standard normal distribution). |
Main.Norm.3 |
The title of the third panel of the normality plot (i.e., the QQ-plot). |
Main.Outliers |
The title of the outlier plot. |
cex.axis.homo |
The magnification to be used for axis annotation of the homoscedasticity plots. |
cex.main.homo |
The magnification to be used for the main label of the homoscedasticity plots. |
cex.lab.homo |
The magnification to be used for the X- and Y-axis labels of the homoscedasticity plots. |
cex.axis.norm |
The magnification to be used for axis annotation of the normality plots. |
cex.main.norm |
The magnification to be used for the main label of the normality plots. |
cex.lab.norm |
The magnification to be used for X and Y labels of the normality plots. |
cex.axis.outl |
The magnification to be used for axis annotation of the outlier plot. |
cex.main.outl |
The magnification to be used for the main label of the outlier plot. |
cex.lab.outl |
The magnification to be used for X- and Y-axis labels of the outlier plot. |
Color |
The color to be used for the Empirical Variance Function (EVF) and the standard normal distribution in the variance function plot and the normality plot that show the densities of the standardized residuals and the normal distribution, respectively. Default |
Loess.Span |
The parameter |
verbose |
A logical value indicating whether verbose output should be generated. |
... |
Other arguments to be passed. |
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 4 in Van der Elst (2024)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset

# Conduct the Stage 1 analysis
Model.1.GCSE <- Stage.1(Dataset=GCSE,
  Model=Science.Exam~Gender)
summary(Model.1.GCSE)
plot(Model.1.GCSE, Add.Jitter = .2)

# Use blue color for EVF and density normal distribution
plot(Model.1.GCSE, Add.Jitter = .2, Color="blue")

# Change the title of the variance function plot into
# "Variance function plot, residuals Science exam"
plot(Model.1.GCSE, Add.Jitter = .2,
  Main.Homosced.2 = "Variance function plot, residuals Science exam")

# Use a 90 percent CI around the line of perfect agreement in the
# QQ plot of normality
plot(Model.1.GCSE, Add.Jitter = .2, Confidence.QQ.Normality = .9)

# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 7 in Van der Elst (2024)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset

# Add the variable Age.C (= Age centered) to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50

# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
# Order.Poly.Var=1 specifies a linear polynomial
# for the variance prediction function

# Final Stage 1 model
summary(Substitution.Model.9)
plot(Substitution.Model.9)

# Request a variance function plot that assumes that
# the homoscedasticity assumption is valid
plot(Substitution.Model.9, Assume.Homoscedasticity = TRUE)
Stage.2.NormScore object.
The function Stage.2.NormScore() is used to convert the raw test score of a tested person into a percentile rank (taking into account specified values of the independent variables). This function plots the results graphically. In particular, the density of the standard normal distribution is shown (when the normality assumption is valid for the fitted Stage 1 model), or the density of the standardized residuals in the normative sample (when the normality assumption is not valid). The area under the curve (AUC) between minus infinity and the tested person's standardized test score is shaded in grey, which visualizes the percentile rank that corresponds to the raw test score.
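The shaded-area idea described above can be sketched in base R. This is only an illustration under the normality assumption, not the package's plotting code: the standardized score `delta_hat` is a made-up value, and the left edge of the plot (-4) stands in for minus infinity.

```r
# Hypothetical standardized test score of a tested person (assumed value)
delta_hat <- -0.75

# Percentile rank = area under the standard normal density left of delta_hat
percentile_rank <- 100 * pnorm(delta_hat)

# Draw the standard normal density on a grid of z-values
z <- seq(-4, 4, length.out = 401)
plot(z, dnorm(z), type = "l",
     xlab = "Standardized residual", ylab = "Density", main = " ")

# Shade the area between the left edge of the plot and delta_hat in grey
z_shade <- z[z <= delta_hat]
polygon(x = c(min(z_shade), z_shade, delta_hat, delta_hat),
        y = c(0, dnorm(z_shade), dnorm(delta_hat), 0),
        col = "grey", border = NA)
abline(v = delta_hat, lty = 2)

print(percentile_rank)
```

The grey region's area, multiplied by 100, is the percentile rank. When normality is not assumed, the same picture would be drawn with the density of the standardized residuals in the normative sample instead of `dnorm()`.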
## S3 method for class 'Stage.2.NormScore'
plot(x, Main=" ", Both.CDFs=FALSE, xlim,
  cex.axis=1, cex.main=1, cex.lab=1, ...)
x |
A fitted object of class |
Main |
The title of the plot. Default |
Both.CDFs |
Should both the densities of the standard normal distribution and of the standardized residuals |
xlim |
The limits for the X-axis. Default |
cex.axis |
The magnification to be used for axis annotation. |
cex.main |
The magnification to be used for the main label. |
cex.lab |
The magnification to be used for X and Y labels. |
... |
Extra graphical parameters to be passed to |
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the normative conversion that was obtained in
# Case study 1 of Chapter 3 in Van der Elst (2024)
# (science exam score = 30 obtained by a female)
# -------------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset

# Fit the Stage 1 model
Model.1.GCSE <- Stage.1(Dataset=GCSE,
  Model=Science.Exam~Gender)

# Stage 2: Convert a science exam score = 30 obtained by a
# female into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(Stage.1.Model=Model.1.GCSE,
  Score=list(Science.Exam=30, Gender="F"))
summary(Normed_Score)
plot(Normed_Score)

# Replicate the normative conversion that was obtained in
# Case study 1 of Chapter 7 in Van der Elst (2024)
# (LDST score = 40 obtained by a 20-year-old
# test participant with LE=Low)
# -------------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset

# Make the new variable Age.C (= Age centered) that is
# needed to fit the final Stage 1 model,
# and add it to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50

# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
summary(Substitution.Model.9)

# Convert an LDST score = 40 obtained by a
# 20-year-old test participant with LE=Low
# into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(
  Stage.1.Model=Substitution.Model.9,
  Score=list(LDST=40, Age.C=20-50, LE = "Low"))
summary(Normed_Score)
plot(Normed_Score)
This function plots the results of Tukey's Honest Significant Difference (HSD; Tukey, 1949) test, which allows for post hoc comparisons of the group means. Tukey's HSD can be conducted only when the mean structure of the Stage 1 model contains exclusively qualitative independent variables (i.e., when the fitted regression model is essentially an ANOVA).
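For readers who want to see what such a post hoc comparison looks like outside the package, here is a minimal base R sketch. It uses `stats::aov()` and `stats::TukeyHSD()` on simulated data; the data frame `sim` and its variables are hypothetical, and whether the package relies on these functions internally is not stated in this manual.

```r
set.seed(123)

# Simulated data (hypothetical): a test score and a three-level
# qualitative independent variable (level of education)
sim <- data.frame(
  LE = factor(rep(c("Low", "Average", "High"), each = 50),
              levels = c("Low", "Average", "High")),
  Score = c(rnorm(50, mean = 48, sd = 5),
            rnorm(50, mean = 50, sd = 5),
            rnorm(50, mean = 53, sd = 5))
)

# A model with only qualitative predictors is essentially an ANOVA
fit_aov <- aov(Score ~ LE, data = sim)

# Tukey's HSD: all pairwise differences between group means,
# with family-wise 95 percent confidence intervals
tukey <- TukeyHSD(fit_aov, conf.level = 0.95)
print(tukey)

# One interval per pairwise comparison; intervals that exclude 0
# indicate a significant difference between the two group means
plot(tukey)
```

With three groups there are three pairwise comparisons, so the plot shows three intervals.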
## S3 method for class 'Tukey.HSD'
plot(x, ...)
x |
A fitted object of class |
... |
Extra graphical parameters to be passed to |
No return value, called for side effects.
Wim Van der Elst
Tukey, J. (1949). Comparing individual means in the Analysis of Variance. Biometrics, 5, 99-114.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
data(Personality)
Model.Openness <- Stage.1(Dataset = Personality, Model = Openness ~ LE)

# conduct post hoc comparisons for the levels of education
Tukey.Openness <- Tukey.HSD(Model.Openness)
summary(Tukey.Openness)
plot(Tukey.Openness)

# conduct post hoc comparisons for the levels of education by gender combinations
data(Substitution)
Model.Substitution <- Stage.1(Dataset = Substitution, Model = LDST ~ LE*Gender)
Tukey.Substitution <- Tukey.HSD(Model.Substitution)
summary(Tukey.Substitution)
plot(Tukey.Substitution)
The function Plot.Scatterplot.Matrix()
makes a scatterplot matrix of the specified variables.
Plot.Scatterplot.Matrix(Dataset, Variables, Add.Jitter=0.1, Seed=123, ...)
Dataset |
The name of the dataset. |
Variables |
The names of the variables that should be shown in the scatterplot matrix. |
Add.Jitter |
The amount of jitter (random noise) that should be added to the variables in the scatterplot matrix. Adding a bit of jitter is useful to show the individual data points more clearly, especially if several qualitative variables are added in the plot. The specified value |
Seed |
The seed that is used when adding jitter. Default |
... |
Extra graphical parameters to be passed to |
For details, see Van der Elst (2024).
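The effect of jitter in a scatterplot matrix can be illustrated with base R alone. The sketch below uses simulated data and `jitter()` with a fixed `amount`; how the package translates `Add.Jitter` into noise internally is an assumption here, not something this manual confirms.

```r
set.seed(123)  # fixes the random noise, analogous to the Seed argument

# Simulated data (hypothetical variable names)
sim <- data.frame(
  Score  = round(rnorm(100, mean = 50, sd = 10)),
  Age    = sample(20:80, 100, replace = TRUE),
  Gender = factor(sample(c("F", "M"), 100, replace = TRUE))
)

# pairs() needs numeric columns, so factors become their integer codes
num <- data.frame(lapply(sim, as.numeric))

# Add uniform noise in [-0.1, 0.1] to every column so that
# overlapping points (e.g., for Gender) become visible
jittered <- data.frame(lapply(num, jitter, amount = 0.1))

pairs(jittered)
```

Without the jitter step, all observations of a qualitative variable such as `Gender` would fall on two vertical lines and hide each other.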
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
data(Substitution)

# Make a scatterplot matrix with the variables LDST,
# Age, Gender and LE in the Substitution dataset
Plot.Scatterplot.Matrix(Dataset = Substitution,
  Variables = c("LDST", "Age", "Gender", "LE"))
The function PlotFittedPoly fits polynomials of a specified order to the data.
PlotFittedPoly(Dataset, Test.Score, IV, Center.Value.IV=0, Order.Polynomial=3, Confidence.Band.Poly=FALSE, Alpha=.01, EMF = TRUE, Confidence.Band.EMF=TRUE, xlab, ylab, Color = "red", Black.white=FALSE, Legend.Location="topright", Legend.text.size=1, Add.Jitter=0, Seed=123, cex.axis=1, cex.main=1, cex.lab=1, Loess.Span=0.75, ...)
Dataset |
The name of the dataset. |
Test.Score |
The name of the test score. |
IV |
The name of the independent variable. |
Center.Value.IV |
The constant that is subtracted from the independent variable. |
Order.Polynomial |
The order of the polynomials to be fitted. By default, |
Confidence.Band.Poly |
Logical. Should a confidence band around the prediction function of the polynomial model be added to the plot? Default |
Alpha |
The Alpha-level of the confidence band(s) for the polynomial and/or loess models. Default |
EMF |
Logical. Should the Empirical Mean Function (EMF) be added to the plot? Default |
Confidence.Band.EMF |
Logical. Should a confidence band around the prediction function of the loess model be added to the plot? Default |
xlab |
The label that should be added to the X-axis. Default |
ylab |
The label that should be added to the Y-axis. Default |
Color |
The color to be used for the fitted EMF. Default |
Black.white |
Logical. Should the plot be in black and white (rather than in color)? Default |
Legend.Location |
The location of the legend. Default |
Legend.text.size |
The size of the text in the legend. Default |
Add.Jitter |
The amount of jitter (random noise) that should be added to the test score. Adding a bit of jitter is useful to show the data more clearly, i.e., to avoid overlapping data points. The specified value |
Seed |
The seed that is used when adding jitter. Default |
cex.axis |
The magnification to be used for axis annotation. |
cex.main |
The magnification to be used for the main label. |
cex.lab |
The magnification to be used for X and Y labels. |
Loess.Span |
The parameter |
... |
Extra graphical parameters to be passed to |
For details, see Van der Elst (2024).
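To make the roles of `Center.Value.IV`, `Order.Polynomial` and `Loess.Span` concrete, the following base R sketch fits a centered polynomial and a loess curve to simulated data and overlays both. The data, the variable names and the plotting choices are assumptions for illustration; this is not the package's implementation.

```r
set.seed(123)

# Simulated data (hypothetical): test score that declines with age
age   <- sample(20:80, 200, replace = TRUE)
score <- 50 - 0.3 * (age - 50) - 0.004 * (age - 50)^2 + rnorm(200, sd = 5)
sim   <- data.frame(Age = age, Score = score)

center     <- 50   # constant subtracted from the independent variable
order_poly <- 3    # order of the polynomial (cubic)
sim$Age.C  <- sim$Age - center

# Polynomial of the requested order in the centered variable
fit_poly <- lm(Score ~ poly(Age.C, degree = order_poly, raw = TRUE),
               data = sim)

# Loess fit as a model-free description of the mean
fit_loess <- loess(Score ~ Age.C, data = sim, span = 0.75)

# Overlay both prediction functions on the scatterplot
grid <- data.frame(Age.C = seq(min(sim$Age.C), max(sim$Age.C),
                               length.out = 100))
plot(sim$Age, sim$Score, col = "grey", xlab = "Age", ylab = "Score")
lines(grid$Age.C + center, predict(fit_poly, newdata = grid), lwd = 2)
lines(grid$Age.C + center, predict(fit_loess, newdata = grid),
      col = "red", lty = 2)
legend("topright", legend = c("Polynomial", "Loess"),
       col = c("black", "red"), lty = c(1, 2), lwd = c(2, 1))
```

If the polynomial and the loess curve diverge clearly, the chosen polynomial order is probably too low to describe the relation.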
No return value, called for side effects.
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
data(Substitution)

# plot of linear, quadratic and cubic polynomials relating age
# to the LDST test score
PlotFittedPoly(Dataset = Substitution, Test.Score = LDST, IV = Age,
  Order.Polynomial = 1, Center.Value.IV = 50)
PlotFittedPoly(Dataset = Substitution, Test.Score = LDST, IV = Age,
  Order.Polynomial = 2, Center.Value.IV = 50)
PlotFittedPoly(Dataset = Substitution, Test.Score = LDST, IV = Age,
  Order.Polynomial = 3, Center.Value.IV = 50)
The Sandwich()
function can be used to obtain heteroscedasticity-consistent standard errors of the regression parameters of a fitted Stage 1 model. These are used to account for heteroscedasticity.
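As background, White's (1980) HC0 estimator can be computed by hand in a few lines of base R. The sketch below does so for a simple simulated regression with heteroscedastic errors; the data are made up, and the code illustrates the textbook formula rather than the package's internal routine.

```r
set.seed(123)

# Simulated data (hypothetical) with error variance that grows with x
n <- 200
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.5 + 0.3 * x)
fit <- lm(y ~ x)

X <- model.matrix(fit)   # design matrix
e <- residuals(fit)      # OLS residuals

# HC0 sandwich: (X'X)^{-1} [X' diag(e^2) X] (X'X)^{-1}
bread    <- solve(crossprod(X))
meat     <- crossprod(X * e)   # row i of X is multiplied by e_i
vcov_hc0 <- bread %*% meat %*% bread

se_hc0 <- sqrt(diag(vcov_hc0))   # heteroscedasticity-consistent SEs
se_ols <- sqrt(diag(vcov(fit)))  # classical SEs (assume homoscedasticity)

print(cbind(OLS = se_ols, HC0 = se_hc0))
```

The regression coefficients themselves do not change; only their standard errors (and hence the test statistics and p-values) are affected.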
Sandwich(Stage.1.Model, Type="HC0")
Stage.1.Model |
The fitted Stage 1 model for which heteroscedasticity-consistent standard errors (sandwich estimators) of the regression parameters have to be provided. |
Type |
The type of the heteroscedasticity-consistent estimator that is used. By default, White's (White, 1980) estimator is used (i.e., |
Sandwich |
The fitted Stage 1 model with sandwich estimators. |
Alpha |
The significance level that is used for inference. Default |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
White, H. (1980). A heteroscedasticity-consistent covariance matrix and a direct test for heteroscedasticity. Econometrica, 48, 817-838.
data(GCSE)
Model.1.GCSE <- Stage.1(Dataset = GCSE, Model = Science.Exam~Gender)
Sandwich(Stage.1.Model = Model.1.GCSE)
The function Stage.1
fits a regression model with the specified mean and residual variance components, and conducts several model checks (homoscedasticity, normality, absence of outliers, and multicollinearity) that are useful in a setting where regression-based normative data have to be established.
Stage.1(Dataset, Model, Order.Poly.Var=3, Alpha=0.05, Alpha.Homosc=0.05, Alpha.Norm = .05, Assume.Homoscedasticity=NULL, Test.Assumptions=TRUE, Outlier.Cut.Off=4, Show.VIF=TRUE, GVIF.Threshold=10, Sandwich.Type="HC0", Alpha.CI.Group.Spec.SD.Resid=0.01)
Dataset |
A |
Model |
The regression model to be fitted (mean structure). A formula should be provided using the syntax of the |
Order.Poly.Var |
If the homoscedasticity assumption is violated and the mean structure of the fitted model contains at least one quantitative variable, a polynomial variance prediction function is fitted. The argument |
Alpha |
The significance level to be used when conducting inference for the mean structure of the model. Default |
Alpha.Homosc |
The significance level to be used to evaluate the homoscedasticity assumption based on the Levene test (when all independent variables in the model are qualitative) or the Breusch-Pagan test (when at least one of the independent variables is quantitative). Default |
Alpha.Norm |
The significance level to be used to test the normality assumption for the standardized errors using the Shapiro-Wilk test. The normality assumption is evaluated based on the standardized residuals in the normative dataset, which are computed as explained in the |
Assume.Homoscedasticity |
Logical. By default, the standardized residuals |
Test.Assumptions |
Logical. Should the model assumptions be evaluated for the specified model? Default |
Outlier.Cut.Off |
Outliers are evaluated based on the standardized residuals, which are computed as explained in the |
Show.VIF |
Logical. Should the generalized VIF (Fox and Monette, 1992) be shown when the function |
GVIF.Threshold |
The threshold value to be used to detect multicollinearity based on the generalized VIF. Default |
Sandwich.Type |
When the homoscedasticity assumption is violated, so-called sandwich estimators (or heteroscedasticity-consistent estimators) for the standard errors of the regression parameters are used. For example, the sandwich estimator for the standard error of |
Alpha.CI.Group.Spec.SD.Resid |
The |
For details, see Van der Elst (2024).
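The idea behind the polynomial variance prediction function and the outlier cut-off can be sketched in base R. The example below regresses the squared residuals of the mean model on a cubic polynomial of the predicted scores and then standardizes the residuals. The simulated data, the variable names and the exact estimation steps are assumptions for illustration and may differ from what `Stage.1()` does internally.

```r
set.seed(123)

# Simulated normative sample (hypothetical): residual SD depends on age
n     <- 2000
age_c <- runif(n, -30, 30)
score <- 50 - 0.3 * age_c + rnorm(n, sd = 7.5 - 0.05 * age_c)
sim   <- data.frame(Score = score, Age.C = age_c)

# Mean structure
fit_mean   <- lm(Score ~ Age.C, data = sim)
sim$Pred.Y <- fitted(fit_mean)
sim$Resid2 <- residuals(fit_mean)^2

# Variance prediction function: squared residuals regressed on a
# cubic polynomial of the predicted scores (order 3). Orthogonal
# polynomials are used for numerical stability; the fitted values
# are the same as for a raw polynomial of the same order.
order_poly_var <- 3
fit_var <- lm(Resid2 ~ poly(Pred.Y, degree = order_poly_var), data = sim)

# Standardized residuals: residual divided by the predicted residual SD
pred_var <- fitted(fit_var)
delta    <- residuals(fit_mean) / sqrt(pred_var)

# Outliers: standardized residuals beyond the cut-off (here 4)
n_outliers <- sum(abs(delta) > 4, na.rm = TRUE)
print(n_outliers)
```

Lowering the order of the polynomial (for example from 3 to 1) gives a simpler variance function, which is what the `Order.Poly.Var` steps in the examples below explore.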
An object of class Stage.1
with components,
HomoNorm |
The fitted regression model assuming homoscedasticity and normality. |
NoHomoNorm |
The fitted regression model assuming heteroscedasticity (no homoscedasticity) and normality. |
HomoNoNorm |
The fitted regression model assuming homoscedasticity but not normality. |
NoHomoNoNorm |
The fitted regression model assuming neither homoscedasticity nor normality. |
Predicted |
The predicted test scores based on the fitted model. |
Sandwich.Type |
The requested sandwich estimator. |
Order.Poly.Var |
The order of the polynomial variance prediction function. |
Wim Van der Elst
Fox, J. and Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87, 178-183.
Long, J. S. and Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model. The American Statistician, 54, 217-224.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
plot Stage.1, Stage.2.AutoScore, Stage.2.NormScore, Stage.2.NormTable
# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 4 in Van der Elst (2024)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset

# Conduct the Stage 1 analysis
Model.1.GCSE <- Stage.1(Dataset=GCSE,
  Model=Science.Exam~Gender)
summary(Model.1.GCSE)
plot(Model.1.GCSE)

# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 7 in Van der Elst (2024)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset

# Add the variable Age.C (= Age centered) and its
# quadratic and cubic terms to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50
Substitution$Age.C2 <- (Substitution$Age - 50)^2
Substitution$Age.C3 <- (Substitution$Age - 50)^3

# Fit the full Stage 1 model
Substitution.Model.1 <- Stage.1(Dataset=Substitution,
  Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+Age.C:LE+
  Gender:LE+Age.C:Gender, Alpha=0.005)
summary(Substitution.Model.1)

# Fit the model in which the non-significant Age.C:Gender
# interaction term is removed
Substitution.Model.2 <- Stage.1(Dataset=Substitution,
  Alpha=0.005,
  Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+
  Age.C:LE+Gender:LE)
summary(Substitution.Model.2)

# Evaluate the significance of the Gender:LE interaction term
# GLT is used because the interaction involves multiple regression
# parameters
GLT.1 <- GLT(Dataset=Substitution, Alpha=0.005,
  Unrestricted.Model=LDST~Age.C+Age.C2+Age.C3+
  Gender+LE+Age.C:LE+Gender:LE,
  Restricted.Model=LDST~Age.C+Age.C2+Age.C3+
  Gender+LE+Age.C:LE)
summary(GLT.1)

# Fit the model in which the non-significant Gender:LE
# interaction term is removed
Substitution.Model.3 <- Stage.1(Dataset=Substitution,
  Alpha=0.005,
  Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+Age.C:LE)
summary(Substitution.Model.3)

# Evaluate the significance of the Age.C:LE interaction
# using the General Linear Test framework
GLT.2 <- GLT(Dataset=Substitution,
  Unrestricted.Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+Age.C:LE,
  Restricted.Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE, Alpha=0.005)
summary(GLT.2)

# Fit the model in which the non-significant Age.C:LE
# interaction term is removed
Substitution.Model.4 <- Stage.1(Dataset=Substitution,
  Alpha=0.005,
  Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE)
summary(Substitution.Model.4)

# Fit the model in which the non-significant Age.C3 term is removed
Substitution.Model.5 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+Age.C2+Gender+LE)
summary(Substitution.Model.5)

# Fit the model in which the non-significant Age.C2 term is removed
Substitution.Model.6 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+Gender+LE)
summary(Substitution.Model.6)

# Fit the model in which the non-significant main effect of Gender
# is removed
Substitution.Model.7 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE)
summary(Substitution.Model.7)
plot(Substitution.Model.7, Normality = FALSE, Outliers = FALSE)

# Check the significance of LE using the GLT framework
GLT.3 <- GLT(Dataset=Substitution, Alpha=0.005,
  Unrestricted.Model=LDST~Age.C+LE,
  Restricted.Model=LDST~Age.C)
summary(GLT.3)

# Residual variance function. Substitution.Model.7 uses
# a cubic polynomial variance prediction function.
# Remove cubic Pred.Y term from Substitution.Model.7, so
# fit quadratic variance prediction function
Substitution.Model.8 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE,
  Order.Poly.Var=2)
# Order.Poly.Var=2 specifies a quadratic polynomial
# for the variance prediction function
summary(Substitution.Model.8)
plot(Substitution.Model.8, Normality = FALSE, Outliers = FALSE)

# Remove quadratic Pred.Y term, so fit linear variance
# prediction function
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE,
  Order.Poly.Var=1)
# Order.Poly.Var=1 specifies a linear polynomial
# for the variance prediction function

# Final Stage 1 model
summary(Substitution.Model.9)
plot(Substitution.Model.9)
This function is useful to construct an automatic scoring sheet that implements the Stage 2 normative conversion approach in a spreadsheet. In particular, a spreadsheet will be created with three tabs that should be copy-pasted to the different sections of the Model details
tab of the template file. For details, see Van der Elst (2024).
Stage.2.AutoScore(Stage.1.Model, Assume.Homoscedasticity, Assume.Normality, Folder, NameFile="NormSheet.xlsx", verbose=TRUE)
Stage.1.Model |
A fitted object of class |
Assume.Homoscedasticity |
Logical. Should homoscedasticity be assumed? By default, homoscedasticity is assumed when the |
Assume.Normality |
Logical. Should normality of the standardized errors be assumed? By default, normality is assumed when the |
Folder |
The folder where the spreadsheet file should be saved. |
NameFile |
The name of the file in which the normative tables should be saved. Default |
verbose |
A logical value indicating whether verbose output should be generated. |
For details, see Van der Elst (2024).
An object of class Stage.2.AutoScore
with components,
Mean.Structure |
The mean prediction function. |
Residual.Structure |
The variance prediction function. |
Percentiles.Delta |
A table of the standardized residuals and their corresponding estimated percentile ranks (based on the CDF of the standard normal distribution or the CDF of the standardized residuals in the normative sample, see above). |
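A table of this kind can be mimicked in base R to see what it contains. The sketch below builds a small lookup table that maps standardized residuals to percentile ranks, once with the standard normal CDF and once with the empirical CDF of a simulated sample. The grid of values, the column names and the simulated residuals are assumptions; the layout of the package's actual `Percentiles.Delta` component may differ.

```r
set.seed(123)

# Hypothetical standardized residuals of a normative sample
delta_sample <- rnorm(500)

# Grid of standardized residuals for which percentile ranks are tabulated
delta_grid <- seq(-3, 3, by = 0.5)

# Empirical CDF of the standardized residuals in the sample
emp_cdf <- ecdf(delta_sample)

tab <- data.frame(
  Delta                = delta_grid,
  # used when normality is assumed
  Percentile.Normal    = round(100 * pnorm(delta_grid), 1),
  # used when normality is not assumed
  Percentile.Empirical = round(100 * emp_cdf(delta_grid), 1)
)
print(tab)
```

In a scoring sheet, a tested person's standardized residual is looked up in the first column and the matching percentile rank is read from the column that corresponds to the assumption made.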
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
Stage.1, Stage.2.NormTable, Stage.2.AutoScore
# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 4 in Van der Elst (2024)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset

# Conduct the Stage 1 analysis
Model.1.GCSE <- Stage.1(Dataset=GCSE,
  Model=Science.Exam~Gender)
summary(Model.1.GCSE)
plot(Model.1.GCSE, Add.Jitter = .2)

# Write the results to a spreadsheet file
Stage.2.AutoScore(Stage.1.Model=Model.1.GCSE,
  Folder=tempdir(), # Replace tempdir() by the desired folder
  NameFile="GCSE.Output.xlsx")
# Copy-paste the information in GCSE.Output.xlsx to the
# template file, as detailed in Van der Elst (2024)

# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 7 in Van der Elst (2024)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset

# Add the variable Age.C (= Age centered) to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50

# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)

# Final Stage 1 model
summary(Substitution.Model.9)
plot(Substitution.Model.9)

# Write the results to a spreadsheet file
Stage.2.AutoScore(Stage.1.Model=Substitution.Model.9,
  Folder=tempdir(), # Replace tempdir() by the desired folder
  NameFile="LDST.Output.xlsx")
# Copy-paste the information in LDST.Output.xlsx to the
# template file, as detailed in Van der Elst (2024)
# Replicate the Stage 1 results that were obtained in # Case study 1 of Chapter 4 in Van der Elst (2023) # --------------------------------------------------- library(NormData) # load the NormData package data(GCSE) # load the GCSE dataset # Conduct the Stage 1 analysis Model.1.GCSE <- Stage.1(Dataset=GCSE, Model=Science.Exam~Gender) summary(Model.1.GCSE) plot(Model.1.GCSE, Add.Jitter = .2) # Write the results to a spreadsheet file Stage.2.AutoScore(Stage.1.Model=Model.1.GCSE, Folder=tempdir(), # Replace tempdir() by the desired folder NameFile="GCSE.Output.xlsx") # Copy-paste the information in GCSE.Output.xlsx to the # template file, as detailed in Van der Elst (2023) # Replicate the Stage 1 results that were obtained in # Case study 1 of Chapter 7 in Van der Elst (2023) # --------------------------------------------------- library(NormData) # load the NormData package data(Substitution) # load the Substitution dataset # Add the variable Age.C (= Age centered) to the Substitution dataset Substitution$Age.C <- Substitution$Age - 50 # Fit the final Stage 1 model Substitution.Model.9 <- Stage.1(Dataset=Substitution, Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1) # Final Stage 1 model summary(Substitution.Model.9) plot(Substitution.Model.9) # Write the results to a spreadsheet file Stage.2.AutoScore(Stage.1.Model=Substitution.Model.9, Folder=tempdir(), # Replace tempdir() by the desired folder NameFile="LDST.Output.xlsx") # Copy-paste the information in LDST.Output.xlsx to the # template file, as detailed in Van der Elst (2023)
The function Stage.2.NormScore() can be used to convert the raw test score of a tested person into a percentile rank (taking into account specified values of the independent variables).
Stage.2.NormScore(Stage.1.Model, Assume.Homoscedasticity, Assume.Normality, Score, Rounded=TRUE)
Stage.1.Model |
A fitted object of class |
Assume.Homoscedasticity |
Logical. Should homoscedasticity be assumed in conducting the normative conversion? By default, homoscedasticity is assumed when the |
Assume.Normality |
Logical. Should normality of the standardized errors be assumed in conducting the normative conversion? By default, normality is assumed when the |
Score |
A |
Rounded |
Logical. Should the percentile rank be rounded to a whole number? Default |
For details, see Van der Elst (2023).
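The idea behind the conversion can be illustrated outside the package. The following is a minimal base-R sketch, assuming homoscedasticity and normality of the standardized residuals. It uses simulated data and lm() rather than NormData's own functions, and all object names (norm_sample, fit, delta, perc_rank) are hypothetical; it should not be read as the package's implementation.

```r
# Simulated normative sample (hypothetical data, not a NormData dataset)
set.seed(1)
norm_sample <- data.frame(Gender = factor(rep(c("F", "M"), each = 200)))
norm_sample$Score <- 50 + 3 * (norm_sample$Gender == "M") + rnorm(400, sd = 10)

# Stage 1 analogue: model the mean structure of the test score
fit <- lm(Score ~ Gender, data = norm_sample)

# Stage 2 analogue: convert a raw score of 30 obtained by a female
new_case  <- data.frame(Gender = factor("F", levels = levels(norm_sample$Gender)))
predicted <- unname(predict(fit, newdata = new_case))  # predicted mean score
sd_resid  <- sigma(fit)                  # one overall residual SD (homoscedasticity)
delta     <- (30 - predicted) / sd_resid # standardized residual
perc_rank <- 100 * pnorm(delta)          # standard normal CDF (normality assumed)

round(perc_rank)  # analogous in spirit to Rounded=TRUE
```

When homoscedasticity or normality is not assumed, the overall residual SD or the normal CDF in this sketch would be replaced by a prediction-specific SD or by the empirical CDF of the standardized residuals, respectively.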
An object of class Stage.2.NormScore with components,
Fitted.Model |
A fitted object of class |
Results |
A data frame that contains the observed test score, residuals, percentile rank, ... |
Assume.Homoscedasticity |
The homoscedasticity assumption that was made in the normative conversion. |
Assume.Normality |
The normality assumption that was made in the normative conversion. |
Score |
The test score and the value(s) of the independent variable(s) that were used in the computations. |
Stage.1.Model |
The |
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
Stage.2.NormTable, Stage.2.AutoScore, Bootstrap.Stage.2.NormScore
# Replicate the normative conversion that was obtained in
# Case study 1 of Chapter 3 in Van der Elst (2023)
# (science exam score = 30 obtained by a female)
# -------------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset

# Fit the Stage 1 model
Model.1.GCSE <- Stage.1(Dataset=GCSE,
  Model=Science.Exam~Gender)

# Stage 2: Convert a science exam score = 30 obtained by a
# female into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(Stage.1.Model=Model.1.GCSE,
  Score=list(Science.Exam=30, Gender="F"))

summary(Normed_Score)
plot(Normed_Score)


# Replicate the normative conversion that was obtained in
# Case study 1 of Chapter 7 in Van der Elst (2023)
# (LDST score = 40 obtained by a 20-year-old
# test participant with LE=Low)
# -------------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset

# Make the new variable Age.C (= Age centered) that is
# needed to fit the final Stage 1 model,
# and add it to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50

# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)
summary(Substitution.Model.9)

# Convert an LDST score = 40 obtained by a
# 20-year-old test participant with LE=Low
# into a percentile rank (point estimate)
Normed_Score <- Stage.2.NormScore(
  Stage.1.Model=Substitution.Model.9,
  Score=list(LDST=40, Age.C=20-50, LE = "Low"))

summary(Normed_Score)
plot(Normed_Score)
This function allows for deriving a normative table that shows the percentile ranks that correspond to a wide range of raw test scores (stratified by the relevant independent variables).
Stage.2.NormTable(Stage.1.Model, Assume.Homoscedasticity, Assume.Normality, Grid.Norm.Table, Test.Scores, Digits=6, Rounded=TRUE)
Stage.1.Model |
A fitted object of class |
Assume.Homoscedasticity |
Logical. Should homoscedasticity be assumed when deriving the normative table? By default, homoscedasticity is assumed when the |
Assume.Normality |
Logical. Should normality of the standardized errors be assumed when deriving the normative table? By default, normality is assumed when the |
Grid.Norm.Table |
A When multiple independent variables are considered, the |
Test.Scores |
A vector that specifies the raw test scores that should be shown in the normative table. |
Rounded |
Logical. Should the percentile ranks that are shown in the normative table be rounded to a whole number? Default |
Digits |
The number of digits that need to be shown in the normative table for the predicted means and residual standard errors. Default |
For details, see Van der Elst (2023).
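To make the roles of Grid.Norm.Table and Test.Scores more concrete, the sketch below builds a small normative table in base R. It is only an illustration under stated assumptions (simulated data, homoscedasticity, normality); the objects grid, test_scores and norm_table are hypothetical stand-ins and the code does not reproduce the internals of Stage.2.NormTable().

```r
# Simulated normative sample (hypothetical data, not a NormData dataset)
set.seed(2)
norm_sample <- data.frame(Gender = factor(rep(c("F", "M"), each = 200)))
norm_sample$Score <- 50 + 3 * (norm_sample$Gender == "M") + rnorm(400, sd = 10)
fit <- lm(Score ~ Gender, data = norm_sample)

# Counterparts of Grid.Norm.Table and Test.Scores
grid        <- data.frame(Gender = factor(c("F", "M")))
test_scores <- seq(from = 20, to = 80, by = 10)

# Predicted mean for every row of the grid
predicted <- predict(fit, newdata = grid)

# Standardized residual for every (grid row, test score) combination:
# rows correspond to grid rows, columns to test scores
delta <- outer(predicted, test_scores,
               function(m, s) (s - m) / sigma(fit))

# Percentile ranks based on the standard normal CDF, rounded to whole numbers
norm_table <- cbind(grid, round(100 * pnorm(delta)))
names(norm_table)[-1] <- test_scores
norm_table
```

Each row of the resulting table corresponds to one combination of independent-variable values, and each remaining column to one raw test score.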
An object of class Stage.2.NormTable with components,
Norm.Table |
The normative table. |
Group.Specific.SD.Resid |
Logical. Were prediction-specific SDs of the residuals used? |
Empirical.Dist.Delta |
Logical. Was the CDF of the standardized residuals used to convert the raw test scores into percentile ranks? |
N.Analysis |
The sample size of the analyzed dataset. |
Test.Scores |
A vector of raw test scores for which percentile ranks were requested. |
Assume.Homoscedasticity |
Is homoscedasticity assumed in the computation of the normative data? |
Assume.Normality |
Is normality assumed in the computation of the normative data? |
Stage.1.Model |
The |
Grid.Norm.Table |
The specified |
Digits.Percentile |
The number of digits after the decimal point that were requested for the percentile ranks. |
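The Empirical.Dist.Delta component records whether the CDF of the standardized residuals, rather than the standard normal CDF, was used. The difference between the two conversions can be sketched in base R as follows. This is an illustration with simulated, deliberately skewed errors; the variable names are hypothetical and the code is not taken from the package.

```r
# Simulated normative sample with skewed (non-normal) errors
set.seed(3)
x <- runif(300, min = 20, max = 80)
y <- 60 - 0.3 * x + (rexp(300) - 1) * 8
fit <- lm(y ~ x)

# Standardized residuals in the normative sample and their empirical CDF
delta_sample <- residuals(fit) / sigma(fit)
emp_cdf      <- ecdf(delta_sample)

# Standardized residual of a new test score of 40 obtained at x = 50
delta_new <- unname((40 - predict(fit, newdata = data.frame(x = 50))) / sigma(fit))

perc_emp  <- 100 * emp_cdf(delta_new)  # empirical CDF (normality not assumed)
perc_norm <- 100 * pnorm(delta_new)    # standard normal CDF (normality assumed)
c(empirical = perc_emp, normal = perc_norm)
```

With skewed residuals the two percentile ranks can differ noticeably, which is why the assumption that was made is stored alongside the table.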
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
Stage.2.NormScore, Stage.2.AutoScore, Bootstrap.Stage.2.NormScore
# Replicate the normative table that was obtained in
# Case study 1 of Chapter 3 in Van der Elst (2023)
# -----------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset

# Fit the Stage 1 model
Model.1.GCSE <- Stage.1(Dataset=GCSE,
  Model=Science.Exam~Gender)

# Make a normative table for raw Science Exam scores = 10,
# 11, ... 85, stratified by Gender
NormTable.GCSE <- Stage.2.NormTable(Stage.1.Model=Model.1.GCSE,
  Test.Scores=c(10:85),
  Grid.Norm.Table=data.frame(Gender=c("F", "M")))

summary(NormTable.GCSE)


# Replicate the normative table that was obtained in
# Case study 1 of Chapter 7 in Van der Elst (2023)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset

# Make the new variable Age.C (= Age centered) that is
# needed to fit the final Stage 1 model,
# and add it to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50

# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)

# Make a normative table for LDST scores = 10, 12, ... 56,
# stratified by Age and LE
NormTable.LDST <- Stage.2.NormTable(
  Stage.1.Model=Substitution.Model.9,
  Test.Scores=seq(from=10, to=56, by=2),
  Grid.Norm.Table=expand.grid(Age.C=seq(from=-30, to=30, by=1),
    LE=c("Low", "Average", "High")))
This dataset contains the scores of the Trait Anger scale of the STAS. The test participants were first-year psychology students from a university in the Dutch-speaking part of Belgium. Participation was a partial fulfillment of the requirement to participate in research. The sample consists of
males and
females, reflecting the gender proportion among psychology students. The average age was
years. The data originally come from the package psychotools, dataset VerbalAggression. For more info, see https://cran.r-project.org/web/packages/psychotools/psychotools.pdf.
data(STAS)
A data.frame
with observations on
variables.
Id
The Id number of the student.
Gender
The gender of the student, coded as a factor.
Anger
The Trait Anger scale score of the STAS.
Substitution tests are speed-dependent tasks that require the participant to match particular signs (symbols, digits, or letters) to other signs within a specified time period. The LDST is an adaptation of earlier substitution tests, such as the Digit Symbol Substitution Test (DSST; Wechsler, 1981) and the Symbol Digit Modalities Test (SDMT; Smith, 1982). The LDST differs from other substitution tests in that the key consists of 'over-learned' signs, i.e., letters and digits. These are simulated data that are based on the results described in Van der Elst et al. (2006) (see Table 2).
data(Substitution)
A data.frame with 1765 observations on 5 variables.
Id
The Id number of the participant.
Age
The age of the participant, in years.
Gender
The gender of the participant, coded as a factor with levels Male and Female.
LE
The Level of Education of the test participant, coded as a factor with levels Low, Average and High.
LDST
The test score on the LDST (written version), i.e., the number of correct substitutions made in 60 seconds. A higher score reflects better performance.
This dataset contains the scores of the Taylor Manifest Anxiety Scale (TMAS; Taylor, 1953), administered online. A total of test participants completed the questionnaire. The TMAS scale score ranges between
and
, with lower scores corresponding to higher levels of anxiety.
data(TMAS)
A data.frame
with observations on
variables.
Id
The Id number of the test participant.
Gender
The gender of the test participant, coded as a factor.
Score
The TMAS score. A higher value is indicative of less anxiety.
Taylor, J. (1953). A personality scale of manifest anxiety. The Journal of Abnormal and Social Psychology, 48(2), 285-290.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
This function conducts Tukey's Honestly Significant Difference (HSD; Tukey, 1949) test, which allows for making post hoc comparisons of the group means. Tukey's HSD can only be conducted when the mean structure of the Stage 1 model contains only qualitative independent variables (i.e., when the fitted regression model is essentially an ANOVA).
Tukey.HSD(Stage.1.Model, ...)
Stage.1.Model |
A fitted Stage 1 model that contains only qualitative independent variables. |
... |
Arguments to be passed to the plot function of the Tukey HSD procedure. |
No return value, called for side effects.
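For readers who want to see what such post hoc comparisons look like in general, the following base-R sketch applies Tukey's HSD to a one-way ANOVA using stats::aov() and stats::TukeyHSD() on the built-in PlantGrowth dataset. Whether Tukey.HSD() relies on these functions internally is an assumption that the documentation above does not confirm; the example only illustrates the technique.

```r
# One-way ANOVA on a built-in dataset (PlantGrowth: plant weight by treatment group)
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# Tukey's HSD: all pairwise comparisons of the group means,
# with family-wise 95% confidence intervals
tukey <- TukeyHSD(fit_aov, conf.level = 0.95)

# One row per pairwise comparison: difference in means,
# lower and upper CI bound, and adjusted p-value
print(tukey)
```

Calling plot(tukey) on such an object draws the confidence intervals of the pairwise differences; a comparison whose interval excludes zero differs significantly at the chosen family-wise level.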
Wim Van der Elst
Tukey, J. (1949). Comparing individual means in the Analysis of Variance. Biometrics, 5, 99-114.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
library(NormData) # load the NormData package
data(Personality)
Model.Openness <- Stage.1(Dataset = Personality, Model = Openness ~ LE)
# conduct post hoc comparisons for the levels of education
Tukey.Openness <- Tukey.HSD(Model.Openness)
summary(Tukey.Openness)
plot(Tukey.Openness)

# conduct post hoc comparisons for the level of education by gender combinations
data(Substitution)
Model.Substitution <- Stage.1(Dataset = Substitution, Model = LDST ~ LE*Gender)
Tukey.Substitution <- Tukey.HSD(Model.Substitution)
summary(Tukey.Substitution)
plot(Tukey.Substitution)
This dataset contains the Total Recall scores of the Verbal Learning Test (VLT). A total of test participants took part in the study. These are simulated data based on the results described in Van der Elst et al. (2005).
data(VLT)
A data.frame
with observations on
variables.
Id
The Id number of the test participant.
Age
The age of the test participant (in years).
Gender
The gender of the test participant, coded as a factor.
LE
The level of education of the test participant.
Total.Recall
The Total Recall score. A higher score is indicative of better verbal memory ability.
Van der Elst et al. (2005). Rey's Verbal Learning Test: Normative data for 1,855 healthy participants aged 24-81 years and the influence of age, sex, education, and mode of presentation. Journal of the International Neuropsychological Society, 11, 290-302.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
The function Stage.2.NormTable() allows for deriving a normative table that shows the percentile ranks that correspond to a wide range of raw test scores (stratified by the relevant independent variables). The raw R output format that is provided by the Stage.2.NormTable() function is not always convenient, especially when a large number of test scores are tabulated and the table is spread out over several lines. The function WriteNormTable() can be used to export the normative table to a .txt, .csv or .xlsx file. Such a file can then be opened in a spreadsheet program (such as Google Sheets or LibreOffice), where the normative table can be put in a more user-friendly format.
WriteNormTable(NormTable, Folder, NameFile="NormTable.xlsx", verbose=TRUE)
NormTable |
An object of class |
Folder |
The folder where the file with the normative table should be saved. |
NameFile |
The name of the file to which the normative table should be written. Only the extensions |
verbose |
A logical value indicating whether verbose output should be generated. |
No return value, called for side effects.
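The general export pattern (write a table to a file in a chosen folder, then open it elsewhere) can be sketched in base R with write.csv(). The data frame below contains made-up percentile ranks purely for illustration, and the file name is hypothetical; this is not how WriteNormTable() is implemented, and .xlsx output would require an additional package.

```r
# A tiny stand-in for a normative table (made-up values, for illustration only)
norm_table <- data.frame(Gender = c("F", "M"),
                         `30` = c(2, 1),
                         `50` = c(50, 38),
                         check.names = FALSE)  # keep raw scores as column names

# Build the output path from a folder and a file name
out_file <- file.path(tempdir(), "NormTable.Example.csv")  # replace tempdir() by the desired folder

# Write the table without row names so it opens cleanly in a spreadsheet
write.csv(norm_table, file = out_file, row.names = FALSE)

# Read it back to confirm that the table survived the round trip
read_back <- read.csv(out_file, check.names = FALSE)
read_back
```

Using file.path() keeps the path construction portable across operating systems.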
Wim Van der Elst
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
# Replicate the normative table that was obtained in
# Case study 1 of Chapter 3 in Van der Elst (2023)
# -----------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset

# Fit the Stage 1 model
Model.1.GCSE <- Stage.1(Dataset=GCSE,
  Model=Science.Exam~Gender)

# Make a normative table for raw Science Exam scores = 10,
# 11, ... 85, stratified by Gender
NormTable.GCSE <- Stage.2.NormTable(Stage.1.Model=Model.1.GCSE,
  Test.Scores=c(10:85),
  Grid.Norm.Table=data.frame(Gender=c("F", "M")))

summary(NormTable.GCSE)

# Write the normative table to the user's computer
WriteNormTable(NormTable=NormTable.GCSE,
  NameFile="NormTable.GCSE.xlsx",
  Folder=tempdir()) # Replace tempdir() by the desired folder


# Replicate the normative table that was obtained in
# Case study 1 of Chapter 7 in Van der Elst (2023)
# ------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset

# Make the new variable Age.C (= Age centered) that is
# needed to fit the final Stage 1 model,
# and add it to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50

# Fit the final Stage 1 model
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
  Alpha=0.005, Model=LDST~Age.C+LE, Order.Poly.Var=1)

# Make a normative table for LDST scores = 10, 12, ... 56,
# stratified by Age and LE
NormTable.LDST <- Stage.2.NormTable(
  Stage.1.Model=Substitution.Model.9,
  Test.Scores=seq(from=10, to=56, by=2),
  Grid.Norm.Table=expand.grid(Age.C=seq(from=-30, to=30, by=1),
    LE=c("Low", "Average", "High")))

# Write the normative table to the user's computer
WriteNormTable(NormTable=NormTable.LDST,
  NameFile="NormTable.LDST.xlsx",
  Folder=tempdir()) # Replace tempdir() by the desired folder