dataquieR

dataquieR is an R package designed to conduct automated and standardized data quality assessments. It can be applied to all sorts of tabular data. Spreadsheet-type metadata can be used to specify descriptions, expectations, and requirements about the data.


Description

The goal of dataquieR is to provide functions for assessing data quality issues across four dimensions: data integrity (e.g., data type errors or duplicates), completeness (e.g., missing values), consistency (e.g., range violations or contradictions), and accuracy (e.g., time trends or examiner effects). It can be applied to any tabular data, including population-based cohort studies, registries, and electronic health record (EHR) data. It can be used alone or in a data quality pipeline. dataquieR also implements one generic pipeline producing htmltools-based HTML5 reports.
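To make the first three dimensions more concrete, here is a minimal base R sketch of the kind of issue each one refers to. It does not use dataquieR and is not how dataquieR implements its checks; the toy data frame, the variable names, and the admissible age range are assumptions made up for illustration. The accuracy dimension is left out because it needs richer data, such as examiner identifiers or measurement dates.

```r
# Toy data; all names and values are hypothetical and not part of dataquieR.
study <- data.frame(
  id  = c(1, 2, 2, 4),
  age = c(34, NA, 51, 130)
)

# Integrity: identifiers that occur more than once
duplicated_ids <- unique(study$id[duplicated(study$id)])

# Completeness: number of missing values per column
missing_counts <- colSums(is.na(study))

# Consistency: observed values outside an assumed admissible range
age_limits <- c(lower = 0, upper = 120)
range_violations <- which(
  !is.na(study$age) &
    (study$age < age_limits[["lower"]] | study$age > age_limits[["upper"]])
)
```

In dataquieR, expectations such as the admissible range are not hard-coded like this but specified in the spreadsheet-type metadata.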

See also https://dataquality.qihs.uni-greifswald.de


Installation

You can install the released version of dataquieR from CRAN with:

install.packages("dataquieR")

The suggested packages can be directly installed by:

install.packages("dataquieR", dependencies = TRUE)

The development version from GitLab.com can be installed using:

if (!requireNamespace("devtools")) {
  install.packages("devtools")
}
devtools::install_gitlab("libreumg/dataquier")

For examples and additional documentation, please refer to our website.

dataquieR usage questionnaire

To help us improve dataquieR, we invite you to provide your feedback by completing this short survey (English or German version).

Suggested packages

dataquieR reports can now use plotly if it is installed. In the final report, you can then zoom into the figures, get information by hovering over the points, and so on. To install plotly, type:

install.packages("plotly")
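Before installing, it may help to check which optional packages are already available. The following sketch uses only base R; the helper name missing_packages is hypothetical and not part of dataquieR. Note that requireNamespace() loads the namespace of each package it finds.

```r
# Return the subset of `pkgs` that cannot be loaded in this R library.
# Hypothetical helper for illustration, not a dataquieR function.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "plotly" is the optional package named above
to_install <- missing_packages(c("plotly"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
}
```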

To install all suggested packages, run:

prep_check_for_dataquieR_updates()

This command can also check our own server, rather than CRAN, for new beta releases of dataquieR:

prep_check_for_dataquieR_updates(beta = TRUE)

Hint

If you are running dataquieR in an untrusted setting, for example inside a server application, please consider disabling the import of R serialization files. This prevents users from importing RData, RDS, or even R files that trigger code execution on your machine; see, e.g., Ivan Krylov’s blog for the reason:

# prevent rio from reading potentially code-containing files 
options(rio.import.trust = FALSE)

If you do so, the example data won’t be loaded any more.
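As a minimal sketch of how trust could be enabled for a single call only, the following base R helper sets rio.import.trust to TRUE while an expression is evaluated and restores the previous setting afterwards. The helper name with_trusted_import is hypothetical and not part of dataquieR or rio; it relies only on the rio.import.trust option named above.

```r
# Evaluate `expr` with rio.import.trust temporarily set to TRUE.
# Hypothetical helper for illustration, not a dataquieR or rio function.
with_trusted_import <- function(expr) {
  old <- options(rio.import.trust = TRUE)  # returns the previous value
  on.exit(options(old), add = TRUE)        # restore it when the function exits
  force(expr)                              # the lazy argument is evaluated here
}

# Keep the strict setting for the session ...
options(rio.import.trust = FALSE)

# ... and relax it only inside the helper
inside <- with_trusted_import(getOption("rio.import.trust"))
after <- getOption("rio.import.trust")
```

With dataquieR installed, a call such as with_trusted_import(prep_get_data_frame("study_data")) should then load the example study data while the session-wide setting stays FALSE.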

If you are using rio version 2.0.0 or later, this is the default. To run our examples, you then have to trust our files explicitly, e.g. by using withr::with_options(list(rio.import.trust = TRUE), prep_get_data_frame("study_data")) to load our example study data into the data-frame cache initially and trusting our files loaded from

References

Funding – see also here

Contributors

Stephan Struckmann, Senior developer, Universitätsmedizin Greifswald
Elena Salogni, Software developer, University Medicine Greifswald
Elisa Kasbohm, Software developer, Universitätsmedizin Greifswald
Adrian Richter, Software developer, German Rheumatism Research Center
Carsten Oliver Schmidt, Principal investigator, Universität Greifswald Medizinische Fakultät