Title: | Comprehensive Automatized Evaluation of Distribution Models for Count Data |
---|---|
Description: | Many measurements generate count data, a statistical data type that takes only non-negative integer values and arises from counting. Count data are typical of biomedical applications such as the analysis of DNA double-strand breaks, whose number can be counted in individual cells using various bioanalytical methods. For diagnostic applications, it is relevant to record the distribution of the count data in order to determine their biomedical significance (Roediger, S. et al., 2018. Journal of Laboratory and Precision Medicine. <doi:10.21037/jlpm.2018.04.10>). The software offers functions for a comprehensive automated evaluation of distribution models of count data. In addition to programmatic interaction, a graphical user interface (web server) is included, which enables fast and interactive data-scientific analyses. The user is supported in selecting the most suitable count distribution for their own data set. |
Authors: | Jaroslaw Chilimoniuk [cre, ctb] , Alicja Gosiewska [ctb] , Jadwiga Słowik [ctb] , Michal Burdukiewicz [aut] , Stefan Roediger [ctb] |
Maintainer: | Jaroslaw Chilimoniuk <[email protected]> |
License: | GPL-3 |
Version: | 1.4 |
Built: | 2024-11-13 05:45:28 UTC |
Source: | https://github.com/biogenies/countfitter |
The countfitteR package is a toolbox for the analysis of count data.
countfitteR is a wrapper around existing count models in R. To standardize error messages and ease integration, we slightly modified the zeroinfl function by Achim Zeileis.
Jaroslaw Chilimoniuk, Stefan Roediger, Michal Burdukiewicz
set.seed(15390)
library(countfitteR)
df <- data.frame(pois = rpois(25, 0.3), binom = rbinom(25, 1, 0.8))
cmp <- compare_fit(df, fitlist = fit_counts(df, model = "all"))
A shorter version of case_study_FITC. Used as the example in the Shiny app when the user does not load their own count data.
case_study
Example data extracted from the Aklides system and merged into one file. Counts in this file will not fit properly because the file combines counts obtained with two different fluorescent dyes.
case_study_all
Example data extracted from the Aklides system. Only counts obtained with the APC fluorescent dye were merged.
case_study_APC
Example data extracted from the Aklides system. Only counts obtained with the FITC fluorescent dye were merged.
case_study_FITC
Compare empirical distribution of counts with the distribution defined by the model fitted to counts.
compare_fit(count_list, fitlist = fit_counts(count_list, model = "all"))
count_list | A |
fitlist | a list of fits, as created by |
A data.frame with distribution values for each unique count. Count is the name of the original count, model is the name of the distribution model, x is the unique count value, n is the frequency of the unique count, and value is the result computed by the chosen distribution model.
df <- data.frame(poisson = rpois(25, 0.3), binomial = rbinom(25, 1, 0.8))
compare_fit(df, fitlist = fit_counts(df, model = "all"))
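To make the columns of such a comparison table concrete, the following base-R sketch builds a similar table by hand for a single Poisson fit. It does not call countfitteR and is not the package's implementation; the names `emp` and `lambda_hat` are illustrative, and treating `value` as an expected frequency (rather than a probability) is an assumption.

```r
# Sketch only: empirical vs. model-based frequencies for one Poisson fit.
set.seed(15390)
counts <- rpois(25, 0.3)

# Empirical frequency (n) of each unique count value (x)
emp <- as.data.frame(table(x = counts), responseName = "n",
                     stringsAsFactors = FALSE)
emp$x <- as.integer(emp$x)

# The Poisson maximum-likelihood estimate of lambda is the sample mean
lambda_hat <- mean(counts)

# Assumed meaning of "value": expected frequency under the fitted model
emp$value <- length(counts) * dpois(emp$x, lambda_hat)
emp
```

A large gap between `n` and `value` for some `x` suggests that the chosen distribution describes the counts poorly.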
Launches a graphical user interface that analyses the given count data and chooses the best-performing distribution model.
countfitteR_gui()
Any ad-blocking software may cause malfunctions.
Jaroslaw Chilimoniuk, Stefan Roediger, Michal Burdukiewicz
if(interactive()) { countfitteR_gui() }
Select the most appropriate distribution for the count data, presented in an HTML-friendly format.
decide(summary_fit, separate)
summary_fit | a result of the |
separate | |
df <- data.frame(poisson = rpois(25, 0.3), binomial = rbinom(25, 1, 0.8))
fc <- fit_counts(df, model = "all")
summ <- summary_fitlist(fc)
decide(summ, separate = FALSE)
Fit counts to distributions
fit_counts(counts_list, separate = TRUE, model, level = 0.95, ...)
counts_list | A |
separate | |
model | single |
level | Confidence level, default is 0.95. |
... | Dots parameters are ignored. |
The list of fitted models. Each name consists of the name of the original count, an underscore, and the name of the model used.
confint is a matrix with the number of rows equal to the number of parameters. Row names are the names of the parameters. The columns contain the lower and upper confidence limits, respectively.
df <- data.frame(poisson = rpois(25, 0.3), binomial = rbinom(25, 1, 0.8))
fit_counts(df, model = "pois")
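As an illustration of the shape of a confidence-interval matrix with one row per parameter and lower/upper columns, here is a minimal base-R sketch for a Poisson mean. It uses `glm` and Wald intervals from `confint.default`; how countfitteR itself computes its intervals is not stated here, so the method and the row and column labels below are assumptions.

```r
# Sketch only: a one-parameter Poisson fit with a Wald confidence interval.
set.seed(1)
counts <- rpois(50, 2)

# Intercept-only Poisson regression; the intercept is log(lambda)
fit <- glm(counts ~ 1, family = poisson)
lambda_hat <- unname(exp(coef(fit)))

# Wald interval on the log scale, back-transformed to the scale of lambda
ci <- exp(confint.default(fit, level = 0.95))
rownames(ci) <- "lambda"            # one row per parameter
colnames(ci) <- c("lower", "upper") # lower and upper limits
ci
```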
Compare empirical distribution of counts with the distribution defined by the model fitted to counts. The bar charts represent theoretical counts depending on the chosen distribution. Red dots describe the real number of counts.
plot_fitcmp(fitcmp)
fitcmp | a data frame created by the compare_fit function. |
df <- data.frame(poisson = rpois(25, 0.3), binomial = rbinom(25, 1, 0.8))
fitcmp <- compare_fit(df, fitlist = fit_counts(df, model = "all"))
plot_fitcmp(fitcmp)
Converts data in table-like formats into lists of counts.
process_counts(x)
x | |
case_study does not consider NAs and NaNs, effectively omitting them (as per the is.na function).
A list of counts.
data(case_study)
process_counts(case_study)
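The conversion from a table to a list of counts with missing values dropped can be pictured with a few lines of base R. This is a sketch of the idea only, not the code of process_counts; the data frame `df` and its column names are made up for illustration.

```r
# Sketch only: turn each column of a table into a vector of counts,
# dropping missing values. is.na() is TRUE for both NA and NaN.
df <- data.frame(C1 = c(0, 1, NA, 2),
                 C2 = c(3, NaN, 0, 1))

counts_list <- lapply(df, function(col) col[!is.na(col)])
str(counts_list)
```

Because missing values are dropped per column, the resulting vectors may differ in length, which is why a list is a more natural container than a data frame.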
Select the most appropriate model
select_model(fitlist)
fitlist | a list of fits, as created by |
A data.frame with two columns: count, representing the name of the count, and chosen model, containing the model with the lowest BIC.
set.seed(1)
df <- data.frame(poisson1 = rpois(50, 2), poisson2 = rpois(50, 5), zip1 = rZIP(50, 2, 0.7), zip2 = rZIP(50, 5, 0.7))
fitlist_separate <- fit_counts(df, model = c("pois", "zip"))
select_model(fitlist_separate)
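Selection by lowest BIC can be sketched in base R for two candidate models, Poisson and zero-inflated Poisson. The sketch below fits the ZIP model with a hand-written likelihood and `optim`, which is an assumption made for illustration; it is not how countfitteR fits its models. BIC is computed as -2 logLik + k log(n), with k the number of parameters.

```r
# Sketch only: choose between Poisson and ZIP by BIC, without countfitteR.
set.seed(1)
n <- 200
# Simulated ZIP data: an excess zero with probability 0.7, else Poisson(5)
x <- ifelse(runif(n) < 0.7, 0L, rpois(n, 5))

# Poisson: the MLE of lambda is the sample mean (1 parameter)
ll_pois <- sum(dpois(x, mean(x), log = TRUE))

# ZIP: maximise the log-likelihood over (log lambda, logit r) (2 parameters)
nll_zip <- function(par) {
  lambda <- exp(par[1])
  r <- plogis(par[2])
  p <- (1 - r) * dpois(x, lambda) + r * (x == 0)
  -sum(log(p))
}
opt <- optim(c(log(mean(x) + 0.1), 0), nll_zip)
ll_zip <- -opt$value

bic <- c(pois = -2 * ll_pois + 1 * log(n),
         zip  = -2 * ll_zip  + 2 * log(n))
names(which.min(bic))  # name of the model with the lowest BIC
```

The log(n) penalty means the extra parameter of the ZIP model is only accepted when it improves the likelihood substantially.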
Data created from simulation of NB Poiss
sim_dat
# code used to generate the data
# be warned: the simulations will take some time
## Not run:
library(dplyr)
set.seed(15390)
sim_dat <- do.call(rbind, lapply(10^(-3L:2), function(single_theta)
  do.call(rbind, lapply(1L:10/2, function(single_lambda)
    do.call(rbind, lapply(1L:100, function(single_rep) {
      foci <- lapply(1L:10, function(dummy) rnbinom(600, size = single_theta, mu = single_lambda))
      names(foci) <- paste0("C", 1L:10)
      fit_counts(foci, separate = TRUE, model = "all") %>%
        summary_fitlist %>%
        mutate(between = single_lambda < upper & single_lambda > lower) %>%
        group_by(model) %>%
        summarize(prop = mean(between)) %>%
        mutate(replicate = single_rep, lambda = single_lambda, theta = single_theta)
    }))
  ))
))
## End(Not run)
Counts are fitted to model(s) using the count name as the explanatory variable.
Estimates are presented in the table below along with the BIC values of their models.
Estimated coefficients of models (lambda for all distributions, theta for NB and ZINB, r for ZIP and ZINB).
summary_fitlist(fitlist)
fitlist | a list of fits, as created by |
Data frame with summarised results of all distribution models.
Count: the name of the original count.
lambda: Poisson mean, with lower and upper confidence limits.
BIC: Bayesian information criterion.
theta: dispersion parameter.
r: probability of excess zeros.
df <- data.frame(poisson = rpois(25, 0.3), binomial = rbinom(25, 1, 0.8))
fc <- fit_counts(df, model = "all")
summary_fitlist(fc)
Validates count data.
validate_counts(x)
x | |
Errors if x has negative values or non-numeric values, otherwise TRUE.
An input object.
data(case_study)
process_counts(case_study)
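The validation rule described above (error on negative or non-numeric values, otherwise TRUE) can be sketched as follows. The function `validate_sketch` and its error messages are hypothetical and written for illustration; they are not the implementation of validate_counts.

```r
# Sketch only: a hypothetical stand-in for the validation rule.
validate_sketch <- function(x) {
  if (!is.numeric(x)) {
    stop("counts must be numeric")
  }
  # Missing values are ignored here; that choice is an assumption
  if (any(x < 0, na.rm = TRUE)) {
    stop("counts must be non-negative")
  }
  TRUE
}

validate_sketch(c(0, 1, 2, 5))
```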
Density and random generation for the zero-inflated negative binomial distribution.
rZINB(n, size, mu, r)
dZINB(x, size, mu, r)
n | number of random values to return. |
size | target for number of successful trials, or dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive, need not be integer. |
mu | mean. |
r | probability of excess zeros. |
x | vector of (non-negative integer) quantiles. |
Negative binomial distribution: NegBinomial.
rZINB(15, 1.9, 0.9, 0.8)
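For orientation, the textbook zero-inflated negative binomial density mixes a point mass at zero (weight r) with a negative binomial (weight 1 - r), which gives a mean of (1 - r) * mu. The sketch below writes this out with base R's `dnbinom`; the function name `dzinb_sketch` is hypothetical, and it is an assumption that dZINB follows this standard parameterization.

```r
# Sketch only: textbook ZINB density, built from stats::dnbinom.
dzinb_sketch <- function(x, size, mu, r) {
  (1 - r) * dnbinom(x, size = size, mu = mu) + r * (x == 0)
}

# Same parameter values as in the example above
size <- 1.9; mu <- 0.9; r <- 0.8
x <- 0:200
d <- dzinb_sketch(x, size, mu, r)

sum(d)      # total probability over the (truncated) support
sum(x * d)  # mean of the distribution, to compare with (1 - r) * mu
```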
Density and random generation for the zero-inflated Poisson distribution.
dZIP(x, lambda, r)
rZIP(n, lambda, r)
x | vector of (non-negative integer) quantiles. |
lambda | vector of (non-negative) means. |
r | probability of excess zeros. |
n | number of random values to return. |
Poisson distribution: Poisson.
rZIP(15, 1.9, 0.9)
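A zero-inflated Poisson variable can be thought of as a Poisson draw that is replaced by zero with probability r. The following base-R sketch implements the textbook density and a sampler on that basis; the names `dzip_sketch` and `rzip_sketch` are hypothetical, and it is an assumption that dZIP and rZIP use the same parameterization.

```r
# Sketch only: textbook zero-inflated Poisson density and sampler.
dzip_sketch <- function(x, lambda, r) {
  # Point mass at zero with weight r, Poisson with weight 1 - r
  (1 - r) * dpois(x, lambda) + r * (x == 0)
}

rzip_sketch <- function(n, lambda, r) {
  # Keep each Poisson draw with probability 1 - r, otherwise return 0
  rpois(n, lambda) * rbinom(n, 1, 1 - r)
}

set.seed(15390)
s <- rzip_sketch(1e5, lambda = 1.9, r = 0.9)
mean(s == 0)                       # observed share of zeros
dzip_sketch(0, lambda = 1.9, r = 0.9)  # theoretical share of zeros
```

With r = 0.9 most values are zero, so a plain Poisson fit to such data would underestimate the probability of a zero count.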