Glossary

Analyte:

Object to be identified and measured.

Binomial branching process:

Stochastic process used to model a population as a set of discrete random variables $X_i$ over time units $i$, where the transition between consecutive states is dictated by a binomial distribution.
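
As a minimal sketch of such a process, assume each of the $X_i$ molecules is duplicated independently with probability $r$ in one time unit, so that $X_{i+1} = X_i + \mathrm{Binomial}(X_i, r)$. This duplication rule, the function name `simulate_branching` and its parameters are illustrative assumptions, not a definition taken from the text above.

```python
import random


def simulate_branching(x0, r, n_cycles, seed=None):
    """Simulate X_0, ..., X_n for a binomial branching process.

    Assumption (illustrative): every molecule present at step i is copied
    independently with probability r, so the number of new molecules is
    Binomial(X_i, r).
    """
    rng = random.Random(seed)
    counts = [x0]
    for _ in range(n_cycles):
        current = counts[-1]
        # One Bernoulli(r) trial per molecule; their sum is Binomial(current, r).
        new_copies = sum(1 for _ in range(current) if rng.random() < r)
        counts.append(current + new_copies)
    return counts


if __name__ == "__main__":
    # With r = 1 every molecule is copied, so the population doubles each step.
    print(simulate_branching(x0=5, r=1.0, n_cycles=3))
    # With r < 1 the trajectory is random; fixing the seed makes it repeatable.
    print(simulate_branching(x0=5, r=0.8, n_cycles=10, seed=1))
```

Under this assumption the population can never shrink and can at most double per step, which is why the trajectory is bounded between $X_0$ and $2^i X_0$.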

Burn-in:

An initial period of iterations of an MCMC method before the covariance matrix is updated and the transition kernel adopts a new shape.

Expression matrix:

Each row of the expression matrix represents a gene and each column represents a cell (NB: sometimes the transpose is used). Each entry represents the expression level of a particular gene in a given cell. The units in which expression is measured depend on the protocol and the normalization strategy used for the data. Many analyses of scRNA-seq data take an expression matrix as their starting point.

Fat hairy caterpillar:

The shape of the trace plot generated during inference, indicating that the Markov chain is well-mixed.

Gene dropout:

A gene being observed at a moderate expression level in one cell but not being detected in another cell.

Heteroplasmy:

The presence of more than one type of organellar genome (mitochondrial DNA or plastid DNA) within a cell or individual. It can be expressed as a ratio $m/(w+m)$, with $m$ the number of mutant DNA molecules present, and $w$ the number of wildtype.
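
The ratio above is simple to compute; the short sketch below only illustrates it. The function name `heteroplasmy` and the choice to raise an error for an empty population are assumptions made for this example.

```python
def heteroplasmy(m, w):
    """Return the heteroplasmy ratio m / (w + m).

    m: number of mutant DNA molecules
    w: number of wildtype DNA molecules
    """
    total = w + m
    if total == 0:
        # The ratio is undefined when no molecules are present (illustrative choice).
        raise ValueError("heteroplasmy is undefined for an empty population")
    return m / total


if __name__ == "__main__":
    # 30 mutant and 70 wildtype molecules give a ratio of 30 / 100.
    print(heteroplasmy(m=30, w=70))
```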

Jeffreys prior:

A non-informative prior distribution proportional to the square root of the determinant of the Fisher information matrix.
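
Written out, with $\theta$ denoting the parameter vector and $\mathcal{I}(\theta)$ the Fisher information matrix (symbols introduced here for illustration), the definition above reads:

$$p(\theta) \propto \sqrt{\det \mathcal{I}(\theta)}, \qquad \mathcal{I}(\theta)_{jk} = -\,\mathbb{E}\!\left[\frac{\partial^2 \log p(x \mid \theta)}{\partial \theta_j \, \partial \theta_k}\right].$$

As a standard worked case, for a binomial likelihood with $n$ trials and success probability $\theta$ the Fisher information is $n / (\theta(1-\theta))$, so the prior is proportional to $\theta^{-1/2}(1-\theta)^{-1/2}$, which is a $\mathrm{Beta}(1/2, 1/2)$ distribution.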

Manhattan skyline:

The shape of the trace plot generated during inference, indicating that the Markov chain is 'hot'.

Prior:

Probability distribution of a variable, incorporating prior knowledge into the model before inference is performed.

Posterior:

Probability distribution of a variable given observed data, the target object of interest in inference on experimental data.

Sequencing depth:

A measure related to the number of unique reads that include a given nucleotide in the reconstructed sequence. Deep sequencing refers to the general concept of aiming for a high number of unique reads of each region of a sequence.

Sequencing read:

The sequence of a section of a unique fragment of RNA transcript.

Wildtype:

Non-mutant DNA molecule.

Abbreviations and Notation

$\alpha$:

The parameter describing the proportionality of the fluorescence intensity emitted at a given cycle to the number of molecules at that point.

AIC:

Akaike information criterion.

AM:

Adaptive Metropolis (algorithm).

cDNA:

complementary DNA.

ddPCR:

Droplet digital PCR.

HMM:

Hidden Markov Model.

MCMC:

Markov chain Monte Carlo.

MH:

Metropolis-Hastings (algorithm).

mtDNA:

mitochondrial DNA.

PCR:

polymerase chain reaction.

qPCR:

quantitative PCR.

$r$:

the efficiency of the qPCR experiment.

RNAseq:

RNA sequencing.

$\sigma$:

the variance contributing to the noise term in the fluorescence expression.

scRNAseq:

single cell RNA sequencing.

$X_0$:

the initial number of molecules present in the single cell, the primary target analyte.