Terms and Data Information

Terms of use

The data presented here are released for the benefit of the broader research community. You are welcome to download and explore the data freely. We encourage the use and publication of frequency data for specific, targeted sets of variants, for example when assessing candidate causal variants found in rare disease patients. However, we ask that you refrain from publishing global (genome-wide) analyses of these data, or analyses of large gene sets, until after our papers are published, which is anticipated in 2025.

Citation in publications

Please cite our bioRxiv preprint when using data from the Macaque Capture and Whole Exome Database (mCED) browser.

Data Generation

Sequencing reads were mapped to the rhesus reference genome assembly (Mmul_8.0.1 or Mmul_10) using BWA mem. Single nucleotide variants (SNVs) and short insertions/deletions (indels) were called following the GATK pipeline. Variants called against the Mmul_8.0.1 assembly were then converted to Mmul_10 coordinates using the UCSC Genome Browser liftOver program. Any sites whose reference nucleotide did not match the Mmul_10 assembly after conversion were corrected using bcftools.
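As an illustration of what a reference-allele check after liftOver involves, the following Python sketch compares each SNV's REF allele with the base found at the same position in the target assembly. It is not the bcftools procedure used for mCED, whose exact options are not stated here. The `Variant` type, the `check_reference` function, and the toy reference sequence are all hypothetical, and the sketch handles SNVs only.

```python
from typing import Dict, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str   # single reference base
    alt: str   # single alternate base


def check_reference(variant: Variant, reference: Dict[str, str]) -> str:
    """Classify an SNV by comparing its REF allele with the assembly.

    Returns "match" if REF agrees with the assembly, "swapped" if ALT
    agrees instead (REF and ALT could be exchanged), else "mismatch".
    """
    base = reference[variant.chrom][variant.pos - 1].upper()
    if base == variant.ref.upper():
        return "match"
    if base == variant.alt.upper():
        return "swapped"
    return "mismatch"


# Toy reference sequence, for illustration only.
toy_reference = {"chr1": "ACGTACGT"}

lifted = [
    Variant("chr1", 2, "C", "T"),  # REF agrees with the assembly
    Variant("chr1", 2, "T", "C"),  # ALT agrees with the assembly
    Variant("chr1", 2, "G", "A"),  # neither allele agrees
]
statuses = [check_reference(v, toy_reference) for v in lifted]
```

Sites classified as anything other than "match" are the ones that would need correction or removal; how each case is resolved is a design decision that this sketch leaves open.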

The called variants underwent quality control and were retained only if they met all of the following criteria:
1. The variant was labeled "PASS" by GATK, where a site was flagged as failing if it met any of these filtering conditions: QD < 5.0, QUAL < 30.0, FS > 15.0, MQ < 50.0, MQRankSum < -12.5, or ReadPosRankSum < -8.0.
2. The variant remained dimorphic after genotypes were set to missing in any of the following cases:
   a) the genotype was not called;
   b) the genotype was a heterozygous call with allelic imbalance (AB > 0.8 or AB < 0.2);
   c) the per-sample genotype quality (GQ, from GATK) was below 20;
   d) the genotype was supported by fewer than 10 reads (DP < 10).
3. No more than two distinct alleles were detected at the site in any single subject.
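The first two criteria can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to build mCED, and it rests on several assumptions: the listed thresholds are read as exclusion conditions, annotations absent from a site are treated as not failing, AB is taken as the fraction of reads supporting the first called allele among reads supporting the two called alleles, and "dimorphic" is taken to mean that exactly two alleles are observed among the remaining genotypes. All names (`SITE_FAIL_RULES`, `Genotype`, `mask_genotype`, and so on) are hypothetical. The third criterion is not modeled.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple

# Site-level conditions from the text; a site fails if ANY rule is met.
SITE_FAIL_RULES = {
    "QD": lambda v: v < 5.0,
    "QUAL": lambda v: v < 30.0,
    "FS": lambda v: v > 15.0,
    "MQ": lambda v: v < 50.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def fails_site_filters(annotations: Dict[str, float]) -> bool:
    """True if any listed condition is met (missing annotations are skipped)."""
    return any(
        name in annotations and rule(annotations[name])
        for name, rule in SITE_FAIL_RULES.items()
    )


class Genotype(NamedTuple):
    alleles: Optional[Tuple[int, int]]  # None when the genotype was not called
    ad: Tuple[int, ...]                 # read depth per allele index
    dp: int                             # total read depth
    gq: int                             # genotype quality


def mask_genotype(gt: Genotype) -> Optional[Tuple[int, int]]:
    """Return the called alleles, or None if the genotype is set to missing."""
    if gt.alleles is None:
        return None
    if gt.gq < 20 or gt.dp < 10:
        return None
    a, b = gt.alleles
    if a != b:  # allele balance is only checked for heterozygous calls
        supporting = gt.ad[a] + gt.ad[b]
        if supporting == 0:
            return None
        ab = gt.ad[a] / supporting
        if ab > 0.8 or ab < 0.2:
            return None
    return gt.alleles


def is_dimorphic(genotypes: Iterable[Optional[Tuple[int, int]]]) -> bool:
    """True if exactly two distinct alleles remain among non-missing genotypes."""
    observed = {allele for gt in genotypes if gt is not None for allele in gt}
    return len(observed) == 2


# Made-up example site with four samples.
site = {"QUAL": 812.4, "QD": 14.2, "FS": 1.3, "MQ": 60.0}
samples = [
    Genotype((0, 0), (22, 0), 22, 60),
    Genotype((0, 1), (18, 2), 20, 45),  # imbalanced heterozygote
    Genotype((1, 1), (0, 15), 15, 42),
    Genotype(None, (), 0, 0),           # not called
]
masked = [mask_genotype(g) for g in samples]
keep = not fails_site_filters(site) and is_dimorphic(masked)
```

In this example the imbalanced heterozygote and the uncalled genotype are set to missing, yet the site is still kept because both alleles remain observed in the two homozygous samples.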