Raes Lab

1 - MT-associated loci and heritability

Maximum likelihood phylogenetic tree of gut microbial genera used in association analysis. Circles sizes along the branches and nodes of the tree indicate the number of loci observed to be associated to microbial diversity at that genus, family, order, class, or phylum levels. Bootstrap values over 60 are shown on branches. For illustrative purposes, loci counts, or the number of associated loci, greater than five were set to lower values; true loci counts can be found in Table S3. Bar plots on the right describe the prevalence and the standardized mean abundance of the bacterial genera in the FGFP population, and the estimated heritability (bars represent standard errors, point estimates are in white; arrows identify those observations determined to be different from zero at a p-value less than 0.05 (likelihood ratio test)) of AB and P/A traits. True heritability values can be found in Table S3.

2 - Genomic variants associated with microbial traits (i)

Manhattan plots illustrating negative log10 p-values derived from the FGFP cohort Score test association analysis for (a) rank normal transformed (RNT) abundances and (b) presence/absence (Hurdle Binary) states. The genome-wide threshold is indicated by the horizontal dashed line. (c) Manhattan plot for our targeted meta-analysis derived from Expectation-Maximization (em) parameter estimates. Sites that did not exhibit consistent effect estimates in meta-analysis are shaded in grey, while those sites that had a smaller meta-p-value than the FGFP (em) p-value are colored in blue and red for P/A (HB) and RNT (AB) traits, respectively. Loci that achieved study-wide significance are highlighted by a blue tower, while those exceeding the genome-wide threshold are in shaded in grey. Dashed lines indicate the study-level (black), conventional genome-wide (grey), and target-meta analysis (red) thresholds of 1.57x10−10, 2.5x10−8, and 1x10-5, respectively.

2 - Genomic variants associated with microbial traits (ii)

LocusZoom plots of association results using the FGFP derived p-values for three top SNPs that achieved the conventional genome-wide level p-value in the meta-analysis (d-f). The LD estimates are color coded (D′ ≥0.3 to >0.4 in purple) and recombination rate is indicated by the blue lines and the z-axis. The proximal genes to the top SNPs are indicated in the bottom panel. Genotype to microbial trait structure for the tagged variant at each locus is illustrated in the insert, with (f) illustrating a boxplot of taxon abundance by genotypic state (homozygous reference (0), heterozygous (1), or homozygous effect allele (2)), while (d) and (e) inserts illustrate bar plots of the proportion of individuals in each genotype class where the taxon is absent (red) or present (blue) in each of the observed genotypic states.

Name	Description	Download
Supplementary_Table_1	Summary of analytical methods as published in previous human, 16S rRNA microbiome GWAS studies. The table includes publication details of the studies (the title, first author, website url), their sample size, 16S rRNA gene variable region amplified and sequenced, software or pipeline used to cluster 16S reads into taxonomic and/or operational taxonomic units (OUT). In addition, the number of microbial traits (MT) analyzed in the mGWAS, transformation applied to the MT abundance data, software or test applied to perform association between genotype and MT, and methodologies used to account for population structure and for multiple testing burden.	Download
Supplementary_Table_2	Microbial trait data used in this study. The table includes individual level DADA2 integer count data for all phylogenetic levels (highlighted in green); the (zero-truncated)-abundance data rank normal transformed and regressed (RNTRes) against technical batch covariates, PC1-10, sex, and age (residuals extracted) (in blue); the derived presence (1) or absence(0) (P/A) hurdle binary (HB) data for zero-inflated counts (yellow); and the diversity metrics, untransformed or regressed and enterotype calls.	Download
Supplementary_Table_3	GWAS taxa (92) result summaries. The table contains the 92 taxon used in the mGWAS with their corresponding phylogenetic description (phylum, class, order, family, genus), prevalence and abundance in FGFP, the number of sites and loci associated with each taxon derived from the meta-analysis, and from FGFP alone. In addition, the heritability point estimates, standard errors, and p-values for presence/absence (P/A), rank normal transformed abundances (AB), as well as log and box-cox transformations of the integer count data for cross study comparison.	Download
Supplementary_Table_4	Results for association analysis for CNVs against all MTs. The table includes all the CNV included in the GWAS and provided results for: (1) specific CNVs, as called by PennCNV (CNV_GWAS); (2) burned GWAS performed in 1 million base pair non-overlapping windows (CNV_Burden_GWAS), (3) a global burden GWAS, calculated as sum of all CNV counts (CNV_GlobalBurden_GWAS); and (4) a global burden GWAS of rare CNVs, calculated as the sum of all CNV counts for CNVs observed variable in less than 1% of the FGFP population (CNV_RareGlobalBurden).	Download
Supplementary_Table_5	Results and annotation for all 3,321 meta-supported associations. Meta-supported associations refer to those with smaller p-value in the meta-analysis compared to FGFP alone. The table includes the microbial traits, rsids, snpids, a flag (TagLocus) indicating if a variant is a locus tag or top result in a linkage disequilibrium block, parameter estimates for the meta-analysis, FGFP, FoCUS and PopGen including, beta, se, association p-value, sample size, variant info score, minor allele frequency, effect allele frequency, and annotation of variants with DEPICT, and ENSEMBL v75 via biomaRt.	Download
Supplementary_Table_6	See Supplementary Information.	Download
Supplementary_Table_7	Catalog of MT-SNP matched hits previously reported in mGWAS associations. The table provides the estimates for the like or a higher order taxonomic level microbial trait, when available, from the FGFP (-method score) alongside the publicly available data from previous studies.	Download
Supplementary_Table_8	Catalog of top hits previously reported in mGWAS associations. Estimates for the STONGEST associated (smallest p-value, REGARDLESS OF THE TRAIT) microbial trait, when available, from the FGFP (-method score) is provided alongside the publicly available data from previous studies.	Download
Supplementary_Table_9	The effect estimates for all FGFP microbial traits at the MCM6 rs4988235 variant. The table includes MT as defined by FGFP, the rsid in FGFP, the SNPID in FGFP, defined as a compound identifier of chromosome:basepair_allele1_allele2, chromosome as reported by FGFP, base position as reported by FGFP (hg19), reference allele in FGFP, effect allele in FGFP, minor allele frequency in FGFP, imputation information score in FGFP, beta or effect estimate for the MT, standard error in FGFP, p-value in FGFP.	Download
Supplementary_Table_10/td>	Results derived from PhenoScanner V2 analysis. The output for the 14 SNP-microbial traits associations that surpass the genome-wide significance threshold or may be considered replicated. The table includes the microbial trait (mt), SNP ID (snpid: abbreviated as the chromosome, base pair (GRCh37), non-effect allele, and effect allele), the dbSNP SNP reference ID (rsid), the trait reported by PhenoScanner to be previously associated with the SNP, the type of trait previously associated (Trait), the pubmed ID (PMID) to the study if available, the effect estimate reported (Beta), the p-value reported (P), the sample size reported (N), the reported units of the effect estimates (Unit), and repeated details if a second trait was also linked to the same SNP.	Download
Supplementary_Table_11	Results derived from PhenoScanner V2 analysis. Including all previously reported GWAS associations for the top 14 variants at a p-value threshold of 0.01 (PhenoScanner_GWAS); all previously reported expression quantitative loci associations, (PhenoScanner_eQTL) and all previously reported methylation quantitative loci associations (PhenoScanner_methQTL).	Download
Supplementary_Table_12	Meta-analysis GENE2FUNC enrichment. All enrichment results derived from GENE2FUNC (http://fuma.ctglab.nl), containing all results “GENE2FUNC_All_Enrich_Result” and only those from the GWAS Catalog and Canonical Pathways “GENE2FUNC_GWAS_&_CanonicalPath” that we used for the study. In addition, the gene expression (GTEx v7) analysis for general tissues (“GTEx_v7_GeneralTissue_DEG”) and specific tissues (“GTEx_v7_SpecificTissue_DEG”) differential expression groups are provided. Table includes the category identifier, the gene set or specific pathways, the number of genes in the gene set, the number of genes associated to meta-supported variants present in the gene set, the p-value, the Benjamin-Hochberg correct p-value, and the HUGO or ENSEMBL ids for all of the meta-supported genes in said gene set.	Download
Supplementary_Table_13	Bi-directional Mendelian randomization results. Mendelian randomization (MR) results for MT of disease phenotypes and disease of MTs are provided. For MT of disease only a single SNP was used as the instrument, so Wald Ratio results are presented for those MT in Table 1 and for Bifidobacterium AB separately. Tables include the exposure (MT) outcome (disease) the number of SNPs used as instruments, the model, and parameter estimates beta (presented as log odd ratios for binary outcomes and standard deviation units of change for continuous), standard error and p-value. Disease-on-MT MR analysis include results for inverse variance weighted, weighted median, weighted mode, and MR Egger methods, as well as additional results tables for Heterogeneity and Pleiotropy sensitivity analysis.	Download
Supplementary_Table_14	SNP rids for all genetic variants used to proxy disease phenotypes in MR analyses.	Download
Supplementary_Table_15	Core microbial taxa in the FGFP dataset, given a prevalence value of 95%, or being present in 95% of all individuals sampled in FGFP.	Download
Supplementary_Table_16	Prevalent, but low abundant taxa. This table contains taxa with prevalence over 50%, but average abundance levels below 40, a value previously shown to indicate poor replicability in rarefaction data.	Download
Supplementary_Table_17	FGFP cohort summary statistics for analyzed taxa. Table includes taxon name (taxa), mean abundance (mean), standard deviation, Shapiro’s W-statistic for the raw, log2, and rank normal data transformed distributions, and accompanying p-values for the Shapiro normality test. In addition, the proportion of individuals with zero counts (equivalent to 1-prevelance) and a flag (0 = no, 1 = yes) declaring if a taxon should be processed through the 2-step hurdle binary (HB) model, including a P/A and zero-truncated abundance trait. Finally, for HB taxon there is also an estimate of the zero-truncated mean abundance and its standard deviation.	Download
Supplementary_Table_18	Cohort microbiome summary statistics. The taxa in FGFP, FOCUS and PopGen are included, with information on their mean abundances, standard deviation, Shapiro’s W-statistic for the raw, log2, and rank normal data transformed distributions, and accompanying p-values for the Shapiro normality test. In addition, the proportion of individuals with zero counts (equivalent to 1-prevelance) and a flag (0 = no, 1 = yes) declaring if a taxon should be processed through the 2-step hurdle binary (HB) model, including a P/A and zero-truncated abundance trait. Finally, for HB taxon there is also an estimate of the zero-truncated mean abundance and its standard deviation.	Download
Supplementary_Table_19	DEPICT gene enrichment results for individuals MTs. This table includes GO, KEGG, Reactome, and Ensembl gene subnetworks. The gene set ID and functional category source along with description of the functional category gene set are included.	Download
Supplementary_Table_20/td>	Tissue enrichment results for individuals MTs. Tissue specific DEPICT enrichment, MeSH refers to Medical Subject Headings.	Download
Supplementary_Table_21	DEPICT gene priority enrichment results for individuals MTs. All variants with a p-value less than 1x10-5 in the initial FGFP GWAS screen are included.	Download
Supplementary_Table_22	Garfield enrichment results for MTs. Tissue specific epigenetic regulatory function enrichment analysis derived from GARFIELD using FGFP variants at an association smaller than p-value of 1x10-5. The table includes MT identifier, odds ratio, p-value, Benjamini-Hochberg correct p-value, along with the specific regulatory annotation, cell type, tissue and epigenetic category of the association.	Download

Supplementary Information is available in the following link.

Download Supplementary Information

Abstract: https://www.nature.com/nmicrobiol/

Full Text: https://www.nature.com/nmicrobiol/

The full analysis pipeline for (i) microbiome processing; (ii) genotype quality control and imputation; (iii) genome-wide association analysis and (iv) phylogenetic analysis, is available here:

https://github.com/kul-fgfpgwas/rep-cookbook

Data can be downloaded from the following link (110 Gb):

https://doi.org/10.5523/bris.22bqn399f9i432q56gt3wfhzlc

The process of enquiry for data access are as follows: Upon data request by email to jeroen.raes@kuleuven.be, the FGFP data access committee will evaluate access permission, which will be granted upon signature of a data use agreement between the governing legal entities.

Genome-wide associations of human gut microbiome variation and implications for causal inference analyses

1 - MT-associated loci and heritability

2 - Genomic variants associated with microbial traits (i)

2 - Genomic variants associated with microbial traits (ii)