GenomicKB

A Knowledge Graph for the Human Genome

What's GenomicKB?

Genomic Knowledgebase (GenomicKB) is a graph database that uses a knowledge graph to consolidate genomic datasets and annotations from over 30 consortia and portals. In GenomicKB, genomic entities (e.g., genes) are represented as graph nodes, connections among them (e.g., eQTLs) as relationships, and specific genomic features (e.g., the expression level of a gene) as properties.
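To make the entity / relationship / property split concrete, here is a minimal in-memory sketch of a property graph. It is only an illustration: the class names, the `eQTL_of` relationship type, and all identifiers and values are hypothetical and are not GenomicKB's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                   # entity type, e.g. "Gene" or "SNP"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                  # node_id of the start node
    target: str                                  # node_id of the end node
    rel_type: str                                # relationship type (hypothetical name)
    properties: dict = field(default_factory=dict)


# Hypothetical toy records; identifiers and values are made up.
nodes = {
    "gene1": Node("gene1", "Gene", {"symbol": "GENE_A", "expression": 12.5}),
    "snp1": Node("snp1", "SNP", {"chrom": "chr1", "pos": 1050}),
}
edges = [Edge("snp1", "gene1", "eQTL_of", {"tissue": "pancreas"})]


def neighbors(node_id, rel_type):
    """Return the nodes reached from node_id over edges of the given type."""
    return [nodes[e.target] for e in edges
            if e.source == node_id and e.rel_type == rel_type]


linked = neighbors("snp1", "eQTL_of")
print([n.properties["symbol"] for n in linked])
```

Following an edge replaces what would be a join between a variant table and a gene table in a tabular layout.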


GenomicKB's advantages over traditional tabular-structured data:
• Emphasizes the relations between genomic entities at multiple resolutions and from multiple tissues and cell types. Entities from each consortium automatically and explicitly cross-link with one another in the knowledge graph without any operations such as table joining and sorting.
• Turns multi-modal data analysis into intuitive, coding-free queries and enables the discovery of large-scale cross-modality patterns.
• Automatically maintains the data structure and disambiguates genomic concepts with well-defined schema, identity, and ontology.


318,790,570 nodes

1,131,257,092 edges

3,902,460,300 attributes

Example 1

Let's start with an easy question:

Which common genomic variants are located in a gene of interest?

sample 1 picture
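The logic behind this question can be sketched in plain Python as an interval check plus an allele-frequency filter. This is a toy illustration, not a GenomicKB query: the record types, the coordinates, and the 1% minor-allele-frequency cutoff for "common" are all assumptions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


class Variant(NamedTuple):
    variant_id: str
    chrom: str
    pos: int
    maf: float   # minor allele frequency


def common_variants_in_gene(gene, variants, min_maf=0.01):
    """Keep variants inside the gene body whose MAF is at least min_maf.

    min_maf=0.01 is an assumed cutoff for "common"; adjust as needed.
    """
    return [v for v in variants
            if v.chrom == gene.chrom
            and gene.start <= v.pos <= gene.end
            and v.maf >= min_maf]


# Hypothetical toy data.
gene = Gene("GENE_A", "chr1", 1000, 2000)
variants = [
    Variant("v1", "chr1", 1500, 0.20),    # inside the gene, common
    Variant("v2", "chr1", 2500, 0.30),    # outside the gene
    Variant("v3", "chr1", 1800, 0.001),   # inside the gene, rare
    Variant("v4", "chr2", 1500, 0.40),    # different chromosome
]

found = common_variants_in_gene(gene, variants)
print([v.variant_id for v in found])
```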

Example 2

Use eQTLs to match enhancers with genes!

A gene-enhancer pair is identified when an eQTL is located in an enhancer and correlates with the expression of a nearby gene. (eQTLs are sequence variants that correlate with gene expression.)

sample 2 picture
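The pairing rule above can be sketched as follows, assuming each eQTL record already names the gene whose expression it correlates with. The record layout and all identifiers are hypothetical toy data, not GenomicKB's schema.

```python
from typing import NamedTuple


class Enhancer(NamedTuple):
    enhancer_id: str
    chrom: str
    start: int   # inclusive
    end: int     # inclusive


class EQTL(NamedTuple):
    variant_id: str
    chrom: str
    pos: int
    gene: str    # gene whose expression the variant correlates with


def enhancer_gene_pairs(enhancers, eqtls):
    """Pair an enhancer with a gene when an eQTL for that gene lies inside the enhancer."""
    pairs = set()   # a set, so several eQTLs supporting the same pair count once
    for q in eqtls:
        for e in enhancers:
            if q.chrom == e.chrom and e.start <= q.pos <= e.end:
                pairs.add((e.enhancer_id, q.gene))
    return sorted(pairs)


# Hypothetical toy data.
enhancers = [Enhancer("enh1", "chr1", 100, 200),
             Enhancer("enh2", "chr1", 500, 600)]
eqtls = [
    EQTL("q1", "chr1", 150, "GENE_A"),   # inside enh1
    EQTL("q2", "chr1", 550, "GENE_B"),   # inside enh2
    EQTL("q3", "chr1", 300, "GENE_A"),   # in no enhancer
    EQTL("q4", "chr1", 160, "GENE_A"),   # second eQTL supporting enh1 / GENE_A
]

pairs = enhancer_gene_pairs(enhancers, eqtls)
print(pairs)
```

The nested loop is quadratic and only suitable for toy inputs; in a graph database the same match is expressed as a path from enhancer to eQTL to gene.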

Example 3

Find "structural" loops in K562!

Loops are strong 3D interactions between genomic regions, which might be correlated with CTCF binding and/or transcriptional regulation. We define "structural" loops as loops whose anchors are bound by CTCF. (In contrast, "regulatory" loops are loops whose anchors are marked by H3K27ac.)

sample 3 picture
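As a rough sketch of this definition, the snippet below labels a loop "structural" when both of its anchors overlap at least one CTCF peak. Requiring both anchors (rather than either one) is an assumption made here, and the intervals are hypothetical toy data rather than real K562 calls.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int   # inclusive
    end: int     # inclusive


class Loop(NamedTuple):
    loop_id: str
    anchor1: Region
    anchor2: Region


def overlaps(a, b):
    """True when two regions on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def is_structural(loop, ctcf_peaks):
    """Assumed rule: both anchors must overlap a CTCF peak."""
    return all(any(overlaps(anchor, peak) for peak in ctcf_peaks)
               for anchor in (loop.anchor1, loop.anchor2))


# Hypothetical toy data.
ctcf_peaks = [Region("chr1", 1000, 1100), Region("chr1", 5000, 5100)]
loops = [
    Loop("loop1", Region("chr1", 900, 1050), Region("chr1", 5050, 5200)),
    Loop("loop2", Region("chr1", 900, 1050), Region("chr1", 8000, 8100)),
]

structural = [l.loop_id for l in loops if is_structural(l, ctcf_peaks)]
print(structural)
```

Swapping the CTCF peak list for H3K27ac peaks would give the corresponding check for "regulatory" loops.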

Example 4

Let's check whether GWAS SNPs for type II diabetes are located in genes that are activated in the pancreas!

sample 4 picture
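One way to frame this check outside the database is sketched below: find the genes containing each GWAS SNP, then keep those whose pancreas expression passes a threshold. The expression values, the 1.0 TPM cutoff used as a stand-in for "activated", and all identifiers are assumptions for illustration only.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int   # inclusive
    end: int     # inclusive


# Hypothetical toy data.
genes = [Gene("GENE_A", "chr1", 1000, 2000),
         Gene("GENE_B", "chr1", 5000, 6000)]
pancreas_tpm = {"GENE_A": 50.0, "GENE_B": 0.2}   # made-up expression values
gwas_snps = [("s1", "chr1", 1500),               # in GENE_A (expressed)
             ("s2", "chr1", 5500),               # in GENE_B (barely expressed)
             ("s3", "chr1", 9000)]               # intergenic


def genes_containing(chrom, pos):
    """Return genes whose body contains the given position."""
    return [g for g in genes if g.chrom == chrom and g.start <= pos <= g.end]


def snps_in_active_genes(snps, expression, min_tpm=1.0):
    """Map each SNP to the containing genes with expression >= min_tpm.

    min_tpm=1.0 is an assumed threshold for calling a gene "activated".
    """
    hits = {}
    for snp_id, chrom, pos in snps:
        active = [g.symbol for g in genes_containing(chrom, pos)
                  if expression.get(g.symbol, 0.0) >= min_tpm]
        if active:
            hits[snp_id] = active
    return hits


hits = snps_in_active_genes(gwas_snps, pancreas_tpm)
fraction = len(hits) / len(gwas_snps)
print(hits, fraction)
```

A real validation would also compare this fraction against a background, such as the same SNPs tested against expression in an unrelated tissue.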