gregor_linkml

This repository includes the LinkML model and tooling for the GREGoR data model.

URI: https://w3id.org/madanucd/gregor_linkml

Name: gregor_linkml

Classes

Class Description
Aligned Table containing aligned information
AlignedAtacShortRead Table containing aligned_atac_short_read information
AlignedDnaShortRead Table containing aligned_dna_short_read information
AlignedDnaShortReadSet Table containing aligned_dna_short_read_set information
AlignedNanopore Table containing aligned_nanopore information
AlignedNanoporeSet Table containing aligned_nanopore_set information
AlignedPacBio Table containing aligned_pac_bio information
AlignedPacBioSet Table containing aligned_pac_bio_set information
AlignedRnaShortRead Table containing aligned_rna_short_read information
AlleleSpecificAtacShortRead Table containing allele_specific_atac_short_read information
Analyte Table containing analyte information
CalledPeaksAtacShortRead Table containing called_peaks_atac_short_read information
CalledVariantsDnaShortRead Table containing called_variants_dna_short_read information
CalledVariantsNanopore Table containing called_variants_nanopore information
CalledVariantsPacBio Table containing called_variants_pac_bio information
Experiment Table containing experiment information
ExperimentAtacShortRead Table containing experiment_atac_short_read information
ExperimentDnaShortRead Table containing experiment_dna_short_read information
ExperimentNanopore Table containing experiment_nanopore information
ExperimentPacBio Table containing experiment_pac_bio information
ExperimentRnaShortRead Table containing experiment_rna_short_read information
Family Table containing family information
GeneticFindings Table containing genetic_findings information
Participant Table containing participant information
Phenotype Table containing phenotype information
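
Each class name above appears to correspond to a table name obtained by converting CamelCase to snake_case (for example, AlignedPacBio and aligned_pac_bio). The following Python sketch illustrates that conversion. The function name class_to_table_name is hypothetical, and the rule is inferred from the listing above rather than taken from the repository's tooling.

```python
import re


def class_to_table_name(class_name: str) -> str:
    """Convert a CamelCase class name to a snake_case table name.

    Hypothetical helper: inserts an underscore before every uppercase
    letter except the first one, then lowercases the whole string.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


if __name__ == "__main__":
    for name in ("Aligned", "AlignedPacBio", "CalledVariantsDnaShortRead", "GeneticFindings"):
        print(name, "->", class_to_table_name(name))
```

This rule only holds for names in which each word starts with a single capital letter, as is the case for every class listed here.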

Slots

Slot Description
5prime3prime_bias
additional_details modifier of a term where the additional details are not supported/available a...
additional_family_members_with_variant List of related participant IDs carrying the same variant
additional_modifiers
affected_status Indicate affected status of individual (overall with respect to primary pheno...
age_at_collection age of participant in years at biosample collection
age_at_enrollment age in years at which consent was originally obtained
age_at_last_observation Age at last observation, aka age in years at the last time the center can vou...
aligned_atac_short_read_file name and path of file with aligned reads
aligned_atac_short_read_id identifier for aligned ATAC-seq data
aligned_atac_short_read_index_file name and path of index file corresponding to aligned reads file
aligned_dna_short_read_file name and path of file with aligned reads
aligned_dna_short_read_id
aligned_dna_short_read_index_file name and path of index file corresponding to aligned reads file
aligned_dna_short_read_set_id identifier for experiment set
aligned_file
aligned_id table_name
aligned_index_file
aligned_nanopore_file name and path of file with aligned reads
aligned_nanopore_id
aligned_nanopore_index_file name and path of index file corresponding to aligned reads file
aligned_nanopore_set_id identifier for experiment set
aligned_pac_bio_file name and path of file with aligned reads
aligned_pac_bio_id
aligned_pac_bio_index_file name and path of index file corresponding to aligned reads file
aligned_pac_bio_set_id identifier for experiment set
aligned_read_length_mean Mean length of aligned reads
aligned_rna_short_read_file name and path of file with aligned reads
aligned_rna_short_read_id identifier for aligned_short_read (primary key)
aligned_rna_short_read_index_file name and path of index file corresponding to aligned reads file
alignment_log_file path of (log) file with all parameters for alignment software
alignment_postprocessing If any post processing was applied
alignment_software Software including version number
allele_balance_or_heteroplasmy_percentage Reported allele balance (mosaic) or heteroplasmy percentage (mitochondrial)
alt Alternate allele of the variant
analysis_details brief description of the analysis pipeline used for producing the asc_file; p...
analyte_id
analyte_processing_details details about how the analyte or original biosample was extracted or processe...
analyte_type analyte derived from the primary_biosample
ancestry_detail Additional specific ancestry description free text beyond what is captured by...
application_kit Library prep kits for special applications
asc_atac_short_read_id unique key for table (anvil requirement)
asc_file name and path of the tsv file with allele-specific chromatin accessibility me...
asc_md5sum md5 checksum for asc_file
barcode_kit Barcode kit used
by_strand run reports separate reads per strand
called_peaks_atac_short_read_id identifier for called peaks
called_peaks_file name and path of the bed file with open chromatin peaks after QC filtering
called_variants_dna_file name and path of the file with variant calls
called_variants_dna_short_read_id unique key for table (anvil requirement)
called_variants_nanopore_id unique key for table (anvil requirement)
called_variants_pac_bio_id unique key for table (anvil requirement)
caller_software variant calling software used including version number
chemistry_type chemistry type used for the experiment
chrom Chromosome of the variant
chrom_end Chromosome of the end position of the SV
ClinGen_allele_ID ClinGen Allele ID for cross table reference
condition_id MONDO/OMIM number for condition used for variant interpretation
condition_inheritance Description of the expected inheritance of condition used for variant interpr...
consanguinity Indicate if consanguinity is present or suspected within a family
consanguinity_detail Free text description of any additional consanguinity details
consent_code Consent group pertaining to this participant's data
contamination Contamination level estimate
copy_number CNV copy number
date_data_generation Date of data generation (First sequencing date)
estimated_library_size
experiment_atac_short_read_id identifier for experiment
experiment_dna_short_read_id identifier for experiment
experiment_id table_name
experiment_nanopore_id identifier for experiment
experiment_pac_bio_id identifier for experiment
experiment_rna_short_read_id identifier for experiment
experiment_sample_id identifier used in the data file (e
experiment_type
family_history_detail Details about family history that do not fit into structured fields
family_id Identifier for family (primary key)
fragmentation_method method used for shearing/fragmentation
gene_annotation annotation file used for alignment
gene_annotation_details
gene_disease_validity Validity assessment of the gene-disease relationship
gene_known_for_phenotype Indicate if the gene listed is a candidate or known disease gene
gene_of_interest HGNC approved symbol of the known or candidate gene(s) that are relevant for ...
genetic_findings_id Unique ID of this variant in this participant (primary key)
genome_coverage e
gregor_center GREGoR Center to which the participant is originally associated
GREGoR_ClinVar_SCV ClinVar accession number for the variant curation submitted by your center
GREGoR_variant_classification Clinical significance of variant described to condition listed as determined ...
het_sites_file VCF file containing prefiltered heterozygous sites used for reference alignme...
het_sites_md5sum md5 checksum for het_sites_file
hgvs genomic HGVS description of the variant
hgvsc HGVS c
hgvsp HGVS p
hours_since_last_meal This is relevant when analyzing metabolomics data
id_in_table
includes_CpG_methylation run reports CpG methylation
includes_kinetics run reports base kinetics
instrument_ics_version Version number of PacBio instrument control software
internal_project_id An identifier used by GREGoR research centers to identify a set of participan...
known_condition_name Free text of condition name
library_prep_type type of library prep
library_size library prep - expected size of library from FemtoPulse
linked_variant Second variant in recessive cases
linked_variant_phase
mapped_reads_pct Number between 1 and 100, na
maternal_id participant_id for mother; 0 if not available
md5sum md5 checksum for file
mean_coverage Mean coverage of either the genome or the targeted regions
method_of_discovery The method/assay(s) used to identify the candidate
methylation_called Indicates whether 5mC methylation has been called and annotated in the BAM fi...
missing_variant_case Indication of whether this is known to be a missing variant case, see notes f...
missing_variant_details For missing variant cases, indicate gene(s) or region of interest and reason ...
movie_length_hours sequencing - length of sequencing collection, in hrs
movie_name sequencing - unique name of sequencing collection
notes Free text field to explain edge cases or discovery updates or list parallel e...
num_aligned_bases Number of bases in aligned reads
num_aligned_reads Total aligned reads
num_bases Number of bases (before/ignoring alignment)
num_reads Total reads (before/ignoring alignment)
onset_age_range
ontology
partial_contribution_explained List of specific phenotypes (HPO IDs) explained by the condition associated w...
participant_drugs_intake The list of drugs patient is on, at the time of sample collection
participant_id
participant_special_diet If the patient was fasting, when the sample was collected
passage_number passage_number is relevant for fibroblast cultures and possibly iPSC
paternal_id participant_id for father; 0 if not available
peak_caller_software peak calling software used including version number
peak_set_type peak set type, according to ENCODE descriptors
peaks_md5sum md5 checksum for called_peaks_file
pedigree_file name of file (renamed from pedigree_image because it can contain a PED file o...
pedigree_file_detail Free text description of other family structure/pedigree file caption or lege...
percent_chrX_Y
percent_GC
percent_Globin
percent_mRNA
percent_mtRNA
percent_multimapped how many reads aligned to multiple places
percent_rRNA
percent_UMI
percent_unaligned how many reads didn't align
percent_uniquely_aligned how many reads aligned to just one place
phenotype_contribution Contribution of variant-linked condition to participant's phenotype
phenotype_description human-readable 'Phenotypic one-line summary' for why this individual is of in...
phenotype_id primary key
pmid_id Case specific PubMed ID if applicable
polymerase_kit sequencing - part number of polymerase kit used
pos Start position of the variant
pos_end End position of SV
presence
primary_biosample Tissue type of biosample taken from the participant that the analyte was extr...
primary_biosample_details Free text to capture information not in structured fields
primary_biosample_id Optional ID for the biosample; allows for linking of multiple analytes extrac...
prior_testing Text description of any genetic testing for individual conducted prior to enr...
proband_relationship Text description of individual relationship to proband in family, especially ...
proband_relationship_detail Other proband relationship not captured in enumeration above
public_database_ID_other Public database variant/case ID
public_database_other Public databases that this variant in this participant has been submitted by ...
quality_issues describe if there are any QC issues that would be important to note
read_error_rate Mean empirical per-base error rate of aligned reads
read_length sequenced read length (bp); GREGoR RCs do paired end sequencing, so is the ex...
read_length_mean Mean length of all reads (before/ignoring alignment)
recontactable Is the originating GREGoR Center likely able to recontact this participant
ref Reference allele of the variant
reference_assembly
reference_assembly_details
reference_assembly_uri
reported_ethnicity Self/submitter-reported ethnicity (OMB categories)
reported_race Self/submitter-reported race (OMB categories)
RIN RIN number for quality of sample
sample_transformation_detail details regarding sample transformation
seq_library_prep_kit_method Library prep kit used
sequencing_event_details describe if there are any sequencing-specific issues that would be important ...
sequencing_kit sequencing - part number of sequencing kit reagents
sequencing_platform sequencing platform used for the experiment
sex Biological sex assigned at birth (aligned with All of Us)
sex_concordance Comparison between reported sex vs genotype sex; Other if ploidy NOT XX or XY...
sex_detail Optional free-text field to describe known discrepancies between 'sex' value ...
single_or_paired_ends single or paired end
size_selection_method library prep - method used for library size selection
smrt_cell_id sequencing - unique serial number for SMRT Cell
smrt_cell_kit sequencing - part number of the SMRT Cell
smrtlink_server_version Version number of PacBio SMRTLink software
solve_status Indication of whether the submitting RC considers this case 'solved'
sv_type
syndromic For participants with few HPO terms, this optional field is to provide contex...
table_name
target_insert_size insert size the protocol targets for DNA fragments
targeted_region_bed_file name and path of bed file uploaded to workspace
targeted_regions_method Which capture kit is used
term_id
time_to_freeze time (in hours) from collection to freezing the sample
tissue_affected_status If applicable to disease (suspected mosaic), is the tissue from an affected s...
total_reads total number of reads
transcript Text description of transcript overlapping the variant
twin_id participant_id for twins, triplets, etc; 0 if not available
variant_inheritance Detection of variant in parents
variant_reference_assembly The genome build for identifying the variant position
variant_type
variant_types types of variants called
was_barcoded indicates whether samples were barcoded on this flowcell
within_site_batch_name batch number for the site, important for future batch correction
zygosity Zygosity of variant
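
Several slots above (md5sum, asc_md5sum, het_sites_md5sum, peaks_md5sum) hold an md5 checksum for a data file. As a minimal sketch of how such a value could be computed with Python's standard hashlib module, the helper below reads a file in chunks so that large alignment or variant files do not have to fit in memory. The function name md5_of_file and the chunk size are assumptions made for illustration; the schema itself does not prescribe how the checksum is produced.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal md5 digest of the file at `path`.

    Hypothetical helper: reads the file in `chunk_size`-byte blocks and
    feeds each block to the hash object.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned 32-character hexadecimal string is the form conventionally written by the md5sum command-line tool, so it can be compared directly against a value stored in one of the checksum slots.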

Enumerations

Enumeration Description
AdditionalModifiersEnum
AffectedStatusEnum
AnalyteTypeEnum
ApplicationKitEnum
ChemistryTypeEnum
ChromEndEnum
ChromEnum
ConditionInheritanceEnum
ConsanguinityEnum
ConsentCodeEnum
ExperimentTypeEnum
GeneAnnotationDetailsEnum
GeneDiseaseValidityEnum
GeneKnownForPhenotypeEnum
GregorCenterEnum
GregorVariantClassificationEnum
LibraryPrepTypeEnum
LinkedVariantPhaseEnum
MethodOfDiscoveryEnum
MissingVariantCaseEnum
OnsetAgeRangeEnum
OntologyEnum
PeakSetTypeEnum
PhenotypeContributionEnum
PresenceEnum
PrimaryBiosampleEnum
ProbandRelationshipEnum
RecontactableEnum
ReferenceAssemblyEnum
ReportedEthnicityEnum
ReportedRaceEnum
SeqLibraryPrepKitMethodEnum
SequencingPlatformEnum
SexEnum
SingleOrPairedEndsEnum
SolveStatusEnum
SvTypeEnum
SyndromicEnum
TableNameEnum
TissueAffectedStatusEnum
VariantInheritanceEnum
VariantReferenceAssemblyEnum
VariantTypeEnum
VariantTypesEnum
ZygosityEnum
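
The enumeration names above appear to follow the slot names: each underscore-separated part of the slot name is capitalized, the parts are joined, and "Enum" is appended (for example, GREGoR_variant_classification and GregorVariantClassificationEnum). The Python sketch below builds the candidate enumeration name for a slot. The function name slot_to_enum_name is hypothetical, the rule is inferred from the two listings, and not every slot has an enumeration.

```python
def slot_to_enum_name(slot_name: str) -> str:
    """Build the candidate enumeration name for a slot name.

    Hypothetical helper: str.capitalize() uppercases the first character
    of each part and lowercases the rest, so "GREGoR" becomes "Gregor".
    """
    parts = slot_name.split("_")
    return "".join(part.capitalize() for part in parts) + "Enum"


if __name__ == "__main__":
    for slot in ("affected_status", "chrom_end", "GREGoR_variant_classification"):
        print(slot, "->", slot_to_enum_name(slot))
```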

Types

Type Description
Boolean A binary (true or false) value
Curie a compact URI
Date a date (year, month and day) in an idealized calendar
DateOrDatetime Either a date or a datetime
Datetime The combination of a date and time
Decimal A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double A real number that conforms to the xsd:double specification
Float A real number that conforms to the xsd:float specification
Integer An integer
Jsonpath A string encoding a JSON Path
Jsonpointer A string encoding a JSON Pointer
Ncname Prefix part of CURIE
Nodeidentifier A URI, CURIE or BNODE that represents a node in a model
Objectidentifier A URI or CURIE that represents an object in the model
Sparqlpath A string encoding a SPARQL Property Path
String A character string
Time A time object represents a (local) time of day, independent of any particular...
Uri a complete URI
Uriorcurie a URI or a CURIE

Subsets

Subset Description