gregor_linkml

This repository includes the LinkML model and tooling for the GREGoR data model.

URI: https://w3id.org/madanucd/gregor_linkml

Name: gregor_linkml

Classes

Class	Description
Aligned	Table containing aligned information
AlignedAtacShortRead	Table containing aligned_atac_short_read information
AlignedDnaShortRead	Table containing aligned_dna_short_read information
AlignedDnaShortReadSet	Table containing aligned_dna_short_read_set information
AlignedNanopore	Table containing aligned_nanopore information
AlignedNanoporeSet	Table containing aligned_nanopore_set information
AlignedPacBio	Table containing aligned_pac_bio information
AlignedPacBioSet	Table containing aligned_pac_bio_set information
AlignedRnaShortRead	Table containing aligned_rna_short_read information
AlleleSpecificAtacShortRead	Table containing allele_specific_atac_short_read information
Analyte	Table containing analyte information
CalledPeaksAtacShortRead	Table containing called_peaks_atac_short_read information
CalledVariantsDnaShortRead	Table containing called_variants_dna_short_read information
CalledVariantsNanopore	Table containing called_variants_nanopore information
CalledVariantsPacBio	Table containing called_variants_pac_bio information
Experiment	Table containing experiment information
ExperimentAtacShortRead	Table containing experiment_atac_short_read information
ExperimentDnaShortRead	Table containing experiment_dna_short_read information
ExperimentNanopore	Table containing experiment_nanopore information
ExperimentPacBio	Table containing experiment_pac_bio information
ExperimentRnaShortRead	Table containing experiment_rna_short_read information
Family	Table containing family information
GeneticFindings	Table containing genetic_findings information
Participant	Table containing participant information
Phenotype	Table containing phenotype information
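
The class descriptions above appear to follow one pattern: the CamelCase class name is turned into a snake_case table name (for example, AlignedPacBio becomes aligned_pac_bio). The following is a minimal sketch of that conversion; the function name `camel_to_snake` is hypothetical and is not part of the repository's tooling.

```python
import re


def camel_to_snake(class_name: str) -> str:
    """Convert a CamelCase class name to a snake_case table name.

    An underscore is inserted before every uppercase letter except the
    first character, then the whole string is lowercased.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def describe(class_name: str) -> str:
    """Rebuild the description text used in the Classes table."""
    return f"Table containing {camel_to_snake(class_name)} information"


if __name__ == "__main__":
    for name in ("Aligned", "AlignedPacBio", "CalledVariantsDnaShortRead"):
        print(name, "->", describe(name))
```

This only reflects the naming pattern visible in the table; the actual generator may derive the descriptions differently.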

Slots

Slot	Description
5prime3prime_bias
additional_details	modifier of a term where the additional details are not supported/available a...
additional_family_members_with_variant	List of related participant IDs carrying the same variant
additional_modifiers
affected_status	Indicate affected status of individual (overall with respect to primary pheno...
age_at_collection	age of participant in years at biosample collection
age_at_enrollment	age in years at which consent was originally obtained
age_at_last_observation	Age at last observation, aka age in years at the last time the center can vou...
aligned_atac_short_read_file	name and path of file with aligned reads
aligned_atac_short_read_id	identifier for aligned ATAC-seq data
aligned_atac_short_read_index_file	name and path of index file corresponding to aligned reads file
aligned_dna_short_read_file	name and path of file with aligned reads
aligned_dna_short_read_id
aligned_dna_short_read_index_file	name and path of index file corresponding to aligned reads file
aligned_dna_short_read_set_id	identifier for experiment set
aligned_file
aligned_id	table_name
aligned_index_file
aligned_nanopore_file	name and path of file with aligned reads
aligned_nanopore_id
aligned_nanopore_index_file	name and path of index file corresponding to aligned reads file
aligned_nanopore_set_id	identifier for experiment set
aligned_pac_bio_file	name and path of file with aligned reads
aligned_pac_bio_id
aligned_pac_bio_index_file	name and path of index file corresponding to aligned reads file
aligned_pac_bio_set_id	identifier for experiment set
aligned_read_length_mean	Mean length of aligned reads
aligned_rna_short_read_file	name and path of file with aligned reads
aligned_rna_short_read_id	identifier for aligned_short_read (primary key)
aligned_rna_short_read_index_file	name and path of index file corresponding to aligned reads file
alignment_log_file	path of (log) file with all parameters for alignment software
alignment_postprocessing	If any post processing was applied
alignment_software	Software including version number
allele_balance_or_heteroplasmy_percentage	Reported allele balance (mosaic) or heteroplasmy percentage (mitochondrial)
alt	Alternate allele of the variant
analysis_details	brief description of the analysis pipeline used for producing the asc_file; p...
analyte_id
analyte_processing_details	details about how the analyte or original biosample was extracted or processe...
analyte_type	analyte derived from the primary_biosample
ancestry_detail	Additional specific ancestry description free text beyond what is captured by...
application_kit	Library prep kits for special applications
asc_atac_short_read_id	unique key for table (anvil requirement)
asc_file	name and path of the tsv file with allele-specific chromatin accessibility me...
asc_md5sum	md5 checksum for asc_file
barcode_kit	Barcode kit used
by_strand	run reports separate reads per strand
called_peaks_atac_short_read_id	identifier for called peaks
called_peaks_file	name and path of the bed file with open chromatin peaks after QC filtering
called_variants_dna_file	name and path of the file with variant calls
called_variants_dna_short_read_id	unique key for table (anvil requirement)
called_variants_nanopore_id	unique key for table (anvil requirement)
called_variants_pac_bio_id	unique key for table (anvil requirement)
caller_software	variant calling software used including version number
chemistry_type	chemistry type used for the experiment
chrom	Chromosome of the variant
chrom_end	Chromosome of the end position of the SV
ClinGen_allele_ID	ClinGen Allele ID for cross-table reference
condition_id	MONDO/OMIM number for condition used for variant interpretation
condition_inheritance	Description of the expected inheritance of condition used for variant interpr...
consanguinity	Indicate if consanguinity is present or suspected within a family
consanguinity_detail	Free text description of any additional consanguinity details
consent_code	Consent group pertaining to this participant's data
contamination	Contamination level estimate
copy_number	CNV copy number
date_data_generation	Date of data generation (First sequencing date)
estimated_library_size
experiment_atac_short_read_id	identifier for experiment
experiment_dna_short_read_id	identifier for experiment
experiment_id	table_name
experiment_nanopore_id	identifier for experiment
experiment_pac_bio_id	identifier for experiment
experiment_rna_short_read_id	identifier for experiment
experiment_sample_id	identifier used in the data file (e
experiment_type
family_history_detail	Details about family history that do not fit into structured fields
family_id	Identifier for family (primary key)
fragmentation_method	method used for shearing/fragmentation
gene_annotation	annotation file used for alignment
gene_annotation_details
gene_disease_validity	Validity assessment of the gene-disease relationship
gene_known_for_phenotype	Indicate if the gene listed is a candidate or known disease gene
gene_of_interest	HGNC approved symbol of the known or candidate gene(s) that are relevant for ...
genetic_findings_id	Unique ID of this variant in this participant (primary key)
genome_coverage	e
gregor_center	GREGoR Center to which the participant is originally associated
GREGoR_ClinVar_SCV	ClinVar accession number for the variant curation submitted by your center
GREGoR_variant_classification	Clinical significance of variant described to condition listed as determined ...
het_sites_file	VCF file containing prefiltered heterozygous sites used for reference alignme...
het_sites_md5sum	md5 checksum for het_sites_file
hgvs	genomic HGVS description of the variant
hgvsc	HGVS c
hgvsp	HGVS p
hours_since_last_meal	This is relevant when analyzing metabolomics data
id_in_table
includes_CpG_methylation	run reports CpG methylation
includes_kinetics	run reports base kinetics
instrument_ics_version	Version number of PacBio instrument control software
internal_project_id	An identifier used by GREGoR research centers to identify a set of participan...
known_condition_name	Free text of condition name
library_prep_type	type of library prep
library_size	library prep - expected size of library from FemtoPulse
linked_variant	Second variant in recessive cases
linked_variant_phase
mapped_reads_pct	Number between 1 and 100, na
maternal_id	participant_id for mother; 0 if not available
md5sum	md5 checksum for file
mean_coverage	Mean coverage of either the genome or the targeted regions
method_of_discovery	The method/assay(s) used to identify the candidate
methylation_called	Indicates whether 5mC methylation has been called and annotated in the BAM fi...
missing_variant_case	Indication of whether this is known to be a missing variant case, see notes f...
missing_variant_details	For missing variant cases, indicate gene(s) or region of interest and reason ...
movie_length_hours	sequencing - length of sequencing collection, in hrs
movie_name	sequencing - unique name of sequencing collection
notes	Free text field to explain edge cases or discovery updates or list parallel e...
num_aligned_bases	Number of bases in aligned reads
num_aligned_reads	Total aligned reads
num_bases	Number of bases (before/ignoring alignment)
num_reads	Total reads (before/ignoring alignment)
onset_age_range
ontology
partial_contribution_explained	List of specific phenotypes (HPO IDs) explained by the condition associated w...
participant_drugs_intake	List of drugs the patient is on at the time of sample collection
participant_id
participant_special_diet	Whether the patient was fasting when the sample was collected
passage_number	passage_number is relevant for fibroblast cultures and possibly iPSC
paternal_id	participant_id for father; 0 if not available
peak_caller_software	peak calling software used including version number
peak_set_type	peak set type, according to ENCODE descriptors
peaks_md5sum	md5 checksum for called_peaks_file
pedigree_file	name of file (renamed from pedigree_image because it can contain a PED file o...
pedigree_file_detail	Free text description of other family structure/pedigree file caption or lege...
percent_chrX_Y
percent_GC
percent_Globin
percent_mRNA
percent_mtRNA
percent_multimapped	how many reads aligned to multiple places
percent_rRNA
percent_UMI
percent_unaligned	how many reads didn't align
percent_uniquely_aligned	how many reads aligned to just one place
phenotype_contribution	Contribution of variant-linked condition to participant's phenotype
phenotype_description	human-readable 'Phenotypic one-line summary' for why this individual is of in...
phenotype_id	primary key
pmid_id	Case specific PubMed ID if applicable
polymerase_kit	sequencing - part number of polymerase kit used
pos	Start position of the variant
pos_end	End position of SV
presence
primary_biosample	Tissue type of biosample taken from the participant that the analyte was extr...
primary_biosample_details	Free text to capture information not in structured fields
primary_biosample_id	Optional ID for the biosample; allows for linking of multiple analytes extrac...
prior_testing	Text description of any genetic testing for individual conducted prior to enr...
proband_relationship	Text description of individual relationship to proband in family, especially ...
proband_relationship_detail	Other proband relationship not captured in enumeration above
public_database_ID_other	Public database variant/case ID
public_database_other	Public databases that this variant in this participant has been submitted by ...
quality_issues	describe if there are any QC issues that would be important to note
read_error_rate	Mean empirical per-base error rate of aligned reads
read_length	sequenced read length (bp); GREGoR RCs do paired end sequencing, so is the ex...
read_length_mean	Mean length of all reads (before/ignoring alignment)
recontactable	Is the originating GREGoR Center likely able to recontact this participant
ref	Reference allele of the variant
reference_assembly
reference_assembly_details
reference_assembly_uri
reported_ethnicity	Self/submitter-reported ethnicity (OMB categories)
reported_race	Self/submitter-reported race (OMB categories)
RIN	RIN number for quality of sample
sample_transformation_detail	details regarding sample transformation
seq_library_prep_kit_method	Library prep kit used
sequencing_event_details	describe if there are any sequencing-specific issues that would be important ...
sequencing_kit	sequencing - part number of sequencing kit reagents
sequencing_platform	sequencing platform used for the experiment
sex	Biological sex assigned at birth (aligned with All of Us)
sex_concordance	Comparison between reported sex vs genotype sex; Other if ploidy NOT XX or XY...
sex_detail	Optional free-text field to describe known discrepancies between 'sex' value ...
single_or_paired_ends	single or paired end
size_selection_method	library prep - method used for library size selection
smrt_cell_id	sequencing - unique serial number for SMRT Cell
smrt_cell_kit	sequencing - part number of the SMRT Cell
smrtlink_server_version	Version number of PacBio SMRTLink software
solve_status	Indication of whether the submitting RC considers this case 'solved'
sv_type
syndromic	For participants with few HPO terms, this optional field is to provide contex...
table_name
target_insert_size	insert size the protocol targets for DNA fragments
targeted_region_bed_file	name and path of bed file uploaded to workspace
targeted_regions_method	Which capture kit is used
term_id
time_to_freeze	time (in hours) from collection to freezing the sample
tissue_affected_status	If applicable to disease (suspected mosaic), is the tissue from an affected s...
total_reads	total number of reads
transcript	Text description of transcript overlapping the variant
twin_id	participant_id for twins, triplets, etc; 0 if not available
variant_inheritance	Detection of variant in parents
variant_reference_assembly	The genome build for identifying the variant position
variant_type
variant_types	types of variants called
was_barcoded	indicates whether samples were barcoded on this flowcell
within_site_batch_name	batch number for the site, important for future batch correction
zygosity	Zygosity of variant
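
Several slots above (md5sum, peaks_md5sum, het_sites_md5sum, asc_md5sum) record an md5 checksum for a file. As a rough illustration of how such a value could be checked against a local copy of the file, here is a sketch using only the Python standard library. The function names `file_md5` and `md5_matches` are hypothetical, and the sketch assumes the recorded value is a hexadecimal digest.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 digest of the file at `path`.

    The file is read in chunks so that large alignment or variant files
    do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path: str, recorded_md5sum: str) -> bool:
    """Compare a file's digest with a recorded md5sum value.

    Assumption: the recorded value is a hex string; surrounding
    whitespace and letter case are ignored.
    """
    return file_md5(path) == recorded_md5sum.strip().lower()
```

A call such as `md5_matches(local_path, row["md5sum"])` would then flag files whose contents differ from what the table records; `local_path` and `row` are placeholders for however the file and table row are obtained.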

Enumerations

Enumeration	Description
AdditionalModifiersEnum
AffectedStatusEnum
AnalyteTypeEnum
ApplicationKitEnum
ChemistryTypeEnum
ChromEndEnum
ChromEnum
ConditionInheritanceEnum
ConsanguinityEnum
ConsentCodeEnum
ExperimentTypeEnum
GeneAnnotationDetailsEnum
GeneDiseaseValidityEnum
GeneKnownForPhenotypeEnum
GregorCenterEnum
GregorVariantClassificationEnum
LibraryPrepTypeEnum
LinkedVariantPhaseEnum
MethodOfDiscoveryEnum
MissingVariantCaseEnum
OnsetAgeRangeEnum
OntologyEnum
PeakSetTypeEnum
PhenotypeContributionEnum
PresenceEnum
PrimaryBiosampleEnum
ProbandRelationshipEnum
RecontactableEnum
ReferenceAssemblyEnum
ReportedEthnicityEnum
ReportedRaceEnum
SeqLibraryPrepKitMethodEnum
SequencingPlatformEnum
SexEnum
SingleOrPairedEndsEnum
SolveStatusEnum
SvTypeEnum
SyndromicEnum
TableNameEnum
TissueAffectedStatusEnum
VariantInheritanceEnum
VariantReferenceAssemblyEnum
VariantTypeEnum
VariantTypesEnum
ZygosityEnum
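
The enumeration names listed above appear to be derived from slot names: each underscore-separated part is capitalized, the parts are joined, and "Enum" is appended (affected_status becomes AffectedStatusEnum). The sketch below reproduces that observed convention; `enum_name_for_slot` is a hypothetical helper, not a function provided by LinkML or by this repository.

```python
def enum_name_for_slot(slot_name: str) -> str:
    """Build the enumeration name that corresponds to a slot name.

    Each underscore-separated part is capitalized with str.capitalize(),
    which also lowercases the remaining letters, so "GREGoR" becomes
    "Gregor" as in GregorVariantClassificationEnum.
    """
    parts = [part.capitalize() for part in slot_name.split("_") if part]
    return "".join(parts) + "Enum"


if __name__ == "__main__":
    for slot in ("affected_status", "sv_type", "GREGoR_variant_classification"):
        print(slot, "->", enum_name_for_slot(slot))
```

Not every slot has an enumeration, so the result is only meaningful for slots whose range is an enum in the schema.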

Types

Type	Description
Boolean	A binary (true or false) value
Curie	a compact URI
Date	a date (year, month and day) in an idealized calendar
DateOrDatetime	Either a date or a datetime
Datetime	The combination of a date and time
Decimal	A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double	A real number that conforms to the xsd:double specification
Float	A real number that conforms to the xsd:float specification
Integer	An integer
Jsonpath	A string encoding a JSON Path
Jsonpointer	A string encoding a JSON Pointer
Ncname	Prefix part of CURIE
Nodeidentifier	A URI, CURIE or BNODE that represents a node in a model
Objectidentifier	A URI or CURIE that represents an object in the model
Sparqlpath	A string encoding a SPARQL Property Path
String	A character string
Time	A time object represents a (local) time of day, independent of any particular...
Uri	a complete URI
Uriorcurie	a URI or a CURIE
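
To make the relationship between the Curie and Uri types concrete, here is a small sketch that expands a compact URI into a complete URI using a prefix map. The prefix map shown is an assumption for illustration (it uses the OBO PURL pattern for HP and MONDO) and is not taken from this schema; `expand_curie` is a hypothetical helper.

```python
# Assumed prefix map for illustration only; the schema's own prefix
# declarations are not shown in this document.
PREFIXES = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand a CURIE such as "HP:0001250" into a complete URI.

    Raises ValueError if the value has no prefix separator or if the
    prefix is not present in the supplied map.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefixes:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_id


if __name__ == "__main__":
    print(expand_curie("HP:0001250", PREFIXES))
```

A value typed as Uriorcurie could be either form, so a caller would need to decide which case applies before expanding.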

Subsets

Subset	Description