How to prioritize disease-gene pairs by calculating the similarity between a patient's clinical features and all known diseases using the Human Phenotype Ontology.

The Human Phenotype Ontology (HPO) [https://hpo.jax.org/] is a set of hierarchically structured terms widely used to describe both the typical symptoms of human diseases and the clinical phenotypes of individual patients.

HPO is a standard vocabulary for annotating disease phenotypes and has been adopted by many public disease databases. One such resource is Orphanet [https://www.orpha.net/].

Orphanet's aggregate data, which are constantly updated and include the genes and clinical symptoms associated with rare diseases, can be accessed from the Orphadata website [http://www.orphadata.org/cgi-bin/index.php].
These datasets are available in nine languages.

Several bioinformatics methods use HPO in clinical diagnostics. They generally take a description of a patient's clinical features encoded as HPO terms and return a diagnostic prediction based on the ontological similarity between the patient's symptoms and the HPO terms assigned to each disease.
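To make "ontological similarity" more concrete, here is a minimal base-R sketch on a made-up six-term ontology. Everything in it is an assumption for illustration: the term names are hypothetical (not real HPO codes), information content is derived from descendant counts, terms are compared with Lin similarity, and two term sets are combined with a best-match average. It shows one common way of scoring, not necessarily what any particular tool does.

```r
# Toy ontology: each term lists its direct parents (hypothetical terms, not HPO)
parents <- list(
  root     = character(0),
  neuro    = "root",
  muscle   = "root",
  seizure  = "neuro",
  ataxia   = "neuro",
  weakness = "muscle"
)

# All ancestors of a term, including the term itself
ancestors <- function(term) {
  unique(c(term, unlist(lapply(parents[[term]], ancestors))))
}
anc <- lapply(setNames(names(parents), names(parents)), ancestors)

# Information content: -log of the fraction of terms that are
# the term itself or one of its descendants (rarer = more informative)
n_desc <- sapply(names(parents), function(term) {
  sum(sapply(anc, function(a) term %in% a))
})
ic <- -log(n_desc / length(parents))

# Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))
lin <- function(a, b) {
  common <- intersect(anc[[a]], anc[[b]])
  denom <- ic[[a]] + ic[[b]]
  if (denom == 0) return(0)
  2 * max(ic[common]) / denom
}

# Best-match average: each term is paired with its best match in the other set
set_sim <- function(x, y) {
  m <- outer(x, y, Vectorize(lin))
  mean(c(apply(m, 1, max), apply(m, 2, max)))
}

patient   <- c("seizure", "weakness")
disease_a <- c("ataxia", "weakness")  # shares one term, and a close relative of the other
disease_b <- c("muscle")              # only a general ancestor of one patient term

print(set_sim(patient, disease_a))
print(set_sim(patient, disease_b))
```

In this toy setting disease_a scores higher than disease_b, because it matches the patient's terms more specifically.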

There is an R package that can be used, for example, to identify the genes whose alterations are most likely to explain a patient's symptoms described with HPO terms.

The package is called "ontologySimilarity" [https://cran.rstudio.com/web/packages/ontologySimilarity/vignettes/ontologySimilarity-introduction.html]. The example below computes the pairwise similarity between the HPO terms describing a patient's symptoms and the HPO terms associated with a set of gene/disease pairs taken from Orphadata (distributed in XML format).



library(ontologyIndex)
library(ontologySimilarity)

# hp.obo must be available in the working directory; it can be downloaded from
# https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.obo
ontology <- get_ontology("hp.obo")

# One set of HPO terms per gene/disease pair, plus the patient ("proband")
term_sets <- list(
  'AGRN_98914'=c('HP:0100295', 'HP:0000218', 'HP:0000276', 'HP:0001250', 'HP:0001270'),
  'AGRN_98913'=c('HP:0001324', 'HP:0000218', 'HP:0000496', 'HP:0000508', 'HP:0000597'),
  'AMPD2_401805'=c('HP:0001257', 'HP:0001276', 'HP:0001347', 'HP:0002194', 'HP:0003202'),
  'ATP6V1B2_3473'=c('HP:0000169', 'HP:0000154', 'HP:0000414', 'HP:0000445', 'HP:0001249'),
  'proband'=c('HP:0002194', 'HP:0002540', 'HP:0010536', 'HP:0002835', 'HP:0001744')
)

# Pairwise similarity matrix between all term sets
grid <- get_sim_grid(ontology=ontology, term_sets=term_sets)

# Similarity of each gene/disease pair to the proband (the proband's own row is dropped)
scores <- grid[rownames(grid) != "proband", "proband"]

# Name of the gene/disease pair(s) most similar to the proband
names(scores)[scores == max(scores)]
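
The last line returns only the top-scoring pair. As a possible extension, the sketch below ranks every gene/disease pair by its similarity to the proband. To run without a local hp.obo, it assumes the HPO snapshot bundled with ontologyIndex (data(hpo)); that snapshot may be older than the current release, so terms it does not contain are dropped first and the scores can differ from those obtained with an up-to-date hp.obo.

```r
library(ontologyIndex)
library(ontologySimilarity)

# Assumption: use the HPO snapshot shipped with ontologyIndex instead of hp.obo
data(hpo)

term_sets <- list(
  'AGRN_98914'=c('HP:0100295', 'HP:0000218', 'HP:0000276', 'HP:0001250', 'HP:0001270'),
  'AGRN_98913'=c('HP:0001324', 'HP:0000218', 'HP:0000496', 'HP:0000508', 'HP:0000597'),
  'AMPD2_401805'=c('HP:0001257', 'HP:0001276', 'HP:0001347', 'HP:0002194', 'HP:0003202'),
  'ATP6V1B2_3473'=c('HP:0000169', 'HP:0000154', 'HP:0000414', 'HP:0000445', 'HP:0001249'),
  'proband'=c('HP:0002194', 'HP:0002540', 'HP:0010536', 'HP:0002835', 'HP:0001744')
)

# Keep only the terms present in the bundled snapshot
term_sets <- lapply(term_sets, function(terms) intersect(terms, hpo$id))

# Pairwise similarity between all term sets, as in the example above
grid <- get_sim_grid(ontology=hpo, term_sets=term_sets)

# Similarity of each gene/disease pair to the proband, best match first
scores <- grid[rownames(grid) != "proband", "proband"]
ranking <- sort(scores, decreasing=TRUE)
print(ranking)
```

Looking at the full ranking, rather than only the maximum, makes it easier to see whether the best candidate clearly stands out or whether several pairs score almost the same.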
