VarPhen: A Web-Based Tool for Genotype-Phenotype Association

Elsayed Hegazy¹, Mahmoud Elhefnawi¹,²

¹Nile University, Giza, 12588, Egypt.

²National Research Centre, Cairo, Egypt.

Abstract

Personalized medicine and the growing adoption of next-generation sequencing (NGS) have increased the demand for turning genotype data into meaningful phenotype information. VarPhen is a web-based tool for this task. It is written in C# and uses dbSNP reference SNP IDs (rsIDs) as genotype input to retrieve the relevant phenotypes. VarPhen uses the ClinVar database as its source of clinical information and of the phenotypes associated with each variant.

Introduction

Next-generation sequencing workflows and pipelines are now available for analyzing raw data, from quality control and mapping through variant calling. However, very few tools interpret the resulting VCF file to generate meaningful reports on common and rare diseases. One of the most important resources for this purpose is ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/), a freely available archive of the relationships between medically important variants and phenotypes. ClinVar is a large database that reports human variation, interpretations of the relationship of that variation to human health, and the evidence supporting each interpretation. It is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies, and it relies on the phenotypic descriptions maintained in MedGen (http://www.ncbi.nlm.nih.gov/medgen). Each ClinVar record represents the submitter, the variation, and the phenotype.

The demand for interpreting VCF files into valuable knowledge and phenotypes grows steadily with the increasing demand for personal genomes.

Here we develop a web-based application that connects to ClinVar and retrieves the diseases associated with each variant listed in a VCF file or sample.

Availability and implementation

VarPhen is available at http://www.varphen.com as a web-based tool written in ASP.NET with C# code-behind. It uses the API of the NCBI ClinVar database as its source of phenotypes.
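The exact request VarPhen sends to ClinVar is not described in this paper. The following minimal sketch assumes that ClinVar is queried through NCBI's E-utilities `esearch` endpoint, with the rsID sent as a plain search term. It shows how such a request URL could be built. Python is used for illustration only, and the function name `clinvar_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def clinvar_search_url(rs_id: str) -> str:
    """Build an E-utilities esearch URL that searches ClinVar for one rsID.

    Assumption: the rsID is sent as a plain search term; the exact query
    VarPhen builds is not documented here.
    """
    params = {"db": "clinvar", "term": rs_id, "retmode": "xml"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(clinvar_search_url("rs334"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=clinvar&term=rs334&retmode=xml
```

An `esearch` response only lists the IDs of matching ClinVar records. A follow-up E-utilities call, such as `esummary` with those IDs, would be needed to obtain record details like the associated conditions.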

Review of literature

Knowledge is more valuable when shared. By contributing these tools to the research community and the healthcare industry, we aim to increase the quality and accuracy of the genetic data analysis and interpretation available to patients, physicians, and researchers.

OpenSNP is a crowdsourced web resource for personal genomics. It collects users' or patients' genotype files from different sources, such as 23andMe and deCODEme, as well as standard VCF files. It then detects variants and all relevant phenotypes.

CLINVITAE is a database of clinically observed genetic variants aggregated from public sources, similar in purpose to ClinVar. It is operated and made freely available by INVITAE.

To make CLINVITAE as informative as possible, it aggregates data from multiple public databases. Its long-term goal is to facilitate the search for clinically interpreted variants by creating a single unified resource for all interpretation results. This lets physicians and researchers save time when comparing variants across multiple platforms and resources, and make full use of the available data.

GWAS Central (also known as the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary-level findings from genetic association studies, both large and small. GWAS Central actively gathers datasets from public domain projects and encourages direct data submission from the community, improving the quality and accuracy of interpretation.

Genome-wide association studies (GWAS) have been successful at identifying some of the variation in traits attributable to genetics. The National Human Genome Research Institute (NHGRI) has begun aggregating results of association studies into a master GWAS catalog.

INTERPRETOME is a freely available and secure personal genome interpretation engine. It analyzes VCF files and turns them into knowledge about diseases drawn from GWAS results.

Another database, from QIAGEN, is the Human Gene Mutation Database (HGMD). It is an effort to collect the known published gene lesions responsible for human inherited disease.

The Diagnostic Mutation Database (DMuDB) is a secure repository of clinical-quality variant data collected from diagnostic genetics laboratories. Access to DMuDB requires an annual laboratory subscription and is restricted to diagnostic purposes.

Many databases and tools perform this kind of analysis, but very few are freely available or can be accessed programmatically from within another application.

Aim

To develop a web-based application that transforms the variants in a VCF file into knowledge by identifying which variants are pathogenic and which diseases are associated with each variant.

Methods

Technically, VarPhen is built in a straightforward way using ASP.NET Web Forms with C# back-end code. The user uploads a VCF file. VarPhen then processes the file by discarding the VCF header and reading the data lines that follow it. It reads only the third column (ID), which holds the dbSNP reference SNP ID (rsID) of the variant, and uses this value as the ClinVar input.

VarPhen also detects whether each SNP is novel, which improves performance. When a variant is novel, no web request is sent to ClinVar. When a variant is known, VarPhen sends a web request to ClinVar asking for all of the information associated with that variant. After the request is processed, VarPhen retrieves ClinVar's response as an XML file. It then parses the XML to extract the phenotypes associated with the variant of interest.

VarPhen manages its web requests to the ClinVar API with the C# Queue data structure. Because a queue follows the first-in, first-out (FIFO) principle, variants are requested in the order they were read. All phenotypes are stored as strings in another C# data structure, List. Finally, the presentation layer populates an ASP.NET GridView with the list of phenotypes, which is displayed as a table in the user interface.
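The header-skipping and novelty check described above can be sketched in Python as follows. This is an illustration, not VarPhen's C# code. The function name `read_variant_ids` is hypothetical. The sketch also assumes that a variant counts as "novel" when its VCF ID column carries no rs identifier; VCF writes a missing ID as `.`.

```python
from typing import Iterable, List, Tuple


def read_variant_ids(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return (rsIDs of known variants in file order, number of novel variants).

    Assumption: a data line whose ID column holds no "rs" identifier
    (VCF writes a missing ID as ".") is treated as a novel variant.
    """
    known: List[str] = []
    novel = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines, "##" meta-information lines and the "#CHROM" header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # malformed line: no ID column to read
        # Third column (ID); it may hold several identifiers separated by ";".
        rs_ids = [v for v in fields[2].split(";") if v.startswith("rs")]
        if rs_ids:
            known.extend(rs_ids)
        else:
            novel += 1
    return known, novel


# Made-up records for illustration only.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\trs123\tA\tG\t.\tPASS\t.",
    "1\t2000\t.\tC\tT\t.\tPASS\t.",
    "2\t3000\trs456;COSM1\tG\tA\t.\tPASS\t.",
]
known, novel = read_variant_ids(sample)
print(known, novel)  # ['rs123', 'rs456'] 1
```

Iterating over a text file yields its lines, so an uploaded file can be passed directly, for example `with open("sample.vcf") as f: read_variant_ids(f)`.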

Figure 1- VarPhen flowchart

Figure 1 shows the flowchart of the VarPhen tool. The process starts with the VCF file and then checks whether each variant is novel. No web request is created for a novel variant. The API is called only for known variants, to retrieve the XML file containing the phenotype data.
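The flow in Figure 1, from the variant IDs to the phenotype table, can be sketched as below. VarPhen itself uses the C# `Queue` and `List` types; this Python sketch uses `collections.deque` and `list` as their counterparts. The sketch makes several assumptions:

- The `<Phenotype>` XML layout is a simplified, hypothetical stand-in for ClinVar's real response schema.
- `fetch_xml` is a caller-supplied function standing in for the web request, so the sketch runs without network access.
- The names `extract_phenotypes` and `annotate` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


def extract_phenotypes(xml_text: str) -> List[str]:
    """Return the text of every <Phenotype> element in an XML response.

    Assumption: simplified, hypothetical response layout; ClinVar's real
    XML uses a richer schema with different element names.
    """
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("Phenotype")
            if el.text and el.text.strip()]


def annotate(variant_ids: Iterable[str],
             fetch_xml: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Follow the Figure 1 flow and return (variant ID, phenotype) table rows.

    fetch_xml stands in for the web request to ClinVar; it is called once
    per known variant and must return the response body as XML text.
    """
    pending: Deque[str] = deque()
    for vid in variant_ids:
        if vid == ".":
            continue  # novel variant: no web request is created
        pending.append(vid)

    rows: List[Tuple[str, str]] = []  # one row per phenotype, for the results table
    while pending:
        vid = pending.popleft()  # first in, first out
        for phenotype in extract_phenotypes(fetch_xml(vid)):
            rows.append((vid, phenotype))
    return rows


# Usage with canned responses instead of real web requests (made-up data).
canned = {
    "rs123": "<Result><Phenotype>Disease A</Phenotype>"
             "<Phenotype>Disease B</Phenotype></Result>",
    "rs456": "<Result/>",
}
rows = annotate(["rs123", ".", "rs456"], canned.__getitem__)
for vid, phenotype in rows:
    print(f"{vid}\t{phenotype}")  # two rows for rs123; rs456 has no phenotypes
```

Passing the request function in as a parameter keeps the queueing and parsing logic testable offline. A real deployment would replace it with an HTTP call and respect NCBI's request-rate limits.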

Conclusion

VarPhen is one of the easiest ways to find the phenotypes associated with a specific VCF file. It targets users with no programming experience and requires no registration. Its user-friendly interface makes it simple to use.