The current challenge in human molecular genetics is to bridge the gap between genotype and phenotype, particularly in highly heterogeneous monogenic disorders, such as retinitis pigmentosa (RP), as well as in polygenic and complex diseases. After many years of research by many groups, more than 70 non-syndromic RP genes have been identified so far, but around 30% of the cases remain genetically unassigned. Network analysis drawing on a wealth of heterogeneous data sources is particularly useful to unveil functional clues and pinpoint unsuspected relevant genes. We present here the methodology used to gather information from several publicly available datasets, generate a computational analysis framework on a core network data structure, and provide a user-friendly web interface (RPGeNet) to explore that network. This tool will help researchers in human genetics and related fields to understand the molecular role of the reported genes for RP (and for the closely related disease, Leber congenital amaurosis), and will provide a rationale for identifying novel candidates. The identification of key molecular nodes in this type of interaction network will be instrumental in optimizing diagnosis and devising efficient future therapeutic approaches.
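As a rough illustration of how an interaction network can point to unsuspected genes, the sketch below collects the genes lying on shortest paths between known disease genes. It is a minimal example under stated assumptions, not the RPGeNet implementation: the gene names, the edge list and the function names are all hypothetical placeholders.

```python
from collections import deque

# Hypothetical, made-up interactions; placeholders, not real RP data.
EDGES = [
    ("DRIVER_1", "GENE_X"),
    ("GENE_X", "DRIVER_2"),
    ("DRIVER_2", "GENE_Y"),
    ("GENE_Y", "GENE_Z"),
    ("GENE_Z", "DRIVER_3"),
    ("GENE_W", "GENE_X"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; return one shortest path as a list, or None."""
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == target:
                # Walk back from the target to the source.
                path = [neighbor]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


def connecting_genes(graph, drivers):
    """Return non-driver genes found on a shortest path between two drivers."""
    ordered = sorted(drivers)
    found = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            path = shortest_path(graph, a, b)
            if path:
                found.update(path[1:-1])
    return found - set(drivers)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(sorted(connecting_genes(graph, {"DRIVER_1", "DRIVER_2", "DRIVER_3"})))
```

In this toy network GENE_W is left out because it does not sit between any two driver genes, which is the kind of filtering that makes the connecting genes plausible candidates for follow-up.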
To analyze the transcriptional repertoire of CERKL in human and mouse, both computational and wet-lab experiments were conducted. The genomic sequences were explored to determine the promoters and the corresponding transcription start sites (TSSs), the alternative splicing variants, and the different putative translation initiation sites (TISs). Together, these features are compatible with a wide display of functional domains and contribute to the final mRNA and protein complexity of this gene.
Starting from human MMP sequences, four S. mediterranea homologs were found in planarian transcriptomes. They were cloned from those sequences to validate them experimentally and to describe their phenotypes further. PFAM domains related to the Matrix Metalloproteinase proteins were mapped onto the set of homologous sequences for distinct families of Matrix Metalloproteinase genes. A phylogenetic reconstruction was computed with RAxML from the protein sequence alignment. Both datasets were merged to produce the image on the left using iTOL.
Taking advantage of digital gene expression (DGE) sequencing technology, we compare all the available transcriptomes for S. mediterranea and improve their annotation. These results are accessible via the web to the research community. Using the quantitative nature of DGE, we describe the transcriptional profile of neoblasts and present 42 new neoblast genes, including several cancer-related genes and transcription factors. Furthermore, we describe in detail the Smed-meis-like gene and the three Nuclear Factor Y subunits Smed-nf-YA, Smed-nf-YB-2 and Smed-nf-YC. In conclusion, we found that DGE is a valuable tool in our case for gene discovery, quantification and annotation. The application of DGE in S. mediterranea confirms that the planarian stem cells, or neoblasts, are a complex population of pluripotent and multipotent cells regulated by a mixture of transcription factors and cancer-related genes.
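The quantitative use of DGE mentioned above rests on comparing tag counts between libraries of different sizes. The following is a minimal sketch of one common way to do that, scaling counts to tags per million and taking a log2 ratio with a pseudocount. It is an assumption for illustration and not necessarily the procedure used in this work; the function names, the pseudocount value and the example counts are hypothetical.

```python
import math


def tags_per_million(counts):
    """Scale raw tag counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no tags")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


def log2_ratio(value_a, value_b, pseudocount=1.0):
    """Log2 ratio of two normalized values; the pseudocount avoids log(0)."""
    return math.log2((value_a + pseudocount) / (value_b + pseudocount))


if __name__ == "__main__":
    # Made-up counts for two hypothetical libraries.
    library_a = tags_per_million({"gene_1": 250, "gene_2": 750})
    library_b = tags_per_million({"gene_1": 100, "gene_2": 900})
    for gene in library_a:
        print(gene, round(log2_ratio(library_a[gene], library_b[gene]), 3))
```

Normalizing before comparing keeps a deeper-sequenced library from appearing to express every gene more strongly.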
A detailed description of the planarian transcriptome is essential for future investigation into regenerative processes using planarians as a model system. In order to obtain the most representative set of planarian genes expressed under different physiological conditions, total RNA was isolated from a mixture of non-irradiated and irradiated, intact and regenerating planarians of the species Schmidtea mediterranea. We performed sequence analyses on the assemblies of reads obtained by 454 sequencing from that pool of transcripts. Among those analyses, functional annotation was useful for identifying putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function.
The genome sequencing of S. mediterranea and some EST projects generated interesting data to delineate the features of neoblast cells. Some molecular aspects are not reflected at either the genomic or the transcriptomic level, and little information exists on protein dynamics. This work attempted to open a new, unexplored area in the planarian research field. We developed a proteomics strategy to identify and characterize neoblast-specific proteins. In this paper we describe the method and discuss the results in comparison with the genomic analyses carried out in planarians, as well as with proteomic studies using other stem cell model systems.
Josep F. Abril was involved, during his PhD thesis at Roderic Guigó's lab, in the human ENCODE Genome Annotation Assessment Project (EGASP). An analysis workshop was held at the Wellcome Trust Conference Center in Hinxton, UK, on May 6-7, 2005. You can get more information about the event and the results of the evaluations from the following links:
We have participated in the evaluation of the submissions by gene prediction groups to the RNA-seq GASP (RGASP). An analysis workshop was held at the Wellcome Trust Conference Center in Hinxton, UK, on November 10-11, 2009. We provide below the related links, including the web page summarizing the whole set of evaluations:
The original document is available at https://compgen.bio.ub.edu/CompGenOld/Research