Since the initial annotation of miRNAs from cloned short RNAs by the Ambros, Tuschl, and Bartel groups in 2001, more than a hundred studies have sought to identify additional miRNAs in various species. We report here a meta-analysis of short RNA data from Drosophila melanogaster, aggregating published libraries with 76 data sets that we generated for the modENCODE project. In total, we began with more than 1 billion raw reads from 187 libraries comprising diverse developmental stages, specific tissue- and cell-types, mutant conditions, and/or Argonaute immunoprecipitations. We elucidated several features of known miRNA loci, including multiple phased byproducts of cropping and dicing, abundant alternative 5' termini of certain miRNAs, frequent 3' untemplated additions, and potential editing events. We also identified 49 novel genomic locations of miRNA production, and 61 additional candidate loci with limited evidence for miRNA biogenesis. Although these loci broaden the Drosophila miRNA catalog, this work supports the notion that a restricted set of cellular transcripts is competent to be specifically processed by the Drosha/Dicer-1 pathway. Unexpectedly, we detected miRNA production from coding and untranslated regions of mRNAs and found the phenomenon of miRNA production from the antisense strand of known loci to be common. Altogether, this study lays a comprehensive foundation for the study of miRNA diversity and evolution in a complex animal model.
Drosophila melanogaster is one of the best-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
BACKGROUND: High-throughput screening using RNAi is a powerful gene discovery method but is often complicated by false positive and false negative results. Whereas false positive results associated with RNAi reagents have been a matter of extensive study, the issue of false negatives has received less attention. RESULTS: We performed a meta-analysis of several genome-wide, cell-based Drosophila RNAi screens, together with a more focused RNAi screen, and conclude that the rate of false negative results is at least 8%. Further, we demonstrate how knowledge of the cell transcriptome can be used to resolve ambiguous results and how the number of false negative results can be reduced by using multiple, independently-tested RNAi reagents per gene. CONCLUSIONS: RNAi reagents that target the same gene do not always yield consistent results due to false positives and weak or ineffective reagents. False positive results can be reduced in part by filtering with transcriptome data. RNAi libraries with multiple reagents per gene also reduce false positive and false negative outcomes when inconsistent results are disambiguated carefully.
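The disambiguation logic described above (several independent reagents per gene, filtered by whether the gene is expressed in the screened cell line) might be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `call_gene`, the `min_agreeing` threshold, and the category labels are hypothetical and are not taken from the study.

```python
def call_gene(reagent_hits, expressed, min_agreeing=2):
    """Classify one gene from per-reagent screen outcomes (illustrative only).

    reagent_hits: list of bools, one per independent RNAi reagent
                  (True = the reagent scored as a hit).
    expressed:    True if transcriptome data indicate that the gene is
                  expressed in the screened cell line.
    min_agreeing: hypothetical threshold for how many reagents must agree.
    """
    n_hits = sum(reagent_hits)
    if n_hits == 0:
        return "no phenotype"
    if not expressed:
        # A hit for a gene that is not expressed is suspicious.
        return "likely false positive"
    if n_hits >= min_agreeing:
        return "confident hit"
    # Some reagents scored and others did not: needs follow-up.
    return "ambiguous"
```

For example, `call_gene([True, False], expressed=True)` is labelled ambiguous, which is the situation in which a weak or ineffective second reagent could otherwise hide a true hit.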
The DNA damage checkpoint, the first pathway known to be activated in response to DNA damage, is a mechanism by which the cell cycle is temporarily arrested to allow DNA repair. The checkpoint pathway transmits signals from the sites of DNA damage to the cell cycle machinery through the evolutionarily conserved ATM (ataxia telangiectasia mutated) and ATR (ATM- and Rad3-related) kinase cascades. We conducted a genome-wide RNAi (RNA interference) screen in Drosophila cells to identify previously unknown genes and pathways required for the G₂-M checkpoint induced by DNA double-strand breaks (DSBs). Our large-scale analysis provided a systems-level view of the G₂-M checkpoint and revealed the coordinated actions of particular classes of proteins, which include those involved in DNA repair, DNA replication, cell cycle control, chromatin regulation, and RNA processing. Further, from the screen and in vivo analysis, we identified previously unrecognized roles of two DNA damage response genes, mus101 and mus312. Our results suggest that the DNA replication preinitiation complex, which includes MUS101, and the MUS312-containing nuclease complexes, which are important for DSB repair, also function in the G₂-M checkpoint. Our results provide insight into the diverse mechanisms that link DNA damage and the checkpoint signaling pathway.
Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are "off" and survival/growth pathways "on." Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common "cell line" gene expression pattern.
MicroRNAs (miRNAs) regulate numerous biological processes by base-pairing with target messenger RNAs (mRNAs), primarily through sites in 3' untranslated regions (UTRs), to direct the repression of these targets. Although miRNAs have sometimes been observed to target genes through sites in open reading frames (ORFs), large-scale studies have shown such targeting to be generally less effective than 3' UTR targeting. Here, we show that several miRNAs each target significant groups of genes through multiple sites within their coding regions. This ORF targeting, which mediates both predictable and effective repression, arises from highly repeated sequences containing miRNA target sites. We show that such sequence repeats largely arise through evolutionary duplications and occur particularly frequently within families of paralogous C2H2 zinc-finger genes, suggesting the potential for their coordinated regulation. Examples of ORFs targeted by miR-181 include both the well-known tumor suppressor RB1 and RBAK, encoding a C2H2 zinc-finger protein and transcriptional binding partner of RB1. Our results indicate a function for repeat-rich coding sequences in mediating post-transcriptional regulation and reveal circumstances in which miRNA-mediated repression through ORF sites can be reliably predicted.
Polyglutamine (polyQ) diseases are a group of late-onset, progressive neurodegenerative disorders caused by CAG trinucleotide repeat expansion in the coding region of disease genes. The cell nucleus is an important site of pathology in polyQ diseases, and transcriptional dysregulation is one of the pathologic hallmarks observed. In this study, we showed that exportin-1 (Xpo1) regulates the nucleocytoplasmic distribution of expanded polyQ protein. We found that expanded polyQ protein, but not its unexpanded form, possesses nuclear export activity and interacts with Xpo1. Genetic manipulation of Xpo1 expression levels in transgenic Drosophila models of polyQ disease confirmed the specific nuclear export role of Xpo1 on expanded polyQ protein. Upon Xpo1 knockdown, the expanded polyQ protein was retained in the nucleus. The nuclear disease protein enhanced polyQ toxicity by binding to the heat shock protein (hsp) gene promoter and abolishing hsp gene induction. Further, we uncovered a developmental decline of Xpo1 protein levels in vivo that contributes to the accumulation of expanded polyQ protein in the nucleus of symptomatic polyQ transgenic mice. Taken together, we show for the first time that Xpo1 is a nuclear export receptor for the expanded polyQ domain, and our findings establish a direct link between protein nuclear export and the progressive nature of polyQ neurodegeneration.
Existing transgenic RNAi resources in Drosophila melanogaster based on long double-stranded hairpin RNAs are powerful tools for functional studies, but they are ineffective in gene knockdown during oogenesis, an important model system for the study of many biological questions. We show that shRNAs, modeled on an endogenous microRNA, are extremely effective at silencing gene expression during oogenesis. We also describe our progress toward building a genome-wide shRNA resource.
The adult Drosophila midgut is thought to arise from an endodermal rudiment specified during embryogenesis. Previous studies have reported the presence of individual cells termed adult midgut precursors (AMPs) as well as "midgut islands" or "islets" in embryonic and larval midgut tissue. Yet the precise relationship between progenitor cell populations and the cells of the adult midgut has not been characterized. Using a combination of molecular markers and directed cell lineage tracing, we provide evidence that the adult midgut arises from a molecularly distinct population of single cells present by the embryonic/larval transition. AMPs reside in a distinct basal position in the larval midgut where they remain through all subsequent larval and pupal stages and into adulthood. At least five phases of AMP activity are associated with the stepwise process of midgut formation. Our data show that during larval stages AMPs give rise to the presumptive adult epithelium; during pupal stages AMPs contribute to the final size, cell number and form. Finally, a genetic screen has led to the identification of the Ecdysone receptor as a regulator of AMP expansion.
BACKGROUND: Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. RESULTS: We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). CONCLUSIONS: DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
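The idea of integrating several prediction tools and letting the user trade sensitivity against specificity can be illustrated with a simple vote count. The sketch below is not DIOPT's actual implementation: the tool names, gene identifiers, and function names are all hypothetical, and it assumes only that each tool reports a set of (query gene, predicted ortholog) pairs.

```python
from collections import defaultdict


def integrate_predictions(predictions):
    """Count how many tools support each (query_gene, ortholog) pair.

    predictions: dict mapping a tool name to a set of
                 (query_gene, ortholog) tuples reported by that tool.
    """
    supporters = defaultdict(set)
    for tool, pairs in predictions.items():
        for pair in pairs:
            supporters[pair].add(tool)
    return {pair: len(tools) for pair, tools in supporters.items()}


def filter_by_score(scores, min_score):
    """Keep only the pairs supported by at least min_score tools."""
    return {pair: s for pair, s in scores.items() if s >= min_score}


# Hypothetical predictions from three made-up tools.
example_predictions = {
    "tool_a": {("geneX_fly", "GENEX1_human"), ("geneX_fly", "GENEX2_human")},
    "tool_b": {("geneX_fly", "GENEX1_human")},
    "tool_c": {("geneX_fly", "GENEX1_human"), ("geneY_fly", "GENEY_human")},
}

scores = integrate_predictions(example_predictions)
wide_net = filter_by_score(scores, min_score=1)         # most sensitive
high_confidence = filter_by_score(scores, min_score=3)  # most specific
```

Lowering `min_score` corresponds to casting a wide net, while raising it limits the output to pairs on which the hypothetical tools agree.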
Characterizing the extent and logic of signaling networks is essential to understanding specificity in such physiological and pathophysiological contexts as cell fate decisions and mechanisms of oncogenesis and resistance to chemotherapy. Cell-based RNA interference (RNAi) screens enable the inference of large numbers of genes that regulate signaling pathways, but these screens cannot provide network structure directly. We describe an integrated network around the canonical receptor tyrosine kinase (RTK)-Ras-extracellular signal-regulated kinase (ERK) signaling pathway, generated by combining parallel genome-wide RNAi screens with protein-protein interaction (PPI) mapping by tandem affinity purification-mass spectrometry. We found that only a small fraction of the total number of PPI or RNAi screen hits was isolated under all conditions tested and that most of these represented the known canonical pathway components, suggesting that much of the core canonical ERK pathway is known. Because most of the newly identified regulators are likely cell type- and RTK-specific, our analysis provides a resource for understanding how output through this clinically relevant pathway is regulated in different contexts. We report in vivo roles for several of the previously unknown regulators, including CG10289 and PpV, the Drosophila orthologs of two components of the serine/threonine-protein phosphatase 6 complex; the Drosophila ortholog of TepIV, a glycophosphatidylinositol-linked protein mutated in human cancers; CG6453, a noncatalytic subunit of glucosidase II; and Rtf1, a histone methyltransferase.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
MicroRNAs (miRNAs) are a class of short noncoding RNAs that regulate protein-coding genes posttranscriptionally. In animals, most known miRNA targeting occurs within the 3'UTR of mRNAs, but the extent of biologically relevant targeting in the ORF or 5'UTR of mRNAs remains unknown. Here, we develop an algorithm (MinoTar-miRNA ORF Targets) to identify conserved regulatory motifs within protein-coding regions and use it to estimate the number of preferentially conserved miRNA-target sites in ORFs. We show that, in Drosophila, preferentially conserved miRNA targeting in ORFs is as widespread as it is in 3'UTRs and that, while far less abundant, conserved targets in Drosophila 5'UTRs number in the hundreds. Using our algorithm, we predicted a set of high-confidence ORF targets and selected seven miRNA-target pairs from among these for experimental validation. We observed down-regulation by the miRNA in five out of seven cases, indicating our approach can recover functional sites with high confidence. Additionally, we observed additive targeting by multiple sites within a single ORF. Altogether, our results demonstrate that the scale of biologically important miRNA targeting in ORFs is extensive and that computational tools such as ours can aid in the identification of such targets. Further evidence suggests that our results extend to mammals, but that the extent of ORF and 5'UTR targeting relative to 3'UTR targeting may be greater in Drosophila.
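The first step of any such search, locating candidate miRNA sites in a coding sequence, can be sketched as below. This is only a minimal illustration and not the MinoTar algorithm itself, which additionally evaluates conservation across species. It assumes the common convention that a canonical site is complementary to miRNA nucleotides 2-8 (the seed); the miRNA and transcript sequences are made up for the example.

```python
def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def seed_site(mirna):
    """Target-site sequence complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice covering 1-based positions 2..8
    return reverse_complement_rna(seed)


def find_seed_matches(transcript, mirna):
    """Return 0-based start positions of all seed-match sites, overlaps included."""
    site = seed_site(mirna)
    positions = []
    start = transcript.find(site)
    while start != -1:
        positions.append(start)
        start = transcript.find(site, start + 1)
    return positions


# Hypothetical miRNA and toy transcript fragment, for illustration only.
toy_mirna = "UAGCAUCGGAUCCAUAGCUAGG"
toy_site = seed_site(toy_mirna)
toy_fragment = "AUGGCC" + toy_site + "AAA" + toy_site + "UAA"
toy_positions = find_seed_matches(toy_fragment, toy_mirna)
```

A fragment with two sites, as in this toy case, is the kind of input for which the additive effect of multiple sites within one ORF would be relevant.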
Adult structures in holometabolous insects such as Drosophila are generated by groups of imaginal cells dedicated to the formation of different organs. Imaginal cells are specified in the embryo and remain quiescent until the larval stages, when they proliferate and differentiate to form organs. The Drosophila tracheal system is extensively remodeled during metamorphosis by a small number of airway progenitors. Among these, the spiracular branch tracheoblasts are responsible for the generation of the pupal and adult abdominal airways. To understand the coordination of proliferation and differentiation during organogenesis of tubular organs, we analyzed the remodeling of Drosophila airways during metamorphosis. We show that the embryonic spiracular branch tracheoblasts are multipotent cells that express the homeobox transcription factor Cut, which is necessary for their survival and normal development. They give rise to three distinct cell populations at the end of larval development, which generate the adult tracheal tubes, the spiracle and the epidermis surrounding the spiracle. Our study establishes the series of events that lead to the formation of an adult tubular structure in Drosophila.
Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.
Protein aggregates are a common pathological feature of most neurodegenerative diseases (NDs). Understanding their formation and regulation will help clarify their controversial roles in disease pathogenesis. To date, there have been few systematic studies of aggregate formation in Drosophila, a model organism that has been applied extensively in modeling NDs and screening for toxicity modifiers. We generated transgenic fly lines that express enhanced-GFP-tagged mutant Huntingtin (Htt) fragments with different lengths of polyglutamine (polyQ) tract and showed that these Htt mutants develop protein aggregates in a polyQ-length- and age-dependent manner in Drosophila. To identify central regulators of protein aggregation, we further generated stable Drosophila cell lines expressing these Htt mutants and also established a cell-based quantitative assay that allows automated measurement of aggregates within cells. We then performed a genomewide RNA interference screen for regulators of mutant Htt aggregation and isolated 126 genes involved in diverse cellular processes. Interestingly, although our screen focused only on mutant Htt aggregation, several of the identified candidates were known previously as toxicity modifiers of NDs. Moreover, modulating the in vivo activity of hsp110 (CG6603) or tra1, two hits from the screen, affects neurodegeneration in a dose-dependent manner in a Drosophila model of Huntington's disease. Thus, other aggregate regulators isolated in our screen may identify additional genes involved in the protein-folding pathway and neurotoxicity.
Identification of the signaling pathways that control the proliferation of stem cells (SCs), and whether they act in a cell or non-cell autonomous manner, is key to our understanding of tissue homeostasis and cancer. In the adult Drosophila midgut, the Jun N-Terminal Kinase (JNK) pathway is activated in damaged enterocyte cells (ECs) following injury. This leads to the production of Upd cytokines from ECs, which in turn activate the Janus kinase (JAK)/Signal transducer and activator of transcription (STAT) pathway in Intestinal SCs (ISCs), stimulating their proliferation. In addition, the Hippo pathway has been recently implicated in the regulation of Upd production from the ECs. Here, we show that the Hippo pathway target, Yorkie (Yki), also plays a crucial and cell-autonomous role in ISCs. Activation of Yki in ISCs is sufficient to increase ISC proliferation, a process involving Yki target genes that promote division, survival and the Upd cytokines. We further show that prior to injury, Yki activity is constitutively repressed by the upstream Hippo pathway members Fat and Dachsous (Ds). These findings demonstrate a cell-autonomous role for the Hippo pathway in SCs, and have implications for understanding the role of this pathway in tumorigenesis and cancer stem cells.
Biological networks are highly complex systems, consisting largely of enzymes that act as molecular switches to activate/inhibit downstream targets via post-translational modification. Computational techniques have been developed to perform signaling network inference using some high-throughput data sources, such as those generated from transcriptional and proteomic studies, but comparable methods have not been developed to use high-content morphological data, which are emerging principally from large-scale RNAi screens, to these ends. Here, we describe a systematic computational framework based on a classification model for identifying genetic interactions using high-dimensional single-cell morphological data from genetic screens, apply it to RhoGAP/GTPase regulation in Drosophila, and evaluate its efficacy. Augmented by knowledge of the basic structure of RhoGAP/GTPase signaling, namely, that GAPs act directly upstream of GTPases, we apply our framework for identifying genetic interactions to predict signaling relationships between these proteins. We find that our method makes mediocre predictions using only RhoGAP single-knockdown morphological data, yet achieves vastly improved accuracy by including original data from a double-knockdown RhoGAP genetic screen, which likely reflects the redundant network structure of RhoGAP/GTPase signaling. We consider other possible methods for inference and show that our primary model outperforms the alternatives. This work demonstrates the fundamental fact that high-throughput morphological data can be used in a systematic, successful fashion to identify genetic interactions and, using additional elementary knowledge of network structure, to infer signaling relations.
Akt represents a nodal point between the Insulin receptor and TOR signaling, and its activation by phosphorylation controls cell proliferation, cell size, and metabolism. The activity of Akt must be carefully balanced, as increased Akt signaling is frequently associated with cancer and as insufficient Akt signaling is linked to metabolic disease and diabetes mellitus. Using a genome-wide RNAi screen in Drosophila cells in culture, and in vivo analyses in the third instar wing imaginal disc, we studied the regulatory circuitries that define dAkt activation. We provide evidence that negative feedback regulation of dAkt occurs during normal Drosophila development in vivo. Whereas in cell culture dAkt is regulated by S6 Kinase (S6K)-dependent negative feedback, this feedback inhibition only plays a minor role in vivo. In contrast, dAkt activation under wild-type conditions is defined by feedback inhibition that depends on TOR Complex 1 (TORC1), but is S6K-independent. This feedback inhibition is switched from TORC1 to S6K only in the context of enhanced TORC1 activity, as triggered by mutations in tsc2. These results illustrate how the Akt-TOR pathway dynamically adapts the routing of negative feedback in response to the activity load of its signaling circuit in vivo.
The progressive loss of muscle strength during aging is a common degenerative event of unclear pathogenesis. Although muscle functional decline precedes age-related changes in other tissues, its contribution to systemic aging is unknown. Here, we show that muscle aging is characterized in Drosophila by the progressive accumulation of protein aggregates that associate with impaired muscle function. The transcription factor FOXO and its target 4E-BP remove damaged proteins at least in part via the autophagy/lysosome system, whereas foxo mutants have dysfunctional proteostasis. Both FOXO and 4E-BP delay muscle functional decay and extend life span. Moreover, FOXO/4E-BP signaling in muscles decreases feeding behavior and the release of insulin from producing cells, which in turn delays the age-related accumulation of protein aggregates in other tissues. These findings reveal an organism-wide regulation of proteostasis in response to muscle aging and a key role of FOXO/4E-BP signaling in the coordination of organismal and tissue aging.