Why do genes require promoters?




















We have developed a custom microarray platform that tiles roughly 35, alternative putative promoters from nearly 7, genes in the human genome. Most intriguingly, we show that the downstream promoter in E2-sensitive multiple promoter genes tends to be very close to the 3'-terminus of the gene, suggesting exotic mechanisms of expression regulation in these genes. The usage of alternative promoters greatly multiplies the transcriptional complexity available within the human genome.

The fact that many of these promoters are incapable of driving the synthesis of a meaningful protein-encoding transcript further complicates the story.

The regulation of human gene expression is known to be an extraordinarily complex process, including transcription, mRNA processing, mRNA transport, mRNA stability, mRNA translation, protein modification and protein stability.

Nevertheless, the picture that has emerged over the past two to three decades is one in which the process of transcription itself is a highly regulated process [ 1 ], and one could easily believe that the combinatorial interaction of multiple transcription factors within the gene promoter is sufficient to explain this complexity. However, genes with more than one promoter have been known for some time [ 2 ], and recent studies using independent lines of evidence have suggested that a large proportion of the human genome is transcribed from both strands [ 3 ] and numerous human genes have more than one promoter allowing gene transcription in different cellular conditions [ 4 — 7 ].

As summarized in Figure 1A , alternative promoters can take many different forms, producing a wide variety of transcripts and proteins from a single gene locus. Moreover, the use of alternative transcription initiation sites also affects the splicing pattern of downstream exons, creating a variety of different transcripts and protein products [ 8 ].

Needless to say, these various promoters greatly increase the regulatory control that the cell has over the expression of the gene.

Alternative promoters can take on several forms (A): two promoters on a single exon (top); alternative first exons (middle); a downstream promoter located within the intron region of another isoform (bottom). The median number of promoters per gene on our microarray is three (B). There are a significant number of single-promoter genes on the array, but these invariably share a bidirectional promoter with multi-promoter genes.

Alternative promoters are of particular interest because their aberrant expression has been linked to a number of diseases, particularly cancer. CYP19A1 is a well-characterized example that has five known alternative promoters, many of which are separated by more than 10 kb and are therefore regulated by completely non-overlapping promoters [ 19 ]. Alternative first exons Ex Additionally, in gonads, the transcription starts just 39 bp upstream of translation initiation codon in exon The use of alternative non-coding first exons in the CYP19A1 transcripts does not alter the protein sequence, as the different 5'UTRs splice into a common second exon exon-2 that contains the translation initiation codon.

It is known that these various promoters are used in a tissue-dependent manner [ 19 ], but the promoter upstream of exon Ex Many putative gene promoters have been identified either through mapping of expressed sequence tags (ESTs) to the genome (Acembly [ 20 ], ECGene [ 21 ]), through sequence conservation studies with other organisms [ 22 ] or de novo computational prediction e.

Databases such as MPromDb [ 25 ] and H-DBAS [ 26 ] provide information about well-curated promoters and alternative spliced transcripts identified by aligning completely sequenced and precisely annotated full-length cDNAs [ 4 ]. Recently, intensive efforts have been invested in establishing genome-wide profiling methods to identify the regulatory regions, including alternative transcription start sites and the upstream promoter regions in human and mouse genomes [ 27 ].

Currently, three approaches have been applied for this purpose. One is based on the decreased nucleosome occupancy and increased DNase sensitivity of active promoter regions. Two methods of this kind, called DNase-chip and DNase-array, have been developed to detect transcribed promoters and transcripts [ 28 , 29 ].

The data from these studies provide evidence of large-scale alternative splicing and wide-spread use of alternative promoters throughout the mammalian genomes. Most of these methods cannot predict the mRNA sequence produced from that promoter, and therefore constructing a traditional cDNA microarray to detect their expression is impossible.

Moreover, two promoters may produce mRNA isoforms that are nearly indistinguishable, again making expression microarrays difficult to design. Although there is evidence that the presence of RNA polymerase II in the promoter does not perfectly correlate with active transcription [ 32 ], there does exist a correlation between the two events and therefore RNA polymerase II binding is a good approximation of transcriptional activity [ 33 , 34 ]. Here, we have taken an intermediate approach, where we first annotated all possible putative promoters in the human genome by integrative bioinformatics analyses.

Using these annotations, we designed mer probes complementary to sequences and tiling the core promoter regions both known and putative of a subset of genes that have at least two annotated promoters. It is well known that estrogen receptor can act both as an activator and repressor of specific target genes, and that these events can then affect cell division and breast cancer progression [ 35 , 36 ].

Knowledge about which of the alternative promoters of the ER regulated genes are active and inactive in E2 treated and untreated conditions in MCF7 cells would lead to better understanding of their effects in breast cancer development.

Several novel putative promoters were found to be active before and after E2 treatment. Interestingly, we found that in genes with more than one putative promoter, downstream promoters are much more likely to be affected by E2 treatment than upstream promoters, suggesting interesting mechanisms of gene regulation in multiple promoter genes.

In order to design an alternative promoter array, we first used a bioinformatics approach to annotate all known and putative promoters in the human genome. We took a gene-centric approach to our microarray design, choosing genes that had two or more known or putative promoters. In the end, about 34, known or putative promoters were selected for our array, covering about 7, genes. The median number of promoters per gene is three (Figure 1B).

Because of limitations on probe design, not all regions could be effectively covered, but on average the spacing is approximately 80 bases from the end of one probe to the beginning of the next. The amplified immunoprecipitated DNA and the input control, labeled with Cy5 and Cy3 fluorescent dyes respectively, were used to probe the alternative promoter microarray (Figure 2A).

Each experiment was repeated once to determine the reproducibility of the probe hybridization intensities. After filtering out the low-quality spots, we performed intensity-dependent Lowess normalization. The MA plot for normalized data from one control experiment (before E2 treatment) is shown in Figure 2B.
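The coordinates of an MA plot can be illustrated with a short sketch. It assumes the standard two-colour definitions, M = log2(R/G) and A = 0.5 * log2(R * G), with red as the Cy5-labeled immunoprecipitated DNA and green as the Cy3-labeled input. The function name and the intensity values are hypothetical, and the Lowess fit itself (which would be subtracted from M as a function of A) is not shown.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from background-corrected intensities.

    M is the log ratio of the two channels; A is the mean log intensity.
    """
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a

# Hypothetical (Cy5, Cy3) intensities for three spots; not data from the study.
spots = [(800.0, 200.0), (150.0, 150.0), (1200.0, 300.0)]
ma = [ma_values(r, g) for r, g in spots]

for (r, g), (m, a) in zip(spots, ma):
    print(f"R={r:7.1f} G={g:7.1f}  M={m:5.2f}  A={a:5.2f}")
```

A spot with equal signal in both channels has M = 0, which is why unbound probes are expected to cluster around zero after normalization.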

We then plotted the distribution of the normalized log ratios of red and green intensities. The histogram in Figure 2C presents the log ratios for one control experiment, which shows a clear bi-modal distribution. The distribution with mode close to zero represents the probes that are non-responsive and the distribution with mode close to 2.

The Expectation Maximization (EM) algorithm of Khalili et al. [ 39 ] was modified from the original Gamma-Normal-Gamma fit to a simple Gamma-Normal fit that appeared to be more appropriate for our data. The algorithm clearly defines two distinct distributions in Figure 2C, representing the unbound probes in red and the bound probes in green. See Additional File 1 for the MA plots and log ratio distribution of data from other experiments.

A nice feature of the algorithm is that probes can be assigned to each distribution with a certain probability, allowing us to increase or reduce the stringency of our assignments easily. We defined strong candidates for RNA Polymerase II activity as those probes that fell within the green distribution with a p-value of at most 0. However, we also defined a second, weaker condition: those probes that are not significantly part of the larger unbound red distribution at a p-value of 0. This latter group would encompass the "grey area" that lies between the two distributions.
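The idea of assigning each probe to a distribution with a probability can be sketched with a generic EM fit. Note the substitution: the study describes a Gamma-Normal mixture, whereas the sketch below fits a simpler two-Normal mixture, purely to show the E-step and M-step. All function names and the log-ratio values are hypothetical, and the output is a posterior probability rather than the p-values used in the text.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_bound(x, w, mu, sd):
    """Posterior probability that x belongs to component 1 (the 'bound' mode)."""
    p0 = w[0] * normal_pdf(x, mu[0], sd[0])
    p1 = w[1] * normal_pdf(x, mu[1], sd[1])
    total = p0 + p1
    return p1 / total if total > 0.0 else 0.0

def fit_two_normals(xs, iters=200):
    """EM for a two-component normal mixture; returns (weights, means, sds)."""
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                          # crude start: one mode at each extreme
    sd = [(hi - lo) / 4.0 or 1.0] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of the 'bound' component for each point.
        resp = [posterior_bound(x, w, mu, sd) for x in xs]
        # M-step: re-estimate weight, mean and spread of each component.
        for k in (0, 1):
            r = [rk if k == 1 else 1.0 - rk for rk in resp]
            n = max(sum(r), 1e-12)
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / n
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / n
            sd[k] = max(math.sqrt(var), 1e-3)   # floor avoids a collapsed component
            w[k] = n / len(xs)
    return w, mu, sd

# Hypothetical normalized log ratios: a cluster near 0 and a cluster near 2.
log_ratios = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 1.8, 1.9, 2.0, 2.1, 2.2]
w, mu, sd = fit_two_normals(log_ratios)
print("means:", [round(m, 3) for m in mu])
print("P(bound | 2.0):", round(posterior_bound(2.0, w, mu, sd), 4))
```

Raising or lowering the cutoff applied to this probability is one way to make the assignment more or less stringent.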

The "best" probe from each promoter was used to evaluate the activity of the promoter as a whole. Figure 2D shows the proportion of active promoters in MCF7 cells at different quality thresholds. This is roughly in accordance with previous genome-wide studies of promoter activity. ChIP-chip procedure A. The grey areas in between are ambiguous.
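Summarizing a promoter by its best probe can be sketched as follows. The text does not define "best", so the sketch assumes it means the probe with the highest probability of being bound. The promoter identifiers, scores, and confidence cutoffs are all hypothetical placeholders rather than values from the study.

```python
def best_probe_per_promoter(probe_scores):
    """Map each promoter ID to the highest score among its probes."""
    best = {}
    for promoter_id, score in probe_scores:
        if promoter_id not in best or score > best[promoter_id]:
            best[promoter_id] = score
    return best

def call_activity(score, thresholds=((0.99, "high"), (0.95, "medium"), (0.80, "low"))):
    """Assign a confidence label; cutoffs are illustrative assumptions."""
    for cutoff, label in thresholds:
        if score >= cutoff:
            return label
    return "inactive"

# Hypothetical (promoter ID, bound probability) pairs for individual probes.
probes = [("P1", 0.40), ("P1", 0.995), ("P2", 0.10), ("P2", 0.30), ("P3", 0.85)]
best = best_probe_per_promoter(probes)
calls = {pid: call_activity(score) for pid, score in best.items()}
print(calls)
```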

Our model allows us to annotate promoters as being active or inactive at different confidence levels (D). We validated a total of 18 promoters, 10 promoters that we predicted to be active with high confidence and 8 promoters that were predicted to be inactive in MCF7 cells.

Although the binding of RNA polymerase II to the promoter region need not correlate with gene expression because of posttranscriptional events, we find that a rough correspondence does exist.

By comparing these results to the gene EIF3S9, whose most upstream promoter was "highly on" in both treatments (Figure 5A), we found that the qRT-PCR experiments showed a correspondingly high level of expression of the corresponding gene isoform (Figure 5B).

Similarly, all but one of the promoters called as negative showed no evidence for RNA polymerase II binding (red bars). Error bars indicate standard errors from the mean, based on three replicates. Exon 3 is spliced out of the transcript initiated at Exon 1, but Exon 4 is common to both transcripts. The ChIP-chip microarray analysis indicated that the first promoter is inactive in the control experiment, but is activated with E2 treatment at a low level, a result that is verified by qRT-PCR results (B).

The second promoter was predicted to be active at a low level with and without E2 treatment, which again was verified C. Error bars indicate standard errors from the mean for three replicates. Shown here are the first three exons of the gene EIF3S9, spanning a region of approximately 3. Wang et al. Similar to the findings of Wang et al. As shown in Table 1 , each promoter on the array is supported by different lines of evidence. The most common promoters are those that are supported by multiple CAGE tags.

Of course, it is important to note that a negative result does not necessarily indicate an inaccurate promoter prediction; these promoters may be active in different cell types, or under different environmental conditions. Therefore, these numbers should be seen as a lower limit. Of the ten genes selected for validation in Figure 4 , eight fall into the novel category i. These surprising results indicate that large numbers of undiscovered, unannotated promoters exist within human genes.

Notably, we have discovered new and active promoters that are situated more than bases upstream of the currently-defined 5' end of the gene, suggesting that a significant fraction of the current gene annotations may not be 5'-complete. These results also strongly support the recent reports of a high frequency of alternative promoters in mammalian genomes [ 41 , 42 ].

In addition, the complicated distribution patterns of these alternative promoters might be easily overlooked by previous expression array analyses. Our hypothesis was that treatment with E2 affects the promoter activity of a sub-set of genes in the genome. For this analysis, we defined "active" as promoters with "high", "medium" or "low" confidence.

A better understanding of the differences between transcription initiation from different classes of promoter, and of the basic mechanisms of transcriptional activation, will be required for a complete understanding of the mechanistic basis of core promoter preferences.

The studies cited above provide strong evidence that core promoter diversity is an important contributor to combinatorial regulation. In the future, it will be important to subject this hypothesis to more stringent tests. With respect to the study by Butler and Kadonaga in this issue, it will be important to identify the enhancers that exhibit core promoter preferences, as well as the promoters that may be relevant targets of those enhancers.

Evidence that the endogenous core promoters possess the anticipated structures would provide considerable support for the hypothesis. In addition to confirming the importance of core promoter preferences for combinatorial regulation, it will be important to explore in greater depth the mechanistic basis of these preferences.

In some respects, this goal will be difficult to achieve until current controversies regarding the basic mechanisms of transcriptional activation have been resolved. On the other hand, because the core promoter preferences of transcriptional activators lead to a number of testable predictions, further exploration of the mechanisms underlying these preferences may contribute to the resolution of the controversies.


When bound along with the transcription factors, RNA polymerase is phosphorylated. This releases part of the protein from the DNA to activate the transcription initiation complex and places RNA polymerase in the correct orientation to begin transcription; a DNA-bending protein brings the enhancer, which can be quite a distance from the gene, in contact with transcription factors and mediator proteins. Transcription factors recognize the promoter. RNA polymerase II then binds and forms the transcription initiation complex.

In addition to the general transcription factors, other transcription factors can bind to the promoter to regulate gene transcription. These transcription factors bind to the promoters of a specific set of genes. They are not general transcription factors that bind to every promoter complex, but are recruited to a specific sequence on the promoter of a specific gene.

There are hundreds of transcription factors in a cell that each bind specifically to a particular DNA sequence motif. When transcription factors bind to the promoter just upstream of the encoded gene, the promoter sequence is referred to as a cis-acting element because it is on the same chromosome, just next to the gene. The region that a particular transcription factor binds to is called the transcription factor binding site.
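Locating a binding site for a given sequence motif can be sketched as a simple string scan. The example uses the estrogen response element consensus GGTCANNNTGACC, where N stands for any base, as an illustrative motif. The function name and the promoter sequence are hypothetical, and real binding-site prediction usually relies on position weight matrices rather than exact consensus matching.

```python
def find_motif(sequence, motif):
    """Return 0-based start positions where motif matches; 'N' matches any base."""
    sequence = sequence.upper()
    motif = motif.upper()
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if all(m == "N" or m == b for m, b in zip(motif, window)):
            hits.append(i)
    return hits

# Hypothetical promoter fragment containing one consensus-like site.
promoter = "ACGTGGTCAGCATGACCTTAGC"
hits = find_motif(promoter, "GGTCANNNTGACC")
print(hits)
```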

Transcription factors respond to environmental stimuli that cause the proteins to find their binding sites and initiate transcription of the gene that is needed. Enhancers increase the rate of transcription of genes, while repressors decrease the rate of transcription. In some eukaryotic genes, there are regions that help increase or enhance transcription.

These regions, called enhancers, are not necessarily close to the genes they enhance. They can be located upstream of a gene, within the coding region of the gene, downstream of a gene, or may be thousands of nucleotides away.

Enhancer regions are binding sequences, or sites, for transcription factors. When a DNA-bending protein binds, the shape of the DNA changes. This shape change allows the interaction between the activators bound to the enhancers and the transcription factors bound to the promoter region and the RNA polymerase to occur. Whereas DNA is generally depicted as a straight line in two dimensions, it is actually a three-dimensional object. Therefore, a nucleotide sequence thousands of nucleotides away can fold over and interact with a specific promoter.

Enhancers : An enhancer is a DNA sequence that promotes transcription. Each enhancer is made up of short DNA sequences called distal control elements.

Activators bound to the distal control elements interact with mediator proteins and transcription factors. Like prokaryotic cells, eukaryotic cells also have mechanisms to prevent transcription. Transcriptional repressors can bind to promoter or enhancer regions and block transcription. Like the transcriptional activators, repressors respond to external stimuli to prevent the binding of activating transcription factors.

A corepressor is a protein that decreases gene expression by binding to a transcription factor that contains a DNA-binding domain. The corepressor is unable to bind DNA by itself. The corepressor can repress transcriptional initiation by recruiting histone deacetylase, which catalyzes the removal of acetyl groups from lysine residues. This increases the positive charge on histones, which strengthens the interaction between the histones and DNA, making the DNA less accessible to the process of transcription.

Both the packaging of DNA around histone proteins, as well as chemical modifications to the DNA or proteins, can alter gene expression.

The human genome encodes over 20, genes; each of the 23 pairs of human chromosomes encodes thousands of genes. The DNA in the nucleus is precisely wound, folded, and compacted into chromosomes so that it will fit into the nucleus. It is also organized so that specific segments can be accessed as needed by a specific cell type.

The first level of organization, or packing, is the winding of DNA strands around histone proteins. Histones package and order DNA into structural units called nucleosome complexes, which can control the access of proteins to the DNA regions. Under the electron microscope, this winding of DNA around histone proteins to form nucleosomes looks like small beads on a string. These beads (histone proteins) can move along the string (DNA) and change the structure of the molecule.

These nucleosomes control the access of proteins to the underlying DNA. When viewed through an electron microscope (b), the nucleosomes look like beads on a string. Nucleosomes can move to open the chromosome structure to expose a segment of DNA, but do so in a very controlled manner.

Nucleosomes can change position to allow transcription of genes: Nucleosomes can slide along DNA. When nucleosomes are spaced closely together (top), transcription factors cannot bind and gene expression is turned off.

When the nucleosomes are spaced far apart (bottom), the DNA is exposed. Transcription factors can bind, allowing gene expression to occur. Modifications to the histones and DNA affect nucleosome spacing. How the histone proteins move is dependent on signals found on both the histone proteins and on the DNA. These signals are tags, or modifications, added to histone proteins and DNA that tell the histones if a chromosomal region should be open or closed.

These tags are not permanent, but may be added or removed as needed. They are chemical modifications (phosphate, methyl, or acetyl groups) that are attached to specific amino acids in the protein or to the nucleotides of the DNA. The tags do not alter the DNA base sequence, but they do alter how tightly wound the DNA is around the histone proteins. DNA is a negatively-charged molecule; therefore, changes in the charge of the histone will change how tightly wound the DNA molecule will be.

When unmodified, the histone proteins have a large positive charge; by adding chemical modifications, such as acetyl groups, the charge becomes less positive. Modifications affect nucleosome spacing and gene expression. The DNA molecule itself can also be modified. This occurs within very specific regions called CpG islands. These are stretches with a high frequency of cytosine and guanine dinucleotide DNA pairs (CG) found in the promoter regions of genes.
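A "high frequency" of CG dinucleotides is often quantified with the observed/expected CpG ratio, computed as (number of CG dinucleotides × sequence length) / (number of C × number of G), alongside the GC content. The sketch below computes both; the function name and the example sequence are hypothetical. Commonly cited criteria for calling a CpG island (GC content above 0.5, observed/expected ratio above 0.6, length of at least 200 bases) come from the wider literature and are not stated in the text above.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")   # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

# Short hypothetical sequence; real islands span hundreds of bases.
gc, oe = cpg_stats("CGCGCGAT")
print(f"GC content = {gc:.2f}, CpG observed/expected = {oe:.2f}")
```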

When this configuration exists, the cytosine member of the pair can be methylated (a methyl group is added). This modification changes how the DNA interacts with proteins, including the histone proteins that control access to the region. Highly methylated (hypermethylated) DNA regions with deacetylated histones are tightly coiled and transcriptionally inactive. These changes to DNA are inherited from parent to offspring, such that while the DNA sequence is not altered, the pattern of gene expression is passed to the next generation.

This type of gene regulation is called epigenetic regulation. The changes that occur to the histone proteins and DNA do not alter the nucleotide sequence and are not permanent. Instead, these changes are temporary (although they often persist through multiple rounds of cell division) and alter the chromosomal structure (open or closed) as needed. A gene can be turned on or off depending upon the location and modifications to the histone proteins and DNA.


