Question 1
A caveat of phylogenetic footprinting is to extract noncoding sequences upstream of corresponding genes and focus the comparison on this region only, which helps to prevent false positives.
A. True
B. False
Answer: Option A
Explanation:
The predictive value of this method also depends on the quality of the subsequent sequence alignments. Advanced alignment programs can be used. Even more sophisticated expectation maximization (EM) and Gibbs sampling algorithms can be used in detecting weakly conserved motifs.
Question 2
Ab initio algorithms predict prokaryotic and eukaryotic promoters and regulatory elements based on sequence patterns characteristic of these elements.
A. True
B. False
Answer: Option A
Explanation:
Some ab initio programs are signal based, relying on characteristic promoter sequences such as the TATA box. Other programs rely on content information such as hexamer frequencies.
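The two kinds of evidence mentioned above can be sketched in a few lines of Python. This is only an illustration: the TATA-like pattern is a simplified assumption, the function names are hypothetical, and real programs combine such features with trained models.

```python
import re
from collections import Counter

# Simplified TATA-box pattern (TATAWAW); an illustrative assumption,
# not the pattern used by any specific program.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")

def find_tata_like(seq):
    """Signal-based view: start positions of TATA-like matches."""
    return [m.start() for m in TATA_PATTERN.finditer(seq.upper())]

def hexamer_frequencies(seq, k=6):
    """Content-based view: relative frequency of each overlapping k-mer."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}
```

A content-based program would compare such frequency tables between known promoter and non-promoter sequences rather than use them on a single sequence.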
Question 3
CpGProD is a web-based program that predicts promoters containing a high density of CpG islands _______
A. in archaeal genomic sequences
B. in mammalian genomic sequences
C. in eukaryotic and bacterial genomic sequences
D. only in bacterial genomic sequences
Answer: Option B
Explanation:
It calculates moving averages of GC% and CpG ratios (observed/expected) over a window of a certain size (usually 200 bp). When the values are above a certain threshold, the region is identified as a CpG island.
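The windowed calculation described above can be sketched as follows. The 200-bp window comes from the explanation; the thresholds (GC fraction above 0.5, observed/expected CpG ratio above 0.6) and the function names are assumptions for illustration and are not claimed to be CpGProD's actual settings.

```python
def cpg_window_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    observed = window.count("CG")
    expected = c * g / n  # CpG count expected if C and G were independent
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio

def scan_cpg(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Start positions of windows that pass both (assumed) thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_window_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_ratio:
            hits.append(start)
    return hits
```

Overlapping windows that pass would normally be merged into a single island; that step is left out here to keep the sketch short.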
Question 4
In BPROM, once the operons are assigned, the program is able to predict putative promoter sequences.
A. True
B. False
Answer: Option A
Explanation:
Most bacterial promoters are located within 200 bp of the protein-coding region. Hence, the program is most effective when about 200 bp of sequence upstream of the first gene of an operon is supplied as input, which increases specificity.
Question 5
In CONPRO, for each program, the highest-scoring prediction is taken as the promoter in the region.
A. True
B. False
Answer: Option A
Explanation:
If three predictions fall within a 100-bp region, this is considered a consensus prediction. If no three-way consensus is achieved, the TSSG and PromFD predictions are taken. Because no coding sequence is used in prediction, specificity is improved relative to each individual program.
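The consensus rule can be sketched in Python as below. The function name, the input layout (one top-scoring position per program), and the reading of "within a 100-bp region" as a span of at most 100 bp are assumptions made for illustration; the positions in any example call are made up.

```python
def consensus_promoter(predictions, span=100, min_support=3,
                       fallback=("TSSG", "PromFD")):
    """predictions maps program name -> position of its top-scoring hit."""
    ordered = sorted(predictions.items(), key=lambda item: item[1])
    best = []
    for i, (_, left) in enumerate(ordered):
        # All predictions lying within `span` bp to the right of this one.
        group = [(name, pos) for name, pos in ordered[i:] if pos - left <= span]
        if len(group) > len(best):
            best = group
    if len(best) >= min_support:
        return "consensus", dict(best)
    # No three-way agreement: keep only the fallback programs' predictions.
    return "fallback", {p: predictions[p] for p in fallback if p in predictions}
```

For instance, hypothetical hits at 1000, 1040, and 1090 from three programs would form a consensus, while five widely scattered hits would fall back to the TSSG and PromFD positions.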
Question 6
INCLUSive is a suite of web-based tools designed to streamline the process of microarray data collection and sequence motif detection.
A. True
B. False
Answer: Option A
Explanation:
The pipeline processes microarray data, automatically clusters genes according to expression patterns, retrieves upstream sequences of coregulated genes, and detects motifs using a Gibbs sampling approach (Motif Sampler). To further avoid the problem of getting stuck in a local optimum, each sequence dataset is submitted to Motif Sampler ten times. Because the results may vary from run to run, the results of the ten runs are compiled to derive consensus motifs.
Question 7
McPromoter, a web-based program, uses a neural network to make promoter predictions.
A. True
B. False
Answer: Option A
Explanation:
It has a unique promoter model containing six scoring segments. The program scans a window of 300 bases for the likelihoods of being in each of the coding, noncoding, and promoter regions.
Question 8
Once an operon structure is known, ______ for the presence of a promoter and regulatory elements, _____ in the operon do not possess such DNA elements.
A. only the first gene is predicted, whereas other genes
B. only the first hundred genes are predicted, whereas the next few genes
C. only the first two genes are predicted, whereas the next few genes
D. only the first ten genes are predicted, whereas the next few genes
Answer: Option A
Explanation:
Only the first gene is predicted for the presence of a promoter and regulatory elements, whereas other genes in the operon do not possess such DNA elements. There are a number of methods available for prokaryotic operon prediction. The most accurate is based on a set of simple rules.
Question 9
Operon prediction is less important in prokaryotic promoter prediction.
A. True
B. False
Answer: Option B
Explanation:
One of the unique aspects of prokaryotic promoter prediction is the determination of operon structures, because genes within an operon share a common promoter located upstream of the first gene of the operon. Hence, operon prediction is key to prokaryotic promoter prediction.
Question 10
rVISTA uses two orthologous sequences as input and first identifies all putative regulatory motifs based on TRANSFAC matches.
A. True
B. False
Answer: Option A
Explanation:
rVISTA is a cross-species comparison tool for promoter recognition. It aligns the two sequences using a local alignment strategy. The motifs that have the highest percent identity in the pairwise comparison are presented graphically as regulatory elements.
Question 11
The advantage of the ab initio method is that the sequence can be applied as such without having to obtain experimental information.
A. True
B. False
Answer: Option A
Explanation:
The limitation is the need for training, which makes the prediction programs species specific. In addition, this type of method has difficulty discovering new, unknown motifs.
Question 12
Eukaryotic transcription initiation is less dependent on transcription factors.
A. True
B. False
Answer: Option B
Explanation:
Eukaryotic transcription initiation requires the cooperation of a large number of transcription factors. This cooperativity means that promoter regions tend to contain a high density of protein-binding sites. Thus, finding a cluster of transcription factor binding sites often enhances the probability of individual binding site prediction.
Question 13
The input for the McPromoter neural network includes sequence physical properties and promoter signals. Which of the following is not among them?
A. DNA bendability
B. Signals such as the TATA box
C. Signals such as initiator box
D. Signals such as CpAA islands
Answer: Option D
Explanation:
Option D is the odd one out: the relevant feature is CpG islands, not "CpAA islands". Alongside sequence physical properties such as DNA bendability, the input includes signals such as the TATA box, the initiator box, and CpG islands. The hidden layer combines all the features to derive an overall likelihood for a site being a promoter. Another unique feature is that McPromoter does not require that certain patterns be present; instead, the combination of all features is what matters. For instance, even if the TATA box score is very low, a promoter prediction can still be made if the other features score highly. The program is currently trained for Drosophila and human sequences.
Question 14
To increase the specificity of prediction, a unique feature of eukaryotic promoters is employed, which is the presence of CpG islands.
A. True
B. False
Answer: Option A
Explanation:
It is known that many vertebrate genes are characterized by a high density of CG dinucleotides near the promoter region overlapping the transcription start site. Once the CpG islands are identified, promoters can be traced to the region immediately upstream of the islands. By combining CpG islands and other promoter signals, the accuracy of prediction can be improved. Several programs have been developed based on the combined features to predict transcription start sites in particular.
Question 15
TSSW is a web program that distinguishes promoter sequences from non-promoter sequences based on a combination of unique content information such as hexamer/trimer frequencies and signal information such as the TATA box in the promoter region.
A. True
B. False
Answer: Option A
Explanation:
TSSW uses unique content information such as hexamer/trimer frequencies and signal information such as the TATA box in the promoter region. The values are fed to a linear discriminant function to separate true motifs from background noise.
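A linear discriminant function is simply a weighted sum of feature values compared against a cutoff. The sketch below shows the idea only; the feature names, weights, and bias are hypothetical and are not TSSW's trained parameters.

```python
def linear_discriminant(features, weights, bias=0.0):
    """Weighted sum of feature values plus a bias term."""
    return sum(weights[name] * value for name, value in features.items()) + bias

def is_promoter(features, weights, bias=0.0):
    """Classify as promoter when the discriminant score is positive."""
    return linear_discriminant(features, weights, bias) > 0

# Hypothetical weights; in practice they are learned from training data.
EXAMPLE_WEIGHTS = {"tata_score": 2.0, "hexamer_score": 1.0}
EXAMPLE_BIAS = -1.5
```

With these made-up numbers, a candidate with a TATA score of 0.9 and a hexamer score of 0.4 scores 0.7 and is called a promoter, while one with scores of 0.2 and 0.3 scores -0.8 and is not.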
Question 16
Which of the following is correct regarding the method for prokaryotic operon prediction?
A. It relies on two kinds of information: gene orientation and intergenic distances of a pair of genes of interest and conserved linkage of the genes based on comparative genomic analysis
B. It relies only on the gene orientation and intergenic distances of a pair of genes of interest
C. It relies only on the conserved linkage of the genes based on comparative genomic analysis
D. The prediction cannot be done manually using the rules
Answer: Option A
Explanation:
A scoring scheme is developed to assign operons with different levels of confidence. This method is claimed to produce accurate identification of an operon structure, which in turn facilitates promoter prediction. The prediction can be done manually using the rules. For historical reasons, the few dedicated programs for prokaryotic promoter prediction do not apply these rules. The most frequently used program is BPROM.
Question 17
Which of the following is incorrect regarding Cluster-Buster?
A. It is an HMM-based web program
B. A query sequence is scanned with a window size of 1 kb for putative regulatory motifs using motif HMMs
C. It works by detecting a region of high concentration of unknown transcription factor binding sites and regulatory motifs at the initiation
D. It is designed to find clusters of regulatory binding sites
Answer: Option C
Explanation:
It works by detecting a region of high concentration of known transcription factor binding sites and regulatory motifs. If multiple motifs are detected within a window, a positive score is assigned to each motif found. The total score of the window is the sum of each motif score subtracting a gap penalty, which is proportional to the distances between motifs. If the score of a certain region is above a certain threshold, it is predicted to contain a regulatory cluster.
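The window score described above (sum of motif scores minus a distance-proportional gap penalty) can be sketched as follows. This follows the verbal description only; the penalty rate, the threshold, and the function names are assumptions, and the actual program derives its scores from motif HMMs.

```python
def window_score(hits, gap_penalty_per_bp=0.01):
    """hits: list of (position, motif_score) pairs found in one window."""
    ordered = sorted(hits)
    total = sum(score for _, score in ordered)
    # Sum of distances between neighboring motifs in the window.
    gaps = sum(b[0] - a[0] for a, b in zip(ordered, ordered[1:]))
    return total - gap_penalty_per_bp * gaps

def is_regulatory_cluster(hits, threshold=5.0, gap_penalty_per_bp=0.01):
    """Predict a cluster when the window score exceeds the (assumed) threshold."""
    return window_score(hits, gap_penalty_per_bp) > threshold
```

Three motifs scoring 5, 4, and 3 at positions 100, 150, and 400 give 12 minus a penalty of 3, so tightly packed motifs score higher than the same motifs spread far apart.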
Question 18
Which of the following is incorrect regarding CONPRO?
A. It is a web-based program that uses a consensus method
B. It is used to identify promoter elements for human DNA
C. cDNA does not play a role in prediction
D. The program uses the information to search the human genome database for the position of the gene
Answer: Option C
Explanation:
To use the program, a user supplies the transcript sequence of a gene (cDNA). It then uses the GENSCAN program to predict 5’ untranslated exons in the upstream region. Once the 5’-most exon is located, a further upstream region (1.5 kb) is used for promoter prediction, which relies on a combination of five promoter prediction programs: TSSG, TSSW, NNPP, PROSCAN, and PromFD.
Question 19
Which of the following is incorrect regarding BPROM?
A. It is a web-based program for prediction of bacterial promoters
B. It is a web-based program only for prediction of eukaryotic promoters
C. It uses a linear discriminant function
D. The linear discriminant function is combined with signal and content information
Answer: Option B
Explanation:
The linear discriminant function is combined with signal and content information such as the consensus promoter sequence and the oligonucleotide composition of the promoter sites. The program first predicts bacterial operon structures in a given sequence, using an intergenic distance of 100 bp as the basis for deciding whether genes belong to the same operon.
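A distance-based operon assignment of this kind can be sketched as below. The 100-bp cutoff comes from the explanation; requiring the same strand, the gene tuple layout, and the function name are assumptions for illustration, not BPROM's implementation.

```python
def group_operons(genes, max_gap=100):
    """genes: iterable of (name, start, end, strand) tuples."""
    operons = []
    for gene in sorted(genes, key=lambda g: g[1]):
        name, start, end, strand = gene
        if operons:
            prev = operons[-1][-1]
            same_strand = prev[3] == strand
            close = start - prev[2] <= max_gap
            if same_strand and close:
                # Adjacent, co-oriented, and closely spaced: same operon.
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g[0] for g in operon] for operon in operons]
```

The promoter search would then focus on the sequence upstream of the first transcribed gene of each group, which for a minus-strand operon is the gene with the highest coordinates.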
Question 20
Which of the following is incorrect regarding Eponine?
A. It is a web-based program that predicts transcription start sites
B. It is a web-based program that particularly predicts transposons and retrotransposons
C. The regulatory sites include the TATA box, the CCAAT box, and CpG islands
D. It is based on a series of pre-constructed PSSMs of several regulatory sites
Answer: Option B
Explanation:
The query sequence from a mammalian source is scanned through the PSSMs. Sequence stretches that score highly against all the PSSMs and also match the expected spacing between the elements are declared transcription start sites. A Bayesian method is also used in decision making.
Question 21
Which of the following is incorrect regarding FindTerm?
A. It is a program for searching bacterial ρ-independent termination signals located at the end of operons
B. It is a program for searching bacterial ρ-dependent termination signals located within the operons
C. The predictions are made based on matching of known profiles of the termination signals combined with energy calculations
D. It is available from the same site as FGENES and BPROM
Answer: Option B
Explanation:
The predictions are made by matching known profiles of the termination signals, combined with energy calculations for the RNA secondary structure of the putative hairpin loop. The sequence region that scores best in feature and energy terms is chosen as the prediction. The information can sometimes be useful in defining an operon.
Question 22
Which of the following is incorrect regarding FirstEF?
A. It is a program that predicts promoters for bacterial DNA
B. It is a web-based program that predicts promoters for human DNA
C. It stands for First Exon Finder
D. It integrates gene prediction with promoter prediction
Answer: Option A
Explanation:
It uses quadratic discriminant functions (see Chapter 8) to calculate the probabilities of the first exon of a gene and its boundary sites. A segment of DNA (15 kb) upstream of the first exon is subsequently extracted for promoter prediction on the basis of scores for CpG islands.
Question 23
Which of the following is incorrect regarding promoter prediction for eukaryotes?
A. The consensus patterns are only derived from bioinformatics studies
B. The experimentally determined DNA binding sites are compiled into profiles and stored in a database for scanning an unknown sequence to find similar conserved patterns
C. The consensus patterns are derived from experimentally determined DNA binding sites
D. The ab initio method for predicting eukaryotic promoters and regulatory elements relies on searching the input sequences for matching of consensus patterns of known promoters and regulatory elements
Answer: Option A
Explanation:
This approach tends to generate a very high rate of false positives owing to nonspecific matches with the short sequence patterns. Furthermore, because of the high variability of transcription factor binding sites, simple sequence matching often misses true promoter sites, creating false negatives.
Question 24
Which of the following is incorrect regarding the ab initio approaches?
A. The conventional approach to detecting a promoter or regulatory site is through matching a consensus sequence pattern represented by regular expressions
B. The conventional approach to detecting a promoter or regulatory site is through matching a position-specific scoring matrix constructed from well-characterized binding sites
C. The consensus sequences or the matrices are relatively short, covering 6 to 10 bases
D. The consensus sequences or the matrices are relatively large, covering 700 to 1000 bases
Answer: Option D
Explanation:
To determine whether a query sequence matches a weight matrix, the sequence is scanned through the matrix. Scores of matches and mismatches at all matrix positions are summed up to give a log odds score, which is then evaluated for statistical significance. This simple approach, however, often has difficulty differentiating true promoters from random sequence matches and generates high rates of false positives as a result.
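The scanning step can be sketched in Python as follows. The matrix is built from a handful of made-up example sites with a uniform background and a pseudocount, all of which are assumptions for illustration; the sketch also assumes sequences contain only A, C, G, and T, and it leaves out the statistical significance test.

```python
import math

def build_log_odds(sites, background=0.25, pseudocount=1.0):
    """Build a position-specific log-odds matrix from aligned sites."""
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        row = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            row[base] = math.log2(freq / background)
        matrix.append(row)
    return matrix

def scan(seq, matrix):
    """Return (position, summed log-odds score) for every window in seq."""
    width = len(matrix)
    return [
        (i, sum(matrix[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]

def best_hit(seq, matrix):
    """Position and score of the highest-scoring window."""
    return max(scan(seq, matrix), key=lambda hit: hit[1])
```

Because the matrix is only six to ten columns wide, many random windows in a long genome will also score well, which is the source of the false positives mentioned above.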
Question 25
Which of the following is untrue about Bayes Aligner?
A. Posterior probability values, which are considered estimates of the true alignment, are calculated for each alignment
B. The method generates a single best alignment
C. It aligns two sequences using a Bayesian algorithm which is a unique sequence alignment method
D. It is a web-based footprinting program
Answer: Option B
Explanation:
Instead of returning a single best alignment, the method generates a distribution of a large number of alignments using a full range of scoring matrices and gap penalties. By studying the distribution, the alignment that has the highest likelihood score, which is in the extreme margin of the distribution, is chosen. Based on this unique alignment searching algorithm, weakly conserved motifs can be identified with high probability scores.
Question 26
Which of the following is untrue about ConSite?
A. It is a web server that finds putative promoter elements
B. It includes comparing two orthologous sequences
C. The program does not accept pre-computed alignment
D. The program accepts pre-computed alignment
Answer: Option C
Explanation:
The user provides two individual sequences which are aligned by ConSite using a global alignment algorithm. Conserved regions are identified by calculating identity scores, which are then used to compare against a motif database of regulatory sites (TRANSFAC). High-scoring sequence segments upstream of genes are returned as putative regulatory elements.
Question 27
Which of the following is untrue about PhyloCon?
A. It stands for Phylogenetic Consensus
B. It is used to identify regulatory motifs
C. It is a UNIX program that combines phylogenetic footprinting with gene expression profiling analysis
D. No conservation among orthologous genes and conservation among coregulated genes is a disadvantage
Answer: Option D
Explanation:
This approach takes advantage of conservation among orthologous genes as well as conservation among coregulated genes. For each individual gene in a set of coregulated genes, multiple sequence homologs are aligned to derive profiles. Based on the gene expression data, profiles between coregulated genes are further compared to identify functionally conserved motifs among evolutionarily conserved motifs.
Question 28
Which of the following is untrue about the expression profiling-based method?
A. Genes with similar expression profiles are considered coexpressed, which can be identified through a clustering approach
B. This approach appears to be less effective for finding transcription factor binding sites
C. An advanced alignment-independent profile construction method such as EM and Gibbs motif sampling is often used in finding the subtle sequence motifs
D. The basis for coexpression is thought to be due to common promoters and regulatory elements
Answer: Option B
Explanation:
This approach is essentially experimentally based and appears to be robust for finding transcription factor binding sites. The problem is that the regulatory elements of coexpressed genes are usually short and weak. Their patterns are difficult to discern using simple multiple sequence alignment approaches.
Question 29
Which of the following is untrue about FootPrinter?
A. It is a web-based program for phylogenetic footprinting using multiple input sequences
B. The motifs from organisms spanning the widest evolutionary distances are identified as promoter or regulatory motifs
C. The program performs multiple alignment of the input sequences to identify conserved motifs
D. The user does not necessarily need to provide a phylogenetic tree that defines the evolutionary relationship of the input sequences
Answer: Option D
Explanation:
The user also needs to provide a phylogenetic tree that defines the evolutionary relationship of the input sequences. One may obtain the tree information from the “Tree of Life” web site, which archives known phylogenetic trees using ribosomal RNAs as gene markers. It identifies unusually well-conserved motifs across a set of orthologous sequences.
Question 30
Which of the following is untrue?
A. MEME is an EM-based program used only for protein motif discovery
B. AlignACE is a web-based program using the Gibbs sampling algorithm to find common motifs
C. AlignACE is optimized for DNA sequence motif extraction
D. Melina stands for Motif Elucidator In Nucleotide sequence Assembly
Answer: Option A
Explanation:
MEME is used for DNA motif finding in much the same way as for protein sequences. AlignACE automatically determines the optimal number and lengths of motifs from the input sequences. Melina is a web-based program that runs four individual motif-finding algorithms (MEME, GIBBS sampling, CONSENSUS, and Core search) simultaneously. The user compares the results to determine the consensus of motifs predicted by all four prediction methods.