BZIP study
Identification
Accession IDs for bZIP transcription factor domain:
Pfam-based approach
HMM domain clans
Protein domain information obtained from Pfam database.
- bZIP_1 (PF00170)
Regarded as the main domain of bZIP transcription factors. The basic region and the leucine zipper are clearly distinguishable.
→ Used for further analysis.
- bZIP_2 (PF07716)
Conserved basic region, weaker leucine zipper.
- bZIP_Maf (PF03131)
No distinct basic region. No clear leucine zipper interface.
Downloading the seed alignment and generating the profile HMM file
WD: 63:/data6/chojam96/bZIP/identification
- Download Stockholm alignment file from Pfam bZIP_1 > Alignment > Download options > Seed
Seed: the curated alignment from which the HMM for the family is built.
- Use 'hmmbuild' command from HMMER software to convert Stockholm alignment into a profile HMM
After gunzipping, use the following command.
hmmbuild [options] hmmfile alignfile
hmmbuild reads the multiple sequence alignment file alignfile, builds a new profile HMM, and saves the HMM in hmmfile.
For details on this and the following HMMER commands, refer to the HMMER documentation.
hmmbuild PF00170.hmm PF00170.seed
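The decompress-then-build step above can also be scripted. The following is only a minimal sketch: it assumes the seed alignment was saved as `PF00170.seed.gz` (a hypothetical file name; use whatever name the download actually has) and that `hmmbuild` is available on the PATH. The helper names `gunzip` and `run_hmmbuild` are made up for illustration.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def gunzip(gz_path):
    """Decompress gz_path next to itself and return the new path (".gz" dropped)."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # e.g. PF00170.seed.gz -> PF00170.seed
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def run_hmmbuild(hmm_file, align_file):
    """Run `hmmbuild hmmfile alignfile` with no extra options."""
    cmd = ["hmmbuild", str(hmm_file), str(align_file)]
    subprocess.run(cmd, check=True)  # raises CalledProcessError if hmmbuild fails
    return cmd


# Usage (file names are assumptions):
#   seed = gunzip("PF00170.seed.gz")
#   run_hmmbuild("PF00170.hmm", seed)
```

Keeping the decompression in Python leaves the original `.gz` file untouched, which is convenient if the download needs to be re-checked later.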
hmmscanning the protein sequence
- Use 'hmmpress' command for preparing HMM database
hmmpress [options] hmmfile
This produces four binary index files (.h3m, .h3i, .h3f, .h3p) that hmmscan requires.
- Run 'hmmscan' command against the Vigna radiata peptide sequence FASTA file
hmmscan [options] hmmdb seqfile
hmmscan --tblout PF00170.out PF00170.hmm Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa
No options other than --tblout were used. The peptide file used for hmmscan contained only main transcripts (*.1, n=22427).
PF00170.out (result file of hmmscan) contained 147 nonredundant genes.
Of these 147 genes, 144 had a complete peptide sequence, indicated by a terminal stop-codon symbol (*).
- Get corresponding protein sequences
python getgenes.py PF00170.out Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa bZIP.primary.fa
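The contents of getgenes.py are not shown on this page. The sketch below is a hypothetical stand-in with the same three arguments, not the actual script. It assumes the standard `hmmscan --tblout` layout, where column 1 is the HMM (target) name and column 3 is the query (protein) name, and it also counts the hits whose sequence ends in `*`. The function names are made up for illustration.

```python
def read_fasta(path):
    """Return {ID: sequence}; the ID is the first word of each header line."""
    chunks, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def hit_ids(tblout_path):
    """Return non-redundant query names from an hmmscan --tblout file, in file order."""
    ids, seen = [], set()
    with open(tblout_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header/footer comments and blank lines
            query = line.split()[2]  # column 3 = query (protein) name for hmmscan
            if query not in seen:
                seen.add(query)
                ids.append(query)
    return ids


def write_hits(tblout_path, pep_path, out_path):
    """Write the hit sequences to out_path; return (number of hits, number ending in '*')."""
    seqs = read_fasta(pep_path)
    ids = hit_ids(tblout_path)
    with open(out_path, "w") as out:
        for gene in ids:
            out.write(f">{gene}\n{seqs[gene]}\n")
    complete = sum(1 for gene in ids if seqs[gene].endswith("*"))
    return len(ids), complete


# Usage (file names taken from the commands above):
#   n, complete = write_hits("PF00170.out",
#                            "Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa",
#                            "bZIP.primary.fa")
```

A gene listed in the table but missing from the peptide file raises a `KeyError` here, which makes a mismatch between the two files visible instead of silently dropping the gene.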
Confirming domain presence
This is the trickiest part of identifying bZIP genes. The best approach is to check the candidates gene by gene against explicit criteria.
1. Align the peptide sequence file (bZIP.primary.fa, n=147) with 'MUSCLE' and inspect the alignment manually
muscle -in bZIP.primary.fa -out bZIP.primary.aligned.fa
Toggled the sequence display at the 50% level to check for conserved domains, but no consensus was found.
2. Use 'interproscan' and 'NCBI CD search' for domain detection
- Interproscan
- NCBI CD (conserved domain) search: For one query, use CD-search. For multiple queries, use Batch CD-search.
19.04.21 → search ID: QM3-qcdsearch-1F885D58AB029684
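The 50% check from step 1 can also be done without an alignment viewer. The sketch below is one possible way to do it and is not the viewer's own algorithm. It assumes that a column counts as conserved when its most common residue occurs in at least 50% of all sequences, with gaps counted in the denominator; viewers may define the threshold differently. The function names are made up for illustration.

```python
from collections import Counter


def read_aligned_fasta(path):
    """Return the aligned sequences from a FASTA file, in file order."""
    seqs = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                seqs.append("")
            elif line and seqs:
                seqs[-1] += line
    return seqs


def threshold_consensus(aligned, threshold=0.5, gap="-"):
    """Return one character per column: the top residue if its share of all
    sequences reaches the threshold, otherwise '.'."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences are not aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(c.upper() for c in column if c != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
        else:
            residue, n = gap, 0  # column contains only gaps
        consensus.append(residue if n / len(column) >= threshold else ".")
    return "".join(consensus)


# Usage (file name taken from the MUSCLE command in step 1):
#   cons = threshold_consensus(read_aligned_fasta("bZIP.primary.aligned.fa"))
#   print(cons)
```

Long runs of `.` in the output correspond to regions where no residue reaches the threshold, which is what "no consensus found" looks like in this representation.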