Difference between revisions of "BZIP study"
KangHeum Cho (Talk | contribs)
Revision as of 06:37, 20 April 2019
Identification
Accession IDs for bZIP transcription factor domain:
1. Pfam: PF00170 (https://pfam.xfam.org/family/PF00170)
2. SMART: SM00338 (http://smart.embl-heidelberg.de/smart/do_annotation.pl?DOMAIN=BRLZ)
3. InterPro: IPR004827 (http://www.ebi.ac.uk/interpro/entry/IPR004827)
4. PubMed (NCBI): PUBMED:7780801 (https://www.ncbi.nlm.nih.gov/pubmed/7780801)
Pfam-based approach
HMM domain clans
Protein domain information was obtained from the Pfam database (https://pfam.xfam.org/).
- bZIP_1 (PF00170, https://pfam.xfam.org/family/PF00170)
Regarded as the main domain for bZIP transcription factors. The basic region and the leucine zipper are clearly distinguished.
→ Used for further analysis.
- bZIP_2 (PF07716, https://pfam.xfam.org/family/PF07716)
Conserved basic region, weaker leucine zipper.
- bZIP_Maf (PF03131)
No distinct basic region. No clear leucine zipper interface.
Downloading and generating HMM consensus sequence file
WD: 147.46.250.63:/data6/chojam96/bZIP/identification
- Download the Stockholm alignment file from Pfam: bZIP_1 > Alignment > Download options > Seed
Seed: the curated alignment from which the HMM for the family is built.
- Use the 'hmmbuild' command from the HMMER software to convert the Stockholm alignment into a profile HMM
After gunzipping, use the following command.
hmmbuild [options] hmmfile alignfile
hmmbuild reads a multiple sequence alignment file (alignfile), builds a new profile HMM, and saves the HMM in hmmfile.
For the HMMER commands that follow, refer to the HMMER 3.1b2 User's Guide (http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf).
hmmbuild PF00170.hmm PF00170.seed
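Before running hmmbuild it can help to check that the downloaded seed file parses as expected. The following is a minimal sketch, assuming a plain Stockholm file in which sequence lines are "name sequence" pairs and blocks may be interleaved. The function name and the sample records are hypothetical and not taken from PF00170.seed.

```python
def read_stockholm(lines):
    """Collect aligned sequences from Stockholm-format lines.

    Markup lines (starting with '#') and blank lines are skipped,
    and interleaved blocks are concatenated per sequence name.
    """
    seqs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        if line.startswith("//"):  # end-of-alignment marker
            break
        name, chunk = line.split(None, 1)
        seqs[name] = seqs.get(name, "") + chunk.strip()
    return seqs


# Made-up two-block alignment, for illustration only.
SAMPLE_STOCKHOLM = [
    "# STOCKHOLM 1.0",
    "#=GF ID   bZIP_1",
    "seq1/1-14   MKRES-ARRS",
    "seq2/5-18   MKRESAARRS",
    "",
    "seq1/1-14   RLRK",
    "seq2/5-18   RLKK",
    "//",
]

if __name__ == "__main__":
    alignment = read_stockholm(SAMPLE_STOCKHOLM)
    widths = {len(s) for s in alignment.values()}
    print(len(alignment), "sequences; alignment widths:", widths)
```

For the real file, pass an open file handle instead of the sample list. All sequences should share one alignment width; more than one width suggests a truncated or corrupted download.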
hmmscanning the protein sequence
- Use the 'hmmpress' command to prepare the HMM database
hmmpress [options] hmmfile
This produces four binary index files (.h3m, .h3i, .h3f, .h3p) next to the HMM file.
- Run the 'hmmscan' command against the Vigna radiata peptide sequence FASTA file
hmmscan [options] hmmdb seqfile
hmmscan --tblout PF00170.out PF00170.hmm Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa
No options other than --tblout were used. The peptide file used for hmmscan contained only main transcripts (*.1, n=22427).
PF00170.out (the hmmscan result file) contained 147 nonredundant genes.
Of the 147 genes, 144 had a complete peptide sequence, marked by a termination codon sign (*).
- Get corresponding protein sequences
python getgenes.py PF00170.out Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa bZIP.primary.fa
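getgenes.py itself is not shown on this page. The following is a minimal sketch of what such a script might do, assuming it takes sequence IDs from the query-name column of the hmmscan --tblout table (the third whitespace-separated column) and pulls the matching records from the peptide FASTA. All function names and the sample IDs are hypothetical; this is not the author's script.

```python
def ids_from_tblout(lines):
    """Return unique query (sequence) IDs from hmmscan --tblout lines, in first-seen order."""
    ids, seen = [], set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # In hmmscan --tblout, column 1 is the HMM name and column 3 the query sequence name.
        query = line.split()[2]
        if query not in seen:
            seen.add(query)
            ids.append(query)
    return ids


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the id is the first word of the header."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def extract(tblout_lines, fasta_lines):
    """Return (id, sequence) pairs for every hit ID that is present in the FASTA."""
    records = read_fasta(fasta_lines)
    return [(i, records[i]) for i in ids_from_tblout(tblout_lines) if i in records]


def count_complete(selected):
    """Count sequences that end with the '*' termination sign."""
    return sum(1 for _, seq in selected if seq.endswith("*"))


# Made-up data, for illustration only.
SAMPLE_TBLOUT = [
    "# target name accession query name accession E-value score bias",
    "bZIP_1 PF00170.20 geneA.1 - 1.2e-15 55.1 3.0 2.0e-15 54.4 3.0 1.0 1 0 0 1 1 1 1 bZIP transcription factor",
    "bZIP_1 PF00170.20 geneB.1 - 3.4e-10 37.6 1.1 5.0e-10 37.0 1.1 1.2 1 0 0 1 1 1 1 bZIP transcription factor",
]
SAMPLE_FASTA = [">geneA.1 desc", "MKRESARRSR", "LRKQAE*", ">geneB.1", "MSSNRESARR", ">geneC.1", "MAAA*"]

if __name__ == "__main__":
    selected = extract(SAMPLE_TBLOUT, SAMPLE_FASTA)
    for seq_id, seq in selected:
        print(">" + seq_id)
        print(seq)
    print("complete:", count_complete(selected), "of", len(selected))
```

The same count_complete check is one way to reproduce the "144 of 147" tally above, under the assumption that a complete peptide is one whose sequence ends in '*'.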
Confirming domain presence
The trickiest part of identifying bZIP genes: the best way is to examine and select genes one by one, against defined criteria.
1. Align the peptide sequence file (bZIP.primary.fa, n=147) with MUSCLE to inspect the alignment manually
muscle -in bZIP.primary.fa -out bZIP.primary.aligned.fa
Toggled the sequence display at the 50% level to check for conserved domains, but no consensus was found.
2. Use InterProScan or NCBI CD-Search for domain detection
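The 50% check in step 1 was done by eye in an alignment viewer. As a rough programmatic equivalent, the sketch below reports, for each alignment column, the most common residue if it occurs in at least half of the sequences. It assumes the aligned sequences have already been read from bZIP.primary.aligned.fa into equal-length strings with '-' as the gap character; the function name and the sample alignment are hypothetical.

```python
from collections import Counter


def consensus(aligned, threshold=0.5):
    """Per-column consensus: the top residue if its frequency >= threshold, else '.'.

    Frequencies are taken over all sequences, so gaps count against a column.
    """
    seqs = list(aligned)
    if not seqs:
        return ""
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*seqs):
        counts = Counter(c for c in column if c != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(seqs) >= threshold:
                out.append(residue)
                continue
        out.append(".")
    return "".join(out)


# Made-up alignment, for illustration only.
SAMPLE_ALIGNED = ["MKL-L", "MRL-L", "MKLAV", "-KIAL"]

if __name__ == "__main__":
    print(consensus(SAMPLE_ALIGNED))
    print(consensus(SAMPLE_ALIGNED, threshold=0.75))
```

A result that is mostly '.' across the full-length alignment would match the "no consensus" observation. Restricting the alignment to the predicted domain region may give a clearer picture than aligning whole proteins.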