Difference between revisions of "BZIP study"

From Crop Genomics Lab.
Line 1: Line 1:
 
= Identification =
 
= Identification =
 +
 +
Accession IDs for bZIP transcription factor domain:
 +
 +
# [https://pfam.xfam.org/family/PF00170 Pfam: PF00170]
 +
# [http://smart.embl-heidelberg.de/smart/do_annotation.pl?DOMAIN=BRLZ SMART: SM00338]
 +
# [http://www.ebi.ac.uk/interpro/entry/IPR004827 Interpro: IPR004827]
 +
# [https://www.ncbi.nlm.nih.gov/pubmed/7780801 PubMed (NCBI): PUBMED:7780801]
 +
 +
  
 
== Pfam-based approach ==
 
== Pfam-based approach ==
  
 
=== HMM domain clans ===
 
=== HMM domain clans ===
 +
 +
Protein domain information obtained from [https://pfam.xfam.org/ Pfam database].
  
 
* [https://pfam.xfam.org/family/PF00170 bZIP_1] (PF00170)
 
* [https://pfam.xfam.org/family/PF00170 bZIP_1] (PF00170)
Line 9: Line 20:
 
''Regarded as the main domain'' for bZIP transcription factor. Clearly distinguishes basic region and leucine zipper.
 
''Regarded as the main domain'' for bZIP transcription factor. Clearly distinguishes basic region and leucine zipper.
  
'''Used for further analysis.'''
+
→ '''Used for further analysis.'''
  
 
* [https://pfam.xfam.org/family/PF07716 bZIP_2] (PF07716)
 
* [https://pfam.xfam.org/family/PF07716 bZIP_2] (PF07716)
Line 20: Line 31:
  
 
=== Downloading and generating HMM consensus sequence file ===
 
=== Downloading and generating HMM consensus sequence file ===
 +
 +
WD: 147.46.250.63:/data6/chojam96/bZIP/identification
  
 
* Download Stockholm alignment file from Pfam bZIP_1 > Alignment > Download options > Seed
 
* Download Stockholm alignment file from Pfam bZIP_1 > Alignment > Download options > Seed
Line 25: Line 38:
 
Seed: the curated alignment from which the HMM for the family is built.
 
Seed: the curated alignment from which the HMM for the family is built.
  
* Use '[http://www.ibi.vu.nl/programs/prcwww/hmmbuild.html hmmbuild]' command from HMMER software to convert Stockholm alignment into a profile HMM
+
* Use 'hmmbuild' command from HMMER software to convert Stockholm alignment into a profile HMM
  
 
After ''gunzipping'', use following command.
 
After ''gunzipping'', use following command.
Line 32: Line 45:
  
 
hmmbuild reads a multiple sequence alignment file alignfile , builds a new profile HMM, and saves the HMM in hmmfile.
 
hmmbuild reads a multiple sequence alignment file alignfile , builds a new profile HMM, and saves the HMM in hmmfile.
 +
 +
For following commands for HMMER, refer to [http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf this page].
  
 
  hmmbuild PF00170.hmm PF00170.seed
 
  hmmbuild PF00170.hmm PF00170.seed
  
*
+
=== hmmscanning the protein sequence ===
 +
 
 +
* Use 'hmmpress' command for preparing HMM database
 +
 
 +
'''hmmpress''' ''[options] hmmfile''
 +
 
 +
This produces four preparation files.
 +
 
 +
* Run 'hmmscan' command against ''Vigna radiata'' peptide sequence fasta file
 +
 
 +
'''hmmscan''' ''[options] hmmdb seqfile''
 +
hmmscan --tblout PF00170.out PF00170.hmm Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa
 +
 
 +
No primary options used. Peptide file used for hmmscanning had only main transcripts(*.1, n=22427).
 +
 
 +
'''PF00170.out''' (result file of hmmscan) contained 147 nonredundant genes.
 +
 
 +
Out of 147 genes, 144 genes had complete peptide sequence, marked by termination codon sign (*).
 +
 
 +
* Get corresponding protein sequences
 +
 
 +
python getgenes.py PF00170.out Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa '''bZIP.primary.fa'''
 +
 
 +
=== Confirming domain presence ===
 +
 
 +
'''''Trickiest part in identification of bZIP genes''''' - the best way is to pick and choose gene by gene, with measures.
 +
 
 +
1. Align peptide sequence file (bZIP.primary.fa, n=147) by 'MUSCLE' to manually see the alignment
 +
 
 +
muscle -in bZIP.primary.fa -out bZIP.primary.aligned.fa
 +
 
 +
Toggled sequence at 50% level to check conserved domains, but no consensus found.
 +
 
 +
2. Use 'interproscan' or 'NCBI CD search' for domain detection

Revision as of 06:37, 20 April 2019

Identification

Accession IDs for bZIP transcription factor domain:

  1. Pfam: PF00170
  2. SMART: SM00338
  3. Interpro: IPR004827
  4. PubMed (NCBI): PUBMED:7780801


Pfam-based approach

HMM domain clans

Protein domain information was obtained from the Pfam database.

  • bZIP_1 (PF00170)

Regarded as the main domain for bZIP transcription factors. Clearly distinguishes the basic region and the leucine zipper.

Used for further analysis.

  • bZIP_2 (PF07716)

Conserved basic region, weaker leucine zipper.

  • bZIP_Maf (PF03131)

No distinct basic region. No clear leucine zipper interface.

Downloading and generating HMM consensus sequence file

WD: 147.46.250.63:/data6/chojam96/bZIP/identification
  • Download Stockholm alignment file from Pfam bZIP_1 > Alignment > Download options > Seed

Seed: the curated alignment from which the HMM for the family is built.

  • Use the 'hmmbuild' command from the HMMER software to convert the Stockholm alignment into a profile HMM

After gunzipping the downloaded file, use the following command.

hmmbuild [options] hmmfile alignfile

hmmbuild reads a multiple sequence alignment file (alignfile), builds a new profile HMM, and saves the HMM in hmmfile.

For the HMMER commands that follow, refer to the HMMER 3.1b2 User's Guide (http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf).

hmmbuild PF00170.hmm PF00170.seed
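
The gunzip and hmmbuild steps above can be scripted. The following is a minimal sketch, not the lab's actual procedure: the function names are hypothetical, the name of the downloaded .gz file is left as a parameter because it is not given here, and the external call assumes that `hmmbuild` is on the PATH.

```python
import gzip
import shutil
import subprocess


def gunzip(src, dst):
    """Decompress a gzip file (e.g. the downloaded seed alignment) into dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def hmmbuild_command(hmmfile, alignfile):
    """Return the argument list for: hmmbuild hmmfile alignfile"""
    return ["hmmbuild", str(hmmfile), str(alignfile)]


def build_profile(seed_gz, seed="PF00170.seed", hmm="PF00170.hmm"):
    """Gunzip the seed alignment, then run hmmbuild on it (requires HMMER)."""
    gunzip(seed_gz, seed)
    if shutil.which("hmmbuild") is None:
        raise RuntimeError("hmmbuild not found on PATH")
    subprocess.run(hmmbuild_command(hmm, seed), check=True)
    return hmm
```

Calling `build_profile` with the path of the downloaded seed archive would leave PF00170.seed and PF00170.hmm in the working directory, matching the file names used in the command above.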

hmmscanning the protein sequence

  • Use 'hmmpress' command for preparing HMM database
hmmpress [options] hmmfile

This produces four binary index files alongside the HMM file (.h3f, .h3i, .h3m, .h3p).

  • Run the 'hmmscan' command against the Vigna radiata peptide sequence fasta file
hmmscan [options] hmmdb seqfile
hmmscan --tblout PF00170.out PF00170.hmm Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa

No options other than --tblout were used. The peptide file used for hmmscanning contained only the main transcripts (*.1, n=22427).
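
How the main-transcript peptide file was prepared is not described here. As an illustration only, a streaming filter that keeps FASTA records whose ID ends in ".1" could look like the sketch below; the function name and the ".1" suffix rule are assumptions based on the "*.1" note above.

```python
def filter_primary(lines, suffix=".1"):
    """Yield the FASTA lines of records whose sequence ID ends with `suffix`.

    The sequence ID is taken as the first whitespace-delimited token
    after '>' on each header line.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0].endswith(suffix)
        if keep:
            yield line
```

Because it works line by line, the filter can be applied directly to an open file handle and its output written with `writelines`, without loading the whole proteome into memory.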

PF00170.out (result file of hmmscan) contained 147 nonredundant genes.

Of the 147 genes, 144 had a complete peptide sequence, marked by a terminal stop-codon symbol (*).
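
The completeness check can be expressed in a few lines. This sketch assumes the sequences are already loaded into a dictionary of ID to peptide string, and that "complete" means nothing more than "ends with *"; the function name is hypothetical.

```python
def split_by_completeness(seqs):
    """Split sequence IDs into (complete, incomplete) by a trailing '*'.

    seqs: dict mapping sequence ID -> peptide sequence string.
    """
    complete, incomplete = [], []
    for seq_id, seq in seqs.items():
        if seq.rstrip().endswith("*"):
            complete.append(seq_id)
        else:
            incomplete.append(seq_id)
    return complete, incomplete
```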

  • Get corresponding protein sequences
python getgenes.py PF00170.out Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa bZIP.primary.fa
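
The source of getgenes.py is not shown on this page. The sketch below is one plausible way to do the same job and should not be read as the actual script. It relies on the hmmscan --tblout layout, in which the first column is the matched profile and the third column is the query sequence name; the function names are hypothetical.

```python
def hit_ids(tblout_path):
    """Return unique query sequence names from an hmmscan --tblout file,
    in order of first appearance. Comment lines start with '#'."""
    seen = {}
    with open(tblout_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            if len(fields) >= 3:
                seen.setdefault(fields[2], None)  # column 3 = query name
    return list(seen)


def read_fasta(path):
    """Return {sequence ID: (full header, sequence)} for a FASTA file."""
    records, current = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                fields = line[1:].split()
                current = fields[0] if fields else ""
                records[current] = [line[1:], []]
            elif current is not None:
                records[current][1].append(line.strip())
    return {k: (h, "".join(parts)) for k, (h, parts) in records.items()}


def extract(tblout_path, fasta_path, out_path):
    """Write the FASTA records named in the tblout file; return their count."""
    records = read_fasta(fasta_path)
    wanted = [i for i in hit_ids(tblout_path) if i in records]
    with open(out_path, "w") as out:
        for seq_id in wanted:
            header, seq = records[seq_id]
            out.write(f">{header}\n{seq}\n")
    return len(wanted)
```

Under these assumptions, `extract("PF00170.out", "Vradi_ver6.fa.cds.primary.fasta.pepshorten.fa", "bZIP.primary.fa")` mirrors the command line above, and the returned count can be compared with the number of nonredundant genes reported for PF00170.out.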

Confirming domain presence

The trickiest part of identifying bZIP genes. The best way is to pick and choose gene by gene, using defined measures.

1. Align the peptide sequence file (bZIP.primary.fa, n=147) with 'MUSCLE' and inspect the alignment manually

muscle -in bZIP.primary.fa -out bZIP.primary.aligned.fa

Toggled the sequence display at the 50% conservation level to check for conserved domains, but no consensus was found.

2. Use 'interproscan' or 'NCBI CD search' for domain detection
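
The 50% consensus check in step 1 was done by eye in an alignment viewer. A rough programmatic equivalent is sketched below. How the viewer defines its threshold is not stated here, so this version makes its own assumption: a column reports a residue only if that residue occurs in at least the given fraction of all sequences, with gaps counted in the denominator. The function name is hypothetical.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, gap="-"):
    """Return a consensus string for equal-length aligned sequences.

    A column yields its most common residue if that residue is present in
    at least `threshold` of all sequences (gaps count against it);
    otherwise the column is reported as '.'.
    """
    if not aligned:
        return ""
    n = len(aligned)
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(c.upper() for c in column if c != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
        else:
            residue, count = ".", 0
        out.append(residue if count and count / n >= threshold else ".")
    return "".join(out)
```

A result consisting mostly of '.' across the region where the bZIP domain is expected would correspond to the "no consensus found" observation above, which is one reason to fall back on per-gene domain detection as in step 2.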