Difference between revisions of "Methylation study"
KangHeum Cho (Talk | contribs) |
KangHeum Cho (Talk | contribs) |
||
Line 74: | Line 74: | ||
2. Combine all cytosine context files → 'Vr-Met-S(K)_bismark.cov' and set minimum threshold as (met + unmet) >= 3 → 'Vr-Met-S(K)_bismark.min3.cov' | 2. Combine all cytosine context files → 'Vr-Met-S(K)_bismark.cov' and set minimum threshold as (met + unmet) >= 3 → 'Vr-Met-S(K)_bismark.min3.cov' | ||
+ | |||
+ | concatenate and run 'cut_min3.py' | ||
Chr Pos Met_read Unmet_read | Chr Pos Met_read Unmet_read | ||
Line 125: | Line 127: | ||
This produces <Gene ID>_<nowiki><S/K></nowiki>.graph.pdf files. | This produces <Gene ID>_<nowiki><S/K></nowiki>.graph.pdf files. | ||
+ | |||
+ | == Average methylation profiles among high/low synchrony groups == | ||
+ | |||
+ | High syn group: Vr-Met-S Vr-Met-SK10 Vr_Met_SK49 Vr_Met_SK68 Vr-Met-SK186 | ||
+ | Low syn group: Vr-Met-K Vr_Met_SK94 Vr_Met_SK152 Vr-Met-SK156 Vr_Met_SK176 | ||
+ | D: 193:/data2/chojam96/methylation/metilene/metilene_v0.2-7/ | ||
+ | |||
+ | WD: 63:/data6/chojam96/methylation/highlow_methylation | ||
+ | |||
+ | 1. Create 'genelst' containing five genes for methylation profile figures | ||
+ | |||
+ | Vradi01g09470.1 | ||
+ | Vradi03g00420.1 | ||
+ | Vradi11g00660.1 | ||
+ | Vradi01g11940.1 | ||
+ | Vradi09g08580.1 | ||
+ | |||
+ | 2. Get high syn group bismark coverage information from 193 | ||
+ | |||
+ | 15 files: CpG(H)_Vr-Met-S(SK##)_bismark.cov.gz | ||
+ | |||
+ | |||
+ | 3. Combine all cytosine context and group files, set minimum threshold as (met + unmet) >= 3 | ||
+ | |||
+ | run 'cut_min3.py' and concatenate | ||
+ | |||
+ | creates ''''highgroup.min3.cov'''' | ||
+ | |||
+ | |||
+ | 4. Divide into chromosomes and scaffolds | ||
+ | |||
+ | run 'dividescaffolds.py' | ||
+ | |||
+ | |||
+ | 5. '''Run 'getdata.py'''' | ||
+ | |||
+ | '''Coverage(.cov) files of ''all the scaffolds'' should be present in the same working directory!''' | ||
+ | |||
+ | python getdata.py <GFF.onlymRNA> <Gene ID> <nowiki><S/K></nowiki> | ||
+ | python getdata.py ../Vradi_ver6.onlymra.gff Vradi01g09470.1 S | ||
+ | |||
+ | |||
+ | |||
+ | Repeat same procedures with low syn group | ||
+ | |||
+ | creates ''''lowgroup.min3.cov'''' |
Revision as of 02:15, 7 June 2019
Contents |
Tuxedo
WD: 63:/data6/chojam96/methylation/tuxedo
Basic Bowtie2 pipeline
Refer to 2108 Hakyung TUXEDO
HTSeq
For detailed information, refer to HTSeq page.
Basic command:
htseq-count -t exon (-f bam) (--stranded=no) (-q) <sorted.bam> <annotation.gff/gtf> > <htseq.txt>
Default input file format is SAM, so specifying '-f bam' for BAM input file is necessary.
For files not following normal HTSeq pipeline, '--stranded=no' option is also required. Refer to 2108 Hakyung TUXEDO.
However, for V. radiata Illumina-sequenced GFF3 files, ID attributes are mixed up, so additional '--idattr' option should be used!
htseq-count -t exon -f bam --stranded=no -q --idattr transcript_id <sorted.bam> <annotation.gff/gtf> > <htseq.txt>
ID attributes may be:
- transcript_id for GTF format
- Name for GFF format
- etc
edgeR
Refer to 2108 Hakyung TUXEDO
If edgeR not installed properly due to some version issues, follow instructions from edgeR installation.
DEG and methylation
DEG
From HTSeq, for six samples (three for K, three for S), files *_htseq.txt were produced and combined into combined_htseq.txt
edgeR was performed on 'combined_htseq.txt' on R studio and 'combined_htseq.count.edgeR' (n=23106) was produced.
DEG was extracted using:
python finddeg.py combined_htseq.count.edgeR combined_htseq.deg
DEGs from each cytosine contexts were extracted using:
python extractmetccount.py <DEG file> <*_genes> <*_count.deg>
CpG_count.deg (n=21)
CHG_count.deg (n=23)
CHH_count.deg (n=4)
Gene methylation count
WD: 63:/data6/chojam96/methylation/expression_methylation
1. From 193:/data2/chojam96/methylation/metilene/metilene_v0.2-7/Vr-Met-S(K), get bismark coverage files
Six files: 'CpG(H)_Vr-Met-S(K)_bismark.cov'
Chr Pos Pos MetPercentage Met_read Unmet_read Vr01 44 44 100 1 0 Vr01 45 45 100 1 0 Vr01 359 359 100 1 0 Vr01 360 360 100 2 0 Vr01 371 371 100 1 0
2. Combine all cytosine context files → 'Vr-Met-S(K)_bismark.cov' and set minimum threshold as (met + unmet) >= 3 → 'Vr-Met-S(K)_bismark.min3.cov'
concatenate and run 'cut_min3.py'
Chr Pos Met_read Unmet_read Vr01 417 4 0 Vr01 459 2 1 Vr01 467 1 2 Vr01 606 6 0 Vr01 608 3 1
3. Divide into chromosomes and scaffolds → 'Vr#(scaffold)_S(K).min3.cov'
python dividescaffolds.py Vr-Met-S(K)_bismark.min3.cov
4. Run 'getdata.py': takes GFF - only mRNA - file, complete gene ID and line type (S, K)
Coverage(.cov) files of all the scaffolds should be present in the same working directory!
python getdata.py <GFF.onlymRNA> <Gene ID> <S/K> python getdata.py ../Vradi_ver6.onlymrna.gff Vradi01g05390.1 S
This produces <Gene ID>_<S/K> file with following format:
'Vradi01g05390.1_S' Position Number of counts Met/Unmet #strand:- 8291109 4 met 8291109 -0 unmet 8291141 5 met 8291141 -0 unmet 8291162 4 met 8291162 -2 unmet 8291182 12 met 8291182 -0 unmet
5. Edit the file suitable for R graphs
python dataforbar.py <Gene ID>_<S/K> <Gene ID>_<S/K>.bar.tsv python dataforbar.py Vradi0185s00060.1_K Vradi0185s00060.1_K.bar.tsv
Columns:
pos count type
6. Make R script and run Rscript → requires original GFF file
tsv files of all genes should be present in the same working directory!
python makerscript.py <gene list file> <GFF file> <output R script> python makerscript.py CpG_deg.genes ../Vradi_ver6.gff CpG_genes.rscript.R
Rscript CpG_genes.rscript.R
This produces <Gene ID>_<S/K>.graph.pdf files.
Average methylation profiles among high/low synchrony groups
High syn group: Vr-Met-S Vr-Met-SK10 Vr_Met_SK49 Vr_Met_SK68 Vr-Met-SK186 Low syn group: Vr-Met-K Vr_Met_SK94 Vr_Met_SK152 Vr-Met-SK156 Vr_Met_SK176 D: 193:/data2/chojam96/methylation/metilene/metilene_v0.2-7/
WD: 63:/data6/chojam96/methylation/highlow_methylation
1. Create 'genelst' containing five genes for methylation profile figures
Vradi01g09470.1 Vradi03g00420.1 Vradi11g00660.1 Vradi01g11940.1 Vradi09g08580.1
2. Get high syn group bismark coverage information from 193
15 files: CpG(H)_Vr-Met-S(SK##)_bismark.cov.gz
3. Combine all cytosine context and group files, set minimum threshold as (met + unmet) >= 3
run 'cut_min3.py' and concatenate
creates 'highgroup.min3.cov'
4. Divide into chromosomes and scaffolds
run 'dividescaffolds.py'
5. Run 'getdata.py'
Coverage(.cov) files of all the scaffolds should be present in the same working directory!
python getdata.py <GFF.onlymRNA> <Gene ID> <S/K> python getdata.py ../Vradi_ver6.onlymra.gff Vradi01g09470.1 S
Repeat same procedures with low syn group
creates 'lowgroup.min3.cov'