Difference between revisions of "2017 Taeyoung Lab note"

From Crop Genomics Lab.
(2017.1.23)
(Replaced content with "2017 Jan Taeyoung Lab note")
Line 1: Line 1:
== Ongoing ==
+
[[2017 Jan Taeyoung Lab note]]
 
+
1. TBLASTX using Jat Species Transcriptome
+
 
+
== 2017.1.2 ==
+
==== Jatropha transcriptome Trinity assembly ====
+
raw data : 244:/NGS/NGS/JatrophaCurcas/RNA
+
Jatropha species transcriptome assembly: Jct, Jcu, Jin, Jgo, Jci, Jpo, Jmu, Jma, Jac, Rco (listed in 244:/NGS/NGS/JatrophaCurcas/RNA/list). All done.
+
Jatropha organ transcriptome assembly: Leaf, Root, Stem, Female flower, Male flower, LG, SG, Y, B. All done.
+
 
+
==== Cdhit ====
+
193:/data2/alima90/program/cdhit/cd-hit -i Y.cds.fa -M 10000 -o Y.cds.fa.cdhit -T 5
+
193:/data2/alima90/program/cdhit/cd-hit -i LG.cds.fa -M 10000 -o LG.cds.fa.cdhit -T 5
+
 
+
==== UV GBS mapping (w/ joinmap) ====
+
244:python vcf.parsing.for.mandf.py UV.vcf.SNPonly 3 0.01 except_sample.txt > UV.vcf.SNPonly.except.LowDepthSample.d3.Q30.m0.1.loc
+
The loc file was manually edited in Excel.
+
The genetic map was constructed using JoinMap 4.1.
+
 
+
==== KaKs calculation using scripts provided by MCscanX ====
+
 
+
'''KaKs calculation between Jatropha species'''
+
244 :python /alima9002/63_backup/Jatropha/CDS/run.kaks.py
+
 
+
==== Large Insertion Prediction ====
+
===== LIP short primer preparation =====
+
'''Primer info'''
+
>LIP01short_F
+
AACTGAACACAGACAATGAA
+
>LIP01short_R
+
CAATTTATACACCACCTTAC
+
>LIP02short_F
+
CTCTTTGTATTTGGTGACAA
+
>LIP02short_R
+
GTATTAGCAGCTTTTGCTTA
+
>LIP03short_F
+
AATTGTAAGACATATCCCTC
+
>LIP03short_R
+
CTGCCCCACTAATAATTAAT
+
>LIP04short_F
+
TAAAAACAGAACTTGTCCAC
+
>LIP04short_R
+
ATCACAAGACTGAACAAGTA
+
>LIP05short_F
+
ATTGACATAAGGTTGCATAG
+
>LIP05short_R
+
CCTTAGCTCTTTTCTTTTGT
+
>LIP06short_F
+
GAAGGAAGGAAGCAATTATT
+
>LIP06short_R
+
TGACTTACCCTTTTTACCTT
+
>LIP07short_F
+
CACATGTTTGTCACTCTAAT
+
>LIP07short_R
+
GAAGTGAGGCCTAAAATAAA
+
>LIP08short_F
+
GAATGTATTGTCTTTGATCC
+
>LIP08short_R
+
GTTGGATTTTGTTCTTTCCA
+
>LIP09short_F
+
AGAAAAACGTCGATACCAAA
+
>LIP09short_R
+
CGATTTAGTAACCTTAGAAC
+
>LIP10short_F
+
ATCTTCAAAATGTCTCTAGG
+
>LIP10short_R
+
TACAGATATTCTTAGGCAGT
+
>LIP11short_F
+
TGTAACTCTCAATTAAGCAG
+
>LIP11short_R
+
ATCTTTCTGTAAGCACTTAG
+
>LIP12short_F
+
CTAGAACCGATTTGTTCAAA
+
>LIP12short_R
+
GCAGTTGTTTTGGATTAACA
+
>LIP13short_F
+
AAAGAGAAAGCAGAGAAATC
+
>LIP13short_R
+
ATGTATAGATTGGAGGAAAG
+
>LIP14short_F
+
ATTATGGAAAGGAATTGGAG
+
>LIP14short_R
+
CCATGTCTAGTATTTACTCA
+
>LIP15short_F
+
TTAATGACTGATCGTTAGTG
+
>LIP15short_R
+
CGGGAGTTATGAAAAATAGT
+
>LIP24short_F
+
AGTATGGTTTCAACATATGG
+
>LIP24short_R
+
GATATGAAGTTGACATGCTA
+
>LIP16short_F
+
ATTTAAAAGCTCGTAACTCC
+
>LIP16short_R
+
GGATAAGCAATTACAACACA
+
>LIP17short_F
+
CCCAAATTTTTAAATGCACC
+
>LIP17short_R
+
CTCTTGGAACGTGAAAAATT
+
>LIP18short_F
+
TTTTCTAGAAGGATTTGTGC
+
>LIP18short_R
+
CCATGCAAACCCAATTTTAA
+
>LIP19short_F
+
GTAAAACTAAGGTTGAGCTA
+
>LIP19short_R
+
CCACAAGTCACAACAATTTA
+
>LIP20short_F
+
TTATTTGTATGTTGGAGACC
+
>LIP20short_R
+
CATGGTATATAGGTTTAGGT
+
>LIP21short_F
+
CATAGAGAGTTTTGGATTAC
+
>LIP21short_R
+
AAAGAACTGATAGTGTCATG
+
>LIP22short_F
+
ATATGTACATGTATGGTGTG
+
>LIP22short_R
+
CCTAAATCTAGCAGAAGATT
+
>LIP23short_F
+
ATGTATGGAGAAATGGGTTA
+
>LIP23short_R
+
ATATAGAAATGGAGGTTGCT
+
(listed in BACKUP(J:)/박사/Indel Candidate/LIP_short_primer.fa)
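
As a rough sanity check on the primer list above, the Wallace rule (Tm = 2*(A+T) + 4*(G+C), a common approximation for short oligos) can be applied to each primer. This is only a sketch, not the method used to design these primers; the function names are made up for illustration, and the annealing temperature still has to come from the gradient PCR.

```python
def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature in degrees Celsius: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def read_fasta(text: str) -> dict:
    """Parse FASTA text into {name: sequence}; blank lines are ignored."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


# Two primers copied from the list above; the rest can be read from the .fa file.
example = """>LIP01short_F
AACTGAACACAGACAATGAA
>LIP01short_R
CAATTTATACACCACCTTAC
"""

for name, seq in read_fasta(example).items():
    print(name, len(seq), wallace_tm(seq))
```

The Wallace rule ignores salt concentration and nearest-neighbour effects, so the values are only useful for spotting primers that differ strongly from the rest.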
+
 
+
Primer dilution
+
 
+
== 2017.1.3 ==
+
===AMORE work===
+
 
+
python ~/py/ret_fasta_by_gene_name.py  /alima9002/ref/Athaliana/annotation/Athaliana_167_TAIR10.cds.fa gene_list.txt > gene_list.txt.fa
+
blastp -db /alima9002/ref/Gmax/annotation/Gmax_275_Wm82.a2.v1.protein.fa -query gene_list.txt.pep.fa -evalue 1e-5 -num_alignments 1 -outfmt 6 -num_threads 6 -out gene_list.txt.pep.fablastp.Gm275.1e-5.out6
+
 
+
'''Homolog with Ath'''
+
Glyma.08G014900
+
Glyma.05G208300
+
Glyma.20G001900
+
Glyma.03G176600
+
Glyma.19G177400
+
Glyma.03G262600
+
Glyma.06G202300
+
Glyma.05G021800
+
Glyma.17G077700
+
Glyma.05G022000
+
Glyma.09G234900
+
Glyma.19G025000
+
Glyma.10G224000
+
Glyma.02G081000
+
Glyma.20G167800
+
Glyma.14G072700
+
Glyma.17G252200
+
Glyma.17G050500
+
Glyma.07G038000
+
Glyma.13G109100
+
Glyma.16G007200
+
Glyma.19G105100
+
Glyma.09G283800
+
Glyma.20G172700
+
Glyma.02G076300
+
 
+
'''SNP typing among IT182932, IT109098, Hwangkeum-Kong'''
+
 
+
1.Read mapping using bwa mem with default options (/home/hayasen/Workspace/Glycine/GlycineMax/ver275/Reads/)
+
 
+
2.mpileup
+
samtools mpileup -f /alima9002/ref/Gmax/assembly/Gmax_275_v2.0.fa -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u -b bam_list | bcftools call -v -m -O v > Variant.vcf
+
 
+
===LIP short primer gradient PCR===
+
Primers 1~8 were tested with CS-12.
+
 
+
Gradient: lower temperature 50 °C, upper temperature 65 °C.
+
 
+
Samples were loaded on a 1% agarose gel and run at 100 V for 1 hour.
+
 
+
{| class="wikitable"
+
|-
+
| 52.7 || 54.1 || 55.5 || 56.8 || 58.2 || 59.5 || 60.9 || 62.3
+
|}
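
The eight temperatures in the table are consistent with a linear 50 to 65 °C gradient spread over a 12-column block, taking columns 3 to 10. The 12-column layout is an assumption (the note does not say which cycler or columns were used), and real gradient blocks are not perfectly linear, so this is only a sketch of how the values could be reproduced.

```python
def gradient_temps(low: float, high: float, columns: int) -> list:
    """Evenly spaced block temperatures from low to high, one per column."""
    step = (high - low) / (columns - 1)
    return [round(low + i * step, 1) for i in range(columns)]


# Assumed setup: 12-column block, gradient set to 50-65 degrees Celsius.
temps = gradient_temps(50.0, 65.0, 12)

# Columns 3 to 10 (1-based) under that assumption.
used = temps[2:10]
print(used)
```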
+
 
+
<gallery>
+
File:17010301.jpeg|LIPshort01-04
+
</gallery>
+
1 2
+
 
+
3 4
+
 
+
 
+
LIPshort1 -> error during loading
+
 
+
LIPshort4 -> only the samples amplified at 55.5~59.5 were loaded
+
 
+
<gallery>
+
File:17010302.jpg|Caption2
+
</gallery>
+
 
+
5 6
+
 
+
7 8
+
 
+
Estimated Tm:55.5~56.8
+
 
+
 
+
== 2017.1.4~2017.1.6 ==
+
 
+
Farm trip
+
 
+
===LIP Gradient PCR===
+
50~65 degrees Celsius
+
 
+
1% agarose gel, 100 V, 1 h
+
 
+
 
+
 
+
<gallery>
+
File:2017010401.jpg|LIP01,09,10,11
+
File:2017010402.jpg|LIP12,13,14,15
+
</gallery>
+
 
+
All good
+
 
+
 
+
<gallery>
+
File:2017010601.jpg|LIP16,17,18,19
+
File:2017010602.jpg|LIP20,21,22,24
+
</gallery>
+
 
+
LIP16's lower band is our target
+
 
+
LIP20 did not show a band
+
 
+
== 2017.1.9 ==
+
 
+
===AMORE(GK, IT182932, IT109098)===
+
==== VCF parsing ====
+
python ~/py/Reseq/filter.vcf.by.phred.hetero.depth.py Variant.vcf.SNP 5 > Variant.vcf.SNP.filtered.d5.Q30.homo
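
The script itself is not shown here. Going only by the output name (d5, Q30, homo), the filter presumably keeps sites with depth of at least 5, QUAL of at least 30, and homozygous calls. A minimal sketch of such a filter, assuming a standard VCF with GT and DP in the FORMAT column; the function names and thresholds below are illustrative and this is not the original script:

```python
def passes(line: str, min_depth: int = 5, min_qual: float = 30.0) -> bool:
    """True if a VCF data line has QUAL >= min_qual and every sample is a
    called homozygous genotype with DP >= min_depth."""
    fields = line.rstrip("\n").split("\t")
    if fields[5] == "." or float(fields[5]) < min_qual:
        return False
    keys = fields[8].split(":")
    if "GT" not in keys or "DP" not in keys:
        return False
    gt_i, dp_i = keys.index("GT"), keys.index("DP")
    for sample in fields[9:]:
        values = sample.split(":")
        if len(values) <= max(gt_i, dp_i):
            return False
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = values[gt_i].replace("|", "/").split("/")
        if "." in alleles or len(set(alleles)) != 1:
            return False
        if not values[dp_i].isdigit() or int(values[dp_i]) < min_depth:
            return False
    return True


def filter_vcf(lines):
    """Yield header lines unchanged and data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes(line):
            yield line
```

Usage would be along the lines of `for line in filter_vcf(open("Variant.vcf.SNP")): print(line, end="")`, redirected to the filtered file.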
+
==== Typing ====
+
cat Variant.vcf.SNP.filtered.d5.Q30.homo.diff | python ~/py/Reseq/[Reseq]SNP_counter.py /alima9002/ref/Gmax/annotation/Gmax_275_Wm82.a2.v1.gene_exons.gff3 /alima9002/ref/Gmax/assembly/Gmax_275_v2.0.fa 30 > Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type
+
 
+
===Jatropha OrthoMcl===
+
Retrieve only complete peptide sequences for OrthoMCL
+
 
+
== 2017.1.10 ==
+
===AMORE(GK, IT182932, IT109098)===
+
==== Filtering SNPs on homologs ====
+
python ret_homolog_only.py Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type homologs.txt > Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type.homologs.only
+
==== Syn or Nonsyn typing ====
+
python ~/py/Reseq/\[Reseq\]det_syn.py /alima9002/ref/Gmax/annotation/Gmax_275_Wm82.a2.v1.gene_exons.gff3 /alima9002/ref/Gmax/assembly/Gmax_275_v2.0.fa Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type.homologs.only.CDS > Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type.homologs.only.CDS.SynNonsyn
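
The core of a synonymous/nonsynonymous call is translating the reference codon and the altered codon and comparing the amino acids. The sketch below shows only that step with the standard genetic code; how det_syn.py extracts codons from the GFF3 and the FASTA is not documented here, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon: str, offset: int, alt: str) -> str:
    """Classify a SNP at 0-based position `offset` of a coding-strand codon."""
    ref_codon = codon.upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


print(classify_snp("GAA", 2, "G"))  # GAA -> GAG
print(classify_snp("GAA", 0, "A"))  # GAA -> AAA
```

The codon and the alternative base must both be given on the coding strand; for genes on the minus strand the reference sequence and the alternative allele have to be reverse-complemented first.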
+
===Lactuca Indica Cdhit===
+
  /alima9002/program/cd-hit-v4.6.4-2015-0603/cd-hit -i L.Trinity.fasta -o L.Trinity.fasta.cdhit -T 10 -M 10000
+
 
+
== 2017.1.11==
+
Farm trip
+
 
+
== 2017.1.12==
+
===Amore===
+
====Make SNP tables====
+
====INDEL analysis using snpEff====
+
java -jar /alima9002/program/snpEff/snpEff.jar ann -c /alima9002/program/snpEff/snpEff.config -ud 1000 gmax275 Variant.vcf.INDEL > Variant.vcf.INDEL.snpEff
+
 
+
==== homologs filtering using annotation file ====
+
==== VCF filter by homologs retrieved by ann file ====
+
python ret_homolog_only.py Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type homologs.by.ann.txt > Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type.hom.ann
+
==== Determining synonymous or nonsynonymous ====
+
python ~/py/Reseq/\[Reseq\]det_syn.py /alima9002/ref/Gmax/annotation/Gmax_275_Wm82.a2.v1.gene_exons.gff3 /alima9002/ref/Gmax/assembly/Gmax_275_v2.0.fa Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type.hom.ann.CDS > Variant.vcf.SNP.filtered.d5.Q30.homo.diff.type.hom.ann.CDS.SynNonsyn
+
 
+
===Jatropha KaKs===
+
python parsing.all.kaks.py all.kaks > all.kaks.ksonly
+
====Drawing graph using R====
+
require(ggplot2)
+
data<-read.table("all.kaks.ksonly",header=F)
+
colnames(data)<-c("Species","Ks")
+
Ks <- data$Ks
+
Species <- data$Species
+
ggplot(data,aes(Ks,colour=Species))+geom_freqpoly(binwidth=0.01)+scale_x_continuous(limits=c(0,0.8))
+
 
+
==2017.1.13==
+
Farm trip (pod lwt)
+
 
+
==2017.1.16==
+
===Jatropha Ks value using transcriptome===
+
Transcriptomes of Jatropha species that were not clustered by cd-hit were used for TBLASTX.
+
tblastx -db Jct.cds.fa.complete.fa -query Jgo.cds.fa.complete.fa -evalue 1e-10 -outfmt 6 -num_alignments 5 -out Jct.tblastx.nocdhit.Jgo.1e-10.out6 -num_threads 8
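
With -num_alignments 5, each query can have several hits in the tabular output, so a best hit per query has to be picked before pairing sequences for Ks. A minimal sketch, assuming the default 12-column -outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and choosing by bit score; the function name and the selection criterion are assumptions, not the procedure used in this note.

```python
import csv


def best_hits(lines):
    """Return {query: (subject, evalue, bitscore)}, keeping the highest bit score."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical sequence IDs, for illustration only.
demo = [
    "Jgo_1\tJct_7\t95.0\t100\t5\t0\t1\t300\t1\t300\t1e-50\t200",
    "Jgo_1\tJct_9\t99.0\t120\t1\t0\t1\t360\t1\t360\t1e-80\t310",
    "Jgo_2\tJct_3\t90.0\t80\t8\t0\t1\t240\t1\t240\t1e-20\t95.5",
]
for query, hit in best_hits(demo).items():
    print(query, *hit, sep="\t")
```

For a real run, pass the opened result file instead of the demo list, e.g. `best_hits(open("Jct.tblastx.nocdhit.Jgo.1e-10.out6"))`.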
+
 
+
===Amore snpEff===
+
Split the result into one effect per line
+
perl /alima9002/program/snpEff/scripts/vcfEffOnePerLine.pl Variant.vcf.INDEL.snpEff
+
 
+
===Amore SNP typing===
+
To check the mapping depth of IT182932, SNP typing was performed on the unfiltered VCF file.
+
  python ~/py/Reseq/\[Reseq\]SNP_counter.py Variant.vcf.SNP /alima9002/ref/Gmax/assembly/Gmax_275_v2.0.fa 30 > Variant.vcf.SNP.type
+
 
+
==2017.1.17==
+
===OrthoMcl for Jat Organ===
+
blastp -db goodProteins.fasta -query goodProteins.fasta -outfmt 6 -out goodProteins.fasta.allvall.jat.organ -num_threads 15 -evalue 1e-5 -seg yes -soft_masking true -max_target_seqs 999999999
+
===Drawing Venn diagram using JatSp orthomcl result===
+
D:\Lab work\Jatropha\JatSp_Orthomcl_Venn
+
===Re-SNP typing of amore study===
+
Found that the previous command was wrong
+
cat Variant.vcf.SNP | python ~/py/Reseq/\[Reseq\]SNP_counter.py /alima9002/ref/Gmax/annotation/Gmax_275_Wm82.a2.v1.gene_exons.gff3 /alima9002/ref/Gmax/assembly/Gmax_275_v2.0.fa 30 > Variant.vcf.SNP.type
+
===snpEff parsing===
+
Found that the previous command was wrong
+
cat Variant.vcf.INDEL.snpEff | 'perl' /alima9002/program/snpEff/scripts/vcfEffOnePerLine.pl > Variant.vcf.INDEL.snpEff.parsed
+
 
+
===snpEff typing===
+
python Indel.Typing.Using.snpEff.Result.py Variant.vcf.INDEL.snpEff.parsed > Variant.vcf.INDEL.snpEff.parsed.tp
+
 
+
===filtering homologs only on snpEff typing results===
+
python get_INDEL_on_homolog.py Variant.vcf.INDEL.snpEff.parsed.tp homologs.by.ann.txt > Variant.vcf.INDEL.snpEff.parsed.tp.hom.ann
+
less Variant.vcf.INDEL.snpEff.parsed.tp.hom.ann | sort -u > Variant.vcf.INDEL.snpEff.parsed.tp.hom.ann.sorted
+
 
+
===count INDEL type===
+
python count.indel.component.py Variant.vcf.INDEL.snpEff.parsed.tp.hom.ann.sorted
+
 
+
===Discussion===
+
For organs, put the three replicates into OrthoMCL separately and count the groups present in all three.
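
The counting step could be sketched as below, assuming the usual OrthoMCL groups file layout (one group per line, `group_id: tag|seq_id tag|seq_id ...`) and that each replicate was given its own taxon tag. The tags L1, L2, L3 and the function name are hypothetical.

```python
def count_shared_groups(lines, replicate_tags):
    """Count ortholog groups that contain at least one member from every replicate tag."""
    required = set(replicate_tags)
    shared = 0
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        _, members = line.split(":", 1)
        # The taxon tag is the part before the first "|" of each member.
        tags = {member.split("|", 1)[0] for member in members.split()}
        if required <= tags:
            shared += 1
    return shared


# Hypothetical groups for three leaf replicates tagged L1, L2, L3.
demo_groups = [
    "grp1000: L1|a L2|b L3|c",
    "grp1001: L1|d L1|e L2|f",
    "grp1002: L1|g L2|h L3|i L3|j",
]
print(count_shared_groups(demo_groups, ["L1", "L2", "L3"]))
```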
+
 
+
==2017.1.18==
+
===ret SNP sets(1 missing GT is permitted)===
+
python filter.vcf.by.phred.hetero.depth.py.only.one.missing.permitted.py
+
===Amore SNP typing(1 missing GT is permitted)===
+
cat Variant.vcf.SNP.filtered.d5.Q30.missing1.diff | python ~/py/Reseq/\[Reseq\]SNP_counter.py /alima9002/ref/Gmax/annotation/Gmax_275_Wm82.a2.v1.gene_exons.gff3 /alima9002/ref/Gmax/assembly/Gmax_275_v2.0.fa 30 > Variant.vcf.SNP.filtered.d5.Q30.missing1.diff.type
+
 
+
==2017.1.19==
+
===LIP related figure===
+
Figures for each insertion type predicted by LIP
+
python ~/py/GeneInfo2Figure_v1.1.py gene_list.gi Glyma.08G.123500.gi2f.config > gene_list.gi.svg
+
Afterwards, edited manually in Illustrator
+
 
+
==2017.1.23==
+
===finishing SNP, INDEL typing of amore study===
+
===Jat species with no cdhit OrthoMcl ===
+
Followed the OrthoMCL manual
+
 
+
==2017.1.31==
+
===Drawing size distribution of predicted insertion ===
+
R with ggplot2 ver. 2.2
+
ggplot(SV, aes(Size))+geom_histogram(aes(fill=Program),binwidth=200,position="dodge")+scale_y_continuous("Counts")+coord_cartesian(ylim=c(400,2000),expand=FALSE)
+

Revision as of 05:01, 6 February 2017

2017 Jan Taeyoung Lab note