2017 Haneul Lab note

From Crop Genomics Lab.
Revision as of 01:21, 10 April 2017 by Skyts0401 (Talk | contribs)

Jump to: navigation, search


1 / 9


parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)
find linkage group 2 map is wrong, construct map newly

Mungbean synchronous QTL

make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)
missing > 10%, hetero > 10, depth < 3 marker is filtered
while grouping them, find vr03, vr04 is combined in a group and vr05 is splited 2 groups, check it.

Mungbean pacbio assembly

moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)

1 / 10


QTL analysis by using IciMapping

Mungbean synchronous QTL

construct genetic map (JoinMap 4.1), just using chr 3, 4 combined and chr 5 splited linkage group.

ML method, Haldane algorithm

Mungbean pacbio assembly

convert SAM format to BAM format (244:/kev8305/SK3/)


1/ 11

Mungbean synchronous QTL

QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)
just for checking locus


Mungbean pacbio assembly

coping sorted.bam file from 244 server to 63 server

variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)

samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf


Mungbean pacbio assembly

variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)

bwa index falcon_500_sspace.final.scaffolds.fasta
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam


Mungbean pacbio assembly

variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)

samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam
samtools index KJ-pe_falcon_scaffold.sorted.bam
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf


Jatropha assembly

make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)

python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg


Mungbean Chloroplast assembly

pairing Illumina PE read (63:/home/skyts0401/)

sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq


Mungbean Chloroplast assembly


gmap_build -D gmap_db -d v.radiata v.radiata.fasta
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam
samtools index Vr-cp_scaf-cp-20k.sorted.bam

2/3 ~ 2/6

Mungbean Chloroplast assembly

falcon - path : 63:/home/skyts0401/Falcon_RE/rere/

before run, copy fc_env folder (63:/data/skyts0401/Falcon/)

cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER

and configure file is on /home/skyts0401/fc_run.cfg

align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)

~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam
samtools index cp-assembly_canu-ctg.sorted.bam
samtools faidx canu_ctg_cp.fasta
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam


Mungbean Chloroplast assembly

assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)

~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta

and we have 2 contigs (one contig have LSC+IR, and other contig have SSC+IR)

just assembly them(cp_1.fa, cp_2.fa, cp_3.fa)


Mungbean Chloroplast assembly

quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)

--- boost (ConsensusCore dependency) ---
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz
tar -xf boost1_63_0.tar.gz
cd boost_1_63_0/
sudo apt-get install python-dev (solution for error-pyconfig.h)
sudo ./b2 install
--- swig (ConsensusCore dependency) ---
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.g
tar -xf swig-3.0.12.tar.gz 
cd swig-3.0.12/
sudo make install
--- ConsensusCore (GenomicConsensus dependency) ---
git clone https://github.com/PacificBiosciences/ConsensusCore.git
cd ConsensusCore/
sudo python setup.py install
--- GenomicConsensus ---
git clone https://github.com/PacificBiosciences/GenomicConsensus.git
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)
sudo make

Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)

bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam

Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)

bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam

Polishing by Quiver


Mungbean Chlroplast assembly

Quiver aligning Pacbio_chlroplast read to vr.pb.cp.fasta need to use pbalign, not bowtie or some other program.

pbalign install (63:/kev8305/skyts0401/program)

--- blasr (pbalign dependency) ---
--- pbcommand (quiver dependency) ---
git clone https://github.com/PacificBiosciences/pbcommand.git
cd pbcommand
sudo python setup.py install
--- pbalign ---
git clone https://github.com/PacificBiosciences/pbalign.git
cd pbalign/
sudo pip install .

pbalign (tried to align by using blasr algorithm , but sam or bam is no longer supported in blasr, so just use bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)

pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam


Mungbean Chloroplast assembly

Error occured while pbalign, so re-installed blasr(guess library error)


Mungbean Chloroplast assembly

variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)

samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf


Mungbean Chloroplast assembly

Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)

bwa index vr.pb.cp.fasta
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.ba
samtools index vr.pb.cp_PE.sorted.bam
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam
samtools index vr.pb.cp_PB.sorted.bam
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf


Mungbean Chloroplast assembly

make a code that read fasta and annotation file(gff or gb) and make a fasta file with gene CDS sequence (63:/kev8305/Mungbean_assembly/chloroplast/)

python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta


Mungbean pacbio assembly

snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)

python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')


Mungbean pacbio assembly

too many snp for joinmap, so filtering missing < 12

python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc 
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc


Mungbean pacbio assembly

Mugbean_pacbio_scaffold_7_seg_dist_foramt.loc : no hetero, missing < 18

ALLMAPS install (244:/kev8305/skyts0401/program)

easy_install biopython numpy deap networkx matplotlib jcvi
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh
sh ALLMAPS-install.sh

and, add directory include ALLMAPS binnary code(concorde,faSize,liftOver) to $PATH in ~/.profile

ALLMAPS (244:/kev8305/SK3/anchoring)

python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta


Mungbean pacbio assembly

MUMmer install, for dot plot between pacbio and previous ref

wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz
tar -xvf MUMmer3.23.tar.gz
cd MUMmer3.23
make check
make install
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums


Mungbean pacbio assembly

lastz install (244:/kev8305/skyts0401/program/)

download from http://www.bx.psu.edu/~rsharris/lastz/
tar -xvzf lastz-1.02.00.tar.gz
cd lastz-distrib-1.02.00/src/
problem with Makefile, so delete -Werror in line 31 of Makefile, save.
make install

add path /home/skyts0401/lastz-distrib/bin in .profile

lastz (244:/kev8305/SK3/anchoring/)

lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam


Mungbean pacbio assembly

MUMmer, having a problem with memory, was re-installed with a memory configuration

make clean
make install

and use nucmer to align pacbio assembly and previous reference

MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa

and draw a dot plot using mummerplot

mummerplot --fat -l -png ref_qry_noextend.delta

but it occurs a error like

set mouse clipboardformat "[%.0f, %.0f]"
"out.gp", line 2594: wrong option

It seems gnuplot was updated, so doesn't support that option resulted from mummerplot. just edit out.gp to delete that line.

/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png


Mungbean pacbio assembly

compare between pacbio assembly and previous reference

1. 50 reseq marker/LG on previous reference mapping on pacbio super scaffold for checking same marker is on same chromosome. (244:/kev8305/SK3/anchoring/check)

python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)

2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)

scp assembly@ .
scp skyts0401@ .
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .
bwa index p_ctg.longest.longest1.fa
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam
samtools index newcontig_pacbio.sorted.bam
 ~ same samtools command with newcontig_illumina.sam ~

Find that something looked splited mapping, so re-align with end-to-end method of bowtie2

~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam

!!!bowtie2-2.3.0 version has a bug!!!
scp assembly@ .
scp assembly@ .
~ same samtools command, view, sort, index ~
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth

blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)


for i in {0..19}; do
        ../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &

python fasta_devide.py final.contigs.reformed.fasta 
chmod a+x contig_compare.sh 
ls contig_compare_all_*.psl > psl.list
nano pslfilter.py 
python pslfilter.py psl.list > conitg_compare_all.result
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result