Difference between revisions of "2017 Haneul Lab note"

From Crop Genomics Lab.
Jump to: navigation, search
(Mungbean pacbio assembly)
(Mungbean pacbio assembly)
Line 284: Line 284:
  "out.gp", line 2594: wrong option
  "out.gp", line 2594: wrong option
It seems gnuplot was updated, so doesn't support that option resulted from mummerplot. just edit out.gp to delete that line.
It seems gnuplot was updated, so doesn't support that option resulted from mummerplot. just edit out.gp to delete that line.
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png

Revision as of 07:16, 16 March 2017


1 / 9


parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)
find linkage group 2 map is wrong, construct map newly

Mungbean synchronous QTL

make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)
missing > 10%, hetero > 10, depth < 3 marker is filtered
while grouping them, find vr03, vr04 is combined in a group and vr05 is splited 2 groups, check it.

Mungbean pacbio assembly

moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)

1 / 10


QTL analysis by using IciMapping

Mungbean synchronous QTL

construct genetic map (JoinMap 4.1), just using chr 3, 4 combined and chr 5 splited linkage group.

ML method, Haldane algorithm

Mungbean pacbio assembly

convert SAM format to BAM format (244:/kev8305/SK3/)


1/ 11

Mungbean synchronous QTL

QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)
just for checking locus


Mungbean pacbio assembly

coping sorted.bam file from 244 server to 63 server

variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)

samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf


Mungbean pacbio assembly

variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)

bwa index falcon_500_sspace.final.scaffolds.fasta
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam


Mungbean pacbio assembly

variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)

samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam
samtools index KJ-pe_falcon_scaffold.sorted.bam
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf


Jatropha assembly

make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)

python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg


Mungbean Chloroplast assembly

pairing Illumina PE read (63:/home/skyts0401/)

sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq


Mungbean Chloroplast assembly


gmap_build -D gmap_db -d v.radiata v.radiata.fasta
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam
samtools index Vr-cp_scaf-cp-20k.sorted.bam

2/3 ~ 2/6

Mungbean Chloroplast assembly

falcon - path : 63:/home/skyts0401/Falcon_RE/rere/

before run, copy fc_env folder (63:/data/skyts0401/Falcon/)

cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER

and configure file is on /home/skyts0401/fc_run.cfg

align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)

~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam
samtools index cp-assembly_canu-ctg.sorted.bam
samtools faidx canu_ctg_cp.fasta
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam


Mungbean Chloroplast assembly

assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)

~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta

and we have 2 contigs (one contig have LSC+IR, and other contig have SSC+IR)

just assembly them(cp_1.fa, cp_2.fa, cp_3.fa)


Mungbean Chloroplast assembly

quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)

--- boost (ConsensusCore dependency) ---
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz
tar -xf boost1_63_0.tar.gz
cd boost_1_63_0/
sudo apt-get install python-dev (solution for error-pyconfig.h)
sudo ./b2 install
--- swig (ConsensusCore dependency) ---
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.g
tar -xf swig-3.0.12.tar.gz 
cd swig-3.0.12/
sudo make install
--- ConsensusCore (GenomicConsensus dependency) ---
git clone https://github.com/PacificBiosciences/ConsensusCore.git
cd ConsensusCore/
sudo python setup.py install
--- GenomicConsensus ---
git clone https://github.com/PacificBiosciences/GenomicConsensus.git
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)
sudo make

Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)

bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam

Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)

bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam

Polishing by Quiver


Mungbean Chlroplast assembly

Quiver aligning Pacbio_chlroplast read to vr.pb.cp.fasta need to use pbalign, not bowtie or some other program.

pbalign install (63:/kev8305/skyts0401/program)

--- blasr (pbalign dependency) ---
--- pbcommand (quiver dependency) ---
git clone https://github.com/PacificBiosciences/pbcommand.git
cd pbcommand
sudo python setup.py install
--- pbalign ---
git clone https://github.com/PacificBiosciences/pbalign.git
cd pbalign/
sudo pip install .

pbalign (tried to align by using blasr algorithm , but sam or bam is no longer supported in blasr, so just use bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)

pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam


Mungbean Chloroplast assembly

Error occured while pbalign, so re-installed blasr(guess library error)


Mungbean Chloroplast assembly

variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)

samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf


Mungbean Chloroplast assembly

Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)

bwa index vr.pb.cp.fasta
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.ba
samtools index vr.pb.cp_PE.sorted.bam
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam
samtools index vr.pb.cp_PB.sorted.bam
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf


Mungbean Chloroplast assembly

make a code that read fasta and annotation file(gff or gb) and make a fasta file with gene CDS sequence (63:/kev8305/Mungbean_assembly/chloroplast/)

python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta


Mungbean pacbio assembly

snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)

python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')


Mungbean pacbio assembly

too many snp for joinmap, so filtering missing < 12

python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc 
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc


Mungbean pacbio assembly

Mugbean_pacbio_scaffold_7_seg_dist_foramt.loc : no hetero, missing < 18

ALLMAPS install (244:/kev8305/skyts0401/program)

easy_install biopython numpy deap networkx matplotlib jcvi
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh
sh ALLMAPS-install.sh

and, add directory include ALLMAPS binnary code(concorde,faSize,liftOver) to $PATH in ~/.profile

ALLMAPS (244:/kev8305/SK3/anchoring)

python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta


Mungbean pacbio assembly

MUMmer install, for dot plot between pacbio and previous ref

wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz
tar -xvf MUMmer3.23.tar.gz
cd MUMmer3.23
make check
make install
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums


Mungbean pacbio assembly

lastz install (244:/kev8305/skyts0401/program/)

download from http://www.bx.psu.edu/~rsharris/lastz/
tar -xvzf lastz-1.02.00.tar.gz
cd lastz-distrib-1.02.00/src/
problem with Makefile, so delete -Werror in line 31 of Makefile, save.
make install

add path /home/skyts0401/lastz-distrib/bin in .profile

lastz (244:/kev8305/SK3/anchoring/)

lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam


Mungbean pacbio assembly

MUMmer, having a problem with memory, was re-installed with a memory configuration

make clean
make install

and use nucmer to align pacbio assembly and previous reference

MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa

and draw a dot plot using mummerplot

mummerplot --fat -l -png ref_qry_noextend.delta

but it occurs a error like

set mouse clipboardformat "[%.0f, %.0f]"
"out.gp", line 2594: wrong option

It seems gnuplot was updated, so doesn't support that option resulted from mummerplot. just edit out.gp to delete that line.

/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png