Difference between revisions of "2017 Haneul Lab note"

From Crop Genomics Lab.
Jump to: navigation, search
(Mungbean pacbio assembly)
Line 222: Line 222:
=== Mungbean pacbio assembly ===
=== Mungbean pacbio assembly ===
Mugbean_pacbio_scaffold_7_seg_dist_foramt.loc : no hetero, missing < 18
Mugbean_pacbio_scaffold_7_seg_dist_foramt.loc : no hetero, missing < 18
ALLMAPS install (244:/kev8305/skyts0401/program)
ALLMAPS install (244:/kev8305/skyts0401/program)
Line 228: Line 229:
  sh ALLMAPS-install.sh
  sh ALLMAPS-install.sh
and, add directory include ALLMAPS binnary code(concorde,faSize liftOver) to $PATH in ~/.profile
and, add directory include ALLMAPS binnary code(concorde,faSize liftOver) to $PATH in ~/.profile
ALLMAPS (244:/kev8305/SK3/anchoring)
ALLMAPS (244:/kev8305/SK3/anchoring)

Revision as of 05:21, 28 February 2017


1 / 9


parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)
find linkage group 2 map is wrong, construct map newly

Mungbean synchronous QTL

make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)
missing > 10%, hetero > 10, depth < 3 marker is filtered
while grouping them, find vr03, vr04 is combined in a group and vr05 is splited 2 groups, check it.

Mungbean pacbio assembly

moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)

1 / 10


QTL analysis by using IciMapping

Mungbean synchronous QTL

construct genetic map (JoinMap 4.1), just using chr 3, 4 combined and chr 5 splited linkage group.

ML method, Haldane algorithm

Mungbean pacbio assembly

convert SAM format to BAM format (244:/kev8305/SK3/)


1/ 11

Mungbean synchronous QTL

QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)
just for checking locus


Mungbean pacbio assembly

coping sorted.bam file from 244 server to 63 server

variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)

samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf


Mungbean pacbio assembly

variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)

bwa index falcon_500_sspace.final.scaffolds.fasta
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam


Mungbean pacbio assembly

variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)

samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam
samtools index KJ-pe_falcon_scaffold.sorted.bam
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf


Jatropha assembly

make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)

python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg


Mungbean Chloroplast assembly

pairing Illumina PE read (63:/home/skyts0401/)

sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq


Mungbean Chloroplast assembly


gmap_build -D gmap_db -d v.radiata v.radiata.fasta
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam
samtools index Vr-cp_scaf-cp-20k.sorted.bam

2/3 ~ 2/6

Mungbean Chloroplast assembly

falcon - path : 63:/home/skyts0401/Falcon_RE/rere/

before run, copy fc_env folder (63:/data/skyts0401/Falcon/)

cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER

and configure file is on /home/skyts0401/fc_run.cfg

align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)

~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam
samtools index cp-assembly_canu-ctg.sorted.bam
samtools faidx canu_ctg_cp.fasta
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam


Mungbean Chloroplast assembly

assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)

~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta

and we have 2 contigs (one contig have LSC+IR, and other contig have SSC+IR)

just assembly them(cp_1.fa, cp_2.fa, cp_3.fa)


Mungbean Chloroplast assembly

quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)

--- boost (ConsensusCore dependency) ---
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz
tar -xf boost1_63_0.tar.gz
cd boost_1_63_0/
sudo apt-get install python-dev (solution for error-pyconfig.h)
sudo ./b2 install
--- swig (ConsensusCore dependency) ---
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.g
tar -xf swig-3.0.12.tar.gz 
cd swig-3.0.12/
sudo make install
--- ConsensusCore (GenomicConsensus dependency) ---
git clone https://github.com/PacificBiosciences/ConsensusCore.git
cd ConsensusCore/
sudo python setup.py install
--- GenomicConsensus ---
git clone https://github.com/PacificBiosciences/GenomicConsensus.git
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)
sudo make

Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)

bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam

Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)

bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam

Polishing by Quiver


Mungbean Chlroplast assembly

Quiver aligning Pacbio_chlroplast read to vr.pb.cp.fasta need to use pbalign, not bowtie or some other program.

pbalign install (63:/kev8305/skyts0401/program)

--- blasr (pbalign dependency) ---
--- pbcommand (quiver dependency) ---
git clone https://github.com/PacificBiosciences/pbcommand.git
cd pbcommand
sudo python setup.py install
--- pbalign ---
git clone https://github.com/PacificBiosciences/pbalign.git
cd pbalign/
sudo pip install .

pbalign (tried to align by using blasr algorithm , but sam or bam is no longer supported in blasr, so just use bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)

pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam


Mungbean Chloroplast assembly

Error occured while pbalign, so re-installed blasr(guess library error)


Mungbean Chloroplast assembly

variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)

samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf


Mungbean Chloroplast assembly

Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)

bwa index vr.pb.cp.fasta
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.ba
samtools index vr.pb.cp_PE.sorted.bam
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam
samtools index vr.pb.cp_PB.sorted.bam
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf


Mungbean Chloroplast assembly

make a code that read fasta and annotation file(gff or gb) and make a fasta file with gene CDS sequence (63:/kev8305/Mungbean_assembly/chloroplast/)

python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta


Mungbean pacbio assembly

snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)

python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')


Mungbean pacbio assembly

too many snp for joinmap, so filtering missing < 12

python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc 
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc


Mungbean pacbio assembly

Mugbean_pacbio_scaffold_7_seg_dist_foramt.loc : no hetero, missing < 18

ALLMAPS install (244:/kev8305/skyts0401/program)

easy_install biopython numpy deap networkx matplotlib jcvi
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh
sh ALLMAPS-install.sh

and, add directory include ALLMAPS binnary code(concorde,faSize liftOver) to $PATH in ~/.profile

ALLMAPS (244:/kev8305/SK3/anchoring)

python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta