Prediction pipeline
From Crop Genomics Lab.
Revision as of 09:09, 25 March 2014
1. Whole reads were mapped against the reference genome.
2. Read pairs in which both mates were unmapped were retrieved using the following command:
samtools view -hb -f 12 -F 256 input.bam > output.bam
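The flag values in this command can be checked with a short sketch. It only illustrates how `-f 12 -F 256` selects records by their SAM FLAG bits; the function name `passes_filter` and the example FLAG values are illustrative assumptions, not part of the original pipeline.

```python
# SAM FLAG bits, as defined in the SAM format specification
READ_UNMAPPED = 0x4    # 4: the read itself is unmapped
MATE_UNMAPPED = 0x8    # 8: the mate of the read is unmapped
SECONDARY = 0x100      # 256: secondary (not primary) alignment


def passes_filter(flag, require=12, exclude=256):
    """Mimic `samtools view -f require -F exclude` for a single FLAG value.

    -f keeps a record only if ALL bits in `require` are set;
    -F drops a record if ANY bit in `exclude` is set.
    """
    return (flag & require) == require and (flag & exclude) == 0


# 12 is simply "read unmapped" + "mate unmapped"
assert READ_UNMAPPED | MATE_UNMAPPED == 12

# Hypothetical FLAG values for illustration
for flag in (77, 141, 73, 99):
    print(flag, passes_filter(flag))
```

FLAG 77 and 141 (first and second mate of a pair in which neither read mapped) pass, while 73 (read mapped, mate unmapped) and 99 (properly paired) do not.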
3. Convert the BAM file to FASTQ using bam2fastq.
4. De novo assembly of the both-unmapped reads using ABySS with k-mer = 51 and q-value = 20: U-contigs
5. De novo assembly of the whole sequencing reads using ABySS with k-mer = 51 and q-value = 20: W-contigs
6. BLASTN between W-contigs and U-contigs longer than 2 kb, with an e-value threshold of 1e-100.
7. Retrieve W-contigs that contain full-length U-contigs in their mid-region, not in a point-region.
8. BLASTN between W-contigs and the G. max reference sequence, with an e-value threshold of 1e-100.
9. Indel candidates were selected by the following conditions:
   * Indel candidate pairs have homologies with G. max on the same chromosome
   * Indel candidate pairs have homologies in the same direction
   * The homologous region in G. max must be smaller than 10 kb
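The selection conditions in step 9 could be sketched as below. This is a minimal illustration under stated assumptions, not the lab's actual script: it assumes a "candidate pair" is two BLASTN hits of one W-contig against G. max, that hit coordinates follow BLAST tabular output (where a minus-strand hit has subject start greater than subject end), and that the "homologous region" is the span on G. max covering both hits. The names `Hit`, `is_indel_candidate` and the chromosome label `Gm01` are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLASTN hit on the G. max reference (hypothetical minimal record)."""
    chrom: str   # subject sequence name, e.g. a chromosome ID
    sstart: int  # subject start coordinate
    send: int    # subject end coordinate

    @property
    def strand(self):
        # Assumption: BLAST tabular output reports sstart > send on the minus strand
        return "+" if self.sstart <= self.send else "-"


def is_indel_candidate(left, right, max_span=10_000):
    """Apply the three listed conditions to a pair of hits."""
    if left.chrom != right.chrom:    # same chromosome
        return False
    if left.strand != right.strand:  # same direction
        return False
    coords = (left.sstart, left.send, right.sstart, right.send)
    span = max(coords) - min(coords) + 1  # region on G. max covering both hits
    return span < max_span                # smaller than 10 kb


# Hypothetical example pair
pair = (Hit("Gm01", 1000, 3000), Hit("Gm01", 3500, 6000))
print(is_indel_candidate(*pair))
```

Keeping the 10 kb limit as a parameter (`max_span`) makes it easy to test other thresholds without changing the function.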