Difference between revisions of "Prediction pipeline"

From Crop Genomics Lab.

Revision as of 09:09, 25 March 2014

1. Map all sequencing reads against the reference genome

2. Retrieve the read pairs in which both mates are unmapped, using the following command:

samtools view -hb -f 12 -F 256 input.bam > output.bam
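
As a quick check of what these flags select: `-f 12` requires both FLAG bit 4 (read unmapped) and bit 8 (mate unmapped) to be set, and `-F 256` drops records with bit 256 (secondary alignment). The sketch below reproduces that test in plain shell arithmetic. The function name `is_selected` is hypothetical and exists only for illustration.

```shell
# Return success if a SAM FLAG value would pass `-f 12 -F 256`.
# is_selected is a hypothetical helper, not part of samtools.
is_selected() {
    flag=$1
    # -f 12: bits 4 (read unmapped) and 8 (mate unmapped) must both be set
    [ $(( flag & 12 )) -eq 12 ] || return 1
    # -F 256: bit 256 (secondary alignment) must not be set
    [ $(( flag & 256 )) -eq 0 ] || return 1
    return 0
}

# 77 and 141 are the two mates of a pair with both reads unmapped;
# 73 has only the mate unmapped; 333 is 77 plus the secondary bit.
for flag in 77 141 73 333; do
    if is_selected "$flag"; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
```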

3. Convert the BAM file to FASTQ using bam2fastq

4. De novo assembly of the both-unmapped read pairs using ABySS with k-mer = 51 and q-value = 20: U-contigs

5. De novo assembly of all sequencing reads using ABySS with k-mer = 51 and q-value = 20: W-contigs
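
Steps 4 and 5 differ only in their input reads. Assuming the assemblies were run through the `abyss-pe` driver on paired-end FASTQ files (the page does not say which ABySS entry point was used), the two calls might look like the sketch below. All file names are placeholders; `k` is the k-mer size and `q` is the minimum base quality used when trimming read ends. The helper `abyss_cmd` is hypothetical and only prints the command lines so they can be reviewed before running.

```shell
# Print an abyss-pe command line for one assembly (does not run ABySS).
# abyss_cmd and all file names are placeholders for illustration.
abyss_cmd() {
    name=$1; r1=$2; r2=$3
    printf "abyss-pe name=%s k=51 q=20 in='%s %s'\n" "$name" "$r1" "$r2"
}

abyss_cmd U unmapped_1.fastq unmapped_2.fastq   # step 4: U-contigs
abyss_cmd W whole_1.fastq whole_2.fastq         # step 5: W-contigs
```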

6. BLASTN between W-contigs and U-contigs longer than 2 kb, with an e-value threshold of 1e-100
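
A sketch of step 6, assuming the 2 kb cutoff applies to the U-contigs and that BLAST+ with tabular output is used. `min_len_fasta` is a hypothetical awk helper and the file names are placeholders. The BLAST+ calls are shown as comments because they need the assemblies on disk.

```shell
# Keep only FASTA records whose sequence is longer than a minimum length.
# min_len_fasta is a hypothetical helper; it reads FASTA on stdin.
min_len_fasta() {
    awk -v min="$1" '
        function emit() { if (hdr != "" && length(seq) > min) print hdr "\n" seq }
        /^>/ { emit(); hdr = $0; seq = ""; next }   # new record: flush the previous one
             { seq = seq $0 }                       # join wrapped sequence lines
        END  { emit() }
    '
}

# Small demonstration with a 5 bp cutoff: only the record "long" is printed.
printf '>short\nACGT\n>long\nACGTAC\nGT\n' | min_len_fasta 5

# Intended use (placeholder file names, assumed BLAST+ workflow):
# min_len_fasta 2000 < U-contigs.fa > U-contigs.2k.fa
# makeblastdb -in W-contigs.fa -dbtype nucl -out W_db
# blastn -query U-contigs.2k.fa -db W_db -evalue 1e-100 \
#     -outfmt "6 qseqid sseqid pident length qstart qend sstart send evalue qlen slen" \
#     -out U_vs_W.tsv
```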

7. Retrieve the W-contigs that contain a full-length U-contig in their mid-region, not at the end (point) region.
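
One way to express step 7 on tabular BLAST output, assuming U-contigs were the query and W-contigs the subject, with the columns `qseqid sseqid pident length qstart qend sstart send evalue qlen slen`. The page does not define how far from a contig end the mid-region begins, so `margin` is purely an assumed parameter, and requiring a single hit to span the whole U-contig is a simplification. `contained_mid` is a hypothetical helper name.

```shell
# Print W-contig IDs that contain a full-length U-contig away from both ends.
# contained_mid is a hypothetical helper; margin (bp) is an assumed cutoff.
contained_mid() {
    awk -v margin="$1" '
        {
            qstart = $5; qend = $6; qlen = $10; slen = $11
            # subject coordinates are reversed for minus-strand hits
            lo = ($7 < $8) ? $7 : $8
            hi = ($7 < $8) ? $8 : $7
            full = (qstart == 1 && qend == qlen)           # whole U-contig aligned
            mid  = (lo > margin && hi <= slen - margin)    # hit avoids both W-contig ends
            if (full && mid) print $2
        }
    ' | sort -u
}

# Demonstration: W1 holds U1 in its middle, W2 holds U2 at its start,
# and U3 is only partially aligned, so only W1 is printed.
printf 'U1\tW1\t100\t3000\t1\t3000\t5001\t8000\t0.0\t3000\t20000\n'  > hits.tsv
printf 'U2\tW2\t100\t3000\t1\t3000\t1\t3000\t0.0\t3000\t20000\n'    >> hits.tsv
printf 'U3\tW3\t100\t2500\t1\t2500\t9000\t6501\t0.0\t3000\t20000\n' >> hits.tsv
contained_mid 500 < hits.tsv
```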

8. BLASTN between W-contigs and the G. max reference sequence, with an e-value threshold of 1e-100

9. Select indel candidates that satisfy the following conditions:

* The homologous regions of an indel candidate pair lie on the same G. max chromosome
* The homologous regions of an indel candidate pair have the same direction
* The homologous region in G. max must be smaller than 10 kb
*
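
The listed conditions can be sketched as a filter over a table of candidate pairs. Everything about the input layout is an assumption: one tab-separated line per pair with the columns `id chrA startA endA strandA chrB startB endB strandB`, where A and B are the two G. max hits of the pair. Reading "smaller than 10 kb" as the total span covered by both hits on G. max is also an interpretation, not something the page states. `select_candidates` is a hypothetical helper name.

```shell
# Print IDs of candidate pairs that satisfy the three listed conditions.
# select_candidates and its input layout are assumptions for illustration.
select_candidates() {
    awk -v max_span=10000 '
        function min(a, b) { return a < b ? a : b }
        function max(a, b) { return a > b ? a : b }
        {
            # span on G. max covered by both hits (assumed meaning of the 10 kb limit)
            lo = min(min($3 + 0, $4 + 0), min($7 + 0, $8 + 0))
            hi = max(max($3 + 0, $4 + 0), max($7 + 0, $8 + 0))
            same_chr    = ($2 == $6)
            same_strand = ($5 == $9)
            if (same_chr && same_strand && hi - lo + 1 < max_span) print $1
        }
    '
}

# Demonstration: c2 fails on chromosome, c3 on direction, c4 on span,
# so only c1 is printed.
printf 'c1\tGm01\t1000\t3000\t+\tGm01\t5000\t7000\t+\n'    > pairs.tsv
printf 'c2\tGm01\t1000\t3000\t+\tGm02\t5000\t7000\t+\n'   >> pairs.tsv
printf 'c3\tGm01\t1000\t3000\t+\tGm01\t5000\t7000\t-\n'   >> pairs.tsv
printf 'c4\tGm01\t1000\t3000\t+\tGm01\t20000\t22000\t+\n' >> pairs.tsv
select_candidates < pairs.tsv
```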