Difference between revisions of "Prediction pipeline"

From Crop Genomics Lab.

Latest revision as of 07:52, 10 July 2014

1. All reads were mapped against the reference genome

2. Retrieve read pairs in which both mates are unmapped using the following command

samtools view -hb -f 12 -F 256 input.bam > output.bam
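The flag values in the command above can be read as bit masks. The following is a minimal sketch of that logic, not part of the pipeline scripts; the function name `passes_filter` is hypothetical, while the bit values are those defined in the SAM specification.

```python
# Standard SAM FLAG bits (SAM specification).
READ_UNMAPPED = 0x4    # 4:   the read itself is unmapped
MATE_UNMAPPED = 0x8    # 8:   the mate is unmapped
SECONDARY = 0x100      # 256: not the primary alignment


def passes_filter(flag, require=12, exclude=256):
    """Mimic `samtools view -f require -F exclude` for a single FLAG value.

    -f keeps records that have ALL bits of `require` set;
    -F drops records that have ANY bit of `exclude` set.
    """
    return (flag & require) == require and (flag & exclude) == 0
```

With `-f 12` (4 + 8), only reads whose mate is also unmapped are kept, and `-F 256` drops secondary alignment records.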

3. Convert the BAM file to FASTQ using bam2fastq

4. De novo assembly of the both-unmapped reads using ABySS with k-mer = 51 and q-value = 20 : U-contig

5. De novo assembly of all sequencing reads using ABySS with k-mer = 51 and q-value = 20 : W-contig

6. Run BLASTN between W-contigs and U-contigs longer than 2 kb with an e-value threshold of 1e-100 (1.blastn_between.unmappedNwhole.py)
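As an illustration of the length and e-value filter in step 6, here is a minimal sketch. It is not the content of 1.blastn_between.unmappedNwhole.py. It assumes that BLASTN output is in the standard 12-column tabular format (`-outfmt 6`), that U-contigs are the query, and that contig lengths are supplied in a dictionary; the names `filter_hits` and `u_lengths` are hypothetical.

```python
def filter_hits(lines, u_lengths, min_len=2000, max_evalue=1e-100):
    """Keep tabular BLASTN hits for U-contigs longer than min_len.

    lines     : iterable of tab-separated lines with the 12 default columns
                (qseqid sseqid pident length mismatch gapopen
                 qstart qend sstart send evalue bitscore)
    u_lengths : dict mapping U-contig id -> length in bp (assumed input)
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        evalue = float(fields[10])
        if u_lengths.get(qseqid, 0) > min_len and evalue <= max_evalue:
            kept.append((qseqid, sseqid, float(fields[2]), int(fields[3]), evalue))
    return kept
```

In practice the e-value cutoff can also be applied by BLASTN itself through its `-evalue` option, in which case only the length filter is needed afterwards.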

7. Retrieve W-contigs that contain a full-length U-contig in the mid-region (not the point-region) with 100% identity (2.ret_single_type.py)
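The containment test in step 7 could be sketched as follows. This is only an illustration under stated assumptions, not the logic of 2.ret_single_type.py: "mid-region" is taken to mean that the match leaves at least `margin` bases of W-contig on each side, coordinates are 1-based as in BLAST tabular output, and the function name and `margin` parameter are hypothetical.

```python
def is_internal_full_length(u_len, w_len, pident, qstart, qend, sstart, send, margin=1):
    """Return True if a hit places the whole U-contig strictly inside a W-contig.

    u_len, w_len : lengths of the U-contig (query) and W-contig (subject)
    pident       : percent identity reported for the hit
    margin       : assumed minimum number of W-contig bases on each side
    """
    # The alignment must cover the U-contig from its first to its last base.
    full_length = min(qstart, qend) == 1 and max(qstart, qend) == u_len
    # Subject coordinates are reversed for minus-strand hits, so sort them.
    lo, hi = sorted((sstart, send))
    internal = lo > margin and hi <= w_len - margin
    return pident == 100.0 and full_length and internal
```

A larger `margin` would require longer flanking sequence, which may matter for the later homology search against the reference.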

8. Run BLASTN between W-contigs and the G. max reference sequence with an e-value threshold of 1e-100 (3.blastn_with_reference_genome.py)

9. Indel candidates were selected by the following conditions: (4.ret_indel_candidates.py)

* An indel candidate has a pair of homologous regions on the same G. max chromosome, in the same direction
* The homology pair must flank the U-contig
* The regions of the homology pair on G. max must not overlap
* The homologous region on G. max must be smaller than 10 kb
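The four conditions above can be sketched as a single predicate. This is a minimal illustration rather than the code of 4.ret_indel_candidates.py. The `Hit` record, its field names, and `is_indel_candidate` are hypothetical, and the 10 kb limit is assumed to apply to the reference span covered by the two hits together.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One W-contig vs. reference hit (hypothetical record; start <= end)."""
    chrom: str      # reference chromosome
    strand: str     # "+" or "-"
    w_start: int    # hit coordinates on the W-contig
    w_end: int
    ref_start: int  # hit coordinates on the reference
    ref_end: int


def is_indel_candidate(left, right, u_start, u_end, max_span=10_000):
    """left/right: hits on either side of the U-contig at u_start..u_end on the W-contig."""
    same_chrom = left.chrom == right.chrom
    same_direction = left.strand == right.strand
    # The hits must lie on either side of the U-contig on the W-contig.
    flanking = left.w_end < u_start and right.w_start > u_end
    # Order by reference position so the test also works for minus-strand pairs.
    first, second = sorted((left, right), key=lambda h: h.ref_start)
    non_overlapping = first.ref_end < second.ref_start
    # Assumption: "smaller than 10k" refers to the span covering both hits.
    span = second.ref_end - first.ref_start + 1
    return same_chrom and same_direction and flanking and non_overlapping and span < max_span
```

For example, two plus-strand hits on the same chromosome at 100000..101999 and 102050..104049 that flank a U-contig would pass, since the combined span is 4050 bp.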

10. Retrieve indel candidates supported by read evidence (5.read_evidence.py)

* If the reads aligned in the gap between the homology pair appear to have blunt ends, the region could be an indel candidate
* (This feature can be detected using samtools tview or the CIGAR string of the SAM format)
* However, the gap must not contain 'N'
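One way to inspect CIGAR strings for this kind of evidence is sketched below. It is an illustration only, not the logic of 5.read_evidence.py, and it rests on two assumptions: a "blunt end" is read here as soft or hard clipping (`S` or `H`) at either end of the alignment, and 'N' is read as the CIGAR skipped-region operator. If 'N' instead refers to N bases in the reference gap sequence, the last helper does not apply. All function names are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '60M40S' into [(60, 'M'), (40, 'S')]."""
    return [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]


def has_clipped_end(cigar):
    """True if the alignment starts or ends with soft (S) or hard (H) clipping."""
    ops = parse_cigar(cigar)
    return bool(ops) and (ops[0][1] in "SH" or ops[-1][1] in "SH")


def has_skipped_region(cigar):
    """True if the CIGAR contains the skipped-region operator N."""
    return any(op == "N" for _, op in parse_cigar(cigar))
```

Under these assumptions, a read with CIGAR `60M40S` would count as evidence, while `50M200N50M` would not.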

11. Design primer sets using the indel candidates