Preparation
INPUT files
File | Description | Required (o = required, x = optional) |
---|---|---|
List of Accessions | List of Accessions in Comma-Separated Value (CSV) file | o |
Reference genome | (Multi) FASTA file. | o |
Information of reference genome | Sequence information in Comma-Separated Value (CSV) file | o |
Variant | VCF file for each sample. | x |
BAM(depth) | Convert a BAM file to TSV (Tab-Separated Value) formatted depth file for each sample. | x |
General purpose data | BED or BEDGraph to TSV (Tab-Separated Value) file for each sample. | x |
Annotation | GFF file. | x |
Configuration file | Server environment and default values for visualization. (The TASUKE package contains this file.) | o |
Preparation of INPUT files
The list of accessions supplies the information shown for each accession in the TASUKE browser. "Accession" (the ID column) is essential and must be unique, because it is used as the ID when the other files are installed. The remaining columns are optional and provide additional information about each accession in the browser.
ID | Name | Variety | Sub variety | Origin | Origin 2 | Type |
---|---|---|---|---|---|---|
IRGC30416,IR 36,indica,IND,Bangladesh,,Landrace
..
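Because every other file is keyed on the accession ID, it may be worth checking the list for duplicate IDs before installation. The following is a minimal sketch, not part of TASUKE: the function name `duplicate_ids` and the second and third sample rows are made up for illustration, and it assumes the ID is the first comma-separated field.

```shell
# Print every accession ID (first CSV column) that occurs more than once.
# Reads the accession list from stdin; prints nothing when all IDs are unique.
duplicate_ids() {
    cut -d, -f1 | sort | uniq -d
}

# Example with inline data (rows 2 and 3 are hypothetical).
# For a real file: duplicate_ids < accessions.csv
printf '%s\n' \
    'IRGC30416,IR 36,indica,IND,Bangladesh,,Landrace' \
    'SAMPLE0002,Example A,indica,IND,,,Landrace' \
    'IRGC30416,Example B,indica,IND,,,Landrace' | duplicate_ids
# prints: IRGC30416
```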
TASUKE builds its database from the variant and depth data of the chromosomes listed in this file, so this is the step where you select which chromosomes appear in the TASUKE browser. Chromosome lengths can be taken from the reference.fa.fai file generated by SAMtools faidx during variant calling.
Chromosome name | Length of each chromosome | Start of centromere | End of centromere |
---|---|---|---|
chr02,35937250,13872411,13541821
..
If you do not have centromere start and end positions, set both values to "0".
TASUKE does not accept chromosome names that contain "." (dot), so replace "." with "_" (underscore) or "-" (hyphen).
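The chromosome information file can be drafted directly from the faidx index. The sketch below is an illustration under stated assumptions rather than a TASUKE tool: the function name `fai_to_chromosome_csv` is hypothetical, centromere positions are treated as unknown (both set to 0), dots are replaced with underscores, and the index values after the length in the sample line are placeholders.

```shell
# Convert SAMtools faidx index lines (name<TAB>length<TAB>...) read from stdin
# into "name,length,centromere_start,centromere_end" CSV lines.
# Assumptions: centromere positions are unknown (written as 0,0) and any dot
# in a chromosome name is replaced with an underscore.
fai_to_chromosome_csv() {
    awk -F'\t' -v OFS=',' '{ gsub(/\./, "_", $1); print $1, $2, 0, 0 }'
}

# Example with inline data (offset and line-width fields are placeholders).
# For a real index: fai_to_chromosome_csv < reference.fa.fai > chromosomes.csv
printf 'chr02\t35937250\t7\t60\t61\n' | fai_to_chromosome_csv
# prints: chr02,35937250,0,0
```

If a name is changed here, the chromosome names in the VCF and depth files presumably need the same change so that they still match.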
One VCF file per accession is needed. TASUKE accepts VCF files generated by SAMtools and GATK. To show the effects of variants (e.g. non-synonymous, frameshift), add "EFF" information to the "INFO" field using snpEff. TASUKE supports snpEff versions 3.x and 4.x.
1) SAMtools
VCF files must contain "DP4" values in the INFO column and "GT" in the FORMAT column, which can be added with the -g and -D options of samtools mpileup.
CHROM | POS | ID | REF | ALT | QUAL | FILTER | *INFO | FORMAT | SAMPLE |
---|---|---|---|---|---|---|---|---|---|
chr01 | 335603 | . | T | C | 145.0 | . | *INFO | GT:PL:DP:SP:GQ | 1/1:178,30,0:10:0:57 |
chr01 | 370847 | . | GGTTGTTG | GGTTG | 214.0 | . | *INFO | GT:PL:DP:SP:GQ | 1/1:255,66,0:22:0:99 |
*INFO example: INDEL;DP=26;VDB=0.0395;AF1=1;AC1=2;DP4=0,0,11,11;MQ=49;FQ=-101;EFF=DOWNSTREAM(MODIFIER||||Os01g0106700|protein_coding|CODING|Os01t0106700-00|)
In a VCF file generated by SAMtools:
If the variant is a SNP, INFO starts with "DP=".
If the variant is an insertion or deletion, INFO starts with "INDEL".
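The requirements above can be checked before installation. The following sketch is not part of TASUKE: the function name `check_samtools_vcf` is hypothetical, it assumes a tab-separated VCF with one sample column, and the INFO string in the example is a shortened version of the one shown above.

```shell
# Read a SAMtools-style VCF from stdin and report, per record, whether it is
# a SNP or an INDEL and whether DP4 (INFO) and GT (FORMAT) are present.
check_samtools_vcf() {
    awk -F'\t' '
        /^#/ { next }                                        # skip header lines
        {
            type = ($8 ~ /^INDEL/) ? "INDEL" : "SNP"         # INFO is column 8
            dp4  = ($8 ~ /(^|;)DP4=/) ? "ok" : "missing"
            gt   = ($9 ~ /(^|:)GT(:|$)/) ? "ok" : "missing"  # FORMAT is column 9
            print $1 ":" $2, type, "DP4=" dp4, "GT=" gt
        }'
}

# Example with one inline record. For a real file: check_samtools_vcf < sample.vcf
printf 'chr01\t370847\t.\tGGTTGTTG\tGGTTG\t214.0\t.\tINDEL;DP=26;DP4=0,0,11,11;MQ=49\tGT:PL:DP:SP:GQ\t1/1:255,66,0:22:0:99\n' | check_samtools_vcf
# prints: chr01:370847 INDEL DP4=ok GT=ok
```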
2) GATK
VCF files must contain values of "AD" and "GT" in FORMAT column.
CHROM | POS | ID | REF | ALT | QUAL | FILTER | *INFO | FORMAT | SAMPLE |
---|---|---|---|---|---|---|---|---|---|
chr01 | 335603 | . | T | C | 688.77 | . | *INFO | GT:AD:DP:GQ:PL | 1/1:0,19:19:57:717,57,0 |
chr01 | 370847 | . | GA | G | 214.0 | . | *INFO | GT:AD:DP:GQ:PL | 1/1:0,19:19:57:717,57,0 |
*INFO example: AC=2;AF=1.00;AN=2;DP=22;FS=0.000;HaplotypeScore=124.8587;MLEAC=2;MLEAF=1.00;MQ=56.20;MQ0=0;QD=36.03;RPA=9,7;RU=A;STR;EFF=DOWNSTREAM(MODIFIER||2223|||LOC_Os01g01369|||LOC_Os01g01369.1||1),INTERGENIC(MODIFIER||||||||||1),
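In a GATK VCF, the keys in FORMAT and the values in the sample column are matched by position, so "AD" can be looked up by finding its index in FORMAT. The sketch below illustrates that lookup and is not TASUKE's own parser: the function name `print_allele_depths` is hypothetical, a single sample column is assumed, and the INFO string in the example is shortened.

```shell
# For each record of a GATK-style VCF on stdin, locate the AD key in FORMAT
# (column 9) and print the matching value from the sample column (column 10).
print_allele_depths() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            n = split($9, keys, ":")           # FORMAT keys, e.g. GT:AD:DP:GQ:PL
            split($10, vals, ":")              # sample values in the same order
            ad = "missing"
            for (i = 1; i <= n; i++) if (keys[i] == "AD") ad = vals[i]
            print $1 ":" $2, "AD=" ad
        }'
}

# Example with one inline record. For a real file: print_allele_depths < sample.vcf
printf 'chr01\t335603\t.\tT\tC\t688.77\t.\tAC=2;AF=1.00;AN=2\tGT:AD:DP:GQ:PL\t1/1:0,19:19:57:717,57,0\n' | print_allele_depths
# prints: chr01:335603 AD=0,19
```

A record that prints "AD=missing" would not satisfy the requirement stated above.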
One BAM file per accession is needed to create depth information. Alignments of various kinds of sequence data (e.g. whole genome, RNA) are accepted. This procedure requires SAMtools, and we recommend running it on your analysis server.
$ tasuke_bamtodepth.pl -i <BAM file> -o <output name> -c <chromosome list> -s <samtools path>
Required:
-i <BAM file> : BAM file
-o <output name> : Output depth file name (TSV)
-c <chromosome list> : Chromosome information file (CSV)
-s <samtools path> : The path of SAMtools for running samtools depth
Optional:
-bq <base quality threshold> : int (default: 0)
-mq <mapping quality threshold> : int (default: 0)
* A working directory is created in the same directory as the depth file and is deleted when processing finishes.
* The chromosome list is also used as an input file for "tasuke_init.pl".
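Since one depth file is needed per accession, the conversion is usually repeated for many BAM files. The sketch below only prints the commands (a dry run) so they can be reviewed first. The function name `print_depth_commands`, the output naming scheme (`<BAM basename>.depth.tsv`), and the paths in the example are assumptions, not TASUKE conventions.

```shell
# Print one tasuke_bamtodepth.pl command per BAM file.
# Usage: print_depth_commands <chromosome list CSV> <samtools path> <BAM>...
# Dry run only: nothing is executed. Output names are an assumed convention.
print_depth_commands() {
    chrom_list=$1
    samtools_path=$2
    shift 2
    for bam in "$@"; do
        name=$(basename "$bam" .bam)   # e.g. bam/IRGC30416.bam -> IRGC30416
        printf 'tasuke_bamtodepth.pl -i %s -o %s.depth.tsv -c %s -s %s\n' \
            "$bam" "$name" "$chrom_list" "$samtools_path"
    done
}

# Example with placeholder paths.
print_depth_commands chromosomes.csv /path/to/samtools bam/IRGC30416.bam
# prints: tasuke_bamtodepth.pl -i bam/IRGC30416.bam -o IRGC30416.depth.tsv -c chromosomes.csv -s /path/to/samtools
```

Once the printed commands look right, they can be run one by one or piped to `sh`, provided the paths contain no spaces.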
One TSV file per accession is needed to load any other genome information (ChIP-seq, BS-seq, RNA-seq and so on).
We recommend running this procedure on your analysis server.
$ tasuke_bedtotsv.pl -i <any file> -o <output name> -c <chromosome list>
Required:
-i <any file> : Genome information file (BED or BEDGraph)
-o <output name> : Output TSV file name
-c <chromosome list> : Chromosome information file (CSV)
Optional:
-g : Treat the input file as BEDGraph format.
* A working directory is created in the same directory as the TSV file and is deleted when processing finishes.
* The chromosome list is also used as an input file for "tasuke_init.pl".