Preparation
INPUT files
File | Description | Required or not |
---|---|---|
List of Accessions | List of Accessions in Comma-Separated Value (CSV) file | o |
Reference genome | (Multi) FASTA file. | o |
Information of reference genome | Sequence information in Comma-Separated Value (CSV) file | o |
Variant | VCF file for each sample. | x |
BAM(depth) | Convert a BAM file to TSV (Tab-Separated Value) formatted depth file for each sample. | x |
General purpose data | BED or BEDGraph to TSV (Tab-Separated Value) file for each sample. | x |
Annotation | GFF file. | x |
Information of phenotype | Phenotype information (CSV) file for GWAS result visualization | x |
qqman output file | qqman output file. | x |
Configure file | Server environment and default value for visualization. (TASUKE package contains this file.) | o |
Preparation of INPUT files
The list of accessions provides the information about each accession that is shown in the TASUKE browser.
"Accession" (the ID column) is essential and must be unique, because it is used as the "ID" when installing the other files. The other columns are optional and supply additional information about each accession to the browser.
"Other 1" and "Other 2" are new parameters added in this version and can be omitted.
ID | Name | variety | Sub variety | Origin | Origin 2 | Type | [Other 1] | [Other 2] |
---|---|---|---|---|---|---|---|---|
IRGC30416,IR 36,indica,IND,Bangladesh,,Landrace
..
Each parameter has a maximum character length (in half-width):
ID=40, Name=40, variety=40, SubVariety=40, Origin=20, Origin2=30, Type=40, Other1=(NoLimit), Other2=(NoLimit)
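The rules above (unique ID, per-column length limits) can be checked before installation. The following is a minimal sketch, not part of the TASUKE package: the function name `check_accessions` is hypothetical, and it assumes the file has no header line and contains only half-width (ASCII) characters, so that `len()` matches the half-width character count.

```python
import csv
import io

# Column names and maximum lengths as listed above; None means no limit.
COLUMNS = ["ID", "Name", "variety", "SubVariety", "Origin",
           "Origin2", "Type", "Other1", "Other2"]
MAX_LEN = [40, 40, 40, 40, 20, 30, 40, None, None]


def check_accessions(text):
    """Return a list of problems found in accession CSV text (hypothetical helper)."""
    problems = []
    seen = set()
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) > len(COLUMNS):
            problems.append(f"line {line_no}: too many columns ({len(row)})")
        acc_id = row[0]
        if not acc_id:
            problems.append(f"line {line_no}: empty ID")
        elif acc_id in seen:
            problems.append(f"line {line_no}: duplicate ID {acc_id!r}")
        seen.add(acc_id)
        # zip() stops at the shorter sequence, so omitted optional columns are ignored.
        for name, limit, value in zip(COLUMNS, MAX_LEN, row):
            if limit is not None and len(value) > limit:
                problems.append(f"line {line_no}: {name} longer than {limit} characters")
    return problems
```

An empty list means no problem was detected by these particular checks; it does not guarantee that TASUKE will accept the file.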
TASUKE makes a database from the variant and depth data of the chromosomes listed in this file (the information of reference genome), so you can select which chromosomes to show in the TASUKE browser at this step. Length information can be obtained from the reference.fa.fai file, which was generated by SAMtools faidx in the variant calling step.
Chromosome name | Length of each Chromosome | Start of centromere | End of centromere |
---|---|---|---|
chr02,35937250,13872411,13541821
..
If you don't have information about the centromere start and end positions, set "0" for both values.
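As an illustration of deriving this file from reference.fa.fai, the sketch below converts the first two columns of a `.fai` index (sequence name and length) into the four-column CSV, writing "0" for both centromere positions. The function name `fai_to_chromosome_csv` and the `keep` parameter are hypothetical; replace the zeros by hand where centromere positions are known.

```python
def fai_to_chromosome_csv(fai_text, keep=None):
    """Convert .fai index text into 'name,length,0,0' CSV lines (hypothetical helper).

    keep: optional collection of chromosome names to include; None keeps all.
    Centromere start and end are written as 0, meaning "unknown".
    """
    lines = []
    for raw in fai_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        fields = raw.split("\t")
        # In a .fai file, column 1 is the sequence name and column 2 its length.
        name, length = fields[0], int(fields[1])
        if keep is None or name in keep:
            lines.append(f"{name},{length},0,0")
    return "\n".join(lines) + "\n" if lines else ""
```

Passing a `keep` collection is one way to restrict the database to the chromosomes you want to show in the browser.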
One VCF file per accession is needed. TASUKE accepts VCF files generated by SAMtools and GATK. If you want to show the effects of variants (e.g. non-synonymous, frame shift, etc.), you can add "EFF" information to the "INFO" field by using snpEff. TASUKE supports snpEff versions 3.x and 4.x.
1) SAMtools
VCF files must contain the "DP4" value in the INFO column and "GT" in the FORMAT column, which can be added with the -g and -D options of samtools mpileup.
CHROM | POS | ID | REF | ALT | QUAL | FILTER | *INFO | FORMAT | SAMPLE |
---|---|---|---|---|---|---|---|---|---|
chr01 | 335603 | . | T | C | 145.0 | . | *INFO | GT:PL:DP:SP:GQ | 1/1:178,30,0:10:0:57 |
chr01 | 370847 | . | GGTTGTTG | GGTTG | 214.0 | . | *INFO | GT:PL:DP:SP:GQ | 1/1:255,66,0:22:0:99 |
*INFO: INDEL;DP=26;VDB=0.0395;AF1=1;AC1=2;DP4=0,0,11,11;MQ=49;FQ=-101;EFF=DOWNSTREAM(MODIFIER||||Os01g0106700|protein_coding|CODING|Os01t0106700-00|)
In a VCF file generated by SAMtools:
If the variant is a SNP, INFO starts with "DP=".
If the variant is an insertion or deletion, INFO starts with "INDEL".
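The requirements for SAMtools VCF files can be checked per record with a few lines of code. This is a minimal sketch under the rules stated above ("DP4" in INFO, "GT" in FORMAT, INFO starting with "INDEL" for insertions and deletions); the function name `check_samtools_record` is hypothetical, and it handles data lines only, not `#` header lines.

```python
def check_samtools_record(line):
    """Classify one SAMtools VCF data line and report missing required fields.

    Returns a tuple: ("SNP" or "INDEL", list of missing fields).
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        raise ValueError("expected at least 10 tab-separated columns")
    info, fmt = cols[7], cols[8]
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_keys = {item.split("=", 1)[0] for item in info.split(";")}
    missing = []
    if "DP4" not in info_keys:
        missing.append("INFO:DP4")
    if "GT" not in fmt.split(":"):
        missing.append("FORMAT:GT")
    # Rule from the text: INFO of an insertion or deletion starts with "INDEL".
    kind = "INDEL" if info.startswith("INDEL") else "SNP"
    return kind, missing
```

A record with an empty list of missing fields satisfies the two requirements named in this section; other properties of the file are not examined.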
2) GATK
VCF files must contain the "AD" and "GT" values in the FORMAT column.
CHROM | POS | ID | REF | ALT | QUAL | FILTER | *INFO | FORMAT | SAMPLE |
---|---|---|---|---|---|---|---|---|---|
chr01 | 335603 | . | T | C | 688.77 | . | *INFO | GT:AD:DP:GQ:PL | 1/1:0,19:19:57:717,57,0 |
chr01 | 370847 | . | GA | G | 214.0 | . | *INFO | GT:AD:DP:GQ:PL | 1/1:0,19:19:57:717,57,0 |
*INFO: AC=2;AF=1.00;AN=2;DP=22;FS=0.000;HaplotypeScore=124.8587;MLEAC=2;MLEAF=1.00;MQ=56.20;MQ0=0;QD=36.03;RPA=9,7;RU=A;STR;EFF=DOWNSTREAM(MODIFIER||2223|||LOC_Os01g01369|||LOC_Os01g01369.1||1),INTERGENIC(MODIFIER||||||||||1),
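For GATK files, the FORMAT and SAMPLE columns are paired key by key, so the presence of "GT" and "AD" can be verified while reading the values. The sketch below shows one way to do that; `parse_gatk_sample` is a hypothetical helper, and treating a "." allele depth as `None` is an assumption about how missing values should be represented.

```python
def parse_gatk_sample(fmt, sample):
    """Pair FORMAT keys with SAMPLE values and return (GT, list of allele depths)."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    for key in ("GT", "AD"):
        if key not in values:
            raise ValueError(f"FORMAT column lacks required key {key}")
    # AD holds one read depth per allele (reference first), separated by commas.
    allele_depths = [None if x == "." else int(x) for x in values["AD"].split(",")]
    return values["GT"], allele_depths
```

Applied to a SAMtools-style FORMAT such as `GT:PL:DP:SP:GQ`, the function raises `ValueError` because "AD" is absent.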
One BAM file per accession is needed to create depth information. Alignments of various sequence types (e.g. whole genome, RNA) are accepted. This procedure requires SAMtools, and we recommend running it on your analysis server.
$ tasuke_bamtodepth.pl -i <BAM file> -o <output name> -c <chromosome list> -s <samtools path>
Required:
-i <BAM file> : BAM file
-o <output name> : Output depth file name (TSV)
-c <chromosome list> : Chromosome information file (CSV)
-s <samtools path> : The path of SAMtools for running samtools depth
Optional:
-bq <base quality threshold> : int (default:0)
-mq <mapping quality threshold> : int (default:0)
* A working directory is created in the same directory as the depth file. It is deleted after processing finishes.
* The chromosome list is used as an input file of "tasuke_init.pl".
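Because one depth file is needed per accession, the conversion is usually repeated over many BAM files. The following sketch only builds the argument list for each run from the options documented above; the function name `bamtodepth_command` and the output naming scheme `<BAM stem>.depth.tsv` are hypothetical choices, not TASUKE conventions.

```python
from pathlib import Path


def bamtodepth_command(bam, chromosome_csv, samtools, base_q=0, map_q=0):
    """Build the argument list for one tasuke_bamtodepth.pl run.

    The output name '<BAM stem>.depth.tsv' is a hypothetical naming scheme.
    """
    bam = Path(bam)
    output = bam.with_name(bam.stem + ".depth.tsv")
    return [
        "tasuke_bamtodepth.pl",
        "-i", str(bam),
        "-o", str(output),
        "-c", str(chromosome_csv),
        "-s", str(samtools),
        "-bq", str(base_q),   # base quality threshold (documented default: 0)
        "-mq", str(map_q),    # mapping quality threshold (documented default: 0)
    ]
```

Each list can then be passed to `subprocess.run(cmd, check=True)`, assuming `tasuke_bamtodepth.pl` is executable and on the `PATH` of the analysis server.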
One TSV file per accession is needed to show general-purpose genome information (ChIP-seq, BS-seq, RNA-seq, and so on).
We recommend running this procedure on your analysis server.
$ tasuke_bedtotsv.pl -i <any file> -o <output name> -c <chromosome list>
Required:
-i <any file> : Genome information file (BED or BEDGraph)
-o <output name> : Output TSV file name
-c <chromosome list> : Chromosome information file (CSV)
Optional:
-g : Treat the input file as BEDGraph format.
* A working directory is created in the same directory as the TSV file. It is deleted after processing finishes.
* The chromosome list is used as an input file of "tasuke_init.pl".
The phenotype information file requires three pieces of information: "Breed name" (accession name), "Phenotype" (phenotype name), and "Phenotype Value". The file format is Comma-Separated Value (CSV).
Breed name | Phenotype | Phenotype Value |
---|---|---|
Name 1,phenotype 2,11.1
Name 2,phenotype 1,4.5678
Name 2,phenotype 2,9999
..
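The three-field layout can be checked by loading the file into a nested dictionary. This is a minimal sketch: `load_phenotypes` is a hypothetical helper, and it assumes that the file has no header line and that every phenotype value is numeric, as in the example rows.

```python
import csv
import io


def load_phenotypes(text):
    """Parse 'Breed name,Phenotype,Phenotype Value' rows into {phenotype: {breed: value}}."""
    table = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        breed, phenotype, value = (field.strip() for field in row)
        # Assumption: values are numeric; float() raises ValueError otherwise.
        table.setdefault(phenotype, {})[breed] = float(value)
    return table
```

Grouping by phenotype makes it easy to see which accessions have a value for each trait before the file is installed.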
The qqman file is generated by GWAS analysis tools. The file format is Tab-Separated Value (TSV), with chromosome name, position, and p-value.
CHR | BP (pos) | P (p-value) |
---|---|---|
chr01 6873 0.245921508033888
chr01 24810 0.198227063325373
chr01 31071 0.498345988659771
..
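A quick sanity check of this file is to parse each line and confirm that the position is an integer and the p-value lies between 0 and 1. The sketch below does that; `read_qqman` is a hypothetical helper, and it assumes tab-separated lines without a header, in the column order shown above.

```python
def read_qqman(text):
    """Parse 'CHR<TAB>BP<TAB>P' lines into (chromosome, position, p_value) tuples."""
    records = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # skip blank lines
        fields = raw.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 tab-separated fields")
        chrom, pos, p_value = fields[0], int(fields[1]), float(fields[2])
        if not 0.0 <= p_value <= 1.0:
            raise ValueError(f"line {line_no}: p-value {p_value} outside [0, 1]")
        records.append((chrom, pos, p_value))
    return records
```

If your GWAS tool writes a header line, remove it or skip it first; this sketch would reject it when converting the position to an integer.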