Installation



Outline

This page explains the TASUKE commands for installation and for building the database. The time needed to build the database depends on the genome size, the number of samples, and the performance of the server. Steps 5 and 7 are repeated for each accession.

We provide a shell script as a unified installation tool. It automatically finds the files in a specified directory and runs these steps. (It does not support GWAS or the System phylogenetic tree.)
More detail: Unified installer 

0. System requirements

The TASUKE browser requires a LAMP server: a Linux server with Apache, MySQL 5.0 to 5.2 with the mysqli module, 5.3 or later, and PHP 5.0 or later.

First, install php-mysql and modify the httpd.conf (or php.conf) file as shown below.

If the output of the command "php -m" contains "mysqli", php-mysql is already installed. If not, install it.
•CentOS php5.4 or earlier
$ yum install php-mysql

•CentOS php5.5 or later
$ yum install php-mysqlnd

•Ubuntu
$ apt install php-mysql

Configure Apache to run PHP in .html files.

For CentOS, add or modify the following in '/etc/httpd/conf.d/php.conf':
•using php5.x
Change
AddHandler php5-script .php
to
AddHandler php5-script .php .html

•using php7.x
Change
AddHandler php7-script .php
to
AddHandler php7-script .php .html

For Ubuntu, add the following to the end of '/etc/apache2/apache2.conf':
<FilesMatch ".+\.html?$">
SetHandler application/x-httpd-php
</FilesMatch>

If php-json is not installed (this applies to php5.1 or earlier and to php7.x), install it. If the output of the command "php -m" contains "json", it is already installed.
$ yum install php-json
If php-curl is not installed (for example on Ubuntu), install it. If the output of the command "php -m" contains "curl", it is already installed.
e.g.) To install the curl module for PHP 7.4 on Ubuntu:
$ apt-get install php7.4-curl
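
The three "php -m" checks above can be combined into one small helper. This is only a sketch and is not part of TASUKE: the function names module_listed and report_php_modules are made up for illustration, and it assumes that php is on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with TASUKE).

# Succeeds if the module name given as $1 appears as a whole line
# (case-insensitive) in the module listing read from standard input.
module_listed() {
    grep -qix -- "$1"
}

# Runs "php -m" once and reports, for each module named in the arguments,
# whether it is listed. Returns non-zero if any module is missing.
report_php_modules() {
    modules=$(php -m) || return 2
    missing=0
    for m in "$@"; do
        if printf '%s\n' "$modules" | module_listed "$m"; then
            echo "$m: installed"
        else
            echo "$m: MISSING"
            missing=1
        fi
    done
    return "$missing"
}
```

After loading these functions into your shell, "report_php_modules mysqli json curl" lists which of the modules named in this section still need to be installed.
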
[Optional]. Upgrade database schema(MySQL) from legacy TASUKE to TASUKE+

If you want to upgrade from legacy TASUKE to TASUKE+, update your database schema as follows:

Command

$ tasuke_update_for_gwas.pl -db <database name> -u <user> -p <password>

-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database

If the registered data has not changed, the following steps are not required.

1. Create a database(MySQL)

In this step you create a MySQL database. First, log in to mysql with root privileges and create a database. The "database name" chosen here is used in the following installation steps.

$ mysql -u <user> -p

> Enter password: <password>

$ mysql> create database <database name>;

$ mysql> exit;



2. Initialization of database

This tool creates several tables on your database for TASUKE.

Command

$ tasuke_init.pl -db <database name> -u <user> -p <password>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the tables from database.

> Input csv file about chromosome's information below.
> Where is the csv file? > <information of reference genome(.csv)>
> Are you sure creating database [y|n] > y


If you made a mistake in the chromosome or accession list, you must drop the MySQL database and create it again.

$ mysql -u <user> -p

> Enter password: <your password>

$ mysql> drop database <database name>;

$ mysql> create database <database name>;

$ mysql> exit;


3. Add an accession

This tool registers the accessions to database.

Command

$ tasuke_accession.pl -db <database name> -u <user> -p <password>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the accessions from database.

> Input csv file about list of accessions below.
> Where is the csv file? > <accession list(.csv)>
:
> Are you sure adding or updating database?[y|n] > y

To delete an accession, use the '-r' option.

$ tasuke_accession.pl -r -db <database name> -u <user> -p <password>

-----------------------------------
Deleting accession from database
-----------------------------------

Input csv file about list of accessions below.

* WARNING : This process deletes not only accession information but also depth and variant data.


4. Input a Reference Genome file (FASTA)

This tool sets reference genome to database.

Command

$ tasuke_ref.pl -db <database name> -u <user> -p <password> -f <reference genome>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database
-f <reference genome> : FASTA formatted reference genome file

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the reference genome from database.

When you want to input a reference genome again, you can delete it with '-r' option.

$ tasuke_ref.pl -r -db <database name> -u <user> -p <password>


5. Input a Variant (VCF) file

This tool sets variants to database.

Command:

$ tasuke_variant_vcf.pl -db <database name> -u <user> -p <password> -n <ID> -f <variant file>
-t 'samtools' or 'freebayes' or 'gatk' or 'gatkm'

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database
-n <ID> : Destination ID (accession)
-f <variant file> : Variant information (.VCF)
-t 'samtools' or 'freebayes' or 'gatk' or 'gatkm' : Name of the program that generated the VCF file
'gatkm' means a multi-sample VCF file generated by GATK.

Optional:
-z : For "-t gatkm" only. Register GT:0/0 variants (not registered by default)
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the variants from database.

When you set "gatkm" as the program name (-t), give the destination IDs (-n) as a comma-separated list with no spaces. The IDs are matched to the samples in the order in which the samples appear in the VCF file. If there are fewer IDs than samples in the VCF, IDs are assigned starting from the first sample and the remaining samples are ignored. To skip a sample in the middle of the columns, leave its position empty and write only the commas, as in '-n ID1,ID2,,ID4'.
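
The mapping rule for "-t gatkm" can be previewed before loading anything. The function below is only an illustration of the rule as described here, not code taken from TASUKE; the name map_gatkm_ids and the text "(not registered)" are made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TASUKE).
# map_gatkm_ids <-n value> <sample name>...
# Prints one "sample<TAB>ID" line per VCF sample column. Samples whose
# position in the ID list is empty, or that lie beyond the end of the list,
# are shown as "(not registered)".
map_gatkm_ids() {
    ids=$1
    shift
    for sample in "$@"; do
        # Take the next comma-separated field off the front of the list.
        id=${ids%%,*}
        case $ids in
            *,*) ids=${ids#*,} ;;
            *)   ids= ;;
        esac
        printf '%s\t%s\n' "$sample" "${id:-(not registered)}"
    done
}
```

For example, "map_gatkm_ids 'ID1,ID2,,ID4' S1 S2 S3 S4 S5" pairs S1, S2 and S4 with ID1, ID2 and ID4, and marks S3 and S5 as not registered. In a standard VCF file the sample names are the columns from the tenth onward on the "#CHROM" header line.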

A "gatkm" VCF file contains GT:0/0 variants, but they are not registered in the DB by default because they increase the data size and reduce performance. Add the '-z' option to register GT:0/0 variants. GT:0/0 variants are displayed on the track in GT color mode.

When you want to input a VCF file again, you can delete it with '-r' option.

$ tasuke_variant_vcf.pl -r -db <database name> -u <user> -p <password> -n <ID>


6. Input a Depth information file (Optional)

This tool sets depth information to database. First you need to create TSV files from your BAMs (see Preparation section).

Command:

$ tasuke_tsv_db.pl -db <database name> -u <user> -p <password> -n <ID> -f <depth file>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database
-n <ID> : Destination ID (accession)
-f <depth file> : TSV formatted depth information file

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the depth information from database.

You can delete the loaded depth information with the '-r' option.

$ tasuke_tsv_db.pl -r -db <database name> -u <user> -p <password> -n <ID>
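
Because tasuke_variant_vcf.pl and tasuke_tsv_db.pl each take a single accession with -n <ID>, loading many accessions means repeating both commands. The loop below is a sketch of one way to do that. It is not part of TASUKE: the names run and register_all and the DRY_RUN variable are made up, and it assumes that every file is named <accession ID>.vcf or <accession ID>.tsv and that all VCF files come from the same program.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with TASUKE).
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# register_all <database name> <user> <password> <vcf dir> <tsv dir> <vcf type>
register_all() {
    db=$1 user=$2 pass=$3 vcf_dir=$4 tsv_dir=$5 vcf_type=$6
    for vcf in "$vcf_dir"/*.vcf; do
        [ -e "$vcf" ] || continue      # no .vcf files: the glob stays literal
        id=$(basename "$vcf" .vcf)     # assumed: file name = accession ID
        run tasuke_variant_vcf.pl -db "$db" -u "$user" -p "$pass" \
            -n "$id" -f "$vcf" -t "$vcf_type"
        # Depth is optional, so load it only if a matching TSV file exists.
        if [ -f "$tsv_dir/$id.tsv" ]; then
            run tasuke_tsv_db.pl -db "$db" -u "$user" -p "$pass" \
                -n "$id" -f "$tsv_dir/$id.tsv"
        fi
    done
}
```

Check the printed commands first, then set DRY_RUN=0 to execute them.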


7. Input a TSV file to the general purpose track (Optional)

This step loads any kind of TSV-formatted NGS data into the general purpose track by running tasuke_tsv_db.pl with the '-c' option. First you need to create TSV files from your BED or BEDgraph files (see Preparation section).

Command:

$ tasuke_tsv_db.pl -c -db <database name> -u <user> -p <password> -n <ID> -f <tsv file>

You can delete the file with '-r' option.

$ tasuke_tsv_db.pl -r -c -db <database name> -u <user> -p <password> -n <ID>

To set multiple conditions on the general purpose track, first run the following command, and then load a TSV file using tasuke_tsv_db.pl.

Command:

$ tasuke_add_condition.pl -db <database name> -u <user> -p <password> -c <condition_id>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database
-c <condition_id> : Condition ID(name)

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the condition and tables from database.


8. Annotation track (Optional)

The annotation track on the TASUKE browser can be added from GFF files.

Command:

$ tasuke_track_gff.pl -db <database name> -u <user> -p <password> -f <annotation file> -t <track name>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database
-f <annotation file> : GFF(3) formatted file
-t <track name> : The name given here is used directly as the track name on TASUKE

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the annotations from database.

You can delete the annotations with the '-r' option.

$ tasuke_track_gff.pl -r -db <database name> -u <user> -p <password> -t <track name>


9. Phenotype data (GWAS) (Optional)

Phenotype data can be added in order to use the GWAS function on TASUKE.

Command:

$ tasuke_phenotype.pl -db <database name> -u <user>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r : Delete the phenotype data from database.

Enter db password: <password>
> Where is phenotype data csv file?
(File format: Breed Name,Phenotype,Phenotype Value)
# <phenotype data file(.csv)>
> Where is qqman output file?
# <qqman output file(.txt or .tsv)>
> What is a phenotype of the file?
# <phenotype>
> Completed.
> Read the next file? [y/N]
# y: > Where is qqman output file?
# N: Done.
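
Before starting tasuke_phenotype.pl, it can help to check that the phenotype CSV file has the three columns named in the prompt. The check below is a sketch that is not part of TASUKE; the name check_phenotype_csv is made up. It only counts fields, assumes that no field contains a comma, and makes no assumption about which values tasuke_phenotype.pl accepts.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TASUKE).
# check_phenotype_csv <file>
# Prints every line that does not have exactly three non-empty
# comma-separated fields, prefixed with its line number.
# Returns non-zero if any such line was found.
check_phenotype_csv() {
    awk -F',' '
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

A file that passes prints nothing and returns 0.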

10. Use System phylogenetic tree (Optional)

To use the System phylogenetic tree, first create a distance matrix and set its path in the config file.


There are two ways to create a distance matrix:

1. Use tasuke_tree_dmatrix.pl

Use the script included in this package to create a distance matrix from the contents of the TASUKE database.
This script calculates distances by comparing the presence or absence of variants between accessions.

Command:

$ tasuke_tree_dmatrix.pl -db <database name> -u <user> -p <password> -o <outfile>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database
-o <outfile> : Distance matrix path

Optional:
-h <remote host> : Host name of the remote database server to connect to
-r <order file> : Target accession list file(Generally tasuke_www/conf/order.conf) (Default: all Accessions)
-c <target chrs> : Target chromosomes separated by commas(Default: all Chromosomes)
-m <calc method> : Distance matrix calculation method. simple(default)/jaccard/dice/soergel
-a : Check for DEPTH=NULL and, if found, set "NA" at that position. Depth information must be registered. This takes a long time
-b : [Use with "-a"] Store DEPTH=NULL positions on memory. Can be faster but uses more memory
-n : [Use with "-a"] Crosses all accessions and skips aggregation for position columns with DEPTH=NULL.
-l : Leave binary table file with name "<outfile>.btbl". This file can be reused for distance calculation by converter/makeBinaryDistanceMatrix
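
To give a feel for what a presence/absence distance looks like, the sketch below computes the textbook Jaccard distance, 1 - |A and B| / |A or B|, between two accessions. It is an illustration only: this manual does not state the exact formulas that tasuke_tree_dmatrix.pl uses for "-m", and the function name jaccard_distance and its input format are made up here.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TASUKE).
# jaccard_distance <vector a> <vector b>
# Each vector is a comma-separated list of 0/1 flags (1 = variant present
# at that position); both vectors must have the same length.
jaccard_distance() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ",")
        if (split(b, y, ",") != n) {
            print "length mismatch" > "/dev/stderr"
            exit 1
        }
        for (i = 1; i <= n; i++) {
            if (x[i] == 1 && y[i] == 1) both++
            if (x[i] == 1 || y[i] == 1) either++
        }
        # Two vectors with no variants at all are treated as identical.
        if (either == 0) { print "0.0000"; exit 0 }
        printf "%.4f\n", 1 - both / either
    }'
}
```

For example, "jaccard_distance 1,0,1,1 1,1,0,1" shares 2 of 4 variant positions and prints 0.5000.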

2. Prepare a distance matrix in your own way
You can use a distance matrix created by an external analysis tool (R, etc.). The distance matrix must be a square or lower-triangular matrix. Accession IDs must be used as the sample names, and the matrix must include all accessions used by TASUKE.


In the next step, set the distance matrix path in 'tasuke_www/conf/config.php'. <outfile> must be placed in a location that the www user has permission to read.

Modifying tasuke_www/conf/config.php
$distanceMatrixPath = "<outfile>";

If you kept the binary table file with the "-l" option, you can recreate the distance matrix with the command below. You can specify the "-m" and "-n" options again. The distance matrix is written to STDOUT.

Command:

$ converter/makeBinaryDistanceMatrix -i <outfile.btbl> [optional] > <outfile2>

Required:
-i <outfile.btbl> : Binary table file(0/1 CSV table)

Optional:
-m <calc method> : Distance matrix calculation method. simple(default)/jaccard/dice/soergel
-n : Crosses all accessions and skips aggregation for position columns with DEPTH=NULL.


Unified installer

This tool supports the installation of TASUKE. It automatically detects datasets and loads the data into a database. It treats each file name as the registered ID, so before running the tool, check that the file names correspond to the accession IDs.
The Unified installer does not support GWAS or System phylogenetic tree registration.

Command:

$ install.sh <TASK> <Option>

TASK (Required):
all : All installation processes
init : Set up the default tables in a database
acc : Accession information
ref : Reference sequence
ann : Annotation
var : Variants
tsv : Read depth or General purpose track (default: read depth)

Option:
-h : Help
-r : Delete specified datasets from the database.
-g : Load TSV files to the general purpose track.

Set your server environment in 'install.conf' before running 'install.sh', and place install.conf in the same directory as install.sh.

Modifying install.conf

##### Configuration #####
#Path of 'tasuke_bin'
SCRIPTS='/PATH/tasuke_bin/'

#Database
#mysql or oracle
BACKEND='mysql or oracle'

#Database connection
DB=<database name>
USER=<user>
PASS=<password>

#For oracle
TABLESP=<tablespace name>

#Directory for datasets
# 'install.sh' searches for datasets in the following directories and loads them into the database.
# For example, this tool searches for VCF files in './tasuke_sample_data/variants/' when loading variants into the database.

#Datasets
DATADIR='/PATH/tasuke_sample_data/'

#Enter the FASTA file name you use as the reference genome, not a directory.
DIR_FASTA='./reference.fasta'

#This script searches for '.gff' files in 'DIR_GFF'.
DIR_GFF='./'

#This script searches for '.vcf' files in 'DIR_VCF'.
DIR_VCF='./variants/'

#This script searches for '.tsv' files in 'DIR_TSV'.
DIR_TSV='./depth/'

#File format of your VCF files ['samtools' or 'gatk']
VCF='gatk'
#########################

In the above case, the tool searches for files under /PATH/tasuke_sample_data/ and loads them into the database.
e.g.) The tool searches for files in '/PATH/tasuke_sample_data/variants/'. If it finds 'human001.vcf', it loads the VCF into the table for human001 in your database.
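
Since the installer takes the ID from the file name, it is worth listing which ID each file would get before running install.sh. The function below is a sketch that is not part of TASUKE; the name preview_ids is made up, and it assumes that the ID is the file name without its extension, as in the human001.vcf example above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TASUKE).
# preview_ids <directory> <extension>
# Prints "<ID><TAB><file>" for every file in the directory that has the
# given extension, where the ID is the file name without the extension.
preview_ids() {
    for f in "$1"/*."$2"; do
        [ -e "$f" ] || continue        # no matching files: print nothing
        printf '%s\t%s\n' "$(basename "$f" ".$2")" "$f"
    done
}
```

For example, "preview_ids /PATH/tasuke_sample_data/variants vcf" lists one line per VCF file. Compare the first column with the IDs in your accession list.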



Starting TASUKE

Starting

First, set at least the configuration below.

Modifying conf/config.php

$db = <database name>;
$host = 'localhost' or <hostname>;
$user = <user name>;
$pswd = <password>;

Access the server with a web browser.
If you placed tasuke_www/* under /(Documentroot)/tasuke/, access the following URL.

http://your_domain/tasuke

A web browser that supports HTML5 is required. We have tested TASUKE with Internet Explorer (9 or later), Firefox, and Google Chrome on Windows and Mac.

If TASUKE does not work, see this document.



Additional setting (Optional)

Exposing on the internet

Security settings for exposing TASUKE on the internet.

1. Limited-mysql-user for security protection

$ mysql -u <user> -p

> Enter password: <password>

$ mysql> create user '<new user>'@'<hostname>';

$ mysql> set password for '<new user>'@'<hostname>'=password('<new password>');

$ mysql> grant select on <database name>.* to '<new user>'@'<hostname>';

$ mysql> flush privileges;

$ mysql> exit;

Modifying conf/config.php

$user = <new user>;
$pswd = <new password>;

2. Access limiting for the configuration files
Modifying /etc/httpd/conf/httpd.conf
<Directory "<Apache document root>/conf" >
Order deny,allow
Deny from all
</Directory>

Compressing the database

Database compression reduces the data size and slightly improves performance. In particular, TSV (depth and general-purpose) data shrinks to between 1/2 and 1/6 of its original size.

Compressing
Stop the mysql-server

$ service mysqld stop

Move to the database directory

$ cd <mysql database directory> (default: /var/lib/mysql/<database name>)

Compressing the tables
<tsv table> is dx_accession or dx_accession_cstm
myisampack and myisamchk must be run for each accession

$ myisampack -v <tsv table>

$ myisamchk -rq --sort-index --analyze <tsv table>.MYI

Start the mysql-server

$ service mysqld start

Load the tables

$ mysql -u <user> -p

> Enter password: <password>

$ mysql> flush tables;

$ mysql> exit;
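
Because myisampack and myisamchk have to be run for each accession, the two commands can be wrapped in a loop over the table names. This is a sketch that is not part of TASUKE: the function name pack_tables and the DRY_RUN variable are made up, and the table names must be supplied by you. Run it from the database directory while the mysql-server is stopped, as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TASUKE).
# pack_tables <tsv table>...
# With DRY_RUN=1 (the default here) the commands are only printed.
pack_tables() {
    for t in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "myisampack -v $t"
            echo "myisamchk -rq --sort-index --analyze $t.MYI"
        else
            # Stop at the first table for which either command fails.
            myisampack -v "$t" &&
            myisamchk -rq --sort-index --analyze "$t.MYI" || return 1
        fi
    done
}
```

Review the printed commands first, then set DRY_RUN=0 to run them.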

Decompressing
Stop the mysql-server

$ service mysqld stop

Decompressing the tables

$ myisamchk --unpack <tsv table>

Start the mysql-server

$ service mysqld start

Load the tables

$ mysql -u <user> -p

> Enter password: <password>

$ mysql> flush tables;

$ mysql> exit;



How to update

Updating

This section describes how to update TASUKE.

1. Unpack & Copy

After downloading the TASUKE package, set "tasuke_www" to the Apache document root.

Unpack and copy new files

$ tar xf ./tasuke_tools.tar

$ cp -r ./tasuke_tools/tasuke_www/* <TASUKE DIRECTORY>

2. Upgrade your database schema (only if necessary)
Command

$ tasuke_db_upgrade.pl -db <database name> -u <user> -p <password>

Required:
-db <database name> : Database name for TASUKE
-u <user> : User name
-p <password> : Password for the database

3. Edit the configuration file

Set the necessary items in the updated configuration file.
More detail: Configuration-page