Configuration



TASUKE : Configuration file (config.php)

Edit the configuration file as described below. The file is located at "(Apache DocumentRoot)/conf/config.php".


MySQL Database section

Set the database connection parameters in this section.

Modifying conf/config.php
Set the name of MySQL database

$db = "<database name>";

Set the host name

$host = "localhost or hostname";

Set the user name of MySQL

$user = "<user name>";

Set the password of MySQL

$pswd = "<password>";


Back-end database

Set the type of back-end database in this section. Currently, only MySQL is supported.

Modifying conf/config.php

Back-end database
mysql: requires the php-mysql module.
$backend="mysql";


Temporary directory

The directory where temporary files, such as those generated during phylogenetic tree creation, are stored. If it does not exist, TASUKE will try to create it (mkdir).

Modifying conf/config.php

Temporary directory(for Phylogenetic tree)
$tempdir="/tmp/tasuke_tmp";


Serial table auto detection

When a large number (> 900) of Chr/Contig reference sequences are registered, the table is divided (SerialTable). To use SerialTable, you must turn its auto-detection ON (1).
Otherwise, set it to OFF (0) to reduce server load.
If you are using SerialTable while this value is OFF (0), a warning is shown in the web browser.

Modifying conf/config.php

Whether to automatically determine if the table is divided or not?
$autoDetectSerialTable = 0 or 1;


Redis connection (Beta)

Set the Redis connection parameters in this section.

Modifying conf/config.php
Cache system using Redis (Beta)
Set "enabled" or "disabled" to $useRedis.
$useRedis = "disabled or enabled";
$redisHost = "localhost or hostname";
$redisPort = "6379 or port number";
$redisDBName = "0 or database name";

Naming of reference genome

This value is used as the name of the reference sequence.

Modifying conf/config.php

$reference = "Reference";


Default genome region

Default genome region to be displayed on first access.

Modifying conf/config.php
// You can set the genome region to display by default.
// "chr" and "start" are required.
// "blocksize" and "end" conflict, with "end" taking precedence. If these are not specified,
// the default is "blocksize=1k".
// [!!!Warning!!!] Since the "start" and "end" positions are automatically changed according
// to the number of blocks and "blocksize", they may deviate greatly from the specified positions.
// chr: Chromosome/Contig name registered in the DB (excluding prefix(=$chromosome_name))
// start: Starting position on the genome.
// end: end position on the genome.
// blocksize: Default is "1k". Please specify one of the following.(1b=1bp, 1k=1000bp)
// 1b/2b/3b/4b/5b/6b/7b/8b/9b/10b/20b/30b/40b/50b/60b/70b/80b/90b/100b/200b/300b/400b/500b/600b/
// 700b/800b/900b/1k/2k/3k/4k/5k/6k/7k/8k/9k/10k/20k/30k/40k/50k/60k/70k/80k/90k/100k
// ex1. array("chr"=>"chr02","start"=>"365001");
// ex2. array("chr"=>"chr02","start"=>"365001","end"=>"465000");
// ex3. array("chr"=>"chr02","start"=>"365001", "blocksize"=>"1k");

$default_position = array("chr"=>"", "start"=>"");
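
The rules in the comments above can be checked with a small script before editing config.php. The following is a minimal Python sketch, not TASUKE's own code: the helper names blocksize_to_bp and resolve_default_position are hypothetical, and the sketch does not reproduce the automatic adjustment of "start" and "end" described in the warning.

```python
# Block sizes listed in the config.php comments: 1b..9b, 10b..90b, 100b..900b, 1k..9k, 10k..100k.
ALLOWED_BLOCKSIZES = (
    [f"{n}b" for n in range(1, 10)]
    + [f"{n}b" for n in range(10, 100, 10)]
    + [f"{n}b" for n in range(100, 1000, 100)]
    + [f"{n}k" for n in range(1, 10)]
    + [f"{n}k" for n in range(10, 101, 10)]
)


def blocksize_to_bp(blocksize):
    """Convert a block size such as "500b" or "1k" to base pairs (1b = 1 bp, 1k = 1000 bp)."""
    if blocksize not in ALLOWED_BLOCKSIZES:
        raise ValueError(f"unsupported blocksize: {blocksize!r}")
    number, unit = int(blocksize[:-1]), blocksize[-1]
    return number * (1000 if unit == "k" else 1)


def resolve_default_position(position):
    """Return ("end", position) or ("blocksize", bp), whichever decides the displayed span.

    Hypothetical helper: "chr" and "start" are required, "end" takes precedence
    over "blocksize", and the default blocksize is "1k".
    """
    for key in ("chr", "start"):
        if not position.get(key):
            raise ValueError(f'"{key}" is required')
    if position.get("end"):
        return ("end", int(position["end"]))
    return ("blocksize", blocksize_to_bp(position.get("blocksize", "1k")))
```

For example, resolve_default_position({"chr": "chr02", "start": "365001"}) falls back to the default blocksize of 1k, while adding "end" makes "end" decide the span even if "blocksize" is also given.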


Making external link on the annotation (GFF) track

This link enables users to jump to an external page by clicking a structure on the annotation track.

e.g.)
1. If you set "http://tasuke.com/search?=" to $external_link and click "transcript001" on the track, the link destination URL will be "http://tasuke.com/search?=transcript001".
2. If you set "http://tasuke.com/" to $external_link and ".html" to $external_link_suffix and click "transcript001" on the track, the link destination URL will be "http://tasuke.com/transcript001.html".

Modifying conf/config.php
External link URL on the annotation track

$external_link = "<Destination URL>";

Suffix for external URL link

$external_link_suffix = "<suffix for the URL>";

This value specifies the tag in the GFF file whose value is used as the URL parameter of the external link.
If you set a tag that is not defined, "ID" is used automatically.

$external_link_tag_gff = "<tag name>";

If you have multiple tracks and want to set a different link for each, configure them as follows:

Modifying conf/config.php
External links for each track
The values set here take priority over $external_link/$external_link_tag_gff/$external_link_suffix.
$external_sites = array(
  '<Track name1>' => array(
    'link' => "<Destination URL1>",
    'tag_gff' => "<tag name>",
    'suffix' => "<suffix for the URL>"
  ),
  '<Track name2>' => array(
    'link' => "<Destination URL2>",
    'tag_gff' => "<tag name>",
    'suffix' => "<suffix for the URL>"
  )
);
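
Based on the two examples above, the destination URL appears to be assembled as link + tag value + suffix. The following Python sketch illustrates that reading; it is not TASUKE's implementation. The function name build_external_url is hypothetical, and merging per-track settings key by key over the global ones is an assumption (the source only states that per-track values have priority).

```python
def build_external_url(attributes, track, defaults, external_sites=None):
    """Assemble the link destination for a clicked annotation feature.

    attributes: GFF attributes of the clicked feature, e.g. {"ID": "transcript001"}
    defaults: global settings with the keys "link", "tag_gff" and "suffix"
    external_sites: optional per-track settings, keyed by track name
    """
    settings = dict(defaults)
    # Assumption: per-track values override the global ones key by key.
    settings.update((external_sites or {}).get(track, {}))
    tag = settings.get("tag_gff") or "ID"
    if tag not in attributes:
        # An undefined tag falls back to "ID", as described above.
        tag = "ID"
    return settings.get("link", "") + attributes[tag] + settings.get("suffix", "")


feature = {"ID": "transcript001", "Name": "geneA"}
print(build_external_url(feature, "genes",
                         {"link": "http://tasuke.com/search?=", "tag_gff": "ID", "suffix": ""}))
```

The track name "genes" and the "Name" attribute are made up for illustration.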

Making head of chromosome name

If the chromosome names in the reference FASTA consist only of numbers or letters, you can add a prefix such as "chr" by setting it here.

e.g.)
"1" to "chr1"

Modifying conf/config.php

$chromosome_name = "";


Setting GWAS flag

To show the GWAS plot, set the phenotype flag to "true".
To hide the GWAS plot, set the phenotype flag to "false".

Modifying conf/config.php

$phenotypeFlg = "true";


Setting System phylogenetic tree

To use the system phylogenetic tree function, create a distance matrix file or a Newick file in advance. You can then enable the system phylogenetic tree by specifying their paths in $distanceMatrixPath and/or $newickPath.

Modifying conf/config.php
// [SystemTree] use system phylogenetic tree function.
// SystemTree function is enabled by specifying Newick or distance matrix file.
// If both are specified, Newick will be referenced preferentially. On the AccessionManager, user can choose which one to use.
// Sample name must be Accession ID.
// If it is empty or the file does not exist, the function will be disabled.

// Specify a distance matrix file in Phylip format(Lower-triangle matrices).
// https://evolution.gs.washington.edu/phylip/doc/distance.html
// Perform NJ clustering every time the Accession list changes. Height is accurate, but it takes time if there are many Accessions.

$distanceMatrixPath = "<path/distance_matrix>";

// Specify a newick file.
// Recreate the phylogenetic tree quickly by removing Accession leaves from the Newick tree instead of NJ clustering.

$newickPath = "<path/newick>";


// Do you want to display tree by default at startup?

$systreeDefaultOn = 1;

// By default, PHPPhylogeneticTrees is used for clustering, but if $phylipDir/exe/neighbor is available, it is mainly used.

$systreeUsePhylip = 1;

// For SystemTree ProgressBar. Characters used in progress strings.

$systreeProgressChar = "*";

// For SystemTree ProgressBar. Number of $systreeProgressChar to send per cycle of Phylip-neighbor.
// The larger the number, the smoother ProgressBar, but the larger the amount of data transferred.

$systreeProgressCharLen = 80;


// [System/PhylipTree] Branches whose height is greater than this value will be collapsed.
// Float. Invalid if 0 or empty.

$systreeAutoCollapseThreshold = 0;

// What to consider for $systreeAutoCollapseThreshold. (height/ratio)

$systreeAutoCollapseType = "height";


// [System/PhylipTree] Other options
// - Correction: ""/asec/pow/cdf

$systreeCorrection = "";

// - SortType: height or subtree
// height : max height of subtree
// subtree: number of subtree leaves

$systreeSortType = "subtree";

// - Sort: "" or asc or desc

$systreeSort = "desc";

// - TreeWidth: int

$systreeWidth = "100";


Setting the number of threads

Using more threads speeds up FASTA sequence retrieval and phylogenetic tree creation.

Modifying conf/config.php

$threadnum = "4";


Setting gene id column number

To get gene ID information from the ANN/EFF annotation, set the column number.
If the SNP detail view does not show the correct information, change it to the correct column number.

Modifying conf/config.php

$ANNcol = "6";

$EFFcol = "7"; //for older snpEff annotation

$EFFcol = "8";
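
When choosing a column number, it can help to list the fields of one annotation entry from your VCF together with their positions. The following Python sketch does only that. The function name list_annotation_fields is hypothetical, the example entry is made up (it follows the 16-field, "|"-separated ANN layout of snpEff), and because the source does not state whether TASUKE counts columns from 0 or from 1, both numberings are printed.

```python
def list_annotation_fields(entry):
    """Split one "|"-separated annotation entry into (0-based index, 1-based index, value) tuples."""
    return [(i, i + 1, value) for i, value in enumerate(entry.split("|"))]


# Made-up ANN entry for illustration only.
example = ("A|missense_variant|MODERATE|GeneA|gene001|transcript|transcript001|"
           "protein_coding|1/1|c.1A>G|p.Met1Val|1/300|1/300|1/99||")

for zero_based, one_based, value in list_annotation_fields(example):
    print(zero_based, one_based, value)
```

Compare the printed positions with the ID shown in the SNP detail view to decide which number to set.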


For primer designing

The primer design function requires Primer3 and, optionally, MFEprimer. Set the paths of Primer3 and MFEprimer in this section. (Primer3 version 2.3.0 or later, MFEprimer version 2.0 or later)

Modifying conf/config.php
Primer3

$primer3Path = "<path/primer3_core>";

Primer3 thermodynamic parameters

$primer3ThermPath = "<path/primer3_config>";

For MFEprimer3
MFEprimer3 path

$MFE3path="<path/mfeprimer-3.x.x-linux-amd64>";

MFEprimer3 database fasta path

$MFE3DBpath="<path/reference/reference.fasta>";


"$MFEpath" and "$MFEDBpath" below are old settings for MFEprimer2 (with Python 2). Normally, use the MFEprimer3 settings above.

For MFEprimer2
MFEprimer2 path

$MFEpath="<path/MFEprimer.py>";

MFEprimer2 database fasta path

$MFEDBpath="<path/reference/reference.fasta>";

If Primer3 does not work with TASUKE, try the following commands.
Install SELinux policy management tool

$ yum install policycoreutils-python

Change SELinux security context

$ semanage fcontext -a -s system_u -t httpd_sys_content_t "<PRIMER3_DIRECTORY>";

$ restorecon -RF "<PRIMER3_DIRECTORY>";


For phylogenetic analysis

Phylogenetic analysis requires PHYLIP. Set the path of the PHYLIP directory in this section (do not include the "exe" subdirectory).
When PHYLIP is enabled, it is also used to create the System phylogenetic tree, which is then drawn faster.

Modifying conf/config.php

$phylipDir = "/PATH/";

If PHYLIP does not work with TASUKE, try the same commands as in the Primer3 section.

For BLAST search

BLAST search requires the NCBI BLAST+ tools. Set the path of BLAST in this section.
$blastDBPath must be a DB file created by makeblastdb with the "-parse_seqids" option.

Modifying conf/config.php
BLAST PATH
Specify blastn or its directory path.
tblastn and tblastx are automatically searched from $blastPath.

$blastPath = "path/bin/blastn";

BLAST reference genome database PATH

$blastDBPath = "path/reference/reference.fasta";

Use TBLASTX (Default:off)
tblastx places a high load on the server. Also, because the result data is large,
the history may not be stored in WebStorage, which may result in an error.
(Even in that case, browsing can be continued.)

$useTblastx = 0;


Maximum level of zoom for viewing each nucleotide

The zoom level limit for displaying individual SNPs and indels in absolute position mode.

Modifying conf/config.php

$nuc_max = "100k";


General purpose track (Custom track)

Visualizes BAM, BED and BEDGraph data (RNA-seq read depth, ChIP-seq, BS-seq, ...).
To use this feature, set "enabled" to $customTrack.

Modifying conf/config.php
Set "enabled" or "disabled" to $customTrack.

$customTrack = 'enabled' or 'disabled';

For more detailed settings, see the following items.

Modifying conf/config.php
Track name for menu

$customTrackName = "<track name>";

Kind of data name (Unit)

$customTrackKindofData = "<unit>";

Whether to normalize by the number of reads
Values are normalized using the header value of the TSV file. This is useful for RNA-seq data.
Set "enabled" or "disabled" to $customTrackNormByRead.

$customTrackNormByRead = 'enabled' or 'disabled';

Change color gradation for custom track

$cstm_max = 1000;

$cstm_min = 0;


Accession name

Canvas width for accession ids and groups

Modifying conf/config.php
Accession title width(ids and groups)

$accession_cvs_width = 160;
$title_id_width_rate = 0.55;

Set a color for each accession group (variety, sub variety, origin, type).

Modifying conf/config.php
Set "enabled" or "disabled" to $color_acc_group.

$color_acc_group = 'enabled' or 'disabled';


Color definition

You can set the default color for each data type. Users can change each color with the color manager.

Set a hex formatted value.
e.g.) $snp_max_col="#00FF00";

Modifying conf/config.php
Color for maximum of SNP count

$snp_max_col="#00FFFF";

Color for minimum of SNP count

$snp_min_col="#F0F8FF";

Color for maximum of INDEL count

$indel_max_col="#A52A2A";

Color for minimum of INDEL count

$indel_min_col="#FFCB8E";

Color for maximum of DEPTH value

$depth_max_col="#000000";

Color for minimum of DEPTH value

$depth_min_col="#F2F2F2";

Color for maximum of custom track value

$cstm_max_col="#A229B8";

Color for minimum of custom track value

$cstm_min_col="#F5E9F7";

Colors for the maximum and minimum of custom track values when multiple conditions are enabled.

$cstm_max_col_para1="#77ab42";
$cstm_min_col_para1="#f1f6ec";
$cstm_max_col_para2="#a13b4b";
$cstm_min_col_para2="#f5ebed";
$cstm_max_col_para3="#1a5dd9";
$cstm_min_col_para3="#e8eefb";


Set maximum of depth value for display

These parameters designate the value range for the depth color. They are used to make a gradation corresponding to each depth value.

Modifying conf/config.php

$depth_max = 140;

$depth_min = 0;
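
To get a feel for how $depth_max and $depth_min interact with the depth colors, the following Python sketch maps a depth value to a color by linear interpolation between $depth_min_col and $depth_max_col. Linear interpolation and clamping at the limits are assumptions made for illustration; the source does not describe how TASUKE computes the gradation. The function names are hypothetical.

```python
def hex_to_rgb(color):
    """Convert "#RRGGBB" to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def depth_color(depth, depth_min=0, depth_max=140,
                min_col="#F2F2F2", max_col="#000000"):
    """Return the interpolated color for a depth value (assumed linear gradation)."""
    # Values outside [depth_min, depth_max] are clamped to the nearest limit.
    clamped = min(max(depth, depth_min), depth_max)
    ratio = (clamped - depth_min) / (depth_max - depth_min)
    low, high = hex_to_rgb(min_col), hex_to_rgb(max_col)
    channels = [round(lo + (hi - lo) * ratio) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


print(depth_color(70))   # halfway between the two default depth colors
print(depth_color(500))  # clamped to the maximum color
```

Under these assumptions, any depth at or above $depth_max is drawn in the maximum color, so a $depth_max that is too low makes high-coverage regions indistinguishable.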


Set the expected SNP count for each zoom level

These parameters designate the expected SNP count at each zoom level. They are used to make a gradation corresponding to each SNP count.

Modifying conf/config.php
$snp_100k=1000;
$snp_90k=900;
$snp_80k=800;
$snp_70k=700;
$snp_60k=600;
$snp_50k=500;
$snp_40k=400;
$snp_30k=300;
$snp_20k=200;
$snp_10k=100;
$snp_9k=90;
$snp_8k=80;
$snp_7k=70;
$snp_6k=60;
$snp_5k=50;
$snp_4k=40;
$snp_3k=30;
$snp_2k=20;
$snp_1k=10;
$snp_900b=9;
$snp_800b=8;
$snp_700b=7;
$snp_600b=6;
$snp_500b=5;
$snp_400b=4;
$snp_300b=3;
$snp_200b=2;
$snp_100b=1;
$snp_90b=1;
$snp_80b=1;
$snp_70b=1;
$snp_60b=1;
$snp_50b=1;
$snp_40b=1;
$snp_30b=1;
$snp_20b=1;
$snp_10b=1;
$snp_9b=1;
$snp_8b=1;
$snp_7b=1;
$snp_6b=1;
$snp_5b=1;
$snp_4b=1;
$snp_3b=1;
$snp_2b=1;
$snp_1b=1;
$snp_min=1;

Set the expected INDEL count for each zoom level

These parameters designate the expected INDEL count at each zoom level. They are used to make a gradation corresponding to each INDEL count.

Modifying conf/config.php
$indel_100k=1000;
$indel_90k=900;
$indel_80k=800;
$indel_70k=700;
$indel_60k=600;
$indel_50k=500;
$indel_40k=400;
$indel_30k=300;
$indel_20k=200;
$indel_10k=100;
$indel_9k=90;
$indel_8k=80;
$indel_7k=70;
$indel_6k=60;
$indel_5k=50;
$indel_4k=40;
$indel_3k=30;
$indel_2k=20;
$indel_1k=10;
$indel_900b=9;
$indel_800b=8;
$indel_700b=7;
$indel_600b=6;
$indel_500b=5;
$indel_400b=4;
$indel_300b=3;
$indel_200b=2;
$indel_100b=1;
$indel_90b=1;
$indel_80b=1;
$indel_70b=1;
$indel_60b=1;
$indel_50b=1;
$indel_40b=1;
$indel_30b=1;
$indel_20b=1;
$indel_10b=1;
$indel_9b=1;
$indel_8b=1;
$indel_7b=1;
$indel_6b=1;
$indel_5b=1;
$indel_4b=1;
$indel_3b=1;
$indel_2b=1;
$indel_1b=1;
$indel_min=1;
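
The default SNP and INDEL values above follow a simple pattern: 10 variants per 1000 bp, with a floor of 1. If your data has a different variant density, the lines can be regenerated instead of being edited by hand. The following Python sketch does this; the function names and the per_kb parameter are hypothetical, and scaling all zoom levels linearly is an assumption that may not suit every dataset.

```python
def zoom_levels():
    """Return (label, size in bp) pairs from 100k down to 1b, as listed in config.php."""
    levels = []
    for magnitude in (1, 10, 100):
        levels += [(f"{n * magnitude}b", n * magnitude) for n in range(1, 10)]
    for magnitude in (1, 10):
        levels += [(f"{n * magnitude}k", n * magnitude * 1000) for n in range(1, 10)]
    levels.append(("100k", 100_000))
    return list(reversed(levels))


def threshold_lines(prefix, per_kb=10):
    """Build the config.php lines for prefix "snp" or "indel".

    per_kb is the expected number of variants per 1000 bp; the default of 10
    reproduces the values shown above.
    """
    lines = []
    for label, size_bp in zoom_levels():
        count = max(1, size_bp * per_kb // 1000)
        lines.append(f"${prefix}_{label}={count};")
    lines.append(f"${prefix}_min=1;")
    return lines


print("\n".join(threshold_lines("snp")))
```

Calling threshold_lines("indel", per_kb=2), for example, would produce a table for a sparser INDEL density.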


TASUKE : Other settings


Page layout

header
Modifying docs/top.html
<div id="top" style="height:50px;">
:
</div>
footer
Modifying docs/bottom.html
<div id="bottom" style="height:50px;">
:
</div>


Order of the accessions / hide the accessions

Initially, TASUKE loads the accessions in the order in which they were installed. This order is based on the accession information used as the input file of "tasuke_accession.pl". To change the order of the accessions, edit the file "conf/order.conf". You can also hide accessions with this function: an accession that is not listed in "order.conf" will be hidden in TASUKE.

To use this function, write the accession IDs in this file, one per line.

accession_C
accession_B
accession_A
:

e.g.) Your database has 5 accessions, but you want to show only 3 of them.

The accession information in the database
accession_1
accession_2
accession_3
accession_4
accession_5
conf/order.conf
accession_5
accession_2
accession_1
On the web browser
Tasuke_screenshot
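
The effect of order.conf in the example above can be previewed with a few lines of Python. This is a sketch of the documented behavior, not TASUKE's code: the function name displayed_accessions is hypothetical, and ignoring IDs in order.conf that are not registered in the database is an assumption.

```python
def displayed_accessions(db_accessions, order_conf_text):
    """Return (shown, hidden): shown follows the order.conf order, hidden lists the rest."""
    # One accession ID per line; blank lines are skipped.
    ordered = [line.strip() for line in order_conf_text.splitlines() if line.strip()]
    known = set(db_accessions)
    # Assumption: IDs that are not registered in the database are ignored.
    shown = [acc for acc in ordered if acc in known]
    shown_set = set(shown)
    hidden = [acc for acc in db_accessions if acc not in shown_set]
    return shown, hidden


db = ["accession_1", "accession_2", "accession_3", "accession_4", "accession_5"]
order_conf = "accession_5\naccession_2\naccession_1\n"
shown, hidden = displayed_accessions(db, order_conf)
print("shown: ", shown)
print("hidden:", hidden)
```

Here accession_3 and accession_4 end up hidden because they are not listed in order.conf.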



Move to a destination region with Query string

If you add parameters specifying a destination region to the URL of TASUKE, TASUKE displays that region. This is useful for links from other web pages.

e.g.) To move to 'gene001 (chr01:10000-20000)' from any position or from another web page, access the URL below.
http://hostname/index.html?chr=chr01&st=10000&en=20000&id=gene001

Parameter  Description
chr        Name of the sequence
st         Start position
en         End position
id         If you set a transcript ID to this parameter, the transcript object will be highlighted on the annotation track. (Not required)

Showing designated accessions from an external web page

Set the designated accessions in the query string of the URL (similar to the section above). This function can be combined with the other query string parameters.

e.g.) Set 'human001,human002,human004' to the query string and access the URL below.
http://hostname/index.html?acc=human001,human002,human004

Parameter  Description
acc        Comma-separated accessions (IDs)
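
When links to TASUKE are generated from another web application, the query string can be assembled programmatically. The following Python sketch uses only the parameters documented above (chr, st, en, id, acc); the function name build_tasuke_url and its argument names are hypothetical, and "http://hostname/index.html" is the placeholder URL from the examples.

```python
from urllib.parse import urlencode


def build_tasuke_url(base_url, chrom=None, start=None, end=None,
                     transcript_id=None, accessions=None):
    """Build a TASUKE URL from the documented query string parameters."""
    params = {}
    if chrom is not None:
        params["chr"] = chrom
    if start is not None:
        params["st"] = start
    if end is not None:
        params["en"] = end
    if transcript_id is not None:
        params["id"] = transcript_id
    if accessions:
        params["acc"] = ",".join(accessions)
    # safe="," keeps the commas in "acc" literal, as in the example URL above.
    query = urlencode(params, safe=",")
    return f"{base_url}?{query}" if query else base_url


print(build_tasuke_url("http://hostname/index.html",
                       chrom="chr01", start=10000, end=20000, transcript_id="gene001"))
print(build_tasuke_url("http://hostname/index.html",
                       accessions=["human001", "human002", "human004"]))
```

Region and accession parameters can be passed in the same call, since the two query string functions can be combined.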


Server

Exposing on the internet
1. For security, you must use a MySQL user with limited privileges. See the following steps.

$ mysql -u <user> -p

> Enter password: <password>

$ mysql> create user '<new user>'@'<hostname>';

$ mysql> set password for '<new user>'@'<hostname>'=password('<new password>');

$ mysql> grant select on <database name>.* to '<new user>'@'<hostname>';

$ mysql> flush privileges;

$ mysql> exit;

Modifying conf/config.php

$user = "<new user>";
$pswd = "<new password>";

2. Access limiting for the configuration files
Configuration files contain database account information, the accession list, etc., so web access to them must be restricted.

Modifying /etc/httpd/conf/httpd.conf, and restart httpd.
<Directory "<Apache document root>/conf" >
Require all denied
</Directory>

Access limiting for TASUKE itself
Use Apache settings to limit access to TASUKE. See the Apache documentation for more configuration details. Below are examples.

Case1. IP address filtering
Only allow access from specific IP addresses. Make sure to allow localhost.

Modifying /etc/httpd/conf/httpd.conf, and restart httpd.
<Directory "<Apache document root>" >
Require ip 87.65.43.21 123.45.67.0/24
Require local
</Directory>
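
Before restarting httpd, it can be useful to confirm which client addresses a list like the one above would admit. The following Python sketch approximates the two Require lines with the standard ipaddress module. It is only an approximation: the function name is_allowed is hypothetical, and "Require local" is modeled as loopback addresses only, whereas Apache also treats a client whose address equals the server's own address as local.

```python
import ipaddress


def is_allowed(client_ip, allowed=("87.65.43.21", "123.45.67.0/24")):
    """Return True if client_ip matches one of the allowed entries or is a loopback address."""
    address = ipaddress.ip_address(client_ip)
    # Approximation of "Require local": loopback addresses such as 127.0.0.1 or ::1.
    if address.is_loopback:
        return True
    # Approximation of "Require ip": single addresses and CIDR ranges.
    return any(address in ipaddress.ip_network(entry) for entry in allowed)


for ip in ("87.65.43.21", "123.45.67.200", "87.65.43.22", "127.0.0.1"):
    print(ip, is_allowed(ip))
```

The addresses are the example values from the configuration above; replace them with your own list.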

Case2. Basic authentication
You can require password authentication to access TASUKE.

Set user IDs and passwords as follows, and give them to the users who are allowed access.

#Create first user.
$ sudo htpasswd -c /pathto/htpasswd <userid1>
New password:
Re-type new password:

$ sudo chown apache:apache /pathto/htpasswd

$ sudo chmod 600 /pathto/htpasswd

# Add second and subsequent users ("-c" not required)
$ sudo htpasswd /pathto/htpasswd <userid2>
......

Modifying /etc/httpd/conf/httpd.conf, and restart httpd.
<Directory "<Apache document root>" >
AuthType Basic
AuthName "auth"
AuthUserFile /pathto/htpasswd
Require valid-user
</Directory>

Compressing the database

Database compression reduces the data size and slightly improves performance. In particular, the size of TSV (depth and general-purpose) data is reduced to between 1/2 and 1/6.

Compressing
Stop the mysql-server

$ service mysqld stop

Move to the database directory

$ cd <mysql database directory> (default: /var/lib/mysql/<database name>)

Compressing the tables
<tsv table> indicates dx_accession or dx_accession_cstm
Repeat myisampack and myisamchk for each accession

$ myisampack -v <tsv table>

$ myisamchk -rq --sort-index --analyze <tsv table>.MYI

Start the mysql-server

$ service mysqld start

Load the tables

$ mysql -u <user> -p

> Enter password: <password>

$ mysql> flush tables;

$ mysql> exit;
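
Because myisampack and myisamchk have to be repeated for every accession table, it may be convenient to generate the command lines from a list of table names. The following Python sketch only builds and prints the commands shown above; it does not run them. The function name compression_commands is hypothetical, and the table names are placeholders that you must replace with your own <tsv table> names. Run the printed commands in the database directory while the MySQL server is stopped.

```python
import shlex


def compression_commands(tables):
    """Return the myisampack/myisamchk argument lists for each table, in execution order."""
    commands = []
    for table in tables:
        commands.append(["myisampack", "-v", table])
        commands.append(["myisamchk", "-rq", "--sort-index", "--analyze", f"{table}.MYI"])
    return commands


# Placeholder table names; substitute the actual <tsv table> names.
for argv in compression_commands(["<tsv table 1>", "<tsv table 2>"]):
    print(shlex.join(argv))
```

shlex.join requires Python 3.8 or later and quotes the placeholder names so that the printed lines are safe to paste into a shell.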

Decompressing
Stop the mysql-server

$ service mysqld stop

Decompressing the tables

$ myisamchk --unpack <tsv table>

Start the mysql-server

$ service mysqld start

Load the tables

$ mysql -u <user> -p

> Enter password: <password>

$ mysql> flush tables;

$ mysql> exit;

Recommended settings for large-scale dataset

Here is an example of server settings for stable handling of large-scale datasets with TASUKE+. If your dataset has more than a few hundred accessions, please consider these settings.


1. MySQL (MariaDB)
With TASUKE+, the number of database tables increases according to the number of Accessions. You may need to raise the limit on the number of tables that can be open at the same time.

Modifying /etc/my.cnf, and restart MySQL.

[mysqld]
table_open_cache = 2000
open_files_limit = 5000
innodb_file_per_table = 1

You can check current MySQL settings with the following command:

$ mysqladmin -u <user> -p variables
Enter password: <password>


2. PHP
When using the System phylogenetic tree, the browser screen may not display properly and the error "PHP Fatal error: Allowed memory size of *** bytes exhausted" may be output to /var/log/httpd/error_log. This error is caused by exceeding PHP's memory limit (memory_limit).
Increase the limit as shown below. The required value depends on the size of the distance matrix.

Modifying /etc/php.ini, and restart httpd.

memory_limit = 256M

3. TASUKE+ settings
  • System phylogenetic tree
    If there are many accessions (> 400), it takes a long time to create the System phylogenetic tree (NJ clustering). By default, NJ clustering is executed by PHP, but if the phylogenetic analysis function is enabled, Phylip neighbor is used preferentially, which speeds it up.
    The above is valid for Ver.20230901 or later.

    However, when the number of Accessions exceeds 1500, NJ clustering takes a long time even with Phylip. In that case, you can hide system phylogenetic tree by default by setting "$systreeDefaultOn = 0;" in config.php. Hidden System phylogenetic tree can be displayed while browsing TASUKE (clustering takes time).

  • DB registration
    If the number of Accessions is large, registering Variant/Depth information in the DB takes more time and effort. For this reason, we provide wrapper scripts that simplify the whole procedure and speed up registration through parallel processing.
    For Variant registration, see "tasuke_variant_vcf_multi.pl" here.
    For Depth registration, see "tasuke_tsv_db_multi.pl" here.