Configuration
TASUKE : Configuration file (config.php)
Edit the configuration file as described below. The file is located at "(Apache DocumentRoot)/conf/config.php".
MySQL Database section
Set the database connection parameters in this section.
Modifying conf/config.php
$db = "<database name>";
$host = "localhost or hostname";
$user = "<user name>";
$pswd = "<password>";
Set the kind of back-end database in this section. Currently, only MySQL is supported.
Modifying conf/config.php
Back-end database
mysql: Requires the php-mysql module.
$backend="mysql";
Directory where temporary files, such as those produced during phylogenetic tree creation, are stored. If it does not exist, TASUKE tries to create it (mkdir).
Modifying conf/config.php
Temporary directory (for phylogenetic tree)
$tempdir="/tmp/tasuke_tmp";
When a large number (> 900) of Chr/Contig reference sequences is registered, the table is divided (SerialTable). To use SerialTable, turn ON (1) its auto-detection.
Normally, set it to OFF (0) to reduce server load.
If you are using SerialTable and this value is OFF (0), a warning is shown in the web browser.
Whether to automatically determine if the table is divided or not
$autoDetectSerialTable = 0; // 0 (OFF) or 1 (ON)
Set the Redis connection parameters in this section.
Modifying conf/config.php
Set "enabled" or "disabled" to $useRedis.
$useRedis = "disabled or enabled";
$redisHost = "localhost or hostname";
$redisPort = "6379 or port number";
$redisDBName = "0 or database name";
This name is used as the name of the reference sequence.
Modifying conf/config.php
$reference = "Reference";
Default genome region to be displayed on first access.
Modifying conf/config.php
// "chr" and "start" are required.
// "blocksize" and "end" conflict, with "end" taking precedence. If these are not specified,
// the default is "blocksize=1k".
// [!!!Warning!!!] Since the "start" and "end" positions are automatically changed according
// to the number of blocks and "blocksize", they may deviate greatly from the specified positions.
// chr: Chromosome/Contig name registered in the DB (excluding prefix(=$chromosome_name))
// start: Starting position on the genome.
// end: end position on the genome.
// blocksize: Default is "1k". Please specify one of the following.(1b=1bp, 1k=1000bp)
// 1b/2b/3b/4b/5b/6b/7b/8b/9b/10b/20b/30b/40b/50b/60b/70b/80b/90b/100b/200b/300b/400b/500b/600b/
// 700b/800b/900b/1k/2k/3k/4k/5k/6k/7k/8k/9k/10k/20k/30k/40k/50k/60k/70k/80k/90k/100k
// ex1. array("chr"=>"chr02","start"=>"365001");
// ex2. array("chr"=>"chr02","start"=>"365001","end"=>"465000");
// ex3. array("chr"=>"chr02","start"=>"365001", "blocksize"=>"1k");
$default_position = array("chr"=>"", "start"=>"");
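The blocksize labels described in the comments above ("1b" = 1 bp, "1k" = 1000 bp) can be read as a number plus a unit. The following is a minimal sketch of that conversion, not TASUKE's own code: the function name blocksizeToBp is hypothetical, and it only checks the number-plus-unit form, not membership in the list of permitted values.

```php
<?php
// Hypothetical helper, not part of TASUKE: converts a blocksize label
// such as "500b" or "10k" into a number of base pairs.
function blocksizeToBp(string $blocksize): int
{
    // Accept a positive integer followed by "b" (bp) or "k" (1000 bp).
    if (!preg_match('/^([1-9][0-9]*)([bk])$/', $blocksize, $m)) {
        throw new InvalidArgumentException("Unsupported blocksize: " . $blocksize);
    }
    return (int)$m[1] * ($m[2] === 'k' ? 1000 : 1);
}

// Example setting in the same form as ex3 above.
$default_position = array("chr" => "chr02", "start" => "365001", "blocksize" => "1k");
echo blocksizeToBp($default_position["blocksize"]), "\n"; // prints 1000
```

A check like this may help catch a mistyped blocksize before it is written to config.php.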
This link enables users to jump to an external page by clicking the structure of the annotation.
e.g.)
1. If you set "http://tasuke.com/search?=" to $external_link and click "transcript001" on the track, the link destination URL will be "http://tasuke.com/search?=transcript001".
2. If you also set ".html" to $external_link_suffix (with $external_link set to "http://tasuke.com/") and click "transcript001" on the track, the link destination URL will be "http://tasuke.com/transcript001.html".
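The two examples above suggest that the destination URL is the link, followed by the clicked ID, followed by the suffix. The sketch below reproduces both examples under that assumption; buildExternalUrl is a hypothetical name, and whether TASUKE applies any URL encoding is not stated in this document.

```php
<?php
// Hypothetical helper, not part of TASUKE. Assumes the destination is
// simply: link + clicked ID + suffix.
function buildExternalUrl(string $link, string $id, string $suffix = ""): string
{
    return $link . $id . $suffix;
}

// Example 1: link only.
echo buildExternalUrl("http://tasuke.com/search?=", "transcript001"), "\n";
// Example 2: link plus suffix.
echo buildExternalUrl("http://tasuke.com/", "transcript001", ".html"), "\n";
```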
$external_link = "<Destination URL>";
$external_link_suffix = "<suffix for the URL>";
If the tag set here is not defined, "ID" is used automatically.
$external_link_tag_gff = "<tag name>";
If you have multiple tracks and want to set different links for each, set as follows:
Modifying conf/config.php
The value set here has priority over $external_link/$external_link_tag_gff/$external_link_suffix.
$external_sites = array(
'<Track name1>' => array('link' => "<Destination URL1>", 'tag_gff' => "<tag name>", 'suffix' => "<suffix for the URL>"),
'<Track name2>' => array('link' => "<Destination URL2>", 'tag_gff' => "<tag name>", 'suffix' => "<suffix for the URL>")
);
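The priority rule above can be pictured as a per-track lookup that falls back to the global settings. This is only a sketch under assumptions: resolveLinkSettings, the track names, and the example.org URLs are hypothetical, and the per-key fallback (a track entry that omits a key inherits the global value) is a guess rather than documented behavior.

```php
<?php
// Global settings (placeholder values for illustration only).
$external_link = "http://example.org/search?=";
$external_link_tag_gff = "ID";
$external_link_suffix = "";

// Per-track settings; 'TrackA' is a hypothetical track name.
$external_sites = array(
    'TrackA' => array('link' => "http://example.org/a/", 'tag_gff' => "Name", 'suffix' => ".html"),
);

// Hypothetical helper: per-track values override the global defaults.
function resolveLinkSettings(string $track, array $sites, array $defaults): array
{
    return isset($sites[$track]) ? array_merge($defaults, $sites[$track]) : $defaults;
}

$defaults = array(
    'link' => $external_link,
    'tag_gff' => $external_link_tag_gff,
    'suffix' => $external_link_suffix,
);
$a = resolveLinkSettings('TrackA', $external_sites, $defaults);
$b = resolveLinkSettings('TrackB', $external_sites, $defaults);
echo $a['link'], "\n"; // per-track link
echo $b['link'], "\n"; // global link, because TrackB has no entry
```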
If the chromosome names in the reference FASTA consist only of numbers or letters, you can add a prefix such as "chr" by setting it here.
e.g.) "1" becomes "chr1"
$chromosome_name = "";
To show the GWAS plot, set the phenotype flag to "true".
To hide the GWAS plot, set it to "false".
$phenotypeFlg = "true";
To use the system phylogenetic tree function, create a distance matrix file or a Newick file in advance. The function is enabled by specifying the file paths in $distanceMatrixPath and/or $newickPath.
Modifying conf/config.php
// SystemTree function is enabled by specifying Newick or distance matrix file.
// If both are specified, Newick will be referenced preferentially. On the AccessionManager, user can choose which one to use.
// Sample name must be Accession ID.
// If it is empty or the file does not exist, the function will be invalid.
// Specify a distance matrix file in Phylip format(Lower-triangle matrices).
// https://evolution.gs.washington.edu/phylip/doc/distance.html
// Perform NJ clustering every time the Accession list changes. Height is accurate, but it takes time if there are many Accessions.
$distanceMatrixPath = "<path/distance_matrix>
";
// Recreate the phylogenetic tree quickly by removing Accession leaves from the Newick tree instead of NJ clustering.
$newickPath = "<path/newick>
";
// Do you want to display tree by default at startup?
$systreeDefaultOn = 1;
// By default, PHPPhylogeneticTrees is used for clustering, but if $phylipDir/exe/neighbor is available, it is mainly used.
$systreeUsePhylip = 1;
// For SystemTree ProgressBar. Characters used in progress strings.
$systreeProgressChar = "*";
// For SystemTree ProgressBar. Number of $systreeProgressChar to send per cycle of Phylip-neighbor.
// The larger the number, the smoother ProgressBar, but the larger the amount of data transferred.
$systreeProgressCharLen = 80;
// [System/PhylipTree] Branches whose height is greater than this value will be collapsed.
// Float. Invalid if 0 or empty.
$systreeAutoCollapseThreshold = 0;
// What to consider for $systreeAutoCollapseThreshold. (height/ratio)
$systreeAutoCollapseType = "height";
// [System/PhylipTree] Other options
// - Correction: ""/asec/pow/cdf
$systreeCorrection = "";
// - SortType: height or subtree
// height : max height of subtree
// subtree: number of subtree leaves
$systreeSortType = "subtree";
// - SortType: "" or asc or desc
$systreeSort = "desc";
// - TreeWidth: int
$systreeWidth = "100";
Increasing the number of threads speeds up FASTA sequence retrieval and phylogenetic tree creation.
Modifying conf/config.php
$threadnum = "4";
To obtain gene ID information from the ANN/EFF annotation, set the column numbers here.
If the SNP detail view does not show the correct information, change these to the correct column numbers.
$ANNcol = "6";
$EFFcol = "7"; //for older snpEff annotation
$EFFcol = "8";
The primer design function requires Primer3 and, optionally, MFEprimer. Set the paths of Primer3 and MFEprimer in this section. (Primer3 version 2.3.0 or later, MFEprimer version 2.0 or later)
Modifying conf/config.php
$primer3Path = "<path/primer3_core>";
$primer3ThermPath = "<path/primer3_config>";
MFEprimer3 path
$MFE3path="<path/mfeprimer-3.x.x-linux-amd64>";
$MFE3DBpath="<path/reference/reference.fasta>";
"$MFEpath" and "$MFEDBpath" below are old settings for MFEprimer2 (with Python2). Usually, use the MFEprimer3 settings above.
For MFEprimer2
MFEprimer2 path
$MFEpath="<path/MFEprimer.py>";
$MFEDBpath="<path/reference/reference.fasta>";
$ yum install policycoreutils-python
Change SELinux security context
$ semanage fcontext -a -s system_u -t httpd_sys_content_t "<PRIMER3_DIRECTORY>"
$ restorecon -RF "<PRIMER3_DIRECTORY>"
Phylogenetic analysis requires PHYLIP. Set the path of the PHYLIP directory in this section (do not include the "exe" subdirectory).
When PHYLIP is enabled, it is also used to create the System phylogenetic tree, which then draws faster.
$phylipDir = "/PATH/";
BLAST search requires the NCBI BLAST+ tools. Set the path of BLAST in this section.
$blastDBPath must be a DB file created by makeblastdb with the "-parse_seqids" option.
Specify blastn or its directory path.
tblastn and tblastx are automatically searched from $blastPath.
$blastPath = "path/bin/blastn";
$blastDBPath = "path/reference/reference.fasta";
tblastx has a high server load. Also, due to the large data size of the result,
the history cannot be stored in WebStorage, which may result in an error.
(Even in that case, browsing can be continued)
$useTblastx = 0;
Display limit for individual SNPs and indels in absolute position mode.
Modifying conf/config.php
$nuc_max = "100k";
Visualizing BAM, BED and BEDGraph data (RNA-seq read depth, ChIP-seq, BS-seq, ...).
To use this feature, set $customTrack to "enabled".
$customTrack = 'enabled' or 'disabled';
For more detailed settings, see the following items.
Modifying conf/config.php
$customTrackName = <track name>;
$customTrackKindofData = <unit>;
Normalizes values using the header value of the TSV file. This is useful for RNA-seq data.
Set "enabled" or "disabled" to $customTrackNormByRead.
$customTrackNormByRead = 'enabled' or 'disabled';
Change color gradation for custom track
$cstm_max = 1000;
$cstm_min = 0;
Canvas width for accession IDs and groups
Modifying conf/config.php
$accession_cvs_width = 160;
$title_id_width_rate = 0.55;
Set a color for each accession group (variety, sub variety, origin, type).
Modifying conf/config.php
$color_acc_group = 'enabled' or 'disabled';
You can set the default color for each kind of data. Users can change each color with the color manager.
Set a hex-formatted value.
e.g.) $snp_max_col="#00FF00";
Color for maximum of SNP count
$snp_max_col="#00FFFF";
Color for minimum of SNP count
$snp_min_col="#F0F8FF";
Color for maximum of INDEL count
$indel_max_col="#A52A2A";
Color for minimum of INDEL count
$indel_min_col="#FFCB8E";
Color for maximum of DEPTH value
$depth_max_col="#000000";
Color for minimum of DEPTH value
$depth_min_col="#F2F2F2";
Color for maximum of custom track value
$cstm_max_col="#A229B8";
Color for minimum of custom track value
$cstm_min_col="#F5E9F7";
Color for maximum and minimum of custom track value when multiple conditions are enabled.
$cstm_max_col_para1="#77ab42";
$cstm_min_col_para1="#f1f6ec";
$cstm_max_col_para2="#a13b4b";
$cstm_min_col_para2="#f5ebed";
$cstm_max_col_para3="#1a5dd9";
$cstm_min_col_para3="#e8eefb";
These parameters designate the value range for the depth color. The range is used to make a gradation corresponding to each depth value.
Modifying conf/config.php
$depth_max = 140;
$depth_min = 0;
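One way to picture how a depth value could map onto a gradation between the minimum and maximum colors is linear interpolation per RGB channel. The sketch below is an assumption for illustration only: gradientColor is a hypothetical name, and this document does not say which interpolation TASUKE actually uses.

```php
<?php
// Hypothetical helper, not part of TASUKE: linear interpolation between
// two "#RRGGBB" colors for a value clamped to [min, max].
function gradientColor(float $value, float $min, float $max, string $minCol, string $maxCol): string
{
    // Clamp the value, then scale it to the range 0..1.
    $v = max($min, min($max, $value));
    $t = ($max > $min) ? ($v - $min) / ($max - $min) : 0.0;
    $out = "#";
    for ($i = 0; $i < 3; $i++) {
        // Read one channel (R, G or B) from each color.
        $lo = hexdec(substr($minCol, 1 + 2 * $i, 2));
        $hi = hexdec(substr($maxCol, 1 + 2 * $i, 2));
        $out .= sprintf("%02X", (int)round($lo + ($hi - $lo) * $t));
    }
    return $out;
}

// Default values taken from the settings in this document.
$depth_max = 140;
$depth_min = 0;
$depth_max_col = "#000000";
$depth_min_col = "#F2F2F2";
echo gradientColor(70, $depth_min, $depth_max, $depth_min_col, $depth_max_col), "\n";
```

Under this assumption, depth values at or above $depth_max all receive the maximum color, so raising $depth_max spreads the gradation over a wider range.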
These parameters designate the values for the SNP color. They are used to make a gradation corresponding to each SNP count.
Modifying conf/config.php
$snp_90k=900;
$snp_80k=800;
$snp_70k=700;
$snp_60k=600;
$snp_50k=500;
$snp_40k=400;
$snp_30k=300;
$snp_20k=200;
$snp_10k=100;
$snp_9k=90;
$snp_8k=80;
$snp_7k=70;
$snp_6k=60;
$snp_5k=50;
$snp_4k=40;
$snp_3k=30;
$snp_2k=20;
$snp_1k=10;
$snp_900b=9;
$snp_800b=8;
$snp_700b=7;
$snp_600b=6;
$snp_500b=5;
$snp_400b=4;
$snp_300b=3;
$snp_200b=2;
$snp_100b=1;
$snp_90b=1;
$snp_80b=1;
$snp_70b=1;
$snp_60b=1;
$snp_50b=1;
$snp_40b=1;
$snp_30b=1;
$snp_20b=1;
$snp_10b=1;
$snp_9b=1;
$snp_8b=1;
$snp_7b=1;
$snp_6b=1;
$snp_5b=1;
$snp_4b=1;
$snp_3b=1;
$snp_2b=1;
$snp_1b=1;
$snp_min=1;
These parameters designate the values for the INDEL color. They are used to make a gradation corresponding to each INDEL count.
Modifying conf/config.php
$indel_90k=900;
$indel_80k=800;
$indel_70k=700;
$indel_60k=600;
$indel_50k=500;
$indel_40k=400;
$indel_30k=300;
$indel_20k=200;
$indel_10k=100;
$indel_9k=90;
$indel_8k=80;
$indel_7k=70;
$indel_6k=60;
$indel_5k=50;
$indel_4k=40;
$indel_3k=30;
$indel_2k=20;
$indel_1k=10;
$indel_900b=9;
$indel_800b=8;
$indel_700b=7;
$indel_600b=6;
$indel_500b=5;
$indel_400b=4;
$indel_300b=3;
$indel_200b=2;
$indel_100b=1;
$indel_90b=1;
$indel_80b=1;
$indel_70b=1;
$indel_60b=1;
$indel_50b=1;
$indel_40b=1;
$indel_30b=1;
$indel_20b=1;
$indel_10b=1;
$indel_9b=1;
$indel_8b=1;
$indel_7b=1;
$indel_6b=1;
$indel_5b=1;
$indel_4b=1;
$indel_3b=1;
$indel_2b=1;
$indel_1b=1;
$indel_min=1;
TASUKE : Other settings
Page layout
header
Modifying docs/top.html
:
</div>
footer
Modifying docs/bottom.html
:
</div>
Here we explain how to edit the "About" and "Citation" pages in the "Help" menu of the menu bar.
About page
Modifying docs/about.html
:
</div>
Citation page
Modifying docs/citation.html
:
</div>
In the initial state, TASUKE loads the accessions in installation order. This order is based on the accession information used as the input file of "tasuke_accession.pl". To change the order of accessions, edit the file below. You can also hide accessions with this function: an accession that is not listed in "order.conf" is hidden in TASUKE.
To use this function, write one ID (accession) per line in this file.
accession_B
accession_A
:
This function does not work when the file is empty or does not exist in the conf directory.
If undefined IDs are found in the file, they are ignored.
e.g.) When your database has 5 accessions but you want to show only 3 of them.
The accession information in the database
accession_2
accession_3
accession_4
accession_5
accession_2
accession_1
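The rules described above (listed IDs are shown in file order, unlisted accessions are hidden, undefined IDs are ignored, and an empty or missing file disables the function) can be sketched as follows. This is an illustration, not TASUKE's implementation: applyAccessionOrder is a hypothetical name and the accession IDs are illustrative data.

```php
<?php
// Hypothetical helper, not part of TASUKE: applies an order.conf-style
// list of IDs to the accessions registered in the database.
function applyAccessionOrder(array $dbAccessions, array $orderLines): array
{
    // Trim each line and drop blank lines.
    $ids = array_values(array_filter(array_map('trim', $orderLines), 'strlen'));
    if (count($ids) === 0) {
        // Empty or missing file: keep the installation order.
        return $dbAccessions;
    }
    // Keep listed IDs in file order; IDs unknown to the database are ignored.
    return array_values(array_filter($ids, function ($id) use ($dbAccessions) {
        return in_array($id, $dbAccessions, true);
    }));
}

// Illustrative data: five registered accessions, three of them listed.
$db = array("accession_1", "accession_2", "accession_3", "accession_4", "accession_5");
$order = array("accession_3", "accession_2", "accession_1", "unknown_id");
print_r(applyAccessionOrder($db, $order));
```

In this sketch accession_4 and accession_5 are hidden because they are not listed, and unknown_id is dropped because it is not in the database.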
If you set parameters for the destination region in the TASUKE URL, TASUKE displays that region. This is useful for links from other web pages.
e.g.) To move to 'gene001 (chr01:10000-20000)' from any position or from another web page, access the URL below.
http://hostname/index.html?chr=chr01&st=10000&en=20000&id=gene001
Parameter | Description |
---|---|
chr | Name of the sequence |
st | Start position |
en | End position |
id | If you set transcript id to this parameter, the transcript object will be highlighted on the annotation track. (Not required) |
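When such links are generated from another PHP page, the query string can be assembled from the parameters in the table above. The sketch below uses PHP's built-in http_build_query; the function name regionUrl is hypothetical, and "http://hostname/index.html" is the placeholder URL from the example above.

```php
<?php
// Hypothetical helper, not part of TASUKE: builds a link to a region.
function regionUrl(string $base, string $chr, int $st, int $en, ?string $id = null): string
{
    $params = array('chr' => $chr, 'st' => $st, 'en' => $en);
    if ($id !== null) {
        // Optional: highlights the transcript on the annotation track.
        $params['id'] = $id;
    }
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return $base . '?' . http_build_query($params, '', '&');
}

echo regionUrl("http://hostname/index.html", "chr01", 10000, 20000, "gene001"), "\n";
```

Note that http_build_query percent-encodes reserved characters, so names containing such characters appear encoded in the resulting URL.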
Set the designated accessions in the query string of the URL (as in the section above). This function can be combined with the other query string parameters.
e.g.) To display 'human001,human002,human004', access the URL below.
http://hostname/index.html?acc=human001,human002,human004
Parameter | Description |
---|---|
acc | Comma-separated accessions (IDs) |
Server
1. For security, TASUKE must access the database as a MySQL user with limited privileges. Follow the steps below.
$ mysql -u <user> -p
> Enter password: <password>
$ mysql> create user '<new user>'@'<hostname>';
$ mysql> set password for '<new user>'@'<hostname>'=password('<new password>');
$ mysql> grant select on <database name>.* to '<new user>'@'<hostname>';
$ mysql> flush privileges;
$ mysql> exit;
$user = <new user>;
$pswd = <new password>;
Configuration files contain database account information, the accession list, etc., so web access to them must be restricted.
Modify /etc/httpd/conf/httpd.conf and restart httpd.
<Directory "<Apache document root>/conf">
Require all denied
</Directory>
Use Apache settings to limit access to TASUKE. See the Apache documentation for configuration details. Examples are shown below.
Case1. IP address filtering
Only allow access from specific IP addresses. Make sure to also allow localhost ("Require local").
Modify /etc/httpd/conf/httpd.conf and restart httpd.
<Directory "<Apache document root>">
Require ip 87.65.43.21 123.45.67.0/24
Require local
</Directory>
Case2. Basic authentication
You can require password authentication to access TASUKE.
Set user IDs and passwords as follows, and give them to the users who are allowed access.
# Create the first user.
$ sudo htpasswd -c /pathto/htpasswd <userid1>
New password:
Re-type new password:
$ sudo chown apache:apache /pathto/htpasswd
$ sudo chmod 600 /pathto/htpasswd
# Add second and subsequent users ("-c" not required)
$ sudo htpasswd /pathto/htpasswd <userid2>
......
<Directory "<Apache document root>">
AuthType Basic
AuthName "auth"
AuthUserFile /pathto/htpasswd
Require valid-user
</Directory>
Database compression reduces the data size and slightly improves performance. In particular, the size of TSV (depth and general-purpose) data is reduced to between 1/2 and 1/6.
TASUKE (the database) does not work until this process has finished.
A compressed database cannot be updated (it is read-only).
To update data after compressing the database, you must decompress it first.
$ service mysqld stop
Move to the database directory
$ cd <mysql database directory>
(default: /var/lib/mysql/<database name>)
<tsv table> indicates dx_accession or dx_accession_cstm
myisampack and myisamchk are repeated for each accession
$ myisampack -v <tsv table>
$ myisamchk -rq --sort-index --analyze <tsv table>.MYI
$ service mysqld start
Load the tables
$ mysql -u <user> -p
> Enter password: <password>
$ mysql> flush tables;
$ mysql> exit;
$ service mysqld stop
Decompressing the tables
$ myisamchk --unpack <tsv table>
$ service mysqld start
Load the tables
$ mysql -u <user> -p
> Enter password: <password>
$ mysql> flush tables;
$ mysql> exit;
Here is an example of server settings for stable handling of large-scale data sets with TASUKE+. If your dataset has more than a few hundred accessions, please consider applying these settings.
1. MySQL (MariaDB)
With TASUKE+, the number of database tables increases according to the number of Accessions. You may need to raise the limit on the number of tables that can be open at the same time.
Modify /etc/my.cnf and restart MySQL.
[mysqld]
table_open_cache = 2000
open_files_limit = 5000
innodb_file_per_table = 1
The above settings affect the entire MySQL service. Make sure they do not conflict with other web content that uses the same server.
"innodb_file_per_table = 1" stores the InnoDB data of each table in a separate file. This setting has some advantages, but if your TASUKE will be accessed by many users concurrently, you may want to set "innodb_file_per_table = 0". The default value for MySQL 5.6/MariaDB 10 or later is "1". If you are already using another MySQL database, this change might have a big impact. Do not change innodb_file_per_table if you are not sure.
$ mysqladmin -u <user> -p variables
Enter password: <password>
If "open_files_limit" change is not reflected in MySQL, you may need to add the Linux systemd settings as below.
For MySQL
$ sudo mkdir /etc/systemd/system/mysqld.service.d
$ sudo vi /etc/systemd/system/mysqld.service.d/limits.conf
[Service]
LimitNOFILE=5000
$ sudo systemctl daemon-reload
$ sudo systemctl restart mysqld
For MariaDB
$ sudo mkdir /etc/systemd/system/mariadb.service.d
$ sudo vi /etc/systemd/system/mariadb.service.d/limits.conf
[Service]
LimitNOFILE=5000
$ sudo systemctl daemon-reload
$ sudo systemctl restart mariadb
2. PHP
While using the System phylogenetic tree, the browser screen may not display properly and the error "PHP Fatal error: Allowed memory size of *** bytes exhausted" may be written to /var/log/httpd/error_log. This error occurs when PHP's per-script memory limit (memory_limit) is exceeded.
Increase the limit as below. The appropriate value depends on the size of the distance matrix.
Modify /etc/php.ini and restart httpd.
memory_limit = 256M
- System phylogenetic tree
If there are many accessions (> 400), creating the System phylogenetic tree (NJ clustering) takes a long time. By default, NJ clustering is executed in PHP, but if the phylogenetic analysis function is enabled, Phylip neighbor is used preferentially, which speeds it up.
The above is valid for Ver.20230901 or later.
However, when the number of accessions exceeds 1500, NJ clustering takes a long time even with Phylip. In that case, you can hide the System phylogenetic tree by default by setting "$systreeDefaultOn = 0;" in config.php. A hidden System phylogenetic tree can still be displayed while browsing TASUKE (clustering takes time).
- DB registration
If the number of accessions is large, registering Variant/Depth information in the DB takes more time and effort. For this reason, we provide wrapper scripts that simplify the whole procedure and speed up registration through parallelism.
For Variant registration, see "tasuke_variant_vcf_multi.pl" here.
For Depth registration, see "tasuke_tsv_db_multi.pl" here.