Running SpeedDate as a command line application
SpeedDate can be started from the command line by running the following line in a terminal:
julia -e"using SpeedDate; SpeedDate.main()"
Computing divergence times
The compute command is invoked to compute genetic distances and divergence times for an input aligned FASTA file.
Use the -h flag to display the possible options for the compute command:
julia -e"using SpeedDate; SpeedDate.main()" compute -h
usage: <PROGRAM> compute [-f FILE] [-s [DNASEQS...]] [-m MODEL]
                         [-r MUTATION_RATE] [--method METHOD] [-o OUTFILE]
                         [--scan] [-w WIDTH] [-c] [--onlydist] [-h]

optional arguments:
  -f, --file FILE       An input file.
  -s, --dnaseqs [DNASEQS...]
                        The first of two DNA sequences to test (type:
                        Bio.Seq.BioSequence{Bio.Seq.DNAAlphabet{4}})
  -m, --model MODEL     The model used to compute genetic distances.
                        Currently jc69, and k80 are supported. (default:
                        "jc69")
  -r, --mutation_rate MUTATION_RATE
                        The mutation rate to be assumed. (type: Float64,
                        default: 1.0e-8)
  --method METHOD       The dating method to use. Currently 'default' and
                        'simple' are supported. (default: "default")
  -o, --outfile OUTFILE
                        Base name for the output files(s). (default:
                        "SpeedDate")
  --scan                Whether or not to compute dates across sequences
                        with a window.
  -w, --width WIDTH     Width of the window across sequences. (type:
                        Int64, default: 100)
  -c, --sepcol          Write the start and end points of sliding windows
                        in separate columns of output table.
  --onlydist
  -h, --help            show this help message and exit
Entering DNA sequences
DNA sequences can be read from a FASTA formatted file using the -f or --file flag. Alternatively, short sequences can be typed out on the command line using the -s or --dnaseqs flag.
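As a minimal sketch of the file-based route, the snippet below wraps the long julia invocation in a small shell function and passes it a FASTA file. The function name speeddate and the file name aligned.fasta are hypothetical and only serve the illustration; the -f flag is the one described above.

```shell
# Thin wrapper (hypothetical name) so the long julia invocation is written once.
speeddate() {
    julia -e 'using SpeedDate; SpeedDate.main()' "$@"
}

# Hypothetical input file; replace with your own aligned FASTA file.
INPUT="aligned.fasta"

# Only call SpeedDate when the input file is actually present.
if [ -f "$INPUT" ]; then
    speeddate compute -f "$INPUT"
else
    echo "input file not found: $INPUT" >&2
fi
```

With the same wrapper, short sequences would be passed after -s in place of -f and a file name; the exact form the sequences should take on the command line is best checked against the -h output.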
Parameters for divergence time estimation.
SpeedDate computes evolutionary distances between sequences (or between windows of sequences, if you are estimating divergence times using a sliding window; see below). By default this is the Jukes & Cantor 69 distance, but it can be set to Kimura's 80 distance by providing the option -m k80 or --model k80. More distance estimates are anticipated in future versions.
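For reference, the two model names correspond to the standard textbook distance formulas sketched below. The symbols are introduced here for illustration: p is the proportion of aligned sites that differ, and P and Q are the proportions of sites showing transitions and transversions. Details such as how SpeedDate treats gaps or ambiguous bases are not covered by these formulas.

```latex
% Jukes & Cantor (1969) distance from the proportion of differing sites p
\[
d_{\mathrm{JC69}} = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p\right)
\]
% Kimura (1980) two-parameter distance from transition (P) and transversion (Q) proportions
\[
d_{\mathrm{K80}} = -\frac{1}{2}\,\ln\!\left(1 - 2P - Q\right) - \frac{1}{4}\,\ln\!\left(1 - 2Q\right)
\]
```

For example, with p = 0.01 the JC69 distance is approximately 0.01007, only slightly larger than the raw proportion of differences, because few sites are expected to have changed more than once.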
SpeedDate requires a mutation rate to estimate divergence times. By default this is 1.0e-8, but it may be altered using the -r or --mutation_rate flag.
By default the SpeedDate method is used, which gives a 95% CI range for the divergence time. A simpler method is also available, which gives a single point estimate of the expected divergence time. To use it, pass the option --method simple.
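To give a feel for how a distance and a mutation rate combine into a time, the general strict molecular clock relationship is sketched below, with d the evolutionary distance and \mu the mutation rate per site per unit of time. This is the textbook form and is offered as an assumption about the flavour of a simple point estimate; it is not taken from SpeedDate's source, and the 95% interval of the default method is not reproduced here.

```latex
% Strict molecular clock: two lineages each accumulate changes at rate \mu,
% so the distance between them grows at rate 2\mu.
\[
t = \frac{d}{2\mu}
\]
% Worked example with d = 0.01 and the default rate \mu = 1.0 \times 10^{-8}:
\[
t = \frac{0.01}{2 \times 1.0 \times 10^{-8}} = 5 \times 10^{5}
\]
```

The result is expressed in whatever time unit the mutation rate uses, for example generations if the rate is per site per generation.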
Use the --scan flag to indicate that you want to compute divergence times for windows across your sequences. Set the size of the windows with the -w flag, e.g. -w 1000.
When you run SpeedDate, two .csv files are written: one containing the evolutionary distance measures, and one containing the divergence time estimates between your sequences. You can set the base name of these output files with --outfile or -o.
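Putting the options of this section together, a windowed run might look like the sketch below. The wrapper name speeddate, the file name aligned.fasta, the base name k80_scan, and the mutation rate 2.5e-8 are all hypothetical values chosen for the illustration; the flags themselves are the ones listed in the help output.

```shell
# Thin wrapper (hypothetical name) so the long julia invocation is written once.
speeddate() {
    julia -e 'using SpeedDate; SpeedDate.main()' "$@"
}

INPUT="aligned.fasta"   # hypothetical aligned FASTA file
BASENAME="k80_scan"     # hypothetical base name for the output .csv files

# Collect the options as positional parameters so they stay readable:
#   -m k80       Kimura's 80 distance instead of the default jc69
#   -r 2.5e-8    example mutation rate (an assumed value, not a recommendation)
#   --scan -w    sliding-window analysis with windows of width 1000
#   -o           base name of the output files
set -- compute -f "$INPUT" -m k80 -r 2.5e-8 --scan -w 1000 -o "$BASENAME"

# Show what will be run, then run it only if the input file is present.
echo "SpeedDate arguments: $*"
if [ -f "$INPUT" ]; then
    speeddate "$@"
fi
```

Adding --method simple to the list would request the point estimate instead of the 95% CI range, and adding --onlydist would stop after the distances have been written.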
Extra options
If the --onlydist flag is used, SpeedDate will only compute the evolutionary distance measures and write them to file. It will not go on to produce divergence time estimates.
Plotting SpeedDate results
The compute command writes its results to file as a data-frame, in a text-based file format (csv). This format should be familiar to users of R and other software frameworks that use a similar tabular data structure.
The plot command is invoked to plot the results of a compute run. Users who are happy scripting with the resulting data-frame files can of course use whatever plotting solution they prefer.
The plot command is very simple and will create simple heat plots using the Gadfly.jl framework.
The options available to the user for plotting can be viewed using the -h flag:
julia -e"using SpeedDate; SpeedDate.main()" plot -h
usage: <PROGRAM> plot [--width WIDTH] [--height HEIGHT] [--units UNITS]
                      [--backend BACKEND] [--reference REFERENCE] [-h]
                      [inputfile] [outputfile]

positional arguments:
  inputfile             The file name of the input data.
  outputfile            The file name of the output plot.

optional arguments:
  --width WIDTH         Width of the plot. (type: Float64, default: 12.0)
  --height HEIGHT       Height of the plot. (type: Float64, default: 8.0)
  --units UNITS         Units for width and height of the plot. Must be
                        one of 'inch', 'mm', or 'cm'. (default: "cm")
  --backend BACKEND     The backend used to produce the plot. Must be one
                        of 'svg', 'svgjs', 'png', 'pdf', 'ps', or 'pgf'.
                        (default: "svg")
  --reference REFERENCE
                        The name of the DNA sequence to use as a reference
                        when plotting a windowed analysis. (default:
                        "default")
  --table               Save the table used for plotting to file.
  --sortsim             Sort the rows of the output table by average
                        sequence similarity if you are plotting a windowed
                        analysis.
  -h, --help            show this help message and exit.
The command is very simple, with two obligatory arguments: first the filename of a SpeedDate results file is provided, followed by a name for the output plot.
Optionally, you can set the width and the height of the generated plot with the flags --width and --height, and you can specify the units of the width and height (mm, cm, or inch) with the --units flag. By default the units are centimeters (cm).
You can choose which backend is used to save the plot (i.e. the file format) using the --backend flag. Most commonly you will want a vector format ("svg" or "pdf") or a raster format ("png"). By default the plot will be drawn using the SVG backend.
Finally, if you are plotting results from a SpeedDate run which used a sliding window, you can use the --reference flag to set the name of the reference sequence, against which all other sequences are plotted. By default, this falls back to the first sequence name that is mentioned in the results file you give the plot command.
This command can plot distances or divergence times produced by SpeedDate. The divergence times plotted are the "middle estimates". Recall from the introduction of this manual that SpeedDate makes upper, middle, and lower estimates of the divergence time, forming a 95% confidence interval in which the true age is estimated to lie.
An example usage is below. The input file is called "SpeedDate_distances.csv", the output plot is given the name "testplot", the plot is written out in PNG format, and the width and height of the plot are set to 50 and 45 cm.
julia -e"using SpeedDate; SpeedDate.main()" plot SpeedDate_distances.csv testplot --backend png --width 50.0 --height 45.0