This is a development version of WU-BLAST (Washington University BLAST) 2.0
software for rapid and sensitive similarity searches of protein and nucleotide
sequence databases.  WU-BLAST executables for several UNIX platforms can be
downloaded from ftp://blast.wustl.edu/blast/executables

DISCLAIMER:  THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND.

WU-BLAST 2 should not be confused with version 2 of the NCBI's Network BLAST
software, which changed the network communications protocol used by the NCBI
servers but did not change the search algorithm or the statistics used.

A reverse chronological list of changes is available in the HISTORY file.

Please send suggestions to gish@watson.wustl.edu

Note:  WU-BLAST 2.0 alpha software is copyrighted and may not be sold or
redistributed without the express written consent of the author; but the
software may otherwise be freely used for commercial, nonprofit, and academic
purposes.


The principal new features in WU-BLAST version 2 are:

o Gapped alignments are produced, with potentially multiple regions of
similarity being found between each pair of sequences.  In WU-BLAST 2.0,
the gapped alignment routines are integral to the database search itself,
not a post-processing step grafted onto an old BLAST version 1.4 search.
Each of the version 2.0 programs with gaps executes about 10% slower than
its version 1.4 counterpart, but generally yields more easily interpretable
output and much better sensitivity.

o The "sum statistics" of Karlin and Altschul (1993) are used to evaluate the
significance of multiple regions of similarity found between the query and a
database sequence, as described by Altschul and Gish (1996).


New command line options include the following.  Terse program usage
information can also be obtained by entering one of the program names
on the command line without arguments.

Q=#       the penalty for a gap of length 1 (default Q=9)

R=#       the per-residue penalty for extending a gap (default R=2)

nogap     do not create gapped alignments, in essence reverting
          to WU-BLAST 1.4 behavior.

gapall    generate a gapped alignment for every HSP found

gape=<e>  generate gapped alignments for all HSPs between sequences
          whose expected frequency of chance occurrence is less than or
          equal to <e>.  Default value is gape=2000.

gapw=<w>  set the window width within which gapped alignments are generated
          (default is gapw=32 for protein comparisons, gapw=16 for BLASTN).

gapK=<k>  the value of the Karlin-Altschul statistics' K parameter to use
          when evaluating the significance of gapped alignment scores.
          This is useful when the programs' internal tables hold no
          precomputed value for the chosen combination of scoring matrix
          and gap penalties.

gapL=<l>  the value of the Karlin-Altschul statistics' lambda parameter to use
          when evaluating the significance of gapped alignment scores

gapH=<h>  the value of the Karlin-Altschul statistics' H parameter to use
          when evaluating the significance of gapped alignment scores

noseqs    produces greatly abbreviated output that omits sequence alignments
          but can still be interpreted correctly by existing parsers.

compat1.4 produces BLAST version 1.4-style output (no gaps) but with bug fixes
          and performance enhancements in place.

hspsepqmax   maximum permitted distance along the query sequence between
             two consistent HSPs

hspsepsmax   maximum permitted distance along the subject (database)
             sequence between two consistent HSPs

gapsepqmax   maximum permitted distance along the query sequence between
             two consistent gapped alignments

gapsepsmax   maximum permitted distance along the subject (database)
             sequence between two consistent gapped alignments

mmio     turns off the use of memory-mapped I/O in the reading of database
         files.  Use of this option will typically slow the search,
         particularly when multiple processors are being used, but serves
         both to demonstrate the effectiveness of this form of I/O and to
         validate the I/O routines.
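
As an illustration of how the Q and R options combine, the following minimal
Python sketch computes the total penalty for a gap of a given length.  It
assumes the affine form implied by the descriptions above: Q is charged for a
gap of length 1 and R for each additional residue.  The function name
gap_penalty is hypothetical and is not part of WU-BLAST.

```python
def gap_penalty(length, q=9, r=2):
    """Total penalty for a gap of `length` residues.

    Assumption: Q is the cost of a gap of length 1 and R is added for
    each further residue, as the option descriptions suggest.
    Defaults mirror the documented defaults Q=9 and R=2.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return q + r * (length - 1)


# Penalties for a few gap lengths under the default Q and R.
for n in (1, 2, 5):
    print(n, gap_penalty(n))
```

Under this assumption, raising Q discourages opening gaps at all, while
raising R discourages long gaps more than short ones.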


o In WU-BLAST 2.0, the BLASTDB environment variable is a colon-delimited list
of directory names.  In UNIX parlance, it is a path.  The default BLASTDB value
is ".:/usr/ncbi/blast/db", so the programs first look in the current working
directory (".") for the requested database and then in the
"/usr/ncbi/blast/db" directory.  For backwards compatibility with programs that
expect BLASTDB to be a single directory specification, not a path, if the user
has set a value for BLASTDB but omitted the current working directory, the
version 2 programs look in the current working directory as a last resort.
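
The BLASTDB lookup order described above can be sketched in Python as follows.
This is only an illustration of the stated rules, not the programs' actual
code.  The function names are hypothetical, and two simplifications are
assumed: an empty BLASTDB is treated as unset, and a database counts as found
when a file of the requested name exists.

```python
import os

DEFAULT_BLASTDB = ".:/usr/ncbi/blast/db"


def blastdb_search_dirs(environ=None):
    """Return the directories to search, in order."""
    env = os.environ if environ is None else environ
    # Assumption: an empty BLASTDB is treated the same as an unset one.
    value = env.get("BLASTDB") or DEFAULT_BLASTDB
    dirs = [d for d in value.split(":") if d]
    # If "." was omitted, search the current directory as a last resort.
    # (Only the literal "." is recognized in this sketch.)
    if "." not in dirs:
        dirs.append(".")
    return dirs


def find_database(name, environ=None):
    """Return the first existing path for `name`, or None.

    Assumption: existence of a file called `name` stands in for
    whatever database files the real programs look for.
    """
    for d in blastdb_search_dirs(environ):
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            return candidate
    return None


print(blastdb_search_dirs({"BLASTDB": "/data/blast"}))
```

With BLASTDB set to the single directory "/data/blast" (a made-up path), the
sketch searches that directory first and the current directory last.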



BUGS

o Parameters lambda, K and H for gapped alignments are obtained by looking up
their values in precomputed tables, not by finding solutions to analytical
equations as is done for ungapped alignments.  Thus, values are not available
for all scoring matrix and gap penalty combinations.  When appropriate values
are unavailable in the precomputed tables, the programs issue a WARNING
and proceed to execute the database search using incorrect values; in such
cases, the statistical significance estimates reported will usually be highly
inaccurate.  If the user happens to know more appropriate values, then the
gapK, gapL and gapH parameters should be used to set them.

o When the user selects an alternative scoring matrix, the gap penalties Q and
R remain unchanged from their default values (unless otherwise specified).
This can inadvertently yield a situation in which the programs do not have
appropriate values of lambda, K and H in their precomputed tables.  As
described above, a WARNING message will indicate such situations.

o The "hspsepqmax", "gapsepqmax", etc. parameters are measures of distance in
residues along the sequences in the specific form in which they are compared.
For instance, in a BLASTX search (a conceptually translated nucleotide query
sequence compared against a protein sequence database), hspsepqmax refers to a
distance measured in amino acid residues, not in the underlying nucleotides of
the query.

o ASN.1 formatted output is currently broken.
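
To see why inappropriate lambda and K values (the first two items above) can
make significance estimates highly inaccurate, consider the basic
Karlin-Altschul expectation for a single alignment of score S,
E = K * m * n * exp(-lambda * S), where m and n are the query and database
lengths.  The Python sketch below uses this single-alignment form only; it
ignores sum statistics and any corrections the programs apply, and all of the
numbers are made up for illustration.

```python
import math


def expected_count(score, m, n, lam, k):
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


# Hypothetical values chosen only for illustration.
score, m, n = 100, 300, 10_000_000

e_a = expected_count(score, m, n, lam=0.30, k=0.10)
e_b = expected_count(score, m, n, lam=0.25, k=0.10)

# The ratio depends only on the difference in lambda times the score:
# exp(0.05 * 100) = exp(5), roughly 148.
print(e_b / e_a)
```

Because lambda appears in the exponent, a small error in it changes the
estimate by orders of magnitude at high scores, whereas an error in K changes
it only proportionally.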


References

Altschul, SF, and W Gish (1996).  Local alignment statistics.
Methods in Enzymology 266:460-480 (ed. R. Doolittle).

Karlin, S, and SF Altschul (1993).  Applications and statistics
for multiple high-scoring segments in molecular sequences.
Proc. Natl. Acad. Sci. USA 90:5873-5877.

