This segment had a sizeable share in the market owing to reduced time for processing data, identification of sequence similarity, motif finding, and phylogenetic analysis.
Potentially novel Bartonella phylogroups were identified as having <96% sequence similarity with all publicly available sequences in GenBank (4).
The sequences were assigned to individual operational taxonomic units (OTUs) based on a sequence similarity of at least 97%.
The observed homology could not be identified by other means because the proteins have diverged to the point where no remnants of sequence similarity are left, yet the tertiary and quaternary organization is the same.
These markers include adenylate cyclase 3 (ADCY3); diacylglycerol kinase, alpha (DGKA); family with sequence similarity 46, member A (FAM46A); immunoglobulin superfamily, member 4 (IGSF4), which is also known as cell adhesion molecule 1 (CADM1); KIAA1539; myristoylated alanine-rich protein kinase C substrate (MARCKS); proteasome activator subunit 1 (PSME1); Ras association and pleckstrin homology domains 1, also known as LPD (RAPH1); and intracellular Toll-like receptor 7 (TLR7).
Interestingly, EBV miRNAs did not share any sequence similarity with cellular miRNAs, which indicated that they were not acquired from the host but rather that evolution had shaped some of the EBV transcripts so that the host miRNA biogenesis machinery could process them.
Assembled contigs compared against GenBank using BLASTn showed that all hits were below the 97% sequence similarity threshold for Hormiphora, Pleurobrachia, Mnemiopsis, Bolinopsis and Lampocteis.
The abundance of each index subcategory was calculated using different sequence similarity thresholds in order to generate a distribution of values for each subcategory and to determine how these thresholds affect data interpretation (Table 2).
KLAST is a new sequence similarity search tool which builds on a highly optimized implementation of the PLAST algorithm published in BMC Bioinformatics.
The first description of a sequence similarity search method that allows insertions, deletions, and gaps was published in [Needleman, 1970], where a computer program for finding similarities in the amino acid sequences of two proteins was developed.
This has been especially useful for classes above the species level, for example, a genus could be defined by species with 95% sequence similarity (Ludwig et al.
Major projects developed by SciDM Group are: SciDM DBMS - high performance zero-maintenance object-relational NoSQL database engine; EMBEDB - embedded data access library serving as a back-end for SciDM DBMS (open source); QSimScan - ultra-high speed DNA and protein sequence similarity search tool (open source); Transcriptomics pipeline - integrated solution for EST and RNAseq data analysis.
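Several of the sentences above depend on a percent sequence similarity threshold, such as 97% for assigning sequences to an OTU. The sketch below shows one simple way such a threshold could be applied. It assumes that similarity means percent identity over the gap-free columns of two sequences that have already been aligned; the function names `percent_identity` and `same_otu` are hypothetical, and none of the cited studies is known to use this exact calculation.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use `gap` as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns without a gap in either sequence
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def same_otu(aligned_a: str, aligned_b: str, threshold: float = 97.0) -> bool:
    """True if the pair meets the similarity threshold (97% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other definitions of similarity (for example, counting gap columns as mismatches, or scoring conservative amino acid substitutions) would give different values for the same pair, so the threshold is only meaningful alongside the definition used.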

