Turning LCMS data into biological insight is a well-practiced but inconsistent operation. Several tools exist to extract molecular information and reconstruct protein sequences, but the process varies from tool to tool. We provide a concise method for translating label-free proteomics data into biological meaning that can be easily analyzed, searched, and explored.
The label-free proteomics analysis pipeline is a complete solution for transforming analytical measurements into biological observations.
Advances in LCMS have propelled proteomics toward the forefront of translational biology, approaching the breadth and complexity of large-scale whole-genome analysis. However, proteomics remains less well understood because of the vast combinatorial possibilities inherent in protein sequences, post-translational modifications, and genetic variation. The JAEVALEN Search Engine uses information retrieval, as opposed to all pairwise comparisons, to rank potential sequence matches: the entire search space is precomputed and indexed, which improves both search speed and statistical accuracy. Several advantages follow. For example, the size of the database places no theoretical limit on performance, and searching occurs on a per-spectrum basis, with observed recall times of less than 100 milliseconds from a database of nearly a billion peptides.
Database searching is accomplished by querying a set of precursor and fragment mass values for a given spectrum. Experimental spectra submitted for searching are noise-filtered to retain peaks representing charge-one monoisotopic values, and the mass tolerances for both the precursor and the fragments are set to 0.5 Daltons. Potential peptide matches are ranked by a probabilistic formula based on Bayesian inference, and a final discriminant score is applied from a global model, similar to current methods.
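The query step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy index, the placeholder peptide names and masses, and the function names are hypothetical, and ranking by the number of matched fragments is a simple stand-in for the Bayesian and discriminant scoring, which is not specified here. Only the 0.5 Dalton tolerance comes from the text.

```python
from bisect import bisect_left, bisect_right

TOLERANCE_DA = 0.5  # tolerance from the text, used for precursor and fragments

# Hypothetical toy index of (precursor mass, peptide, fragment masses).
# Names and masses are placeholders, not real peptide data.
INDEX = sorted([
    (500.30, "PEPTIDEA", [100.1, 200.2, 300.3]),
    (500.60, "PEPTIDEB", [100.1, 250.0, 310.0]),
    (720.40, "PEPTIDEC", [150.0, 200.2, 300.3]),
])
PRECURSORS = [entry[0] for entry in INDEX]


def count_fragment_matches(observed, theoretical, tol=TOLERANCE_DA):
    """Count observed fragment masses lying within tol of any indexed fragment."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def search(precursor, fragments, tol=TOLERANCE_DA):
    """Return (matched fragment count, peptide) pairs, best match first."""
    # Binary search narrows the index to the precursor mass window,
    # so no pairwise comparison against the whole index is needed.
    lo = bisect_left(PRECURSORS, precursor - tol)
    hi = bisect_right(PRECURSORS, precursor + tol)
    hits = [
        (count_fragment_matches(fragments, frags, tol), peptide)
        for _, peptide, frags in INDEX[lo:hi]
    ]
    # Stand-in ranking: more matched fragments first, then peptide name.
    return sorted((h for h in hits if h[0] > 0), key=lambda h: (-h[0], h[1]))
```

Calling search(500.4, [100.0, 200.0, 300.0]) considers only the two entries whose precursor mass lies within the window and ranks them by matched fragments.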
Few search engines carefully consider the spectral processing component; most rely instead on fast, assumption-based filtering methods. JAEVALEN is foremost an information retrieval system built around mass spectrometry and proteomics, so the spectral data it uses can originate from anything from simple filtering to complex isotopic determination. It has been demonstrated that the rank-one peptide sequence determined from six fragment values and a single precursor value has a positive predictive value (PPV) of 0.95.
The API for JAEVALEN Search allows single spectra to be interrogated to identify the sequence from a large database consisting of all known consensus sequences and known variants, together with a multitude of post-translational modifications.
The homology engine is geared towards proteomics, as opposed to genomics-oriented tools (e.g. blastp) that use genetics-based substitution matrices and scoring statistics, which typically hide highly redundant sequences. In addition, this engine is significantly faster than current methods. The API for JVLN Homology allows Single Amino Acid Polymorphism (SAP) retrieval for any given amino acid combination, with special consideration for proteotypic sequences.
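One way to picture SAP retrieval as an index lookup rather than a pairwise alignment is a wildcard index, sketched below in Python. This is an assumption made for illustration, not a description of the JVLN Homology implementation: the function names are hypothetical, and the technique shown is a generic single-substitution neighbor lookup.

```python
from collections import defaultdict


def wildcard_keys(peptide):
    """Return one key per position, with that residue replaced by '*'."""
    return [peptide[:i] + "*" + peptide[i + 1:] for i in range(len(peptide))]


def build_index(peptides):
    """Map every wildcard key to the set of peptides that produce it."""
    index = defaultdict(set)
    for peptide in peptides:
        for key in wildcard_keys(peptide):
            index[key].add(peptide)
    return index


def single_substitution_neighbors(index, query):
    """Return indexed peptides that differ from query at exactly one position."""
    found = set()
    for key in wildcard_keys(query):
        found |= index.get(key, set())
    found.discard(query)  # the query itself is not a variant
    return sorted(found)
```

Two peptides share a wildcard key only if they have the same length and agree everywhere except possibly the masked position, so each lookup costs one dictionary access per residue regardless of how many peptides are indexed.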
With a precomputed database of nearly a billion peptides, the traditional target-decoy approach is not feasible for two specific reasons: it is impractical to double the search space, and it is nearly impossible to maintain the natural frequency and arrangement of amino acids when assembling target-like decoy sequences. Furthermore, because the entire search space is precomputed, it can be determined that some peptides can be identified from a few fragments, while others have high isospectral overlap and may never be disambiguated.
Due to the high frequency of homology, a new algorithm has been developed to aggregate peptides into homologous protein groups. In an iterative process, the peptide pool is then divided among the protein groups based on highest observational frequency. Final protein reporting is then accomplished by selecting a single protein to represent each group according to a user-specified source ranking, for instance UniProt, then UniRef, then TrEMBL.
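The grouping and reporting steps above might look roughly like the following Python sketch. It rests on assumptions: "observational frequency" is read here as the number of remaining observed peptides a group can claim, ties are broken by group identifier, and all function names, group identifiers, and accessions are hypothetical. The actual algorithm may differ.

```python
def assign_peptides(peptide_to_groups):
    """Greedily divide the peptide pool among protein groups.

    In each round, the group claiming the most remaining peptides takes
    all of them, and those peptides leave the pool.
    """
    # Peptides that map to no group are ignored.
    pool = {pep: set(groups) for pep, groups in peptide_to_groups.items() if groups}
    assigned = {}
    while pool:
        counts = {}
        for groups in pool.values():
            for group in groups:
                counts[group] = counts.get(group, 0) + 1
        # Highest count wins; ties go to the smallest group identifier.
        best = min(counts, key=lambda g: (-counts[g], g))
        claimed = sorted(pep for pep, groups in pool.items() if best in groups)
        assigned[best] = claimed
        for pep in claimed:
            del pool[pep]
    return assigned


def pick_representative(members, source_ranking):
    """Pick the accession whose source ranks first in source_ranking.

    members is a list of (accession, source) pairs; unknown sources rank last.
    """
    rank = {source: i for i, source in enumerate(source_ranking)}
    return min(members, key=lambda m: (rank.get(m[1], len(rank)), m[0]))[0]
```

With the ranking ["UniProt", "UniRef", "TrEMBL"], a group containing entries from all three sources would be reported under its UniProt member.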