This can be conceptualized as a clustering problem. The general idea behind clustering is that each element in a given cluster should be similar to other elements in the same cluster, but dissimilar to elements from other clusters. In the context of taxonomy and protein content, the clustering of a given species could be considered sound if two criteria are satisfied: first, members of the species are similar to each other (i.e. have a large core proteome); second, they are distinct from other
organisms (i.e. have many proteins found only in that species). To determine whether existing taxonomic classifications fit these criteria, we answered the following two questions. First, is the core proteome of a particular species with N_I sequenced isolates larger than the core proteome of N_I randomly selected organisms from the same genus? Second, is the number of proteins that are found in all N_I isolates of a given species, but in none of the other organisms from the same genus (i.e. unique proteins), larger than the number of proteins found in N_I randomly selected isolates of that genus, but in no others? The rationale behind asking these questions is that one would expect the isolates of a given species to have a larger core proteome and a larger unique proteome than randomly selected sets of isolates from the same genus. Thus, a "yes" answer to each of the above questions would support the species' current taxonomic classification. In contrast, a "no" answer to one or both questions would suggest that the species does not fit the clustering criteria given above, and its taxonomic classification may therefore warrant reexamination.

The following describes only the methodology used to address the first question; however, the methodology used to answer the second question was analogous, and is briefly described in the final paragraph of this section. Once again, let N_I be the number of isolates that have been sequenced for a particular species S. The following methodology was performed for each species from the genera used in this study that had at least two isolates sequenced. First, a set of N_I isolates from the same genus as S was randomly selected. Each random isolate was allowed to be from any species in the same genus as S; the random isolates were not limited to species meeting the "at least two isolates sequenced" requirement. This set was examined to ensure that its members were not all from the same species. For instance, when generating random sets of two organisms each corresponding to the two B. thuringiensis isolates (N_I = 2), a random set containing both B. thuringiensis isolates would have been disallowed, as would a random set containing two B. anthracis isolates. However, a random set containing one B. thuringiensis isolate and one B. anthracis isolate would have been valid.
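The random-set selection described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the isolate labels, the protein family identifiers, and the toy data are all hypothetical. The sketch assumes that isolates are drawn without replacement, that a set made up of a single species is simply discarded and redrawn, and that each proteome has already been reduced to a set of protein family identifiers, so that a core proteome can be approximated by a set intersection.

```python
import random
from typing import Dict, List, Optional, Sequence, Set


def sample_mixed_species_set(
    isolate_species: Dict[str, str],
    n_isolates: int,
    rng: Optional[random.Random] = None,
    max_tries: int = 10_000,
) -> List[str]:
    """Draw n_isolates distinct isolates from one genus, redrawing any set
    whose members all belong to the same species (assumed rejection scheme)."""
    rng = rng or random.Random()
    isolates = sorted(isolate_species)
    if n_isolates < 2 or n_isolates > len(isolates):
        raise ValueError("n_isolates must be between 2 and the number of isolates")
    if len(set(isolate_species.values())) < 2:
        raise ValueError("the genus must contain at least two species")
    for _ in range(max_tries):
        candidate = rng.sample(isolates, n_isolates)  # without replacement
        if len({isolate_species[i] for i in candidate}) > 1:
            return candidate
    raise RuntimeError("no mixed-species set found within max_tries draws")


def core_proteome_size(proteomes: Dict[str, Set[str]], isolates: Sequence[str]) -> int:
    """Count the protein families shared by every listed isolate (assumed
    stand-in for the core proteome; families are precomputed elsewhere)."""
    return len(set.intersection(*(proteomes[i] for i in isolates)))


# Hypothetical toy genus: isolate label -> species name.
toy_species = {
    "Bt_1": "B. thuringiensis",
    "Bt_2": "B. thuringiensis",
    "Ba_1": "B. anthracis",
    "Ba_2": "B. anthracis",
    "Bc_1": "B. cereus",
}

# Hypothetical toy proteomes: isolate label -> set of protein family IDs.
toy_proteomes = {
    "Bt_1": {"f1", "f2", "f3"},
    "Bt_2": {"f1", "f2", "f4"},
    "Ba_1": {"f1", "f5"},
    "Ba_2": {"f1", "f5", "f6"},
    "Bc_1": {"f1", "f2"},
}

# One random set matching the two B. thuringiensis isolates (N_I = 2).
random_set = sample_mixed_species_set(toy_species, 2, random.Random(0))
species_core = core_proteome_size(toy_proteomes, ["Bt_1", "Bt_2"])
random_core = core_proteome_size(toy_proteomes, random_set)
```

In this sketch, `species_core` and `random_core` correspond to the two quantities compared in the first question. How many random sets are drawn per species, and how the comparison is summarized, are not specified here and are left out of the sketch.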