Abstract
High-throughput screening (HTS) data are often noisy and contain both false positives and false negatives. Careful triaging and prioritization of the primary hit list can therefore save time and money by identifying potential false positives before the expense of follow-up is incurred. Cell-based reporter gene assays (RGAs) are of particular concern, because the number of hits may be too high to scrutinize manually for erroneous data. Using statistical models built from the chemical structures of 650 000 compounds tested in RGAs, we created "frequent hitter" models that make it possible to prioritize potential false positives. We then followed up the frequent hitter evaluation with chemical structure-based in silico target predictions to hypothesize a mechanism for the observed "off-target" response. We observed that the predicted cellular targets of the frequent hitters are known to be associated with undesirable effects such as cytotoxicity. More specifically, the most frequently predicted targets relate to apoptosis and cell differentiation and include kinases, topoisomerases, and protein phosphatases. The mechanism-based frequent hitter hypothesis was tested using 160 additional druglike compounds that the model predicted to be nonspecific actives in RGAs. This validation was successful, yielding a 50% hit rate compared with a typical hit rate as low as 2%, and it demonstrates the power of computational models for understanding complex relationships between chemical structure and biological function.