Human protein kinases are known to mediate most signal transduction in cells, and their dysfunction can result in inflammatory diseases and cancers, yet finding effective kinase inhibitors as drugs for these diseases remains a challenge. One major obstacle is the compensatory upregulation of related kinases after a critical kinase is inhibited. To circumvent this compensatory effect, it is desirable to have inhibitors that inhibit all the kinases belonging to the same family, instead of targeting only a few kinases. However, finding inhibitors that target a whole kinase family is laborious and time-consuming in the wet lab.
In this paper, they present a computational approach that takes advantage of interpretable deep learning models to address this challenge. Specifically, they first collected 9,037 inhibitor bioassay results (3,991 active and 5,046 inactive pairs) for eight kinase families (EGFR, Jak, GSK, CLK, PIM, PKD, Akt and PKG) from the ChEMBL25 Database and the Metz Kinase Profiling Data. They generated 238 binary moiety features for each inhibitor and used these features as input to train eight deep neural network (DNN) models, each predicting whether an inhibitor is active for one kinase family. They then employed SHapley Additive exPlanations (SHAP) to analyze the importance of each moiety feature in each classification model, identifying moieties that bind the common kinase hinge sites across the eight kinase families, as well as moieties that are specific to some kinase families. Finally, they validated the identified moieties using experimental crystal structures to reveal their functional importance in kinase inhibition.
Clustering of features of eight kinase family inhibitors by Pearson’s r. (a) Hierarchical clustering using Pearson’s r with the top 15 moiety features of each family, a combined total of 44 moiety features. The color shade of each feature indicates its SHAP importance score for each kinase family. Among the moiety features, f-224 and f-225 are outlined in orange because they have high SHAP scores for most of the eight families. They are also referred to as the common moieties. The 9 family-specific moieties for EGFR are outlined in purple, and the 6 family-specific moieties for Akt are outlined in green. (b) The correlation coefficient between the features’ SHAP importance scores and the odds ratios for four kinase families, one for each kinase group.
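Panel (b) relates two per-feature quantities: a SHAP importance score and an odds ratio. As a minimal sketch of how such a comparison could be computed, the Python below derives the odds ratio of a binary moiety feature from a 2x2 table of presence versus activity, and then Pearson's r between two lists of per-feature scores. The function names, the 0.5 pseudocount, and the use of the raw (not log) odds ratio are assumptions made for illustration; the source does not specify these details.

```python
import math


def odds_ratio(feature_column, labels, pseudocount=0.5):
    """Odds ratio of a binary feature between active (1) and inactive (0) inhibitors.

    The pseudocount added to every cell of the 2x2 table is an assumption
    made here to avoid division by zero; the source does not state one.
    """
    a = b = c = d = 0
    for present, active in zip(feature_column, labels):
        if present and active:
            a += 1  # moiety present, inhibitor active
        elif present and not active:
            b += 1  # moiety present, inhibitor inactive
        elif not present and active:
            c += 1  # moiety absent, inhibitor active
        else:
            d += 1  # moiety absent, inhibitor inactive
    a, b, c, d = (cell + pseudocount for cell in (a, b, c, d))
    return (a * d) / (b * c)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): one moiety column and activity labels for 4 inhibitors.
toy_feature = [1, 1, 0, 0]
toy_labels = [1, 1, 0, 0]
toy_or = odds_ratio(toy_feature, toy_labels)
```

In practice one would compute one odds ratio per moiety feature and one importance score per feature (for example, the mean absolute SHAP value), then pass both lists to `pearson_r`.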
With the SHAP methodology, they identified two common moieties shared across the eight kinase families, 9 EGFR-specific moieties, and 6 Akt-specific moieties, all of which bear functional importance in kinase inhibition. Their result suggests that SHAP has the potential to help find effective pan-kinase family inhibitors.
Protein kinases play important regulatory roles in cellular signal transduction, including apoptosis, cell cycle progression, cytoskeleton rearrangement, differentiation, development, immune response, nervous system function, and transcription. When kinase pathways are dysregulated, a variety of diseases can occur, including diabetes, autoimmune diseases, inflammatory diseases, nervous disorders and cancer. Protein kinases have therefore been one of the most important drug targets in recent years, accounting for a quarter of all current drug development work. To date, 61 small molecule protein kinase inhibitors have been approved by the US FDA [3]. Most of these inhibitors target only a few specific protein kinases. However, previous studies on clinical cancer treatment have pointed out that inhibiting only a single kinase can easily lead to compensatory upregulation of other cancer pathways, which in turn reduces the effectiveness of the treatment. In addition, statistics from protein–protein interaction networks indicate that kinases belonging to the same family are highly co-regulated in related cancer pathways. Hence, inhibition of a whole kinase family can significantly improve therapeutic efficacy. Yet finding inhibitors that target a specific kinase through experimental profiling is time-consuming and laborious in the wet lab, and even more so when the goal is to find inhibitors that target a whole kinase family. An efficient drug screening strategy for identifying pan-kinase family inhibitors would be a great contribution to drug discovery and to the treatment of cancers and inflammatory diseases.
In this paper, they present a data-driven deep learning approach that uses explainable deep neural networks to address this issue. Specifically, they derive a new kinase-inhibitor bioactivity dataset and use deep neural networks (DNNs) to predict whether an inhibitor is active for each of the eight kinase families under consideration. They note that deep learning has been employed in cheminformatics and medicinal chemistry to predict the efficacy of new active small molecule inhibitors. Research has also been done using random forests and DNNs for inhibitor prediction for single kinases, and convolutional neural networks (CNNs) for protein–ligand binding affinity prediction. For example, Rodríguez-Pérez and Bajorath showed that a DNN with three layers performs slightly better than the other machine learning methods they tested, including support vector machines and random forests, for ligand-based prediction of the activity of 19,030 ligand compounds against 103 kinases. To the best of their knowledge, however, using DNNs to predict inhibitors for whole kinase families has not been attempted before.
Average percentage of occurrence of different groups of moieties in the ligands of cocrystal structures of kinases belonging to a given kinase family in the PDB, and in the kinase family inhibitors set. Subfigures a and b show the average percentage of occurrence of the two common moieties, the top-15 moieties per kinase family, and the bottom-29 moieties (denoted as Remainder) for the 4 kinase families from different groups, in the PDB and kinase family inhibitors sets, respectively. Subfigures c and d highlight the average percentage of occurrence of the family-specific moieties for the EGFR and Akt kinase families, in the PDB and kinase family inhibitors sets, respectively.
While deep learning can lead to accurate classifiers, deep learning models are often regarded as “black boxes” because of their high complexity, which makes their results difficult to interpret. This limits the practical applicability of deep learning in drug discovery research. For example, Vamathevan et al. noted that a typical problem with deep neural networks is model interpretation and the extraction of biological insights. To make the results interpretable, the use of explainable DNNs for the prediction of kinase inhibitors has recently been explored in two earlier studies, both using the SHapley Additive exPlanations (SHAP) method, a game-theoretic approach that represents the state of the art for explaining the output of machine learning models. Their work extends this existing work by building explainable DNNs for whole kinase families. They show that, with SHAP, they can quantify the contribution of each moiety feature of the inhibitors to the classification tasks, and in turn identify moieties that are more often used in designing inhibitors of the same kinase family. These moieties are called preference moieties in this paper.
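To make the game-theoretic idea behind SHAP concrete, the following minimal sketch computes exact Shapley values by brute-force enumeration of feature subsets for a tiny model over three binary features. This is not the authors' procedure: enumeration is only feasible for a handful of features, and with 238 features an approximation such as the one in the paper's title (deep SHAP) is needed. The scoring function `toy_model`, the all-zero baseline, and all names here are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of `x` relative to `baseline`.

    Features inside a coalition take their value from `x`; features outside
    it are replaced by the baseline value (an illustrative choice of how to
    "remove" a feature).
    """
    n = len(x)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(features):
    """Hypothetical activity score over three binary moiety features."""
    f0, f1, f2 = features
    return 0.5 * f0 + 0.2 * f1 + 0.3 * f0 * f2


x = [1, 1, 1]          # inhibitor containing all three moieties
baseline = [0, 0, 0]   # reference inhibitor containing none of them
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the model output for `x` and for the baseline, and the interaction term between the first and third feature is split equally between them. Ranking features by the magnitude of such values is the kind of per-feature importance that the paper uses to identify preference moieties.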
The major task of this paper is therefore the construction of an interpretable DNN classification model for kinase family inhibitors. This involves (i) building a novel kinase family inhibitor bioactivity dataset for the DNN model, (ii) identifying 34 moieties and 204 Checkmol fingerprints as features for the inhibitors, which together make up the 238 binary features, (iii) creating eight DNN models, one for each of the eight kinase families, and (iv) inferring the preference moieties of inhibitors for each kinase family and the common moieties of inhibitors for all eight kinase families using SHAP. They demonstrate that their approach can provide an efficient strategy for identifying and designing selective inhibitors targeting pan-kinase families.
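Steps (ii) and (iii) can be pictured as a simple data layout: each inhibitor becomes a 238-element binary vector (34 moieties plus 204 Checkmol fingerprints), and the labeled vectors are grouped into one dataset per kinase family, ready for one classifier each. The sketch below illustrates that layout only. The ordering of the two feature blocks, the index-based input format, and the function names are assumptions; the source does not describe how the features are stored.

```python
N_MOIETIES = 34
N_CHECKMOL = 204
N_FEATURES = N_MOIETIES + N_CHECKMOL  # 238 binary features per inhibitor

KINASE_FAMILIES = ["EGFR", "Jak", "GSK", "CLK", "PIM", "PKD", "Akt", "PKG"]


def build_feature_vector(checkmol_hits, moiety_hits):
    """Return a 238-element 0/1 list for one inhibitor.

    checkmol_hits: indices in [0, 204) of Checkmol fingerprints that match.
    moiety_hits:   indices in [0, 34) of moieties that are present.
    Placing the Checkmol block first is an arbitrary choice for illustration.
    """
    vector = [0] * N_FEATURES
    for j in checkmol_hits:
        if not 0 <= j < N_CHECKMOL:
            raise ValueError(f"Checkmol index out of range: {j}")
        vector[j] = 1
    for i in moiety_hits:
        if not 0 <= i < N_MOIETIES:
            raise ValueError(f"moiety index out of range: {i}")
        vector[N_CHECKMOL + i] = 1
    return vector


def group_by_family(records):
    """Split (family, feature_vector, is_active) records into per-family datasets.

    Returns {family: (X, y)} where X is a list of feature vectors and y the
    matching list of 0/1 activity labels, one dataset per classifier.
    """
    datasets = {family: ([], []) for family in KINASE_FAMILIES}
    for family, vector, is_active in records:
        features, labels = datasets[family]
        features.append(vector)
        labels.append(int(is_active))
    return datasets


# Hypothetical records: two EGFR inhibitor pairs and one Akt inhibitor pair.
records = [
    ("EGFR", build_feature_vector({3, 10}, {0}), True),
    ("EGFR", build_feature_vector({7}, set()), False),
    ("Akt", build_feature_vector(set(), {5, 33}), True),
]
datasets = group_by_family(records)
```

Each `(X, y)` pair would then be used to train and explain one binary classifier, giving eight models in total.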
Fan, YW., Liu, WH., Chen, YT. et al. Exploring kinase family inhibitors and their moiety preferences using deep SHapley additive exPlanations. BMC Bioinformatics 23, 242 (2022). https://doi.org/10.1186/s12859-022-04760-5