Redefining Precision: Introducing Receptor.AI’s 5-Level Compound Selectivity Platform

A structured approach to differentiating highly similar protein variants in drug discovery

The ability to discriminate between several highly similar protein targets is crucial for modern precision and individualised medicine. Modern drugs are expected to be laser-focused on disease-related protein variants that are specific to a particular tumour, tissue, cell type or patient genotype, without causing adverse off-target effects.

To achieve ultra-selectivity between similar protein variants, we created a unique set of technologies with five levels of selectivity prediction for candidate compounds. Prediction accuracy increases with each level, so the levels are applied consecutively in the pipeline, with the number of candidate compounds decreasing at each level.

This stack of technologies is incorporated into the Receptor.AI drug discovery platform and is available for any protein for which target and off-target variants can be defined.

Our platform is able to design ultra-selective small molecules for active and allosteric sites alike and operates even for the most challenging targets which lack known ligands or have poorly resolved structures.

To test our platform, we selected subsets of highly similar proteins from two of the most popular families of drug targets: Janus tyrosine kinases (JAKs) and the kinases of fibroblast growth factor receptors (FGFRs). These proteins are involved in a multitude of diseases, from cancer to cardiovascular diseases, inflammation and metabolic disorders. A large number of known ligands with reliable activity data is available for these proteins, which allows us to perform comprehensive and unbiased benchmarking of our technologies.

Five levels of selectivity prediction in a nutshell

Level 1: proteome-wide ranking

After the initial virtual screening, the drug-target interaction AI model is applied to about 100K selected compounds to obtain their interaction scores with all ~9.3K proteins present in our platform. The selectivity rank is then computed as the number of proteins that are more likely to interact with a given compound than the target protein.
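The rank definition above can be illustrated with a minimal sketch. It assumes the interaction scores are available as a compounds-by-proteins matrix in which a higher score means a more likely interaction; the function name `selectivity_rank` and the toy numbers are hypothetical and are not taken from the platform.

```python
import numpy as np

def selectivity_rank(scores: np.ndarray, target_idx: int) -> np.ndarray:
    """Count, for each compound, the proteins that score higher than the target.

    scores: array of shape (n_compounds, n_proteins); higher = more likely to interact.
    A rank of 0 means no protein is predicted to bind better than the target.
    """
    target_scores = scores[:, target_idx][:, None]  # column vector for broadcasting
    return (scores > target_scores).sum(axis=1)

# Toy scores for 3 compounds against 4 proteins (made-up numbers).
toy_scores = np.array([
    [0.9, 0.2, 0.1, 0.3],
    [0.4, 0.8, 0.5, 0.1],
    [0.5, 0.5, 0.7, 0.6],
])
ranks = selectivity_rank(toy_scores, target_idx=0)
print(ranks)
```

Ties are not counted against the target in this sketch, since the text speaks of proteins that are strictly "more likely" to interact.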

Level 2: explicit screening against off-targets

Independent virtual screenings are performed against all explicitly defined off-target protein variants. All compounds that are selective for any of the off-targets are discarded.

Level 3: AI-based prioritisation of protein sequence differences

A dedicated AI model based on the sequence super-label architecture leverages and prioritises the differences between highly similar proteins at the level of protein sequences.

Level 4: AI-based prioritisation of protein structure differences on an atomic level

A set of dedicated AI models prioritises differences between highly similar proteins on the atomic level:

Structure super-label DTI architecture
Voxel-based selectivity model
Enhanced AI-assisted docking
Morphing transformer model

Level 5: Selectivity against lipid membranes (for cancer-related membrane targets)

Additional selectivity is achieved by designing candidate compounds as membranotropic drugs, which can accumulate selectively in cancer cell membranes and be taken up preferentially by them. A unique AI model of drug-membrane interaction is used to search for such compounds.

A proof of principle for this kind of targeting was recently published.

Benchmarks

Target characterisation

The selectivity of compounds was determined separately within the JAK family (JAK1, JAK2, JAK3, TYK2) and within the FGFR family (FGFR1, FGFR2, FGFR3, FGFR4).

The overall sequence identity between family members is relatively low for JAKs (40–60%) and much higher for FGFRs (75–90%) (Fig. 1).

**Figure 1.** Sequence identity matrices of JAK (top) and FGFR (bottom) proteins used in the study.

Despite these differences, both families can be characterised as highly similar proteins when their functional binding pockets are compared.

The active site of JAKs contains 15 functionally important residues, but only 3 of them are variable, while the rest are either identical or highly conserved. These residues are shown in Figs. 2 and 3.

**Figure 2.** Sequence alignment of JAKs used in this study.

**Figure 3.** Structural alignment of JAKs used in this study. Sidechains of variable residues are shown.

In the case of FGFRs, the predicted binding pocket is rather large and contains about 50 residues altogether, but only 2 of them are variable, while the rest are either identical or highly conserved (Figs. 4 and 5).

**Figure 4.** Sequence alignment of FGFRs used in this study.

**Figure 5.** Structural alignment of FGFRs used in this study. Sidechains of variable residues are shown.

Our selectivity prediction technique automatically emphasises the differences in these few variable residues, based on the sequence and structural similarity between target and off-target proteins.
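To make the notion of "variable residues" concrete, the sketch below finds the alignment columns where pocket residues differ between family members. It is only an illustration of the idea, not the platform's method: it assumes pre-aligned, equal-length pocket sequences, treats any non-identical column as variable (it does not model "highly conserved" substitutions), and uses invented toy sequences rather than real JAK or FGFR pockets.

```python
def variable_positions(aligned: dict[str, str]) -> list[int]:
    """Return 0-based alignment columns where the residues are not all identical."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]

# Hypothetical aligned pocket residues for three similar proteins.
toy_pocket = {
    "protein_A": "LGKGVAEM",
    "protein_B": "LGKGVSEM",
    "protein_C": "LGRGVAEM",
}
variable = variable_positions(toy_pocket)
print(variable)
```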

Compound dataset used in benchmark

A total of 6830 compounds with known activity against the JAK family and 4016 against FGFRs were taken from the ChEMBL database. All of these compounds have reported activity against at least two kinases from the corresponding family.

The number of selective/non-selective compounds for each kinase according to this criterion in available experimental data is shown in Table 1.

**Table 1.** The number of selective/non-selective compounds for each pair of main target and off-target.

The dataset is clearly skewed. The largest number of compounds is found for JAK1, which is the most commonly used primary drug target in its family. This creates an inevitable bias towards JAK1 in the AI model. As shown in the results section below, the best results are indeed obtained for JAK1 because more compounds are available for training, while the results for JAK2, JAK3 and TYK2 are, as expected, worse because of insufficient training data.

For FGFRs, the number of selective compounds is much smaller in general. The best coverage is observed for FGFR4, followed by FGFR1. FGFR2 and FGFR3 have very few selective compounds, which are barely enough for reliable benchmarking. However, even on such low-quality datasets, our models provide good performance and consistent results, as is evident from the results below.

In order to establish a reliable measure of compound selectivity, we plotted the ratio of the number of selective to non-selective compounds as a function of their experimental activity ratio (Figure 6).

**Figure 6.** Ratio of selective to non-selective compounds versus activity ratio: JAKs (left) and FGFRs (right).

For FGFRs, the number of selective compounds becomes almost constant, starting from an activity ratio of 5. For JAKs, the general picture is similar except for JAK1, which shows a more gradual decrease. In general, an activity ratio of ~5 is a good threshold for establishing the compound’s selectivity for both kinase families.
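The threshold scan behind this kind of plot can be sketched as follows. This is a hedged illustration only: it assumes each compound has a single activity ratio for a given target/off-target pair (larger meaning more selective for the target), and both the function name and the toy ratios are invented.

```python
def selective_ratio_by_threshold(activity_ratios, thresholds):
    """For each threshold, count selective vs. non-selective compounds and their ratio."""
    rows = []
    for t in thresholds:
        n_sel = sum(1 for r in activity_ratios if r >= t)
        n_non = len(activity_ratios) - n_sel
        ratio = n_sel / n_non if n_non else float("inf")
        rows.append((t, n_sel, n_non, ratio))
    return rows

# Made-up activity ratios for eight compounds.
toy_ratios = [0.5, 1.2, 2.0, 4.0, 6.0, 8.0, 30.0, 120.0]
scan = selective_ratio_by_threshold(toy_ratios, thresholds=[1, 2, 5, 10])
for threshold, n_sel, n_non, ratio in scan:
    print(f"threshold {threshold:>3}: {n_sel} selective, {n_non} non-selective, ratio {ratio:.2f}")
```

Plotting the last column against the threshold gives a curve of the type described for Figure 6; the threshold is then chosen where the curve flattens.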

Level 1: Drug-target interaction proteome-wide rank

Benchmarking procedure

A series of pairwise comparisons was performed in which each kinase in a family was set as the target and all the others as off-targets. A compound that is at least 5 times more active on the target kinase than on the off-target kinase was considered selective for the target.
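A minimal sketch of this labelling rule is shown below. The source does not state which activity measure was used, so the sketch assumes IC50 values in nM, where a lower value means higher activity and the fold difference is therefore the off-target IC50 divided by the target IC50. The function names and the example values are hypothetical.

```python
SELECTIVITY_FOLD = 5.0  # threshold taken from the text: at least 5 times more active

def is_selective(ic50_target_nm: float, ic50_off_target_nm: float,
                 fold: float = SELECTIVITY_FOLD) -> bool:
    """True if the compound is at least `fold` times more potent on the target."""
    if ic50_target_nm <= 0 or ic50_off_target_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_off_target_nm / ic50_target_nm >= fold

def pairwise_labels(ic50_nm: dict[str, float]) -> dict[tuple[str, str], bool]:
    """Label every ordered (target, off-target) pair for one compound."""
    return {
        (target, off): is_selective(ic50_nm[target], ic50_nm[off])
        for target in ic50_nm
        for off in ic50_nm
        if target != off
    }

# Made-up IC50 values (nM) for a single compound.
toy_compound = {"JAK1": 10.0, "JAK2": 80.0, "JAK3": 30.0}
labels = pairwise_labels(toy_compound)
print(labels[("JAK1", "JAK2")], labels[("JAK1", "JAK3")])
```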

The consensus scores of all compounds were computed using the Receptor.AI SaaS platform, setting each kinase in turn as the target and the other kinase of the pair as the off-target. The pairwise differences between consensus scores were then evaluated. If the difference is greater than the established cutoff, the compound is predicted to be selective; otherwise, it is predicted to be non-selective. This is a classical binary classification task, so it is evaluated with the standard metrics for such problems. The main metrics are the Matthews correlation coefficient (MCC) and the F1 score, but a number of secondary statistical metrics were also computed. The metrics were computed for each pair separately. The metrics for all pairs were then averaged, and cumulative plots were built for each family.
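The evaluation described above can be sketched in a few lines. The code uses the standard definitions of F1 and MCC; the consensus scores, the cutoff value and the function names are made up for illustration and do not come from the platform.

```python
import math

def predict_selective(score_target: float, score_off_target: float, cutoff: float) -> bool:
    """Predict 'selective' when the consensus score difference exceeds the cutoff."""
    return (score_target - score_off_target) > cutoff

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn

def f1_score(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Made-up (target score, off-target score) pairs and experimental labels.
toy_scores = [(0.9, 0.2), (0.8, 0.5), (0.6, 0.55), (0.7, 0.3),
              (0.4, 0.5), (0.5, 0.45), (0.3, 0.6), (0.2, 0.2)]
y_true = [True, True, True, False, False, False, False, False]
y_pred = [predict_selective(t, o, cutoff=0.2) for t, o in toy_scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_score(tp, fp, fn):.3f}, MCC = {mcc(tp, tn, fp, fn):.3f}")
```

In a benchmark of the kind described here, these metrics would be computed per target/off-target pair and then averaged over the pairs of a family.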

The main performance metrics of selectivity prediction are shown in Table 2 (for JAKs) and Table 3 (for FGFRs). The Receiver Operating Characteristic (ROC) curves for the selectivity prediction (averaged over all kinases in each family) are shown in Figure 7.

Results

Main performance metrics

**Table 2.** Main performance metrics of selectivity prediction for JAKs.

For JAKs, the most robust results are obtained for JAK1 because the largest number of data points is available for this protein. For the other JAKs, our model was worse at rejecting false positives (lower precision) but recovered a larger fraction of the selective compounds (higher recall). It is remarkable that for TYK2, the model recognises almost all known selective ligands, with an outstanding recall of 0.99.

**Table 3.** Main performance metrics of selectivity prediction for FGFRs.

A similar trend is observed for FGFR2. It has the smallest number of selective compounds, and almost all of them are predicted as such (recall 0.9). However, the model also predicts a significant number of false positives (precision 0.13). For FGFR3, a comparable number of selective compounds is available, but the recall is much worse (0.29), while the precision is much better (0.4). This indicates that although the model behaves differently on targets with such a small number of data points, it still manages to balance the performance metrics in a consistent and predictable way.

Receiver Operating Characteristic and Enrichment Plots

**Figure 7.** The **Receiver Operating Characteristic curve** for the selectivity prediction (averaged over all kinases): yellow curve for JAKs, blue curve for FGFRs.

On average, our model distinguishes true positives from false positives well. The best effectiveness is achieved for the top 20% of samples in the case of JAKs and for the top 10% in the case of FGFRs. The overall AUC is 0.76 in both cases, which is a very good value given the data quality.
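For readers unfamiliar with the AUC, the sketch below computes it directly from its probabilistic meaning: the chance that a randomly chosen selective compound is scored higher than a randomly chosen non-selective one, with ties counted as one half. The labels and scores are toy values, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = selective) and model scores.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.8, 0.4, 0.3, 0.2]
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.76 in context.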

Comments on performance metrics consistency

The data for FGFRs is, in general, less abundant. FGFR2 and FGFR3 have only 22 and 35 selective compounds, respectively, which is at the lower limit of the dataset size that can be used for reliable benchmarking of this type of model. As is evident from the results, this lack of data leads to some fluctuations in the performance metrics, which is expected behaviour.

For FGFRs, where very few selective ligands are present, each false negative (a selective ligand that was not found by our model) has more influence on the major metrics (F1 and MCC) than each false positive (a non-selective ligand incorrectly recognised as selective). That is why secondary metrics, such as recall and specificity, should also be used to evaluate the performance.

Specificity is the fraction of non-selective compounds that are correctly predicted as non-selective, while recall is the fraction of selective compounds that are correctly predicted as selective. Specificity values are very high for all proteins (around 0.9). This is very important for virtual screening tasks because it means that the model correctly filters out the vast majority of non-selective compounds and does not pollute the results with false positives. This makes subsequent experimental validation more effective and less costly.

The average recall values range from 0.68 for JAKs to 0.58 for FGFRs. Both values are very good, taking into account the limited size of the dataset and the small number of positive samples.

Accuracy is the ratio of correctly classified compounds, both selective and non-selective, to the total number of compounds. Accuracy values are also very good for all tested protein pairs, which confirms the general robustness of our approach.
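The secondary metrics discussed in this section follow directly from the confusion matrix. The sketch below uses the standard definitions; the counts are invented to mimic a strongly imbalanced pair with few selective compounds and are not taken from the benchmark tables.

```python
def secondary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics; assumes every denominator is non-zero."""
    return {
        "recall": tp / (tp + fn),          # selective compounds found
        "specificity": tn / (tn + fp),     # non-selective compounds rejected
        "precision": tp / (tp + fp),       # predicted-selective that are truly selective
        "npv": tn / (tn + fn),             # predicted-non-selective that are truly non-selective
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Made-up counts: 22 selective and 1000 non-selective compounds.
toy_metrics = secondary_metrics(tp=20, tn=900, fp=100, fn=2)
for name, value in toy_metrics.items():
    print(f"{name:>11}: {value:.3f}")
```

With these toy counts, specificity and accuracy are both about 0.9 while precision is below 0.2, which shows why several metrics need to be read together when positives are rare.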

Conclusions

The overall performance is good, with a good balance between false positives and false negatives for both kinase families. The kinases in both the JAK and FGFR families are structurally very similar, so these results confirm that our technique can discriminate between highly similar proteins effectively. The average metrics for both kinase families are shown in Table 4.

**Table 4.** Average performance metrics of selectivity prediction for JAKs and FGFRs.

On average, the system predicts truly selective compounds quite well (recall 0.63) and discriminates non-selective compounds very reliably (NPV 0.9, accuracy 0.855). That is, if a compound is non-selective, it is reliably identified and removed from further evaluation, while 63% of the selective compounds are identified correctly. Taking the accuracy error into account, 47% of the compounds that would be passed to experimental validation in a hypothetical experiment are selective, which is an excellent ratio.

It is necessary to emphasise that the selectivity benchmark presented here is inevitably limited by the quality of the test dataset of compounds with known selectivity against different kinases. In particular, there is a pronounced bias in the number of available compounds towards JAK1 in comparison to the other kinases, which is caused by the popularity of this target protein. It is also evident that experimental activity estimates often differ between pairs of proteins, which makes direct comparisons of the pairs less reliable. Taking these issues into account, the selectivity prediction shown by our technique can be considered very good.

Level 2: Explicit virtual screening against multiple off-targets

Level 2 of selectivity prediction is a full screening against explicitly defined off-targets, followed by the exclusion of all compounds that are selective for any off-target according to the difference between the corresponding consensus scores.
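A minimal sketch of such a filter is given below. It assumes that each compound already has a consensus score for the target and for every off-target, and that a compound counts as selective for an off-target when that off-target's score exceeds the target's score by more than a cutoff. The cutoff value, the protein labels and the compound names are all hypothetical.

```python
def passes_off_target_filter(scores: dict[str, float], target: str,
                             off_targets: list[str], cutoff: float) -> bool:
    """Keep a compound only if no off-target outscores the target by more than `cutoff`."""
    return all(scores[off] - scores[target] <= cutoff for off in off_targets)

# Made-up consensus scores for three compounds against one target and two off-targets.
toy_library = {
    "cpd_1": {"T": 0.90, "O1": 0.30, "O2": 0.40},
    "cpd_2": {"T": 0.50, "O1": 0.90, "O2": 0.20},
    "cpd_3": {"T": 0.60, "O1": 0.65, "O2": 0.50},
}
kept = [
    name for name, scores in toy_library.items()
    if passes_off_target_filter(scores, target="T", off_targets=["O1", "O2"], cutoff=0.2)
]
print(kept)
```

Here `cpd_2` is discarded because it scores much higher on `O1` than on the target, while `cpd_3` is kept because its small preference for `O1` stays below the cutoff.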