Schematic diagram of the prediction process (IMAGE)
Caption
Schematic diagram of the prediction process: Protein databases provide a dataset of 8,500 experimentally validated transporter-substrate pairs for training the model (top). Transport proteins consist of a sequence of amino acids, which a deep learning model converts into numerical vectors (centre left, in different shades of green). Information about potential substrates is likewise converted into numerical vectors (centre right, in different shades of yellow). These vectors are used to train a gradient boosting model (an ensemble of multiple decision trees) that predicts whether a molecule is a substrate for a specific transport protein (bottom). (Fig.: HHU/Alexander Kroll)
Credit
HHU/Alexander Kroll
Usage Restrictions
No restrictions.
License
Public Domain
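
The pipeline described in the caption (protein vector plus substrate vector in, substrate prediction out) can be illustrated with a toy sketch. This is not the researchers' implementation: the vectors below are random numbers rather than learned embeddings, the labelling rule is made up so that the example has something to learn, and the booster is a minimal hand-written ensemble of one-split decision trees with squared-error loss. All names (fit_stump, fit_boosting, predict_is_substrate, PROTEIN_DIM, SUBSTRATE_DIM) are hypothetical.

```python
import random

random.seed(0)  # reproducible toy data

PROTEIN_DIM, SUBSTRATE_DIM = 3, 2  # assumed sizes; real embeddings are far larger

def fit_stump(X, residuals):
    """Fit a one-split decision tree (stump) to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]  # (feature index, threshold, left value, right value)

def fit_boosting(X, y, n_trees=20, lr=0.3):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        preds = [p + lr * (lv if row[j] <= t else rv) for row, p in zip(X, preds)]
    return base, lr, stumps

def predict_is_substrate(model, protein_vec, substrate_vec):
    """Concatenate both vectors and threshold the ensemble score at 0.5."""
    base, lr, stumps = model
    x = list(protein_vec) + list(substrate_vec)
    score = base + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)
    return score > 0.5

# Toy stand-in for validated transporter-substrate pairs (random, not real data).
pairs, X, y = [], [], []
for _ in range(40):
    protein_vec = [random.random() for _ in range(PROTEIN_DIM)]
    substrate_vec = [random.random() for _ in range(SUBSTRATE_DIM)]
    pairs.append((protein_vec, substrate_vec))
    X.append(protein_vec + substrate_vec)
    y.append(1 if substrate_vec[0] > 0.5 else 0)  # made-up labelling rule

model = fit_boosting(X, y)
hits = sum(predict_is_substrate(model, p, s) == bool(label)
           for (p, s), label in zip(pairs, y))
accuracy = hits / len(y)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

In practice one would likely use an established gradient boosting library and held-out test pairs instead of training accuracy; the sketch only shows how the two kinds of vectors are combined into a single input for an ensemble of trees.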