SpotRM+ – potential reactive metabolite formations – batch analysis in Knime

Modelling and predicting the toxicity of drug compounds has been, is, and will remain an area of continuous interest. I won’t go into the detailed literature here; instead, I want to focus on SpotRM+’s contribution to that field:
This methodology focuses on reactive metabolite formation and avoidance as a means to reduce structure-based toxicity issues. It is also a computationally cheap method, since it is based solely on SMARTS, carefully hand-curated ones at that. Besides identifying certain structural features, SpotRM+ delivers one- to three-page monographs on the marketed (or withdrawn) reference compounds, including mechanistic summaries. So it is more about learning than pure black-box filtering.

SpotRM+ requires Bioclipse, a platform focused on chemical data mining. This package has one disadvantage: you can only run and analyze one compound at a time; batch mode isn’t possible.
According to the company, Awametox AB, batch-mode analysis is a feature requested by a number of customers, e.g. for design/synthesis prioritization. And yes, it is possible, IF you use script-based or workflow-based tools, one of the simpler ones being Knime. For this, you need access to the SpotRM+ database itself and the standard chemistry mining nodes in Knime.
[Note that SpotRM+ is a commercial package, though a free demo is available; both are based on Bioclipse. For the mining suggested here you need the database itself, which can be purchased separately.]

One drawback of the database and the SpotRM+ system with regard to batch analysis is that they aren’t really designed for it. The readout usually consists of a traffic-light colouring of reference compounds and links to their analysis monographs. Thus, for batch mode to work, you need to ask what you want from it, e.g.:

  • Is a single “red” or “green” reference hit sufficient?
  • Do you want to summarize all the reference hits?
  • Combine with other data for further calculations?
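
To make the first two questions concrete, here is a minimal Python sketch (not part of SpotRM+ or of the Knime workflow) that collapses per-reference hits into one summary per compound. The input layout, the compound and reference names, the "yellow" level and the severity ordering are all assumptions for illustration; a real export from the database may look quite different.

```python
from collections import defaultdict

# Assumed severity order for illustration: green is best, red is worst.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def summarize_hits(hits):
    """Collapse (compound, reference, colour) hits into one entry per compound."""
    per_compound = defaultdict(list)
    for compound, reference, colour in hits:
        per_compound[compound].append((reference, colour))

    summary = {}
    for compound, refs in per_compound.items():
        # The most severe colour among all reference hits of this compound.
        worst = max((colour for _, colour in refs), key=SEVERITY.__getitem__)
        summary[compound] = {
            "worst_colour": worst,
            "n_hits": len(refs),
            "references": sorted(ref for ref, _ in refs),
        }
    return summary


# Hypothetical example data.
hits = [
    ("cmpd-1", "reference A", "green"),
    ("cmpd-1", "reference B", "red"),
    ("cmpd-2", "reference C", "yellow"),
]
summary = summarize_hits(hits)
```

Whether the worst colour alone is enough, or whether the full list of references should be carried along, is exactly the decision raised in the questions above.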

In principle, anything goes; that’s the beauty of the flexibility of a package such as Knime. But would that be sufficient for you to make a decision? I can imagine that a batch-based “high quality” decision should be possible if you combine the output with, e.g., a model based on measured ADMET data (and/or reactive metabolite data).
Independent of the latter, a basic workflow could look simply like this:

You can find more info on, and access to, the mentioned programs here:
SpotRM+: (Bioclipse included; recently updated to V1.2!)
Bioclipse: (mainly for info, no separate download required)

Knime External Tool – OCR of structures

Knime is a fantastic tool for automating data handling without the need for programming. However, the more complicated the data becomes, the better it is to have, or to acquire, some knowledge of Java, XML, SQL, etc.

Some limitations do exist with Knime, again depending on your data and the end-result you require. In certain situations the need to use an external program might arise.

Now, there is an “External Tool” node in Knime, though officially it is designed for use in a Unix-type environment. But with some tweaking it does work under Windows!

The example I will give here will include chemical structure recognition stemming from pictures, i.e. OCR.

As is often the case with Knime (or programming in general), there are always multiple ways to solve a problem. This is simply one solution.

What this workflow does:

It reads a directory of PNG (preferred) picture files of structures, or even reactions, and converts them to SMILES files. The workflow creates a DOS batch file with the OSRA commands, which is then executed by the External Tool node.

To obtain best quality and to know about the recognition limitations, you should read about the OSRA tool via the links below.

What you will need:

  • OSRA – Optical Structure Recognition Application. I use V2.0.0. This app is free if you can compile it yourself; otherwise you can purchase the compiled version for a modest fee (and support the programmers). (Latest is V2.1.0) (though I think this page isn’t updated anymore?)
  • Knime, preferably latest Version >= 3.3.x (though it should run with any V3; originally it was developed in V2.x, so it should run there as well, though I can’t test it anymore at this point).
  • Nodes in Knime: Standard installation, including:
    NGS tools [I like using their “Wait” node]
    Erlwood Nodes [used for chemistry part]

For this workflow to work properly, you will need the following files in the following places. (This has to do with the fact that some nodes, such as the External Tool node, can’t be opened or executed for testing if certain entered data isn’t available.) Thus it’s easier if you copy the enclosed files or, alternatively, create them (empty Notepad files are sufficient) as described:

  • OSRA in following location:
    C:\osra\V2.0.0\osra-bin.exe [I don’t make use of %PATH% or the batch file included in the OSRA distribution]
  • Additional files/folders:
    C:\osra\donotdelete\extToollGreen.txt [can be empty; used for giving the “clear” sign when the node is done]
    C:\osra\donotdelete\ignore_me.txt [empty]
    C:\osra\donotdelete\ignore_me1.txt [empty]
    C:\osra\donotdelete\ignore_me2.txt [can be empty; echoes the cmd line output, which can be ignored or potentially parsed; I don’t]
  • Input/Output location:
    C:\test\ [currently; may be anywhere else, it includes your images and the resulting structures]
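
If you prefer not to create these placeholders by hand, a small Python sketch along the following lines could set them up. It is not part of the workflow; the function name and its parameters are my own, and only the file names come from the list above.

```python
from pathlib import Path

# Placeholder file names as listed above; their content can stay empty.
PLACEHOLDERS = [
    "extToollGreen.txt",
    "ignore_me.txt",
    "ignore_me1.txt",
    "ignore_me2.txt",
]


def create_placeholders(osra_root, work_dir):
    """Create the helper folder, the empty placeholder files and the I/O folder."""
    helper_dir = Path(osra_root) / "donotdelete"
    helper_dir.mkdir(parents=True, exist_ok=True)
    Path(work_dir).mkdir(parents=True, exist_ok=True)

    created = []
    for name in PLACEHOLDERS:
        path = helper_dir / name
        path.touch(exist_ok=True)  # leaves an existing file untouched
        created.append(path)
    return created
```

On the set-up described above, the call would be `create_placeholders(r"C:\osra", r"C:\test")`. The OSRA executable itself still has to be copied to its location manually.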

The first metanode reads the name of the picture files and creates an executable batch file called by the external tool node. [open the picture in a separate window to view full size]
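
For readers who want to see the idea outside of Knime, here is a rough Python sketch of what such a metanode produces: one OSRA call per picture, written to a batch file. This is not the metanode itself. It assumes that OSRA prints the recognized SMILES to standard output, so that the output can be redirected into a file, and the `.smi` extension and both function names are my own choices.

```python
from pathlib import Path, PureWindowsPath

OSRA_EXE = r"C:\osra\V2.0.0\osra-bin.exe"


def build_batch_lines(image_names, work_dir=r"C:\test", osra_exe=OSRA_EXE):
    """Return the lines of a batch file with one OSRA call per image."""
    lines = ["@echo off"]
    for name in image_names:
        image = PureWindowsPath(work_dir) / name
        smiles = image.with_suffix(".smi")  # assumed output naming
        # Assumption: OSRA writes its result to stdout, so we redirect it.
        lines.append(f'"{osra_exe}" "{image}" > "{smiles}"')
    return lines


def write_batch_file(image_dir, batch_path, work_dir=r"C:\test"):
    """Scan image_dir for PNG files and write the matching batch file."""
    names = sorted(p.name for p in Path(image_dir).glob("*.png"))
    lines = build_batch_lines(names, work_dir)
    # newline="" keeps the explicit CRLF line endings that cmd.exe expects.
    with open(batch_path, "w", newline="") as handle:
        handle.write("\r\n".join(lines) + "\r\n")
    return names
```

Quoting every path keeps the commands valid when file names contain spaces.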

The tricky part is the External Tool node: should you do a full reset without having all the necessary files in place, you won’t be able to open it and compare the set-up.

And here are the flow variables:

The remaining portions are less tricky, and it is a matter of taste what you want to do with the obtained files. In my second metanode, I make a list of all the SMILES files (OSRA creates one output file per input file) and combine it with the original input file (or rather, its filename). [open the picture in a separate window to view full size]
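
Outside of Knime, the same pairing of input pictures and output files could be sketched in Python as follows. Again, this is only an illustration under assumptions: it presumes that each output file shares the base name of its picture and carries a `.smi` extension, and the function name is hypothetical.

```python
from pathlib import Path


def collect_results(work_dir, image_suffix=".png", smiles_suffix=".smi"):
    """Pair every image in work_dir with the SMILES found in its output file."""
    rows = []
    for image in sorted(Path(work_dir).glob(f"*{image_suffix}")):
        smiles_file = image.with_suffix(smiles_suffix)
        if smiles_file.exists():
            # One structure per non-empty line; a reaction picture may yield several.
            text = smiles_file.read_text()
            smiles = [line.strip() for line in text.splitlines() if line.strip()]
        else:
            smiles = []  # nothing was written for this picture
        rows.append(
            {"image": image.name, "smiles_file": smiles_file.name, "smiles": smiles}
        )
    return rows
```

Pictures without a result are kept with an empty list, which makes failed recognitions easy to spot.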

Finally, in the third node, the structure is drawn so that it can be compared visually with the input picture. [open the picture in a separate window to view full size]

After that, it is up to you what you want to do with the results.

A zip file containing the workflow, the mentioned text files and the folder structure may be downloaded from this link. Some example graphics of varying quality are included.

CDD – Sharing Experiences in Life Science Research in Solna, 9th of March

Today I had the opportunity to be at a meeting organized by SciLifeLab, Stockholm/Uppsala, and Collaborative Drug Discovery (resp. CDD Vault).

There were some nice presentations, interesting people and a number of interesting discussions! For example, Konrad Koehler from Immunscape AB presented some details of their portfolio and explained how and why they integrate CDD Vault in their research.

Another interesting feature presented was the freely accessible CDD BioAssayExpress service, a curation effort to establish a standard for protocol descriptions and to enable data mining on them. Worth having a look at!

Check here, if you want to know more about the SciLifeLab organization and here if you want to know more about CDD and their main product, CDD-Vault.

#CDD #Scilifelab


I’ve been having a few problems with the site, so development isn’t going as expected, but this should be resolved soon.

Aside from the web-page structure, I am working on an article on “Sugars in diets” (not mining related, indeed) and how to use the external tool in Knime in a Windows environment.