Part 1: What disease should I research @home? The Zika virus as an example

Dabbling with basic Science@home is fun – though probably only up until the question arises: "what should I actually research?"

This leads to the question of how to find or decide on a disease to start with. Since this is a rather entangled question (or rather, the answers can be), I will offer three of the simplest answers:

  • Choose whichever disease you are curious about (or have a relation to)
  • Pick a particular target that you have heard of (or know) and are interested in
  • Take a "hot-topic" disease/target from the current news

This might sound somewhat naïve, but it can be quite relevant and is used by many researchers in pharmaceutical development, at least as part of the starting point. As an example, my friend Fernando from InOutScience and I have been considering the Zika virus ourselves (hence, I will use "us/we" for the remainder of this blog series). I myself stumbled upon it due to the news last year (and a family interest, if you will).

What is Zika?

Zika hit the world news last year after an outbreak of epidemic proportions in South America. That the world took notice at all was (as usual?) down to economics: the 2016 Summer Olympics in Brazil and the spread to the southern parts of the USA. By the end of the year, though, the outbreak had declined nearly as fast as it had appeared; the reasons for this still seem to be unclear to epidemiologists.


The virus itself is mosquito-borne, transmitted by Aedes mosquitoes. It usually leads to harmless symptoms, the most common ones being headache, muscle and joint pain, mild fever, rash, and inflammation of the underside of the eyelid.
But: what brought this virus into the limelight is the fact that when it is transmitted to pregnant women, the fetus is at risk of birth defects!

The latter is the reason for the efforts to find treatments (otherwise, basic flu treatment seems to do the trick).

You can find nicely summarized facts on the World Health Organization (WHO) webpage on Zika.

Unfortunately, as with any neglected disease (tropical ones most often fall into this category), there is little money to be made in finding new medications (research & development costs versus what you can earn from them….). It therefore falls to smaller companies as well as academic groups to be the major players researching these diseases, as is the case with Zika.
You yourself can participate indirectly, if you like, via the World Community Grid distributed computation project – see my blog entry here to learn how.

If you want to know more about Zika, please check out these links:

Now that we have a disease to research – how do we continue? Part 2 is now available; please click here.

PS: Part of this blog series will be presented at "ICIC 2017", the International Conference on Trends for Scientific Information Professionals, Heidelberg, October 23-24. The presentation will be given by Fernando Huerta from InOutScience.

MolPress – Open source chemistry plugin for WordPress

So much to discover and to do – yet so little time.
Here, for example, is a nifty thing that I will have to try out at some point, time or not, since it fits the @home perspective perfectly:

MolPress is an open source chemistry plugin for WordPress.

One of my new colleagues, Alex Clark, who has been in the blogosphere a bit longer than I have, is putting work into this. No need for me to reiterate what he can describe best himself – check out the MolPress page, or his blog:

Cheminformatics2.0 – MolPress

Now – to just find the time to integrate this 😀


Science@home for everyone – the quick and simple(st) way

Do you want to contribute to research but don't have the time/nerve/know-how for any kind of deeper involvement? Of course you want to 😀 !
And yes, it is possible! The answer is – distributed, or volunteer, computing!

This is not a new phenomenon; it has been around for quite a long time now. One of the better-known projects is most likely SETI@home, where you help analyze radio signals from space in the search for extra-terrestrial life.
Today, the field of distributed computing encompasses all kinds of research areas, including drug discovery. One of many summaries on this subject can be found on this blog by the OpenScientist, and of course on Wikipedia under Volunteer Computing.

Thus, by allowing your computer to calculate on behalf of whatever research is in question, you indirectly contribute to that project – without lifting a finger. The only thing you need to do is install a program and register yourself as a user (some projects even let you run anonymously), with the tiny caveat that you also "contribute" electricity. But hey – it's for science, right? In addition, some projects include a fancy-looking screen saver!
Don't want to have your computer on all the time? Don't want to be bothered while you are using your own machine? No problem: as far as I know, nearly all clients offer several ways to restrict CPU/GPU usage or the times at which they may run.

Can't decide what to contribute to? Want to contribute to multiple projects without having multiple clients to install and keep track of? Then I can recommend the World Community Grid, which supports seven different projects out of the box. And if I am not mistaken, with a wee bit of manual work you can make the same client run other projects as well. And if you prefer doing something like this while playing a video game, even that is possible, for example in EVE Online or FoldIt (these, though, require a bit more "work" in the form of active input/analysis by the user, and thus go beyond the idea of "simplistic" distributed computing).

Myself, I am supporting the OpenZika project, due to some personal interest in this field. Come join me and many, many others!

Click here to get started!
(Note: this includes my referral ID – don’t worry, there is no money involved, it simply gives out “badges” for “recruitment”. Use the above World Community Grid link instead if you don’t like this idea).

Awesome animation video on cancer and target Fractalkine

The enclosed video is a rather cool and awesomely made animation of the biology of the Fractalkine receptor in the context of cancer.

The reasons I post this here are two-fold:

a) it’s awesome, we need more of those types of videos! Have I mentioned that it’s awesome?

b) I used to be involved with that target – and that particular compound – during my time at AstraZeneca. Not as much as many of my former colleagues, but still, it does make one proud to see all that hard work come to fruition. Especially in this day and age, when companies (smaller ones even more so) are careful about what goes into the clinic and no longer "pump out" volumes of potentially questionable things, as may have happened in the past.

Good luck to Kancera and hopefully it will benefit the (future) patients!

If you want to read about Fractalkine (a chemokine), you can find more information on e.g. Wikipedia: en.wikipedia.org/wiki/CX3CL1

Docking & virtual screening @home – preview

Way too many things have been happening lately, so I haven't had as much time as I'd like to write new entries – one of those things being the start of a new job within the next few days 😀 [That's a valid excuse, isn't it?]

Anyway, a somewhat more complex and especially CPU/GPU-heavy task is docking and receptor modelling. What you need, though, depends on what you want to do:

If you just want to dock the occasional molecule, and maybe make a nice picture, then you should be fine with a low-spec configuration as described in my post Part 1 of Drug Research @home. If you intend to do high-throughput virtual screening of tens or even hundreds of thousands of compounds, you either have to have a lot of patience (in the range of days to weeks) or a lot of money for a cluster [I am not going into the possibility of using cloud services (yet), though that would probably be an option as well].

The system I will describe is AutoDock, or rather AutoDock Vina, the simplest and most "open-source" docking software, combined with other free tools for preparation and visualization.

As a time/computational reference: docking a single molecule with Vina on an average modern i7 system takes ca. 20-30 seconds. That's fine for several hundred compounds at a time. When I previously had access to a Xeon-based Linux cluster, I screened 80k compounds on 12 CPUs in 10 or so days…. (well, it was a queue system shared with other users, though the way the system was set up, it was calculating more or less constantly).
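To make the automation aspect concrete: a batch run is, at its core, just a loop over prepared ligand files. Below is a minimal Python sketch (a shell script would do just as well), assuming the vina binary is on your PATH and that the receptor and ligands have already been prepared as PDBQT files – the file names and search-box values are placeholders to be adapted to your own target.

```python
# Minimal batch-docking loop around the AutoDock Vina command line (a sketch,
# not a polished pipeline). "receptor.pdbqt" and the box values are placeholders.
import subprocess
from pathlib import Path

RECEPTOR = "receptor.pdbqt"
BOX = ["--center_x", "10.0", "--center_y", "12.5", "--center_z", "-3.0",
       "--size_x", "20", "--size_y", "20", "--size_z", "20"]

Path("docked").mkdir(exist_ok=True)
for ligand in sorted(Path("ligands").glob("*.pdbqt")):
    # One Vina call per ligand; at ca. 20-30 s each, a few hundred ligands
    # amounts to an overnight job on a single machine.
    subprocess.run(["vina", "--receptor", RECEPTOR,
                    "--ligand", str(ligand),
                    "--out", str(Path("docked") / ligand.name),
                    *BOX], check=True)
```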

Now, using Vina isn't new and there are descriptions out there, but few (if any) deal with automation. Furthermore, you have to pick bits and pieces from different places and combine them, which isn't as obvious as one might think if you aren't an expert (well, at least I don't consider myself one in this particular field).

Until soon!


Abuse of open access tools and data?

As described in a previous blog of mine, it is rather simple to set up virtual compound design from the comfort of your home. Tools and data are easily accessible and hardware is cheap. Add to that a bit more hardware, maybe even a (garage) laboratory – it makes you wonder "what if"?

Is it possible that open access data is abused for criminal purposes, in particular recreational drugs? I reckon it would make sense (unfortunately), and I am sure there are more articles to be found beyond the one I stumbled upon recently, dating back to 2013, by the Guardian. They don't, though, give any source or example for their (probably legitimate, imho) claim of what/where these "clandestine" labs are.

Syntheses of known (recreational) drugs have been accessible since the days of Usenet newsgroups (I saw them myself back in the day) and probably even BBSs. And then there is of course PiHKAL, perhaps one of the main sources in the Usenet/BBS days, before the internet became bigger and more easily accessible. With that know-how also come lists of how to replace otherwise laboratory-only ingredients with household items/chemicals. It is so simple nowadays that a simple Google search will yield, e.g., a recipe for crystal meth based on household chemicals; "Breaking Bad" in real life.

Combine the urge to do something like this with knowledge of pharmaceutical design and open access…..

Though as long as so-called designer drugs seem to be based on arbitrary testing of only slightly modified existing compounds – one of many examples fitting that picture seems to be acrylfentanyl – it doesn't look like open access is the culprit (yet). It's more the usual greed and stupidity, with as fast, simple and cheap a turnover as possible – health and safety concerns have never been on the agenda. The only optimization is probably the accessibility of starting materials. If there is anything valid in the above-mentioned article, then of course synthesis can go beyond your local garage and be done by "professionals" with expert equipment and chemicals. But hey, maybe I am naive and there are pro labs doing all the typical design-and-test cycles just as a pharmaceutical company would…. Not that that would be a good justification for illegal drugs.

It’s a rather scary thought – I am not sure what, if anything at all, can be done about this.

Perhaps law-makers should start banning substances based on their pharmacological action, or on generic structures (Markush-like?), rather than one by one. I believe a similar problem exists in the area of sports & doping, where new "undetectable compounds" turn up faster than anyone has time to analyze them and pass new laws prohibiting the previously identified ones.

I (obviously) can only recommend against creating any type of existing or new drug – not only because of substance-abuse and legal issues, but also from a plain health perspective – putting untested "shit" into your body will lead to shitty results, plain and simple. And if you are not a chemist and are doing "shit" in your garage, well, count on "shit" happening.

Drug research at home – (how) is that possible? – Part 2

Continuing on from part 1:

What to do with the tools

I'm assuming that you have (some) knowledge of how to search, what to look out for, and a workflow for the different steps required to do the job. Otherwise, that is a topic of its own for another time. Not that it hasn't been described before – see just one example here:

Nicola, G., Berthold, M. R., Hedrick, M. P., & Gilson, M. K. (2015). Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME. Database: The Journal of Biological Databases and Curation, 2015, bav087. https://doi.org/10.1093/database/bav087

Actual Compounds

So you have identified something and want to test your hypothesis beyond in silico. Well, that is a bit tougher – you can't really handle and test compounds at home. Theoretically, though, you could have someone else do this part for you (order commercial compounds, synthesize something new, test in a biological assay). That is (unfortunately) not free.

To obtain compounds, though, if you are in (or have connections to) academia or a (smaller) company, some interesting initiatives are available, such as, within malaria research, the Open Access Malaria Box (http://www.mmv.org/research-development/open-source-research/open-access-malaria-box), now broadened to pathogens in general at http://www.pathogenbox.org/. Then there are the possibilities described in the next section.

Once you think you have something

Actual testing aside (it never hurts), what can you do with those cool results? Well, there are a number of things – the simplest one would be: write a blog! More involved and scientifically more appropriate – and at the same time more difficult – would be to write a publication in a scientific journal or present at a scientific meeting. You could even try to patent your findings, if you have the finances. It all depends on the impact you want to have.

To go beyond a publication, if you want to be part of/follow up on your findings, you can contact some of the initiatives by pharmaceutical companies that are open to collaboration on new findings. For example, Johnson & Johnson [jnjinnovation.com/partner-with-us], AstraZeneca [openinnovation.astrazeneca.com], or the Medicines for Malaria Venture [www.mmv.org/partnering/our-partner-network], and many more. You can also find incubators within academia, but then you would need some contact with a research group there. The list of incubators/companies & universities is quite big nowadays and could be a topic for a separate blog entry.

If you are really in it for the money, though, I think you will be disappointed. Doing drug research from home is more of a hobby, just for fun – in the best case, though, for the greater good. Having said that, should you really find something interesting and contact any of the above-mentioned initiatives, intellectual property and reimbursement will most likely be on the table at some point.

Now, start researching!

Drug research at home – (how) is that possible? – Part 1

In the current day and age of open access information, combined with cheap computing power, it is rather simple to do (some) drug research from the comfort of your home, be it as a private person, for fun or out of interest, or as a small (start-up) company. Actually, big pharma companies use some of the same resources, combined with their own in-house data and programs – so why shouldn't you?

Where is this data? What kind of data?

There are a number of public – so-called open access – databases available these days, curated over many years by high-profile institutes, e.g. the National Institutes of Health (NIH) for PubChem. Many more institutions and specific initiatives have evolved over the years, some appearing literally right now, depending on the field and data. Databases on chemical compounds (small molecules) have been around the longest, afaik, with structures, properties, literature references, and associated biological data.

Listing all of them would require an entire Wikipedia page (or more), and that work has already been done – you can find a substantial list, for example, at http://oad.simmons.edu/oadwiki/Data_repositories; in terms of life science, you can really knock yourself out on this NIH site: https://www.ncbi.nlm.nih.gov/guide/all/#databases_. The scientific literature regularly features articles on databases and software, as do many blogs, but that is outside our scope here.

More focused on our purpose of drug research, you have sites such as PubChem, BindingDB, ZINC, or e.g. GuideToPharmacology. I'd say with these you can get pretty far. Curated from literature and also patents, these databases connect structures to biology, i.e. mechanism of action, structure of the target, and how much is known about it (or not). All sites and databases are arranged differently: some you can search on the web or via an API, some by browsing, or a combination thereof. Then there are also semi-public databases, such as CDD Vault – you can register and search within its public databases (all via the web, independent of your machine power), though you cannot download or batch-process on the free account. It might still be worth a look at times, considering you may find data that is not in the literature/patent-based curated databases.
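As a small taste of the API route: many of these services, PubChem included, can be queried with just a few lines of script. Here is a minimal sketch against PubChem's PUG REST interface (the URL pattern follows PubChem's published documentation; the compound name is an arbitrary example):

```python
# Fetch SMILES and molecular weight for a compound by name via PubChem PUG REST.
import requests

name = "ibuprofen"  # arbitrary example query
url = (f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{name}"
       "/property/CanonicalSMILES,MolecularWeight/JSON")
record = requests.get(url, timeout=30).json()["PropertyTable"]["Properties"][0]
print(record["CanonicalSMILES"], record["MolecularWeight"])
```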

What will you need?

A certain understanding of the drug discovery process, chemistry, and some degree of biology. If not yourself, then a good friend who has that knowledge and can support you (though this seems like an unlikely scenario?). Some IT skills certainly don't hurt. Below I will focus on data-mining as the core task of home research; methods such as docking or quantum-mechanical calculations I will leave out for now.

Hardware
  • A(ny) computer – Windows, Linux, Mac – doesn’t matter.
    In my experience, though, when it comes to chemistry the Windows platform still offers a broader range of both commercial and freeware programs.
  • How powerful?
    Simply put, it also doesn't matter. Sure, the more power, the smoother your experience, though for mining purposes I would go for more memory over processing power. An Intel i3 with (a minimum of) 16GB of RAM can get you pretty far for little money. Only with large data sets and more complicated calculations does this feel like a bit of a bottleneck. If you have an i7 or Xeon available, good for you!
    What about graphics cards? They actually don't matter for data-mining and simple visualizations. Once you want to do some visual 3D docking, though, that's another story.
  • An alternative, or even complementary, solution is a (powerful) workstation placed "anywhere", which could e.g. be shared with someone else to split the investment costs and then accessed from any (simple) PC/laptop via remote access, e.g. TeamViewer. Cloud computing@home, so to say.
  • A reasonably fast internet connection – for mining those web services.
Software
  • Knime (available on all platforms), allowing for flexible, visual, and fast development of search and analysis workflows. Combine it with some know-how of Java or XML and you have quite a powerful package. To start your journey, you can use some of the readily available (example) workflows before getting into the details.
  • A chemical drawing program – there is a rather large number out there, so it is difficult to make a single good suggestion. Knime itself comes with a "myriad" of plugins for structural input and output, so you don't really need a separate program. Myself, I have the free Marvin package by ChemAxon installed.
  • DataWarrior – a great package for visually guided “manual” mining, sort of “Spotfire light”, if you will.
  • Excel – or similar – can be used as a lightweight DataWarrior alternative, but is also useful for sharing or storage (as would be Word or PowerPoint and their alternatives).
  • Scripting languages – R or Python – are not necessary, though they can make a good complement, depending on your requirements (see the small example after this list).
  • Java – also not necessary, but since Knime is built on Java, it can sometimes help with certain work-arounds.
  • XML, HTML, REST – some basics might be helpful when accessing certain services via a network API.
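Not necessary, as said – but to give a taste of what a few lines of script can add next to Knime, here is a small sketch using the open-source RDKit toolkit (one option among many; it assumes RDKit is installed, e.g. via conda, and "compounds.sdf" is a hypothetical input file):

```python
# Read an SD file and print basic computed properties per compound.
from rdkit import Chem
from rdkit.Chem import Descriptors

for mol in Chem.SDMolSupplier("compounds.sdf"):
    if mol is None:        # skip records RDKit could not parse
        continue
    print(Chem.MolToSmiles(mol),
          round(Descriptors.MolWt(mol), 1),      # molecular weight
          round(Descriptors.MolLogP(mol), 2))    # calculated logP
```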

What if you don't know Java and such? Don't fret – initially, I didn't either. If you are a "learning by doing" kind of person, the knowledge will come automatically. Obviously, you can also learn these things in courses.

Continued in part 2.


SpotRM+ – potential reactive metabolite formations – batch analysis in Knime

Modelling and prediction of the toxicity of drug compounds has been, is, and will be a continuous area of interest. I won't go into the detailed literature on this here; I want to focus on SpotRM+'s contribution to the field:
This methodology focuses on reactive metabolite formation, and its avoidance, as a means to reduce structure-based toxicity issues. It is also a computationally cheap method, since it is based solely on SMARTS patterns – carefully hand-curated ones at that. In addition to identifying certain structural features, SpotRM+ delivers one-to-three-page monographs on the marketed (or withdrawn) reference compounds, including mechanistic summaries. So it is more about learning than pure black-box filtration.

SpotRM+ requires Bioclipse, a platform with a focus on chemical data-mining. There is one disadvantage to this package – you can only run and analyze one compound at a time; batch mode isn't possible.
According to the company Awametox AB, batch-mode analysis is a feature requested by a number of customers, e.g. for design/synthesis prioritization. And yes, it is possible – IF you use script-based or workflow-based tools, one of the simpler ones being Knime. For this, you need access to the SpotRM+ database itself plus the standard chemistry mining nodes in Knime.
[Note that SpotRM+ is a commercial package, though a free demo is available; both are based on Bioclipse. For the mining suggested here, you need the database itself, which can be purchased separately.]

One drawback of the database and the SpotRM+ system with regard to batch analysis is that it isn't really designed for it. The readout usually consists of a traffic-light colouring of reference compounds plus links to their analysis monographs. Thus, for batch mode to work, you need to ask what you want from it, e.g.:

  • Is a single “red” or “green” reference hit sufficient?
  • Do you want to summarize all the reference hits?
  • Do you want to combine the results with other data for further calculations?

In principle, anything goes; that's the beauty of the flexibility of a package such as Knime. But would that be sufficient for you to make a decision? I can imagine that a batch-based "high quality" decision should be possible if you combine the output with, e.g., a model based on measured ADMET data (and/or reactive metabolite data).
Independent of the latter, a basic workflow could look as simple as this:


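To give a feel for what such a batch screen does under the hood, outside of Knime: the core idea – match every input structure against a list of alert SMARTS – fits in a few lines. Here is a minimal Python/RDKit sketch, where "spotrm_smarts.txt" is a hypothetical one-SMARTS-per-line export standing in for the commercial SpotRM+ pattern set (not reproduced here):

```python
# Count, per input compound, how many (hypothetical) alert SMARTS patterns match.
from rdkit import Chem

with open("spotrm_smarts.txt") as fh:
    alerts = [Chem.MolFromSmarts(line.strip()) for line in fh if line.strip()]

for smiles in ["Nc1ccccc1", "CCO"]:   # toy input batch
    mol = Chem.MolFromSmiles(smiles)
    n_hits = sum(1 for a in alerts if a is not None and mol.HasSubstructMatch(a))
    print(smiles, "alerts matched:", n_hits)
```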
You can find more info and access to mentioned programs here:
SpotRM+:   www.awametox.com (bioclipse included; recently updated to V1.2!)
Bioclipse:   www.bioclipse.net (mainly for info, not required to download separately)
Knime:   www.knime.org

Knime External Tool – OCR of structures

Knime is a fantastic tool for automating data handling without the need for programming. Although, the more complicated the data becomes, the better it is to have, or to acquire, knowledge of Java, XML, SQL, etc.

Some limitations do exist with Knime, again depending on your data and the end result you require. In certain situations, the need to use an external program might arise.

Now, there is an "External Tool" node in Knime, though officially it is designed for use in a Unix-type environment. But with some tweaking, it does work under Windows!

The example I will give here involves chemical structure recognition from pictures, i.e. OCR for structures.

As it often is with Knime (or programming in general), there are always multiple ways to solve a problem. This here is simply one solution.

What this workflow does:

It reads a directory of PNG (preferred) picture files of structures, even reactions, and converts them to SMILES files. The workflow creates a DOS batch file with the OSRA commands, which then gets executed by the External Tool node.
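If you prefer scripting over a generated DOS batch file, the same step can be sketched in Python along these lines – assuming OSRA's -f option for selecting the output format (see the OSRA documentation), and using the file locations described further below:

```python
# Run OSRA on every PNG in C:\test and write one SMILES file per image.
import subprocess
from pathlib import Path

OSRA = r"C:\osra\V2.0.0\osra-bin.exe"
for png in Path(r"C:\test").glob("*.png"):
    with open(png.with_suffix(".smi"), "w") as out:
        subprocess.run([OSRA, "-f", "smi", str(png)], stdout=out, check=True)
```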

To obtain the best quality and to understand the recognition limitations, you should read up on the OSRA tool via the links below.

What you will need:

  • OSRA – Optical Structure Recognition Application. I use V2.0.0. This app is free if you can compile it yourself; otherwise you can purchase the compiled version (and support the programmers) for a modest fee.
    https://sourceforge.net/p/osra/wiki/Home/ (Latest is V2.1.0)
    https://cactus.nci.nih.gov/osra/ (though I think this page isn’t updated anymore?)
  • Knime, preferably the latest version >= 3.3.x (though it should run with any V3; it was originally developed in V2.x, so it should run there as well, though I can't test that anymore at this point).
    https://www.knime.org/
  • Nodes in Knime: Standard installation, including:
    NGS tools [I like using their “Wait” node]
    Erlwood Nodes [used for chemistry part]

In order for this workflow to work properly, you will need the following files in the following places (this has to do with the fact that some nodes, e.g. the External Tool node, can't be opened/executed for testing if certain entered data isn't available). Thus it is easiest if you copy the enclosed files, or alternatively create them as described (empty Notepad files are sufficient; see the small helper sketch after this list):

  • OSRA in the following location:
    C:\osra\V2.0.0\osra-bin.exe [I don't make use of %PATH% or the batch file included in the OSRA distribution]
  • Additional files/folders:
    C:\osra\donotdelete\extToollGreen.txt [can be empty; used for giving the "clear" sign when the node is done]
    C:\osra\donotdelete\ignore_me.txt [empty]
    C:\osra\donotdelete\ignore_me1.txt [empty]
    C:\osra\donotdelete\ignore_me2.txt [can be empty; echoes the cmd-line output, can be ignored or potentially parsed – I don't]
  • Input/Output location:
    C:\test\ [currently; may be anywhere else; it contains your images and the resulting structures]
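As an alternative to creating these by hand, a small Python helper can set up the placeholder files and folders listed above (empty files are sufficient; the names and paths are exactly those listed):

```python
# Create the placeholder files/folders so the External Tool node can be opened.
from pathlib import Path

base = Path(r"C:\osra\donotdelete")
base.mkdir(parents=True, exist_ok=True)
for name in ("extToollGreen.txt", "ignore_me.txt",
             "ignore_me1.txt", "ignore_me2.txt"):
    (base / name).touch()
Path(r"C:\test").mkdir(exist_ok=True)   # input/output location
```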

The first metanode reads the names of the picture files and creates an executable batch file that is called by the External Tool node. [Open the picture in a separate window to view it in full size.]

The tricky part is the External Tool node: should you do a full reset and not have all the necessary files in place, you won't be able to open it and compare the set-up.

And here are the flow variables:

The remaining portions are less tricky, and it is a matter of taste what you do with the obtained files. In my second metanode, I make a list of all the SMILES files (OSRA creates one output file per input file) and combine it with the original input file (resp. its filename). [Open the picture in a separate window to view it in full size.]

Finally, in the third node, the structure is drawn out to allow a visual comparison with the picture input. [Open the picture in a separate window to view it in full size.]

After that, it is up to you what you want to do with the results.

A zip file containing the workflow, the mentioned text files, and the folder structure may be downloaded from this link. Some example graphics of varying quality are included.