Machine learning for light microscopy - problems to solve?

I would like to solve some biological problems that would improve the state of the art in biology or bioinformatics. In particular, I want to apply machine learning to light microscopy images. The equipment and experience I have are:

  • Bright-field, dark-field, and phase-contrast microscopy
  • Modern laptop
  • 56-core super-computer with >100 GB of memory (on request)
  • Intricate knowledge of machine learning algorithms and signal processing
  • PhD-level research skills
  • Programming skills that would get me to work at Google
  • Limited knowledge about biology, bioinformatics, and microscopy (yet)

I want to do some publishable research free from all academic hassle. I will do this solely in my own time, without any hurry to publish, in an attempt to do something good for mankind. I can put a few hundred dollars into the project every two months (or about 1000 USD per year).

Much of the biological research published in Science, Nature, PNAS, Cell, etc. is so specialized that I find it difficult to identify important problems I would have a good chance of solving with my skill set. Thus, I am asking for your help:

  • What kind of software have you always wanted for light microscopy research, but did not know how to build?
  • What are some important biological problems you would like to see solved? (For machine learning, problems with a binary decision task are particularly well suited, e.g. "does this person have malaria or not?")
  • What are some recent, high quality reviews on open problems in biology?
  • Something else?

While my question is a bit broad, I think this goes under the "good intention" (or whatever it is called) SE policy.

I know this question is likely to be closed. But if you want something to work on:

Cryo super-resolution fluorescence imaging


  • CryoFM allows imaging of vitrified biological samples with fluorescence microscopy.
  • There are significant challenges to achieve high-resolution cryoFM imaging.
  • Fluorophore characteristics at low temperature offer additional advantages.
  • Cryo super-resolution fluorescence imaging will give dramatic resolution improvement.

Source: Fluorescence cryo-microscopy: current challenges and prospects.

RE: What kind of software have you always wanted for light microscopy research, but did not know how to build?

I research fruit flies and in this field (and many other insect ecology model systems like beetles, moths, butterflies) we use a lot of visually scored data, e.g. body size, wing size, wing morphology, eye colours, bristle numbers, genital morphology, sex comb morphology… the list is huge! One program used is WingMachine - though the link to the software seems to be broken - which can measure morphology aspects of a fly wing.

Something I'd like to be able to do is put a vial of food under a scope and have it quickly count the number of eggs on the surface of the food. I asked a question about it a while back… This would be very useful: many labs have to count eggs (to keep the number of eggs constant in each vial; variation here can have serious effects on the adult fly, so control is important in ecology studies), and counting is slow, difficult and highly inaccurate, with counts varying considerably between people. If there were some way of putting the vial under the scope, hitting a button and getting an approximate count, it would be great!

A colleague counts dead beetles at the moment, I'm sure he'd appreciate a similar kind of program where he could image and have the software count automatically. I think both of these problems would be easy to solve with very similar software. Making software that is easy to "teach" how to recognize individuals is the key.
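Both counting tasks share the same core step: separate the objects from the background, then count the separate regions. The sketch below is a classical, non-learned baseline. It assumes a greyscale image (here a 2D list) in which eggs or beetles are brighter than the background. The function name `count_objects` and the `threshold` and `min_size` parameters are hypothetical. Touching objects would merge into a single region, and that is exactly where a trainable detector would improve on this approach.

```python
from collections import deque


def count_objects(image, threshold, min_size=1):
    """Count connected regions of pixels brighter than `threshold`.

    `image` is a 2D list of grey values. Regions smaller than `min_size`
    pixels are treated as noise and ignored (hypothetical parameters).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill this region (4-connectivity) and measure its size.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:
                count += 1
    return count


# Toy "image": two bright blobs (eggs) and one isolated bright pixel (noise).
toy = [
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 8, 8],
    [7, 0, 0, 0, 8, 8],
]
print(count_objects(toy, threshold=5, min_size=2))  # → 2
```

In practice, the threshold and minimum size could be tuned on a few hand-counted images instead of being set by eye.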

A slightly more complex bit of machine learning might be getting it to count different phenotypes in one image. Fitness assays in flies often use a wild type fly with dark-bodied (ebony) competitors; the body of the wild type is comparatively more yellow. The fitness of the focal wild type fly is then the number of wild type offspring among the total (the dark body phenotype is recessive, so when the focal wild type mates with an ebony fly it produces wild type offspring, whereas when two ebony flies mate they produce dark-bodied offspring). Here the machine would have to tell the two phenotypes apart and count both.
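Once individual flies are segmented, the fitness score described above is simple arithmetic. This sketch makes two hypothetical assumptions: each fly is summarized by its mean body brightness, and a single cutoff separates the lighter wild type bodies from the dark ebony bodies. A real tool would more likely learn this decision from labelled colour features. The function names are illustrative.

```python
def classify_fly(mean_intensity, cutoff):
    """Label a fly 'wild_type' (lighter, yellowish body) or 'ebony' (dark body)."""
    return "wild_type" if mean_intensity >= cutoff else "ebony"


def focal_fitness(mean_intensities, cutoff):
    """Fraction of offspring that are wild type, i.e. sired by the focal fly."""
    labels = [classify_fly(v, cutoff) for v in mean_intensities]
    n_wild = labels.count("wild_type")
    return n_wild / len(labels) if labels else float("nan")


# Hypothetical per-fly mean brightness values (0-255) from a segmented image.
flies = [180, 175, 60, 55, 190, 70]
print(focal_fitness(flies, cutoff=120))  # → 0.5
```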

I'll attach a proper picture from under the scope later, the picture in my previous question was taken with a digital camera, not via a scope but it gives an idea of what it looks like.

You might be interested in reading the article "Machine learning in cell biology - teaching computers to recognize phenotypes".

One of my colleagues does lots of histological work, staining and identifying tissue at the microscopic level. Software that could be helpful to that discipline might be the ability to distinguish among the different types of tissues present, and perhaps calculate the "area" occupied by each tissue type as well as empty space. This would not be unlike a GIS type problem, but I don't know if it fits well with a Yes/No binary decision framework. I don't know if it could be trained to learn to identify specific types of tissue, but perhaps it could recognize each distinct area of a cross section as different from other such areas. Here are a few cross sections to show you what I mean:



Smooth muscle:


Seminiferous tubules from testes:


Notice the different tissue types in each cross section, plus the white space. Each tissue type has different light transmission patterns, which may allow a computer to learn to distinguish among the different tissue types.
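If a classifier assigns each pixel to a tissue type (or to empty space), the area occupied by each type follows directly from pixel counts. The label map and the class numbering below are hypothetical stand-ins for the output of such a classifier.

```python
from collections import Counter


def area_fractions(label_map):
    """Return the fraction of pixels assigned to each class in a 2D label map."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical classifier output: 0 = empty space, 1 = smooth muscle,
# 2 = connective tissue.
labels = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [2, 2, 1, 2],
]
fractions = area_fractions(labels)
print({k: round(v, 3) for k, v in sorted(fractions.items())})
# → {0: 0.25, 1: 0.417, 2: 0.333}
```

To get absolute areas on a calibrated microscope, multiply each class's pixel count by the area of one pixel (for example, in µm²).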

A Beginner’s Guide to Machine Learning

Should I learn now… or later? Learning is a universal skill or trait shared by every living organism on this planet. It is defined as the acquisition of knowledge or skills through experience, study, or being taught. Whether it is a plant learning how to respond to light and temperature, a monkey learning how to peel a banana, or a human learning how to ride a bike, this shared ability is what lets us adapt and evolve over time.

But what if I said, “Machines can learn too”?

We’re in an age where machines are no different: we can teach machines how to learn, and some machines can even learn on their own. This seemingly magical phenomenon, still a fairly new concept, is called Machine Learning.

Target Audience: Beginners and/or Machine Learning Fresh Bloods

Hopefully this article will provide some useful insights and open your mind to what computers can do nowadays. I won’t go in depth about what machine learning is, but will instead give a high-level overview.

If there is anything that you should LEARN from reading this article, it is this:

Machines can predict the future, as long as the future doesn’t look too different from the past.

Deep Learning for Intelligent Microscopy

We have used microscopes to discover new phenomena for hundreds of years. Thanks to the digital image sensor and computer, much of this discovery work has now started to become automated. A variety of "deep" machine learning algorithms now automatically process digital microscope images to find, classify and interpret relevant phenomena, such as indications of disease, the presence of certain cells in an assay, or even a fully automated diagnosis.

Despite this automation, microscopes themselves have changed relatively little: they are, for the most part, still optimized for a human viewer to peer through and examine a sample in detail, which presents a number of challenges in the clinic. The diagnosis of malaria infection offers a good example. Because of its small size (approximately 1 micron or less), the malaria parasite (P. falciparum) must be viewed under a high-resolution objective lens (typically with oil immersion). Unfortunately, such high-resolution lenses can only see a very small area, containing just a few dozen cells. As the infection density of the malaria parasite is relatively low, one must scan through at least 100 unique fields of view to find enough examples to offer a sound diagnosis. This is true whether a human or an algorithm is viewing the images of each specimen: hundreds of images are still needed, which creates a bottleneck in the diagnosis pipeline.

The Computational Optics Lab is currently solving problems like the one above by creating new microscopes, which are designed by deep learning algorithms, to ensure that their captured image data contains a maximum amount of information for the algorithm's specific task. This is a joint hardware-software optimization effort. In effect, we hope to turn the microscope into an "intelligent" agent, whose goal is to physically probe each specimen to allow the computer to learn as much as possible from it. Optimizable hardware components that our lab has explored or is currently exploring include programmable illumination, the optical pathway, and the detector and data management pipeline. Here are some of our current projects related to this "learned sensing" effort:

3. Learned sensing for joint optimization of different microscope components

Associated papers:

Project page with data and source code:

In this work, we investigate an approach to jointly optimize multiple microscope settings, together with a classification network, for improved performance of automated image analysis tasks. We explore the interplay between optimization of programmable illumination and pupil transmission, using experimentally imaged blood smears for automated malaria parasite detection, to show that multi-element “learned sensing” outperforms its single-element counterpart. While not necessarily ideal for human interpretation, the network’s resulting low-resolution microscope images (20X-comparable) offer a machine learning network sufficient contrast to match the classification performance of corresponding high-resolution imagery (100X-comparable), pointing a path towards accurate automation over large fields-of-view.

2. Learned sensing for optimized microscope illumination

Associated papers:

Project page with data and source code:

To significantly improve the speed and accuracy of disease diagnosis via light microscopy, we made two key modifications to the standard microscope: 1) we added a micro-LED unit that is optimized to illuminate each sample so as to highlight important features of interest (e.g., the malaria parasite within blood smears), and 2) we used a deep convolutional neural network, optimized jointly with this illumination unit, to automatically detect the presence of infection within the uniquely illuminated images.
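The idea of optimizing the illumination jointly with the classifier can be shown with a toy model, under strong simplifying assumptions. The image captured under an illumination pattern is modelled as a weighted sum of single-LED images, and the classifier is plain logistic regression rather than a convolutional network. The function names, the synthetic data and the choice that only LED 0 carries the class signal are all hypothetical. This sketches the principle and is not the lab's published pipeline. The gradient of the log-loss updates both the classifier parameters and the LED weights.

```python
import numpy as np


def train_learned_illumination(X, y, steps=3000, lr=0.2):
    """Jointly fit LED weights `w` and a logistic classifier (`theta`, `b`).

    X has shape (n_samples, n_leds, n_pixels): one image per LED, so the
    image under illumination pattern w is sum_k w[k] * X[:, k, :].
    """
    n, k, p = X.shape
    w = np.full(k, 1.0 / k)      # start from uniform illumination
    theta = np.zeros(p)
    b = 0.0
    for _ in range(steps):
        img = np.einsum("k,nkp->np", w, X)        # simulated captured images
        z = img @ theta + b
        prob = 1.0 / (1.0 + np.exp(-z))
        dz = (prob - y) / n                       # d(mean log-loss)/dz
        grad_theta = img.T @ dz
        grad_b = dz.sum()
        grad_w = np.einsum("n,nkp,p->k", dz, X, theta)  # chain rule into LEDs
        theta -= lr * grad_theta
        b -= lr * grad_b
        w -= lr * grad_w
    return w, theta, b


def accuracy(X, y, w, theta, b):
    img = np.einsum("k,nkp->np", w, X)
    return float(np.mean(((img @ theta + b) > 0) == y))


# Synthetic data: 4 LEDs, 8 pixels; only LED 0 reveals the "parasite" signal.
rng = np.random.default_rng(0)
n, k, p = 1000, 4, 8
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, k, p))
X[y == 1, 0, :3] += 2.0
w, theta, b = train_learned_illumination(X, y)
print("LED weights:", np.round(w, 2))
print("training accuracy:", accuracy(X, y, w, theta, b))
```

On this synthetic data, the learned weights should concentrate on the informative LED. That is the "learned sensing" effect in miniature.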

Together, these two modifications allow us to achieve classification accuracies of around 95 percent using large field-of-view, low-resolution microscope objective lenses that can see thousands of cells simultaneously (as opposed to just dozens of cells). This removes the need for mechanical scanning to obtain an accurate diagnosis, offering a dramatic speedup to the current diagnosis pipeline (i.e., from 10 minutes for manual inspection to just a few seconds for automatic inspection).

1. Adaptively learned illumination for optimal sample classification

The Learned Sensing approach outlined above uses a convolutional neural network to establish optimized hardware settings. Here, we turned hardware optimization into a dynamic process, wherein we aim to teach the microscope how to interact with the specimen as it captures multiple images. To do so, we have turned to a reinforcement learning algorithm that treats the microscope as an agent, which can make dynamic decisions (how should I illuminate the sample next? How should I change the sample position? How should I filter the resulting scattered light?) on-the-fly during the image capture process.

Computing solutions for biological problems

Xin Gao (left) often collaborates with structural biologist Stefan Arold. Their most recent project led to a computational pipeline that can help pharmaceutical companies discover new protein targets for existing, approved drugs. Credit: KAUST

Producing research outputs that have computational novelty and contributions, as well as biological importance and impact, is a key motivator for computer scientist Xin Gao. His group at KAUST has experienced a recent explosion in publications. Since January 1, 2018, they have produced 27 papers, including 11 published in the top three computational biology journals and seven presented at the top artificial intelligence and bioinformatics conferences.

Originally from China, Gao joined KAUST in 2010 after a stint with the University of Waterloo in Canada and a prestigious fellowship at Carnegie Mellon University in the U.S. His group collaborates closely with experimental scientists to develop novel computational methods to solve key open problems in biology and medicine, he explains. "We work on building computational models, developing machine-learning techniques, and designing efficient and effective algorithms. Our focus ranges from analyzing protein amino acid sequences to determining their 3-D structures to annotating their functions and understanding and controlling their behaviors in complex biological networks," he says.

Gao describes one third of his lab's research as methodology driven, where the group develops theories and designs algorithms and machine-learning techniques. The other two-thirds is driven by problems and data. One example of his methodology-driven research is work on improving non-negative matrix factorization (NMF), a dimension-reduction and data-representation tool formed of a group of algorithms that decompose a complex dataset expressed in the form of a matrix.

NMF is used to analyze samples where there are many features that might not all be important for the purpose of study. It breaks down the data to display patterns that can indicate importance. Gao's team improved on NMF by developing max-min distance NMF (MMDNMF), which runs through a very large amount of data to be able to highlight the high-order features that describe a sample more efficiently.

To demonstrate their approach, Gao's team applied the technique to images of the faces of 11 people with different expressions. Each image was treated as a sample with 1,024 features. After training, MMDNMF represented the features of each face well enough to assign black-and-white facial images correctly more often than traditional NMF.
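For readers unfamiliar with NMF, the sketch below shows the standard algorithm that MMDNMF builds on. A non-negative matrix V (samples × features) is approximated as the product W H of two smaller non-negative matrices, using the multiplicative update rules of Lee and Seung. This is plain NMF, not Gao's max-min distance variant. The function name, the rank and the synthetic data are illustrative assumptions.

```python
import numpy as np


def nmf(V, rank, iters=500, seed=0, eps=1e-10):
    """Plain NMF via Lee-Seung multiplicative updates: V ≈ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic non-negative data that is exactly rank 3 (20 samples, 30 features).
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf(V, rank=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

The rows of H act as non-negative "parts", and each sample is a non-negative mixture of them. Methods such as MMDNMF change the objective so that these parts separate samples better.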

Opening biology's Pandora's box

Gao has had many successful collaborations with KAUST researchers, but he says one of the most successful is with structural biologist Stefan Arold.

Together, they have worked on several projects, including one that has led to a computational pipeline that can help pharmaceutical companies discover new protein targets for existing, approved drugs.

"Drug repositioning is commercially and scientifically valuable," explains Gao. "It can reduce the time needed for drug development from 20 to 6 years, and the costs from around 2 billion USD to 300 million USD. The National Institutes of Health in the United States estimates that 70 percent of drugs on the market can potentially be repositioned for use in other diseases."

Gao discovered that methods for drug repositioning face several challenges: they rely on very limited amounts of information and usually focus on a single drug or disease, leading to results that aren't statistically meaningful.

However, Gao's computational pipeline can integrate multiple sources of information on existing drugs and their known protein targets to help researchers discover new targets.

The model was tested for its ability to predict targets for a number of drugs and small molecules, including a known metabolite in the body called coenzyme A (CoA), which is important in many biological reactions, including the synthesis and oxidation of fatty acids. It predicted 10 previously unknown protein targets for CoA. Gao chose the top two, and Arold and his colleagues then tested whether they really did interact with CoA.

The collaboration verified Gao's predictions, and the computational pipeline is now being patented in several countries. It could eventually be licensed to pharmaceutical companies to enable already-approved drugs to be used for treating other diseases. The method can also help drug companies understand the molecular basis for drug toxicities and side effects.

"What makes our collaboration so synergistic is that our areas of expertise provide the minimal overlap needed to understand each other without creating redundancy," says Arold. "He brings the computational side and I bring the experimental side to the table. Our worlds touch, but don't overlap. Our discussions complement each other in a very stimulating way, without stumbling over too many semantic hurdles."

Another collaboration of Gao and Arold's involves enhancing the analysis of data gathered by electron microscopy. Arold explains that despite much progress in electron microscopy hardware and software—allowing it to be used to determine the 3-D structures of proteins and other biomolecules—the analysis of its data still needs to be improved. Gao and Arold are developing methods to reduce noise and thus improve the resolution of electron microscopic images of complex biomolecular particles.

They are also developing processes that can automate the interpretation of genetic variants and that enhance the process of assigning functions to genes. "If you put us together in a room for more than 15 minutes, we will probably come up with a new idea!" says Arold.

Improving current technologies

Other research by Gao's team includes a computational approach that can simulate nanopore sequencing, a genetic sequencing technology. Gao's DeepSimulator can be used to evaluate newly developed downstream software for nanopore sequencing. It can also save time and resources by simulating experiments, reducing the need for real ones.

His team also recently developed Gracob, a method used to sift through genetic information and determine what pathways are turned on in microorganisms by stressful conditions, such as changes in acidity or temperature or exposure to antibiotics. This can identify genes that are dispensable under normal conditions but essential when the microorganism is stressed.

Feeding hungry algorithms

Biologists at the Broad Institute and elsewhere are increasingly using machine learning because it can pick up on subtle patterns and connections in data missed by conventional tools. Researchers create a machine learning model by training it on datasets, allowing it to learn statistical relationships in the data; the model then uses those relationships to make predictions when it processes new datasets.

These algorithms come in a variety of types. Supervised machine learning requires datasets that are annotated with information that primes the computer to discover a relationship of interest. By contrast, unsupervised machine learning algorithms aim to find patterns in unannotated data without being asked to look for anything in particular.
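The difference between the two settings shows up clearly on the same toy data. In this pure-Python sketch, the 2D feature vectors and labels are hypothetical. The supervised nearest-centroid classifier uses the annotations to learn one centroid per class. k-means, by contrast, receives only the unlabelled points and has to discover the grouping itself.

```python
import math
import random


def nearest_centroid_fit(points, labels):
    """Supervised: use the annotations to compute one centroid per class."""
    sums, counts = {}, {}
    for (x, y), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + x, sy + y)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()}


def nearest_centroid_predict(centroids, point):
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


def kmeans(points, k, iters=20, seed=0):
    """Unsupervised: group unlabelled points into k clusters (Lloyd's algorithm)."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["healthy", "healthy", "healthy", "infected", "infected", "infected"]

centroids = nearest_centroid_fit(points, labels)
print(nearest_centroid_predict(centroids, (4.8, 5.3)))  # → infected
print(sorted(kmeans(points, k=2)))  # one center per group, found without labels
```

k-means finds the two groups but cannot name them. Attaching the labels "healthy" and "infected" to its clusters still requires annotated examples.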

Classification and Prediction in Microbiology

Prediction of Microbial Species

There are two main types of microorganisms (Maiden et al., 1998): those with non-cellular morphology (Yeom and Javidi, 2006), such as viruses, and those with cellular morphology. The latter can be divided into prokaryotes (Weinbauer, 2010), such as archaea and eubacteria, and eukaryotes (Nowrousian, 2010), such as fungi and unicellular algae. Because different microorganisms have different characteristics, it is important to identify them properly. There are two main approaches to identifying microorganisms. In the first, the goal is to classify an unknown microorganism taxonomically, assigning its domain, kingdom, phylum, class, order, family, genus and species. In the second, the goal is to determine whether an unknown microorganism belongs to a specific species or not. For example, we can determine whether an unknown microorganism is a virus or, more specifically, whether it is a certain virus. In this section, we introduce recent studies that have used machine-learning methods to predict microorganisms.

In the study by Murali et al. (2018), the authors classified microorganisms using IDTAXA, which consists of the LearnTaxa and IdTaxa functions. Both functions are part of the R package DECIPHER, which is released under the GPLv3 license as part of Bioconductor, a project that provides tools for the analysis and comprehension of high-throughput genomic data. The LearnTaxa function attempts to reclassify each training sequence into its labelled taxon using a method known as tree descent, which is similar to a decision tree, a commonly used ML algorithm. IdTaxa takes the object returned by LearnTaxa and a set of query sequences as input. It returns the classification of each sequence in taxonomic form, together with a confidence value for each level. If the confidence does not reach the required threshold, the sequence cannot be classified reliably at that level. Classifying sequences with IdTaxa rather than other methods may lead to different conclusions in microbiological studies. Its misclassification rate is low, and many of the remaining misclassifications may be caused by errors in the reference taxonomy. Fiannaca et al. (2018) presented a method for classifying 16S short-read sequences based on k-mers and deep learning. According to their results, the method classifies both 16S shotgun (SG) and amplicon (AMP) data very well.
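Here is an illustration of the per-level confidence reporting described above. It is a hypothetical Python sketch of the output logic only, not DECIPHER's R API. It assumes a list of (rank, taxon, confidence) triples ordered from domain down to genus, plus a confidence threshold (60 percent here, as an example). Once a rank falls below the threshold, that rank and every deeper rank are reported as unclassified.

```python
def truncate_assignment(levels, threshold=60.0):
    """Keep taxonomic assignments from the root down while confidence >= threshold.

    `levels` is a list of (rank, taxon, confidence_percent) ordered from
    domain to the deepest rank (hypothetical input format).
    """
    result = []
    confident = True
    for rank, taxon, conf in levels:
        # Once one rank falls below the threshold, all deeper ranks do too.
        confident = confident and conf >= threshold
        result.append((rank, taxon if confident else "unclassified"))
    return result


levels = [
    ("domain", "Bacteria", 99.0),
    ("phylum", "Firmicutes", 95.0),
    ("class", "Bacilli", 88.0),
    ("order", "Lactobacillales", 72.0),
    ("family", "Lactobacillaceae", 55.0),
    ("genus", "Lactobacillus", 41.0),
]
for rank, taxon in truncate_assignment(levels):
    print(rank, taxon)
```

With these example values, the sequence is assigned down to order, and family and genus are reported as unclassified.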

It is important to identify specific microbial sequences in mixed metagenomic samples. At present, gene-based similarity methods are widely used to separate prokaryotic and host organisms in mixed samples; however, these techniques have major weaknesses. Therefore, many studies have sought better methods for identifying specific microorganisms. Amgarten et al. (2018) proposed a tool known as MARVEL for predicting double-stranded DNA bacteriophage sequences in metagenomes. MARVEL uses random forests (RF), with a training dataset composed of 1,247 phage and 1,029 bacterial genomes and a test dataset composed of 335 bacterial and 177 phage genomes. The authors proposed six features to identify phages, then used random forests for feature selection and found that three of the features were the most informative (Grazziotin et al., 2017). Ren et al. (2017) developed VirFinder, a k-mer-based ML method for identifying viral contigs that avoids gene-based similarity searches. VirFinder trains its model on known viral and non-viral (prokaryotic host) sequences to learn the k-mer frequency patterns that distinguish viral sequences. The model was trained with host and viral genomes available before January 1, 2014, and the test set consisted of sequences obtained after that date. VirSorter (Roux et al., 2015) combines reference-dependent and reference-independent approaches to identify viral signals in different kinds of microbial sequence data. Experimental results have shown that VirSorter performs well, especially in predicting viral sequences outside the host genome.
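Several of the tools above (VirFinder, and the 16S classifier of Fiannaca et al.) represent sequences by their k-mer frequencies instead of searching for similar genes. Below is a minimal sketch of that featurization. It assumes DNA input and skips k-mers that contain ambiguous bases such as N. The function name and the choice of k are illustrative, and the published tools may normalize differently.

```python
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers in `seq` (ambiguous k-mers skipped)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:            # skip k-mers containing N or other symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


profile = kmer_profile("ACGTACGTNNACGT", k=2)
print(len(profile))  # → 16
```

Each sequence becomes a fixed-length vector of 4^k values, whatever its length. Any classifier, such as logistic regression or a neural network, can take that vector as input.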

The above methods classify microorganisms to serve different needs. To obtain taxonomic information about microorganisms, we can use the method proposed by Murali et al. (2018). MARVEL, VirSorter and VirFinder, by contrast, identify specific types of microorganisms. According to Amgarten et al. (2018), these three methods have comparable specificity, but MARVEL has better recall (sensitivity). We have compiled materials for implementing the above methods, which are shown in Table 1.

Table 1. The available data and materials for prediction of microbial species.
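Because the comparison above turns on specificity versus recall, this sketch shows how the two are computed from binary predictions. The labels are a toy example (1 = phage, 0 = non-phage), not data from Amgarten et al., and the function name is illustrative.

```python
def sensitivity_specificity(y_true, y_pred):
    """Recall (sensitivity) and specificity for binary labels (1 = phage, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))  # → (0.75, 0.8333333333333334)
```

Two tools can share the same specificity (the same false-positive rate among non-phage sequences) while one of them misses fewer true phages. That is the sense in which MARVEL is reported to have better recall.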

Prediction of Environmental and Host Phenotypes

With the development of next-generation and high-throughput DNA sequencing, a new area of microbiology has emerged. Its main research goal is to link microbial populations to phenotypes and ecological environments, which can support research on disease outbreaks and precision medicine (Atlas and Bartha, 1981). It is well known that some microorganisms are parasitic and that the surrounding environment and host cells have an important impact on microbial populations. Differences in nutrient availability and environmental conditions lead to differences in microbial communities (Moran, 2015). Because microorganisms exchange information with the surrounding environment and host cells, we can predict environmental and host phenotypes based on the microorganisms that are present (Xie et al., 2018). This provides a more comprehensive understanding of the environment and the host, so that we can make better use of the environment and protect the host. Many recent studies have used microorganisms to predict environmental and host phenotypes; we introduce them in this section.

Asgari et al. (2018) used a k-mer-based shallow subsample representation together with deep learning, random forests and SVMs to predict environmental and host phenotypes from 16S rRNA gene sequencing data, in a system called MicroPheno. They found that the k-mer-based shallow subsample representation outperforms OTU-based features for body-site identification and Crohn’s disease prediction. In addition, deep learning outperformed RF and SVM on large datasets. The approach not only improves performance but also avoids overfitting, and it reduces preprocessing time. Statnikov et al. (2013) used OTUs as input features and processed the data as follows. First, they sequenced the original DNA, removed human DNA sequences and defined OTUs based on the remaining microbial sequences. Next, they quantified the relative abundance of all sequences belonging to each OTU. The authors used SVMs, kernel ridge regression, regularized logistic regression, Bayesian logistic regression, KNN, RF and probabilistic neural networks with different parameters and kernel functions; in total, they investigated 18 ML methods, together with five feature extraction methods. The results showed that RF, SVM, kernel ridge regression and Bayesian logistic regression with a Laplace prior performed best. Based on their research, skin microorganisms collected from objects that a person has touched can be used to identify that individual. Because the authors explored a wide variety of classification and dimensionality reduction methods, this comprehensive comparison is a useful reference for future work. Schmedes et al. (2018) used the microbial community for forensic identification: they developed hidSkinPlex, a novel targeted sequencing method that uses skin microbiome markers for human identification.
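The relative-abundance step in the Statnikov et al. pipeline can be sketched as follows. The sketch assumes each (non-human) read has already been assigned to an OTU, and the read and OTU identifiers are hypothetical.

```python
from collections import Counter


def otu_relative_abundance(read_to_otu):
    """Map each OTU ID to the fraction of (non-human) reads assigned to it."""
    counts = Counter(read_to_otu.values())
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


# Hypothetical read assignments after human reads have been removed.
assignments = {"read1": "OTU_1", "read2": "OTU_1", "read3": "OTU_7", "read4": "OTU_2"}
print(otu_relative_abundance(assignments))
# → {'OTU_1': 0.5, 'OTU_7': 0.25, 'OTU_2': 0.25}
```

Stacking one such vector per sample, with OTUs as columns, gives the feature matrix passed to classifiers such as SVM or RF.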
In forensic science, it is important to estimate the time of death. Johnson et al. (2016) used KNN regression to predict the postmortem interval from nasal and ear samples, indicating that skin microbiota can be an important tool in forensic death investigation. Traditionally, marine biological monitoring involves the classification and morphological identification of large benthic invertebrates, which requires a great deal of time and money. Cordier et al. (2017) used eDNA metabarcoding and supervised ML to build a powerful predictive model for benthic monitoring. Moitinho-Silva et al. (2017) studied the microbial communities of sponges and their HMA-LMA (high versus low microbial abundance) status, demonstrating the applicability of ML to exploring host-related microbial community patterns.
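KNN regression, as used by Johnson et al., predicts a value by averaging the targets of the most similar training samples. In this sketch, the microbial profiles, the three taxa, the sampling days and the use of Euclidean distance are all hypothetical choices for illustration.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training samples."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical relative-abundance profiles (3 taxa) and the postmortem
# interval (days) at which each sample was taken.
profiles = [(0.7, 0.2, 0.1), (0.6, 0.3, 0.1), (0.3, 0.5, 0.2), (0.2, 0.5, 0.3), (0.1, 0.3, 0.6)]
days = [1, 2, 5, 6, 10]
print(knn_regress(profiles, days, (0.25, 0.5, 0.25), k=2))  # → 5.5
```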

Because microbial communities are specific to their environments and hosts, they allow us to identify both more accurately. We can also judge existing environmental conditions and the status of the host from the microbial communities present. We summarize the available datasets and methods in Table 2.

Table 2. The available data and materials for prediction of environmental and host phenotypes.

Using Microbial Communities to Predict Disease

Microbiomes are important to human health and disease (Bourne et al., 2009). The human body hosts many microbial communities, and once a community is out of balance or foreign microorganisms invade, a person is likely to become ill. For example, intestinal microbial communities are associated with obesity (Ley et al., 2006b) and pulmonary communities with pulmonary infection (Sibley et al., 2008). Because of the complexity of these communities, it is difficult to determine which microbial communities cause a given disease. Recently, many studies have used microbial communities to predict diseases, especially bacterial vaginosis (Srinivasan et al., 2012; Deng et al., 2018) and inflammatory bowel disease (Gillevet et al., 2010). By analyzing microbial communities, we can better understand a disease and make effective treatment decisions. In this section, we therefore discuss current studies that use microbial communities to predict diseases.

Bacterial vaginosis (BV) is a disease associated with the vaginal microbiome. Beck and Foster (2014) used genetic programming (GP), RF and logistic regression (LR) to classify BV according to microbial communities. There are two diagnostic criteria for BV: the Amsel criteria, which are based on discharge, the whiff test, clue cells and pH (Amsel et al., 1983), and the Nugent score, which depends on counting Gram-positive cells (Nugent et al., 1991). The data used by Beck and Foster came from Ravel et al. (2011) and Sujatha et al. (2012). Their method (Beck and Foster, 2014) first classifies BV according to the vaginal microbiota and related environmental factors, then identifies the microbial community members most important for predicting BV.
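Logistic regression, one of the three methods used by Beck and Foster, models the probability of BV as a sigmoid of a weighted sum of features. The sketch below fits it by batch gradient descent on synthetic data. The two features (a Lactobacillus relative abundance and vaginal pH) and their distributions are hypothetical and are not taken from the studies cited above.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, steps=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta -= lr * Xb.T @ (p - y) / len(y)
    return beta


def predict_proba(beta, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))


# Synthetic cohort: BV samples tend to have less Lactobacillus and higher pH.
rng = np.random.default_rng(0)
n = 200
bv = rng.integers(0, 2, size=n)
lacto = np.clip(np.where(bv == 1, 0.2, 0.8) + rng.normal(0, 0.15, n), 0, 1)
ph = np.where(bv == 1, 5.0, 4.0) + rng.normal(0, 0.3, n)
X = np.column_stack([lacto, ph])
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize features

beta = fit_logistic(X, bv)
acc = np.mean((predict_proba(beta, X) > 0.5) == bv)
print("coefficients (intercept, lacto, pH):", np.round(beta, 2))
print("training accuracy:", acc)
```

On standardized features, a negative weight for Lactobacillus and a positive weight for pH show which features push predictions toward BV. Comparing coefficient magnitudes is one simple way to rank predictive features.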

Hierarchical feature extraction builds on the taxonomic classification of microbes from kingdom down to species. Existing hierarchical feature selection algorithms can lose information, and the taxonomic information for some 16S rRNA sequences is often incomplete, which affects classification. Therefore, Oudah and Henschel (2018) proposed a method called hierarchical feature engineering (HFE) to identify colorectal cancer (CRC). To accomplish this, they used RF, decision trees and naive Bayes (NB) to classify a dataset of next-generation-sequencing-based 16S rRNA sequences provided by metagenomic studies. The method handles datasets with high-dimensional features well. Moreover, the available dataset and method are in
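Hierarchical feature engineering starts by representing each sample at every taxonomic level, not only at the OTU level. The sketch below uses hypothetical OTU IDs and lineages. It sums OTU abundances into every prefix of their lineage, so incomplete lineages (such as one that stops at phylum) still contribute to the levels they do have. HFE then selects among these candidate features; that selection step is not reproduced here, and the function name is illustrative.

```python
from collections import defaultdict


def hierarchical_features(otu_abundance, otu_lineage):
    """Sum OTU abundances at every taxonomic level of their lineage.

    `otu_lineage` maps an OTU ID to a tuple such as
    ("Bacteria", "Firmicutes", "Clostridia"); each prefix of the tuple becomes
    one feature, so higher-level taxa aggregate all OTUs beneath them.
    """
    features = defaultdict(float)
    for otu, abundance in otu_abundance.items():
        lineage = otu_lineage[otu]
        for depth in range(1, len(lineage) + 1):
            features["|".join(lineage[:depth])] += abundance
        features[otu] += abundance        # keep the OTU itself as a leaf feature
    return dict(features)


abund = {"OTU_1": 0.5, "OTU_2": 0.3, "OTU_3": 0.2}
lineage = {
    "OTU_1": ("Bacteria", "Firmicutes", "Clostridia"),
    "OTU_2": ("Bacteria", "Firmicutes", "Bacilli"),
    "OTU_3": ("Bacteria", "Bacteroidetes"),   # incomplete lineage
}
f = hierarchical_features(abund, lineage)
for name in sorted(f):
    print(name, round(f[name], 3))
```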

In another study (Wisittipanit, 2012), the author focused on predicting inflammatory bowel disease. In that study, patients with Crohn’s disease and ulcerative colitis were compared with healthy controls to identify differences between the mucosa and lumen in different intestinal locations. The author used the Relief algorithm (Kira and Rendell, 1992) to select features, and Metastats (White et al., 2009) to detect differential features. Finally, the author used KNN and SVM as classifiers to perform disease specificity and site specificity analysis.
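Relief weights each feature by how well it separates an instance from its nearest neighbour of the other class (the nearest miss), compared with its nearest neighbour of the same class (the nearest hit). Below is a minimal sketch of the basic two-class version. It assumes features rescaled to [0, 1] and Manhattan distance. The synthetic data (one informative feature, one noise feature) are hypothetical, and the study may have used a different Relief variant.

```python
import numpy as np


def relief(X, y, n_iter=100, seed=0):
    """Basic two-class Relief (Kira and Rendell, 1992) feature weights.

    Higher weights mean the feature separates the classes better.
    """
    rng = np.random.default_rng(seed)
    span = X.max(axis=0) - X.min(axis=0)
    Xs = (X - X.min(axis=0)) / np.where(span == 0, 1, span)   # scale to [0, 1]
    w = np.zeros(Xs.shape[1])
    for i in rng.integers(0, len(Xs), size=n_iter):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                                      # exclude the instance itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(y != y[i], dist, np.inf))   # nearest other-class
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n_iter
    return w


rng = np.random.default_rng(1)
n = 100
y = rng.integers(0, 2, size=n)
X = np.column_stack([y + rng.normal(0, 0.2, n),   # informative feature
                     rng.normal(0, 1, n)])         # pure noise
w = relief(X, y)
print(np.round(w, 3))
```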

In this section, we discussed using microorganisms to predict different diseases. Beck and Foster (2014) predicted BV from the microbiota and the diagnostic criteria for BV. HFE identified CRC based on OTU IDs and taxonomic information. Wisittipanit proposed a method to predict Crohn’s disease based on OTUs and feature selection. These methods use different ideas to predict diseases from microorganisms and obtained good results. This indicates that some diseases alter human-associated microbial communities. Based on these community changes, we may be able not only to predict a disease but also to treat it according to the state of the community, which is a direction for future research.

Machine learning applications in cell image analysis

Dr A Kan, Division of Immunology, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.

Division of Immunology, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia

Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia



Machine learning (ML) refers to a set of automatic pattern recognition methods that have been successfully applied across various problem domains, including biomedical image analysis. This review focuses on ML applications for image analysis in light microscopy experiments, with the typical tasks of segmenting and tracking individual cells and modelling reconstructed lineage trees. After describing a typical image analysis pipeline and highlighting the challenges of automatic analysis (for example, variability in cell morphology and tracking in the presence of clutter), the review gives a brief historical outlook on ML, followed by the basic concepts and definitions needed to understand the examples. It then presents several example applications at various image processing stages, including the use of supervised learning methods to improve cell segmentation and the application of active learning to tracking. The review concludes with remarks on parameter setting and future directions.
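To make the segmentation-then-tracking pipeline concrete, here is a minimal non-learned baseline for the tracking step: greedily linking cell centroids between consecutive frames by distance. It assumes centroids have already been extracted by segmentation; `link_frames` and `max_dist` are hypothetical names, not taken from the review. ML approaches such as those the review discusses would typically aim to do better than a heuristic like this in cluttered scenes or when cells divide.

```python
import math


def link_frames(prev, curr, max_dist):
    """Greedily link cell centroids in two consecutive frames (baseline).

    prev, curr: lists of (x, y) centroids from frames t and t+1.
    max_dist: largest displacement accepted as the same cell.
    Returns a dict mapping indices in prev to indices in curr. Cells left
    unmatched may have divided, died, or left the field of view.
    """
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
        if math.dist(p, c) <= max_dist
    )
    links, used_prev, used_curr = {}, set(), set()
    for _, i, j in pairs:
        # Accept the closest remaining pair; each cell is used at most once.
        if i not in used_prev and j not in used_curr:
            links[i] = j
            used_prev.add(i)
            used_curr.add(j)
    return links
```

Greedy matching is easy to follow but can make locally optimal choices that are globally wrong; optimal assignment methods avoid this at extra cost.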

Machine learning and microscopy

With recent advances in machine learning (ML), an increasing number of scientists have been applying ML techniques in image-based studies in biology and medicine, and the growing complexity of microscopy images creates an opportunity for computer assistance in these fields. Extracting information from images is time-consuming, because images are acquired at different scales and contain objects with varied morphology and different levels of noise. ML speeds up this work by automatically detecting patterns in the images and in the cells themselves.

Machine learning is a subset of artificial intelligence in which an algorithm learns patterns and relationships from data. When observing cells, researchers typically use fluorescent tags or stains to identify a cell’s characteristics. In 2018, a computer model that worked without such markers was able to correctly distinguish smaller structures. Building an accurate model requires a large amount of training data, and in this case that data had to be annotated manually. The result was a neural network that could distinguish between cell types, label where a neuron’s cell body ends and its axon and dendrites begin, and tell dead cells from living ones.

Besides labelling, another area in which ML models have proven useful is removing noise from microscopy images. Noise arises largely from insufficient light, the same effect that causes the graininess of photographs taken at night. Some noise is present in every image and cannot be avoided. Algorithms to remove noise have been used for years, but deep learning models can produce far better results.
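For comparison with the deep learning denoisers discussed here, one long-established classical approach is the median filter, which replaces each pixel with the median of its neighbourhood. The sketch below is a plain-Python illustration on a 2D list of intensities, with a hypothetical function name; it is not the method of any study mentioned in this article.

```python
from statistics import median


def median_filter(img, radius=1):
    """Classical (non-learned) denoising of a 2D list of intensities.

    Each output pixel is the median of the (2*radius+1) x (2*radius+1)
    neighbourhood around it; windows are clipped at the image borders.
    The input image is not modified.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            out[r][c] = median(window)
    return out
```

A median filter removes isolated outlier pixels well, but it can also erase fine structures smaller than its window, which is one reason learned denoisers can do better.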

In self-supervised learning, part of the data is withheld and the network is forced to learn to predict it. In supervised denoising, the model knows what it is looking for because it has been trained on noisy images paired with clean versions. When clean versions are not available, researchers instead use self-supervised algorithms that effectively train themselves on the noisy data. The outputs of these models can look good, but that does not guarantee they are real.
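One widely used way to withhold data for self-supervised denoising is blind-spot masking, the approach used by Noise2Void: a few pixels are hidden and the network learns to predict their noisy values from the surrounding pixels. If the noise is independent from pixel to pixel, it cannot be predicted from the neighbours, so the network ends up predicting the underlying signal. The article does not say which self-supervised method it has in mind; this sketch prepares only the masked input and the targets for one training example, omits the network itself, and uses a hypothetical function name.

```python
import random


def blind_spot_mask(img, n_points, seed=0):
    """Prepare one blind-spot training example from a noisy 2D image.

    A few pixels are hidden by overwriting each with a randomly chosen
    neighbour. A network would receive `masked` as input and be trained to
    predict the original noisy values `targets` at `coords` only, so no
    clean image is needed. The input image is not modified.
    """
    h, w = len(img), len(img[0])
    rng = random.Random(seed)
    masked = [row[:] for row in img]
    coords = rng.sample([(r, c) for r in range(h) for c in range(w)], n_points)
    targets = []
    for r, c in coords:
        # Pick a different pixel from the 3x3 neighbourhood, inside the image.
        neighbours = [
            (r + dr, c + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < h and 0 <= c + dc < w
        ]
        nr, nc = rng.choice(neighbours)
        masked[r][c] = img[nr][nc]
        targets.append(img[r][c])
    return masked, coords, targets
```

Computing the training loss only at the masked coordinates is what stops the network from simply copying each input pixel to the output.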

The primary concerns when using such models are whether the data are being altered and how many mistakes are made. With labelling models, biologists routinely check the computer’s work to ensure that structures have been delineated accurately. In these cases the original image is not changed; the labels are added as a separate layer on top. De-noising algorithms, by contrast, modify the image itself to produce a cleaner result, and the stronger the noise in the original image, the more pronounced these changes will be. In one project, researchers were trying to remove blur, but the algorithm instead learned the pattern of stripes in the original images and removed any stripes from new images it de-noised; this was fixed by adding more training data.
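One simple check on what a denoiser has changed is to inspect the residual, the difference between the original and the denoised image. If the model removed only noise, the residual should look like unstructured noise; visible structure such as edges or stripes suggests that it has altered the signal, as in the stripe example above. A minimal sketch on 2D lists, with a hypothetical function name:

```python
def residual_summary(original, denoised):
    """Return the per-pixel residual (original - denoised) and its mean
    absolute value, a quick measure of how much a denoiser changed."""
    residual = [
        [o - d for o, d in zip(row_o, row_d)]
        for row_o, row_d in zip(original, denoised)
    ]
    flat = [abs(v) for row in residual for v in row]
    return residual, sum(flat) / len(flat)
```

The mean absolute residual summarises how much was changed, but looking at the residual image itself is what reveals whether the changes have structure.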

There are now a growing number of repositories in which researchers share algorithms for de-noising images and for other tasks such as segmentation and classification. Neural networks are now being used to map brain tumours, to study RNA localisation, and in electron microscopy. As these models grow in complexity and accuracy, and as researchers continue to share results and collaborate on algorithms, the benefits of using AI in research are already becoming apparent.


Machine learning tool improves tracking of tiny moving particles

Scientists have developed an automated tool for mapping the movement of particles inside cells that may accelerate research in many fields, a new study in eLife reports.

Beyond manual tracing: An artist's impression of a deep neural network trained to recognise particle motion in space-time representations. Credit: Eva Pillai.

The movements of tiny molecules, proteins and cellular components throughout the body play an important role in health and disease. For example, they contribute to brain development and the progression of some diseases. The new tool, built with cutting-edge machine learning technology, will make tracking these movements faster, easier and less prone to bias.

Currently, scientists may use images called kymographs, which represent the movement of particles in time and space, for their analyses of particle movements. These kymographs are extracted from time-lapse videos of particle movements recorded using microscopes. The analysis needs to be done manually, which is both slow and vulnerable to unconscious biases of the researcher.
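The article does not describe how kymographs are built, so here is a minimal sketch of the basic idea: for each frame of a time-lapse stack, read the pixel intensities along a path that follows the structure of interest (for example an axon) and stack these profiles as rows. It assumes the path is given as a list of pixel coordinates; practical tools may also interpolate along a drawn line and average across its width, which this sketch does not do. The function name is hypothetical and unrelated to KymoButler's code.

```python
def kymograph(frames, path):
    """Build a kymograph from a time-lapse stack.

    frames: list of 2D images (lists of rows), one per time point.
    path: list of (row, col) pixel coordinates along the structure of
          interest, ordered from one end to the other.
    Returns a 2D list with one row per frame (time) and one column per
    path position (space): a moving particle traces a slanted line and a
    stationary one a vertical line.
    """
    return [[frame[r][c] for r, c in path] for frame in frames]


# A particle moving one pixel to the right in each of three frames.
frames = [[[1 if c == t else 0 for c in range(3)]] for t in range(3)]
print(kymograph(frames, [(0, 0), (0, 1), (0, 2)]))
# → [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Tracing a kymograph means recovering these lines, and their slopes give particle velocities; this is the step KymoButler automates.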

“We used the power of machine learning to solve this long-standing problem by automating the tracing of kymographs,” says lead author Maximilian Jakobs, a PhD student in the Department of Physiology, Development and Neuroscience at the University of Cambridge, UK.

The team developed the software, dubbed ‘KymoButler’, to automate the process. The software uses deep learning, a technique loosely inspired by networks of neurons in the brain, in which software learns from examples and becomes more proficient at a task as it is trained. They then tested KymoButler using both artificial and real data from scientists studying the movement of an array of different particles.

“We demonstrate that KymoButler performs as well as expert manual data analysis on kymographs with complex particle trajectories from a variety of biological systems,” Jakobs explains. The software also completed, in under one minute, analyses that would take an expert 1.5 hours.

KymoButler is available for other researchers to download and use at [link]. Senior author Kristian Franze, Reader in Neuronal Mechanics at the University of Cambridge, expects the software to keep improving as it analyses more types of data. Researchers using the tool will be given the option of anonymously uploading their kymographs to help the team continue developing the software.

“We hope our tool will prove useful for others involved in analysing small particle movements, whichever field they may work in,” says Franze, whose lab is devoted to understanding how physical interactions between cells and their environment shape the development and regeneration of the brain.
