shental group

MICROBIAL PROFILING AND METAGENOMICS PROJECTS


COMPASS

Profiling a microbial mixture means identifying the bacteria present in a community and estimating their abundances, here using massively parallel sequencing. We have developed a method that profiles any community at extremely high phylogenetic resolution and accuracy. The method is termed COMPASS, which stands for Convex Optimization for Microbial Profiling by Aggregating Short Sequence reads (pdf). A theoretical analysis of COMPASS appears in (pdf), and an application of the method appears in (pdf).
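The estimation step can be illustrated on a toy problem. This is a minimal sketch that assumes a hypothetical reference matrix `A`, where `A[r][b]` is the expected frequency of read sequence `r` when sequencing bacterium `b`, and a vector `y` of observed read frequencies. The abundance vector `x` is then taken as the solution of the convex problem: minimize ||Ax - y||^2 subject to x >= 0 and sum(x) = 1. The names, the toy numbers, and the projected-gradient solver are illustrative assumptions. This is not the published COMPASS implementation, which works against a large database of bacterial sequences.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:  # the largest i satisfying this fixes the threshold
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def estimate_abundances(A, y, steps=5000):
    """Projected gradient descent on 0.5 * ||A x - y||^2 over the simplex."""
    n_reads, n_bact = len(A), len(A[0])
    # 1 / ||A||_F^2 is a safe step size: it bounds the gradient's Lipschitz constant.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n_bact] * n_bact  # start from a uniform mixture
    for _ in range(steps):
        residual = [sum(A[r][b] * x[b] for b in range(n_bact)) - y[r] for r in range(n_reads)]
        grad = [sum(A[r][b] * residual[r] for r in range(n_reads)) for b in range(n_bact)]
        x = project_to_simplex([x[b] - step * grad[b] for b in range(n_bact)])
    return x


# Hypothetical reference: 4 read sequences (rows) x 3 bacteria (columns); columns sum to 1.
A = [
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5],
]
true_x = [0.7, 0.3, 0.0]
# Noise-free observed read frequencies generated by the true mixture.
y = [sum(A[r][b] * true_x[b] for b in range(3)) for r in range(4)]

x_hat = estimate_abundances(A, y)
print([round(v, 4) for v in x_hat])  # [0.7, 0.3, 0.0]
```

Constraining `x` to the simplex keeps the estimate a valid mixture. Real data would add sequencing noise, so the minimizer would only approximate the true abundances.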

SMURF

COMPASS significantly increases profiling resolution using short reads, yet it requires an additional shearing step. In practice, this step proved non-uniform: different locations along the long amplicon were not sonicated uniformly, resulting in uneven coverage. In a follow-up project, we sought an approach that would increase phylogenetic resolution while being experimentally simpler, and thus easier to implement and disseminate in the community. Instead of amplifying a single long region as in COMPASS, we amplify several short regions and then computationally combine their results into a single coherent profiling solution. The approach, termed Short MUltiple Regions Framework (SMURF), solves a convex optimization problem whose solution is the most likely mixture of bacteria to have given rise to the observed set of reads from the different regions. SMURF can be applied to any number of regions in a mix-and-match fashion, in particular using any combination of common primer pairs (e.g., V1-V3, V3-V5, V4). The standard experimental procedure therefore remains unchanged, and the experimentalist simply selects any set of regions. The de facto amplicon length of SMURF is the total length of all amplified regions, which significantly increases the phylogenetic resolution.
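The mix-and-match idea can be sketched by stacking the per-region problems into a single system that shares one abundance vector. In this hypothetical toy, bacteria 0 and 1 have identical sequences in region 1, and bacteria 1 and 2 are identical in region 2. Neither region alone can resolve the mixture, but the combined system can. The toy matrices, the equal weighting of regions, and the projected-gradient solver are all assumptions made for illustration. This is not the SMURF code.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def solve_mixture(A, y, steps=5000):
    """Projected gradient descent on 0.5 * ||A x - y||^2 over the simplex."""
    n_rows, n_bact = len(A), len(A[0])
    step = 1.0 / sum(a * a for row in A for a in row)  # safe step size
    x = [1.0 / n_bact] * n_bact
    for _ in range(steps):
        residual = [sum(A[r][b] * x[b] for b in range(n_bact)) - y[r] for r in range(n_rows)]
        grad = [sum(A[r][b] * residual[r] for r in range(n_rows)) for b in range(n_bact)]
        x = project_to_simplex([x[b] - step * grad[b] for b in range(n_bact)])
    return x


def stack_regions(regions):
    """Concatenate per-region (A_k, y_k) pairs; all regions share the same bacteria (columns)."""
    A, y = [], []
    for A_k, y_k in regions:
        A.extend(A_k)
        y.extend(y_k)
    return A, y


# Read frequencies generated by the (hypothetical) mixture [0.2, 0.5, 0.3].
# Region 1: bacteria 0 and 1 share one read sequence.
region1 = ([[1.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]], [0.7, 0.3])
# Region 2: bacteria 1 and 2 share one read sequence.
region2 = ([[1.0, 0.0, 0.0],
            [0.0, 1.0, 1.0]], [0.2, 0.8])

x_region1 = solve_mixture(*region1)  # bacteria 0 and 1 stay indistinguishable
x_combined = solve_mixture(*stack_regions([region1, region2]))
print([round(v, 4) for v in x_region1])   # [0.35, 0.35, 0.3]
print([round(v, 4) for v in x_combined])  # [0.2, 0.5, 0.3]
```

Stacking the regions lengthens the effective sequence that distinguishes the bacteria, which mirrors the "de facto amplicon length" argument in the text.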
Since 2014 my group has been closely collaborating with Ravid Straussman from the Weizmann Institute of Science on a large-scale project that aims to profile bacteria in the tumor microenvironment across many cancer types. As infrastructure, this ambitious project required a protocol that would allow high-resolution, robust microbial profiling under extremely harsh conditions. First, upon embarking on this project, we understood that if bacteria are indeed present in tumors, their biomass would be extremely low, so a dedicated 16S rRNA profiling protocol had to be devised. Second, many tumor samples are stored in formalin-fixed paraffin-embedded (FFPE) blocks, a process known to degrade DNA. Classical 16S rRNA sequencing (e.g., of the V3-V4 region) would therefore fail to profile them, since the DNA molecules may be shorter than the amplified region. Third, we wanted a protocol that would not increase the workload compared to standard methods, enabling the efficient screening of thousands of tumor samples. The resulting protocol, termed 5R, is a specific application of SMURF in which five short regions along the 16S rRNA gene, each 160-240 bp long, are amplified in multiplex. This design has three benefits. Together the five regions cover about 1000 bp, allowing extremely high phylogenetic resolution when combined by SMURF; the short length of each region allows profiling of FFPE samples; and the workload is identical to that of standard methods, requiring a single PCR reaction. 5R was developed by Garold Fuks from the Shental group and Deborah Nejman from the Straussman lab.

The tumor microbiome project

5R and SMURF reconstruction were applied to profile more than 2500 tumors across eight tumor types (breast, lung, melanoma, pancreas, ovary, colon, bone, and glioblastoma). About half of the samples were extracted from FFPE blocks and the rest from snap-frozen tissue. Samples of each tumor type were collected from two to four centers in Israel and the USA, and included both tumor and normal adjacent tissue. The analysis sought to identify bacteria in different cancer types or under specific conditions (e.g., bacteria present in tumor vs. adjacent normal tissue in breast cancer). It also sought to correlate bacteria with specific clinical data (e.g., smoking status in lung cancer). The extremely low bacterial load in the tumor microenvironment, combined with the abundance of environmental contaminants, posed considerable challenges. Overcoming contamination and other confounding factors required developing dedicated data-analysis methods.
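The contamination problem can be illustrated with one generic idea. A taxon detected about as often in negative controls (for example, DNA-extraction blanks) as in tissue samples is more likely a contaminant than a tumor resident. The sketch below implements only this prevalence comparison, using a hypothetical `ratio` threshold and made-up taxon names. It is not the set of filters developed for the tumor microbiome study.

```python
def prevalence(detections):
    """Fraction of samples in which a taxon was detected (True = detected)."""
    return sum(detections) / len(detections)


def flag_contaminants(sample_hits, control_hits, ratio=2.0):
    """Flag taxa whose prevalence in samples is below `ratio` times their prevalence in controls.

    sample_hits and control_hits map taxon name -> list of per-sample detection booleans.
    Taxa never seen in any control are not flagged by this rule.
    """
    flagged = []
    for taxon, hits in sample_hits.items():
        p_sample = prevalence(hits)
        p_control = prevalence(control_hits.get(taxon, [False]))
        if p_control > 0 and p_sample < ratio * p_control:
            flagged.append(taxon)
    return sorted(flagged)


# Hypothetical detections in 8 tissue samples and 4 negative controls.
sample_hits = {
    "taxon_A": [True] * 6 + [False] * 2,  # 0.75 in samples, absent from controls
    "taxon_B": [True] * 5 + [False] * 3,  # 0.625 in samples, 0.75 in controls
    "taxon_C": [True] * 2 + [False] * 6,  # 0.25 in samples, 0.25 in controls
    "taxon_D": [True] * 6 + [False] * 2,  # 0.75 in samples, 0.25 in controls
}
control_hits = {
    "taxon_B": [True] * 3 + [False],
    "taxon_C": [True] + [False] * 3,
    "taxon_D": [True] + [False] * 3,
}

flagged = flag_contaminants(sample_hits, control_hits)
print(flagged)  # ['taxon_B', 'taxon_C']
```

A single rule like this is only a starting point. Low-biomass studies usually combine several independent filters, together with the experimental validation described in the text.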
To corroborate the 16S rRNA findings, results were further verified both experimentally and by independent computational means. Considerable effort went into validating the presence of bacteria using bacteria-specific fluorescence in situ hybridization (FISH) probes and immunohistochemistry, performed by the Straussman lab.
A paper describing the tumor microbiome project was published in Science in May 2020 (pdf) and was featured on its cover.

Remarks: Another paper involving 5R analysis of pancreatic cancer data was published earlier, in Science in 2017 (pdf). 5R has also been applied to profiling the skin microbiome in health and disease, in collaboration with Shiri Meshner's lab from the Arava Science Center. Several papers were published in a project funded by the Genomics studies for personalized medicine research grant of the Israel Ministry of Science, Technology & Space (pdf, pdf, pdf, pdf).



COMPRESSED SENSING APPLICATIONS IN BIOLOGY - FORMER STUDIES AND OUR COVID-19 INITIATIVE 


Former studies

In 2010 we presented a new application of compressed sensing to the problem of identifying novel mutations and their carriers in large cohorts of DNA samples using next-generation sequencing. The same method can also locate carriers of known mutations, e.g., in genetic screening (pdf). For a general overview of the method, see the write-ups in the Israeli press (the Haaretz daily and the Galileo popular science magazine, in Hebrew). We later validated the method by detecting rare de novo SNPs and their carriers in a cohort of more than 1000 samples of Sorghum bicolor (pdf).
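The pooling idea can be sketched as a sparse linear system. Each pool mixes DNA from a known subset of individuals. In this idealized, noise-free model, sequencing a pool reports how many mutant alleles it contains. Because carriers of a rare mutation are few, the per-individual dosage vector is sparse. Compressed sensing solvers typically recover such vectors through convex (l1-type) optimization. For a toy of this size, the sketch below simply enumerates all sparse candidates instead. The pooling matrix, the dosage model (0, 1, or 2 mutant alleles per individual), and all names are illustrative assumptions, not the published design.

```python
from itertools import combinations, product


def pool_measurements(design, dosage):
    """Mutant-allele count observed in each pool: y = design @ dosage (noise-free)."""
    return [sum(m * d for m, d in zip(row, dosage)) for row in design]


def decode_sparse(design, y, max_carriers):
    """Return the sparsest dosage vector (entries in {0, 1, 2}) that reproduces y exactly."""
    n = len(design[0])
    for k in range(max_carriers + 1):  # try sparser explanations first
        for support in combinations(range(n), k):
            for values in product((1, 2), repeat=k):
                dosage = [0] * n
                for i, v in zip(support, values):
                    dosage[i] = v
                if pool_measurements(design, dosage) == y:
                    return dosage
    return None  # nothing with at most max_carriers carriers explains y


# Hypothetical design: 6 pools (rows) x 8 individuals (columns); each individual is in 3 pools.
design = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 0],
]
true_dosage = [0, 0, 1, 0, 0, 0, 1, 0]  # individuals 2 and 6 are heterozygous carriers
y = pool_measurements(design, true_dosage)

decoded = decode_sparse(design, y, max_carriers=2)
print(y)                                          # [0, 2, 0, 1, 1, 2]
print([i for i, d in enumerate(decoded) if d])    # [2, 6]
```

Six pools suffice here because the design and the sparsity make the answer unique. Exhaustive search grows combinatorially, which is why large cohorts call for compressed sensing solvers.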
This project was funded by the US-Israel Binational Agricultural Research & Development Fund (BARD).

Covid-19 application

In March 2020, when the Covid-19 pandemic hit Israel, we realized that the approach described above could be applied to detect infected individuals. Recent reports suggest that 10-40% of SARS-CoV-2-infected patients are asymptomatic and that significant viral shedding may occur before symptom onset. A major bottleneck in managing the COVID-19 pandemic in many countries is diagnostic testing. Testing is performed primarily on symptomatic patients, due to limited laboratory capacity and limited access to genome-extraction and Polymerase Chain Reaction (PCR) reagents. At the same time, there is an urgent need to increase diagnostic testing capacity to allow large-scale screening as part of a test-trace-isolate strategy. In fact, such tests will be routinely required until a vaccine is developed. In collaboration with Prof. Tomer Hertz and Prof. Angel Porgador from Ben-Gurion University (BGU), we developed P-BEST, a method for Pooling-Based Efficient SARS-CoV-2 Testing. P-BEST identifies all positive subjects within a large set of samples using a single round of testing. It can be configured based on the carrier rate of a given population. For example, if the carrier rate is below 1%, the method provides an 8-fold improvement in testing efficiency. In our proof-of-concept study, we pooled sets of 384 samples into 48 pools and successfully identified up to 5 positive carriers within these sets. We then used P-BEST to screen 1115 healthcare workers using 144 tests.
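The pooling arithmetic can be checked with a small sketch. Pooling 384 samples into 48 pools gives 384 / 48 = 8 samples per test, matching the 8-fold efficiency figure. The sketch assigns each sample to a few pools at random. This is a hypothetical design with an assumed `pools_per_sample` parameter; P-BEST uses its own structured pooling design and a compressed-sensing decoder. The binary pool results are decoded with a simpler combinatorial rule: any sample that appears in a negative pool is negative. In this noise-free setting the rule never misses a true positive, but it may leave some negatives as unresolved candidates.

```python
import random


def random_design(n_samples, n_pools, pools_per_sample, seed=0):
    """Assign each sample to `pools_per_sample` distinct pools; returns one set of pools per sample."""
    rng = random.Random(seed)
    return [set(rng.sample(range(n_pools), pools_per_sample)) for _ in range(n_samples)]


def pool_results(design, n_pools, positives):
    """A pool tests positive if it contains at least one positive sample (noise-free)."""
    positive_pools = set()
    for s in positives:
        positive_pools |= design[s]
    return [p in positive_pools for p in range(n_pools)]


def decode_comp(design, results):
    """Keep as candidates only the samples whose pools all tested positive."""
    return {s for s, pools in enumerate(design) if all(results[p] for p in pools)}


N_SAMPLES, N_POOLS = 384, 48
design = random_design(N_SAMPLES, N_POOLS, pools_per_sample=6)
positives = {5, 117, 300}  # hypothetical carriers
results = pool_results(design, N_POOLS, positives)
candidates = decode_comp(design, results)

print("samples per test:", N_SAMPLES / N_POOLS)        # samples per test: 8.0
print("all positives kept:", positives <= candidates)  # all positives kept: True
print("candidates:", sorted(candidates))
```

Any candidates beyond the true positives would need a stronger decoder or retesting. P-BEST's goal of resolving all positives in a single round is what motivates its more elaborate approach.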
In July 2020 we established a startup company, Poold Diagnostics, to pursue our academic project. In August 2020 we obtained clinical approval from the Israeli Ministry of Health, following a pilot study that was conducted at BGU and the Soroka University Medical Center.
A paper describing the method was published in Science Advances (pdf) and has been covered by multiple news outlets: the New York Times, Scientific American, the Jerusalem Post, Haaretz, Globes (Hebrew), the BBC and Israel TV Channel 12.





FORMER PROJECTS

The antigen microarray project
The antigen microarray is a high-throughput device that provides an "immunological profile" of a person based on a blood sample. We were involved in the algorithmic part of this project, which relies heavily on machine learning techniques. This was Noam Shental's main postdoctoral project. He later headed the bioinformatics division of ImmunArray Inc., a startup company established to pursue the project. Our findings in Systemic Lupus Erythematosus, Scleroderma and Pemphigus Vulgaris appeared in several publications (pdf, pdf, pdf).


Noam's Ph.D. work in machine learning


Algorithms for semi-supervised learning
We dealt with two scenarios in semi-supervised learning. First, the classical scenario of a large unlabelled data set accompanied by a small labelled set (pdf). Second, the scenario where partial supervision is provided in the form of equivalence constraints (pdf, pdf, pdf, pdf), which can also be viewed as a constrained clustering problem (pdf).
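Positive equivalence constraints ("these two points belong to the same class") are transitive. A natural first step in many constrained-clustering methods is therefore to merge them into connected groups of points, sometimes called chunklets. Below is a minimal union-find sketch, assuming points are indexed 0..n-1 and constraints are given as index pairs. It shows only this preprocessing step and a consistency check for negative constraints, not the learning algorithms in the cited papers.

```python
def chunklets(n_points, positive_pairs):
    """Transitive closure of positive constraints via union-find; returns sorted groups."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in positive_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups
    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def inconsistent_negatives(groups, negative_pairs):
    """Negative ('different class') constraints that fall inside one chunklet contradict the positives."""
    label = {i: g for g, members in enumerate(groups) for i in members}
    return [(a, b) for a, b in negative_pairs if label[a] == label[b]]


groups = chunklets(7, [(0, 1), (1, 2), (4, 5)])
bad = inconsistent_negatives(groups, [(0, 2), (3, 6)])
print(groups)  # [[0, 1, 2], [3], [4, 5], [6]]
print(bad)     # [(0, 2)]
```

Each chunklet can then be treated as a single unit, for instance by forcing all its points into the same cluster.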
 
Applications of graphical models for clustering and segmentation
We represented the problem of data clustering as an inference problem in an undirected graphical model, and applied Generalized Belief Propagation in order to solve it (pdf). An application to image segmentation was also considered (pdf).

A graphical models approach for storage (hard disks) and communications (cellular phones) applications
We mapped specific hard problems in electrical engineering to graphical models and solved them almost optimally (pdf, pdf, pdf).