MICROBIAL PROFILING AND METAGENOMICS PROJECTS.
Profiling microbial mixtures refers to identifying the bacteria in a given community and their abundances using massively parallel sequencing. We have developed a method that profiles any community at extremely high phylogenetic resolution and accuracy. The method is termed COMPASS, which stands for Convex Optimization for Microbial Profiling by Aggregating Short Sequence reads (pdf). A theoretical analysis of COMPASS appears in (pdf), and an application of the method appears in (pdf).
COMPASS significantly increases profiling resolution using short reads, yet it requires an additional shearing step. This step turned out to be non-uniform, i.e., different locations along the long amplicon were not uniformly sonicated, resulting in uneven coverage. In a follow-up project, we sought an approach that would increase phylogenetic resolution while being experimentally simpler, and thus easier to implement and disseminate in the community. Instead of amplifying a single long region as in COMPASS, we turned to amplifying several short regions and then computationally combining their results into a single coherent profiling solution. The approach, termed Short MUltiple Regions Framework (SMURF), solves a convex optimization problem whose solution is the most likely mixture of bacteria that gave rise to the given set of reads from the different regions. SMURF can be applied to any number of regions in a mix-and-match fashion, in particular using any combination of common primer pairs (e.g. V1-V3, V3-V5, V4, etc.). The standard experimental procedure therefore remains unchanged, and the experimentalist simply selects any set of regions. The de facto amplicon length of SMURF is the total length across amplified regions, which significantly increases the phylogenetic resolution.
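The idea of combining reads from several short regions into one most-likely mixture can be illustrated with a toy sketch. This is not the SMURF implementation: the reference table, the read counts, the function name estimate_mixture, and the use of a plain EM iteration (in place of the convex solver mentioned above) are all assumptions made for illustration. The sketch further assumes error-free reads, exact sequence matching, and that every bacterium is amplified in every region.

```python
# Toy illustration only; not the SMURF implementation.
# Hypothetical reference: the sequence each bacterium carries in each region.
# Region 1 cannot separate A from B; region 2 cannot separate B from C.
REFERENCE = {
    "A": {"region1": "s1", "region2": "t1"},
    "B": {"region1": "s1", "region2": "t2"},
    "C": {"region1": "s2", "region2": "t2"},
}

# Hypothetical read counts per (region, observed sequence).
READ_COUNTS = {
    ("region1", "s1"): 80,
    ("region1", "s2"): 20,
    ("region2", "t1"): 50,
    ("region2", "t2"): 50,
}


def estimate_mixture(reference, read_counts, iterations=500):
    """Maximum-likelihood mixture frequencies via EM (an assumed stand-in solver)."""
    names = sorted(reference)
    freqs = {name: 1.0 / len(names) for name in names}  # uniform start
    for _ in range(iterations):
        expected = {name: 0.0 for name in names}
        for (region, seq), count in read_counts.items():
            # Bacteria whose reference sequence in this region matches the read.
            compatible = [n for n in names if reference[n][region] == seq]
            weight = sum(freqs[n] for n in compatible)
            if weight == 0.0:
                continue  # read explained by no bacterium; ignored in this toy
            # E-step: split the reads among compatible bacteria by current frequency.
            for n in compatible:
                expected[n] += count * freqs[n] / weight
        # M-step: renormalize the expected read counts into frequencies.
        assigned = sum(expected.values())
        freqs = {n: expected[n] / assigned for n in names}
    return freqs


mixture = estimate_mixture(REFERENCE, READ_COUNTS)
print({name: round(value, 3) for name, value in mixture.items()})
```

In this toy example neither region alone determines all three frequencies, but the two regions together do: the estimate approaches 0.5, 0.3 and 0.2 for A, B and C.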
Since 2014 my group has been closely collaborating with Ravid Straussman from the Weizmann Institute of Science on a large-scale project which aims to profile bacteria in a tumor microenvironment across many cancer types. This ambitious project required, as infrastructure, a protocol that would allow high resolution and robust microbial profiling under extremely harsh conditions. First, upon embarking on this project, we understood that, if bacteria are indeed present in tumors, their biomass is extremely low, and a specific type of 16S rRNA profiling should be devised. Second, many tumor samples are stored in formalin-fixed paraffin-embedded (FFPE) blocks, a process that is known to degrade DNA, so classical 16S rRNA sequencing (e.g. V3-V4) would fail to profile them, since DNA molecules may be shorter than required. Third, we wanted to devise a protocol that would not increase workload compared to standard methods, thus enabling the efficient screening of thousands of tumor samples. The devised protocol, termed 5R, was a specific application of SMURF where five short regions along the 16S rRNA gene, each of length 160-240bp, are amplified in multiplex. The five regions cover about 1000bp, thus allowing extremely high phylogenetic resolution when combined by SMURF; the short length of each region allows profiling of FFPE samples; workload is identical to standard methods requiring a single PCR reaction. 5R was developed by Garold Fuks from the Shental group and by Deborah Nejman from the Straussman lab.
5R and SMURF reconstruction were applied to profile more than 2500 tumors across eight tumor types (breast, lung, melanoma, pancreas, ovary, colon, bone, and glioblastoma). About half of the samples were extracted from FFPE blocks and the rest from snap frozen tissue; each tumor type was recruited from two to four centers in Israel and in the USA; samples included both tumor and normal adjacent tissue. The analysis sought to identify bacteria in different cancer types or in specific conditions (e.g. bacteria present in tumor vs. adjacent normal tissue in breast cancer), and to correlate bacteria with specific clinical data (e.g. smoking status in lung cancer). The extremely low bacterial load in the tumor microenvironment, combined with the plethora of environmental contaminants, posed considerable challenges and required developing specific data analysis methods to overcome contamination and other confounding factors.
To corroborate the 16S rRNA findings, results were further verified both experimentally and by independent computational means. A significant effort was made to validate the presence of bacteria using bacteria-specific fluorescence in situ hybridization probes and immunohistochemistry, both performed by the Straussman lab.
A paper describing the tumor microbiome project was published in Science in May 2020 (pdf), and was featured on its cover.
Remarks: Another paper involving 5R analysis of pancreatic cancer data was previously published in Science in 2017 (pdf). 5R has also been applied for profiling skin microbiome in health and disease in collaboration with Shiri Meshner's lab from the Arava Science center. Several papers were published in a project funded by the Genomics studies for personalized medicine research grant of the Israel Ministry of Science, Technology & Space (pdf, pdf, pdf, pdf).
APPLICATIONS IN BIOLOGY - FORMER STUDIES AND OUR
We presented a novel application of Compressed Sensing to the problem of identifying novel mutations and their carriers in large cohorts of DNA samples via Next Generation Sequencing technology. The same method can also locate carriers of known mutations, e.g., in the case of genetic screening (pdf).
For a general overview of the method, see the write-ups in Israeli newspapers (Haaretz daily and Galileo popular science magazine, in Hebrew). We later validated the method by detecting rare de novo SNPs and their carriers in a cohort of more than 1000 samples of Sorghum bicolor (pdf).
This project was funded by the US-Israel Binational Agricultural Research & Development Fund (BARD).
In March 2020, when the COVID-19 pandemic hit Israel, we realized that we could apply the above-mentioned approach to detect infected individuals. Recent reports suggest that 10-40% of SARS-CoV-2 infected patients are asymptomatic and that significant viral shedding may occur prior to symptom onset. A major bottleneck in managing the COVID-19 pandemic in many countries is diagnostic testing, which is primarily performed on symptomatic patients due to limited laboratory capabilities as well as limited access to genome-extraction and Polymerase Chain Reaction (PCR) reagents. On the other hand, there is an urgent need to increase diagnostic testing capabilities in order to allow large scale screening as part of a test-trace-isolate strategy. In fact, such tests will be routinely required until a vaccine is developed. In collaboration with Prof. Tomer Hertz and Porgador, from Ben-Gurion University (BGU), we developed P-BEST, a method for Pooling-Based Efficient SARS-CoV-2 Testing, which identifies all positive subjects within a large set of samples using a single round of testing. P-BEST can be configured based on the carrier rate of a given population. For example, if the carrier rate is below 1%, the method provides an 8-fold improvement in testing efficiency. In our proof-of-concept study we pooled sets of 384 samples into 48 pools and successfully identified up to 5 positive carriers within these sets. We then used P-BEST to screen 1115 healthcare workers using 144
In July 2020 we established a startup company, Poold Diagnostics, to pursue our academic project. In August 2020 we obtained clinical approval from the Israeli Ministry of Health, following a pilot study that was conducted at BGU and the Soroka University Medical Center.
A paper describing the method was published in Science Advances (pdf), and has been covered by multiple news outlets: New York Times, Scientific American, Jerusalem Post, Haaretz, Globes (Hebrew), BBC and Israel TV Channel 12.
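The single-round pooling idea behind P-BEST (384 samples tested in 48 pools, an 8-fold reduction in tests) can be sketched as follows. This is a simplified illustration and not the published decoder: the random pooling design, the choice of 6 pools per sample, the noise-free qualitative pool results, and the elimination-style decoder (clear every sample that appears in a negative pool) are assumptions made for this sketch, as are all function names.

```python
import random

N_SAMPLES = 384        # from the proof-of-concept study
N_POOLS = 48           # from the proof-of-concept study
POOLS_PER_SAMPLE = 6   # assumption for this sketch


def build_design(n_samples, n_pools, pools_per_sample, seed=0):
    """Assign each sample to several distinct pools (hypothetical random design)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_pools), pools_per_sample))
            for _ in range(n_samples)]


def run_pools(design, positives, n_pools):
    """Simulate noise-free tests: a pool is positive if it holds any positive sample."""
    results = [False] * n_pools
    for sample in positives:
        for pool in design[sample]:
            results[pool] = True
    return results


def decode(design, pool_results):
    """Keep only samples whose pools are all positive.

    Any sample appearing in a negative pool is certainly negative, so this
    returns a superset of the true positives (false candidates are possible).
    """
    return [sample for sample, pools in enumerate(design)
            if all(pool_results[p] for p in pools)]


design = build_design(N_SAMPLES, N_POOLS, POOLS_PER_SAMPLE)
true_positives = {7, 120, 301}   # hypothetical carriers, below a 1% carrier rate
pool_results = run_pools(design, true_positives, N_POOLS)
candidates = decode(design, pool_results)
efficiency = N_SAMPLES / N_POOLS  # tests saved relative to one test per sample

print(f"{N_SAMPLES} samples in {N_POOLS} tests ({efficiency:.0f}-fold fewer tests)")
print("candidate carriers:", candidates)
```

Under these assumptions every true carrier is always among the candidates; in practice any remaining false candidates would need a more careful decoder or a confirmatory test.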
The antigen microarray is a high throughput device that provides an "immunological profile" of a person based on a blood sample. We were involved in the algorithmic part of this project, which relies heavily on machine learning techniques. This was Noam Shental's main postdoctoral project, and he later headed the bioinformatics division in a startup company, ImmunArray Inc., that was established in order to pursue the project. Our findings in Systemic Lupus Erythematosus, Scleroderma and Pemphigus Vulgaris appeared in several publications (pdf, pdf, pdf).
Ph.D. work in machine learning
We dealt with two scenarios in semi-supervised learning. First, we addressed the classical scenario of a large unlabelled data set accompanied by a small labelled set (pdf). Second, we considered the scenario where partial supervision is provided in the form of equivalence constraints (pdf, pdf, pdf, pdf), which can also be viewed as a constrained clustering problem (pdf).
Applications of graphical models for clustering and segmentation
We represented the problem of data clustering as an inference problem in an undirected graphical model, and applied Generalized Belief Propagation in order to solve it (pdf). An application to image segmentation was also considered (pdf).
A graphical models approach for storage (hard disks) and communications (cellular phones) applications
We mapped specific hard problems in the field of electrical engineering to the field of graphical models, and solved them almost optimally (pdf, pdf, pdf).