The field of genomics is entering an exciting era with unprecedented opportunities for new medical insights, enabled by an enormous and ever-growing amount of genomic data. The data are characterized by highly distributed acquisition, huge storage requirements, and highly involved analyses that integrate heterogeneous information. My research is dedicated to identifying and addressing the challenges arising in the context of such data. This undertaking includes the design and development of new algorithms for coping with the distribution and storage of the data, for facilitating access to them, and for improving the analysis and inference performed on them.

I follow a multidisciplinary approach that combines tools from machine learning, information theory, and statistics to create a sound technical framework for tackling the challenges of modern genomic data.

Sample projects

This list is intended to give a brief overview of the kinds of projects carried out in my group. Note that my research is not limited to these topics.

  • Lossless and lossy compression of raw and aligned genomic data, to reduce storage costs and speed up the transmission of these files (some of the developed algorithms are being considered for inclusion in the standard for genomic data compression that is being developed by MPEG).

  • Lossless compression of the genomic variants related to a group of individuals, with random access capabilities. The goal is to facilitate the analysis and querying of genomic variants, which are crucial for clinical decision making.

  • Improve the methods for genomic variant discovery, so as to increase the accuracy of the identified variants. The goal is to increase the number of true positive variants and to decrease the number of false positives.

  • Denoising and error-correction schemes for raw and aligned genomic data that account for errors introduced by the sequencing machines.

  • Identification of genomic patterns in tumors, to enable the extension of effective therapies across the current tissue-based tumor boundaries.

  • Develop supervised learning methods for cancer RNA-seq data that can relate gene expression to tumor stage, survival rate, etc.

  • Develop methods to facilitate access to the data, for example by allowing querying in the compressed domain.