Cancer Genomics & Molecular Therapeutics

Breast cancer, Pancreatic adenocarcinoma, Paediatric leukaemia, Genome sequencing, Expression analysis, Transcriptomics, Biomarker discovery, Novel drug targets, Cancer database.

IRIS Webinar

Pancreatic cancer (PDAC) is the fourth leading cause of cancer death worldwide and is projected to become the second within a decade. Advances in therapy have achieved only incremental improvements in overall outcome and have not provided notable benefit for undefined subgroups of patients. The lack of specific genomic diagnostic markers, the difficulty of establishing a tissue diagnosis, and the aggressive nature of PDAC, which responds poorly to standard treatment, all contribute to its exceptionally high mortality. Because symptoms in the early stages of PDAC are uncommon and nonspecific, early detection in clinical practice is challenging. There is therefore an urgent need to better understand the molecular pathology of PDAC, to identify biomarkers and novel drug targets for early diagnosis, and to develop novel therapeutic strategies. One of the Centre's primary objectives is to perform a comprehensive multiomic analysis of 75 pancreatic cancers with matched normal tissue samples to extract biological insights and potentially novel diagnostic biomarkers.

  • The Centre has collaborated with the Childs Trust Hospital, Chennai, to collect longitudinal samples of paediatric leukaemia from 55 children (before and after treatment) every month for the last three to four years.

  • A comprehensive longitudinal multiomic analysis of these paediatric leukaemia samples will help us understand the causes of remission and recurrence of leukaemia in children.

  • No large-scale genomic sequencing data are available for cancers of Indian origin, yet such data are critical for identifying population-specific cancer biomarkers and drug targets.

  • Currently available cancer biomarkers were developed from genomic information on Western populations and may not be suitable for diagnosis and treatment in the Indian population because of differences in genetic background and genomic heterogeneity.

  • Therefore, results from our analysis will be made publicly available as a national cancer database specific to the Indian population.

S. Mahalingam

Principal Investigator


Manikandan Narayanan

Co-Principal Investigator

Computer Science and Engineering

Himanshu Sinha

Karthik Raman


PDAC desmoplasia

Aim 1:

Comprehensive, integrated genomic analysis of pancreatic cancers with matched normal tissue samples from Indian cohorts using a combination of whole-exome and RNA sequencing.

Pancreatic cancer [pancreatic ductal adenocarcinoma (PDAC)] is an important public health problem and the fourth leading cause of cancer death worldwide, with little improvement in outcomes despite decades of research. It has among the poorest survival rates of all cancers. Owing to the lack of sensitive non-invasive genomic biomarkers, diagnosis often occurs only after the cancer has advanced locally to the point of being non-resectable or has metastasized to distant sites. At present, there are no successful specific non-invasive detection methods for PDAC. We propose to: 1. Perform a comprehensive, integrated genomic analysis of 75 pancreatic cancers with matched normal tissue samples from Indian cohorts using a combination of NGS-based whole-exome and transcriptome sequencing. 2. Determine the mutational mechanisms and candidate genomic events important in pancreatic carcinogenesis. 3. Perform RNA expression analysis to define subtypes and the different transcriptional networks that underpin them.

Aim 2:

To identify specific biomarkers for early detection and novel drug targets to design better therapeutics 

Based on the differential expression profile and the mutational landscape, confirmed by RT-qPCR and Sanger sequencing, the significantly deregulated genes will be selected to develop an NGS panel for early detection. Targeted NGS panels offer clinical advantages over other existing diagnostic methods: faster turnaround times and higher sequencing depths, which yield higher analytical sensitivity and specificity. The panel will be designed to interrogate SNVs, indels, CNVs/LOH, and both known and novel fusions involving major fusion partners. A major benefit of an NGS-based genetic testing strategy is the ability to detect multiple types of aberrations in a single assay. As an alternative, we will develop an antibody-based detection assay, depending on the availability of antibodies. We will test the detection limit of both assays before validating them on a large number of tumor samples.
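The gene-selection step can be sketched as a standard false-discovery-rate filter. This is a minimal sketch, not the Centre's method: it assumes differential expression results are available as (gene, log2 fold change, p-value) tuples, applies the Benjamini-Hochberg adjustment, and uses FDR < 0.05 and |log2 fold change| >= 1 as hypothetical cutoffs. The gene names in the usage example are placeholders.

```python
# Minimal sketch (not the Centre's actual pipeline): pick candidate panel genes
# from differential-expression results. The cutoffs below are hypothetical.

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Step up from the largest p-value so adjusted values stay monotone and <= 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_panel_candidates(results, fdr=0.05, min_abs_log2fc=1.0):
    """results: list of (gene, log2_fold_change, p_value) tuples (assumed format)."""
    padj = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, padj)
        if q < fdr and abs(lfc) >= min_abs_log2fc
    ]


if __name__ == "__main__":
    toy = [
        ("GENE_A", 2.5, 0.001),
        ("GENE_B", -1.8, 0.004),
        ("GENE_C", 0.3, 0.0001),  # significant but small effect: excluded
        ("GENE_D", 3.0, 0.2),     # large effect but not significant: excluded
    ]
    print(select_panel_candidates(toy))  # ['GENE_A', 'GENE_B']
```

Requiring both a significance and an effect-size cutoff keeps statistically significant but biologically small changes (GENE_C) out of the panel.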

Expected deliverables of the research

The genomic sequences produced by the proposed work will be critical for developing an India-specific cancer genome database, which will in turn be crucial for developing cancer-specific biomarkers for early detection and for understanding patients' treatment responses. Biomarkers identified in this work will be critical for developing reverse transcription quantitative PCR (RT-qPCR) and sequencing-based early diagnostic kits. The identified drug targets will also be used to develop novel anti-cancer therapeutics for pancreatic cancer and pediatric leukemia. Preclinical cancer models will be of immense help for drug screening and for understanding in vivo cancer pathogenesis; successfully developed organoid models may replace animal models for cancer drug screening and genomics. A similar approach will be extended to other cancers prevalent in India, which could help launch a start-up company in cancer therapeutics or diagnostics.

Current status

  • Isolated DNA from cancer tissue and the matched normal tissue samples for exome analysis.
  • RNA isolation for transcriptome analysis is in progress.


International Collaborations

  • Established a collaboration with the groups of Dr. C. Charbel and Dr. Ygal Haupt at the Peter MacCallum Cancer Centre, Australia, on cancer genomics and animal models for understanding cancer pathogenesis. Together, we have submitted a joint proposal to the recent call of the Indo-Australian Biotechnology Fund (Indian PIs: Dr. S. Mahalingam and Dr. Karthik Raman; Australian PIs: Dr. C. Charbel and Dr. Y. Haupt). We have also initiated discussions on student exchange between the two institutes. We may also attract YIF from Australia.
  • We have signed an MoU with Omics Data Automation, USA, and have been working with them on a cancer genome database for the past year; we have initiated development of India's first breast cancer genome database, with 340 breast cancer exomes and transcriptomes.
  • Established a collaboration with Dr. K. Rajalingam's group at the University of Mainz, Germany, on cancer pathogenesis and novel drug target identification; we have already published two research articles in high-impact international peer-reviewed journals (Journal of Biological Chemistry, 2018a, b).
  • This CoE proposal integrates well with existing and upcoming master's programmes in computational biology and data science. The novel datasets generated through this study will also have considerable pedagogical value. Courses on cancer genomics and on machine learning applications in biology could be offered as attractive electives in the interdisciplinary programmes already running at IIT Madras, as well as in the proposed International Master's programmes.
  • To create awareness of cancer genomics and therapeutics among faculty and students from Indian research institutes, universities and medical institutions, we established the “CANCERCON” international conference series and have successfully conducted CANCERCON 2010, 2014 and 2018 in collaboration with the Cancer Research and Relief Trust, Chennai. More than 70 international and national faculty members and more than 450 students from around the world participated, along with representatives from many pharmaceutical companies. To the best of our knowledge, this was the only conference in India sponsored and attended by journals and organizations such as Nature, Science, Cell, EMBO and the European Biochemical Society.
  • Given the high demand for experts in cancer genome data analysis in India, conferences and training workshops in this area are urgently needed; the proposed CoE has the potential to establish an international conference/workshop series on cancer omics along the lines of the Gordon and EMBO conferences.
  • We are in discussion with The Cancer Genome Atlas (TCGA), USA, to jointly organize annual conferences on cancer genomics and data analysis at IIT Madras, Chennai.

Industrial collaborations

  • Signed an MoU with Omics Data Automation, USA, and have been working with them on a cancer genome database for the past year. We have completed India's first breast cancer genome database, with 340 breast cancer whole exomes and transcriptomes.
  • Signed an MoU with Indivumed, Germany, and are working together on cancer tissue sample collection for genomics, on a workflow for automating cancer pathology (tissue immunohistochemistry), and on integrating patient electronic health records with genomic data to understand treatment response.
  • Bristol Myers Squibb has approached our group to collaborate on the identification and validation of novel drug targets for pediatric leukemia.
  • Initiated discussions with Janssen Pharmaceuticals (Johnson & Johnson) on developing a diagnostic tool for minimal residual disease (MRD) to assess drug treatment response in pediatric cancers.
  • The proposed CoE will increase our group's visibility in cancer genomics, both nationally and internationally, and attract consultancy projects and industrial collaborations on the development and validation of cancer biomarkers for early detection and treatment-response assessment, as well as on the identification of drug targets for novel therapeutics. Importantly, the cancer organoid model developed in this CoE is a unique and ideal system for drug screening; it could help select the most effective drugs without trial and error, moving towards personalized cancer treatment. Our cancer genome datasets and access to cancer tissue samples for validation will give us a distinctive position among similar research groups and institutes in India. The cancer organoid model is also expected to attract collaborations with many pharmaceutical companies.
  • We have already signed an MoU with the Indian Academy of Pediatrics on developing biomarkers for early detection and designing treatment strategies for better disease management.

Societal impact

India is on the verge of a cancer epidemic. Sequence data from the proposed CoE will help build an India-specific cancer genome database, which is critical for identifying and developing biomarkers for early detection and for understanding drug response. It will also help identify drug targets for novel therapeutics. No Indian population-specific cancer genome data are available today, so all studies rely on data from Western populations; given the genomic heterogeneity between populations, this contributes to patient survival rates in India that are much lower than in Western countries. The proposed CoE is the first step towards an India-specific database.

Because cancer incidence in India is very high, many pharmaceutical companies are seeking India-specific cancer genome data to establish biomarkers for specific therapeutics. Results from the proposed CoE will have a major impact on cancer research and drug development in India. They will also help represent India in the International Cancer Genome Consortium with pediatric and pancreatic cancer genome sequences. Notably, these novel India-specific datasets will be a major contribution to society and to the clinical and research communities.

Sustenance statement

Once established in phases, the centre will be fully functional and will offer professional and technical services to the scientific community, supporting research in a variety of areas, mainly the identification of cancer-specific biomarkers and drug discovery and development from the laboratory to the industry level.

The centre can generate resources through professional training programs, technical services, collaborative research programs, public-private partnerships, extension and consultancy activities for industry, and by providing incubation facilities to the BioPharma sector for optimisation and validation studies.

In this way, the centre can generate revenue by making optimal use of its infrastructure. Drawing on the Centre's expertise, partner companies can use the infrastructure and the technical expertise of its personnel to train their existing and prospective customers and their own workforce.

In addition, the centre can host summer/winter schools, training programs and workshops in related fields. Academic institutions, universities, medical institutions, engineering and technical institutions, biotechnology R&D organizations and BioPharma industries can make optimal use of this facility. Attempts will be made to generate intellectual property that could be transferred to industry for commercialization.

Technical/ Scientific Progress

New work done in the project

We have collected tumor tissue with matched adjacent normal tissue and blood samples from 75 pancreatic cancer patients. We isolated tissue DNA from the tumor and normal tissue samples and germline DNA from PBMCs, and performed whole-exome sequencing. Preliminary analysis suggests the occurrence of unique genomic variations in the Indian population. A detailed report is attached.

Infrastructure developments


Preliminary data was presented in the webinars

Whole exome Sequencing Analysis of Pancreatic cancers of Indian origin

The QC-passed library pools of pancreatic cancer DNA, matched adjacent normal tissue DNA and germline DNA from 75 patients were sequenced on an Illumina NovaSeq 6000 according to the manufacturer's instructions. The sequence data were processed with fastp to remove low-quality bases and adapter sequences. The processed reads were mapped to the human reference genome build hg19 using the BWA-MEM algorithm. The alignments were converted to BAM files and sorted using samtools. Duplicate reads were marked and removed using Picard. Germline variants were called using the Strelka2 variant caller. Somatic variants were called with VarScan2 using tumor-normal sample pairs, with the minimum read count and p-value cutoff set to 10 and 0.01, respectively. Variants were annotated using the Ensembl Variant Effect Predictor (VEP). Driver genes were identified by analyzing the somatic variants with MutSigCV. On average, 29.7 million paired-end reads (10 GB of data) were generated per sample, and 99% of the reads mapped to the reference genome.
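The processing steps above can be laid out as a per-patient command sequence. The sketch below only builds command strings (a dry run) under stated assumptions: file names, the `hg19.fa` reference path and the thread count are placeholders; the `picard` and `varscan` launchers are assumed to be on the PATH (for example via Bioconda; otherwise `java -jar VarScan.jar`); the report's minimum read count of 10 is assumed to correspond to VarScan2's `--min-coverage`; and MutSigCV, which runs on the pooled cohort, is left out. The exact parameters used by the Centre are not given in the report.

```python
# Hypothetical dry-run sketch of the per-patient workflow described above.
# File names, thread count and reference path are placeholders; MutSigCV
# (run on the pooled cohort) is omitted. Nothing is executed here.

def wes_commands(patient, ref="hg19.fa", threads=8):
    tumor, normal = f"{patient}_tumor", f"{patient}_normal"
    cmds = []
    for s in (tumor, normal):
        cmds += [
            # fastp: trim low-quality bases and auto-detect paired-end adapters.
            f"fastp -i {s}_R1.fq.gz -I {s}_R2.fq.gz "
            f"-o {s}_R1.trim.fq.gz -O {s}_R2.trim.fq.gz --detect_adapter_for_pe",
            # BWA-MEM alignment piped into samtools for a coordinate-sorted BAM.
            f"bwa mem -t {threads} {ref} {s}_R1.trim.fq.gz {s}_R2.trim.fq.gz "
            f"| samtools sort -o {s}.sorted.bam -",
            # Picard: mark and remove duplicate reads.
            f"picard MarkDuplicates -I {s}.sorted.bam -O {s}.dedup.bam "
            f"-M {s}.dup_metrics.txt --REMOVE_DUPLICATES true",
            f"samtools index {s}.dedup.bam",
            # Pileup input for VarScan2.
            f"samtools mpileup -f {ref} {s}.dedup.bam > {s}.pileup",
        ]
    # Strelka2 germline calling on the normal sample.
    cmds += [
        f"configureStrelkaGermlineWorkflow.py --bam {normal}.dedup.bam "
        f"--referenceFasta {ref} --runDir {patient}_strelka",
        f"{patient}_strelka/runWorkflow.py -m local -j {threads}",
    ]
    # VarScan2 somatic calling on the tumor-normal pair; mapping the report's
    # "minimum read count 10" to --min-coverage is an assumption.
    cmds.append(
        f"varscan somatic {normal}.pileup {tumor}.pileup {patient}.varscan "
        f"--min-coverage 10 --somatic-p-value 0.01 --output-vcf 1"
    )
    # Ensembl VEP annotation of the somatic SNV calls (offline GRCh37/hg19 cache).
    cmds.append(
        f"vep -i {patient}.varscan.snp.vcf -o {patient}.vep.txt "
        f"--cache --offline --assembly GRCh37"
    )
    return cmds


if __name__ == "__main__":
    for cmd in wes_commands("P01"):
        print(cmd)
```

Keeping the workflow as plain command strings makes it easy to review parameters before handing the steps to a workflow manager.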

Germline Variants:

Germline variant calling of the normal tissue and blood samples yielded a total of 121,715 variants: 118,666 SNVs and 3,049 indels. Of these, 2,525 were multiallelic, 2,263 of which were multiallelic SNVs. The germline variants are summarized in Figure 2.
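Tallies like these (the SNV and indel counts sum to the total, 118,666 + 3,049 = 121,715) can be computed directly from the REF and ALT columns of a VCF. The sketch below uses a simple classification convention of our own; the report does not say how its categories were defined, and real records would come from a VCF parser rather than a hard-coded list.

```python
from collections import Counter

# Minimal sketch with an assumed convention: a record is an SNV if REF and every
# ALT allele are single bases, an indel if any ALT length differs from REF, and
# "other" (e.g. an MNV) otherwise. Multiallelic = more than one ALT allele.

def variant_class(ref, alts):
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "SNV"
    if any(len(a) != len(ref) for a in alts):
        return "indel"
    return "other"


def summarize(records):
    """records: iterable of (REF, ALT) strings as in VCF columns 4 and 5."""
    counts = Counter()
    for ref, alt_field in records:
        alts = alt_field.split(",")
        cls = variant_class(ref, alts)
        counts[cls] += 1
        if len(alts) > 1:
            counts["multiallelic"] += 1
            if cls == "SNV":
                counts["multiallelic_SNV"] += 1
    return counts


if __name__ == "__main__":
    toy = [("A", "G"), ("A", "G,T"), ("AT", "A"), ("C", "CTT"), ("AC", "GT"), ("G", "A,GA")]
    print(summarize(toy))
```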

Figure 2. Summary of consequences of germline variants.

After low-priority variants were filtered out, 46,544 variants were retained. These were further filtered based on gnomAD and dbSNP allele frequencies, retaining 14,980 variants with allele frequency < 0.01. In addition to these, a few variants previously reported to be associated with any type of cancer were retained.
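The frequency filter above can be sketched as follows. This is a minimal illustration under assumptions not stated in the report: each variant is a dict with hypothetical `id`, `gnomad_af` and `dbsnp_af` keys, a variant absent from both databases is treated as rare, and the cancer-associated variants are supplied as a hypothetical set of IDs that bypass the frequency cutoff.

```python
# Minimal sketch of the allele-frequency filter described above. The record
# layout (dicts with "id", "gnomad_af", "dbsnp_af") and the whitelist of
# cancer-associated variant IDs are hypothetical.

def filter_rare(variants, max_af=0.01, cancer_associated=frozenset()):
    kept = []
    for v in variants:
        afs = [af for af in (v.get("gnomad_af"), v.get("dbsnp_af")) if af is not None]
        # Assumption: a variant absent from both databases is treated as rare.
        is_rare = all(af < max_af for af in afs)
        if is_rare or v["id"] in cancer_associated:
            kept.append(v)
    return kept


if __name__ == "__main__":
    toy = [
        {"id": "v1", "gnomad_af": 0.001, "dbsnp_af": 0.002},
        {"id": "v2", "gnomad_af": 0.2, "dbsnp_af": 0.18},
        {"id": "v3", "gnomad_af": None, "dbsnp_af": None},
        {"id": "v4", "gnomad_af": 0.05, "dbsnp_af": None},
    ]
    kept = filter_rare(toy, cancer_associated={"v4"})
    print([v["id"] for v in kept])  # ['v1', 'v3', 'v4']
```

Requiring the frequency to be below the cutoff in every database that reports it is the conservative choice; a looser variant would keep a site if either database alone called it rare.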

Somatic Variants:

The number of somatic variants per sample ranged from 56 to 507. The majority of somatic variants were missense, followed by synonymous and intron variants. The functional somatic variants are summarized in Figure 3.

Figure 3. Distribution of functional classes of somatic variants.

The majority of samples (>93%) carried somatic variants in the top 10 genes (Figure 4). Mutations in MUC4, DUX4L4 and PRSS3 were present in more than one sample.
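The per-sample counts and recurrence figures above come from simple tallies over the annotated somatic calls. A minimal sketch, assuming calls are available as (sample, gene) pairs and that genes are ranked by how many samples carry at least one variant (one common convention; the report does not state its ranking rule). Sample and gene names in the usage example are placeholders.

```python
from collections import Counter, defaultdict

# Minimal sketch: per-sample somatic burden, top genes ranked by the number of
# samples mutated (assumed convention), genes recurrent in more than one
# sample, and the fraction of samples hit by at least one top gene.

def somatic_summary(calls, top_n=10):
    """calls: iterable of (sample_id, gene) pairs, one per somatic variant."""
    calls = list(calls)
    per_sample = Counter(sample for sample, _ in calls)
    samples_per_gene = defaultdict(set)
    for sample, gene in calls:
        samples_per_gene[gene].add(sample)
    # Ties in sample count are broken alphabetically for a stable order.
    ranked = sorted(samples_per_gene, key=lambda g: (-len(samples_per_gene[g]), g))
    top = ranked[:top_n]
    recurrent = [g for g in ranked if len(samples_per_gene[g]) > 1]
    covered = {s for g in top for s in samples_per_gene[g]}
    # Denominator: samples with at least one somatic call.
    fraction_in_top = len(covered) / len(per_sample)
    return per_sample, top, recurrent, fraction_in_top


if __name__ == "__main__":
    toy = [("S1", "GENE_A"), ("S1", "GENE_A"), ("S1", "GENE_B"),
           ("S2", "GENE_A"), ("S2", "GENE_C"), ("S3", "GENE_D")]
    per_sample, top, recurrent, frac = somatic_summary(toy, top_n=2)
    print(top, recurrent, round(frac, 3))  # ['GENE_A', 'GENE_B'] ['GENE_A'] 0.667
```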

Figure 4. Frequency of somatic variants in the top 10 genes and somatic co-occurrence in pancreatic cancer samples of Indian origin.

The change point analysis revealed 5 regions in the genome with relatively high numbers of variants (Table 1; Figure 5).

Table 1. Change points identified in the genome.
Chromosome  Start      End        nMuts  Avg inter-mutation distance (bp)  Size (bp)
1           12887332   13635119   28     27695.81                          747787
1           152186422  154285522  22     99957.14                          2099100
1           168511349  247162611  13     6554271.83                        78651262
10          47087077   126678147  8      11370152.86                       79591070

Figure 5. Rainfall plot showing the distribution of somatic variants and change points in the genome.
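As a check on Table 1, the average inter-mutation distance equals the segment size divided by nMuts - 1, because the gaps between consecutive sorted mutations sum to the segment length (for example, 747,787 / 27 ≈ 27,695.81 bp in the first row). The sketch below reproduces that summary from a list of mutation positions and adds a simple least-squares single change-point search on log10 inter-mutation distances. That search is an illustrative stand-in, not necessarily the method used here (the report does not name it), and the toy positions are made up apart from the first row's start and end coordinates.

```python
import math

# Minimal sketch. segment_summary reproduces the Table 1 columns; best_split is a
# simple least-squares single change point on log10 gaps, shown for illustration
# only (the report does not name the change-point method it used).

def inter_mutation_distances(positions):
    pos = sorted(positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def segment_summary(chrom, positions):
    pos = sorted(positions)
    size = pos[-1] - pos[0]
    n = len(pos)
    return {
        "Chromosome": chrom, "Start": pos[0], "End": pos[-1], "nMuts": n,
        # Mean of the n - 1 gaps, which telescopes to size / (n - 1).
        "Avg inter-mutation distance": round(size / (n - 1), 2),
        "Size": size,
    }


def best_split(distances):
    """Index k minimizing squared error of log10 gaps split into [:k] and [k:]."""
    x = [math.log10(d + 1) for d in distances]

    def sse(seg):
        mean = sum(seg) / len(seg)
        return sum((v - mean) ** 2 for v in seg)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(x)):
        cost = sse(x[:k]) + sse(x[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


if __name__ == "__main__":
    # 28 hypothetical positions spanning the first Table 1 segment on chromosome 1.
    positions = [12887332 + 1000 * i for i in range(27)] + [13635119]
    print(segment_summary("1", positions)["Avg inter-mutation distance"])  # 27695.81
    print(best_split([10**6, 10**6, 10**6, 10, 10, 10]))  # 3
```

Working on log10 distances keeps a few very large gaps from dominating the squared-error cost, which is also why rainfall plots draw inter-mutation distance on a log scale.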


Visits planned for PI, co-PIs, international collaborators and students (both inbound and outbound)




Industrial Engagement

Exploring the development of genomic diagnostic markers and drug screening assays, in collaboration with industry, for early detection of pancreatic cancer in India.

University Engagement

Establishing a collaboration with St. Jude Children's Research Hospital, USA, to develop preclinical cancer models for drug screening and for understanding cancer pathogenesis.


Relevant Updates

Established the tissue storage, imaging and next-generation sequencing facilities.

Facilities available at Centre for Cancer Genomics and Molecular Therapeutics

Class 100,000 Cancer Tissue Storage Facility

Sequencing Facility
Primary Cell Culture Facility
Advanced Imaging Facility