Delorenzi Mauro, Associate scientist and Head of the Bioinformatics Core Facility (BCF)
Mauro Delorenzi studied molecular biology and received his PhD from the University of Zürich in 1989 for research on homeotic gene expression in Drosophila development. He later also earned a master's degree in statistics for work on Hidden Markov Models. In bioinformatics, he first worked on genome data mining and in Terry Speed’s group in Melbourne, and joined ISREC in 2002.
Gene expression bioinformatics and translational cancer research
The Bioinformatics Core Facility (BCF) aims to establish a center of bioinformatics competence for services and collaborations in the design, mathematical-statistical analysis and interpretation of high throughput experiments, with a priority for cancer projects and the study of patterns associated with biological functions of medical relevance.

URLs for resources maintained by the group

http://www.isrec.isb-sib.ch/BCF/index.html
Homepage with links to other resources
http://www.isrec.isb-sib.ch/BCF/preproc.html
The spotted Microarray normalization and quality control server
http://www.isrec.isb-sib.ch/webmarcoil/webmarcoilC1.html
Web-Marcoil, our coiled-coil predictor server

Methods for high throughput data analysis

The BCF collaborates with the Lausanne DNA Array Facility (DAF) to assist experimentalists with the design, standardization, quality control and data processing required to generate high quality microarray hybridization data. The focus of the BCF is on the bioinformatics and biostatistical aspects of using gene expression data (microarray and RT-PCR) to discover and understand patterns of cell activity associated with phenotypes of interest, such as sensitivity of tumors to chemotherapy or survival of cancer patients under different treatments.

There are two major aspects:

  • Development of generic methods and resources (data repositories, programs)
  • Support of and collaboration in several projects at various levels, from consulting and simple data analytical operations to the realization of complex studies.

The information generated in a typical experiment is multifaceted and complex. To mine the data and to statistically evaluate and interpret the patterns, a growing set of commercial and public domain software is available through the BCF and the DAF, and we can advise on its use. The BCF is also becoming an interface between users and other academic developers and computing centers, most of them within the Swiss Institute of Bioinformatics (SIB).

One serious shortcoming of most clinical microarray studies is insufficient sample size. This can be partially addressed by pooling all existing data and by adopting methods from statistical meta-analysis. A substantial amount of public data has accumulated in breast cancer, which can be very useful for discovery and validation. The expression and clinical data are being organized in a repository called SwissBrod (Swiss Breast Oncology Database). In parallel, the BCF is developing software that integrates some well established methods from classical statistics and some recent methods for high dimensional data in a unified framework. Our implementation can cope with heavy computational load better than, for example, the most popular tool in the field, the R statistical software. The BCF plays an instrumental role in creating synergies among medical-clinical and biochemical researchers, the data generating high throughput facilities and bioinformatics experts.
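As an illustration of the meta-analytic idea mentioned above, per-study effect estimates can be combined with inverse-variance weights. The following is a minimal sketch of fixed-effect pooling of log hazard ratios, not the BCF software itself; the function name and the per-study numbers are hypothetical and serve only as an example.

```python
import math


def pooled_log_hazard_ratio(estimates):
    """Fixed-effect (inverse-variance) pooling of per-study log hazard ratios.

    `estimates` is a list of (log_hr, standard_error) pairs, one per study.
    Returns (pooled_log_hr, pooled_standard_error).
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * log_hr for w, (log_hr, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical per-study values, for illustration only (not data from our studies).
studies = [(math.log(2.0), 0.40), (math.log(3.0), 0.50), (math.log(2.5), 0.30)]
log_hr, se = pooled_log_hazard_ratio(studies)
hazard_ratio = math.exp(log_hr)
# Approximate 95% confidence interval on the hazard-ratio scale.
ci_95 = (math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se))
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is more precise than any single study when the studies measure the same effect.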

Selected Collaborative Project: Grading of breast cancer with gene expression profiling

Histologic grade in breast cancer has provided clinically important prognostic information for many years. However, 30-60% of tumors are classified as histologic grade 2. Unlike grade 1 (low risk of early metastasis) and grade 3 (high risk), this grade is associated with an intermediate risk of recurrence and is thus not useful for clinical decision-making. Histologic grade may also suffer from a degree of subjectivity and is difficult to assess precisely with the available tools and methods.

Figure 1: Patterns of expression of grade-related genes and their association with histologic grade (HG) and relapse-free survival.
Matrices of relative gene expression values are shown as heat maps. Heat maps are grids of rectangles with colors that indicate the value of the matrix elements, where high expression is red and low expression is green. Rows correspond to genes, sorted according to the gene-specific association with histologic grade. Columns correspond to individual tumors, which were sorted first by histologic grade 1 to 3 and then by gene expression grade index (GGI) within each histologic grade category. The GGI score of each tumor is plotted below the corresponding column. Relapse-free survival times in years are indicated below the GGI scores (gray dots = censored; red = relapsed; blue = normal breast). Dataset: from Van de Vijver et al., A gene-expression signature as a predictor of survival in breast cancer, N. Engl. J. Med. (2002) 347:1999-2009.

We found that histologic grade is associated with a strong and clear signal in the gene expression profiles of breast tumors. In a study conducted in close collaboration with researchers at the Bordet Institute in Bruxelles led by Dr. Christos Sotiriou, we analyzed microarray data from 189 newly profiled invasive breast carcinomas and from three published gene expression datasets. We identified differentially expressed genes in a training set of 64 estrogen receptor (ER) positive tumor samples by comparing expression profiles between histologic grade 3 and histologic grade 1 tumors. We defined a continuous score of grade, the gene expression grade index (GGI) of a tumor, based on the 97 genes that were most strongly associated with histologic grade in this training set. Most of these genes are known to code for proteins of the cell division machinery or to be involved in cell cycle regulation. The GGI can therefore also be seen as a proliferation index based on the average RNA abundance of many markers.
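The gene selection step described above can be sketched as a ranking of genes by the strength of their difference between grade 3 and grade 1 tumors. The statistic used here, a Welch t-statistic, is an assumption made for the sake of the example and is not necessarily the statistic used in the study; the gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two samples (positive when mean(a) > mean(b))."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_grade_genes(expr_hg3, expr_hg1, top_n):
    """Rank genes by |t| between grade 3 and grade 1 tumors.

    expr_hg3 / expr_hg1 map gene name -> list of expression values.
    Returns the top_n (gene, t) pairs, strongest association first.
    """
    scores = {gene: welch_t(expr_hg3[gene], expr_hg1[gene]) for gene in expr_hg3}
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Hypothetical expression values, for illustration only.
expr_hg3 = {
    "geneA": [2.0, 2.2, 1.9, 2.1],
    "geneB": [0.1, -0.1, 0.0, 0.2],
    "geneC": [-1.0, -1.2, -0.9, -1.1],
}
expr_hg1 = {
    "geneA": [0.0, 0.1, -0.1, 0.2],
    "geneB": [0.0, 0.1, -0.2, 0.1],
    "geneC": [0.1, 0.0, 0.2, -0.1],
}
top_genes = rank_grade_genes(expr_hg3, expr_hg1, top_n=2)
```

The sign of each statistic records whether a gene is more highly expressed in grade 3 (positive) or in grade 1 (negative) tumors, which is the information a signed score needs.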

Data from 597 independent tumors were used to evaluate the association between relapse-free survival and the GGI in Kaplan-Meier and Cox regression analyses. In these validation datasets, the GGI was strongly associated with histologic grade 1 and 3 status; however, among histologic grade 2 tumors, the index spanned the values for histologic grade 1-3 tumors, as shown in Figure 1 (for one of the data sets). Among patients with histologic grade 2 tumors, a high gene expression grade index was associated with a higher risk of recurrence than a low gene expression grade index (hazard ratio = 3.61, Fig. 2). The GGI thus reclassified patients with histologic grade 2 tumors into two groups with high versus low risk of recurrence. We refer to these groups as having a Gene expression Grade (GG) of 3 or 1, respectively. The other tumors are also classified as low or high GG in a way consistent with their risk of metastasis, with most histologic grade 1 tumors falling in the GG 1 class and most histologic grade 3 tumors in the GG 3 class.
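For readers unfamiliar with the Kaplan-Meier method, the estimator can be written in a few lines. This is a minimal, generic sketch of the product-limit estimate of relapse-free survival, not the code used in our analysis; the function name and the example follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times  : follow-up time of each patient
    events : 1 if relapse was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability), one entry per event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaving = 0   # patients leaving the risk set at time t
        relapses = 0  # observed relapses at time t
        while i < len(data) and data[i][0] == t:
            leaving += 1
            relapses += data[i][1]
            i += 1
        if relapses > 0:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (years) and relapse indicators, for illustration only.
example_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
```

Censored patients contribute to the number at risk until their last follow-up but never cause a drop in the curve, which is why the estimate can use incomplete follow-up.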

Other studies had previously defined prognostic microarray profiles for breast cancer, by using survival data to guide the selection of prognostic markers. While those markers are biologically heterogeneous, the set of genes we use for calculating the GGI can be associated with a well-defined and understandable biological function, which might simplify its calibration and its use in combination with other prognostic factors, both classic clinical factors and modern molecular factors.

Our study is innovative and illuminating for other reasons:

  • Genes selected in one study could be rapidly and convincingly validated on 3 other publicly available data sets (all those that were tested).
  • The same set of genes could be used for risk classification on all the microarray platforms tested (Affymetrix, Agilent, two versions of cDNA spotted arrays), even though the genes were selected on just one of them (Affymetrix); this has since also been confirmed with RT-PCR measurements.
  • A simple mathematical function, a signed average, was used for the risk score calculation; this makes the system very transparent and likely contributes to its flexibility and robustness.
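To make the signed average concrete, the sketch below adds the expression of genes that are higher in grade 3 tumors, subtracts the expression of genes that are higher in grade 1 tumors, and divides by the number of genes. Any scaling or offset applied in the actual GGI is omitted, and the cutoff of 0 used to derive the two GG classes is an assumption made for illustration; the gene names are hypothetical.

```python
def signed_average(expression, up_genes, down_genes):
    """Signed average of expression values.

    Genes in up_genes (higher in grade 3) count with sign +1,
    genes in down_genes (higher in grade 1) count with sign -1.
    """
    total = sum(expression[g] for g in up_genes) - sum(expression[g] for g in down_genes)
    return total / (len(up_genes) + len(down_genes))


def gene_expression_grade(score, cutoff=0.0):
    """Map a continuous score to GG 3 (high) or GG 1 (low).

    The cutoff value is an assumption for this example.
    """
    return 3 if score >= cutoff else 1


# Hypothetical tumor profile, for illustration only.
tumor = {"geneA": 2.0, "geneB": 1.0, "geneC": -1.0}
score = signed_average(tumor, up_genes=["geneA", "geneB"], down_genes=["geneC"])
grade = gene_expression_grade(score)
```

Because the score is an average over many genes, a missing or noisy measurement for one gene changes it only slightly, which is one plausible reason for the robustness across platforms.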

The risk classification did not show significant heterogeneity between studies despite diverse patient populations, different countries and pathologists, different treatment regimens and different microarray platforms. The classification therefore appears to be largely independent of all these factors and to differentiate breast tumors on the basis of a fundamental and strong difference in aggressiveness linked to the intrinsic proliferative activity of the tumor. Although more proliferative tumors might be more or less sensitive to certain treatments, such effects were not strong enough to be detected in this study. The power to detect them was not high, however, and additional studies will be needed to clarify this point.
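One common way to quantify heterogeneity between studies is Cochran's Q statistic together with the derived I² index. The sketch below is a generic illustration and an assumption about how such a check could be done, not a description of the test used in our study; the function name and the inputs are hypothetical.

```python
def cochran_q(estimates):
    """Cochran's Q and I^2 for per-study (log_hazard_ratio, standard_error) pairs.

    Returns (q, degrees_of_freedom, i_squared).
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    # Weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical per-study log hazard ratios and standard errors, for illustration only.
q, df, i_squared = cochran_q([(0.0, 1.0), (2.0, 1.0)])
```

Under the hypothesis of no heterogeneity, Q approximately follows a chi-square distribution with `df` degrees of freedom, and I² estimates the share of the variation between studies that is not explained by sampling error.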

Figure 2: Analysis of relapse free survival by Gene Expression Grade (GG).
Kaplan-Meier analyses for 570 patients from four different studies with three different microarray platforms. The number of patients at risk and the 95% confidence intervals (CIs) for the relapse-free survival estimates (shown as error bars) are indicated at 2.5-year intervals. The difference in relapse-free survival between two groups is summarized by the hazard ratio (HR) for recurrence with its 95% CI. A) Analysis of the whole dataset by histologic grade HG1 (green), HG2 (blue), or HG3 (red). B) Analysis of patients with HG2 tumors by gene expression grade (GG). The 217 patients with HG2 tumors were separated into low- and high-risk subsets by GG as GG1 (green) and GG3 (red), respectively. C) Analysis of the whole dataset by GG. GG1 = green; GG3 = red. D) to F) To show consistency among different datasets, forest plots of the hazard ratios and confidence intervals for individual datasets are shown below the corresponding Kaplan-Meier plots (panels D, E, and F, corresponding to panels A, B, and C, respectively). All statistical tests were two-sided.

However, our most important observation was that the three-category histologic grading system could be replaced with a two-category gene expression grading system that may be more clinically relevant, as suggested by the stronger association between relapse-free survival and gene expression grade than between relapse-free survival and histologic grade. The gene expression grade (GG) is a strong prognostic factor that adds explanatory power to models containing clinical factors. In a multivariable combination, GG, node status and tumor size emerge as the most important factors, while histologic grade and (more surprisingly) also ER status add little once these three main factors are included. This means that while ER status is the main factor distinguishing two subtypes of breast cancer, proliferation is the main determinant of the aggressive tendency of a tumor to form metastases. While ER negative tumors have a significantly worse prognosis on average than ER positive tumors, this difference is well accounted for by our measurement of proliferation, which is not the case with histologic grade.

There might be additional molecular determinants of the tendency to form metastases, and these might, for example, correlate with the tendency to form early micrometastases in the lymph nodes, as detected by the node status. The third significant risk factor is tumor size, which might simply reflect how long the tumor has already had to disseminate, rather than represent another intrinsic property of tumors.

The use of a gene expression grading system may improve the accuracy of tumor grading and deliver useful information to an oncologist deciding how best to treat a given case of breast cancer. In practice, the information can be used to decide whether a patient needs systemic chemotherapy, a decision of enormous importance for the patient, of course, but also for the health system, as unnecessary systemic treatments are not only a burden but also expensive.

In other ongoing studies, we hope to discover further indices useful for treatment decisions, in particular to determine which kind of chemotherapy is most likely to be beneficial for a given patient, if she needs it, or which kind of endocrine therapy, if she is to receive hormonal treatment. More generally, we try to understand the molecular basis of cancer types and subtypes, and to find determinants that support an optimal, individualized choice of medical treatment and that could be used in clinical applications in the near future.

Major External Collaborations

Prof. Christos Sotiriou, Institut Jules Bordet, Bruxelles; Dr. Monika Hegi, CHUV, Lausanne; Dr. Curzio Rüegg, CePo, Lausanne and Prof. Ivan Stamenkovic, CHUV, Lausanne.

Keywords

Breast cancer, prognostic profiles, microarrays