Accuracy

The closeness of agreement between the test results obtained using the new biomarker test and results obtained using a reference standard method widely accepted as producing "truth" for the analyte. For example, a reference method considered standard for detection of DNA mutations is sequencing. The observed level of agreement will depend on both the bias and precision of the new test.

Bias is the amount by which an average of many repeated measurements made using the new test systematically over- or underestimates the reference standard method result. Precision is discussed separately. For many new biomarker tests, there will not be a universally accepted reference standard method.
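As an illustration of the definition of bias above, the following minimal sketch computes the difference between the average of repeated new-test measurements and a single reference standard result. The function name, the replicate values and the reference value are hypothetical and chosen only for illustration; a real study would follow its own protocol for replicates and reference measurements.

```python
from statistics import mean


def estimate_bias(new_test_results, reference_value):
    """Average of repeated new-test measurements minus the reference result.

    A positive value means the new test overestimates the reference on
    average; a negative value means it underestimates it.
    """
    if not new_test_results:
        raise ValueError("at least one measurement is required")
    return mean(new_test_results) - reference_value


# Hypothetical replicate measurements of one sample with the new test.
replicates = [10.4, 10.6, 10.5, 10.3, 10.7]
# Hypothetical result for the same sample from the reference standard method.
reference = 10.0

bias = estimate_bias(replicates, reference)
print(f"Estimated bias: {bias:+.2f}")
```

The spread of the replicates around their own average is a matter of precision rather than bias, which is why the two are reported separately.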

The reference standard method, if any exists, should be clearly stated.

For situations in which there is no universally accepted reference standard method, it may still be helpful to compare the new biomarker test to a non-reference standard test for which results are expected to show some correlation with the new test results. For example, results of a new immunohistochemical test using a novel combination of antibodies to assess protein expression might be expected to show some correlation with results obtained using an older test with a single antibody, although some differences would be expected as well.

Presentations of comparisons to a non-reference standard method should be accompanied by a discussion of reasons why some differences would be expected. For many imaging tests, accuracy measurements can usually only be made with respect to an inanimate phantom (test object). Data regarding accuracy and precision measurements of a relevant phantom should be provided. In some cases, previous studies may have been done using histologic markers to demonstrate "accuracy" of an imaging method (for example, comparing DCE-MRI results to microvessel density in biopsied tissue as a "truth" marker for angiogenesis).

When there are no reference or non-reference standard methods for measuring the biomarker, sometimes an assessment of true clinical state or condition will serve as the reference. For example, if a new test is developed to predict a toxic reaction to a drug, the results of that test could be compared to the outcome or clinical manifestation of an adverse event. If this approach is taken, then the new biomarker test results must not be used to make the clinical assessment.

A biomarker test used in this way would be an example of an integrated assay because the test is performed on all patients but no action is taken on the results directly. If the assay result was shown to predict the adverse outcome, then in a subsequent trial the assay might be used for clinical decision making (and the biomarker test would be considered an integral assay). For this example in which clinical assessment is used as the reference, evaluation of analytical accuracy is bypassed and clinical accuracy is evaluated directly.

 

Analytic sensitivity

The ability of a test to detect an analyte or entity when it is present. When the output of a test is binary, sensitivity traditionally refers to the proportion of positive test results obtained on cases that are truly positive for the entity or analyte of interest. For tests with quantitative output, the sensitivity refers to the change in the test output relative to the change in the actual amount of analyte, and this relation may depend on the absolute amount of analyte present.
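For a binary test, the proportion described above can be sketched as follows. The function name and the example calls are hypothetical; the truth labels are assumed to come from whatever reference is used to define cases that are truly positive.

```python
def sensitivity(test_positive, truly_positive):
    """Proportion of truly positive cases that the test reports as positive."""
    true_pos = sum(1 for t, r in zip(test_positive, truly_positive) if r and t)
    false_neg = sum(1 for t, r in zip(test_positive, truly_positive) if r and not t)
    if true_pos + false_neg == 0:
        raise ValueError("no truly positive cases to evaluate")
    return true_pos / (true_pos + false_neg)


# Hypothetical results: four truly positive cases, two truly negative cases.
truth = [True, True, True, True, False, False]
calls = [True, True, True, False, False, True]

sens = sensitivity(calls, truth)
print(f"Sensitivity: {sens:.2f}")
```

Only the truly positive cases enter the calculation; the false positive call on a truly negative case affects specificity, not sensitivity.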

Information to be provided about the design of the sensitivity and specificity studies that were performed should include characteristics of the samples and positive and negative controls, the rationale for interfering substances studied, analyte or entity (e.g. tumour cells harbouring a particular mutation) spike-in amounts and matrices used in any dilution experiments.

 

Analytic specificity

The ability of a test or procedure to correctly indicate absence of an analyte or entity when it is truly absent or to accurately quantify an entity or analyte in the presence of interfering or cross-reacting substances. Almost all assays demonstrate potential for false positive results due to interfering substances. When the output of a test is binary, specificity traditionally refers to the proportion of negative test results obtained on cases that truly do not possess the entity or analyte of interest.
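For a binary test, the proportion described above can be sketched in terms of counts. The function name and the counts are hypothetical and serve only to show which cases enter the calculation.

```python
def specificity(true_negatives, false_positives):
    """Proportion of truly negative cases that the test reports as negative."""
    total_negative_cases = true_negatives + false_positives
    if total_negative_cases == 0:
        raise ValueError("no truly negative cases to evaluate")
    return true_negatives / total_negative_cases


# Hypothetical study: 50 samples known not to contain the analyte,
# 3 of which gave a (false) positive result, e.g. due to an interfering substance.
spec = specificity(true_negatives=47, false_positives=3)
print(f"Specificity: {spec:.2f}")
```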

Information to be provided about the design of the sensitivity and specificity studies that were performed should include characteristics of the samples and positive and negative controls, the rationale for interfering substances studied, analyte or entity (e.g. tumour cells harbouring a particular mutation) spike-in amounts and matrices used in any dilution experiments.

 

Cut-point(s)

Cut-points are thresholds that are applied to continuous or semi-quantitative assay measurements for purposes of reducing the assay or imaging test result to a positive/negative determination or perhaps to a few categories (e.g. low, medium, high). Any cut-point(s) must be clearly pre-specified because the statistical strength of the association between the categorized marker and a clinical endpoint, and the clinical interpretation of the assay result, may vary depending on the particular cut-point(s) used.

The cut-points to be applied to assay measurements, the rationale and the background data for the selection as it relates to the intended clinical use must be provided. In the case of a continuous marker that will be used to predict a binary outcome (e.g. treatment response or toxicity), cut-point rationale might be based on ROC analysis aimed at achieving a desired level of sensitivity or specificity. For time-to-event endpoints cut-points might be selected to achieve a specified separation of survival curves. The background information should include the sample sizes of any previous studies, a comparison of the characteristics of the previously studied patients and specimens to those that will be examined in the proposed study, and a brief explanation of how the cut-points were selected in those studies.
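One way a cut-point rationale based on ROC analysis might look is sketched below: scan the candidate thresholds and keep the highest one that still reaches a desired sensitivity, then report the specificity obtained at that threshold. This is an illustrative sketch, not a prescribed procedure. It assumes that higher marker values indicate a positive result, and the function name, marker values and outcomes are hypothetical.

```python
def cutpoint_for_sensitivity(values, outcomes, target_sensitivity):
    """Return (cut_point, sensitivity, specificity) for the largest cut-point
    whose sensitivity is at least the target.

    A result is called positive when value >= cut_point (assumption).
    """
    positives = [v for v, o in zip(values, outcomes) if o]
    negatives = [v for v, o in zip(values, outcomes) if not o]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    if not 0 < target_sensitivity <= 1:
        raise ValueError("target sensitivity must be in (0, 1]")

    # Walk down the ROC curve from the strictest threshold to the most lenient.
    for cut in sorted(set(values), reverse=True):
        sens = sum(v >= cut for v in positives) / len(positives)
        if sens >= target_sensitivity:
            spec = sum(v < cut for v in negatives) / len(negatives)
            return cut, sens, spec
    raise ValueError("no cut-point reaches the target sensitivity")


# Hypothetical continuous marker values and binary outcomes (True = responder).
marker = [1.2, 2.5, 3.1, 3.8, 4.4, 5.0, 5.9, 6.3]
response = [False, False, False, True, False, True, True, True]

cut, sens, spec = cutpoint_for_sensitivity(marker, response, target_sensitivity=0.75)
print(f"cut-point={cut}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Requiring a higher sensitivity moves the cut-point lower and generally costs specificity, which is why the intended clinical use should drive the choice of target.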

Frequently, cut-points are applied to assay or imaging test measurements for convenience of analysis without careful thought as to why a particular cut-point is appropriate or whether it is appropriate to apply a cut-point at all. For example, there might not be a strong rationale for applying a certain cut-point if the relationship between the assay measurement and clinical endpoint represents a biological continuum. Particularly problematic is the practice of cut-point optimisation, i.e., choosing a cut-point for a continuous or semi-quantitative measurement to maximise the degree of statistical significance (e.g. minimise the p-value) of the difference between the clinical outcomes in the two resulting marker-defined groups.

Not only does this method overestimate the true magnitude of difference in outcome between the two marker-defined groups, but it disregards the relative costs of misclassifying patients in either direction between the two groups. In general, choosing cut-points based on observed data can lead to biased results, and operating characteristics of the cut-point (e.g. sensitivity, specificity, predictive values) should be demonstrated on data sets independent of the ones used to derive them.
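The point about independent data can be illustrated with a small sketch: a cut-point is derived from one data set and its operating characteristics are then recomputed on a separate data set. For simplicity the sketch selects the cut-point by maximising sensitivity + specificity - 1 (the Youden index) rather than by minimising a p-value; that criterion, the function names and all data values are assumptions made for illustration only.

```python
def operating_characteristics(values, outcomes, cut):
    """Sensitivity and specificity when value >= cut is called positive.

    Assumes both outcome classes are present in the data.
    """
    tp = sum(1 for v, o in zip(values, outcomes) if o and v >= cut)
    fn = sum(1 for v, o in zip(values, outcomes) if o and v < cut)
    tn = sum(1 for v, o in zip(values, outcomes) if not o and v < cut)
    fp = sum(1 for v, o in zip(values, outcomes) if not o and v >= cut)
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


def data_driven_cut(values, outcomes):
    """Pick the observed value that maximises the Youden index on these data."""
    def youden(cut):
        oc = operating_characteristics(values, outcomes, cut)
        return oc["sensitivity"] + oc["specificity"] - 1
    return max(sorted(set(values)), key=youden)


# Hypothetical derivation data set (used to choose the cut-point).
derive_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
derive_outcomes = [False, False, False, True, True, True]

# Hypothetical independent data set (not used to choose the cut-point).
valid_values = [1.5, 2.5, 4.5, 3.5, 4.2, 5.5]
valid_outcomes = [False, False, False, True, True, True]

chosen_cut = data_driven_cut(derive_values, derive_outcomes)
derived = operating_characteristics(derive_values, derive_outcomes, chosen_cut)
validated = operating_characteristics(valid_values, valid_outcomes, chosen_cut)
print("derivation data:", derived)
print("independent data:", validated)
```

In this toy example the cut-point separates the derivation data perfectly but performs less well on the independent data, which is the kind of optimism the text warns about.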

 

Diagnostic biomarkers

These are used to define the type of cancer a patient has. They can be developed from large population based studies or from clinical trial data. In addition to including standard imaging (e.g. CT) and cellular pathology studies, tumour markers such as PSA, CEA and CA-125 are well established as diagnostic biomarkers. Diagnostic biomarkers can also be used to detect and define recurrent disease after primary therapy.

 

Pharmacological biomarkers

These measure the effects of a drug treatment on a specific target [Proof of Mechanism (POM); e.g. enzyme inhibition or receptor blockade] or on a feature of tumour biology [Proof of Concept (POC); e.g. induction of cell death (apoptosis) or inhibition of blood vessel formation or function (angiogenesis)].

Such pharmacodynamic biomarkers can only be fully interpreted with the corresponding pharmacokinetic data, i.e. changes in POM and POC biomarkers cannot occur if potentially active drug concentrations are not achieved and maintained in the target tissue. Pharmacological biomarkers are particularly important in early phase drug development where failure to demonstrate POM or POC at tolerated doses can result in the termination of a drug development programme.

 

Precision

Closeness of agreement between independent test results obtained under essentially unchanged assay conditions. Independent test results refer to results obtained in a manner that is not influenced by previous results obtained on the same or similar test samples.

Information on expected variation in assay procedures that might impact measurement results is critical. Information to be provided should include the protocol followed, the conditions of the study, what factors were varied, and summary metrics including calculations of standard deviation (SD), coefficient of variation (CV) and descriptions of relationships between variation measures and means.
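The summary metrics named above can be computed as in the following sketch, which reports the mean, standard deviation and coefficient of variation at each of several analyte levels so that the relationship between variation and mean is visible. The level names and replicate values are hypothetical, and the sample (n - 1) standard deviation is an assumed choice.

```python
from statistics import mean, stdev


def precision_summary(replicates):
    """Return (mean, SD, CV%) for replicate measurements of one sample."""
    if len(replicates) < 2:
        raise ValueError("at least two replicates are required")
    m = mean(replicates)
    sd = stdev(replicates)  # sample standard deviation (n - 1 denominator)
    cv_percent = 100 * sd / m
    return m, sd, cv_percent


# Hypothetical replicate measurements at two analyte levels.
levels = {
    "low": [9.0, 11.0, 10.0, 10.5, 9.5],
    "high": [98.0, 102.0, 100.0, 101.0, 99.0],
}

summary = {name: precision_summary(values) for name, values in levels.items()}
for name, (m, sd, cv) in summary.items():
    print(f"{name}: mean={m:.1f}, SD={sd:.2f}, CV={cv:.1f}%")
```

In these made-up data the SD is larger at the high level while the CV is larger at the low level, the kind of relationship between variation and mean that should be described.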

Precision studies in qualitative tests have been less well defined but can also be characterised using repeat testing and percent agreement. Investigators proposing to use an imaging test as an assay in a clinical therapy trial, as described in this document, should provide data showing how their implementation of the imaging test compares with published benchmark reproducibility data.
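For a qualitative test, the percent agreement between repeat runs mentioned above might be computed as in this sketch. The function name and the run results are hypothetical.

```python
def percent_agreement(run_a, run_b):
    """Percentage of samples given the same qualitative call in two runs."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(run_a, run_b))
    return 100 * agreements / len(run_a)


# Hypothetical calls for the same ten samples tested on two occasions.
run_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
run_2 = ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "neg", "neg"]

agreement = percent_agreement(run_1, run_2)
print(f"Percent agreement: {agreement:.0f}%")
```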

 

Predictive biomarkers

These biomarkers identify subpopulations of patients who are most likely to respond to a given therapy. For example, breast cancer patients with oestrogen receptor positive tumours are more likely to respond to anti-endocrine therapies, and only patients with HER2 amplification should be given trastuzumab (Herceptin) therapy.

 

Prognostic biomarkers

These indicate the likely course of the disease. Prognostic biomarkers can reflect, for example, the metastatic state or potential and/or the likely growth rate of the tumour, and are used to estimate patient outcome without consideration of the treatment given.

Prognostic biomarkers can guide treatment decisions. For example, cancer patients whose prognostic biomarkers predict a poor outcome could be selected for aggressive treatments to increase their chance of survival, whereas patients whose biomarkers predict a good outcome could be spared unnecessary treatments. Thus intensive combination adjuvant chemotherapy is appropriate for breast cancer patients with extensive lymph node involvement (a poor prognostic biomarker) rather than for those with lymph node negative disease.

 

Risk/predisposition biomarkers

These identify individuals at increased risk of developing cancer and can be identified in large population based studies, clinical trials or family studies. Examples include genetic screening for cancer predisposition genes such as BRCA1/2, APC or MLH1. Risk/predisposition biomarkers may also involve measures of carcinogen exposure, e.g. chemical-DNA or chemical-protein adducts.

 

Screening/early detection biomarkers

These aid in identifying disease at an early stage and can be developed in large population based studies and/or clinical trials. Examples include PSA screening for prostate cancer, cytological screening for cervical cancer or mammography for breast cancer.

 

Surrogate response biomarkers

These are biomarkers which can be used as a substitute for a clinically meaningful endpoint. Although all pharmacodynamic biomarkers are potentially surrogate response biomarkers, only a few achieve surrogate response biomarker status. Importantly, the utility of a biomarker as a surrogate requires demonstration of its accuracy (the correlation of the surrogate with the clinical endpoint). Changes in tumour markers such as PSA, CEA or CA-125 are often used as surrogate response biomarkers.