INTRODUCTION
Technological developments in oncology have resulted in new treatments with great curative potential. However, while survival times have increased, corresponding benefits for health-related quality of life (HRQOL) have not been achieved. Thus, the literature reflects increasing interest in adding measurements of HRQOL to biomedical variables assessed in clinical trials of cancer therapies.[1–3]
Breast cancer and its treatment have negative effects on patients’ HRQOL. Besides fear of mastectomy and the anxiety and uncertainty about chemotherapy cytotoxicity, many patients worry that daily radiotherapy over several weeks may cause radiotoxicity, provoking temporary interruption of therapy. This period, and the post-treatment phase, while waiting for tumor response, challenge patients’ coping capacities. Some patients are not prepared to face these problems—physically, psychologically or socially.[4–12]
HRQOL is multidimensional and essentially subjective. Despite all research efforts, it has no standard definition.[13–16] Various authors suggest that HRQOL covers four essential domains: physical wellbeing (autonomy and physical capacity), somatic discomfort (symptoms related to disease and/or treatment), psychological state (emotions, anxiety, depression), and problems with social relations (familial and professional).[17–20]
Our literature review found general agreement regarding the influence of culture on wellbeing and health. Siegrist found differences in effectiveness and perceived effectiveness of medical treatments in different cultural contexts,[21] and Grossi found several cultural variables to be important determinants of individual psychological wellbeing.[22]
Most instruments to measure HRQOL have been created in English; some others, less used, were written in French or German.[23,24] But a simple translation is not enough for an instrument’s applicability outside the country where it was developed, because more important than language are conceptual and cultural similarities.[25,26] An instrument will retain traces of cultural characteristics from its country of origin, and these will be more obvious in proportion to the sociocultural and language differences of the country where it is to be adapted.
The traditional philosophy of psychometric instrument construction assumes that the instrument’s set of items (indicator variables) reflects unobservable latent constructs intended for measurement, indicating their level or degree (e.g., of depression or intelligence). In developing instruments to measure the HRQOL construct, two kinds of items are frequently used: indicator variables, which concern the level of HRQOL (psychological functioning and social/family relations), and causal variables, which refer to factors such as symptoms and adverse effects of treatment that produce changes in HRQOL (physical functioning, disease symptoms and adverse effects of cancer treatment).[27,28]
Of prime importance in an instrument that includes causal variables are basic properties such as test–retest reliability, content validity, discriminant validity (ability to detect differences between groups known to be different), predictive validity (sensitivity to changes over time) and clinical interpretability. Also needed are simplicity, understandability by the patient and applicability within the socioeconomic and cultural context of the population studied.
Thus, measuring HRQOL in Cuban breast cancer patients receiving radiotherapy requires an instrument developed in the Cuban socioeconomic and cultural environment. This will provide information about problems distressing patients from their own perspectives, since they know their own emotions best. As a result, when deciding appropriate therapies, patient needs could be more fully considered: subjective morbidity (negative emotional states and psychosomatic symptoms) as well as the impact of the disease and its treatment on patient lifestyles. In addition, knowledge derived from the instrument’s application would facilitate identification of patients requiring specialized attention to improve their individual and social responses to the real challenges of daily life. The purpose of this research was to construct and validate just such an instrument.
METHODS
This study was conducted at the Oncology and Radiobiology Institute (INOR, the Spanish acronym) in Havana, in 2010–2011. Participants were selected nonprobabilistically by the following criteria: women aged >18 years with histological diagnosis of breast cancer, referred for ambulatory radiotherapy, who provided written informed consent for participation. Excluded were patients unable to communicate orally or in writing, and patients with mental retardation, psychopathological symptoms, senile dementia or cerebral metastasis. The work was done in two phases: development and validation of the instrument.
Development phase Items were generated by a qualitative method after collecting testimonies from patients attending their last session of ambulatory radiotherapy. Sample size was determined using the saturation of information criterion: interviews stopped when no novel points emerged, that is, when the last five women reported items that had already been included.[29] A total of 50 patients participated in this phase, organized in 10 focus groups of 5 patients each. To keep interviews uniform, a question guide was prepared, covering different aspects of HRQOL documented in the literature.[4–20,30] [Eds.: A translation of the focus group guide is available online at www.medicc.org/mediccreview/Lugo]
All interviews were done in settings providing adequate privacy. To encourage patient spontaneity, no recording equipment was used for the focus group interviews; patients were asked to stay on topic for each question asked.
For analysis and qualitative interpretation of the information gathered, a problem list was created and organized by topic, eliminating those that repeated the same idea and grouping those with similar meaning. This allowed identification of terms that were later included under the same item (e.g., despair and frustration). Problems that decreased HRQOL were documented and submitted for evaluation to a nominal group formed by experts (six oncologists and two nurses), each with ≥15 years of experience treating breast cancer patients.
A preliminary version of the instrument, called CV-MRT-P, was constructed with 42 items (distilled by semantic analysis from the 61 problems identified), using a 5-point Likert-like scale;[31] it described the items clearly and plainly, used positive and negative language, and avoided technical jargon.
An expert group of 5 oncologists, 1 biostatistician, and 1 psychologist, each with ≥15 years of experience in this area, assessed content validity of this preliminary version. Each item was evaluated according to whether it was: 1) important to patients, 2) easy for patients to understand, 3) clearly and explicitly associated with the HRQOL concept to be measured, 4) associated with the domains of the instrument, 5) formulated to be consistent with the question’s possible responses, 6) capable of eliciting varying responses among different patients, 7) worded in a way compatible with its operationalization (scale categories equidistant and hierarchically related to the data), and 8) not in violation of ethical principles.
Face validity was assessed by 20 eligible patients not involved in generating the items. They were asked to answer the questions included in the CV-MRT-P and to give their opinion about the instrument’s clarity, comprehensibility and simplicity. They were also asked to add any missing issue affecting their HRQOL.
Each answer was given a value from 1 to 5; the higher the value, the better the HRQOL. The mean scores for component items in each HRQOL domain were summarized for each corresponding synthetic variable:
- physical functioning (PhF)
- psychological functioning (PsF)
- social and family relations (SR)
- disease symptoms and adverse effects of treatment (AE)
The variable scores were then summed with the score for one discrete variable, perceived general health (pH), and divided by five to create the global HRQOL scale. Perceived HRQOL (pQ) was judged redundant and not included. Thus the scale expression was:
HRQOL = (PhF + PsF + SR + AE + pH)/5
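The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the example item scores are hypothetical, and it assumes every item has already been coded from 1 to 5 with higher values meaning better HRQOL.

```python
from statistics import mean

# Domain abbreviations follow the article; everything else here is illustrative.
DOMAINS = ("PhF", "PsF", "SR", "AE")

def domain_score(item_scores):
    """Synthetic variable: mean of a domain's item scores (each coded 1-5)."""
    if not all(1 <= s <= 5 for s in item_scores):
        raise ValueError("item scores must lie between 1 and 5")
    return mean(item_scores)

def global_hrqol(items_by_domain, perceived_health):
    """Global scale: (PhF + PsF + SR + AE + pH) / 5."""
    synthetic = [domain_score(items_by_domain[d]) for d in DOMAINS]
    return (sum(synthetic) + perceived_health) / 5

# Made-up answers for one patient (not study data).
example = {
    "PhF": [4, 5, 3],
    "PsF": [3, 4, 4, 5],
    "SR": [5, 5, 5],
    "AE": [2, 3, 4, 3],
}
score = global_hrqol(example, perceived_health=4)
```

Because each synthetic variable is a mean, domains with many items (such as PsF) do not outweigh domains with few items (such as SR) in the global score.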
Validation phase CV-MRT-P (with one modification, described in Results) was administered to 230 participants individually at three different times: before radiotherapy, at the end of radiotherapy and 4 weeks later. Information collected was stored in a database using SPSS 19.0. Dimensionality, construct validity, reliability (test–retest repeatability and internal consistency), discriminant validity, predictive validity, interpretability and response burden were evaluated.
Dimensionality Exploratory factor analysis (EFA) and varimax rotation with Kaiser normalization were used to explore instrument structure and to reduce dimensionality, minimizing information loss. Beforehand, it was decided to keep only items with factor loadings ≥0.4 (correlation between them and their domain). The Bartlett sphericity test was performed to verify whether EFA was feasible; the result was significant (p <0.05) and the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.811, meaning that the data matrix tolerated factor extraction.[32]
Construct validity was assessed by examining convergent and divergent validity using the multitrait multimethod correlation matrix. This method examines both types of validity using Pearson correlation. Convergent validity is present if correlations among items in the same domain, and between them and the synthetic variable of the domain to which they theoretically belong, are ≥0.4. Divergent validity is present if there is a correlation of <0.4 among items of different domains, and between them and the synthetic variables of the domains to which, in theory, they do not belong.[33]
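The 0.4 criterion described above can be illustrated with a short sketch. It is a simplified version under stated assumptions: it checks only item-versus-synthetic-variable correlations (not the item-versus-item correlations the method also covers), and the function names and data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def check_item(item, own_domain, other_domains, threshold=0.4):
    """Return (convergent_ok, divergent_ok) for one item.

    item          -- item scores, one per patient
    own_domain    -- synthetic variable of the item's own domain
    other_domains -- dict mapping domain name to synthetic variable scores
    """
    convergent = pearson(item, own_domain) >= threshold
    divergent = all(pearson(item, s) < threshold
                    for s in other_domains.values())
    return convergent, divergent

# Made-up scores for five patients (not study data).
item = [1, 2, 3, 4, 5]
own = [2, 2, 3, 5, 5]
others = {"SR": [3, 1, 4, 1, 3]}
result = check_item(item, own, others)
```

In practice the item is often removed from its own domain's synthetic variable before correlating, to avoid inflating the convergent correlation; the source does not say whether that correction was applied.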
Reliability The Pearson correlation coefficient was used to assess test–retest repeatability and Cronbach alpha for internal consistency.[34]
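For readers unfamiliar with Cronbach alpha, the following sketch computes it from the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores). The function name and the data are illustrative assumptions; the study itself used SPSS 19.0.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach alpha for one domain.

    items -- list of columns, each holding one item's scores
             for the same patients in the same order
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are needed")
    # Total score per patient across the k items.
    totals = [sum(answers) for answers in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Made-up answers from five patients to three items (not study data).
domain_items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
alpha = cronbach_alpha(domain_items)
```

Recomputing alpha with each item left out in turn shows which items contribute most to homogeneity: those whose removal lowers alpha.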
Discriminant validity ANOVA was applied to each of the HRQOL domains, stratified according to clinical stage of disease, on the assumption that patients in clinical stages III and IV have worse HRQOL than patients in stages I and II (it should be noted that significant differences may be found between stages without a dose-response effect).
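As an illustration of the comparison described above, the sketch below computes the one-way ANOVA F statistic for a domain score grouped by clinical stage. It stops at the F statistic; obtaining a p-value requires the F distribution (for instance through a statistics package such as SciPy's `scipy.stats.f_oneway`). The function name and scores are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up PhF scores for two groups of stages (not study data).
early_stages = [4.0, 4.5, 5.0]
late_stages = [3.0, 3.5, 4.0]
f_stat = one_way_anova_f([early_stages, late_stages])
```

A large F indicates that mean scores differ between stages by more than the within-stage variability would suggest.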
Predictive validity is the ability of the instrument to detect changes in HRQOL produced over time by a particular event or intervention. It estimates the magnitude of changes in the scale and its domains by standardized mean response, calculated by dividing the mean change between initial and final time by its SD.[35] Cohen defines changes in standardized mean response as small, <0.2; medium, 0.2 to <0.8; and large, ≥0.8.[36]
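The standardized mean response can be sketched as follows, using the cutoffs quoted above. Two points are assumptions of this sketch rather than statements from the source: the sample SD (n-1 denominator) is used, and the magnitude label is based on the absolute value, with the sign kept separately to indicate direction.

```python
from statistics import mean, stdev

def standardized_mean_response(before, after):
    """Mean of the paired changes divided by the SD of those changes."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)

def magnitude(smr):
    """Label the size of a change using the cutoffs quoted in the text."""
    size = abs(smr)
    if size < 0.2:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Made-up domain scores for four patients (not study data).
before_rt = [4.0, 4.5, 3.5, 4.0]
after_rt = [3.5, 4.5, 2.5, 3.0]
smr = standardized_mean_response(before_rt, after_rt)
label = magnitude(smr)
```

With higher scores meaning better HRQOL, a negative value indicates deterioration between the two measurements and a positive value indicates recovery.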
Clinical interpretability is the degree to which quantitative scores for the scale and its domains translate into qualitative clinical meaning concerning HRQOL deterioration.[37] We defined severe deterioration as 1–1.9; moderate, 2–3.9; mild, 4–4.9; and no deterioration, or normal, as 5.
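The category boundaries above translate into a simple lookup. The published ranges leave small gaps (for example, between 1.9 and 2), so this sketch assumes the cutoffs fall at 2, 4 and 5; the function name is hypothetical.

```python
def deterioration_category(score):
    """Map a mean score (1-5) to a qualitative level of HRQOL deterioration."""
    if not 1 <= score <= 5:
        raise ValueError("score must lie between 1 and 5")
    if score < 2:      # 1-1.9
        return "severe"
    if score < 4:      # 2-3.9
        return "moderate"
    if score < 5:      # 4-4.9
        return "mild"
    return "none"      # exactly 5

category = deterioration_category(4.4)
```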
Response burden was assessed by average time in minutes patients needed to complete the questionnaire, and by nonresponse rate; the latter was considered low if ≤3% and high if ≥10%.
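A per-item nonresponse check along these lines might look like the sketch below. The text defines only the low (≤3%) and high (≥10%) bands, so the "intermediate" label for rates in between is an assumption of this sketch, as are the function names and the use of `None` to mark an unanswered item.

```python
def nonresponse_rate(answers):
    """Percentage of patients who left an item unanswered (None)."""
    missing = sum(1 for a in answers if a is None)
    return 100 * missing / len(answers)

def burden_label(rate):
    """Classify a nonresponse percentage using the cutoffs in the text."""
    if rate <= 3:
        return "low"
    if rate >= 10:
        return "high"
    return "intermediate"  # band not named in the source; assumed label

# Made-up answers from 20 patients to one item (not study data).
answers = [4, 5, None, 3, 4, 5, 5, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4, 5, 3, 4]
rate = nonresponse_rate(answers)
label_for_item = burden_label(rate)
```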
The analysis was repeated with a modified version of the scale (CV-MRT-01): nine items were eliminated and three were moved from PsF to AE. Results presented for CV-MRT-P include dimensionality, construct validity, repeatability, internal consistency and response burden. Results presented for CV-MRT-01 include dimensionality, construct validity, reliability (test–retest repeatability and internal consistency), discriminant validity, predictive validity and interpretability. Response burden was not analyzed in this second round because the modified scale was not administered to patients.
Ethical considerations The study was approved by the INOR research ethics committee. Participants gave written informed consent following an explanation of the study’s objectives, benefits and risks.
RESULTS
Participant age range was 30–75 years (mean 51 years for those involved in generating the items and assessing face validity, 54.3 years for the 230 women who tested the instrument). Breast cancer stage distribution for the latter group was: stage I, 12.8%; II, 39.9%; III, 36.2%; and IV, 11%. The focus groups selected 61 problems, related to financial difficulties (3), PhF (8), PsF (30), and AE (20, expressed by women who had completed 90% of radiotherapy). These problems were confirmed by the nominal group without additions. CV-MRT-P contained 42 items, 5 in PhF, 21 in PsF, 3 in SR, 11 in AE, and 2 discrete items: pH and pQ.
The experts judged content validity favorably, and the 20 patients who assessed the instrument’s face validity had no negative comments. They did suggest a change: in accordance with opinions of 95% of patients, the item “I have emotional conflicts because I am sick” was replaced by “Since I’ve been sick, I feel like crying.”
In the preliminary version, EFA extracted four main components able to reproduce the correlations between observed variables; together they explained 53.7% of total variance (Table 1). The first factor included all items in the PhF domain and explained the highest percentage of data variance (27.3%, λ = 6.282). The second component explained 11.1% of variance; it corresponded to AE, where EFA located some items that were more closely correlated with AE than with PsF: trouble sleeping; sadness, despair or frustration; and decreased sexual interest. This domain was therefore renamed physical and emotional adverse effects (of disease and treatment). The correlation matrix showed 9 PsF items that had very low correlations with the rest (<0.3), thus adding little information (avoids looking at herself in the mirror, avoids contact with others, avoids speaking about the disease, avoids sexual relations, avoids being seen by others, avoids going out, feels diminished self-esteem, concerned about negative effects of treatment, feels that treatment is not useful anymore). They were therefore eliminated from the revised instrument.
Table 1: Variance explained in patient responses to two versions of an instrument to measure HRQOL in Cuban breast cancer patients receiving radiotherapy
AE: physical and emotional adverse effects / HRQOL: health-related quality of life / PhF: physical functioning / PsF: psychological functioning / SR: social and family relations
The new version of the instrument (CV-MRT-01) contained 33 items, distributed in 4 domains. Two of these had changes in number of items: three from PsF were moved to AE. The current structure of PsF and AE includes 9 and 14 items, respectively (Table 2). EFA for this version extracted four components that explained somewhat more variance than the original (56.9%). Variance explained by specific components was similar in both versions.
Table 3 displays the multitrait multimethod matrix resulting from calculation of Pearson correlations between synthetic variables and HRQOL scores for CV-MRT-01 at two points (before and on completion of radiotherapy). These numbers show that the synthetic variables have higher correlations (≥0.4) with HRQOL and moderate correlations with the rest of the synthetic variables, indicating acceptable convergent and divergent validity. Pearson correlation analysis found acceptable patterns of convergence and divergence generally, with the following exceptions: appetite loss did not have an acceptable correlation (r = 0.34) with its own domain, AE. Also, swelling or pain in the arm on the same side as the affected breast had better convergence (r = 0.57) with PhF than with its own domain, AE (r = 0.38), and had low correlations (0.23–0.34) with other items in its own domain, except for pain or increased sensitivity in the breast area and pain in other parts of the body (r = 0.41 and r = 0.40 respectively). Heat or itching in the breast area, and pain in other parts of the body showed low convergence with AE (r = 0.38 and r = 0.37 respectively).
Table 2: CV-MRT-01, an instrument to measure HRQOL in Cuban breast cancer patients receiving radiotherapy
*coded inversely to reflect sense of wording
HRQOL: health-related quality of life
Test–retest reliability of CV-MRT-01 domains can be seen on the diagonal of Table 3; values range between 0.72 and 0.87, an improvement over CV-MRT-P (for which the range was 0.65–0.78). Overall internal consistency of the instrument and its domains was satisfactory for all three measurements (Cronbach alpha 0.748–0.917), and better than for CV-MRT-P (Cronbach alpha 0.721–0.755). The most important items for internal consistency are those whose elimination decreases Cronbach alpha, i.e., those whose presence makes the domain more homogeneous. In this sense, the most important items were: perform housework; continue working as before; fatigue, low energy or lack of energy; malaise; trouble swallowing; breast inflammation or dryness; pain or increased sensitivity in the breast area; stinging irritation or burning sensation in the breast area; sadness, despair or frustration; decreased desire to enjoy what you used to like most; decreased sexual interest; tendency to hide the disease; family’s support in the way you need it; friends’ support in the way you need it; pH; and pQ.
Analysis of discriminant validity (or concurrent validity) for CV-MRT-01 confirmed that, at the end of radiotherapy, patients with more advanced stages of the disease suffered slightly greater decrease in levels of physical and psychological functioning than did patients in less advanced stages (values 3.9–4.4). PhF discriminated between stages I, II, III and IV in the second measurement, and PsF discriminated between stages I and IV in the third measurement.
Predictive validity analysis showed that CV-MRT-01 detected medium-to-large changes (>0.2) in standardized mean response for HRQOL and its domains, including discrete items. The exception was SR, for which standardized mean response remained stable throughout. Substantial changes (≥0.8) took place over both periods. Negative changes were observed in PhF and PsF between beginning and end of radiotherapy, but both improved by four weeks after completion of radiotherapy (Table 4).
Table 5 illustrates the clinical interpretability of the scale and its domains. Scores for PhF, AE, PsF and SR, as well as for the two discrete items (pH and pQ) had values >3 at the end of radiotherapy and four weeks later, indicating that at worst, radiotherapy produced a slight deterioration in HRQOL. This inference was confirmed by clinical observation, since none of the women exhibited severe secondary reactions to radiotherapy.
Table 3: Multitrait multimethod matrix and diagonal test–retest reliability of an instrument to measure HRQOL in Cuban breast cancer patients receiving radiotherapy, before and after radiotherapy
*AE: physical and emotional adverse effects / HRQOL: health-related quality of life / PhF: physical functioning / PsF: psychological functioning / SR: social and family relations
Table 4: Predictive validity of an instrument to measure HRQOL in Cuban breast cancer patients receiving radiotherapy
HRQOL: health-related quality of life / RT: radiotherapy / SMR: standardized mean response
Table 5: Mean responses for synthetic and discrete variables in instrument to measure HRQOL in Cuban breast cancer patients, before, on completion of and four weeks after radiotherapy
HRQOL: health-related quality of life
CV-MRT-P did not cause appreciable response burden. Average completion time was 7.2 minutes (range 4.1–11.2). All items were answered by 100% of patients, except those for decreased sexual interest, spousal and friends’ support, and malaise, all of which had nonresponse rates <10%.
DISCUSSION
A multi-item instrument was constructed, validated and improved; experts and patients agree that it covers all aspects required to evaluate the impact of breast cancer and radiotherapy on Cuban women’s HRQOL. The structure of this tool correlates well with the degree of psychological functioning, social and family relations, physical functioning, and physical and emotional adverse effects of disease and treatment, as well as with two discrete items: perception of general health and perception of HRQOL.
For the PhF and AE domains, we used clinimetric scales that measure sequence and severity of symptoms and adverse effects of treatment. CV-MRT-01 has 33 items, fewer than EORTC QLQ-C30 and its complement, BR-23, which together have 53.[24] It also differs somewhat in structure from these international instruments. For example, there are actually more items in the psychological functioning domain of CV-MRT-01 than in EORTC QLQ-C30 and BR-23.[38] We attribute this to the different cultural contexts in which the scales were developed. The Cuban instrument has nine PsF items, and the SR domain includes three items about relations with family, spouse and friends, whose formulation “in the way you need it” was influenced by patient focus groups.
Another difference appears with location of the item sleeping disturbance. This was associated with emotional stress by Cuban patients, but not by the EORTC QLQ-C30, which classified it as a symptom.[38] In a study that interviewed 75 breast cancer patients treated with surgery and chemotherapy to compare a Cuban questionnaire to FACT-B, an international questionnaire specific for breast cancer,[23,39] researchers found that Cuban women needed additional explanation to understand the FACT-B items. The items in the Cuban-developed questionnaire were expressed more simply. Researchers reported that on being asked some questions from FACT-B (I am losing hope in my fight against this disease, I worry about dying, I worry about my disease getting worse) some patients were silent for a few seconds, looking at the interviewer, as if surprised by the question, and a few burst into tears. Such reactions were not seen with the other FACT-B items, nor with any in the Cuban instrument.[40]
The role of reliability and validity evaluation in such instruments is fundamentally to collect evidence for their improvement. Since it was possible to evaluate the revised version using the same database, we were able to evaluate CV-MRT-01 without needing to interview more patients. Item formulation and measurement intervals were identical; only the number of items and their location in the synthetic variables changed. We inferred that since response burden was negligible with the 42-item CV-MRT-P, there would be no greater response burden for the 33-item CV-MRT-01.
Internal consistency assessment of both versions indicated an increase in homogeneity in the revised version, with the elimination of items that correlated poorly with their domains or the overall score. The first version already had acceptable internal consistency: Cronbach alpha >0.8 indicated that the scale measures different aspects of the same construct. Test–retest reliability was also acceptable, with reproducible results when applied to patients whose disease remained stable.[41]
Construct validity showed that, in both versions, the strongest and weakest correlations matched well with predictions (convergent and divergent validity). However, some items in the adverse effects domain did not present the expected patterns of convergence. These results are typical of clinimetric domains, since neither symptoms nor adverse effects are indicators per se of HRQOL; rather, they cause changes in its level. Thus, it is not surprising that causal items show weak correlations with their own domain, or stronger correlations with another domain to which they do not theoretically belong. Such atypical behavior is the reason why internal consistency, as well as convergent and divergent validity, are not considered essential properties of instruments that include causal items.
EFA was useful in simplifying the instrument and making it a more efficient measure of HRQOL. The 33 items retained explain the highest percentage of covariation possible and minimize information loss. This result needs to be corroborated in the future with confirmatory factor analysis. EFA is not the most suitable analytic tool for instruments that include causal variables, because different data sets can have different factor structures; e.g., hair loss and nausea and vomiting would correlate strongly in a study of patients on chemotherapy, but weakly in patients on hormonal therapy, which does not have the same adverse effects.[42]
Discriminant validity was satisfactory. CV-MRT-01 detected different levels of HRQOL between subgroups of clinical stages expected to have different scores. Such comparisons between groups known to differ are one way to evaluate discriminatory capacity. In the particular case of breast cancer patients, it was useful to verify that, in a period common to all patients (second measurement), the radiotoxic burden of radiotherapy differed among patients at different clinical stages of the disease. The importance of discriminant validity for an instrument with clinimetric variables is its ability to identify individuals with different clinical variables in cross-sectional designs.
The large and medium changes in HRQOL detected by CV-MRT-01 (in both domains and discrete items) are plausible. The before/after (radiotherapy) change reflects deterioration in HRQOL due to radiotherapy and the after/4-weeks-after change reflects a degree of recovery from its effects. Our findings are consistent with those of Manzanec’s study of radiotherapy’s negative effects on HRQOL.[8] The ability of an instrument to detect HRQOL changes at different clinical stages is critically important in clinical trials. Predictive validity is a particularly desirable property for scales that include causal items, because of their ability to detect prospective clinical changes over time.[43]
Clinical interpretability is another important characteristic for this kind of instrument. In this respect, coherence was observed between the clinical phenomena measured and instrument scores. Scores for the synthetic variables fell within the mild-deterioration category. The values for two of these, PhF and PsF, rose in a straight line over the three measurements. Both AE and SR were lower at the end of the treatment period but recovered somewhat in the final measurement. The two discrete items, perceived general health and perceived HRQOL, changed little over the three measurements, but did move from the upper moderate category to mild deterioration four weeks after treatment ended.
This study confirms the validity and reliability of CV-MRT-01. However, as Guyatt notes, validation should not be an all-or-nothing project.[44] Although our initial assumptions have been borne out, we will continue to evaluate the instrument to test its validity in different studies.
A limitation of this study is that the instrument was developed and validated with a highly selected patient population: those referred for treatment at INOR, a tertiary-level institution. Even though the patients came from different regions of Cuba, they do not constitute a nationally representative sample, restricting generalizability.
CONCLUSIONS
CV-MRT-01 demonstrates reasonably well the properties required for measurement of HRQOL in Cuba among breast cancer patients receiving radiotherapy, particularly those most pertinent to causal variables (reliability, predictive validity and interpretability). The study supports CV-MRT-01’s inclusion in clinical trials of radiotherapy in such Cuban patients.