Hypothesis / aims of study
Research on interventions for pelvic organ prolapse (POP), particularly surgical interventions, is hampered by a lack of consensus on which outcomes should be reported and how they should be measured. Studies have described many different outcomes, which increases heterogeneity across the literature and prevents results from being synthesised.
Currently, no systematic reviews have used standardised methodology to assess the content validity of quality of life (QoL) instruments in women with POP. The objective of this systematic review was to evaluate QoL measurement instruments in POP for possible inclusion in a core outcome set (COS) and core outcome measure set (COMS). Specifically, we aimed to evaluate the content validity of widely used patient-reported outcome measures (PROMs) of QoL following surgical interventions for POP, using COSMIN standards.
Study design, materials and methods
The design of this review was based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A comprehensive literature search was conducted using EMBASE, MEDLINE, PsycINFO and PubMed, from their inception to December 2021. The search strategy consisted of 3 groups of search terms combined with the Boolean operator ‘AND’: 1) names of the PROMs, 2) pelvic organ prolapse and 3) measurement properties. Our search terms were informed by previous systematic reviews published by a working group of CHORUS, an International Collaboration for Harmonising Outcomes, Research and Standards in Urogynaecology and Women’s Health (https://i-chorus.org). These reviews evaluated the reporting of outcomes and outcome measures used in RCTs of anterior compartment, posterior compartment and apical vaginal prolapse. For the third group of search terms, a previously developed search filter that retrieves studies on measurement properties was applied (1). This filter was adapted for all other databases. Snowballing and hand searches were also conducted using Google Scholar. Case reports, non-randomised studies and retrospective studies were excluded.
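The three-group structure of the search described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the PROM abbreviations are those named in the Results, and the condition and measurement-property terms are placeholder examples, not the published search filter (1) or the exact strings used in this review.

```python
def or_block(terms):
    """Join the synonyms within one group with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Combine the groups with AND, mirroring the three-group strategy."""
    return " AND ".join(or_block(g) for g in groups)


# Group 1: PROM names (abbreviations taken from the Results section).
prom_names = ["ICIQ-VS", "IIQ", "P-QoL", "PFDI", "PFIQ", "UDI"]
# Group 2: the condition (illustrative terms only).
condition = ["pelvic organ prolapse", "POP"]
# Group 3: measurement properties (illustrative stand-in for the filter).
measurement = ["content validity", "comprehensibility", "relevance"]

query = build_query(prom_names, condition, measurement)
print(query)
```

In practice each database has its own field tags and syntax, so a string like this would still need adapting per database.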
The titles and abstracts were screened individually against the inclusion criteria. Content validity studies were considered eligible for inclusion if they were full-text original articles, concerned women with POP and assessed the comprehensibility, comprehensiveness and relevance of at least 1 of the PROMs. Any studies that focused on the development of any of the PROMs were also included. Cross-cultural adaptations were included if they performed a pilot study of the adapted questionnaire to evaluate comprehensibility. Full-text articles were retrieved for abstracts that met the inclusion criteria or in cases where a decision could not be made based on title and abstract. Disagreements were resolved through consensus meetings among the researchers.
Using the COSMIN methodology for assessing content validity of PROMs, data extraction comprised 3 steps (2). The first step evaluated the quality of PROM development, assessing both the concept elicitation study and the cognitive interview study. The second step assessed the quality of additional content validity studies on the PROM, in which patients and professionals were asked about the relevance, comprehensiveness and comprehensibility of the PROM. These two steps were rated using a 4-point scale: ‘very good’, ‘adequate’, ‘doubtful’ or ‘inadequate’. The last step evaluated the content validity of the PROM based on a summary of the available evidence from the previous steps. This included rating the quality of evidence using a GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach: ‘high’, ‘moderate’, ‘low’ or ‘very low’.
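The two rating scales described above are ordered, so they can be represented as ordered enumerations. The sketch below is a minimal illustration of that idea and not part of the COSMIN methodology itself: the class and function names are hypothetical, and the example ratings are invented purely to show how a range such as "doubtful to inadequate" can be summarised.

```python
from enum import IntEnum


class StudyQuality(IntEnum):
    """4-point scale used for PROM development and content validity studies."""
    INADEQUATE = 1
    DOUBTFUL = 2
    ADEQUATE = 3
    VERY_GOOD = 4


class EvidenceGrade(IntEnum):
    """GRADE-style levels for quality of evidence."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def label(level):
    """Turn an enum member such as VERY_LOW into the text 'very low'."""
    return level.name.lower().replace("_", " ")


def summarise(levels):
    """Describe a set of ratings as a single level or a 'best to worst' range."""
    best, worst = max(levels), min(levels)
    if best == worst:
        return label(best)
    return f"{label(best)} to {label(worst)}"


# Invented example ratings, for illustration only.
example_ratings = [StudyQuality.DOUBTFUL, StudyQuality.INADEQUATE, StudyQuality.DOUBTFUL]
print(summarise(example_ratings))
```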
Results
The initial search yielded 1476 records, of which 1308 were removed following application of filters (human, full-text, English language) and deduplication. The remaining 168 records were screened by title and abstract against the inclusion criteria, and 158 were excluded. Full texts of 9 reports were retrieved and assessed for eligibility; 4 of these were excluded because they did not concern content validity. Following snowballing and hand-searching, 6 further studies were added. In total, 11 studies were included in this review (see Figure 1).
In total, 34 different QoL instruments from 117 RCTs were identified, of which the 6 most commonly reported were included: ICIQ-VS, IIQ, P-QoL, PFDI, PFIQ and UDI. ePAQ-PF was also added following hand searches on Google Scholar.
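Two of the counts in the study flow above follow directly from the others, and the arithmetic can be checked as below. The variable names are hypothetical; the numbers are those reported in the text.

```python
# Counts as reported in the study flow.
identified = 1476
removed_before_screening = 1308   # filters and deduplication
full_text_assessed = 9
full_text_excluded = 4            # did not concern content validity
added_by_hand_search = 6          # snowballing and hand-searching

# Records left for title and abstract screening.
screened = identified - removed_before_screening

# Studies included in the review.
included = full_text_assessed - full_text_excluded + added_by_hand_search

print(screened, included)
```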
Overall, PROM development was inadequate for all 7 instruments used to measure QoL in POP patients. While 7/7 PROMs involved patients in concept elicitation, only 3/7 PROMs included cognitive interviews with patients in their development (ePAQ-PF, ICIQ-VS, P-QoL). A total of 6 reports assessing content validity were identified, covering only 3/7 PROMs (PFDI, P-QoL, ICIQ-VS), which demonstrates gaps in the literature on content validity. The quality of these studies was deemed doubtful to inadequate. The quality of evidence for all 7 PROMs included in this review was low to very low.
Interpretation of results
The results of this review illustrate gaps in the evidence for, and quality of, PROMs used in research and in the clinical management of POP. QoL results obtained with these PROMs should therefore be interpreted with caution.
Patient involvement is vital, especially in the concept elicitation stage of PROM development. This is done by undertaking focus groups or interviews to generate items that reflect the concerns of patients. To develop a PROM specifically for POP, it is advisable to recruit patients from that population, to ensure that the items of the PROM mirror the patient experience and are relevant and comprehensive. Furthermore, cognitive interviews are fundamental in broadening patient involvement. Only 3/7 PROMs included cognitive interviews, so for the remaining 4 PROMs it is not known whether patients experienced any challenges with the questionnaire or whether item modification was necessary. This means essential information may be missing or inaccurate.
While there are recommendations for assessing content validity during the development of new PROMs, this review supports previous literature highlighting poor reporting of qualitative research. As a minimum, PROM development should include a literature review, concept elicitation interviews or focus groups, data analysis, item generation and cognitive interviews.
Previous systematic reviews by CHORUS have highlighted the inconsistency and variation of outcome reporting in women with POP, identifying 34 different PROMs used to measure QoL. This study was carried out as part of a wider project to establish a COS and COMS to reduce heterogeneity of outcome reporting in trials.
Concluding message
This review has shown gaps in the quality of evidence around the content validity of common QoL instruments used in women with POP (ePAQ-PF, ICIQ-VS, IIQ, P-QoL, PFDI, PFIQ and UDI). There is an urgent need for adequate development and validation of PROMs, and researchers should use robust methodology when doing so. Content validity is the first measurement property to consider when selecting a PROM, and this study has illustrated that it is under-investigated in patients with POP. These findings may contribute to the wider development of a COS and COMS in POP to improve quality in research and clinical practice.