A total of 2495 abstracts were screened and 67 full texts reviewed, of which 12 studies met the selection criteria and were included in this review (Figure 1). These studies covered 12 different PROMs: the Australian Pelvic Floor Questionnaire (APFQ), the Body Image in Pelvic Organ Prolapse Questionnaire (BIPOP), the electronic Personal Assessment Questionnaire-Pelvic Floor (ePAQ-PF), the International Consultation on Incontinence Questionnaire-Vaginal Symptoms (ICIQ-VS) module, the Pelvic Floor Distress Inventory (PFDI) and its short form (PFDI-20), the Pelvic Floor Impact Questionnaire (PFIQ) and its short form (PFIQ-7), the Pelvic Organ Prolapse Symptom Score (POP-SS), the Pelvic Organ Prolapse/Incontinence Sexual Questionnaire, IUGA-Revised (PISQ-IR), the Prolapse Quality of Life Questionnaire (P-QOL), and the Sheffield Prolapse Symptoms Questionnaire (SPS-Q). Most PROMs were evaluated on only one or two measurement properties; the ICIQ-VS, by contrast, was assessed on seven. The most frequently evaluated measurement properties were responsiveness (11 PROMs, 10 studies) and internal consistency (six PROMs, six studies).
Five of the included studies reported on PROM development methodology: the BIPOP study was rated as having Adequate ROB, the P-QOL and POP-SS studies as having Doubtful ROB, and the ICIQ-VS and SPS-Q studies as having Inadequate ROB. Content validity was assessed for the BIPOP, ICIQ-VS, P-QOL, and SPS-Q. All of these studies were rated as having Doubtful ROB because they did not report, as the COSMIN standards require, that at least two researchers were involved in the analysis of relevance, comprehensiveness, and comprehensibility.
All other measurement properties are summarized in Table 1. Only two studies reported on structural validity (for the BIPOP and the ICIQ-VS). Internal consistency was the best-performing measurement property across PROMs: all six studies, covering six different PROMs (BIPOP, ICIQ-VS, PFDI-20, PFIQ-7, POP-SS, and P-QOL), received the highest rating of Very Good, corresponding to the lowest ROB.
All four studies on reliability (PROMs: BIPOP, ICIQ-VS, P-QOL, SPS-Q) received a ROB rating of Doubtful because they did not report how they confirmed the stability of either the measured construct or the testing conditions. In terms of measurement quality, only the BIPOP and the ICIQ-VS produced results that met a Sufficient rating. No studies investigated measurement error.
The methodological quality of the studies assessing construct validity (concurrent validity, discriminative validity) and responsiveness varied, with high ROB ratings primarily due to incorrect or incomplete reporting of interventions, statistical analyses, or results (see Table 1). Most PROMs received a Sufficient rating for the measurement quality of responsiveness in a surgical setting. Only five studies evaluated PROM responsiveness in a conservative management setting (PROMs: PFDI, PFDI-20, PFIQ, POP-SS, PISQ-IR).