A good evaluation survey accurately measures the things we want to know. One common obstacle to accurate results is ceiling effects, a design problem in which survey participants rate their knowledge, attitudes, or behaviours at or near the top of the scale before a program intervention. Ceiling effects make it hard to assess how much participants have improved after a program because there is little room for them to rate themselves higher on post-program questions. For example, if a survey tests participants’ knowledge before a class begins and most participants score high, it is difficult to discern whether the class improved their knowledge.
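One way to spot a ceiling effect in pre-program data is to check how many answers already sit at the top of the scale. The sketch below is only an illustration: the function names, the sample answers, and the 15% cut-off are assumptions chosen for the example, not values from the BBBSC survey.

```python
from collections import Counter


def ceiling_share(responses, scale_max):
    """Return the fraction of responses that sit at the top of the scale."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    return counts[scale_max] / len(responses)


def has_ceiling_effect(responses, scale_max, threshold=0.15):
    """Flag an item when the share of top-of-scale answers exceeds the threshold.

    The 0.15 default is an assumed rule of thumb, not a figure from this article.
    """
    return ceiling_share(responses, scale_max) > threshold


# Hypothetical pre-program answers to one 5-point item (invented for illustration).
pre_scores = [5, 5, 4, 5, 3, 5, 4, 5, 5, 4]

print(ceiling_share(pre_scores, scale_max=5))
print(has_ceiling_effect(pre_scores, scale_max=5))
```

An item flagged this way leaves little room for participants to report improvement, which is the situation the tips below try to avoid.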
TNC recently collaborated with Big Brothers Big Sisters Canada (BBBSC) to design and pilot a survey that would produce accurate results and avoid ceiling effects. The survey assessed the impact of BBBSC’s mentoring programs on youth’s social and emotional competency and their postsecondary/career readiness. Our primary objective was to gauge BBBSC’s program effectiveness accurately while minimizing survey biases. We want to share what worked in this project and offer tips for designing surveys that avoid ceiling effects:
Pre-then-post survey design: Programs often have participants complete two surveys, one before the program and another after completion. However, this method often results in ceiling effects because participants tend to overrate themselves on the initial survey, unaware of what they lack. One way to avoid this problem is a pre-then-post survey design (sometimes called a retrospective pretest). In this design, participants complete a single survey after the program. They first rate their current skills, attitudes, and behaviours. Then they reflect on and rate the same skills, attitudes, and behaviours as they were before participating in the program. This design helps avoid ceiling effects because participants are less likely to overestimate their earlier skills and abilities once they know what the program covered.
This design was highly effective in our collaboration with BBBSC because it minimized ceiling effects in nearly all the questions. Using the pre-then-post design gave us the confidence to present more precise and reliable results to our client.
Nonetheless, we approach this strategy cautiously, recognizing that participants may overstate the program’s impact to please the evaluators. For example, if a participant believes that you want them to report a change, they may report one whether or not they experienced it.
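As a rough illustration of how pre-then-post answers can be summarized, the sketch below pairs each participant’s retrospective "before" rating with their "after" rating from the same survey. The function name, the 7-point ratings, and the summary fields are hypothetical; this is not the analysis used for BBBSC.

```python
from statistics import mean


def summarize_change(retro_pre, post):
    """Summarize paired ratings collected in a single post-program survey."""
    if len(retro_pre) != len(post) or not post:
        raise ValueError("need one 'before' and one 'after' rating per participant")
    # Change is computed per participant, so the two lists must be in the same order.
    diffs = [after - before for before, after in zip(retro_pre, post)]
    return {
        "mean_before": mean(retro_pre),
        "mean_after": mean(post),
        "mean_change": mean(diffs),
        "share_improved": sum(d > 0 for d in diffs) / len(diffs),
    }


# Hypothetical ratings on a 7-point scale (invented for illustration).
before = [3, 4, 2, 5, 4, 3]
after = [5, 5, 4, 6, 4, 5]

summary = summarize_change(before, after)
print(summary)
```

Because both ratings come from the same sitting, a large mean change should still be read with the caution above in mind: some of it may reflect a wish to please the evaluators.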
Using longer response scales: Questions that use Likert scales with more categories can help capture the impact of programs when pre-program scores lean high. Evaluation surveys typically use a 5-point scale (e.g., strongly disagree, disagree, neither agree nor disagree, agree, strongly agree), but evaluators may want to move to a 7-point or 10-point scale. A longer scale gives participants more room to rate themselves higher when answering post-program questions.
In our work with BBBSC, implementing a 7-point scale successfully captured the slight – yet positive – changes that BBBSC mentoring had on youth’s social and emotional competence, well-being, and postsecondary readiness.
Image shows an example of how a 7-point Likert scale can be used to capture small but important improvements.
Determine the appropriate audience to survey: In our work with BBBSC, we discovered that youth who had participated in the mentoring program for longer periods were more likely to show larger and more positive changes post-program. When designing a survey, think carefully about exactly which groups need to be surveyed to capture the program’s impact most accurately. This may mean surveying only those who participated in certain parts of the program or stayed for a specific duration. Choosing the appropriate sample will help mitigate ceiling effects and other biases that can be built into a survey design.
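A simple way to explore which groups to include is to compare the average change across participation bands. The sketch below assumes each record holds a band label, a "before" rating, and an "after" rating; the band labels and numbers are hypothetical and do not come from the BBBSC data.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_group(records):
    """Average the before-to-after change within each participation group."""
    changes = defaultdict(list)
    for group, before, after in records:
        changes[group].append(after - before)
    return {group: mean(values) for group, values in changes.items()}


# Hypothetical records: (participation band, before rating, after rating).
records = [
    ("under 1 year", 5, 5),
    ("under 1 year", 4, 5),
    ("1 year or more", 4, 6),
    ("1 year or more", 3, 6),
]

result = mean_change_by_group(records)
print(result)
```

If one band shows little change, that may be a reason to report it separately or to reconsider whether it belongs in the sample, depending on what the evaluation is meant to answer.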
Using open-ended questions: In some cases, numerical data may present inaccurate and sometimes disappointing findings, no matter how well a survey is designed. Including some open-ended questions in a survey can help compensate when the quantitative findings are weak. In certain instances, qualitative data captures a program’s impact more effectively than quantitative data. We included a mix of open-ended and closed questions throughout our survey with BBBSC. This approach proved successful because it let us collect precise numerical data alongside rich, insightful text that conveyed the benefits of the mentoring program (see the image below for an example).
Image shows an example of how using open-ended questions can produce more robust findings.
These strategies have helped TNC design effective and robust tools for our clients. We hope these tips help you create surveys that are reliable, strong measures for your evaluation needs.
To read more about our work with BBBSC, see our National Summary of the Big 3 Growth Survey.