The criteria used for this critique were derived from relevant nursing literature (Feinstein & Horwitz, 1997; Cormack, 2000; Khan et al., 2003). Seven criteria were specified: design, sample, inclusion/exclusion criteria, time frame of study, data collection, reliability & validity, and data analysis.
Design
Catlette (2005) used a qualitative design. While this approach has its merits, principally a greater degree of realism and richer data, it also has a number of significant drawbacks (Coolican, 1994). Observations are typically unreliable. In other words, if the same nurses were interviewed about workplace violence on several different occasions, using the same open-ended interview protocol, their responses might vary. Various biases creep in, often caused by situational factors (e.g. open-ended questions, a very violent week followed by a particularly calm week) or personal considerations (e.g. memory deficits). Furthermore, internal validity is low. This means that it is difficult to establish with any certainty the relationship between variables, owing to the lack of statistical analysis (which can estimate the probability that results occurred by chance). For example, Catlette’s interview data suggests a link between workplace violence and feelings of vulnerability amongst nurses. However, the extent to which the former variable causes the latter cannot be reliably established in a qualitative study. Winstanley and Whittington (2004) enjoy the precision of a quantitative design. While internal validity is high, the level of realism is questionable. Participants were ‘forced’ to respond to predetermined questions (e.g. on physical assault) using a fixed response format (e.g. ‘Once’, ‘More than once’). Thus, the data obtained was heavily influenced by the kind of questions asked and the particular response format used. In the real world, health care staff may perceive the level of aggression in terms that don’t match the questionnaire format. For example, a nurse may perceive physical assaults as ‘sporadic’ or ‘once in a blue moon’. Since these categorisations weren’t available in the questionnaire, the study lacks a certain degree of realism. In a qualitative design, by contrast, subjects describe the world as they see it, rather than via terms imposed by the researcher.
Sample
Ideally, a sample should be randomly selected so that it is representative of the population from which it was drawn, in this case nurses or health care professionals. This allows findings from a single study to be generalised to the wider community. Catlette (2005) used a convenience sample, meaning it wasn’t representative of nurses in general. Granted, there are considerable practical and logistic difficulties in trying to recruit a random sample of nurses. Their busy schedules and irregular shifts, for example, hamper proper scientific selection. It is also quite common for small convenience samples to be used in qualitative studies, since it is often impractical to conduct in-depth interviews with large groups. Nevertheless, Catlette’s findings, while relevant to the particular trauma centres involved, are unlikely to apply to nurses in general. This is a serious limitation, since Catlette’s stated objectives suggest a general interest in the level of violence in hospital emergency departments, rather than in the particular trauma units from which subjects were drawn. Winstanley and Whittington (2004) also appear to have used a convenience sample: they simply invited health care staff who worked in a general hospital, and who had regular contact with patients, to participate. Although the target sample was quite large (a bigger sample improves representation), only a minority of staff actually completed and returned questionnaires. All in all, participants weren’t recruited randomly, so the findings cannot be generalised to the wider population of health care staff.
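The difference between random and convenience selection described above can be sketched in a few lines of code. This is only an illustration: the staff identifiers, the frame of 1,000 people, the sample size of 50 and the function name are all hypothetical, and neither study reports using such a procedure.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Return a simple random sample drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical sampling frame of 1,000 staff identifiers (not study data).
staff_ids = [f"staff_{i:04d}" for i in range(1000)]

# Simple random selection: every member of the frame has the same
# chance of being chosen.
random_sample = draw_random_sample(staff_ids, 50, seed=1)

# A convenience sample takes whoever is easiest to reach, represented
# here (as an assumption) by the first 50 names on the list.
convenience_sample = staff_ids[:50]

print(len(random_sample), len(convenience_sample))
```

Fixing the seed only makes the illustration repeatable; in practice the key requirement is a complete sampling frame from which every eligible person can be drawn.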
Inclusion/exclusion criteria
Both studies seemed to have clear inclusion/exclusion criteria. Catlette (2005) only recruited and interviewed nurses who were registered, worked in a level 1 trauma centre, and had experienced workplace violence. A clear definition of what constituted violence was developed, helping to minimise any ambiguities about eligibility. Winstanley and Whittington (2004) also specify inclusion criteria. Only health care staff who had regular and substantial contact with patients were invited to participate. What constituted ‘regular’ and ‘substantial’ contact was well defined (e.g. daily contact with patients). The advantage of having clear inclusion/exclusion criteria is that it helps the researcher recruit a homogeneous sample. If the participants in a study are too diverse, this introduces additional sources of error that may obscure interesting themes or relationships between variables, and findings may be more difficult to interpret. However, a major disadvantage of a homogeneous sample is that it is invariably ‘ad-hoc’, that is, special or unique, and hence unlikely to reflect the wider community. Nevertheless, it can be argued that sample homogeneity isn’t problematic if the wider community of interest exactly matches the inclusion/exclusion criteria. For example, Winstanley and Whittington’s (2004) study was about patient aggression towards health care staff. Thus, the population of interest was invariably going to be staff who had regular contact with patients. In this respect the sample selected corresponds with the population of interest. However, randomly selecting nurses from the designated population would have provided a representative sample that permits useful generalisations. Simply using volunteers, as Winstanley and Whittington did, is unscientific.
Time frame of study
Winstanley and Whittington’s (2004) study was effectively a retrospective (i.e. cross-sectional) survey. This means that data was collected at one point in time, specifically an 8-week period. Retrospective designs are considered inferior to prospective (i.e. longitudinal) designs, in which data is collected on two or more occasions over several weeks, months, or even years (Coolican, 1994). The prospective method allows tentative causal inferences to be made: if a variable measured at Time 1 predicts or correlates with a factor measured at Time 2, then there is a possibility that the former variable affected the latter, but not vice versa. Retrospective designs don’t allow for such inferences. Any correlations between variables are just that: correlations. There is no sequence that may help delineate possible causality. For example, in their introduction and statement of study aims, Winstanley and Whittington imply that particular professions (e.g. nurses, doctors) and hospital departments (e.g. medical, A & E) may elicit different levels of physical aggression experienced by staff. Thus, profession and department seemed to be conceptualised as causal factors. However, although data analysis revealed relationships between these factors and physical aggression, there is no provision in the retrospective design to infer causality, since all the variables are measured simultaneously. A prospective method in which profession and department predict experiences of physical assault several weeks later would be more conclusive. Catlette (2005) doesn’t explicitly state the time frame for her study, although interviews typically take several days, weeks, or perhaps months to complete. Notions of prospective and retrospective designs are typically associated with quantitative studies, and rarely applied to qualitative research.
This is because qualitative studies are often exploratory, merely seeking to identify interesting phenomena rather than establish causal relationships between variables. Nevertheless, interviewing participants on two or more separate occasions can be used to demonstrate the robustness and reliability of any themes observed. For example, if the same themes emerge during interviews conducted at two different points in time, this would suggest that the themes are significant rather than fleeting.
Data collection
Catlette (2005) appears to have used semi-structured interviews for data collection (Coolican, 1994). By asking every interviewee pre-set but open-ended questions in a particular sequence, she avoided the inconsistency and sloppiness often associated with wholly unstructured (i.e. casual) interviews. It is possible the interviews were informal but guided, meaning that pre-set questions were asked, albeit in no particular order. Either way, a guided or semi-structured interview suffers from certain constraints. Asking specific questions, albeit open-ended ones, restricts the interviewer’s flexibility to ask follow-up questions depending on the interviewee’s response. Interviews are also heavily affected by interpersonal factors, such as lack of rapport, physical attraction, and psychological manipulation. Winstanley and Whittington (2004) collected data via a questionnaire. This method has a number of limitations. One is the typically low response rate. Of 1141 questionnaires posted out to participants, only 375 (33%) were returned, denoting a considerable waste of resources. Often the questionnaires returned represent an unusually keen sub-sample that may differ in key respects from the original target group. This means that the researcher has to devote time and resources to establishing what these differences are, and how they might affect the results. Furthermore, because the final sample is smaller, statistical power is reduced, increasing the possibility of a type II error. Another limitation of questionnaires is the use of a restricted (or ‘forced choice’) response format. For example, subjects in Winstanley and Whittington’s (2004) study were forced to choose from three options: ‘none’, ‘one’ or ‘more than one’. Thus, there is no room for participants to qualify their answers, for example by pointing out memory lapses (e.g. ‘I can’t remember’) or indicating ambiguous experiences (e.g. ‘not sure’).
All in all, these restrictions reduce the realism and richness of the data collected. Interviewing subjects on the same issues, but using open-ended questions, would probably yield slightly different outcomes to those reported by Winstanley and Whittington (2004). Another limitation is that the bulk of questionnaire communication is written. There is no provision to measure visual cues and gestures, which typically account for much of human communication, or even auditory cues. For example, a frown or grunt may signify a particularly traumatising experience, which simply can’t be detected from questionnaire responses. Finally, questionnaires are often completed in the absence of the researcher (e.g. postal questionnaires), making it difficult to supervise the proceedings, or to verify whether the intended subject is the person who actually completed the questionnaire. Overall, these constraints undermine the conclusiveness of Winstanley and Whittington’s (2004) findings.
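The response-rate figure and the point about reduced statistical power can be made concrete with a short calculation. The sketch below uses the two counts reported above (1141 posted, 375 returned); everything else is an assumption made purely for illustration: the two assault rates (30% and 20%), the equal split into two comparison groups, and the use of a normal-approximation power formula for a two-sided comparison of proportions. It is not a reconstruction of the authors' analysis.

```python
from math import sqrt
from statistics import NormalDist


def response_rate(returned, distributed):
    """Proportion of distributed questionnaires that were returned."""
    return returned / distributed


def approx_power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with equal group sizes; the small contribution
    of the opposite tail is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)


# Counts taken from the text above.
rate = response_rate(375, 1141)

# Hypothetical scenario: assault rates of 30% vs 20% in two equal groups.
power_target = approx_power_two_proportions(0.30, 0.20, 1141 // 2)
power_returned = approx_power_two_proportions(0.30, 0.20, 375 // 2)

print(f"Response rate: {rate:.1%}")
print(f"Approximate power, full target sample: {power_target:.2f}")
print(f"Approximate power, returned sample:    {power_returned:.2f}")
```

Under these assumed rates the power available from the returned sample is lower than it would have been with the full target sample, which is the sense in which a low response rate raises the risk of a type II error.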
Data analysis
In line with standard procedure in qualitative research, Catlette (2005) performed thematic analysis to identify recurring patterns in the data. Meaningful information was extracted from the interview transcripts, after which themes were identified using a coding system. Although it is a highly useful procedure, Braun and Clarke (2006) note that thematic analysis has certain disadvantages. One is the possible overlap between themes. Catlette identifies two themes: vulnerability and inadequate safety measures. The categories and subcategories reported suggest considerable overlap between these dimensions (e.g. the sentiment ‘feeling unsafe’ may depict both feelings of vulnerability and an unsafe environment). Another weakness is the high correspondence between the data collection questions (i.e. interview guide) and the themes identified. In other words, the themes reported merely reflect the questions asked during the interview (e.g. questions on safety, such as “How do you feel about the safety of your workplace?”, are bound to produce safety-related responses, and hence themes). This suggests very limited analytic work was done to identify themes independent of the interview format. Another shortcoming of thematic analysis is the failure to incorporate alternative or contradictory data in the results reported. Catlette offers little if any account of oddities in the data that don’t fit the two emerging themes. For example, the interviews revealed that violence wasn’t a concern during interactions with co-workers. Clearly this revelation is incompatible with the notion of vulnerability and lack of safety in the workplace. Yet little is made of this inconsistency, making Catlette’s rather ‘tidy’ themes appear suspicious. Few data sets in qualitative research are completely harmonious with no contradictions, so a study that fails to report these oddities is highly questionable.
Winstanley and Whittington (2004) employed an inferential statistical test to analyse their data, consistent with the quantitative design of their study. Chi-square was used to test for significant trends in the frequency of physical assaults as a function of different health care professions (e.g. nurses and doctors) and hospital departments (e.g. medical, surgical, A & E). Chi-square was appropriate given that the data was categorical (i.e. in the form of frequencies). However, as a non-parametric test, chi-square lacks sensitivity. This, combined with the limitations of frequency data, increases the risk of wrongly accepting the null hypothesis (a type II error). Frequency data fails to account for subtle degrees of variation between individual subjects or groups; for example, asking nurses if they’ve experienced aggression ‘once’ or ‘more than once’ fails to take into account any differences in the intensity and duration of these aggressive episodes.
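As a rough illustration of how a chi-square test of independence operates on frequency data of this kind, the sketch below computes the statistic by hand for a small contingency table. The counts are hypothetical and were chosen only to keep the arithmetic simple; they are not taken from Winstanley and Whittington (2004), and the function and variable names are likewise illustrative.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical counts (NOT study data).
# Rows: nurses, doctors. Columns: assaulted, not assaulted.
observed = [
    [60, 140],
    [15, 85],
]

statistic, df = chi_square_independence(observed)

# Standard chi-square critical values at the 0.05 level.
critical_values = {1: 3.841, 2: 5.991, 3: 7.815}
significant = statistic > critical_values[df]

print(f"chi-square = {statistic:.2f}, df = {df}, significant = {significant}")
```

The statistic depends only on how many people fall into each cell. Two nurses who each report being assaulted ‘more than once’ count identically, however different the severity of their experiences, which is the loss of information noted above.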
Reliability & Validity
A major methodological concern in scientific research is reliability and validity. Reliability refers to the consistency of observations, while validity concerns the authenticity of observations. Both issues are particularly pertinent in qualitative studies, due to the lack of structure, precision, and quantification. Catlette (2005) appears to have taken steps to enhance reliability and validity. She kept a journal throughout the duration of the study, in order to identify any biases that might corrupt the data. Interviews were conducted using a standard protocol, then the data was transcribed verbatim and analysed using regular procedures. However, these measures may be inadequate. Coolican (1994) identifies several procedures for ensuring good reliability, none of which appears to have been used by Catlette: triangulation, analysis of negative cases, repetition of the research cycle, and participant consultation. Triangulation involves verifying emerging themes using a data collection method other than open-ended interviews. For example, a questionnaire measure of perceived workplace violence and safety strategies could have been administered, or closed-ended interviews conducted. Data from these alternative methods could then be compared with the original observations to gauge the degree of consistency in emerging themes. Analysis of negative cases involves scrutinising cases that don’t fit the emerging themes. Repetition of the research cycle entails repeatedly reviewing assumptions and inferences, to further verify emerging themes. Finally, participant consultation involves communicating with participants to see if observations from the study match their own experiences. None of these measures seems to have been applied in Catlette’s study, raising serious concerns about the stability and authenticity of her observations. Winstanley and Whittington’s (2004) study doesn’t appear to have fared much better.
Although the numerical precision inherent in quantitative designs offers some degree of reliability and validity, this is by no means guaranteed, and has to be demonstrated empirically. Winstanley and Whittington (2004) fail to report any Cronbach’s alpha reliability coefficients for the questionnaire used. Thus, it is unclear if the items in this instrument were internally consistent. Test-retest reliability wasn’t reported either, again raising questions about the consistency of participants’ responses over time. A badly designed questionnaire (e.g. one with ambiguous statements or grammatical errors) could easily confuse participants, leading to irregularities in their responses over time. No information on validity is provided either. Normally, validity could be demonstrated in three ways: by correlating data from the questionnaire with data from another measure of experiences of aggression (a high correlation would indicate good validity); by submitting the questionnaire to a team of judges to ascertain if the content addresses all forms of human aggression (e.g. indirect forms of aggression, such as spreading rumours or social exclusion, don’t appear to have been assessed); and by performing factor analysis to establish construct validity (i.e. to verify the dimensions of aggression assumed to be measured by items in the questionnaire). These inadequacies render the findings from Winstanley and Whittington’s (2004) study inconclusive. For example, the claim that aggression is “widespread” is questionable because not all forms of aggression were measured.
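For readers unfamiliar with the internal-consistency coefficient mentioned above, the following sketch shows how Cronbach’s alpha is calculated from item-level scores, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The three items and five respondents are invented for illustration; no item-level data from the questionnaire is available in the published study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores holds one list per questionnaire item; each inner list
    contains that item's scores, one per respondent, in the same order.
    """
    k = len(item_scores)
    sum_item_variances = sum(variance(item) for item in item_scores)
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses from five respondents to three items (NOT study data).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together and so plausibly measure the same underlying construct. Test-retest reliability would instead require the same respondents to complete the questionnaire on two occasions, with the two sets of scores then correlated.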
Conclusion
Overall, both studies are fairly categorical in their conclusions. Winstanley and Whittington (2004) conclude that their data demonstrates the significant levels of aggression to which hospital staff are exposed. Catlette (2005) reaches a similar conclusion, emphasising the vulnerability and lack of safety perceived by nurses. However, both studies suffer from various analytic and methodological constraints. Perhaps the most serious of these is the apparent absence of reliability and validity measures that might reveal any volatility or misrepresentations in the data. These limitations mean that any conclusions have to be regarded as tentative, subject to further research.
References
Braun, V. & Clarke, V. (2006) Using thematic analysis in psychology. Qualitative Research in Psychology, 3, pp.77-101.
Catlette, M. (2005) A descriptive study of the perceptions of workplace violence and safety strategies of nurses working in Level I trauma centres. Journal of Emergency Nursing, 31, pp.519-525.
Coolican, H. (1994) Research Methods and Statistics in Psychology. London: Hodder & Stoughton.
Cormack, D. (2000) The Research Process in Nursing: Fourth Edition. London: Blackwell Science.
Estabrooks, C.A. (1998) Will evidence-based nursing practice make practice perfect? Canadian Journal of Nursing Research, 30, pp.15-36.
Feinstein, A.R. & Horwitz, R.I. (1997) Problems in the “evidence” of “evidence-based medicine”. American Journal of Medicine, 103, pp.529-535.
Khan, K., Kunz, R., Kleijnen, J. & Antes, G. (2003) Systematic Reviews to Support Evidence-based Medicine: How to Review and Apply Findings of Healthcare Research. Oxford: Royal Society of Medicine Press.
Winstanley, S. & Whittington, R. (2004) Aggression towards health care staff in a UK general hospital: variation among professions and departments. Journal of Clinical Nursing, 13, pp.3-10.