Printed by permission of The Journal of Band Research

All correspondence about new subscriptions, renewals, and changes of address should be sent to Deanna Ernsberger, Managing Editor, Journal of Band Research, Library Serials Department, Troy University, Troy, Alabama 36082.

Checks should be made payable to: Journal of Band Research. When reporting a change of address, please give the old address as well as the new.

Email: dernsberger@troy.edu
Phone: 334-670-3252
 
Subscription Rates
 
Subscriptions are available to individuals and institutions at the rate of $15.00 for one year, $29.00 for two years, and $42.00 for three years. Add $5.00 postage per year for subscriptions to foreign countries. The Journal of Band Research is published by Troy University twice each (school) year and is mailed from Troy University, Troy, Alabama.

 

 

 

A Study of the Reliability of Adjudicator Ratings at the 2005 Virginia Band and Orchestra Directors Association State Marching Band Festivals

 

Stephen E. King and Vernon Burnsed

Department of Music 0240

Virginia Tech

Blacksburg, VA 24061

 

 

Abstract

 

The purpose of this study was to examine the reliability of adjudicator ratings at the 2005 Virginia Band and Orchestra Directors Association State Marching Band Festivals. Judge evaluation sheets for 124 bands were collected and tabulated from five festival sites throughout Virginia during October of 2005. The resulting data were analyzed for the reliability of the final ratings, the reliability of the caption ratings, reliability at each site, differences in reliability for three versus five judges, and differences between evaluations of smaller and larger bands. Inter-correlations and Cronbach's alpha coefficients indicated that the overall judge reliability was very good, alpha = .94. There was also no significant variation in reliability among the five sites or between three and five judges. The inter-correlations of the caption ratings were very high, suggesting one global evaluation factor. A one-way analysis of variance indicated that smaller bands were rated significantly lower than larger bands. These findings were in general agreement with other research in this area. The high inter-correlations of the caption ratings suggested that the evaluation form as currently utilized may lack diagnostic validity.

 

 

Participants in marching band festivals are frequently concerned about the reliability of the ratings received at these events. Band directors may question whether their groups were evaluated according to equitable standards and whether the judges agreed with one another. Research tends to support these concerns. Rickels (2006) found that ratings of larger bands in Arizona differed significantly from ratings of smaller bands, and Brakel (2004) reported a difference in ratings consistency between advanced and less experienced bands. Sullivan (2003) suggested that differences in ratings between large and small bands might be due to adjudicator bias or the particular classification system.

Several studies of band and orchestra adjudication have indicated differences between the reliability of the final and caption ratings (Burnsed, Hinkle and King, 1985; Burnsed & King, 1987; Garman, Boyle and DeCarbo, 1991). The results of these studies suggest that adjudicators generally agree on the final rating an ensemble receives. There is often, however, disagreement between adjudicators on caption ratings. This would suggest that adjudicators are really offering the final rating as a global assessment and are making the caption ratings fit the overall rating (Burnsed and King, 1987).

The focus of much of the judge reliability literature has been the evaluation of solo performances, concert festivals, or recorded examples of performances (Fiske, 1983; Burnsed and King, 1987; Garman, Boyle and DeCarbo, 1991; Bergee, 2003; Norris and Borst, 2007; Smith and Barnes, 2007). Only a few of these studies have used actual data from festivals, and very little research has been conducted on the adjudication of marching band festivals. The purpose of the present study was to examine the reliability of adjudicator ratings at the 2005 Virginia Band and Orchestra Directors Association State Marching Band Festivals. We wanted to see whether some of the often-mentioned concerns about judging and festivals were evident at the 2005 marching festivals sponsored by the Virginia Band and Orchestra Directors Association (VBODA).

 

This study attempted to address the following research questions about the 2005 VBODA Marching Band Festivals:

 

  1. What is the reliability of the adjudicator final ratings?
  2. What is the reliability of the adjudicator caption ratings?
  3. Was the reliability of ratings different at each site?
  4. Was the reliability of ratings different for three and five adjudicators?
  5. Did larger bands receive higher ratings than smaller bands?

 

 

Method

 

The Virginia Band and Orchestra Directors Association sponsored marching festivals at five sites in Virginia during October of 2005. The VBODA festivals use the Olympic scoring system: five judges rate each ensemble, but the highest and lowest ratings are not used in determining the final overall rating. Each participating band was awarded a 1 (superior) through 5 (poor) evaluation on Music Performance: Quality of Sound, Technique, Musicianship; Visual Performance: Technique and Ensemble; General Effect: Design and Performance; and Final Rating (see Figure 1). The State Marching Band Festival Chairman collected rating Recap Sheets and Tally Sheets from each festival site. To assure confidentiality, the chairman replaced the band names with random numbers, and the judges for each site were listed as Judge 1, Judge 2, Judge 3, Judge 4 and Judge 5. A number (1, 2, 3, 4 or 5) also replaced each site name. All data were then entered into an Excel data file. The Final Rating and five caption ratings for each band were entered into the data set. Ratings from five judges for 124 marching bands were recorded and analyzed using the JMP 6.0.2 statistical analysis software.
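The Olympic scoring step described above can be illustrated with a minimal sketch. The function names and the sample panel are hypothetical, and averaging the three retained ratings is an assumption made only for illustration; the festival procedure for combining the retained ratings is not described here.

```python
def olympic_trim(ratings):
    """Return the ratings left after dropping one highest and one lowest."""
    if len(ratings) < 3:
        raise ValueError("need at least three ratings to trim both ends")
    return sorted(ratings)[1:-1]


def trimmed_mean_rating(ratings):
    """Average the retained ratings (the averaging step is an assumption)."""
    kept = olympic_trim(ratings)
    return sum(kept) / len(kept)


# Hypothetical panel: five judges' final ratings for one band (1 = superior).
panel = [1, 2, 2, 3, 5]
print(olympic_trim(panel))  # [2, 2, 3]
print(trimmed_mean_rating(panel))
```

With five judges, one outlying rating at either end of the panel has no influence on the retained set.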

 

 

Results

 

A multiple correlation of all the judges’ final ratings revealed very high inter-correlations and a Cronbach's alpha of .94. The Olympic scoring format, which discards the ratings of the high and low judges, did not affect the reliability significantly (alpha = .93). A random sample of frequency count analyses also revealed no effect on final ratings when the ratings of the high and low judges were not counted.

The reliability of all the adjudicator ratings, caption and final, was also found to be very good (.95). Comparison of ratings at each individual site revealed comparable reliability at each site (Site 1 = .94; Site 2 = .95; Site 3 = .93; Site 4 = .91; Site 5 = .94).
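For readers unfamiliar with the coefficient, Cronbach's alpha treats each judge as an "item" and each band as a case. It compares the sum of the individual judges' rating variances with the variance of the bands' summed ratings. The sketch below uses only the Python standard library; the function name and the small data set are hypothetical and are not the festival data, which were analyzed in JMP.

```python
from statistics import variance


def cronbach_alpha(ratings_by_band):
    """Cronbach's alpha for rows of ratings (one row per band, one column per judge)."""
    k = len(ratings_by_band[0])  # number of judges
    judge_columns = list(zip(*ratings_by_band))
    # Sum of each judge's sample variance across bands.
    item_variance_sum = sum(variance(col) for col in judge_columns)
    # Sample variance of each band's total across judges.
    total_variance = variance([sum(row) for row in ratings_by_band])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical final ratings: six bands rated by five judges (1 = superior).
sample = [
    [1, 1, 2, 1, 1],
    [2, 2, 2, 3, 2],
    [3, 3, 2, 3, 4],
    [1, 2, 1, 1, 1],
    [2, 3, 3, 2, 2],
    [4, 4, 3, 4, 5],
]
print(round(cronbach_alpha(sample), 2))
```

Alpha approaches 1 as the judges order the bands more consistently; it equals 1 when every judge gives each band the same rating and the bands differ from one another.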

A multiple correlation of all the captions and the final ratings revealed that all the caption ratings are closely related to each other and to the final rating. See Table 1. Initial review of these correlations suggested that the best predictors of the final rating might be Quality of Sound, r = .85; Technique, r = .89; and Musicianship, r = .87. A factor analysis, however, revealed that a single factor (eigenvalue = 4.82, percent = 80.43) accounted for more than 80% of the total variance. This indicated that all the ratings are so closely related that they represent a single factor.
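The dominance of one factor can be checked approximately from the correlations in Table 1. The sketch below finds the largest eigenvalue of that correlation matrix by power iteration, which corresponds to a principal components extraction. That choice is an assumption; the extraction method used in JMP is not stated, so the result may differ slightly from the reported 4.82. The function name is hypothetical.

```python
# Correlation matrix from Table 1, in the order:
# Quality, Technique, Musicianship, Tech/Ens, Design, Final.
R = [
    [1.0000, 0.7705, 0.7733, 0.6616, 0.7008, 0.8479],
    [0.7705, 1.0000, 0.8117, 0.7047, 0.7350, 0.8854],
    [0.7733, 0.8117, 1.0000, 0.6565, 0.7379, 0.8679],
    [0.6616, 0.7047, 0.6565, 1.0000, 0.7116, 0.7614],
    [0.7008, 0.7350, 0.7379, 0.7116, 1.0000, 0.8241],
    [0.8479, 0.8854, 0.8679, 0.7614, 0.8241, 1.0000],
]


def largest_eigenvalue(matrix, iterations=200):
    """Estimate the dominant eigenvalue of a symmetric positive matrix."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * mv[i] for i in range(n))


eigenvalue = largest_eigenvalue(R)
share = eigenvalue / len(R)  # proportion of total variance in six standardized variables
print(round(eigenvalue, 2), round(100 * share, 1))
```

Because each of the six standardized ratings contributes one unit of variance, dividing the eigenvalue by six gives the proportion of total variance attributable to the first component.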

A one-way analysis of variance of the final ratings revealed significant differences between the ratings of bands according to class (F = 5.8, df = 4/123, p < .0002). A Tukey HSD post hoc test revealed that 1A bands were rated significantly lower than 4A and 5A bands; because 1 is the superior rating, a lower rating corresponds to a numerically higher mean. See Table 2. Evidently there is a relationship between final rating and the size or classification of a band.
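The F statistic in a one-way analysis of variance is the ratio of the between-class mean square to the within-class mean square. The sketch below shows that computation with the Python standard library only. The function name and the ratings are hypothetical and are not the festival data. The p value and the Tukey HSD comparisons, which require the F and studentized range distributions, are omitted.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of ratings."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the class means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual bands around their own class mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical final ratings for three classes of bands (1 = superior).
small_class = [2, 3, 2, 3, 2]
middle_class = [2, 2, 1, 2, 3]
large_class = [1, 1, 2, 1, 1]
print(one_way_anova([small_class, middle_class, large_class]))
```

A large F indicates that the class means differ by more than the spread of ratings within classes would lead one to expect.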


 

 

Discussion

 

The overall reliability at the 2005 VBODA Marching Festivals was very good. Judges were in close agreement about the caption and final ratings a band received. Reliability at each festival site was also very good, and the Olympic scoring system did not significantly affect reliability. These findings agree with those of Burnsed, Hinkle and King (1985), who also found good reliability among judges' final ratings at concert festivals. One finding that differed from earlier work was the very close agreement among all judges on the caption ratings. Analyses revealed that the caption ratings were so closely related that they did not represent distinct categories. One general factor accounted for eighty percent of the variance in the ratings. This might suggest that judges are giving a global final rating without much consideration for individual caption ratings, as other studies have suggested (Burnsed & King, 1987; Garman, Boyle and DeCarbo, 1991). These other studies, however, also found some disagreement between judges on some caption ratings.

The results of this study also indicate that smaller bands are rated lower than larger bands. There was a significant difference between the ratings of 1A bands and those of 4A and 5A bands. This has also been documented in other research (Rickels, 2006). The mean rating of 1A bands in this study was nearly a whole point lower than the mean rating of 4A and 5A bands (2.18 versus 1.25; see Table 2). Further research is needed to understand this finding. It may be that smaller bands are generally less effective as marching units. Errors or inconsistencies may be more noticeable in a smaller group than in a larger one. Festival organizers should consider ways to alleviate this inherent fault for smaller groups.

All of the analyses suggest that the adjudication form used by VBODA in 2005 is very reliable. Band directors may be confident that the judges agreed on their ratings. The very high inter-correlations among caption variables, however, suggest that the caption ratings may not be measuring distinct elements. This may raise questions about the diagnostic validity of the form. A more criteria-specific rating scale or form, as suggested by Saunders and Holahan (1997), might provide a more descriptive analysis of a marching band’s performance. This too might help pinpoint why smaller bands receive lower ratings. Criteria-specific rating scales utilize rubrics that give descriptive levels of achievement for each dimension of performance. Studies with these scales report consistent judge evaluation and specific feedback to guide future instruction (Norris and Borst, 2007; Smith and Barnes, 2007). Given the concerns of directors and the results of this study and others, the development of a criteria-specific rating scale for marching band seems both desirable and feasible.

 

 

References

 

Bergee, M. (2003). Faculty interjudge reliability of music performance evaluation. Journal of Research in Music Education, 51, 137-150.

Brakel, T. D. (2004). Inter-judge reliability of the Indiana State School Music Association high school instrumental festival. Unpublished paper, University of Toledo, Toledo, Ohio.

 

Burnsed, V., Hinkle, D. and King, S. (1985).  Performance evaluation reliability at selected concert band festivals.  Journal of Band Research, 21(1), 22-29.

 

Burnsed, V. & King, S. (1987). How reliable is your festival rating? Update: The Applications of Research in Music Education, 5(3), 12-13.

 

Conrad, D. (2003).  Judging the judges:  Improving rater reliability at music contests.  Retrieved from http://www.manteno.k12.li.us/finearts/assess/jrgereliability.pdf

 

Fiske, H. (1983). Judging musical performance: Method or madness? Update: The Applications of Research in Music Education, 1(3).

Garman, B. R., Boyle, J. D. and DeCarbo, N. J. (1991). Orchestra festival evaluations: Interjudge agreement and relationships between performance categories and final ratings. Research Perspectives in Music Education, 2, 19-24.

 

Norris, C. E. and Borst, J. D. (2007). An examination of the reliabilities of two choral adjudication forms. Journal of Research in Music Education, 55, 237-251.

 

Rickels, D. A. (2006).  A comparison of contributing variables in Arizona marching band festival results.  Unpublished dissertation, University of Arizona, Tempe.

 

Saunders, T. C. and Holahan, J. M. (1997). Criteria-specific rating scales in the evaluation of high school instrumental performance. Journal of Research in Music Education, 45, 259-272.

 

Smith, B. P. and Barnes, G. V. (2007). Development and validation of an orchestral performance rating scale. Journal of Research in Music Education, 55, 268-280.

 

Sullivan, T. M. (2003).  Factors influencing participation of Arizona high school marching bands in regional and state festivals (Doctoral dissertation, University of Arizona, 2003).  Dissertation Abstracts International, 64 (02A), 388.

 


 

 

Table 1

Inter-Correlation of Ratings

 

 

              Quality  Technique  Musicianship  Tech/Ens  Design   Final
Quality        1.0000     0.7705        0.7733    0.6616  0.7008  0.8479
Technique      1.7705     1.0000        0.8117    0.7047  0.7350  0.8854
Musicianship   0.7733     0.8117        1.0000    0.6565  0.7379  0.8679
Tech/Ens       0.6616     0.7047        0.6565    1.0000  0.7116  0.7614
Design         0.7008     0.7350        0.7379    0.7116  1.0000  0.8241
Final          0.8479     0.8854        0.8679    0.7614  0.8241  1.0000

 

 

Table 2

Mean Final Ratings by Class

 

Level              Mean
1A       A         2.1750
3A       A  B      1.8947
2A       A  B      1.7778
4A          B      1.2500
5A          B      1.2500

 

 

Levels not connected by the same letter are significantly different.

 

 

 

Figure Caption

 

Figure 1. The VBODA State Marching Band Festival Adjudication Form

 

 

Author Bios

 

Stephen King, Ed.D., is Visiting Professor of Music Education at Virginia Tech. Vernon Burnsed, Ph.D., is Professor of Music and Coordinator of Music Education at Virginia Tech.

 
