Common_Core_for_Range_Program.zip
Common_Core_Sublists.zip
The Common Core List (CCL)
Dee Gardner Brigham Young University
Basic Description
BNC-COCA Core-4 on Lextutor uses the Common Core List (CCL) described in Gardner (2013) (http://www.routledge.com/books/details/9780415585453/). The actual list is downloadable at http://www.routledge.com/cw/rial/p/vocabulary/. Essentially, the CCL was established by identifying the words shared by the top 4,000 word families of the British National Corpus (BNC), per the Range Vocabulary Program (Source), and the top 4,000 lemmas of the Corpus of Contemporary American English (COCA), per Davies and Gardner (2010). Comparing word families (BNC) to lemmas (COCA) provided some measure of control for frequency biases associated with word families. The shared words were classified into Sublists A, B, and C, reflecting their relative impact in terms of common (shared) frequency. Sublist A was further subdivided into function words (A Function) and content words (A Content). For educational purposes, a handful of lower-frequency function words were subsequently added to A Function, and the remaining 64 of the 570 AWL words (Coxhead, 2000) not already in the sublists were added to Sublist C. These procedures resulted in the following numbers of word families in each of the CCL sublists:
- A Function: 150 > basewrd1
- A Content: 999 > basewrd2
- B Content: 821 > basewrd3
- C Content: 887 > basewrd4
Total: 2,857
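The shared-word step and the sublist total above can be sketched in a few lines of Python. This is a minimal illustration only: the two word sets are hypothetical toy data, not actual BNC or COCA lists, and the real comparison (word families against lemmas) involves more than matching headwords.

```python
# Toy illustration only: hypothetical headwords, not actual BNC or COCA data.
bnc_top_families = {"the", "of", "make", "house", "lorry", "whilst"}
coca_top_lemmas = {"the", "of", "make", "house", "truck", "guy"}

# Shared words: headwords that appear in both top-N lists.
shared = sorted(bnc_top_families & coca_top_lemmas)

# Sublist sizes as reported in the text above.
sublist_sizes = {
    "A Function": 150,
    "A Content": 999,
    "B Content": 821,
    "C Content": 887,
}

# The four sublists should add up to the stated total.
total = sum(sublist_sizes.values())

print(shared)
print(total)
```

Here only the words present in both toy sets survive, and the four reported sublist sizes sum to the stated total of 2,857.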
Why the CCL?
1. With a few minor exceptions involving the function words and Sublist C (noted above), the CCL sublists represent true shared frequency between the BNC and COCA. In this regard, they differ from the first two 1,000 word family lists of BNC-COCA (25), which were obtained from a more specialized corpus, not the BNC or COCA. If your interest is in the highest frequency words of general English (~ top 3,000), then the CCL should be a more reliable option. If you need to account for word families beyond this, then the BNC-COCA (25) is the better option.
2. Each sublist of the CCL contains fewer than 1,000 word families, and the four sublists together total only 2,857 word families, which is 143 fewer than the first three 1,000-word tiers in either BNC-COCA (25) or BNC (20). Despite having 143 fewer families, the CCL often provides higher coverage than the first three tiers of the other two versions.
3. As with the traditional GSL-AWL version of Range, the CCL allows users to get a quick look at words not found on any list (topic- and domain-specific words), but with the added confidence of knowing that the core words accounted for in the CCL were derived from two large and contemporary corpora, representing two major varieties of English (British and American).
4. The CCL version on Lextutor allows users to easily separate the coverage of high-frequency function words from that of high-frequency content words, giving a much clearer picture of true sublist impact.
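The kind of coverage profile referred to in points 2 to 4 can be sketched as follows. This is a simplified illustration, not the Range or Lextutor implementation: the miniature sublists and the sample sentence are hypothetical, and tokens are matched against headwords only, whereas the actual tools also match the inflected and derived members of each word family.

```python
from collections import Counter

# Hypothetical miniature sublists (illustrative headwords only, not the real CCL files).
SUBLISTS = {
    "A Function": {"the", "of", "in", "and", "a"},
    "A Content": {"make", "house", "people"},
    "B Content": {"border", "category"},
    "C Content": {"analogy"},
}

def coverage_profile(tokens):
    """Return the share of tokens falling in each sublist, plus 'Off-list'."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter()
    for token in tokens:
        word = token.lower()
        # Assign the token to the first sublist containing it, else 'Off-list'.
        label = next(
            (name for name, words in SUBLISTS.items() if word in words),
            "Off-list",
        )
        counts[label] += 1
    return {label: counts[label] / total for label in [*SUBLISTS, "Off-list"]}

text = "The people in the house make a category of quarks".split()
profile = coverage_profile(text)

for label, share in profile.items():
    print(f"{label}: {share:.0%}")
```

Reporting A Function and A Content as separate rows shows how much of a text's coverage comes from function words alone, and the Off-list row collects the topic- and domain-specific words not found on any sublist.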