Cambridge YLE Vocabulary Dataset

A cover image of a stylized spreadsheet.

Published 2024 May 28th

Cambridge YLE
ESL
Teaching resource

Cambridge English offers a range of very nice assessments for learners of English as a second language. I have been preparing students to take their Young Learners English (YLE) examinations for quite a few years now. Cambridge publishes lists of the vocabulary that students should know at each level, but I’ve only ever found them in hardcopy or PDF format, like this one:

https://www.cambridgeenglish.org/images/149681-yle-flyers-word-list.pdf

That’s nice, but hardly flexible. If I wanted, for example, a list of all the nouns with irregular plurals used in Starters and Movers, I’d have a lot of work ahead of me, flipping through pages and trying to copy lists of words out of a fiddly PDF document. To make tasks like this easier, I collected all of the words and assigned each one a set of true/false values according to its properties.
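To make the idea of true/false values per property concrete, here is a minimal Python sketch of the irregular-plural query described above. The flag names (`starters`, `movers`, `noun`, `irregular_plural`) and the sample entries are hypothetical illustrations, not the dataset’s actual columns or contents. The sketch also reads “used in Starters and Movers” as a word that appears at either level.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """One vocabulary entry with hypothetical true/false property flags."""
    word: str
    flags: dict[str, bool] = field(default_factory=dict)


def matches(entry: Word, required: dict[str, bool]) -> bool:
    """True if every flag in `required` has the requested value.

    Flags missing from an entry are treated as False.
    """
    return all(entry.flags.get(name, False) == value for name, value in required.items())


# Hypothetical sample rows for illustration only, not copied from the dataset.
words = [
    Word("mouse", {"starters": True, "noun": True, "irregular_plural": True}),
    Word("child", {"starters": True, "noun": True, "irregular_plural": True}),
    Word("cat", {"starters": True, "noun": True, "irregular_plural": False}),
    Word("tooth", {"movers": True, "noun": True, "irregular_plural": True}),
    Word("run", {"starters": True, "verb": True}),
]

# Nouns with irregular plurals that appear in Starters or Movers.
irregular = [
    w.word
    for w in words
    if matches(w, {"noun": True, "irregular_plural": True})
    and (w.flags.get("starters", False) or w.flags.get("movers", False))
]
print(irregular)  # ['mouse', 'child', 'tooth']
```

Because every property is a plain boolean, any combination of conditions reduces to `and`/`or` over the flags. This is the same thing the spreadsheet’s filters do.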

The spreadsheet linked below is the quickest way to view and interact with the dataset. You can make a copy of your own and use filters to find words with any combination of properties you wish. How about a list of all the words that are different between British and American English? A list of words related to animals and the natural world? Well, that’ll be a lot easier now.

https://docs.google.com/spreadsheets/d/1_JpZPO8QjzKbOrhq75Pu4JFsc7bTBonr16KxAigXrFw/edit?usp=sharing

For those of you who’d like a CSV file of the raw data, here you go:

https://github.com/ozbonus/yle-vocabulary-dataset
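If you would rather work with the CSV in Python than in the spreadsheet, the following sketch could load it and run the same kind of combined filter. It assumes the file has a header row, a `word` column, and property columns stored as `TRUE`/`FALSE` text, which is how Google Sheets exports checkboxes. Those column names are assumptions, so check the repository’s actual headers before relying on this.

```python
import csv
import io
from typing import Iterable, TextIO


def parse_cell(value: str):
    """Convert 'true'/'false' (any case) to bool; leave anything else as a stripped string."""
    text = value.strip()
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return text


def load_words(source: TextIO) -> list[dict]:
    """Read every CSV row into a dict of column name -> parsed value.

    Assumes a header row; missing cells become "" and extra unnamed cells are skipped.
    """
    reader = csv.DictReader(source)
    return [
        {key: parse_cell(val or "") for key, val in row.items() if key is not None}
        for row in reader
    ]


def filter_words(rows: Iterable[dict], **required: bool) -> list[str]:
    """Return the (assumed) 'word' column of rows whose flags match every keyword argument."""
    # `is` ensures only real booleans match, never leftover strings.
    return [
        row["word"]
        for row in rows
        if all(row.get(col) is want for col, want in required.items())
    ]


# Hypothetical sample using assumed column names, standing in for the real file.
sample = io.StringIO(
    "word,starters,movers,noun,irregular_plural\n"
    "mouse,TRUE,FALSE,TRUE,TRUE\n"
    "cat,TRUE,FALSE,TRUE,FALSE\n"
)
rows = load_words(sample)
print(filter_words(rows, noun=True, irregular_plural=True))  # ['mouse']
```

To read the real file, pass an open handle instead of the sample, for example `with open("yle_vocabulary.csv", newline="", encoding="utf-8") as f: rows = load_words(f)`. The file name here is a placeholder, so substitute whatever the repository actually contains.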