Tools used in this project
Wordle Gameplay - Excel

About this project

ABOUT THE DATA

This project uses a Kaggle dataset called “Wordle Games Dataset” compiled by Oscar Calvert Sisó. The data consists of the results of Wordle games posted to Twitter between Jan 1, 2022 and Nov 16, 2022. Each record contains the details of one tweet, including:

  • The game number
  • An anonymized username (integer)
  • The turn number of the solve (1 – 7), where 7 represents an unsuccessful solve
  • The color-coded letter grid
  • The target word (solution)

Even though the data collected from Twitter is substantial (6.8 M records from almost 1 M unique users), the full population of Wordle players is bigger: not everyone tweets their daily results (I don’t). It’s also possible that scores of 7 are underreported because people might be reluctant to post when their streak breaks.

SCOPE OF PROJECT

Many theories have sprung up about the optimal way to play Wordle. Even so, I’m more interested in using this data to assess the performance of these players than in sketching an ideal playing strategy. This dataset also allows us to determine which words proved to be easy or elusive targets for these players, making it possible to describe the lexical traits of hard or easy Wordle solutions.

INSIGHTS

Over the course of the 320 days of play, the largest number of users posted on 1/31/22 (almost 32K). That number steadily decreased afterward. On average, about 21K users posted their Wordles to Twitter each day.
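For readers who want to reproduce this kind of count outside Excel, here is a minimal Python sketch. It assumes each record can be reduced to a (game number, user ID, turns) tuple and that one game number corresponds to one day of play. The sample rows are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows: (game_number, user_id, turns). Not real data.
records = [
    (210, 1, 4),
    (210, 2, 3),
    (210, 2, 3),  # the same user posting the same game twice
    (211, 1, 5),
    (211, 3, 7),
    (211, 4, 4),
    (211, 5, 2),
]

def users_per_game(rows):
    """Return {game_number: number of distinct users who posted}."""
    seen = defaultdict(set)
    for game, user, _turns in rows:
        seen[game].add(user)
    return {game: len(users) for game, users in seen.items()}

daily = users_per_game(records)
peak_game = max(daily, key=daily.get)             # game with the most posters
average_users = sum(daily.values()) / len(daily)  # mean posters per game
print(daily, peak_game, average_users)
```

Counting distinct users rather than rows keeps a duplicate post from inflating a day's total.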

The overall average score hovers a little north of 4. But Wordle threw more curveballs at the end of the year: I see wider swings in the average score in Sep and Oct 2022 compared with the spring.

There are two ways to describe success in Wordle: solving as quickly as possible using the fewest possible turns OR merely surviving by getting the answer within 6 tries. To assess the performance of these users we can and should use both metrics.
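The two metrics can be sketched in Python as follows. This is only an illustration, assuming records reduced to (target word, turns) pairs; the sample rows are invented, and treating a failed game as a score of 7 when averaging is a simplifying choice rather than a rule from the dataset.

```python
from collections import defaultdict

FAIL = 7  # per the dataset description, 7 marks an unsuccessful solve

# Hypothetical sample rows: (target_word, turns). Not real data.
records = [
    ("TRAIN", 3), ("TRAIN", 4), ("TRAIN", 3), ("TRAIN", 2),
    ("PARER", 6), ("PARER", 7), ("PARER", 5), ("PARER", 7),
]

def word_metrics(rows):
    """Return {word: (average_turns, failure_rate)}."""
    turns_by_word = defaultdict(list)
    for word, turns in rows:
        turns_by_word[word].append(turns)
    return {
        word: (
            sum(t) / len(t),                           # speed metric
            sum(1 for x in t if x == FAIL) / len(t),   # survival metric
        )
        for word, t in turns_by_word.items()
    }

metrics = word_metrics(records)
easiest = min(metrics, key=lambda w: metrics[w][0])  # lowest average turns
hardest = max(metrics, key=lambda w: metrics[w][1])  # highest failure rate
print(metrics, easiest, hardest)
```

Keeping the two numbers side by side makes it easy to rank words by either definition of success.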

Getting to the right word quickly is the true mark of an easy puzzle because a short path to the solution does not leave as much space for getting stuck. The easiest words in this sample (according to average # of turns) were TRAIN, RAINY, DREAM, ALIEN, and TREAT. Four of these five words (all but ALIEN) contain vowel digraphs: two vowel letters making a single sound. Similarly, three begin with a consonant + R cluster, and three are single-syllable words.

Getting to the right word with effort represents a moderate result. It’s never a shame to keep your streak alive, but a player struggles more the longer they go without a solution.

So the most difficult words are those that break people’s streaks, the words that people can’t get in 6 tries. The words that broke the most people’s streaks in this sample were PARER, CATCH, MUMMY, FOYER, and LOWLY. I note the frequency of repeated letters and double consonants in every word but FOYER. The other source of trouble here is the endings -ER, -ATCH, and -LY, which, in the absence of further information, admit of numerous solutions (e.g. BATCH, HATCH, MATCH, PATCH, WATCH).
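Both traits are easy to check programmatically. The sketch below is a rough illustration only: the word list is a small hand-picked sample, not the official Wordle answer list, and the helper names are my own.

```python
from collections import Counter

# Small illustrative word list; NOT the official Wordle answer list.
WORDS = ["BATCH", "CATCH", "HATCH", "MATCH", "PATCH", "WATCH", "FOYER", "MUMMY"]

def has_repeated_letter(word):
    """True if any letter occurs more than once in the word."""
    return max(Counter(word).values()) > 1

def candidates(pattern, words):
    """Words matching a pattern in which '_' stands for any letter."""
    return [
        w for w in words
        if len(w) == len(pattern)
        and all(p == "_" or p == c for p, c in zip(pattern, w))
    ]

# A player who knows only the ending -ATCH still faces many options.
print(candidates("_ATCH", WORDS))
print([w for w in WORDS if has_repeated_letter(w)])
```

With only five of the six guesses left after locking in an ending, a pattern that matches six or more words can break a streak even when every guess is reasonable.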
