My Statistical Odyssey: How I Finally Conquered “The Art of Statistics” (without a brain aneurysm)

Gather ’round because I have a tale to tell. A tale of statistical derring-do, of intellectual battles fought and won (eventually), and of a book that nearly broke me but ultimately sparked a lifelong love affair with data.

The hero of our story? “The Art of Statistics” by David Spiegelhalter. The villain? My own statistically insignificant attention span.

Our story begins in 2019, a simpler time when “pandemic” was just a scary word in a board game and sourdough starter wasn’t a mandatory kitchen accessory. I, bright-eyed and bushy-tailed, decided to tackle this tome, convinced I would emerge a statistical savant, capable of predicting the lottery numbers and the exact moment my toast would burn.

Turns out, statistics is a bit more complicated than the pie charts I used to colour in at school. Who knew? So began my years-long wrestling match with this book. I’d read a chapter, feel my brain cells staging a mass exodus, and promptly retreat to the soothing embrace of a comic, Minecraft or Fortnite. Rinse and repeat.

But like a stubborn stain on my favourite shirt, I just couldn’t get rid of this book. So, I persevered. I re-read chapters. I Googled terms that sounded like they belonged in a Harry Potter spellbook (“heteroscedasticity,” anyone?). I even resorted to drawing diagrams on my windows with dry-erase markers (much to the confusion of my neighbours).

And slowly, miraculously, something started to click. David Spiegelhalter, bless his statistically significant heart, has a way of making even the most mind-bending concepts understandable. He’s like the data whisperer, the statistical Yoda, the… okay, I’ll stop with the analogies. But seriously, his writing is engaging, witty, and surprisingly relatable. Plus, the examples he uses are fascinating – from the probability of winning the lottery (spoiler alert: don’t quit your day job) to the statistical quirks of birth dates and death rates.

This book, my friends, was a journey. A statistical odyssey, if you will. It challenged me, frustrated me, and ultimately, inspired me. It sparked a curiosity about data that led me to the Google Data Analytics course I’m currently immersed in (more on that in another blog post, because this one is already longer than the average attention span, statistically speaking).

So, what’s the moral of the story? Well, first, never underestimate the power of a good book. Second, statistics can be fascinating. And third, if I can conquer “The Art of Statistics,” then by the transitive property of awesomeness, I can probably conquer this data analytics course too.

P.S. Pelican Books, you guys are the real MVPs. Bringing back all those school textbook memories (the good ones, mostly). And for publishing this gem of a book? You deserve a statistically significant high-five.

Using OpenAI’s API

I enrolled in this course in May, a time when access to OpenAI was still limited and its commercial offering was still taking shape, so using the API was the most straightforward way to work with the platform. Jose Portilla’s course on Udemy gives a brilliant introduction to the API, showing how to harness OpenAI to build intelligent Python-driven applications.

The influx of AI platforms and services last summer shows that embedding AI models into new applications has become standard practice.

OpenAI’s API ranks among the most sophisticated artificial intelligence platforms today, offering a spectrum of capabilities, from natural language processing to computer vision. Using this API, developers can craft applications capable of understanding and interacting with human language, generating coherent text, performing sentiment analysis, and much more.
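To give you a flavour, here is a minimal sketch of the kind of call everything else builds on, written against the current 1.x openai Python package (the course itself predates that client, so its code looks a little different). The sentiment prompt is my own example, not the course’s.

```python
# Minimal sketch of an OpenAI API call (openai Python package, 1.x style).
# Assumes your access key is in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise sentiment classifier."},
        {"role": "user", "content": "Classify the sentiment of: 'I loved this course!'"},
    ],
)

print(response.choices[0].message.content)  # e.g. "Positive"
```

Almost everything that follows is a variation on that loop: authenticate, send a carefully worded prompt, and process whatever comes back.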

The course opens with a rundown of the OpenAI API basics, including account setup and access-key configuration in Python. From there, learners work through ten diverse projects, including:

  • NLP to SQL: Here, you construct a proof of concept that lets people query a cached database and fetch details without any SQL knowledge (a minimal sketch of the prompt pattern follows this list).
  • Exam Creator: This involves the automated generation of a multiple-choice quiz, complete with an answer sheet and scoring mechanism. The focus here is on honing prompt engineering skills to format text outputs efficiently.
  • Automatic Recipe Creator: Based on user-input ingredients, this tool recommends recipes, complemented with DALL-E 2 generated imagery of the finished dish. This module particularly emphasizes understanding the various models as participants work with both the Completion API and the Image API.
  • Automatic Blog Post Creator: This enlightening module teaches integration of the OpenAI API with a live webpage via GitHub Pages.
  • Sentiment Analysis Exercise: By sourcing posts from Reddit and employing the Completion API, students assess the sentiment of the content. Notably, many news platforms seem to block such practices, labeling them as “scraping.”
  • Auto Code Explainer: Though I now use GitHub Copilot daily, this module introduced me to the Codex model. It’s adept at crafting docstrings for Python functions, returning every .py file with comprehensive docstrings added.
  • Translation Project: This module skims foreign-language news and provides a concise English summary. A notable observation is the current model’s propensity to translate only into English. Users must also ensure they’re not infringing on site restrictions.
  • Chat-bot Fine-tuning: This pivotal tutorial unveils how one can refine existing models with specific datasets, improving output quality. By focusing on reducing token counts, learners gain insight into training data pricing, model utility, and cost-effectiveness. The module also underscores the rapid evolution of available models, urging students to consult OpenAI’s official documentation for the most recent updates (a sketch of the fine-tuning flow follows this list).
  • Text Embedding: This segment was a challenge, mainly due to the intricate process of converting text into N-dimensional vectors and understanding cosine similarity measurements (a sketch follows this list). However, the module proficiently guides you through concepts like search, clustering, and recommendations. It even delves into the amusing phenomenon of “model hallucination” and offers strategies to counteract it via prompt engineering.
  • General Overview & The Whisper API: Concluding the course, these tutorials provide a holistic understanding of the OpenAI API and its history, along with an introduction to the Whisper API, a tool adept at converting speech to text (a one-call sketch follows this list).
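As promised above, here is a rough sketch of the NLP-to-SQL prompt pattern. The schema, table name and question are made up for illustration; this is my own hedged reconstruction, not the course’s code.

```python
# Sketch of the NLP-to-SQL idea: put the schema in the prompt and ask the
# model to turn a plain-English question into a SQL query.
# The table, columns and question below are illustrative only.
from openai import OpenAI

client = OpenAI()

schema = "sales(id INTEGER, region TEXT, amount REAL, sold_on DATE)"
question = "What were the total sales per region last month?"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": f"You translate questions into SQLite queries for this schema: {schema}. Return only the SQL."},
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)  # run the returned query against the local database
```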
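The fine-tuning flow, very roughly, is: prepare a JSONL file of example conversations, upload it, and start a job. The sketch below uses the current chat-style format and the 1.x client; the course worked with earlier models and the older prompt/completion format, so treat this as an updated approximation rather than the course’s own steps.

```python
# Sketch of the fine-tuning flow (1.x client, chat-format training data).
# Each line of training_data.jsonl (illustrative filename) looks like:
# {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
from openai import OpenAI

client = OpenAI()

# Upload the training file, then kick off a fine-tuning job against it.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-3.5-turbo"
)
print(job.id)  # poll this job until it finishes, then call the resulting model by name
```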
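And here is the text-embedding sketch mentioned above: two sentences become vectors, and cosine similarity says how close their meanings are. The model name is the commonly used text-embedding-ada-002, and the sentences are mine.

```python
# Sketch of text embeddings plus cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    # One N-dimensional vector per input string.
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| |b|); values near 1 mean "very similar".
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = embed("The lottery is rarely a sound financial plan.")
v2 = embed("Don't quit your day job to play the lottery.")
print(cosine_similarity(v1, v2))  # similar meanings give a high score
```

Search, clustering and recommendations all boil down to comparisons like this one, just at a larger scale.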
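Finally, the one-call Whisper sketch: send an audio file, get the transcript back. The filename is made up.

```python
# Sketch of a Whisper API call: audio in, transcript out.
from openai import OpenAI

client = OpenAI()

with open("lecture_recording.mp3", "rb") as audio_file:  # illustrative filename
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

print(transcript.text)
```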

It’s noteworthy that most of the course material used the GPT-3.5 model. However, recent updates have introduced the more efficient gpt-3.5-turbo model. Additional information can be found in OpenAI’s official documentation.

The course adopts a project-centric approach, with each segment potentially forming the cornerstone of a startup idea. Given the surge in AI startups, one wonders if this course inspired some of them.

This journey demystified the intricate “magic” and “engineering” behind AI, emphasizing the importance of prompt formulation. Participants grasp the essentials: API authentication, making API calls, and processing the results. By the course’s conclusion, you’re equipped to use the OpenAI API to develop AI-integrated solutions. Prior Python knowledge is advantageous.

New teachings, new learnings

So, this old man loves to learn and has been lucky enough to find time this year to explore some new teachings.

Firstly, I have picked up Python again and have been learning all about version 3, trying to become proficient once more. By mentioning this in my blog now, anyone can pull me up and ask how I am getting on, which should prompt me to continue the learning, so feel free to suggest some projects I can complete. However, I must try not to disappear down rabbit holes like Mojo (https://www.modular.com/mojo), which is what I will move onto once I get through the Udemy course (https://www.udemy.com/course/complete-python-bootcamp/) from Jose Portilla (https://www.udemy.com/user/joseportilla/).

Secondly, I have committed to improving my Spanish beyond the tourist drivel of “dos cervezas, por favor” that got me by in my 20s, travelling around Spain with Ricardo and Graham. I have entrusted that learning to Duolingo (https://www.duolingo.com/), which I have to say I am really enjoying, and, dare I jinx it, some of it feels like it is sticking. I will be looking for some Spanish speakers in August to start practising with, if you fancy helping and have lots of patience. Spanish is such a beautiful language, and it will be fun to be able to communicate with people from other cultures.

Thirdly, the good old guitar. Jeez, I cannot count the number of times I have tried to learn the guitar and never succeeded. I am not sure if I am tone deaf or if I just lack the patience; more likely I simply have zero talent when it comes to music. However, I will strum away and attempt yet again to learn some basic tunes. Kumbaya might be the best I can do.

I am excited to continue learning. With the future so uncertain, it is hard to know what skills will be needed in six months, let alone two years, but I am open to new challenges and welcome the coming changes. I would also like to share my learning and experiences with others, though I have not yet found a good way to do that.

If you are interested in learning new skills, I encourage you to give it a try. It is never too late to learn something new, and it is very rewarding to achieve a goal.

I hope you enjoyed my blog post. Please feel free to leave a comment below.

“The mind is like a muscle.
The more you use it, the stronger it gets.”

– Henry Ford