
Algorithms and Theorems of ItOCO Chapter 1
I also call it the "ItOK" book. Funny because it's one of the coolest textbooks I've ever picked up. :D
Good news! I am almost done reading Chapter 2 of Elad Hazan’s excellent Introduction to Online Convex Optimization. That said, reviewing previously read material is crucial for consolidating newly learned concepts and information.
Therefore, last weekend I decided to go over Chapter 1 one more time as a sanity check, to confirm that I had indeed understood and retained at least the majority of the material.
There is no better way to do that, short of re-reading the whole chapter, than to zero in on the MEAT of the chapter: the theorems and algorithms that each chapter is centred around.
So here we go, I present to you the 🥩🍖🥓🍗 m e a t of Chapter 1.
Theorem 1.1
There does NOT exist a deterministic algorithm that can guarantee fewer than 2L mistakes, where L is the number of mistakes made by the best expert in hindsight.
Theorem 1.2
1. There is an efficient deterministic algorithm that can guarantee fewer than [see formula below] mistakes;
2. There is an efficient randomized algorithm for which the expected number of mistakes is at most [see formula below].
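For my own reference, and quoting the bounds from memory (so double-check them against the book), the two guarantees have roughly this form, where N is the number of experts and ε is the trade-off parameter:

$$\text{(deterministic)}\quad \#\text{mistakes} \;\le\; 2(1+\varepsilon)L + \frac{2\log N}{\varepsilon}, \qquad \text{(randomized)}\quad \mathbb{E}[\#\text{mistakes}] \;\le\; (1+\varepsilon)L + \frac{\log N}{\varepsilon}.$$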
Algorithm 1.1
Hedge is the first concrete algorithm presented in the book that you can actually code up for fun, and it is the subject of Theorem 1.3 below.
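Since I claim it can be coded up for fun, here is a minimal sketch of how I would do it in Python. The function name, the interface, and the assumption that losses live in [0, 1] are my own; the core update (play the normalized weights, then multiply each weight by exp(−ε · loss)) is the multiplicative-weights rule that Hedge is built on.

```python
import numpy as np

def hedge(loss_stream, n_experts, eps):
    """Minimal sketch of the Hedge algorithm.

    loss_stream -- iterable of length-n_experts arrays, losses assumed in [0, 1]
    eps         -- the trade-off / learning-rate hyper-parameter
    Returns the total expected loss incurred by the algorithm.
    """
    weights = np.ones(n_experts)          # start with uniform weights over experts
    total_loss = 0.0
    for losses in loss_stream:
        x = weights / weights.sum()       # play the normalized weight distribution x_t
        total_loss += float(x @ losses)   # expected loss suffered this round
        weights *= np.exp(-eps * losses)  # multiplicative-weights update
    return total_loss
```

For example, `hedge((np.random.rand(5) for _ in range(1000)), n_experts=5, eps=0.1)` runs it for 1,000 rounds of random losses over 5 experts.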
Theorem 1.3
This is a powerful result.
It basically says that the expected number of mistakes the Hedge algorithm makes approaches the number of mistakes that the best expert will have made in hindsight, plus some overhead that is a function of the squared losses, the log of the number of experts, and the hyper-parameter epsilon.
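Written out (again from memory, so check the exact statement in the book), the guarantee looks roughly like this, where x_t is the distribution Hedge plays in round t, ℓ_t is the loss vector, ℓ_t² is the vector of squared losses, N is the number of experts, and ε is the hyper-parameter:

$$\sum_{t=1}^{T} \ell_t^\top x_t \;\le\; \min_{i \in [N]} \sum_{t=1}^{T} \ell_t(i) \;+\; \varepsilon \sum_{t=1}^{T} (\ell_t^2)^\top x_t \;+\; \frac{\log N}{\varepsilon}$$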
And voilà: this blog post will serve as a nice Chapter 1 cheatsheet for my future review sessions. The chapter itself details the mathematical proofs of all of the above and ensures that the pedagogical (is that a word?) experience of the reader is very smooth (for the most part). You can tell that the author put a lot of effort into that.
Yours truly,
Andrew Pak
PS: I prepared all of the content for this blog post over the weekend, but didn’t want to publish it then, as it was already late and nothing good comes from pushing something into production late at night. 🔥😂
Thus I’m publishing this now while on a lunch break at work, so my employer cannot reproach me for doing any of this on “company time”.
Besides, reading this book is literally making me better at my job as a Machine Learning Engineer, because I am acquiring a deeper understanding of the mathematical theory that underlies many of the modern ML algorithms powering the coolest applications out in the wild.
By knowing precisely what makes each algorithm tick on the inside and what each of its parameters actually does under the hood, I will become more effective at building ML models at work.
PPS: This is the longest post-scriptum I have ever written. Yay, I guess. 🔥🤷🏼♂️🎉 🤣