Book: "Algorithms to Live By"
by Brian Christian & Tom Griffiths

You set a predetermined amount of time for “looking”—that is, exploring your options, gathering data—in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best applicant you saw in the look phase

In English, the words “explore” and “exploit” come loaded with completely opposite connotations. But to a computer scientist, these words have much more specific and neutral meanings. Simply put, exploration is gathering information, and exploitation is using the information you have to get a known good result.

Under Gittins index assumptions, we should choose the slot machine (multi armed bandits) that has a track record of 1–1 (and an expected value of 50%, less explored) over the one with a track record of 9–6 (and an expected value of 60%).

Regret is the result of comparing what we actually did with what would have been best in hindsight

In 1985, Herbert Robbins took a second shot at the multi-armed bandit problem, some thirty years after his initial work on Win-Stay, Lose-Shift. He and fellow Columbia mathematician Tze Leung Lai were able to prove several key points about regret. First, assuming you’re not omniscient, your total amount of regret will probably never stop increasing, even if you pick the best possible strategy—because even the best strategy isn’t perfect every time. Second, regret will increase at a slower rate if you pick the best strategy than if you pick others; what’s more, with a good strategy regret’s rate of growth will go down over time, as you learn more about the problem and are able to make better choices. Third, and most specifically, the minimum possible regret—again assuming non-omniscience—is regret that increases at a logarithmic rate with every pull of the handle.

Logarithmically increasing regret means that we’ll make as many mistakes in our first ten pulls as in the following ninety, and as many in our first year as in the rest of the decade combined. (The first decade’s mistakes, in turn, are as many as we’ll make for the rest of the century.) That’s some measure of consolation. In general we can’t realistically expect someday to never have any more regrets. But if we’re following a regret- minimizing algorithm, every year we can expect to have fewer new regrets than we did the year before.

Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.

Operating at industrial scale, with many thousands or millions of individuals sharing the same space, requires a leap beyond. A leap from ordinal to cardinal.

"In the practical use of our intellect, forgetting is as important a function as remembering." ~WILLIAM JAMES