I Asked an Algorithm to Optimize My Life. Here’s What Happened

With a cutoff of five, I would choose a random option for about one out of every 20 decisions I made with my algorithm. I picked five because it seemed like a reasonable frequency for random exploration. There are fancier ways to choose the cutoff, or even to change its value as you continue to learn; often the best bet is to experiment with a few values and see which works best. Reinforcement learning algorithms take occasional random actions precisely because they otherwise rely on past experience: always choosing the option that looks best can mean missing out on a better option that has never been tried.
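In code, the rule I was following looks something like this epsilon-greedy sketch. The option names and "favorability" scores are hypothetical stand-ins of mine; a real agent would learn the scores from past rewards:

```python
import random

def decide(options, cutoff=5):
    """Epsilon-greedy choice: roll 1-100, explore at or below the cutoff."""
    roll = random.randint(1, 100)
    if roll <= cutoff:
        # Roughly 1 in 20: explore by picking uniformly at random
        return random.choice(list(options))
    # Otherwise: exploit past experience and take the best-scoring option
    return max(options, key=options.get)

# Hypothetical favorability scores for a morning decision
past_experience = {"get_up_on_time": 0.6, "sleep_in": 0.8}
print(decide(past_experience))
```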

I doubted that this algorithm would really improve my life. But the optimization framework, backed by mathematical proofs, peer-reviewed scientific papers, and billions in Silicon Valley revenue, made a lot of sense to me. How, exactly, would it fall apart in practice?

8:30 am

The first decision? Whether to get up at 8:30 as I’d planned. I turned off the alarm, pulled up the RNG, and held my breath as it spun and spit out … 9!

Now the big question: in my experience so far, had sleeping in or getting up on time produced better results for me? My intuition screamed at me to skip the deliberation and just sleep, but to be fair to the algorithm, I tried to ignore it and weigh my hazy memories of past lazy mornings. The joy of staying in bed was greater than that of an unhurried weekend morning, I decided, as long as I didn’t miss anything important.

9:00 am

I had a group project meeting in the morning and some machine learning reading to finish before it started (“Bayesian Deep Learning via Subnetwork Inference,” anyone?), so I couldn’t sleep in for long. The RNG assigned me to decide from past experience whether to skip the meeting; I chose to attend. To decide whether to do the reading, I rolled again and got a 5, meaning I would choose at random between doing and skipping it.

This was a small decision, but I was surprisingly nervous as I queued up yet another random number on my phone. If I rolled 50 or less, I would skip the reading to honor the “exploration” element of the decision algorithm, but I really didn’t want to. Apparently, blowing off your reading is only fun when you do it on purpose.
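For the record, the exploration step amounted to nothing more than this coin flip (the labels are my own shorthand):

```python
import random

# Exploration: a fresh 1-100 roll; 50 or less means skip the reading
roll = random.randint(1, 100)
print(roll, "skip the reading" if roll <= 50 else "do the reading")
```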

I pressed the generate button.

65. I would do the reading after all.

11:15 am

I wrote out a list of options for how to spend the free time I now had. I could walk to a far-off coffee shop I’d wanted to try, call home, start some schoolwork, look at PhD programs to apply to, go down an unrelated rabbit hole, or take a nap. Out came the RNG again, and the roll landed above the cutoff: I’d need to make a data-driven decision about what to do.

This was the first decision of the day more complicated than a yes or no, and the moment I began to puzzle over how “favorable” each option was, it became clear that I had no way of making an accurate estimate. When an AI agent that follows an algorithm like mine makes its decisions, computer scientists have already told it what counts as favorable. They translate what the agent experiences into a reward score, which the AI then tries to maximize, such as “time survived in a video game” or “money earned in the stock market.” Reward functions can be difficult to define, though. The smart cleaning robot is a classic example: if you tell the robot to maximize the pieces of trash thrown away, it can learn to tip the trash can back out and put the same trash away again to increase its score.
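To make that failure concrete, here is a toy sketch, entirely my own construction rather than any real robot’s code, of how a naive “pieces of trash deposited” reward can be gamed:

```python
# Toy environment: only 3 pieces of trash exist. Reward = pieces deposited.
class CleaningEnv:
    def __init__(self):
        self.trash_on_floor = 3
        self.trash_in_bin = 0

    def step(self, action):
        if action == "deposit" and self.trash_on_floor > 0:
            self.trash_on_floor -= 1
            self.trash_in_bin += 1
            return 1  # +1 reward per piece deposited
        if action == "dump_bin":
            # Tip the bin back onto the floor: no penalty in this reward design
            self.trash_on_floor += self.trash_in_bin
            self.trash_in_bin = 0
        return 0

env = CleaningEnv()
total_reward = 0
for _ in range(40):
    # The reward-maximizing loop: deposit everything, dump it out, repeat
    action = "deposit" if env.trash_on_floor > 0 else "dump_bin"
    total_reward += env.step(action)

print(total_reward)  # 30: ten times more reward than trash that actually exists
```

The score climbs indefinitely while the room never actually gets cleaner, which is exactly the kind of misalignment a sloppy reward function invites.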
