What I read/listened to the week of 1/29
What I read
- Of course failures are a part of the story too. Everyone fails. Joe Louis said, “Everyone has to figure to get beat some time.” The question isn’t did you fail but did you pick yourself up and move ahead? And there is one other little question: “Did you collaborate in your own defeat?” A lot of people do. Learn not to.
- “Meaning is not something you stumble across, like the answer to a riddle or the prize in a treasure hunt. Meaning is something you build into your life. You build it out of your own past, out of your affections and loyalties, out of the experience of humankind as it is passed on to you, out of your own talent and understanding, out of the things you believe in, out of the things and people you love, out of the values for which you are willing to sacrifice something. The ingredients are there. You are the only one who can put them together into that unique pattern that will be your life. Let it be a life that has dignity and meaning for you. If it does, then the particular balance of success or failure is of less account.”
Quick Essay: Large Language Models, How to Train Them, and xAI’s Grok
- To understand and generate text like humans, there are a few things that language models must be able to do:
- Understand the meanings of various words
- Understand the context of words in relation to other words
- Remember long strings of these words
- Do all of the above very quickly
In 2017, a new type of architecture called a “transformer” was introduced that promised to address many of these requirements. Two key breakthroughs, “positional encoding” and “self-attention,” made this architecture much more efficient to train and better at recognizing the context of words. As language models were trained with more compute power and data using this architecture, new capabilities emerged. Today, models can reason about topics, write code, and even understand information across multiple modalities, including images and audio.
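To make those two ideas a little more concrete, here is a minimal sketch in plain Python. It assumes sinusoidal positional encoding and uses the token vectors directly as queries, keys, and values; real transformers learn separate projection matrices for each, and the toy embeddings below are made-up numbers, not output from any real model.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: one d_model-sized vector per position."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions shares a frequency; even uses sin, odd uses cos.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (a simplification)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this token against every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted blend of all token vectors.
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, x)) for i in range(d)])
    return outputs, weights

# Hypothetical embeddings for a 3-token sentence, d_model = 4.
embeddings = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]
pe = positional_encoding(len(embeddings), 4)
# Adding the encoding lets the model tell positions apart.
x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(embeddings, pe)]
attended, attn_weights = self_attention(x)
for row in attn_weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token "pays attention" to every token in the sentence. Because every token is scored against all the others in one pass, the work parallelizes well, which is a large part of why the architecture trains efficiently.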
- Companies building new language models today face two major challenges. The first is that exponential increases in the amount of data used to train a new model only result in linear improvements in performance. So with an abundance of data available for training, all else equal, models eventually converge towards a single level of performance.
- The second is a lack of context. Many models like ChatGPT lack context beyond their training period, meaning that they have no awareness of information and events beyond a given date. When asked about information after this period, they either refuse to answer, or worse still, hallucinate and provide a convincing but made-up response.
- Good code maintains stability and predictability, even as your project grows in complexity. It’s like having a reliable tool that keeps performing, no matter how tough the job gets. When it comes to scaling up, good code is essential. It allows for expansion without the bottlenecks and headaches that come with a more shortsighted approach.
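The first challenge above (exponential increases in data giving only linear gains) can be sketched with a toy formula. The function below is purely illustrative: `toy_score`, its `base`, and its `gain_per_10x` are hypothetical and not fitted to any real model or benchmark. It only assumes performance grows with the logarithm of the training data.

```python
import math

def toy_score(num_tokens, base=10.0, gain_per_10x=5.0):
    """Hypothetical score that grows with log10 of the number of training tokens."""
    return base + gain_per_10x * math.log10(num_tokens)

# Each step multiplies the data by 10, yet the score rises by the same fixed amount.
for exponent in range(9, 13):
    tokens = 10 ** exponent
    print(f"{tokens:.0e} tokens -> score {toy_score(tokens):.1f}")
```

Under this assumption, every additional fixed bump in score costs ten times more data than the last one, which is why simply adding more data stops being a differentiator.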
What I listened to
Art of Investment podcast - Cal Fussman, compounding human connections
- A masterpiece on storytelling. The first 10 minutes of the pod were so immediately appealing that it's hard to pause or stop listening.