21 Powerful Life Lessons From My Mentor (George Raveling)

Like most people, I am a product of my mentors.

But when I talk about one of the most influential people in my life, everyone usually assumes I am referring to Robert Greene. Robert, of course, taught me so much and I continue to learn from him.

Actually…there’s someone else. Someone whose wisdom, generosity, and curiosity have shaped my life, work, and thinking more than almost anyone I’ve met. Someone who has influenced how I approach relationships, how I treat others, and how I try to give back.

That someone is George Raveling.

Who is George Raveling? I think he’s one of the most remarkable people of the 20th century. His story is extraordinary. His father died when he was young. His mother was placed in a mental institution, and he was raised by his grandmother. He went to a series of Catholic schools, thrived as a basketball player at Villanova, and after serving briefly in the Air Force, found his calling in coaching. He became the first African American basketball coach in what’s now the Pac-12 and went on to have a Hall of Fame career, leading programs at Washington State, the University of Iowa, and USC. He was instrumental in bringing Michael Jordan to Nike and has mentored some of the most influential coaches in college basketball. I’ve watched John Calipari, Shaka Smart, and Buzz Williams all call him to get his advice on something when I’ve spent time with George. In college basketball, he’s known as the Godfather.

And if that weren’t enough, George owned the original typewritten draft of the “I Have a Dream” speech, which Martin Luther King Jr. handed him while he was working security at the March on Washington. In an extraordinary gesture, in 2021, George donated the historic document to his alma mater, Villanova University, on the condition that they collaborate with the Smithsonian and the National Museum of African American History to loan it out, ensuring that more people can see and be inspired by it.

He’s been a mentor and friend to me, someone whose message I’ve tried to help share with the world. Most recently, I played a small role in bringing to life his memoir, What You’re Made For: Powerful Life Lessons from My Career in Sports, which I pitched to my publisher. It just came out yesterday.

In this article, I wanted to share some of the many lessons I’ve learned from George over the years and in the process of working on the book with him. His wisdom and example have influenced my life in ways I never could have imagined—I hope these 21 lessons impact you as much as they have impacted me…

– You have two choices today. George told me that when he wakes up in the morning, as he puts his feet on the floor but before he stands up, he says to himself, “George, you’ve got two choices today. You can be happy or very happy. Which will it be?” (Voltaire put it another way I love: The most important decision you make is to be in a good mood.)

– Always be reading. He told me a story from when he was a kid—“George,” his grandmother asked him, “do you know why slave owners hid their money in their books?” “No, Grandma, why?” he said. “Because they knew the slaves would never open them,” she told him. To me, the moral of that story is not just that there is power in the written word (that’s why they made it illegal to teach slaves to read), but also that what’s inside them is very valuable. And the truth is that books still have money between the pages. My entire career has been made possible by what I read.

– Go learn things and meet people. It’s not enough to read—you have to go down rabbit holes, look up words you don’t know, share interesting ideas with others, earmark pages, and make notes in the margins. A few years ago, George was reading a book when the word “mastermind” caught his eye. He’d never heard it before. As was his habit, he circled it and made a note to look it up later. That sent him down a rabbit hole—researching the concept, reading articles, and learning about an event called Mastermind Dinners. He shared what he found with a few friends, including me. As it happened, I knew the guy who ran the Mastermind Dinners and offered to connect them. “Go for it!” George replied. Not long after, I got a photo of him at the conference in Ojai, California. He was the oldest person there. The only one not an entrepreneur. The only one from sports. The only one retired. But by the end, he was everyone’s favorite. People told me afterward that George was the highlight of the event. He asked great questions, he listened, he shared, he made people think. He could have told himself he didn’t belong. Instead, he showed up, stepped outside his comfort zone, and kept learning—at eighty-three!

– Keep a commonplace book. At his house, George has these big red binders filled with notes. He calls them his “learning journals.” They’re his version of a commonplace book—a collection of ideas, quotes, observations, and information gathered over time. The purpose is to record and organize these gems for later use in your life and work. It’s a habit he’s kept since 1972. To this day, he told me, “I go back and just read through them. I’ll just get one of the binders and I’ll sit down at the kitchen table and start reading through it. Sometimes I come across stuff that is more applicable today than it was when I wrote it in there.”

– Live like it’s the 4th quarter. George nearly died in a brutal car crash at 57. When he woke up in the hospital, a police officer told him, “Coach, you don’t know how lucky you are.” He took that to heart—treating every day after as a second chance, an opportunity to do more, learn more, and give more. He went on to have a whole second act, joining Nike, shaping the future of basketball, and achieving things he never imagined. We shouldn’t need a near-death experience to wake us up to what we have. Seneca put it well: Go to bed each night saying, I have lived. If you wake up, treat it as a gift.

– Learn from everyone. George once said in an interview that I was his mentor, which, of course, is preposterous. But I’ll take the point: you can learn from anyone. It doesn’t matter if they’re younger than you, if they live a completely different life, or even if you disagree with them on 99% of things. Everyone can teach you something. Anyone can be your mentor.

– Do the most important thing. When George became Nike’s Director of International Basketball at 63, he had no prior corporate experience and was overwhelmed by self-doubt. Until a mentor gave him a simple system: “When you leave the office every day, leave a yellow pad in the middle of the desk, and when you come in the morning, write down the three most important things you gotta get done that day in that order. That day, do not do anything else but the first thing on the pad. And if you get the first one, then you go to the second one.” That structure put order to his day and gave him a sense of purpose. Instead of spinning his wheels or getting lost in distractions, he focused on what mattered most. One thing at a time.

– Choose opportunity over money. George once told me, “Never take a job for money. Always take a job for opportunity.” That’s how he’s lived his life, and that’s why he’s had such an incredible life. It’s why he took the job at Nike, not despite the fact that he had no experience as a global corporate executive—but precisely because he had no experience as a global corporate executive. It was a chance to step into something completely new, to learn, to grow, to challenge himself. He didn’t take the job because it was safe. He took it because it was filled with opportunities—to meet fascinating people, travel the world, immerse himself in different cultures, and bring the game he loves to new places and new people. Most people would have stuck to what was comfortable and familiar, but George went where the opportunity was.

– Always be prepared. When we were working on What You’re Made For, George and I had weekly calls that ran for one to two hours. It was my job to pull stories and lessons out of him. George is obviously the boss and the questions were largely about his life, so it could have been pretty relaxed, but that’s not his style. He clearly spent hours preparing for each hour we were on the phone, always coming intensely prepared with notes, questions, and ideas ready to go. He treated every call the way I imagine he prepared for a big game back in his coaching days or a high-stakes meeting at Nike. In one of our calls, he told me, “Right to this day, I think it’s disrespectful to go into a meeting and not be prepared.”

– Trust is earned. George and Michael Jordan have known each other for decades. Their relationship is built on trust—so much so that George told me, “Other than my mom and my grandma, never in my life have I had anybody who trusts me as much as Michael Jordan.” And he’s never done anything to jeopardize it. In all their years of friendship, even when he ran Michael’s basketball camps for 22 years, George said, “I’ve never asked Michael for anything in my life—no money, no tickets to games, nothing.” It shouldn’t be a surprise, then, that when George told Jordan he should seriously consider signing with Nike, Jordan listened. That billion-dollar decision was the result of the trust Coach built when he coached Jordan on the ‘84 Olympic Team. As Jordan writes in the foreword (not something he does often!) to What You’re Made For, “There are all kinds of stories out there, but George is truly the reason I signed with Nike. As I’ve said before, I was all in for Adidas. George preached for Nike, and I listened.”

– Practice the art of self-leadership. George once told me, “One of the most underrated aspects of leadership is our ability to lead ourselves.” Before you can lead a team, a company, or a family, you have to be able to lead yourself. And isn’t that what the Stoics say? That no one is fit to rule who is not first ruler of themselves?

– Be a positive difference maker. George has a powerful question he often asks: “Are you going to be a positive Difference Maker today?” It’s a question that challenges you to think about the impact you want to have each day. I think about it all the time.

– Find the good in everything. George once texted me out of the blue, “I am absolutely unequivocally the luckiest human being on planet Earth.” He sees everything that’s happened to him, even the terrible things, even the adversity, even the unfair things. He sees them as all leading up to who he is now. He walks through the world with a sense of gratitude and appreciation and a belief in his ability to turn everything into something positive.

– Tell them what they mean to you. When we would do our calls for the book, it caught me off guard at first. George, before hanging up, would say, “I love you.” I’m not used to that—at least not from people outside my family. But George never hesitated. “I’ve learned that it’s hard for people, especially men, to say ‘I love you,’” he told me. Even with his own son, he noticed that for years it felt uncomfortable for him to say it back. “It’s strange,” George said, “because every one of us has a thirst to be loved, appreciated, acknowledged, respected. And yet, for some reason, we struggle to express it.” So George has made a habit of saying things like, “I appreciate you.” “I respect you.” “I’m glad you’re my friend.” “I’m here for you.” Simple words that so many people rarely hear. George didn’t assume people knew how he felt—he told them.

– It’s up to you. George used to give a talk at basketball camps titled, “If it’s to be, it’s up to me.” He said, “At the end of the day, either our hands are gonna be on the steering wheel of our lives or someone else’s hands are gonna be on the steering wheel of our lives.”

– Do less, better. Once in a meeting at Nike, the president asked the team, “Would we be better off doing 25 things good or 5 things great?” George said he still applies that day-to-day. “My day really revolves around just three or four things…I try to declutter the day and say, ‘Okay, if I can get these four things done today, it will be a good day.’ Every day, on a notecard, I write down 5-6 things I want to get done that day. Every day, I cross these off and tear up the card. That’s it. That’s the system.”

– Cultivate relationships. While we were working on the book, George told me, “Often people say, how do you account for what’s happened to you in your life? And the one word I use to capture it all is: relationships. My whole life has been built on relationships. People seeing something in me that I didn’t see in myself.” When I look at my own life, the most pivotal moments, the biggest opportunities—they all came from relationships. From people who believed in me when I didn’t believe in myself. Relationships aren’t just about networking; they’re about surrounding yourself with people who see your potential, sometimes before you do.

– Build your team. George sometimes refers to his family as Team Raveling, and his wife, Delores, as the CEO of their family. He talks about how too many people put more thought, effort, and strategy into their careers than they do into their families. They chase professional success with careful planning, clear goals, and relentless discipline—but expect their relationships to work out on their own. You wouldn’t expect a company to succeed by just winging it. A family is no different—it can’t thrive without leadership, communication, clearly defined roles, and a shared vision. Whether it’s your spouse, close friends, or a chosen family, you have to build your team with the same intention and commitment you bring to your work.

– Listen. George is one of the best listeners I’ve ever met. He says, “The quality of your conversations is greatly dependent on the quality of your listening.” I used to think I was a good listener, but watching George taught me how much better I could be. He doesn’t just wait for his turn to talk—he listens to understand.

– Become the go-to. When George was a player at Villanova, initially, he wasn’t getting much playing time. So he looked around and noticed something: no one on the team was a great rebounder. And he figured if he became the best rebounder on the team, his coaches would have no choice but to play him. So he made it his role. He invented his own rebounding drills and practiced them every day. By the time he graduated, he had set multiple rebounding records and was one of the best rebounders in the game. I love the idea of inventing a role for yourself—finding something that’s being overlooked or not addressed and deciding to become the go-to person for it. It’s not just a good strategy for athletes—it’s a way to make yourself indispensable in any field.

– Know your boundaries…and enforce them. I once connected George with someone interested in working on a project with him. Everything was going well—until they sent over the proposed terms. George didn’t argue or negotiate. He sent back a clear, firm email terminating the discussion. The other party was surprised and followed up to ask why. “The offer was insulting and ridiculous,” George explained. He didn’t waste time debating or trying to make it work. He knew his worth, and he wasn’t going to entertain anything less. Too many people accept bad deals out of fear or politeness, but George believed in setting clear boundaries—and enforcing them.

I will leave you with this…

Although he’s famous for being a coach, that’s not what it said on the door of his office. Instead, it said,

George Raveling

Educator

To this day, he sees himself as a teacher. And he teaches by example, by how he lives his life. That’s why, even though I never played for George Raveling, I’ve learned so much from him. By watching how he carries himself, how he lives, and how he treats others, I’ve learned more than I ever could have from words alone.

And you can, too.

You might not get a third chance

The first impression is vitally important. It positions us, establishes the tone of our relationship and earns trust.

But we’re human, and it’s unlikely that every first impression will be as useful as we’d like. Fortunately, people can speak up and let us know, particularly if we make it easy for them to do so.

When a customer or partner lets us know that we made a lousy first impression, it’s time to lean in. You’re not going to get a third chance to make a second impression.

If a customer service call goes wrong, or if a new employee is stumbling, this is the moment to escalate and get the second impression just right. It shows that we can recover, that we’re listening, and that the relationship is worth something to us.

What an opportunity to make things right. If your team isn’t empowered to escalate support at the first hint of a problem, you’re letting them down.

The Fascinating Power Of Human Wormholes

One of the most mind-blowing experiences of my life happened on a porch in East Austin. I had brought George Raveling, then 80, to visit with Richard Overton, then 111.

It struck me as these two kind and wise men chatted that I was in a sort of human wormhole.

When George was born in 1937 (he writes about this in his beautiful new book What You’re Made For that I was lucky enough to play a small part in getting published), the Golden Gate Bridge had just opened, the Great Depression ravaged America, and Pablo Picasso was putting the finishing touches on his haunting, heartbreaking anti-war mural, “Guernica,” as Europe plunged itself back into violence.

When Richard Overton was born in 1906, just a few miles down the road from my ranch, Theodore Roosevelt was president. As a child in Texas, he remembered seeing Civil War veterans walking around. Not many, but they were there—men who had fought for a Confederacy that had enslaved his ancestors. When he was a kid, Henry L. Riggs, a veteran of the Black Hawk War, was still alive. Riggs was born in 1812. And when Riggs was born, Conrad Heyer—a Revolutionary War veteran and the earliest-born person to ever be photographed—was still alive.

It’s easy to forget how little time separates us from what we think of as “history.” Richard plus two other people takes you back to before America was a country. He was a teenager during WWI, served in WWII, and then lived long enough to be the nation’s oldest living veteran at 112 and to hold my son, who, born in 2016, might live to see the 22nd century.

Here’s my son with Richard Overton

It’s easy to see history as this distant thing that happened to other people–people on the page or in old portraits. George played college basketball against Jerry West…the man who became the NBA logo. George Raveling was there the day of the March on Washington in 1963. Martin Luther King Jr. came down the steps of the Lincoln Memorial and handed him the copy of the speech he gave that day. And then just a few decades later, George helped bring a young rookie named Michael Jordan to Nike, beginning a process that would turn Jordan into a billionaire. George would meet six or seven presidents starting with Truman. Richard would be flown to the White House to meet Obama.

Just two guys and you have a good chunk of American–and world–history. Just two guys shaking hands with, witnessing, and taking part in the people and events that resound to this day.

History isn’t something distant or abstract. It’s just a few handshakes away. Just a few degrees of separation, it turned out, from one of my neighbors.

The past is not dead and distant, Faulkner observed. It’s not even past.

Did you know that England’s government only recently paid off debts it incurred as far back as 1720 from events like the South Sea Bubble, the Napoleonic wars, the empire’s abolition of slavery, and the Irish potato famine? For more than a decade and a half of the twenty-first century, there was still a direct and daily connection to the eighteenth and nineteenth centuries. Even today, the United States continues to pay pensions related to the Civil War and the Spanish-American War.

Did you know that in 2013 they discovered living whales born before Melville published Moby Dick? Or that the world’s oldest tortoise, Jonathan, lives on an island in the Atlantic and is 192 years old? Or that President John Tyler, born in 1790, who took office just ten years after little Jonathan was born, still has living grandchildren?

And that’s all relatively ‘modern’ history. The woolly mammoth was still roaming the earth while the pyramids were being built. Cleopatra lived closer to our time than she did to the construction of those pyramids. When British workers dug the foundations for Nelson’s Column in Trafalgar Square, they found the bones of actual lions—creatures that had once roamed the exact spot they were standing on. History isn’t some far-off, untouchable thing. It’s right under our feet.

When we were doing a small construction project at the bookstore recently, we moved an old antique bar and found some paint on the wall, covered in plaster. Carefully scraping it away, we found a date and a kind of sign–January 16, 1922. What was happening in the world that day? Who were the people who stood there and supervised it being painted? A young Richard might have walked by and looked at it (from the outside, of course, as it was probably segregated).

When I lived in New Orleans, my apartment was partitioned out of a 19th-century convent. I’d head uptown to write what became my first book, hopping on the longest continually running streetcar in the world, the St. Charles Avenue Streetcar Line. A train that has traveled the same tracks for nearly 200 years. How many millions of people have ridden those same rails? Sat, even, in the same seat? Tennessee Williams, Walker Percy, Shelby Foote, George Washington Cable, Edgar Degas—could have looked out those very windows. They, along with so many others not as easily remembered, lived and struggled just as I did. Just as you do.

In Goethe’s The Sorrows of Young Werther (a favorite of Napoleon’s), there is a scene in which Werther writes to a friend about his daily trip to a small, beautiful spring. He sees the young girls coming to gather water and thinks about how many generations have been doing that—have come and had the same thoughts he is having.

“When I sit there,” he explains, “I see them all. The ancestral fathers, making friends and courting by the spring, I sense the benevolent spirits that watch over springs and wells. Oh, anyone who cannot share this feeling must never have refreshed himself at a cool spring after a hard day’s summer walking.”

I think about the things that happened in George’s life. I think about the horrible things that happened during Richard’s. I think about the progress made in both. I think of how much has changed…and how much has remained the same. I remember as I sat there on the porch, as Richard told me about a tree he had planted that was, some seven decades later, pushing up the foundation of the house, thinking of the Bible verse that Hemingway opens his book, The Sun Also Rises, with: “One generation passeth, and another generation cometh; but the earth abideth forever. The sun also riseth, and the sun goeth down, and hasteth to the place where he arose.” It was this passage, his editor would say, that “contained all the wisdom of the ancient world.”

Richard Overton on his porch (2017)

The view from Overton’s porch

And what wisdom is that? One of the most striking things about history is just how long human beings have been doing what they do. Though certain attitudes and practices have come and gone, what’s left are people—living, dying, loving, fighting, crying, laughing.

Instability. Uncertainty. Danger. Division.

This is one of the most consistent themes of the Stoics and particularly of Meditations, the way that events flow past us like a river, the way the same things keep happening over and over again. That’s what history was, Marcus Aurelius said, whether it was the age of Vespasian, his own, or some time even more distant—it was “people doing the exact same things: marrying, raising children, getting sick, dying, waging war, throwing parties, doing business, farming, flattering, boasting, distrusting, plotting, hoping others will die, complaining about their own lives, falling in love, putting away money, seeking high office and power.”

From this angle, human life looks very small. But a connection with the past can also make you feel very big–like you’re a part of something, like we are much more interconnected and closer to the center of things than it sometimes feels.

Indeed, these wormholes, illustrating the “great span” as they do, give us perspective. They remind us how many have been here before us and how close they remain. That even though we are small, we are also a piece of this great universe.

“Look at the past,” Marcus Aurelius writes in Meditations, “and from that, extrapolate the future: the same thing. No escape from the rhythm of events.”

There’s something lovely about intersecting with the past, about connecting with it.

I’ll cherish that day with Richard and George, as long as I live.

Hopefully, that will be a long time in the future…but even if it’s not, I feel like by spending time with them my life has already stretched far enough back in time.

Ensemble stars

Over the last 50 years, 167 different people have been part of the Saturday Night Live ensemble cast. Some of them went on to become comedy superstars; others lasted less than a season and are fairly obscure in the cultural pantheon.

But if you were tasked with creating an all-time MVP cast, it would be a mistake to only pick the movie stars.

Like a baseball team or the local non-profit, the group works when it functions as an ensemble, not a collection of individuals striving for the spotlight.

A brilliant and generous ensemble player isn’t someone who tried to be the big star and almost didn’t make it. They’ve chosen a different path and done it with skill.

Phil, Jane and Fred showed up to do a different sort of work. It’s not a runner-up prize; it’s the point.

Because most organizations don’t celebrate this role, they rely on simply stumbling around until they find the linchpin who can hold things together.

Two questions to get you started:

  1. Did the dozen ensemble stars who made a huge difference on SNL do it with intention?
  2. What is your organization doing to find and train and reward this work?

Understanding Reasoning LLMs

This article describes the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope this provides valuable insights and helps you navigate the rapidly evolving literature and hype surrounding this topic.

In 2024, the LLM field saw increasing specialization. Beyond pre-training and fine-tuning, we witnessed the rise of specialized applications, from RAGs to code assistants. I expect this trend to accelerate in 2025, with an even greater emphasis on domain- and application-specific optimizations (i.e., "specializations").

Stages 1-3 are the common steps to developing LLMs. Stage 4 specializes LLMs for specific use cases.

The development of reasoning models is one of these specializations. This means we refine LLMs to excel at complex tasks that are best solved with intermediate steps, such as puzzles, advanced math, and coding challenges. However, this specialization does not replace other LLM applications, because transforming an LLM into a reasoning model also introduces certain drawbacks, which I will discuss later.

To give you a brief glimpse of what's covered below, in this article, I will:

  1. Explain the meaning of "reasoning model"

  2. Discuss the advantages and disadvantages of reasoning models

  3. Outline the methodology behind DeepSeek R1

  4. Describe the four main approaches to building and improving reasoning models

  5. Share thoughts on the LLM landscape following the DeepSeek V3 and R1 releases

  6. Provide tips for developing reasoning models on a tight budget

I hope you find this article useful as AI continues its rapid development this year!

How do we define "reasoning model"?

If you work in AI (or machine learning in general), you are probably familiar with vague and hotly debated definitions. The term "reasoning models" is no exception. Eventually, someone will define it formally in a paper, only for it to be redefined in the next, and so on.

In this article, I define "reasoning" as the process of answering questions that require complex, multi-step generation with intermediate steps. For example, factual question-answering like "What is the capital of France?" does not involve reasoning. In contrast, a question like "If a train is moving at 60 mph and travels for 3 hours, how far does it go?" requires some simple reasoning. For instance, it requires recognizing the relationship between distance, speed, and time before arriving at the answer.

A regular LLM may only provide a short answer (as shown on the left), whereas reasoning models typically include intermediate steps that reveal part of the thought process. (Note that many LLMs that have not been specifically developed for reasoning tasks can also provide intermediate reasoning steps in their answers.)

Most modern LLMs are capable of basic reasoning and can answer questions like, "If a train is moving at 60 mph and travels for 3 hours, how far does it go?" So, today, when we refer to reasoning models, we typically mean LLMs that excel at more complex reasoning tasks, such as solving puzzles, riddles, and mathematical proofs.

Additionally, most LLMs branded as reasoning models today include a "thought" or "thinking" process as part of their response. Whether and how an LLM actually "thinks" is a separate discussion.

Intermediate steps in reasoning models can appear in two ways. First, they may be explicitly included in the response, as shown in the previous figure. Second, some reasoning LLMs, such as OpenAI's o1, run multiple iterations with intermediate steps that are not shown to the user.

"Reasoning" is used at two different levels: 1) processing the input and generating via multiple intermediate steps and 2) providing some sort of reasoning as part of the response to the user.

When should we use reasoning models?

Now that we have defined reasoning models, we can move on to the more interesting part: how to build and improve LLMs for reasoning tasks. However, before diving into the technical details, it is important to consider when reasoning models are actually needed.

When do we need a reasoning model? Reasoning models are designed to be good at complex tasks such as solving puzzles, advanced math problems, and challenging coding tasks. However, they are not necessary for simpler tasks like summarization, translation, or knowledge-based question answering. In fact, using reasoning models for everything can be inefficient and expensive. For instance, reasoning models are typically more expensive to use, more verbose, and sometimes more prone to errors due to "overthinking." Here, too, the simple rule applies: use the right tool (or type of LLM) for the task.

The key strengths and limitations of reasoning models are summarized in the figure below.

The key strengths and weaknesses of reasoning models.

A brief look at the DeepSeek training pipeline

Before discussing four main approaches to building and improving reasoning models in the next section, I want to briefly outline the DeepSeek R1 pipeline, as described in the DeepSeek R1 technical report. This report serves as both an interesting case study and a blueprint for developing reasoning LLMs.

Note that DeepSeek did not release a single R1 reasoning model but instead introduced three distinct variants: DeepSeek-R1-Zero, DeepSeek-R1, and DeepSeek-R1-Distill.

Based on the descriptions in the technical report, I have summarized the development process of these models in the diagram below.

Development process of DeepSeek's three different reasoning models that are discussed in the DeepSeek R1 technical report.

Next, let's briefly go over the process shown in the diagram above. More details will be covered in the next section, where we discuss the four main approaches to building and improving reasoning models.

(1) DeepSeek-R1-Zero: This model is based on the 671B pre-trained DeepSeek-V3 base model released in December 2024. The research team trained it using reinforcement learning (RL) with two types of rewards. This approach is referred to as "cold start" training because it did not include a supervised fine-tuning (SFT) step, which is typically part of reinforcement learning with human feedback (RLHF).

(2) DeepSeek-R1: This is DeepSeek's flagship reasoning model, built upon DeepSeek-R1-Zero. The team further refined it with additional SFT stages and further RL training, improving upon the "cold-started" R1-Zero model.

(3) DeepSeek-R1-Distill: Using the SFT data generated in the previous steps, the DeepSeek team fine-tuned Qwen and Llama models to enhance their reasoning abilities. While not distillation in the traditional sense, this process involved training smaller models (Llama 8B and 70B, and Qwen 1.5B–32B) on outputs from the larger DeepSeek-R1 671B model.

The 4 main ways to build and improve reasoning models

In this section, I will outline the key techniques currently used to enhance the reasoning capabilities of LLMs and to build specialized reasoning models such as DeepSeek-R1, OpenAI's o1 & o3, and others.

Note: The exact workings of o1 and o3 remain unknown outside of OpenAI. However, they are rumored to leverage a combination of both inference and training techniques.

1) Inference-time scaling

One way to improve an LLM's reasoning capabilities (or any capability in general) is inference-time scaling. This term can have multiple meanings, but in this context, it refers to increasing computational resources during inference to improve output quality.

A rough analogy is how humans tend to generate better responses when given more time to think through complex problems. Similarly, we can apply techniques that encourage the LLM to "think" more while generating an answer. (Although, whether LLMs actually "think" is a different discussion.)

One straightforward approach to inference-time scaling is clever prompt engineering. A classic example is chain-of-thought (CoT) prompting, where phrases like "think step by step" are included in the input prompt. This encourages the model to generate intermediate reasoning steps rather than jumping directly to the final answer, which can often (but not always) lead to more accurate results on more complex problems. (Note that it doesn't make sense to employ this strategy for simpler knowledge-based questions, like "What is the capital of France", which is again a good rule of thumb to find out whether a reasoning model makes sense on your given input query.)
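
To make this concrete, here is a minimal sketch of zero-shot CoT prompting using the OpenAI Python client. The model name and the exact prompt wording are placeholders of my own, not the setup used in the cited paper:

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY to be set in the environment

    question = "If a train is moving at 60 mph and travels for 3 hours, how far does it go?"

    # Appending "Let's think step by step." is the classic zero-shot CoT trigger.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model works for this illustration
        messages=[{"role": "user", "content": question + "\nLet's think step by step."}],
    )

    print(response.choices[0].message.content)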

An example of classic CoT prompting from the 2022 Large Language Models are Zero-Shot Reasoners paper (https://arxiv.org/abs/2205.11916).

The aforementioned CoT approach can be seen as inference-time scaling because it makes inference more expensive through generating more output tokens.

Another approach to inference-time scaling is the use of voting and search strategies. One simple example is majority voting, where we have the LLM generate multiple answers and select the final answer by majority vote. Similarly, we can use beam search and other search algorithms to generate better responses.
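
As a rough sketch (with a hypothetical generate function standing in for whatever LLM call you use), majority voting can look like this: sample several answers at a nonzero temperature, extract the final answer from each, and keep the most common one.

    from collections import Counter
    import re

    def extract_answer(text):
        """Pull the final answer out of a response; assumes an 'Answer: ...' line."""
        match = re.search(r"Answer:\s*(.+)", text)
        return match.group(1).strip() if match else None

    def majority_vote(generate, prompt, n_samples=8):
        # Sample several reasoning paths at temperature > 0 so that they differ.
        answers = [extract_answer(generate(prompt, temperature=0.8)) for _ in range(n_samples)]
        answers = [a for a in answers if a is not None]
        # The most frequent final answer wins.
        return Counter(answers).most_common(1)[0][0]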

I highly recommend the Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters paper that I described in my previous Noteworthy AI Research Papers of 2024 (Part Two) article (https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-2) for more details on these different strategies.

Different search-based methods rely on a process-reward-based model to select the best answer. Annotated figure from the LLM Test-Time Compute paper, https://arxiv.org/abs/2408.03314

The DeepSeek R1 technical report states that its models do not use inference-time scaling. However, this technique is often implemented at the application layer on top of the LLM, so it is possible that DeepSeek applies it within their app.

I suspect that OpenAI's o1 and o3 models use inference-time scaling, which would explain why they are relatively expensive compared to models like GPT-4o. In addition to inference-time scaling, o1 and o3 were likely trained using RL pipelines similar to those used for DeepSeek R1. More on reinforcement learning in the next two sections below.

2) Pure reinforcement learning (RL)

One of my personal highlights from the DeepSeek R1 paper is their discovery that reasoning emerges as a behavior from pure reinforcement learning (RL). Let's explore what this means in more detail.

As outlined earlier, DeepSeek developed three types of R1 models. The first, DeepSeek-R1-Zero, was built on top of the DeepSeek-V3 base model, a standard pre-trained LLM they released in December 2024. Unlike typical RL pipelines, where supervised fine-tuning (SFT) is applied before RL, DeepSeek-R1-Zero was trained exclusively with reinforcement learning without an initial SFT stage as highlighted in the diagram below.

The development process of the DeepSeek-R1-Zero model.

Still, this RL process is similar to the commonly used RLHF approach, which is typically applied to preference-tune LLMs. (I covered RLHF in more detail in my article, LLM Training: RLHF and Its Alternatives.) However, as mentioned above, the key difference in DeepSeek-R1-Zero is that they skipped the supervised fine-tuning (SFT) stage for instruction tuning. This is why they refer to it as "pure" RL. (Although, RL in the context of LLMs differs significantly from traditional RL, which is a topic for another time.)

For rewards, instead of using a reward model trained on human preferences, they employed two types of rewards: an accuracy reward and a format reward.

  • The accuracy reward uses the LeetCode compiler to verify coding answers and a deterministic system to evaluate mathematical responses.

  • The format reward relies on an LLM judge to ensure responses follow the expected format, such as placing reasoning steps inside <think> tags.
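
To illustrate the idea (this is not DeepSeek's actual code; the tag names, string matching, and weighting are simplifications of my own), the two signals can be thought of as simple reward functions:

    import re

    def format_reward(response):
        """Reward responses that wrap reasoning in <think> tags followed by an <answer> block."""
        pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
        return 1.0 if re.search(pattern, response, flags=re.DOTALL) else 0.0

    def accuracy_reward(response, reference):
        """Reward responses whose final answer matches a reference answer.
        For coding questions, one would instead execute the code against test cases."""
        match = re.search(r"<answer>(.+?)</answer>", response, flags=re.DOTALL)
        if match is None:
            return 0.0
        return 1.0 if match.group(1).strip() == reference.strip() else 0.0

    def total_reward(response, reference):
        # The weighting here is a made-up placeholder, not a value from the paper.
        return accuracy_reward(response, reference) + 0.1 * format_reward(response)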

Surprisingly, this approach was enough for the LLM to develop basic reasoning skills. The researchers observed an "Aha!" moment, where the model began generating reasoning traces as part of its responses despite not being explicitly trained to do so, as shown in the figure below.

A figure from the DeepSeek R1 technical report (https://arxiv.org/abs/2501.12948) showing the emergence of the "Aha" moment.

While R1-Zero is not a top-performing reasoning model, it does demonstrate reasoning capabilities by generating intermediate "thinking" steps, as shown in the figure above. This confirms that it is possible to develop a reasoning model using pure RL, and the DeepSeek team was the first to demonstrate (or at least publish) this approach.

3) Supervised finetuning and reinforcement learning (SFT + RL)

Next, let's look at the development of DeepSeek-R1, DeepSeek’s flagship reasoning model, which serves as a blueprint for building reasoning models. This model improves upon DeepSeek-R1-Zero by incorporating additional supervised fine-tuning (SFT) and reinforcement learning (RL) to improve its reasoning performance.

Note that it is actually common to include an SFT stage before RL, as seen in the standard RLHF pipeline. OpenAI's o1 was likely developed using a similar approach.

The development process of the DeepSeek-R1 model.

As shown in the diagram above, the DeepSeek team used DeepSeek-R1-Zero to generate what they call "cold-start" SFT data. The term "cold start" refers to the fact that this data was produced by DeepSeek-R1-Zero, which itself had not been trained on any supervised fine-tuning (SFT) data.

Using this cold-start SFT data, DeepSeek then trained the model via instruction fine-tuning, followed by another reinforcement learning (RL) stage. This RL stage retained the same accuracy and format rewards used in DeepSeek-R1-Zero’s RL process. However, they added a consistency reward to prevent language mixing, which occurs when the model switches between multiple languages within a response.

The RL stage was followed by another round of SFT data collection. In this phase, the most recent model checkpoint was used to generate 600K Chain-of-Thought (CoT) SFT examples, while an additional 200K knowledge-based SFT examples were created using the DeepSeek-V3 base model.

These 600K + 200K SFT samples were then used for another round of RL. In this stage, they again used rule-based methods for accuracy rewards on math and coding questions, while human preference labels were used for other question types.

The final model, DeepSeek-R1, has a noticeable performance boost over DeepSeek-R1-Zero thanks to the additional SFT and RL stages, as shown in the table below.

Benchmark comparison of OpenAI o1 and DeepSeek R1 models. Annotated figure from the DeepSeek-R1 technical report (https://arxiv.org/abs/2501.12948).

4) Pure supervised finetuning (SFT) and distillation

So far, we have covered three key approaches to building and improving reasoning models:

1. Inference-time scaling, a technique that improves reasoning capabilities without training or otherwise modifying the underlying model.

2. Pure reinforcement learning (RL) as in DeepSeek-R1-Zero, which showed that reasoning can emerge as a learned behavior without supervised fine-tuning.

3. Supervised fine-tuning (SFT) plus RL, which led to DeepSeek-R1, DeepSeek’s flagship reasoning model.

So, what’s left? Model "distillation."

Surprisingly, DeepSeek also released smaller models trained via a process they call distillation. However, in the context of LLMs, distillation does not necessarily follow the classical knowledge distillation approach used in deep learning. Traditionally, in knowledge distillation (as briefly described in Chapter 6 of my Machine Learning Q and AI book), a smaller student model is trained on both the logits of a larger teacher model and a target dataset.

Instead, here distillation refers to instruction fine-tuning smaller LLMs, such as Llama 8B and 70B and Qwen 2.5 models (0.5B to 32B), on an SFT dataset generated by larger LLMs. Specifically, these larger LLMs are DeepSeek-V3 and an intermediate checkpoint of DeepSeek-R1. In fact, the SFT data used for this distillation process is the same dataset that was used to train DeepSeek-R1, as described in the previous section.
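
To make the distinction concrete, here is a minimal PyTorch-style sketch contrasting the classical knowledge-distillation loss (the student matches the teacher's logits) with the plain next-token SFT objective used for the R1-Distill models (the student is simply trained on text generated by the larger model). The names and shapes are illustrative, not taken from the paper:

    import torch.nn.functional as F

    def classic_kd_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
        """Classical KD: blend a soft-label KL term against the teacher's logits
        with the usual hard-label cross-entropy."""
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1 - alpha) * hard

    def sft_distill_loss(student_logits, teacher_generated_tokens):
        """R1-Distill-style distillation: ordinary next-token cross-entropy on
        text that the larger model generated; no teacher logits are involved."""
        return F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            teacher_generated_tokens.view(-1),
        )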

To clarify this process, I have highlighted the distillation portion in the diagram below.

The development process of the DeepSeek-R1-Distill models.

Why did they develop these distilled models? In my opinion, there are two key reasons:

1. Smaller models are more efficient. This means they are cheaper to run, and they can also run on lower-end hardware, which makes them especially interesting for many researchers and tinkerers like me.

2. A case study in pure SFT. These distilled models serve as an interesting benchmark, showing how far pure supervised fine-tuning (SFT) can take a model without reinforcement learning.

The table below compares the performance of these distilled models against other popular models, as well as DeepSeek-R1-Zero and DeepSeek-R1.

Benchmark comparison of distilled versus non-distilled models. Annotated figure from the DeepSeek-R1 technical report (https://arxiv.org/abs/2501.12948).

As we can see, the distilled models are noticeably weaker than DeepSeek-R1, but they are surprisingly strong relative to DeepSeek-R1-Zero, despite being orders of magnitude smaller. It's also interesting to note how well these models perform compared to o1 mini (I suspect o1-mini itself might be a similarly distilled version of o1).

Before wrapping up this section with a conclusion, there’s one more interesting comparison worth mentioning. The DeepSeek team tested whether the emergent reasoning behavior seen in DeepSeek-R1-Zero could also appear in smaller models. To investigate this, they applied the same pure RL approach from DeepSeek-R1-Zero directly to Qwen-32B.

The results of this experiment are summarized in the table below, where QwQ-32B-Preview serves as a reference reasoning model based on Qwen 2.5 32B developed by the Qwen team (I think the training details were never disclosed). This comparison provides some additional insights into whether pure RL alone can induce reasoning capabilities in models much smaller than DeepSeek-R1-Zero.

Benchmark comparison of distillation and RL on a smaller 32B model. Annotated figure from the DeepSeek-R1 technical report (https://arxiv.org/abs/2501.12948).

Interestingly, the results suggest that distillation is far more effective than pure RL for smaller models. This aligns with the idea that RL alone may not be sufficient to induce strong reasoning abilities in models of this scale, whereas SFT on high-quality reasoning data can be a more effective strategy when working with small models.

For completeness, it would have been useful to see additional comparisons in the table:

1. Qwen-32B trained with SFT + RL, similar to how DeepSeek-R1 was developed. This would help determine how much improvement can be made, compared to pure RL and pure SFT, when RL is combined with SFT.

2. DeepSeek-V3 trained with pure SFT, similar to how the distilled models were created. This would allow for a direct comparison to see how effective RL + SFT is over pure SFT.

Conclusion

In this section, we explored four different strategies for building and improving reasoning models:

1. Inference-time scaling requires no additional training but increases inference costs, making large-scale deployment more expensive as the number of users or query volume grows. Still, it remains a no-brainer for improving the performance of already strong models. I strongly suspect that o1 leverages inference-time scaling, which helps explain why it is more expensive on a per-token basis compared to DeepSeek-R1.

2. Pure RL is interesting for research purposes because it provides insights into reasoning as an emergent behavior. However, in practical model development, RL + SFT is the preferred approach as it leads to stronger reasoning models. I strongly suspect that o1 was trained using RL + SFT as well. More precisely, I believe o1 starts from a weaker, smaller base model than DeepSeek-R1 but compensates with RL + SFT and inference-time scaling.

3. As mentioned above, RL + SFT is the key approach for building high-performance reasoning models. DeepSeek-R1 is a nice blueprint showing how this can be done.

4. Distillation is an attractive approach, especially for creating smaller, more efficient models. However, the limitation is that distillation does not drive innovation or produce the next generation of reasoning models. For instance, distillation always depends on an existing, stronger model to generate the supervised fine-tuning (SFT) data.

One interesting aspect I expect to see next is to combine RL + SFT (approach 3) with inference-time scaling (approach 1). This is likely what OpenAI o1 is doing, except it's probably based on a weaker base model than DeepSeek-R1, which explains why DeepSeek-R1 performs so well while remaining relatively cheap at inference time.

Thoughts about DeepSeek R1

In recent weeks, many people have asked for my thoughts on the DeepSeek-R1 models. In short, I think they are an awesome achievement. As a research engineer, I particularly appreciate the detailed technical report, which provides insights into their methodology that I can learn from.

One of the most fascinating takeaways is how reasoning emerged as a behavior from pure RL. And it's impressive that DeepSeek has open-sourced their models under a permissive open-source MIT license, which has even fewer restrictions than Meta's Llama models.

How does it compare to o1?

Is DeepSeek-R1 better than o1? I’d say it’s roughly in the same ballpark. However, what stands out is that DeepSeek-R1 is more efficient at inference time. This suggests that DeepSeek likely invested more heavily in the training process, while OpenAI may have relied more on inference-time scaling for o1.

That said, it's difficult to compare o1 and DeepSeek-R1 directly because OpenAI has not disclosed much about o1. For instance, we don’t know:

  • Is o1 also a Mixture of Experts (MoE)?

  • How large is o1?

  • Could o1 just be a slightly refined version of GPT-4o with minimal RL + SFT and only extensive inference-time scaling?

Without knowing these details, a direct comparison remains an apples-to-oranges comparison.

The cost of training DeepSeek-R1

Another point of discussion has been the cost of developing DeepSeek-R1. Some have mentioned a ~$6 million training cost, but they likely conflated DeepSeek-V3 (the base model released in December last year) and DeepSeek-R1.

The $6 million estimate is based on an assumed $2 per GPU hour and the number of GPU hours required for the final training run of DeepSeek-V3, which was originally discussed back in December 2024.

However, the DeepSeek team has never disclosed the exact GPU hours or development cost for R1, so any cost estimates remain pure speculation.

Either way, ultimately, DeepSeek-R1 is a major milestone in open-weight reasoning models, and its efficiency at inference time makes it an interesting alternative to OpenAI’s o1.

Developing reasoning models on a limited budget

Developing a DeepSeek-R1-level reasoning model likely requires hundreds of thousands to millions of dollars, even when starting with an open-weight base model like DeepSeek-V3. This can feel discouraging for researchers or engineers working with limited budgets.

The good news: Distillation can go a long way

Fortunately, model distillation offers a more cost-effective alternative. The DeepSeek team demonstrated this with their R1-distilled models, which achieve surprisingly strong reasoning performance despite being significantly smaller than DeepSeek-R1. However, even this approach isn’t entirely cheap. Their distillation process used 800K SFT samples, which requires substantial compute.

Interestingly, just a few days before DeepSeek-R1 was released, I came across an article about Sky-T1, a fascinating project where a small team trained an open-weight 32B model using only 17K SFT samples. The total cost? Just $450, which is less than the registration fee for most AI conferences.

This example highlights that while large-scale training remains expensive, smaller, targeted fine-tuning efforts can still yield impressive results at a fraction of the cost.

Figure from the "Sky-T1: Train your own O1 preview model within $450" article, https://novasky-ai.github.io/posts/sky-t1/

According to their benchmarks, Sky-T1 performs roughly on par with o1, which is impressive given its low training cost.

Pure RL on a budget: TinyZero

While Sky-T1 focused on model distillation, I also came across some interesting work in the "pure RL" space. One notable example is TinyZero, a 3B parameter model that replicates the DeepSeek-R1-Zero approach (side note: it costs less than $30 to train).

Surprisingly, even at just 3B parameters, TinyZero exhibits some emergent self-verification abilities, which supports the idea that reasoning can emerge through pure RL, even in small models.

The TinyZero repository mentions that a research report is still a work in progress, and I’ll definitely be keeping an eye out for further details.

A figure from the TinyZero repository (https://github.com/Jiayi-Pan/TinyZero) showing that the model is capable of self-verification. (It would have been interesting to see the response of the base model in comparison.)

The two projects mentioned above demonstrate that interesting work on reasoning models is possible even with limited budgets. While both approaches replicate methods from DeepSeek-R1, one focusing on pure RL (TinyZero) and the other on pure SFT (Sky-T1), it would be fascinating to explore how these ideas can be extended further.

Beyond Traditional SFT: Journey Learning

One particularly interesting approach I came across last year is described in the paper O1 Replication Journey: A Strategic Progress Report – Part 1. Despite its title, the paper does not actually replicate o1. Instead, it introduces a different way to improve the distillation (pure SFT) process.

The key idea in the paper is "journey learning" as an alternative to "shortcut learning."

  • Shortcut learning refers to the traditional approach in instruction fine-tuning, where models are trained using only correct solution paths.

  • Journey learning, on the other hand, also includes incorrect solution paths, allowing the model to learn from mistakes.

This approach is kind of related to the self-verification abilities observed in TinyZero’s pure RL training, but it focuses on improving the model entirely through SFT. By exposing the model to incorrect reasoning paths and their corrections, journey learning may also reinforce self-correction abilities, potentially making reasoning models more reliable this way.

Journey learning, as opposed to traditional shortcut learning, includes wrong solution paths in the SFT data. Annotated figure from the O1 Replication Journey: A Strategic Progress Report – Part 1 (https://arxiv.org/abs/2410.18982)
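
As a toy sketch of the data-level difference (the format here is made up for illustration; the paper's actual examples are far longer), a shortcut-learning SFT example contains only the correct path, while a journey-learning example also includes a wrong attempt and its correction:

    # Illustrative SFT records only; the exact data format in the paper differs.
    shortcut_example = {
        "prompt": "Solve: 17 * 24",
        "response": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408",
    }

    journey_example = {
        "prompt": "Solve: 17 * 24",
        "response": (
            "First attempt: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 58 = 398. "
            "Wait, 17 * 4 is 68, not 58, so that step was a mistake. "
            "Corrected: 340 + 68 = 408."
        ),
    }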

This could be an exciting direction for future work, particularly for low-budget reasoning model development, where RL-based approaches may be computationally impractical.

Anyways, a lot of interesting work is currently happening on the reasoning model front, and I'm sure we will see a lot more exciting work in the upcoming months!


This magazine is a personal passion project. For those who wish to support me, please consider purchasing a copy of my Build a Large Language Model (From Scratch) book. (I am confident that you'll get lots out of this book as it explains how LLMs work in a level of detail that is not found anywhere else.)

Build a Large Language Model (From Scratch) now available on Amazon

If you read the book and have a few minutes to spare, I'd really appreciate a brief review. It helps us authors a lot!

Your support means a great deal! Thank you!

“Can’t complain” (but it might be worth considering)

Complaining is a cultural phenomenon, but it’s particularly prevalent in societies with a consumer culture (the customer is always right) and those where comfort has come to be expected.

Given all the complaining we do (about the weather, leadership, products, service and various ailments), it’s worth taking a moment to think about why we complain.

The obvious one might not be the main one.

The obvious reason to complain is to make a change happen.

If that’s the goal, though, we ought to focus those complaints where they’ll do the most good, and be prepared to do the work to have an impact. Organize the others, take consistent and persistent action, and market the complaint in a format and with a focus that will lead to action.

Most of the time, though, I’m not sure that’s what we’re really after.

Here are some others:

  1. to let off steam
  2. to signal group affiliation
  3. to create hope that things might get better
  4. to increase one’s status by selfishly demanding more
  5. to gain affiliation by complaining on behalf of someone else
  6. to gain status by demanding more for others who can’t speak up
  7. to validate our feelings by seeking acknowledgment from others that our grievance is legitimate
  8. to preemptively lower expectations or manage blame
  9. to conceal our fear or embarrassment
  10. to avoid responsibility by pointing to someone else
  11. to establish dominance or control in a situation
  12. to bond with others through shared experiences of dissatisfaction

Not on the list, because it belies almost all of these: “Whining in the face of imperfection often ruins what you’ve already got.”

Whining is the evil cousin of complaining. Whining purports to exist to make things better, but it never does.

James Murphy of LCD Soundsystem said, “The best way to complain is to make things.”

And perhaps we can extend that to: “The best way to complain is to make things better.”
