How Generative AI is Shortening the Path to Expertise
A Deep Dive with Gene Hwang, Researcher at The Multiverse School
Curious about AI research at The Multiverse School?
Join Gene Hwang — Multiverse Researcher and Student — as we explore the rapid evolution of Generative AI, from the release of ChatGPT running GPT-3.5 to the recent announcement of OpenAI’s reasoning engine known as Project Strawberry.
We'll explore the current landscape of frontier model capabilities and discuss the real-world challenges of prompt engineering, agentic frameworks, and ops tooling.
Overcoming Imposter Syndrome
When Gene joined The Multiverse School last year, they had brief experience with ChatGPT as an end user and a rudimentary understanding of Python for data analysis. Reflecting on their journey, they shared how they initially felt overwhelmed coming from a public health background with only basic familiarity with tools like Python and SQL.
“A big blocker for a while was just believing that I couldn't code,” they admitted.
One of the biggest challenges for newcomers to AI is overcoming the mental barrier that advanced generative AI is reserved for the most technically proficient.
However, by using resources like ChatGPT to clarify unfamiliar terms and concepts, Gene found a way to self-tutor effectively. Combined with structured learning at The Multiverse School, they progressed from a programming novice to conducting research on agentic capabilities in only a year.
LLMs: Then vs. Now
Gene started taking classes at The Multiverse just a couple of months after the release of the previous generation of frontier models: GPT-3.5 Turbo, Claude 3 Opus, and Gemini 1.0 Pro.
They spent much of their time learning about and working around the limitations of these models. Chief among them was the smaller context window (16k tokens for GPT-3.5 Turbo versus 128k for GPT-4o), which meant complex prompts had to be broken into smaller, more condensed versions that produced less complete answers and hallucinated more often than the best models available today.
Gene treated these limitations not as setbacks but as a rigorous training ground, learning to optimize their prompts by emulating human cognitive processes.
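To make that training ground concrete: with a 16k-token window, a long document simply can’t fit in a single prompt, so it has to be split and processed in pieces. Here is a minimal sketch of that kind of chunking, assuming the tiktoken tokenizer and an arbitrary per-chunk budget (both illustrative choices, not Gene’s actual workflow):

```python
# Minimal chunking sketch for a small context window.
# Requires tiktoken (pip install tiktoken); the budget and model are assumptions.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def chunk_text(text: str, budget: int = 3000) -> list[str]:
    """Split text into pieces that each fit within a token budget."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + budget]) for i in range(0, len(tokens), budget)]

# Each chunk can then be prompted on separately and the partial answers
# combined afterward, a common workaround when input exceeds the window.
```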
Agent Engineering
Gene spent 2024 running local models and using agentic frameworks, pushing and prodding to find the boundaries of AI’s capabilities. They spent hours refining models, tweaking parameters, and gaining a comprehensive understanding of the way LLMs process tokens and generate output. Tools like Oobabooga allowed them to test various models in the cloud without needing a powerful GPU or paying for compute, and libraries like Microsoft’s Autogen made it possible for multiple agents to talk with each other in what is known as an Agent Swarm.
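To give a flavor of what an Autogen setup looks like, here is a minimal two-agent sketch. It follows the pyautogen config_list pattern; the model name and settings are illustrative assumptions, and Autogen’s API has shifted across versions, so treat this as a sketch rather than a drop-in recipe:

```python
# Minimal two-agent conversation with Microsoft's Autogen (pip install pyautogen).
# Model choice and settings are assumptions for illustration.
import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]
}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user",
    human_input_mode="NEVER",      # run without pausing for human input
    code_execution_config=False,   # no local code execution in this demo
    max_consecutive_auto_reply=2,  # bound the back-and-forth
)

# The proxy sends the task; the two agents then exchange messages on their own.
user_proxy.initiate_chat(assistant, message="Outline a plan to summarize a long PDF.")
```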
Multi-agent systems are not a new paradigm in software engineering. However, agent swarms provide a groundbreaking way for different models to collaborate, compete, and complement each other to extend the utility one can get out of language models.
Take the Actor-Critic Model, for example, where one agent performs the task specified in your prompt and another acts as a critic, identifying where the response is lacking or misrepresents key information. The agents pass their responses back to one another, iterating on this cycle until you are satisfied with the quality and accuracy of the information, and asking you for input whenever more context would help them make better decisions. A baseline understanding of Complex Adaptive Systems helps you characterize the emergent properties of agentic behavior, so that you can design better agents and adjust your strategies effectively.
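A rough sketch of that actor-critic loop, assuming the OpenAI Python client and a fixed number of critique rounds in place of the open-ended “until you are satisfied” cycle:

```python
# Hedged sketch of an actor-critic agent loop using the OpenAI Python client
# (pip install openai). Model name and round count are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(system: str, user: str) -> str:
    """One chat completion under a given role prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return response.choices[0].message.content

task = "Explain what a context window is to a newcomer."
draft = ask("You are the actor. Complete the user's task.", task)

for _ in range(3):  # bounded critique/revision rounds
    critique = ask(
        "You are the critic. Point out anything missing, unclear, or wrong.",
        f"Task: {task}\n\nDraft: {draft}",
    )
    draft = ask(
        "You are the actor. Revise the draft to address the critique.",
        f"Task: {task}\n\nDraft: {draft}\n\nCritique: {critique}",
    )

print(draft)
```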
Poring over documentation and learning about complex systems proved immensely helpful in Gene’s learning, but the real breakthroughs came from research and insights posted by AI engineers on Twitter/X. As the de facto place for software engineers and cybersecurity professionals to share their work with the world, the platform let Gene build a network of people sharing academic research on AI, discussing their positive and negative experiences with particular tools and frameworks, and highlighting their own work at the forefront of the field.
Once you have baseline knowledge of how these systems work, Twitter may just be the most accessible place where a considerable amount of bleeding-edge AI research happens.
The Democratization of Skills
The utility that LLMs provide is extensive, and you don’t need to be a subject matter expert to achieve outstanding results. Gene makes the case that the path to upskilling in virtually anything has become an order of magnitude faster by using AI as a personal tutor. Since chatbots like ChatGPT are trained on many terabytes of data and now have internet search capabilities, they can aggregate information spread across many credible sources and present the best, most accurate information in an organized, easy-to-understand way. The experience can be as natural as talking to a friend, or it can be an intentional, structured tutoring session with the world’s most competent TA on every niche subject you might want to learn.
Code generation in particular has come a long way seemingly overnight with the newest models like GPT-4o and Claude 3.5 Sonnet. Not only can these chatbots help you learn basic syntax and coding best practices, but as long as you can clearly conceptualize what you want to make and describe it to a language model, you can create a working prototype of anything in a fraction of the time it would take to write it from scratch. Keep in mind that Gene has spent a little over a year using AI as a tool to learn how LLMs work, and is now able to keep up with the most current research in the field.
The rapid progress you can make with LLM code generation is hard to overstate, and it can be quite empowering for new and experienced developers alike. Gene views these language models as a catalyst for “the democratization of skills,” as the speed and ease with which they allow you to learn new concepts is perhaps matched only by studying under the tutelage of an expert. The people who will get the most out of this technology are those who use it to aid the learning process, not those who use it as a replacement for their own critical reasoning and problem-solving.
If you haven’t used AI yet or haven’t found much utility in it, Gene encourages you to see how far the newest models have come. And with Gemini 2.0 and ChatGPT’s rumored Orion model — based on the reasoning engine “Project Strawberry” — speculated to arrive this year, there is no better time to hone your prompt engineering skills than now. Generative AI is evolving at breakneck speed, and what was cutting edge in September may be obsolete by October as new and improved technology takes its place.
If you are already an AI power user and want to take your experience to the next level, three of the best things you can do are:
Follow Prompt Engineering/Generative AI professionals on X/Twitter
Find a framework or tool you like (such as Autogen or Langchain) and keep up with the GitHub repo updates
Read research papers at arxiv.org
These are recommendations for more advanced LLM users to keep up with the pace of AI development and research. If you’re just starting out in prompt engineering, you might find the following prompting techniques more helpful to learn. After all, LLMs can tackle complex tasks, but in order to get quality answers, you first need to know how to write a good prompt.
Prompt Engineering Techniques
One of the simplest ways to improve an LLM’s response is Chain-of-Thought Prompting, a technique that induces an LLM to recount its reasoning process by asking it to explain how it reached its conclusion. This can be as easy as adding the phrase “think step-by-step” at the beginning of a prompt. That additional context is vital when using AI for more complex tasks: a model that doesn’t show its work gives you incorrect answers more often, and it’s harder to identify where its reasoning went wrong. When you ask an LLM to show its reasoning, it uses that additional context to steer itself in the right direction. Remember that LLMs use next-token prediction to generate a response, so by simulating a reasoning process that closely aligns with logical human deduction, you greatly reduce the risk of hallucination. By guiding the model to “think step-by-step,” you’re not just getting an answer; you’re prompting it to reconstruct the pathways it might take to reach that conclusion, making it easier for you to verify the response and understand the underlying logic.

One caveat: researchers at Arizona State University have recently shown that chain-of-thought prompting breaks down on complex logic with nested or recursive steps, and that these prompts need to be narrowly tailored to the specific problem set, with simplified examples, to get accurate answers.
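In code, the change can be a single added sentence. A minimal sketch using the OpenAI Python client, with an illustrative word problem:

```python
# Chain-of-thought prompting: the same question, with and without the
# "think step-by-step" instruction. Uses the OpenAI client (pip install openai).
from openai import OpenAI

client = OpenAI()

question = "A jacket costs $120 after a 25% discount. What was the original price?"

for prompt in (question, "Think step-by-step. " + question):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content, "\n---")

# The second answer should walk through the reasoning (120 / 0.75 = 160),
# which makes a wrong step much easier to spot than a bare final number.
```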
Providing examples of the way you want a response to be structured is another way you can nudge language models in the right direction. This is a technique referred to as “few-shot” or “many-shot” prompting. “By providing examples in your prompt you're showing the model exactly what you are looking for in terms of output structure, tone, and style.” (The Few Shot Prompting Guide) This technique is particularly useful when you need the model to produce outputs that are consistent and aligned with specific requirements or when working on tasks that require a particular format. For example, if you're generating product descriptions, you could provide a few samples that showcase the desired tone, length, and key elements to include. The LLM will then mimic this pattern, making the output more uniform and reducing the likelihood of inconsistencies.
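A minimal few-shot sketch, with hypothetical product descriptions serving as the examples:

```python
# Few-shot prompting: the examples in the prompt show the model the desired
# structure, tone, and length. The products and examples here are made up.
from openai import OpenAI

client = OpenAI()

prompt = """Write a product description in the style of the examples.

Example 1:
Product: Insulated water bottle
Description: Keeps drinks cold for 24 hours. Leak-proof lid, fits most cup holders.

Example 2:
Product: Wireless earbuds
Description: Eight hours of playback on one charge. Pairs instantly, stays put while you run.

Product: Standing desk
Description:"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # should mimic the short, punchy style
```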
You can also improve an AI chatbot’s responses by giving it structured input and asking for structured output. This is especially helpful for tasks related to data analysis, as responses start to break down when you overload them with too many requests at once. If you’ve ever asked ChatGPT to make a list of more than 20 items, then you know exactly how quickly its responses can deteriorate. Asking it to format its responses in a particular way helps, but it’s even more helpful to ask for output in a structured data format such as JSON or CSV. This way, an LLM can write and execute scripts that run various analyses on your data without you ever opening a code editor. If Python can process a thousand data points with relative ease, then ChatGPT and Claude don’t have to.
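A minimal sketch of requesting structured output, assuming the OpenAI client’s JSON mode and a made-up handful of sales figures:

```python
# Structured output: ask for JSON instead of prose, then parse it directly.
# Uses the OpenAI client's JSON mode; the sales figures are fabricated for demo.
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # constrains output to valid JSON
    messages=[{
        "role": "user",
        "content": "Return JSON with keys 'total' and 'mean' for these "
                   "daily sales figures: 120, 95, 143, 88, 134.",
    }],
)

stats = json.loads(response.choices[0].message.content)
print(stats["total"], stats["mean"])  # parsed data, ready for downstream code
```

For anything heavier than toy arithmetic, the more reliable pattern is the one this paragraph points at: have the model write a short script and let Python do the math.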
To get more deterministic output from Large Language Models, it’s essential to craft your prompts with a clear scope, taking into account the model’s inherent behaviors and limitations. Be mindful of how many requests you’re making at once and of overall prompt length, as overly complex or lengthy prompts increase the likelihood of hallucinations or incomplete responses. Breaking your tasks down into smaller, manageable steps and iterating on your prompts based on the responses you get back can significantly enhance the reliability of your output. Techniques like few-shot, many-shot, and chain-of-thought prompting emulate human cognitive processes, allowing the model to produce outputs that are more aligned with your specific needs.
The Shifting Software Development Landscape
Gene notes the significant impact of AI on traditional software development roles. They predict that developers will need to adapt to a future where AI is central to software development, not just as a tool for automating tasks but as a transformative shift in how coding is approached.
“I do think that lots of traditional software developers are going to have to transition to more oversight-heavy, managerial roles. They’ll have to understand what the code is supposed to accomplish at a high level and translate it to natural language so that they can use code gen[eration],” they explained. The alternative, they warned, is to risk being left behind in a field that is evolving at an unprecedented pace.
Knowledge workers who embrace AI are likely to gain a significant advantage over those who stick with traditional methods. By adopting AI tools, developers can reduce iteration time, accelerate deployment, and expand their skill sets more rapidly than those who choose not to integrate AI into their workflow. While some may resist this change, these technological advancements are already disrupting the job market, transforming how we think about and solve problems. With the release of new and improved models right around the corner, AI is only likely to grow more significant in our lives with each passing day. It’s in the best interest of everyone, not just developers, to prepare for a future where these tools will play a vital role.
You can take a look at more of Gene’s work here:
GitHub: https://github.com/yerbymatey
LinkedIn: https://www.linkedin.com/in/genie-hwang/
Website: https://geneweb.site/
Twitter/X: https://x.com/yerbymatey
Medium: On “Anomalous Token Fingerprinting”: the GPT-4.5 Hypetrain and Reading Between the Words