Understanding AI for Software Developers

0xMarko|2024

As a developer, you're familiar with writing code where the outcome is predictable and deterministic. That means when you provide the same input to your program, you'll always get the same output. This works because you've explicitly told the software what to do in every scenario.

AI is different. Most AI systems, especially modern ones, are probabilistic. Instead of you writing code to handle each possible case, AI systems learn patterns from data. When you give an AI model input, it may generate different outputs depending on what it has learned during training.
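To make the contrast concrete, here is a minimal Python sketch. The `next_word` function, its candidate words, and their weights are invented for illustration; it is not a real model:

```python
import random

def add(a, b):
    # Deterministic: the same input always yields the same output.
    return a + b

def next_word(prompt):
    # Probabilistic (toy model): sample from a distribution over
    # possible continuations. The words and weights are made up.
    candidates = ["mat", "floor", "chair"]
    weights = [0.5, 0.3, 0.2]
    return random.choices(candidates, weights=weights)[0]

print(add(2, 3))                         # always 5
print(next_word("The cat sat on the"))   # varies between runs
```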

Let's take a look at how AI works, starting with the fundamental concept that powers modern AI systems: Machine Learning.

Machine Learning: Teaching Software to Learn

Machine Learning (ML) is a way of teaching computers to recognize patterns from data without explicitly coding all the rules.

If you show an ML model thousands of images of cats and dogs, it will learn to recognize the patterns that make a cat look different from a dog.

AI models like ChatGPT are trained on massive amounts of text data from the internet. During training, the model learns the patterns in human language, allowing it to generate text, answer questions, or even write code.
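The idea of learning a rule from examples rather than hard-coding it can be sketched in a few lines. Here a single parameter is fitted to toy data that happens to follow the pattern y = 2x; the model is never told that rule:

```python
# Toy "learning" example: instead of hard-coding the rule y = 2x,
# we let a single weight be adjusted from example data.
data = [(1, 2), (2, 4), (3, 6), (4, 8)]

w = 0.0      # the model's single parameter, starts knowing nothing
lr = 0.01    # learning rate: how big each adjustment is

for _ in range(1000):
    for x, y in data:
        pred = w * x
        error = pred - y
        w -= lr * error * x   # gradient step: nudge w to reduce the error

print(round(w, 2))  # converges close to 2.0, the pattern hidden in the data
```

Real models work the same way in principle, just with billions of parameters instead of one.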

Neural Networks: The Core Structure of AI

AI models are often built using neural networks, which are inspired by the way human brains work. These networks consist of layers of interconnected nodes, or "neurons," that process data.

A typical neural network has three main types of layers:

  1. Input Layer: Where data enters the system. For text-based models, words are converted into numbers (embeddings) before entering this layer.
  2. Hidden Layers: Layers where the real learning happens. Each neuron in these layers applies a weight (a numerical value) to the input and passes it through an activation function to determine if the information should proceed further.
  3. Output Layer: Produces the final result, such as predicting the next word in a sentence or identifying the contents of an image.

The hidden layers are the most interesting part of a neural network. Users never interact with them directly; they only control the input that enters the network and see the output that emerges from it.

Without hidden layers, a network could only learn simple, direct mappings from input to output. The hidden layers process the input data in stages: as data moves through them, the network learns patterns and structures, building a hierarchical understanding of the input. This progression transforms the input into the final output.

While a basic neural network might have just one hidden layer, adding more layers allows the network to engage with the input in more complex ways. Each additional hidden layer can focus on different aspects or finer details of the input, gradually refining its understanding as the data moves through the layers.
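The layered flow described above can be sketched with NumPy. The weights here are random placeholders, standing in for values a real network would learn during training:

```python
import numpy as np

def relu(x):
    # Activation function: decides whether a neuron's signal proceeds.
    return np.maximum(0, x)

# Placeholder weights for illustration; in a real network these are learned.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # input layer (4 features) -> hidden (3 neurons)
W_output = rng.normal(size=(3, 2))   # hidden (3 neurons) -> output layer (2 values)

x = np.array([0.5, -1.2, 0.3, 0.9])  # one input vector

hidden = relu(x @ W_hidden)          # hidden layer: weights, then activation
output = hidden @ W_output           # output layer: the final result

print(output.shape)  # (2,)
```

Stacking more hidden layers just means repeating the `relu(x @ W)` step with additional weight matrices.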

This concept of stacking hidden layers forms the foundation of deep learning. For a network to be considered "deep", it must have more than three layers in total: an input layer, multiple hidden layers, and an output layer. The greater the number of layers, the deeper the learning process.

Deep Learning: Going Deeper with More Layers

Deep learning is a subset of machine learning that uses neural networks with many hidden layers. These deeper networks can learn more complex patterns than simpler networks. For example, a basic neural network might only handle simple text predictions, while a deep learning model can generate entire paragraphs or translate languages.

Deep learning powers many modern AI applications, including:

  • Image recognition: Identifying objects in photos.
  • Language models: Understanding and generating human-like text.
  • Speech recognition: Converting spoken words into text.

The "depth" of these networks enables AI to understand intricate relationships within data, making them incredibly powerful for tasks that require nuanced understanding.

Embeddings: Turning Words into Numbers

AI models can't process text directly - they need numbers. Embeddings are numerical representations of words or phrases that capture their meanings and relationships. Each word is mapped to a vector (a list of numbers) in a high-dimensional space, typically ranging from 100 to 1000 dimensions.

Think of dimensions as different aspects or features of a word's meaning. While we can only visualize 2 or 3 dimensions, computers can work with hundreds:

  • In 2D, a word might be represented by just 2 numbers: [0.2, 0.5]
  • In 3D, we add another number: [0.2, 0.5, 0.8]
  • In 100D, we have 100 numbers: [0.2, 0.5, 0.8, ..., 0.3] (97 more numbers)

Each dimension could represent different semantic features, such as:

  • How formal/informal the word is
  • How masculine/feminine the concept is
  • How abstract/concrete the meaning is
  • How related it is to different topics (food, technology, nature, etc.)
  • And many more subtle patterns learned from data

The more dimensions we use, the more nuanced relationships the embedding can capture. This is why models often use hundreds of dimensions - it allows them to represent complex linguistic relationships that wouldn't be possible with just a few numbers.

These vectors are learned during training through techniques like Word2Vec, which analyze how words appear together in large text datasets. The basic principle is:

  1. Words that frequently appear in similar contexts get similar vector representations
  2. The similarity between words is calculated using cosine similarity between their vectors
  3. The closer the cosine similarity is to 1, the more similar the words are
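The cosine similarity step can be illustrated directly. The three-dimensional "embeddings" below are invented for the example; real vectors have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); closer to 1 means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional embeddings for illustration only.
cat    = [0.8, 0.6, 0.1]
kitten = [0.7, 0.7, 0.2]
car    = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1: similar contexts
print(cosine_similarity(cat, car))     # noticeably lower
```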

You can visualize and explore embeddings using tools like TensorFlow's Embedding Projector. This interactive tool allows you to:

  • Visualize high-dimensional embeddings in 2D or 3D space
  • Search for similar words based on their vector representations
  • Explore relationships between words in the embedding space
  • Upload and analyze your own embedding data

For example, since "cat" and "kitten" often appear in similar contexts (like "pet", "meow", "furry"), their embedding vectors end up being close to each other in the vector space. Using the Embedding Projector, you could visualize this relationship and find other semantically similar words clustering nearby.

Tokenization: Preparing Text for AI

Before a neural network can process text, it must be broken into smaller pieces called tokens. Tokenization is the process of splitting text into tokens and converting those tokens into numbers (token IDs) that the model can understand.

For example:

  • The word "unbelievable" might be tokenized as ["un", "believ", "able"]
  • These tokens are then converted to numbers, like [234, 1856, 345]

Each token ID serves as an index into an embedding table, where each token has a corresponding vector of numbers (typically 256 to 4096 dimensions). For example:

  • Token "un" (ID: 234) → [0.2, -0.5, 0.8, ...]
  • Token "believ" (ID: 1856) → [0.1, 0.7, -0.3, ...]
  • Token "able" (ID: 345) → [-0.4, 0.2, 0.6, ...]

These embedding vectors capture the meaning and relationships between tokens. The neural network then processes these vectors through its layers to:

  1. Understand the relationships between tokens using attention mechanisms
  2. Generate predictions about what tokens might come next
  3. Convert the final output back into human-readable text

Tokenization allows models to handle rare words, misspellings, and different languages more effectively. Some tokenization methods, like Byte Pair Encoding (BPE), break words into subwords to cover a wide range of vocabulary with fewer tokens.
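A toy version of this pipeline fits in a few lines. The vocabulary, token IDs, and embedding vectors below are all invented for illustration; real tokenizers learn vocabularies of tens of thousands of subwords with algorithms like BPE:

```python
# Invented vocabulary and embeddings, matching the "unbelievable" example.
vocab = {"un": 234, "believ": 1856, "able": 345}
embedding_table = {
    234:  [0.2, -0.5, 0.8],
    1856: [0.1, 0.7, -0.3],
    345:  [-0.4, 0.2, 0.6],
}

def tokenize(word):
    # Greedy longest-match split into known subwords.
    tokens = []
    while word:
        for size in range(len(word), 0, -1):
            if word[:size] in vocab:
                tokens.append(word[:size])
                word = word[size:]
                break
        else:
            raise ValueError("no matching subword")
    return tokens

tokens = tokenize("unbelievable")
ids = [vocab[t] for t in tokens]            # token -> token ID
vectors = [embedding_table[i] for i in ids]  # token ID -> embedding vector

print(tokens)  # ['un', 'believ', 'able']
print(ids)     # [234, 1856, 345]
```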

Context: How AI Understands Conversations

Context refers to the surrounding information an AI model uses to generate relevant responses. Modern AI models consider previous turns of a conversation (which are sent back to the model along with each new message), allowing them to carry on coherent conversations.

For example:

  • You: "What’s the capital of Croatia?"
  • AI: "Zagreb."
  • You: "What’s the weather like there?"

The AI knows that "there" refers to "Zagreb" because it remembers the context of the conversation.

However, AI models have a context window that limits how much information they can consider at once. If the conversation exceeds this limit, older parts of the conversation may be forgotten.
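One common (simplified) strategy for handling this limit is to drop the oldest turns first. The sketch below counts words instead of real subword tokens to stay self-contained:

```python
def fit_to_context(messages, max_tokens):
    # Keep the most recent messages that fit the budget; older turns
    # are dropped first. Word counts stand in for real token counts.
    kept = []
    total = 0
    for msg in reversed(messages):
        cost = len(msg.split())
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = [
    "What is the capital of Croatia?",
    "Zagreb.",
    "What is the weather like there?",
]
# With a budget of 8 "tokens", the first question no longer fits.
print(fit_to_context(history, max_tokens=8))
```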

Fine-Tuning: Adapting AI for Specific Tasks

Fine-tuning is the process of taking a pretrained AI model and adjusting it to perform well on a specific task or domain. While large language models (LLMs) like GPT are trained on vast amounts of general data, fine-tuning helps tailor them for specialized applications, such as customer service, legal document analysis, or code generation.

Fine-tuning involves:

  • Training on a Smaller Dataset: A high-quality, task-specific dataset is used to refine the model. This dataset is often curated and reviewed by humans to ensure quality.

  • Adjusting Weights and Biases: During fine-tuning, the model’s internal parameters (weights and biases) are slightly adjusted to improve performance on the new dataset.

  • Reinforcement Learning from Human Feedback (RLHF): To further align the model’s behavior with human values, RLHF uses human feedback to guide the model toward desirable responses.

Fine-tuning enables AI to generate more accurate and contextually appropriate outputs for specific use cases, enhancing its utility in real-world applications.

Transformers: The Backbone of Modern AI

The transformer architecture revolutionized AI when it was introduced by Google researchers in their 2017 paper "Attention Is All You Need". Instead of processing words one by one, transformers can analyze an entire sentence or paragraph simultaneously. This enables them to understand relationships between words, even if those words are far apart.

For instance, in the sentence:

"The cat sat on the mat because it was tired."

the word "it" refers to "cat." A mechanism called self-attention helps the transformer figure out this connection.

Transformers power large language models like GPT, Claude, and Gemini. They enable these models to generate text, translate languages, write code, and more.
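The core of self-attention (scaled dot-product attention) can be sketched in a few lines of NumPy. The token embeddings below are made up, and a real transformer would apply learned query/key/value projections rather than using the embeddings directly:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each token scores every other token,
    # then mixes their values according to those scores.
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ V, weights

# Tiny made-up embeddings for three tokens. "it" points in a similar
# direction to "cat", so attention links them.
emb = np.array([
    [1.0, 0.0],   # "cat"
    [0.9, 0.1],   # "it"
    [0.0, 1.0],   # "mat"
])

out, weights = attention(emb, emb, emb)
print(weights[1])  # row for "it": the largest weight falls on "cat"
```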

Inference: How AI Models Make Predictions

Inference is the process where an AI model generates outputs based on new inputs. During inference, the model uses its trained parameters (weights and biases) to process input data and make predictions. Unlike training, which modifies the model's parameters, inference only uses these parameters without changing them.

The inference process typically involves:

  1. Input Processing: Converting raw input (like text) into a format the model can understand through tokenization and embedding.
  2. Forward Pass: Passing the processed input through the model's layers, where each layer performs calculations using the trained weights.
  3. Output Generation: Converting the model's numerical outputs back into human-readable format (like text).

For language models, inference often uses techniques like "temperature" and "top-k sampling" to control the randomness of outputs.

Temperature and sampling methods help control how the model selects its next tokens:

  • Temperature (typically 0.0 to 2.0): This parameter affects how the model assigns probabilities to each possible next token:
    • At temperature 0, the model always picks the most likely token
    • At temperature 1, the model uses the raw probabilities it learned during training
    • At temperatures above 1, the model becomes increasingly random
  • Top-k Sampling: Instead of considering all possible next tokens, top-k sampling only considers the k most likely tokens:
    • If k=1, it always selects the most probable token (like temperature 0)
    • If k=50, it considers only the 50 most likely next tokens
    • This helps prevent the model from selecting very unlikely or nonsensical tokens

For example, if the model is completing the sentence "The cat sat on the...", it might assign these probabilities:

  • "mat" (40%)
  • "floor" (30%)
  • "chair" (20%)
  • "banana" (0.1%)

With temperature 0, it would always choose "mat". With a higher temperature, it might sometimes choose "floor" or "chair", making the output more varied but still sensible. Top-k sampling with k=3 would only consider "mat", "floor", and "chair", completely eliminating unlikely options like "banana".
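Both knobs can be sketched together in a short function. The probability table matches the example above; the implementation details (a greedy pick at temperature 0, rescaling probabilities by an exponent) are a common simplification, not any particular model's exact code:

```python
import math
import random

def sample(probs, temperature=1.0, top_k=None):
    # Rescale probabilities with temperature, optionally keep only the
    # top-k candidates, then sample one token.
    if temperature == 0:
        return max(probs, key=probs.get)  # greedy: always the most likely token
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    if top_k is not None:
        top = sorted(scaled, key=scaled.get, reverse=True)[:top_k]
        scaled = {t: scaled[t] for t in top}
    tokens, weights = zip(*scaled.items())
    return random.choices(tokens, weights=weights)[0]

# The example distribution from the text ("banana" at 0.1% = 0.001).
probs = {"mat": 0.40, "floor": 0.30, "chair": 0.20, "banana": 0.001}

print(sample(probs, temperature=0))             # always "mat"
print(sample(probs, temperature=0.8, top_k=3))  # "mat", "floor", or "chair"
```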


As a software developer, understanding AI fundamentals is becoming critical. While you don't need to be an AI expert, knowing how these systems work helps you:

  • Make informed decisions about when to use AI in your projects
  • Understand the limitations and capabilities of AI tools
  • Understand how AI can be used to improve your applications
  • Communicate effectively with AI/ML specialists

Remember, AI is not replacing traditional programming — it's augmenting it. Developers can build more powerful and intelligent applications than ever before.