Chapter 10

Neural Networks & Deep Learning

Intuition behind layers, weights, and why deep learning is so powerful — and so hungry for data and compute.


What makes a neural network "neural"

Neural networks are loosely inspired by the brain — but don't take the analogy too literally. A biological neuron fires when it receives enough signal from connected neurons. An artificial neuron does something similar: it receives numerical inputs, multiplies each by a weight, sums them up, and passes the result through an activation function to produce an output.
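That weighted-sum-then-activation step fits in a few lines. A minimal sketch of a single artificial neuron (the sigmoid activation and the example numbers are illustrative choices, not from this chapter):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...passed through an activation function (sigmoid here, for illustration)
    return 1 / (1 + math.exp(-z))

# Three inputs, three hand-picked weights
output = neuron([0.5, -1.0, 2.0], [0.8, 0.2, 0.1], bias=0.0)
# output is about 0.599: the weighted sum is 0.4, squashed into (0, 1)
```

A real network is nothing more than many of these wired together, with the weights and biases learned rather than hand-picked.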

Stack millions of these simple units in layers, connect every unit in one layer to every unit in the next, and let the whole system adjust its weights to minimise prediction error on a large dataset — and you have a neural network.

[Diagram] A simple neural network: inputs x₁–x₄ flow left → right through two hidden layers to a prediction ŷ


Weights: what the network actually learns

Every connection between neurons has a weight — a number that determines how much that connection contributes to the next neuron's output. When we say a model is "trained," what we mean is: those weights have been adjusted, iteratively, to minimise prediction error on the training data.

A network with a 512-dimensional input and 2 hidden layers of 512 neurons each has roughly 500,000 weights. GPT-4 has an estimated 1.8 trillion. Each weight is a tiny, learned fact about the patterns in the training data.
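The arithmetic behind a count like that is simple to sketch (the 512-dimensional input and single output here are illustrative assumptions):

```python
def dense_param_count(layer_sizes):
    # Each pair of adjacent layers contributes (m x n) weights plus n biases
    return sum(m * n + n for m, n in zip(layer_sizes, layer_sizes[1:]))

# 512-dim input -> two hidden layers of 512 -> single output
params = dense_param_count([512, 512, 512, 1])  # 525,825: roughly half a million
```

Nearly all of the parameters sit in the big layer-to-layer weight matrices; the biases are a rounding error by comparison.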

The training process, in brief

Forward pass: feed an input through the network, get a prediction.
Loss: measure how wrong the prediction was.
Backpropagation: calculate how much each weight contributed to the error.
Gradient descent: nudge each weight slightly in the direction that reduces error.
Repeat billions of times.
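The loop above can be sketched with a one-weight model, where the "backpropagation" step collapses to a single hand-derived gradient (the data and learning rate are illustrative):

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # y = 2x, so the ideal weight is 2.0
w, lr = 0.0, 0.05                            # start from an uninformed weight

for step in range(200):
    for x, y in data:
        y_hat = w * x                  # 1. forward pass
        loss = (y_hat - y) ** 2        # 2. loss: squared error
        grad = 2 * (y_hat - y) * x     # 3. "backprop": d(loss)/dw
        w -= lr * grad                 # 4. gradient descent nudge
# 5. repeat: here 600 tiny updates; real training runs billions
```

After training, w has converged to 2.0. Real networks do exactly this, except the gradient for every one of millions of weights is computed automatically by backpropagation.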


Why "deep" learning?

"Deep" refers to the number of hidden layers. Shallow networks (1–2 layers) can approximate many functions but struggle with complex patterns. Deep networks (many layers) learn hierarchical representations — early layers detect simple features, later layers combine them into increasingly abstract concepts.

In image recognition, this is literal: early layers detect edges and colour gradients, middle layers combine them into textures and shapes, and later layers recognise whole objects such as faces, cars, or text.

In language models, the hierarchy is less visible but equally real — early layers handle syntax and grammar, later layers handle semantics and reasoning.


Activation functions: why non-linearity matters

Without activation functions, a neural network — no matter how many layers — is just a linear transformation. Linear transformations can't model complex, curved decision boundaries. Activation functions introduce non-linearity, allowing the network to learn any shape of function given enough neurons.

The most common activation today is ReLU (Rectified Linear Unit; the name just means it "rectifies" by zeroing out negatives). The rule is simply output = max(0, input): if a signal is positive, pass it through; if it's negative, output zero. Simple, fast, effective. Transformers use a smoother variant called GELU (Gaussian Error Linear Unit) that tapers off near zero rather than cutting hard, and it behaves slightly better in practice for language tasks.
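Both activations are one-liners. A sketch, using the exact Gaussian-CDF form of GELU (frameworks often substitute a tanh approximation for speed):

```python
import math

def relu(x):
    # Pass positives through unchanged; zero out negatives
    return max(0.0, x)

def gelu(x):
    # x * Phi(x), where Phi is the standard normal CDF:
    # tapers smoothly near zero instead of cutting hard at it
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

For large positive inputs both behave like the identity; the difference is entirely in how they treat values near and below zero.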

PM Insight

The choice of activation function is an implementation detail. What matters is why they exist: activations are what make neural networks powerful enough to learn from unstructured data — images, text, audio — that simpler models can't handle. If a use case involves unstructured data, neural networks are likely the right class of model.


Specialised architectures for different data types

The basic "fully connected" network works well for tabular data. Different data types need different architectures:

Architecture | Best for | Product examples
CNN (Convolutional Neural Network) | Images, spatial data | Photo moderation, medical imaging, document parsing, visual search
RNN / LSTM (Recurrent NN / Long Short-Term Memory) | Sequences where order matters | Time-series forecasting, early NLP, user session modelling
Transformer | Language, code, multimodal | LLMs (GPT, Claude), translation, summarisation, code generation
GNN (Graph Neural Network) | Relationship/network data | Social networks, fraud rings, molecular property prediction
Diffusion models | Generative tasks | Image generation, audio synthesis, video generation

Why deep learning needs so much data and compute

A neural network with millions of weights needs millions of examples to learn meaningful patterns rather than memorising noise. With too little data, it overfits — the weights encode specific training examples rather than generalisable rules.

Compute is needed because training involves billions of arithmetic operations per training step, across millions of steps. This is why GPUs (and TPUs) matter — they're designed for the massively parallel matrix multiplications that make up most of neural network computation.
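Back-of-the-envelope arithmetic for one fully connected layer makes the scale concrete (all sizes here are illustrative, not from a real training run):

```python
# One dense layer mapping 512 inputs to 512 outputs:
# each output needs one dot product of 512 multiplies and 512 adds.
n_in, n_out = 512, 512
flops_per_example = n_out * (2 * n_in)   # counting a multiply-add as 2 FLOPs

batch_size, training_steps = 1024, 1_000_000
total_flops = flops_per_example * batch_size * training_steps
# roughly 5.4e14 FLOPs for this single layer alone, before counting
# the backward pass or any of the other layers: hence GPUs
```

Because each of those dot products is independent, they map naturally onto the thousands of parallel cores a GPU provides.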

Rough data requirements by task type

Tabular ML (gradient boosting) → thousands to tens of thousands of examples
Fine-tuning a pretrained model → hundreds to thousands of examples
Training a vision model from scratch → millions of labelled images
Training a large language model → trillions of tokens of text

PM Insight

"Let's just train a model on our data" is often impractical. For most product teams, the right starting point is a pretrained model (foundation model) that you adapt — through prompting, fine-tuning, or RAG — rather than training from scratch. The data and compute requirements for training from scratch are beyond what most companies can justify outside of core AI infrastructure.


Transfer learning: standing on giants' shoulders

Training from scratch is expensive. Transfer learning is the practical alternative: start with a model pretrained on a large general dataset, then adapt it to your specific task with far less data and compute.

This is why foundation models changed everything. A model pretrained on the entire internet has learned rich representations of language, facts, and reasoning that you can leverage for your product without reproducing that training. The cost you pay is adaptation, not education.
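A minimal sketch of the idea: a frozen "pretrained" feature extractor (a stand-in function here, not a real model) plus a small trainable head adapted to a new task with ordinary gradient descent. The features, data, and learning rate are all illustrative:

```python
def pretrained_features(x):
    # Stand-in for a frozen pretrained network: its weights never change,
    # we only reuse the representation it produces
    return [x, x * x]

# Trainable head: one small linear layer on top of the frozen features
w, lr = [0.0, 0.0], 0.005
data = [(1.0, 3.0), (2.0, 10.0), (3.0, 21.0)]   # target: y = x + 2*x^2

for _ in range(2000):
    for x, y in data:
        f = pretrained_features(x)                            # frozen forward pass
        y_hat = sum(wi * fi for wi, fi in zip(w, f))          # head prediction
        err = y_hat - y
        w = [wi - lr * 2 * err * fi for wi, fi in zip(w, f)]  # update head only
```

Only the two head weights are ever updated; the extractor's "knowledge" is reused as-is. That asymmetry is the whole economics of transfer learning.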


PM Playbook — Questions to ask

