Contents

  • Broad introduction to machine learning topics.
  • Perceptron.

Careers in Machine Learning

  • Most ML careers focus on applying and deploying existing algorithms — building data pipelines, training models, and integrating them into products.
  • Only a small fraction of roles involve inventing new algorithms (e.g. research labs, PhDs).
  • Industry demand remains strong and growing across sectors — from finance and healthcare to logistics and creative industries.
  • Key growth areas include MLOps (infrastructure and deployment), ML engineering (model training and serving), and data science (applied analysis).
  • Generative AI has expanded opportunities — especially in model integration, evaluation, and retrieval‑augmented generation systems.

In short: a career in ML today means turning learning algorithms into reliable, scalable tools — not just developing new maths.

Reinforcement Learning

  • Learning through trial and error.
  • The model (agent) interacts with an environment.
  • Receives rewards or penalties for its actions.
  • Goal: Learn a policy that maximizes long-term (cumulative) reward.

How it Works

  • At each step:
    • The agent observes the state of the environment.
    • It chooses an action.
    • The environment provides a reward and a new state.
    • The agent updates its policy based on experience.
  • Repeat until the agent learns which actions yield the best outcomes.
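
The loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the five-cell corridor environment, the `step` and `train` helpers, the hyperparameter values, and the choice of tabular Q-learning as the update rule are all hypothetical and are not prescribed by these notes.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (toy environment, assumed)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Observe the state and choose an action (epsilon-greedy, random on ties).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            # The environment provides a reward and a new state.
            next_state, reward, done = step(state, ACTIONS[a])
            # Update the value estimates that define the policy (Q-learning rule).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = train()
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

The greedy policy read off the learned values moves toward the rewarding cell; the discount factor `gamma` is what makes nearer rewards worth more than distant ones.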

Examples of Reinforcement Learning

  • Game-playing AI (e.g., AlphaGo, chess, or Atari agents).
  • Robotics — learning to walk, grasp, or balance.
  • Autonomous driving — navigating and avoiding obstacles.
  • Personalized recommendations — optimizing for engagement over time.
  • Industrial control — adjusting manufacturing parameters dynamically.

Perceptron

What is the perceptron?

  • Binary linear classifier: outputs +1 or −1 from a weighted sum of inputs and a threshold.
  • Unit model: sign(w·x + b).
  • Learning rule: iteratively tweaks weights when it makes a mistake.

History: Threshold neurons were formalised by McCulloch & Pitts (1943); the online learning algorithm was introduced by Frank Rosenblatt (1957–58).

Classification set‑up

We have training examples \((x^{(i)}, y^{(i)})\), each pairing a feature vector with a binary label.

Features: \(x^{(i)} = (x^{(i)}_1,\dots,x^{(i)}_{n_f})\in\mathbb{R}^{n_f}\).
Labels: \(y^{(i)}\in\{\pm 1\}\).

Model and decision rule

Score (affine):

\[ z(x; w,b) = w^\top x + b \]

Prediction (hard threshold):

\[ \hat y = \phi(z) = \operatorname{sign}(z) \in \{+1,-1\}. \]

Bias trick: augment \(\tilde x = (1, x_1,\dots,x_{n_f})\), \(\tilde w = (b, w_1,\dots,w_{n_f})\) so that \(z=\tilde w^\top \tilde x\).
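
A minimal sketch of the score, the hard-threshold prediction, and the bias trick. The helper names and the numeric values are illustrative assumptions, and mapping \(z = 0\) to \(-1\) is a convention chosen here, not something fixed by the definition above.

```python
def score(w, b, x):
    """Affine score z = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Hard threshold; z == 0 is mapped to -1 (assumed convention)."""
    return 1 if score(w, b, x) > 0 else -1

def augment(x):
    """Bias trick: prepend a constant 1 to the feature vector."""
    return [1.0] + list(x)

w, b = [2.0, -1.0], 0.5          # hypothetical weights and bias
x = [1.0, 3.0]                   # hypothetical input

w_tilde = [b] + w                # augmented weights (b, w_1, ..., w_nf)
z = score(w, b, x)
z_tilde = score(w_tilde, 0.0, augment(x))   # same score, no separate bias term

print(z, z_tilde, predict(w, b, x))
```

Both scores agree, which is why the bias can be folded into the weight vector without changing any prediction.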

Geometry

The set \(\{x : w^\top x + b = 0\}\) is a hyperplane (a line in 2D). It splits space into two half-spaces:

  • \(w^\top x + b > 0\Rightarrow \hat y=+1\)
  • \(w^\top x + b < 0\Rightarrow \hat y=-1\)
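
One way to make the half-space picture concrete is the signed distance \((w^\top x + b)/\|w\|\), whose sign says which side of the hyperplane a point lies on. The sketch below uses a hypothetical hyperplane and three hypothetical points; `signed_distance` is a helper introduced only for this illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane w.x + b = 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -5.0          # hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0

for x in ([3.0, 4.0], [0.0, 0.0], [3.0, -1.0]):
    d = signed_distance(w, b, x)
    side = "+1" if d > 0 else "-1" if d < 0 else "on the boundary"
    print(x, d, side)
```

Points with zero distance lie exactly on the hyperplane, where the prediction is not determined by the two inequalities above.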

Perceptron learning rule (online)

For each example \((x^{(i)}, y^{(i)})\):

  1. Predict \(\hat y = \operatorname{sign}(w^\top x^{(i)} + b)\).
  2. If correct (i.e., \(y^{(i)}(w^\top x^{(i)} + b) > 0\)), do nothing.
  3. If mistake (i.e., \(y^{(i)}(w^\top x^{(i)} + b) \le 0\), which counts a point lying exactly on the boundary as a mistake), update \[ w \leftarrow w + \eta\, y^{(i)} x^{(i)}, \qquad b \leftarrow b + \eta\, y^{(i)}. \]

\(\eta>0\) is the learning rate (a hyperparameter). With the bias trick: \(\tilde w \leftarrow \tilde w + \eta\, y^{(i)} \tilde x^{(i)}\).

Why the update makes sense

On a mistake, \(y^{(i)}(w^\top x^{(i)} + b) < 0\). The update adds a vector in the direction of \(y^{(i)} x^{(i)}\), increasing the signed margin \(y^{(i)}(w^\top x^{(i)} + b)\).

Geometrically: it nudges the decision boundary so that the misclassified point moves toward its correct side.
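
To see the size of the increase, substitute the updated parameters into the signed margin and use \((y^{(i)})^2 = 1\):

\[ y^{(i)}\Big( \big(w + \eta\, y^{(i)} x^{(i)}\big)^\top x^{(i)} + b + \eta\, y^{(i)} \Big) = y^{(i)}\big(w^\top x^{(i)} + b\big) + \eta\,\big(\|x^{(i)}\|^2 + 1\big). \]

Each update therefore raises the signed margin on that example by \(\eta\,(\|x^{(i)}\|^2 + 1) > 0\). A single update need not fix the mistake, but it always moves in the right direction for that point.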

Pseudocode

# initialize
w = 0; b = 0
for epoch in 1..T:
    shuffle(training_data)
    for (x, y) in training_data:
        if y * (w·x + b) ≤ 0:   # mistake
            w ← w + η * y * x
            b ← b + η * y

(Many implementations simply set \(\eta=1\): when \(w\) and \(b\) start at zero, \(\eta\) only rescales \((w, b)\) by a positive constant, which does not change any decision.)
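
A runnable Python sketch of the pseudocode. The function name `train_perceptron`, the early exit after a mistake-free pass, and the AND dataset are assumptions added for illustration; they are not part of the algorithm as stated above.

```python
import random

def train_perceptron(data, epochs=100, eta=1.0, seed=0):
    """Online perceptron. `data` is a list of (x, y) pairs with y in {+1, -1}."""
    rng = random.Random(seed)
    data = list(data)                        # copy so the caller's order is untouched
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        mistakes = 0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * z <= 0:                   # mistake (or exactly on the boundary)
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                mistakes += 1
        if mistakes == 0:                    # a clean pass: nothing left to fix (assumed stop rule)
            break
    return w, b

# Logical AND with labels in {+1, -1}: a linearly separable toy problem.
and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], +1)]
w, b = train_perceptron(and_data)
predictions = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
               for x, _ in and_data]
print(w, b, predictions)
```

Stopping after a clean pass is safe because the weights can no longer change once every example satisfies \(y(w^\top x + b) > 0\).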

Convergence (when it works)

  • If the data are linearly separable with margin \(\gamma>0\) and all \(\|x\|\le R\), the algorithm makes at most \(\mathcal O\big((R/\gamma)^2\big)\) mistakes and converges (Novikoff bound).
  • Otherwise, it never converges: the weights keep changing (and can cycle), so in practice we cap the number of epochs.
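
The mistake bound can be checked numerically on a toy problem. This sketch makes several assumptions: it applies the bound in the augmented (bias-trick) space so that the bias is covered, it uses \(\eta = 1\) with zero initialisation, and the reference separator `u` is hand-picked for the AND dataset rather than computed.

```python
import math

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], +1)]   # AND, separable
aug = [([1.0] + [float(v) for v in x], y) for x, y in data]       # bias trick

# Hand-picked separator (b, w1, w2) = (-1.5, 1, 1); its margin is measured at unit norm.
u = [-1.5, 1.0, 1.0]
gamma = min(y * dot(u, x) for x, y in aug) / math.sqrt(dot(u, u))
R = max(math.sqrt(dot(x, x)) for x, _ in aug)
bound = (R / gamma) ** 2

w = [0.0, 0.0, 0.0]
mistakes = 0
for _ in range(1000):                        # epoch cap, kept as a safety net
    updated = False
    for x, y in aug:
        if y * dot(w, x) <= 0:               # mistake: apply the perceptron update
            w = [wi + y * xi for wi, xi in zip(w, x)]
            mistakes += 1
            updated = True
    if not updated:                          # clean pass, so the weights are final
        break

print(mistakes, "<=", bound)
```

Any separator gives a valid bound; a separator with a larger margin would give a tighter one.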

Linearly separable?

[Figure: two panels, "Separable (linear)" and "Not linearly separable".]

Real data are often almost separable with noise/outliers.

When it struggles

  • Non-separable data (e.g., XOR) ⇒ no single hyperplane can separate.
  • Small (near-zero) margin / outliers ⇒ boundary hugs points ⇒ poor generalisation.
  • Order sensitivity ⇒ shuffling typically helps.
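
The XOR case is easy to reproduce. In this sketch (the helper `run_epochs` and the epoch count are illustrative assumptions) every pass over the data contains at least one mistake, because a mistake-free pass would mean a single hyperplane separates XOR.

```python
def run_epochs(data, epochs):
    """Plain perceptron with eta = 1; returns the number of mistakes in each epoch."""
    w, b = [0.0, 0.0], 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:
                w = [w[0] + y * x[0], w[1] + y * x[1]]
                b += y
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
    return mistakes_per_epoch

xor_data = [([0, 0], -1), ([0, 1], +1), ([1, 0], +1), ([1, 1], -1)]
history = run_epochs(xor_data, epochs=50)
print(min(history))   # never reaches zero on XOR
```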

[Figure: a dataset that is probably not separable.]

Practical tips

  • Standardise/normalise features; include an explicit bias or use the bias trick.
  • Shuffle each epoch; limit epochs; consider early stopping on a validation set.
  • Set \(\eta\) to a constant (e.g., 1.0) or decay it slowly; on separable data the choice of \(\eta\) does not affect whether training converges, only (at most) how quickly.
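
A minimal sketch of the standardisation tip, using only the standard library. The function name `standardise` and the sample values are assumptions; in a real pipeline the means and standard deviations would typically be computed on the training set and reused for validation and test data.

```python
import statistics

def standardise(rows):
    """Column-wise z-scoring; constant columns are only centred."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]   # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical features on very different scales.
X = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
Z = standardise(X)
print(Z)
```

Without this step, the feature with the largest scale dominates each update \(\eta\, y\, x\).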

Useful variants (still simple)

  • Averaged perceptron: predict with the running average of weights; often generalises better.
  • Perceptron with margin: only update if \(y(w^\top x + b) \le \delta\) for margin \(\delta>0\).
  • Feature maps / kernels: map inputs so a hyperplane exists in feature space.
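
As an illustration of the feature-map idea, XOR becomes separable once the product \(x_1 x_2\) is added as a third feature. The map `feature_map`, the helper `train`, and the epoch cap are assumptions chosen for this sketch; other maps would work as well.

```python
def feature_map(x):
    """Hypothetical map: append the product x1*x2 to the raw inputs."""
    return [x[0], x[1], x[0] * x[1]]

def train(data, epochs=200):
    """Plain perceptron (eta = 1); returns (w, b, converged)."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:                    # clean pass: every point is on its correct side
            return w, b, True
    return w, b, False

xor = [([0, 0], -1), ([0, 1], +1), ([1, 0], +1), ([1, 1], -1)]
mapped = [(feature_map(x), y) for x, y in xor]
w, b, converged = train(mapped)
print(w, b, converged)
```

The learned boundary is a hyperplane in the three-dimensional feature space, which corresponds to a curved boundary in the original two inputs.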

Perceptron Summary

  • The perceptron is a simple online algorithm for learning linear decision boundaries.
  • It converges on linearly separable data; otherwise cap the number of epochs or use a variant.
  • Forms a conceptual bridge to modern linear models and neural nets.