Second half of Core Ia: Introduction To Machine Learning And Statistics.
Same format as part 1, with lectures and workshops on Tuesdays and Fridays. There are eight lectures but only four notebooks.
Course website: https://miscada-ml-2526.notes.dmaitre.phyip3.dur.ac.uk/
Notebook server: https://miscada-ml-2526.notebooks.danielmaitre.phyip3.dur.ac.uk/
I'm Will Yeadon and I will be organising this portion of the course.
You can contact me by email at will.yeadon@durham.ac.uk.
My office is PH304 in the Rochester Building.



Machine Learning is a subset of artificial intelligence that focuses on enabling computers to learn from data and improve their performance over time without being explicitly programmed.
In short: a career in ML today means turning learning algorithms into reliable, scalable tools — not just developing new maths.
Machine learning algorithms can be classified according to:



The boundary between classification and regression can be blurred:
Some problems can be approached using either classification or regression techniques, depending on the desired outcome.
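As a hypothetical illustration, the same prediction task can be posed as regression (output a mark) or as classification (output pass/fail) simply by thresholding the regression output. The averaging "model" and the pass mark of 40 below are assumptions made for illustration, not part of the course material.

```python
def predicted_mark(g1, g2, g3):
    # Regression view: output a real-valued mark.
    # Hypothetical model: a plain average of the three earlier marks.
    return (g1 + g2 + g3) / 3


def predicted_outcome(g1, g2, g3, pass_mark=40.0):
    # Classification view of the same problem: threshold the regression
    # output into one of two classes. pass_mark=40 is an assumed value.
    return "pass" if predicted_mark(g1, g2, g3) >= pass_mark else "fail"


print(predicted_mark(60, 70, 80))     # 70.0
print(predicted_outcome(30, 35, 40))  # fail (mean mark 35.0)
```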




Choosing the right learning mode depends on the dataset size and computational resources.
Batch learning: the entire training set is used for each iteration of the model optimization.
Advantages:
Disadvantages:
Online learning: the model is updated incrementally as each new training example arrives.
Advantages:
Disadvantages:
Mini-batch learning: the model is optimized using small subsets (batches) of the training set.
Advantages:
Disadvantages:
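To make the three modes concrete, here is a minimal sketch (not taken from the course notebooks) of gradient descent on a one-parameter least-squares model \(y \approx w x\). The only thing that changes between the modes is how many examples contribute to each update. The toy data, the learning rate `lr` and the number of epochs are assumptions chosen so that all three runs converge.

```python
import random


def mse_gradient(w, batch):
    # Gradient of the mean squared error (1/m) * sum((w*x - y)^2) over the batch.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, lr=0.1, epochs=200, seed=0):
    # batch_size == len(data): batch learning (one update per epoch)
    # batch_size == 1:         online learning (one update per example)
    # otherwise:               mini-batch learning
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            w -= lr * mse_gradient(w, batch)
    return w


# Hypothetical noiseless toy data generated from y = 3x.
xs = [i / 10 for i in range(1, 11)]
data = [(x, 3.0 * x) for x in xs]

w_batch = train(data, batch_size=len(data))
w_online = train(data, batch_size=1)
w_mini = train(data, batch_size=4)
print(w_batch, w_online, w_mini)  # all close to 3.0
```

On this noiseless data every mode ends up at the same \(w\); they differ in how many updates are made per epoch (1, 10 and 3 here) and in how much work each update costs.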
The choice between instance-based and model-based methods depends on the problem at hand.
Example: Predicting final grade \( g_4 \) of a student given their 1st, 2nd, and 3rd-year results \( g_1 \), \( g_2 \), and \( g_3 \).
This illustrates the fundamental difference in approach between the two methods.
Model-based algorithms are powerful for capturing underlying patterns in data.
Instance-based methods are simple and effective for certain types of problems.
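A minimal sketch of the two approaches on the grade example, using made-up marks. The records, the choice \(k = 2\), Euclidean distance, and the single averaged feature in the linear model are all illustrative assumptions. The instance-based predictor consults the stored records at every query, while the model-based predictor compresses them into two fitted parameters.

```python
import math

# Hypothetical records (g1, g2, g3, g4): made-up marks for illustration only.
records = [
    (55, 58, 60, 62),
    (70, 72, 75, 76),
    (40, 45, 50, 52),
    (65, 60, 68, 70),
    (80, 82, 85, 86),
]


def knn_predict(query, records, k=2):
    # Instance-based: keep all training examples; at prediction time, average g4
    # over the k students whose (g1, g2, g3) are closest to the query (Euclidean).
    nearest = sorted(records, key=lambda r: math.dist(r[:3], query))[:k]
    return sum(r[3] for r in nearest) / k


def fit_linear(records):
    # Model-based: fit g4 ≈ a * mean(g1, g2, g3) + c once, by ordinary least squares.
    # Using the mean of the three marks as the single feature is a simplifying assumption.
    xs = [sum(r[:3]) / 3 for r in records]
    ys = [r[3] for r in records]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def linear_predict(query, params):
    # After fitting, only the two parameters (a, c) are needed, not the records.
    a, c = params
    return a * sum(query) / 3 + c


query = (68, 70, 72)
print(knn_predict(query, records, k=2))
print(linear_predict(query, fit_linear(records)))
```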
History: threshold neurons were formalised by McCulloch & Pitts (1943); the online perceptron learning algorithm was introduced by Frank Rosenblatt (1957–58).

We have training examples \((x^{(i)}, y^{(i)})\) with labels \(y^{(i)} \in \{+1,-1\}\).
Features: \(x^{(i)} = (x^{(i)}_1,\dots,x^{(i)}_{n_f})\in\mathbb{R}^{n_f}\).
Labels: \(y^{(i)}\in\{\pm 1\}\).
Score (affine):
\[ z(x; w,b) = w^\top x + b \]
Prediction (hard threshold):
\[ \hat y = \phi(z) = \operatorname{sign}(z) \in \{+1,-1\}. \]
Bias trick: augment \(\tilde x = (1, x_1,\dots,x_{n_f})\), \(\tilde w = (b, w_1,\dots,w_{n_f})\) so that \(z=\tilde w^\top \tilde x\).
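A minimal Python sketch of the score, the hard-threshold prediction and the bias trick, with plain lists standing in for vectors (the function names are illustrative). The notes leave \(\operatorname{sign}(0)\) unspecified; mapping \(z = 0\) to \(+1\) here is an assumption so that the output always lies in \(\{+1,-1\}\).

```python
def score(w, b, x):
    # Affine score z = w^T x + b.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    # Hard-threshold prediction sign(z). Assumption: z == 0 is mapped to +1.
    return 1 if score(w, b, x) >= 0 else -1


def augment(w, b, x):
    # Bias trick: w~ = (b, w_1, ..., w_nf) and x~ = (1, x_1, ..., x_nf).
    return [b, *w], [1.0, *x]


w, b, x = [2.0, -1.0], 0.5, [1.0, 3.0]
w_t, x_t = augment(w, b, x)
print(score(w, b, x))        # -0.5  (2*1 - 1*3 + 0.5)
print(score(w_t, 0.0, x_t))  # -0.5  (same value, bias folded into w~)
print(predict(w, b, x))      # -1
```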
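The set \(\{x : w^\top x + b = 0\}\) is a hyperplane (a line in 2D). It splits space into two half-spaces: \(w^\top x + b > 0\), where the prediction is \(+1\), and \(w^\top x + b < 0\), where the prediction is \(-1\).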
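For each example \((x^{(i)}, y^{(i)})\), if it is misclassified, i.e. \(y^{(i)}(w^\top x^{(i)} + b) \le 0\), update
\[ w \leftarrow w + \eta\, y^{(i)} x^{(i)}, \qquad b \leftarrow b + \eta\, y^{(i)}. \]
Otherwise leave \(w\) and \(b\) unchanged.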
\(\eta>0\) is the learning rate (a hyperparameter). With the bias trick: \(\tilde w \leftarrow \tilde w + \eta\, y^{(i)} \tilde x^{(i)}\).
On a mistake, \(y^{(i)}(w^\top x^{(i)} + b) \le 0\). The update adds \(\eta\, y^{(i)} x^{(i)}\) to \(w\) and \(\eta\, y^{(i)}\) to \(b\), which increases the signed margin \(y^{(i)}(w^\top x^{(i)} + b)\) on that example.
Geometrically, the update nudges the decision boundary so that the misclassified point moves toward the correct side.
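A one-line check of the margin claim: substitute the updated parameters into the signed margin of the same example and use \((y^{(i)})^2 = 1\):
\[
\begin{aligned}
y^{(i)}\big((w + \eta\, y^{(i)} x^{(i)})^\top x^{(i)} + b + \eta\, y^{(i)}\big)
&= y^{(i)}(w^\top x^{(i)} + b) + \eta\, (y^{(i)})^2 \big(\lVert x^{(i)}\rVert^2 + 1\big) \\
&= y^{(i)}(w^\top x^{(i)} + b) + \eta\,\big(\lVert x^{(i)}\rVert^2 + 1\big).
\end{aligned}
\]
Since \(\eta > 0\), the margin on that example always grows, by \(\eta(\lVert x^{(i)}\rVert^2 + 1)\) (the \(+1\) comes from the bias, i.e. this is \(\eta \lVert \tilde x^{(i)} \rVert^2\)). It need not become positive: if the margin was very negative, one update can leave the point misclassified.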












# initialize
w = 0 (zero vector in R^{n_f}); b = 0
for epoch in 1..T:
    shuffle(training_data)
    for (x, y) in training_data:
        if y * (w·x + b) ≤ 0:    # mistake
            w ← w + η * y * x
            b ← b + η * y
(Many implementations simply set \(\eta=1\) for separable problems. Starting from \(w=0,\ b=0\), changing \(\eta\) only rescales \((w, b)\) by a positive constant, which does not change any prediction.)
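Real data are often only almost linearly separable, because of noise or outliers. On such data the perceptron never completes an epoch without a mistake (an epoch with no mistakes would mean the current \((w, b)\) separates the data), so it keeps updating and only the epoch cap \(T\) stops the loop.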
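The training-loop pseudocode translates almost line for line into plain Python. This is a minimal sketch, not the course notebook's implementation: the function name `train_perceptron`, the seeded shuffling, the early stop after an epoch with no mistakes, and the AND/XOR toy data are assumptions for illustration. Stopping early is safe because an epoch with no mistakes makes no updates, so further epochs would change nothing. The XOR run illustrates the non-separable case.

```python
import random


def train_perceptron(data, eta=1.0, T=100, seed=0):
    """Perceptron learning rule.

    data: list of (x, y) pairs, x a list of floats, y in {+1, -1}.
    Returns (w, b, mistakes_per_epoch).
    """
    rng = random.Random(seed)
    data = list(data)              # copy so shuffling does not touch the caller's list
    w = [0.0] * len(data[0][0])    # w = 0 (zero vector)
    b = 0.0
    mistakes_per_epoch = []
    for _ in range(T):
        rng.shuffle(data)
        mistakes = 0
        for x, y in data:
            z = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * z <= 0:         # mistake (or exactly on the boundary)
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:          # clean epoch: later epochs would change nothing
            break
    return w, b, mistakes_per_epoch


# Hypothetical toy data: logical AND (separable) and XOR (not linearly separable).
AND = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
XOR = [([0.0, 0.0], -1), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], -1)]

w, b, hist = train_perceptron(AND)
print(w, b, hist)   # last entry of hist is 0: AND has been separated
_, _, hist_xor = train_perceptron(XOR, T=20)
print(hist_xor)     # 20 epochs, each with at least one mistake
```

Running the AND example with `eta=0.5` instead of `1.0` (same data and seed) gives the same sequence of mistakes and exactly half the final \(w\) and \(b\).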

In this lecture, we covered the following key concepts in Machine Learning:
Machine Learning is a broad and evolving field that bridges data, algorithms, and real-world applications. Understanding these concepts forms a solid foundation for deeper exploration into modern ML techniques and applications.