Merged
2 changes: 1 addition & 1 deletion .sync-upstream-base
Original file line number Diff line number Diff line change
@@ -1 +1 @@
b963cf63ffbb574af35da8db301ebb1381515ed8
148de3663f8bab4c90355292de8d0fac81dc2a86
@@ -178,6 +178,10 @@ QR decomposition does exactly this internally. Q is that orthonormal basis, R records the projection
- Computing eigenvalues (the QR algorithm)
- Least-squares regression (the standard numerical method)
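
As a rough illustration of "Q is the orthonormal basis, R records the projections", here is a minimal classical Gram-Schmidt sketch in pure Python. The function name `qr_gram_schmidt` and the column-list layout are assumptions made for this example, not the lesson's own code, and it assumes the input columns are linearly independent.

```python
import math

def qr_gram_schmidt(columns):
    """Classical Gram-Schmidt on a list of column vectors.

    Returns (q_cols, r): q_cols are orthonormal columns, r is an
    upper-triangular matrix (list of rows) such that A = Q R.
    Assumes the input columns are linearly independent.
    """
    n = len(columns)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = list(a)
        for i, q in enumerate(q_cols):
            # r[i][j] records how much of column j lies along q_i.
            r[i][j] = sum(qk * ak for qk, ak in zip(q, a))
            v = [vk - r[i][j] * qk for vk, qk in zip(v, q)]
        # What is left after removing all projections becomes the next basis vector.
        r[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / r[j][j] for vk in v])
    return q_cols, r

# Columns of A = [[3, 1], [4, 2]]
q_cols, r = qr_gram_schmidt([[3.0, 4.0], [1.0, 2.0]])
```

Production libraries typically use Householder reflections instead, because classical Gram-Schmidt loses orthogonality on ill-conditioned inputs.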

```figure
eigen-directions
```

## Build It

### Step 1: Vectors from scratch (Python)
@@ -113,6 +113,10 @@ Broadcasting stretches the vector across rows:

Every modern framework does this automatically. Understanding it keeps you from getting confused when the shapes look wrong but the code runs anyway.
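
A minimal pure-Python sketch of what the framework does for you when a vector meets a matrix. The helper `add_row_vector` is a hypothetical name used only for this illustration.

```python
def add_row_vector(matrix, vector):
    """Add `vector` to every row of `matrix`, as broadcasting would."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("vector length must match the number of columns")
    return [[m + v for m, v in zip(row, vector)] for row in matrix]

matrix = [[1, 2, 3],
          [4, 5, 6]]      # shape (2, 3)
bias = [10, 20, 30]       # shape (3,), conceptually stretched across both rows
result = add_row_vector(matrix, bias)
```

Here `result` is `[[11, 22, 33], [14, 25, 36]]`. No copy of `bias` is ever materialized; the same vector is simply reused for each row.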

```figure
vector-projection
```

## Build It

### Step 1: The Vector class
@@ -232,6 +232,10 @@ det = -1: area preserved but orientation flipped (reflection)
| det(Reflection) | = -1 (orientation flipped)
```
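
To check the determinant facts above by hand, a small 2x2 sketch is enough. The helper `det2` and the example matrices are assumptions chosen for illustration.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]   # area and orientation preserved
reflection = [[1, 0], [0, -1]]                     # flip across the x-axis
scale = [[2, 0], [0, 3]]                           # area multiplied by 6
```

`det2(rotation)` is 1 up to rounding, `det2(reflection)` is -1, and `det2(scale)` is 6.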

```figure
matrix-transform
```

## Build It

### Step 1: Transformation matrices from scratch (Python)
4 changes: 4 additions & 0 deletions phases/01-math-foundations/04-calculus-for-ml/docs/zh.md
@@ -388,6 +388,10 @@ graph RL

The forward pass computes the prediction and the loss. The backward pass computes the gradient of the loss with respect to every weight. Then each weight takes a small step downhill. Repeat for millions of steps. That is deep learning.
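
The forward / backward / step loop can be sketched with a single weight. This is a toy under stated assumptions: a model `y = w * x`, mean squared error, and a hand-derived gradient; `train_one_weight`, the learning rate, and the data are all made up for the example.

```python
def train_one_weight(xs, ys, lr=0.05, steps=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    loss = float("inf")
    for _ in range(steps):
        # Forward pass: predictions and loss.
        preds = [w * x for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        # Backward pass: dL/dw = mean(2 * (w*x - y) * x).
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Update: one small step downhill.
        w -= lr * grad
    return w, loss

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]      # generated by y = 2 * x
w, final_loss = train_one_weight(xs, ys)
```

With these values `w` converges to 2. A real network does the same thing with millions of weights and lets automatic differentiation supply the gradients.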

```figure
derivative-tangent
```

## Build It

### Step 1: Numerical derivatives from scratch
@@ -163,6 +163,10 @@ Inside PyTorch:

The graph is dynamic (define-by-run). Every forward pass builds a new graph. That is why PyTorch supports control flow (if/else, loops) inside the model.

```figure
chain-rule
```

## Build It

### Step 1: The Value class
@@ -254,6 +254,10 @@ Log-softmax combines softmax and log for numerical stability. PyTorch

Sampling from an arbitrary distribution requires techniques such as inverse transform sampling, rejection sampling, or the reparameterization trick (used in VAEs).
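
Of the three techniques, inverse transform sampling is the easiest to sketch. The example below assumes an exponential distribution, whose CDF inverts in closed form; `sample_exponential`, the rate, and the seed are illustrative choices.

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse transform sampling for Exponential(rate).

    CDF: F(x) = 1 - exp(-rate * x), so F^-1(u) = -ln(1 - u) / rate.
    """
    u = rng.random()                  # uniform in [0, 1)
    return -math.log(1.0 - u) / rate

rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)    # should be close to 1 / rate = 0.5
```

The method only works when the inverse CDF is available; otherwise rejection sampling or another technique is needed.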

```figure
gaussian-pdf
```

## Build It

### Step 1: Probability basics
4 changes: 4 additions & 0 deletions phases/01-math-foundations/07-bayes-theorem/docs/zh.md
@@ -195,6 +195,10 @@ MAP adds a prior over the parameters themselves. If you believe the parameters should be small

**Model comparison is Bayesian.** The Bayesian information criterion (BIC), the marginal likelihood, and Bayes factors all use Bayesian reasoning to choose between models without overfitting.

```figure
bayes-update
```

## Build It

### Step 1: A Bayes' theorem function
4 changes: 4 additions & 0 deletions phases/01-math-foundations/08-optimization/docs/zh.md
@@ -195,6 +195,10 @@ graph TD

Sharp minima tend to generalize poorly; flat minima tend to generalize well. This is one reason SGD with momentum often beats Adam on final test accuracy: its noise keeps it from settling into sharp minima.

```figure
gradient-descent
```

## Build It

### Step 1: Define a test function
4 changes: 4 additions & 0 deletions phases/01-math-foundations/09-information-theory/docs/zh.md
@@ -273,6 +273,10 @@ Perplexity = e^H(P,Q) (if using nats)

GPT-2 reaches a perplexity of about 30 on common benchmarks. Modern models can reach single digits in well-represented domains.
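
To make the perplexity formula concrete, here is a small sketch that computes it from the probabilities a model assigned to the true tokens. The function name and the toy inputs are assumptions for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities the model gave to the true tokens."""
    # Cross-entropy in nats: the average negative log-probability.
    h = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(h)

uniform_over_four = perplexity([0.25] * 8)   # model guesses uniformly among 4 tokens
perfect = perplexity([1.0, 1.0, 1.0])        # model is always certain and right
```

A model that guesses uniformly among 4 options has perplexity 4, which is why perplexity is often read as an "effective number of choices" per token.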

```figure
entropy-kl
```

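## Build It

### Step 1: Information content and entropy
@@ -190,6 +190,10 @@ explained_ratio_k = eigenvalue_k / sum(all eigenvalues)

Reconstruction error is not only for choosing k. You can use it for anomaly detection: samples with high reconstruction error are outliers that do not fit the learned subspace. This is the basis of PCA-based anomaly detection in production systems.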
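
A minimal sketch of the anomaly score, assuming the mean and the orthonormal principal directions have already been learned. Here they are hard-coded (the subspace is the line y = x) purely for illustration; `reconstruction_error` is a hypothetical helper.

```python
import math

def reconstruction_error(x, mean, components):
    """Distance between x and its projection onto the PCA subspace.

    `components` is a list of orthonormal principal directions.
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    recon = [0.0] * len(x)
    for u in components:
        coeff = sum(ci * ui for ci, ui in zip(centered, u))
        recon = [ri + coeff * ui for ri, ui in zip(recon, u)]
    return math.sqrt(sum((ci - ri) ** 2 for ci, ri in zip(centered, recon)))

mean = [0.0, 0.0]
components = [[1 / math.sqrt(2), 1 / math.sqrt(2)]]   # assumed subspace: the line y = x
inlier = reconstruction_error([3.0, 3.0], mean, components)
outlier = reconstruction_error([3.0, -3.0], mean, components)
```

The point on the line reconstructs almost perfectly, while the point perpendicular to it gets a large error. In practice you would flag samples whose error exceeds a threshold chosen on held-out data.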

```figure
pca-axes
```

## Build It

### Step 1: PCA from scratch
@@ -360,6 +360,10 @@ It is faster and more numerically stable.

This means that everything you learned about dimensionality reduction in Lesson 10 is SVD under the hood. PCA is the most common application of SVD in machine learning.

```figure
svd-rank-reconstruction
```

## Build It

### Step 1: SVD from scratch with power iteration
4 changes: 4 additions & 0 deletions phases/01-math-foundations/12-tensor-operations/docs/zh.md
@@ -101,6 +101,10 @@ graph LR

Key patterns: `i,i->` (dot product), `i,j->ij` (outer product), `ii->` (trace), `ij->ji` (transpose), `bij,bjk->bik` (batched matrix multiplication), `bhtd,bhsd->bhts` (attention scores).
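
One way to read these patterns: an index that appears in the inputs but not after `->` is summed over. The sketch below spells out five of them as explicit pure-Python loops. These helpers are illustrative stand-ins written for this note, not the implementations in `code/tensors.py`.

```python
def dot(a, b):               # "i,i->"   : i is summed away
    return sum(a[i] * b[i] for i in range(len(a)))

def outer(a, b):             # "i,j->ij" : nothing is summed
    return [[a[i] * b[j] for j in range(len(b))] for i in range(len(a))]

def trace(m):                # "ii->"    : repeated index walks the diagonal
    return sum(m[i][i] for i in range(len(m)))

def transpose(m):            # "ij->ji"  : indices swap places
    return [[m[i][j] for i in range(len(m))] for j in range(len(m[0]))]

def batched_matmul(a, b):    # "bij,bjk->bik" : j is summed, b is kept
    return [
        [[sum(a_b[i][j] * b_b[j][k] for j in range(len(b_b)))
          for k in range(len(b_b[0]))]
         for i in range(len(a_b))]
        for a_b, b_b in zip(a, b)
    ]
```

For example, `batched_matmul([[[1, 2], [3, 4]]], [[[5, 6], [7, 8]]])` gives `[[[19, 22], [43, 50]]]`.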

```figure
tensor-broadcast
```

## Build It

The code lives in `code/tensors.py`. Each step refers to the implementation there.
4 changes: 4 additions & 0 deletions phases/01-math-foundations/13-numerical-stability/docs/zh.md
@@ -388,6 +388,10 @@ LayerNorm(x) = (x - mean(x)) / (std(x) + epsilon) * gamma + beta
Cause: float16 cannot represent gradient magnitudes below 6e-8 or activation values above 65,504.
Fix: use mixed precision with loss scaling (AMP), or switch to bfloat16.
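
The float16 limits and the idea behind loss scaling can be reproduced with the standard library alone, since `struct` supports the IEEE half-precision format `"e"`. This is a sketch of the principle, not of how AMP is implemented; `to_float16`, the gradient value, and the scale factor of 1024 are assumptions for illustration.

```python
import struct

def to_float16(x):
    """Round x to the nearest float16 and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

grad = 1e-8
underflowed = to_float16(grad)             # too small for float16: becomes 0.0

loss_scale = 1024.0
scaled = to_float16(grad * loss_scale)     # scaled gradient survives in float16
recovered = scaled / loss_scale            # unscale in higher precision

try:
    to_float16(70000.0)                    # above the float16 maximum of 65504
    overflowed = False
except (OverflowError, struct.error):
    overflowed = True
```

Scaling the loss scales every gradient by the same factor, which lifts small gradients into the representable range; dividing afterwards in float32 restores their true magnitude.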

```figure
logsumexp-stability
```

## Build It

### Step 1: Demonstrate floating-point precision limits
4 changes: 4 additions & 0 deletions phases/01-math-foundations/14-norms-and-distances/docs/zh.md
@@ -423,6 +423,10 @@ Product quant. Compress vectors, search FAISS (memory-constrained)

HNSW (Hierarchical Navigable Small World) is the dominant algorithm in modern vector databases. It builds a multi-layer graph in which every node connects to its approximate nearest neighbors. Search starts at the top layer (sparse, long hops) and descends to the bottom layer (dense, short hops).

```figure
norm-unit-balls
```

## Build It

### Step 1: All the norm and distance functions
4 changes: 4 additions & 0 deletions phases/01-math-foundations/16-sampling-methods/docs/zh.md
@@ -437,6 +437,10 @@ Reverse process (learned):

The whole image generation process is iterative sampling: start from noise and, at each step, sample a slightly less noisy version conditioned on the learned denoising model.

```figure
monte-carlo-pi
```

## Build It

### Step 1: Uniform sampling and inverse-CDF sampling
4 changes: 4 additions & 0 deletions phases/01-math-foundations/17-linear-systems/docs/zh.md
@@ -386,6 +386,10 @@ CG is used for:

**Feature engineering.** The condition number of X^T X tells you whether features are collinear. If kappa is large, drop features or add regularization.

```figure
linear-system-conditioning
```

## Build It

### Step 1: Gaussian elimination with partial pivoting
4 changes: 4 additions & 0 deletions phases/01-math-foundations/18-convex-optimization/docs/zh.md
@@ -381,6 +381,10 @@ Replace x_i^T x_j with K(x_i, x_j) to get the kernel trick.
| Adam | O(n) | O(n) | Default for deep learning |
| K-FAC | O(n) | O(n) per layer | Research, large-batch training |

```figure
convex-vs-nonconvex
```

## Build It

### Step 1: A convexity checker
4 changes: 4 additions & 0 deletions phases/01-math-foundations/19-complex-numbers/docs/zh.md
@@ -267,6 +267,10 @@ graph LR
U1 --> A3
```

```figure
roots-of-unity
```

## Build It

### Step 1: The Complex class
4 changes: 4 additions & 0 deletions phases/01-math-foundations/20-fourier-transform/docs/zh.md
@@ -275,6 +275,10 @@ Example:

True frequency resolution depends only on the observation time T = N / fs. To resolve two frequencies delta_f apart, you need at least T = 1 / delta_f seconds of data. No amount of zero-padding changes this fundamental limit.
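
The arithmetic is worth doing once. From T = N / fs and T >= 1 / delta_f it follows that N >= fs / delta_f. The helper names and the 1000 Hz / 2 Hz figures below are hypothetical values picked for the example.

```python
import math

def min_samples_to_resolve(delta_f_hz, fs_hz):
    """Smallest N whose observation time T = N / fs is at least 1 / delta_f."""
    return math.ceil(fs_hz / delta_f_hz)

def frequency_resolution(n_observed, fs_hz):
    """Resolution set by the observed samples only: 1 / T = fs / N."""
    return fs_hz / n_observed

fs = 1000.0                                      # sampling rate in Hz
n_needed = min_samples_to_resolve(2.0, fs)       # tones 2 Hz apart need 500 samples (0.5 s)
resolution = frequency_resolution(n_needed, fs)  # 2.0 Hz
```

Zero-padding those 500 samples to 4096 makes the FFT bins finer (about 0.24 Hz apart) but `frequency_resolution` still depends only on the 500 observed samples, so the two tones are no easier to separate.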

```figure
fourier-synthesis
```

## Build It

### Step 1: DFT from scratch
4 changes: 4 additions & 0 deletions phases/01-math-foundations/21-graph-theory/docs/zh.md
@@ -241,6 +241,10 @@ graph LR
| Spectral clustering | Unsupervised node grouping |
| PageRank | Node importance, web search |

```figure
graph-degree-distribution
```

## Build It

### Step 1: A graph class from scratch
4 changes: 4 additions & 0 deletions phases/01-math-foundations/22-stochastic-processes/docs/zh.md
@@ -228,6 +228,10 @@ graph LR
| Markov decision processes | Reinforcement learning |
| Metropolis-Hastings | Bayesian inference, posterior sampling |

```figure
random-walk-diffusion
```

## Build It

### Step 1: Random walk simulator
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/02-linear-regression/docs/zh.md
@@ -147,6 +147,10 @@ Cost = MSE + lambda * sum(w_i^2)

The penalty term discourages large weights. The hyperparameter lambda controls the trade-off: the larger lambda is, the smaller the weights and the stronger the regularization. Later lessons cover this in depth. For now, you only need to know that it exists and why it is useful.
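
A small sketch of the regularized cost, assuming a linear model without a bias term. `ridge_cost` and the toy data are made up for illustration.

```python
def ridge_cost(weights, xs, ys, lam):
    """MSE plus an L2 penalty: Cost = MSE + lambda * sum(w_i^2)."""
    preds = [sum(w * x for w, x in zip(weights, row)) for row in xs]
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty

xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [3.0, 4.0]
weights = [3.0, 4.0]                                    # fits the data exactly, so MSE = 0

unregularized = ridge_cost(weights, xs, ys, lam=0.0)    # 0.0
regularized = ridge_cost(weights, xs, ys, lam=0.1)      # 0.1 * (9 + 16) = 2.5
```

Even a perfect fit pays a price once lambda is nonzero, so the optimizer is pushed toward smaller weights whenever the fit does not suffer much.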

```figure
linear-regression-fit
```

## Build It

### Step 1: Generate sample data
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/03-logistic-regression/docs/zh.md
@@ -169,6 +169,10 @@ F1 = 2 * (Precision * Recall) / (Precision + Recall)
- **Recall**: when false negatives are costly (cancer screening, where you do not want to miss a tumor)
- **F1**: when you need a single balanced metric
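
The three metrics follow directly from confusion-matrix counts. The sketch below uses hypothetical screening numbers; `precision_recall_f1` is an illustrative helper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# Hypothetical screening results: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Precision is 0.8 but recall is only 0.5, meaning half of the real cases were missed. F1 lands near 0.615, pulled toward the weaker of the two.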

```figure
logistic-sigmoid
```

## Build It

### Step 1: The sigmoid function and data generation
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/04-decision-trees/docs/zh.md
@@ -189,6 +189,10 @@ importance(feature_j) = sum over all nodes where feature_j is used:

Neural networks win only when the data has spatial or sequential structure (images, text, audio). For flat tables of features, trees are the default choice.

```figure
decision-tree-depth
```

## Build It

### Step 1: Gini impurity and entropy
@@ -221,6 +221,10 @@ SVMs still win in these settings:
- Binary classification with a clear margin structure
- Anomaly detection (one-class SVM)

```figure
svm-margin
```

## Build It

### Step 1: Hinge loss and its gradient
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/06-knn-and-distances/docs/zh.md
@@ -223,6 +223,10 @@ prediction = sum(w_i * y_i) / sum(w_i)

KNN regression produces piecewise-constant predictions (piecewise-smooth when weighted). It cannot extrapolate beyond the range of the training data. If the training targets all lie between 0 and 100, KNN will never predict 200.
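
The extrapolation limit is easy to see in a tiny 1-D example. The sketch assumes unweighted KNN with k = 3 and data generated by y = 20 * x; `knn_regress` is a hypothetical helper, not the lesson's implementation.

```python
def knn_regress(train_x, train_y, query, k=3):
    """Unweighted 1-D KNN regression: average the k nearest targets."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]      # y = 20 * x, targets in [0, 100]

inside = knn_regress(train_x, train_y, 2.0)         # neighbors at x = 1, 2, 3
far_outside = knn_regress(train_x, train_y, 10.0)   # neighbors at x = 3, 4, 5
```

At x = 10 the underlying line would give 200, but the prediction is 80: an average of training targets can never leave the range of those targets.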

```figure
knn-smoothness
```

## Build It

### Step 1: Distance functions
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/07-unsupervised-learning/docs/zh.md
@@ -122,6 +122,10 @@ GMM can model elliptical clusters (unlike K-Means, which handles only spherical ones) and naturally handles
- **DBSCAN**: noise points are anomalies by definition
- **GMM**: points with low probability under every Gaussian are anomalies

```figure
kmeans-step
```

## Build It

### Step 1: K-Means from scratch
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/08-feature-engineering/docs/zh.md
@@ -104,6 +104,10 @@ TF-IDF = TF * IDF

**Why selection matters:** a model with 10 good features usually beats a model with those 10 good features plus 90 noise features. Noise features give the model the chance to overfit to patterns in the training data that do not generalize.

```figure
feature-scaling
```

## Build It

### Step 1: Numerical transforms from scratch
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/09-model-evaluation/docs/zh.md
@@ -142,6 +142,10 @@ K=5 or K=10 are the standard choices. Every data point is used for validation exactly once.

**Testing too often**: every time you look at test performance and then adjust, you are overfitting to the test set. The test set is single-use.

```figure
precision-recall-threshold
```

## Build It

### Step 1: Train/validation/test split
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/10-bias-variance/docs/zh.md
@@ -254,6 +254,10 @@ flowchart TD
G --> H[Try a more complex model]
```

```figure
bias-variance
```

## Build It

The code in `code/bias_variance.py` runs the full bias-variance decomposition experiment. The step-by-step approach follows.
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/12-hyperparameter-tuning/docs/zh.md
@@ -245,6 +245,10 @@ print(f"Nested CV MSE: {-outer_scores.mean():.4f} +/- {outer_scores.std():.4f}")

**When in doubt:** use random search with a trial budget of at least twice the number of hyperparameters (for example, 6 hyperparameters means at least 12 trials). You will be surprised how often a 50-trial random search beats a carefully designed grid search.

```figure
k-fold-cv
```

## Build It

### Step 1: Grid search from scratch
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/14-naive-bayes/docs/zh.md
@@ -225,6 +225,10 @@ flowchart LR
log P(class | features) = log P(class) + sum_i log P(feature_i | class)
```
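
The log-space formula above can be sketched in a few lines. The scores are log posteriors up to a constant shared by all classes, which is enough for picking the best class. The priors, likelihoods, and the helper `log_posterior_scores` are hypothetical numbers and names chosen for illustration.

```python
import math

def log_posterior_scores(priors, likelihoods, features):
    """Unnormalized log P(class | features) for each class.

    priors: {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}
    """
    scores = {}
    for cls, prior in priors.items():
        scores[cls] = math.log(prior) + sum(
            math.log(likelihoods[cls][f]) for f in features
        )
    return scores

# Hypothetical spam filter with two known words.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.01},
    "ham": {"free": 0.02, "meeting": 0.20},
}
scores = log_posterior_scores(priors, likelihoods, ["free", "free"])
predicted = max(scores, key=scores.get)
```

Summing logs avoids the underflow you get from multiplying hundreds of small probabilities, and it does not change which class scores highest.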

```figure
naive-bayes
```

## Build It

The code in `code/naive_bayes.py` implements MultinomialNB and GaussianNB from scratch.
4 changes: 4 additions & 0 deletions phases/02-ml-fundamentals/17-imbalanced-data/docs/zh.md
@@ -211,6 +211,10 @@ flowchart TD
M -->|Yes| O[Ship]
```

```figure
class-imbalance
```

## Build It

### Step 1: Generate an imbalanced dataset
4 changes: 4 additions & 0 deletions phases/03-deep-learning-core/01-the-perceptron/docs/zh.md
@@ -112,6 +112,10 @@ AND (separable): XOR (not separable):

The fix: stack perceptrons into multiple layers. A multi-layer perceptron can combine two linear decisions into one nonlinear decision, which solves XOR.
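
To see how two linear decisions combine into XOR, here is a sketch with hand-picked weights rather than learned ones. The weights, the `step` activation, and the function names are assumptions chosen so the logic is easy to follow.

```python
def step(z):
    return 1 if z > 0 else 0

def perceptron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_mlp(a, b):
    # Hidden layer: two linear decisions.
    h_or = perceptron([a, b], [1, 1], -0.5)       # fires if a OR b
    h_nand = perceptron([a, b], [-1, -1], 1.5)    # fires unless a AND b
    # Output layer: AND of the two hidden decisions.
    return perceptron([h_or, h_nand], [1, 1], -1.5)

table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
```

Each hidden unit draws one line through the input plane; the output unit keeps only the region between the two lines, which is exactly where XOR is 1.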

```figure
perceptron-boundary
```

## Build It

### Step 1: The Perceptron class
@@ -146,6 +146,10 @@ graph LR

Neural networks are composable. You can stack them, chain them, and run them in parallel. A Whisper model processes audio with an encoder network and then generates text with a separate decoder network. Modern LLMs are decoder-only. BERT is encoder-only. T5 is encoder-decoder. The choice of architecture determines what the model can do.

```figure
mlp-forward
```

## Build It

Pure Python, no numpy. Every matrix operation is written by hand from scratch.
4 changes: 4 additions & 0 deletions phases/03-deep-learning-core/03-backpropagation/docs/zh.md
@@ -132,6 +132,10 @@ dL/db1 = dL/dz1

Every gradient is a product of a chain of local derivatives traced backward from the loss. That is all there is to backpropagation.

```figure
backprop-vanishing
```

## Build It

### Step 1: The Value node