Preventing Overfitting in Deep Learning with Python TensorFlow: Early Stopping, Dynamic Learning-Rate Adjustment, Dropout, and Loss Regularization

by 儲蓄保險王 · 2025-06-18

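In deep learning, overfitting is a common problem during model training.
Overfitting occurs when a model performs well on the training set
but noticeably worse on the validation or test set.
Several techniques can improve a model's ability to generalize,
including regularization, early stopping, Dropout, and dynamic learning-rate adjustment.
This article introduces the concepts behind these techniques,
their mathematical basis, and how to implement them,
with particular attention to the roles of the bias term and
of weight regularization (kernel regularization).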

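1. What is overfitting, and why is regularization needed?

1.1 Signs of overfitting

Training loss keeps decreasing, while validation loss stops decreasing or even starts to rise.
Accuracy on the test set is far lower than on the training set.
The model is too complex and has learned the noise in the training data, which hurts generalization.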
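
The first sign above can be checked mechanically from recorded loss curves. The following is a minimal sketch in plain Python; the function name first_overfit_epoch and the loss values are made up for illustration. With Keras, the two lists would typically come from history.history['loss'] and history.history['val_loss'].

```python
def first_overfit_epoch(train_loss, val_loss):
    """Return the first epoch index at which validation loss rises
    while training loss still falls, or None if that never happens."""
    for epoch in range(1, len(val_loss)):
        train_falling = train_loss[epoch] < train_loss[epoch - 1]
        val_rising = val_loss[epoch] > val_loss[epoch - 1]
        if train_falling and val_rising:
            return epoch
    return None


# Hypothetical loss curves, invented for this example
train_loss = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22]
val_loss = [0.95, 0.70, 0.58, 0.55, 0.57, 0.61]

overfit_epoch = first_overfit_epoch(train_loss, val_loss)
print(overfit_epoch)
```

A single noisy epoch can trigger this check, so in practice it is usually combined with a patience window, as in the early stopping section.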


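1.2 The goal of regularization

The core idea of regularization is to limit model complexity
so that the model does not fit the training data too closely.
This is done by adding a penalty term to the loss function.
Common regularization methods include:

L1 regularization: penalizes the absolute values of the parameters and tends to produce sparse models.
L2 regularization (weight decay): penalizes the squares of the parameters and discourages large weights.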
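
To make the penalty term concrete, here is a small sketch in plain Python of how each penalty is added to a data loss. The weights, the coefficient lam, and the data loss value are hypothetical numbers chosen only for illustration.

```python
def l1_penalty(weights, lam):
    # lam times the sum of absolute values
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    # lam times the sum of squares
    return lam * sum(w * w for w in weights)


weights = [0.5, -1.0, 2.0]  # hypothetical weights
lam = 0.01                  # regularization strength
data_loss = 0.30            # hypothetical loss on the training data

total_l1 = data_loss + l1_penalty(weights, lam)
total_l2 = data_loss + l2_penalty(weights, lam)
print(total_l1, total_l2)
```

The largest weight (2.0) contributes 4.0 of the 5.25 sum of squares but only 2.0 of the 3.5 sum of absolute values, which is why L2 discourages large weights in particular.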

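2. Roles of the bias term and the weights, and how they differ

In a neural network, the output of a neuron is computed as:

y=f(w⋅x+b)

w is the weight vector, which controls how much each input feature matters.
b is the bias term, which shifts the output.
f is the activation function.

2.1 The role of the bias term

Geometric meaning: the bias term lets the decision boundary avoid passing through the origin, so it can adapt more flexibly to the data distribution.
Mathematical meaning: when all inputs are zero, the bias term determines the neuron's output.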
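
The second point can be seen by evaluating y=f(w⋅x+b) on an all-zero input. This is a minimal sketch in plain Python that assumes a sigmoid activation for f; the weights and bias values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f=sigmoid):
    # Weighted sum of the inputs plus the bias, passed through the activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)


w = [0.4, -0.2]          # arbitrary weights
zero_input = [0.0, 0.0]  # every input is zero, so w.x is zero

out_no_bias = neuron(zero_input, w, b=0.0)    # equals f(0)
out_with_bias = neuron(zero_input, w, b=2.0)  # equals f(2)
print(out_no_bias, out_with_bias)
```

With a zero input the weights have no effect at all, so the output is f(b) and depends only on the bias.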

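2.2 Differences between weights and biases

Biases depend on the number of outputs: the number of bias terms depends only on the number of output neurons.
Weights depend on the number of inputs as well: in a fully connected layer, the number of weights is the number of inputs multiplied by the number of output neurons.

Input layer:
 x1    x2    x3    x4
  |     |     |     |
 [w11] [w12] [w13] [w14]  <- weights of neuron 1
  |     |     |     |
 [w21] [w22] [w23] [w24]  <- weights of neuron 2
  |     |     |     |
 [w31] [w32] [w33] [w34]  <- weights of neuron 3
  |     |     |     |
Biases: b1    b2    b3      <- bias terms, one per output neuron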
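
The counts in the diagram follow directly from the layer sizes. A short sketch in plain Python (dense_param_counts is a hypothetical helper name, not a Keras function):

```python
def dense_param_counts(n_inputs, n_outputs):
    """Parameter counts of one fully connected layer."""
    weights = n_inputs * n_outputs  # one weight per (input, neuron) pair
    biases = n_outputs              # one bias per output neuron
    return weights, biases


# The layer drawn above: 4 inputs feeding 3 neurons
weights, biases = dense_param_counts(4, 3)
print(weights, biases, weights + biases)  # 12 3 15
```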

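3. Techniques for preventing overfitting

3.1 Regularizing the loss function

Regularization limits model complexity by imposing a penalty on the model parameters.

L2 regularization (TensorFlow example)

import tensorflow as tf

# Define the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(0.01),  # penalty on the weights
        bias_regularizer=tf.keras.regularizers.l2(0.01),    # penalty on the biases
    ),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

kernel_regularizer applies regularization to the weights.
bias_regularizer applies regularization to the biases
(less commonly used, but helpful in some cases).

L1 regularization penalizes the absolute value of each parameter (abs(w)) and encourages sparsity.
L2 regularization penalizes the square of each parameter (w*w) and encourages weights that are small but not zero.
L1_L2 regularization combines the strengths of both:
it can produce sparse solutions (L1) while also smoothing the weight distribution (L2).
TensorFlow provides built-in regularizers for L1, L2, and L1_L2 regularization.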
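
The contrast between "sparse" and "small but not zero" can be illustrated on a single weight that is updated using only the penalty term. This is a simplified sketch in plain Python, not what Keras does internally: Keras adds the penalty to the loss and lets the optimizer follow the gradient. For L1, the sketch assumes a soft-thresholding (proximal) step so that the weight stops cleanly at zero. The starting weight, lam, and lr are arbitrary.

```python
def l2_step(w, lam, lr):
    # The gradient of lam * w**2 is 2 * lam * w, so the shrinkage is
    # proportional to w and gets weaker as w gets smaller.
    return w - lr * 2.0 * lam * w


def l1_step(w, lam, lr):
    # Soft-thresholding for lam * |w|: move toward zero by a fixed
    # amount and stop at exactly zero instead of overshooting.
    shrink = lr * lam
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0


w_l1 = w_l2 = 0.05   # same arbitrary starting weight for both
lam, lr = 0.01, 1.0  # arbitrary penalty strength and step size
for _ in range(10):
    w_l1 = l1_step(w_l1, lam, lr)
    w_l2 = l2_step(w_l2, lam, lr)

print(w_l1, w_l2)
```

Under these assumptions the L1 weight reaches exactly zero, while the L2 weight only shrinks by a constant factor per step and stays positive.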

L1 regularization

from tensorflow.keras.regularizers import l1

# Use L1 regularization
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=l1(0.01))
])

L2 regularization

from tensorflow.keras.regularizers import l2

# Use L2 regularization
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=l2(0.01))
])

L1_L2 regularization

from tensorflow.keras.regularizers import l1_l2

# Use L1 and L2 regularization together
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=l1_l2(l1=0.01, l2=0.01))
])

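3.2 Early stopping

Early stopping automatically halts training once performance on the validation set stops improving, which prevents overfitting.

Early stopping in TensorFlow

from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 5 consecutive epochs,
# then restore the weights from the best epoch
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# train_data, train_labels, val_data, and val_labels are assumed to be defined elsewhere
history = model.fit(
    train_data, train_labels,
    validation_data=(val_data, val_labels),
    epochs=100,
    callbacks=[early_stopping]
)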
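
The patience rule behind the callback can be sketched in plain Python. This is a simplified illustration rather than Keras's implementation: it ignores options such as min_delta, and the function name and validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop training here
    return len(val_losses) - 1, best_epoch  # ran to the last epoch


# Hypothetical validation losses, invented for this example
val_losses = [0.80, 0.60, 0.50, 0.52, 0.51, 0.53, 0.55, 0.54, 0.56, 0.40]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=5)
print(stop_epoch, best_epoch)  # 7 2
```

Training stops at epoch 7, five epochs after the best epoch (2). The lower loss at epoch 9 is never reached, which is the trade-off made when choosing patience. Restoring the weights from best_epoch corresponds to restore_best_weights=True.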

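3.3 Dropout

Dropout randomly switches off the outputs of a fraction of the neurons during training,
which reduces the model's reliance on any particular neuron and so helps prevent overfitting.

Dropout implementation

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),  # randomly zero 50% of the previous layer's outputs during training
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])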
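
What a Dropout layer does to its inputs can be sketched in plain Python. This is a minimal illustration of inverted dropout, where the kept values are scaled by 1 / (1 - rate) during training so that the expected sum is unchanged and nothing needs to be rescaled at inference time. The dropout function and the seeded generator below are illustrative and are not the Keras API.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout on a list of activations (requires rate < 1)."""
    if not training:
        return list(values)  # inference: pass everything through unchanged
    keep_scale = 1.0 / (1.0 - rate)
    # Each value is zeroed with probability `rate`, otherwise scaled up
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # seeded only to make the example repeatable
activations = [1.0] * 8

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)
print(train_out)
print(infer_out)
```

With rate=0.5, every training output is either 0.0 or 2.0, while the inference output equals the input.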

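3.4 Dynamic learning-rate adjustment

Dynamic learning-rate adjustment changes the learning rate automatically as training progresses.

ReduceLROnPlateau

Reduces the learning rate when the monitored metric has not improved for a number of epochs.

from tensorflow.keras.callbacks import ReduceLROnPlateau
# ReduceLROnPlateau = Reduce LR (learning rate) on plateau (stagnation)

# Halve the learning rate after 3 epochs without improvement in val_loss,
# but never go below 1e-6
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.5, patience=3, min_lr=1e-6)

history = model.fit(
    train_data, train_labels,
    validation_data=(val_data, val_labels),
    epochs=100,
    callbacks=[reduce_lr]
)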
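
The plateau rule can be sketched in plain Python to show how factor, patience, and min_lr interact. This is a simplified illustration rather than Keras's implementation (it ignores cooldown and min_delta), and the function name and validation losses are invented.

```python
def reduce_on_plateau(val_losses, lr, factor, patience, min_lr):
    """Return the learning rate in effect at each epoch."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    lrs = []
    for loss in val_losses:
        lrs.append(lr)
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # reduce, but respect the floor
                wait = 0
    return lrs


# Hypothetical validation losses, invented for this example
val_losses = [0.50, 0.45, 0.46, 0.47, 0.46, 0.44, 0.45, 0.45, 0.46]
lrs = reduce_on_plateau(val_losses, lr=0.001, factor=0.5, patience=3, min_lr=1e-6)
print(lrs)
```

Here the rate stays at 0.001 for epochs 0 to 4, because the third epoch without improvement is epoch 4, and is 0.0005 from epoch 5 onward.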

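Learning-rate scheduler

The learning rate can also be adjusted as a function of the epoch.

import math

from tensorflow.keras.callbacks import LearningRateScheduler

def scheduler(epoch, lr):
    # Keep the initial learning rate for the first 10 epochs
    if epoch < 10:
        return lr
    # After that, multiply it by exp(-0.1) every epoch.
    # math.exp returns a plain Python float, which the callback expects.
    return lr * math.exp(-0.1)

lr_scheduler = LearningRateScheduler(scheduler)

history = model.fit(
    train_data, train_labels,
    validation_data=(val_data, val_labels),
    epochs=50,
    # Several callbacks (for example early stopping and a learning-rate
    # callback) can be passed together in this list
    callbacks=[lr_scheduler]
)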
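
To see what this schedule produces, the same rule can be applied epoch by epoch outside of training. The sketch below is plain Python; it uses math.exp so that it runs without TensorFlow and assumes an initial learning rate of 0.001.

```python
import math


def scheduler(epoch, lr):
    # Constant for the first 10 epochs, then multiplied by exp(-0.1) each epoch
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)


lr = 0.001  # assumed initial learning rate
rates = []
for epoch in range(15):
    lr = scheduler(epoch, lr)  # the previous rate is fed back in, as the callback does
    rates.append(lr)

for epoch, rate in enumerate(rates):
    print(epoch, round(rate, 6))
```

Because each call receives the previous rate, the decay compounds: after k decayed epochs the rate is 0.001 * exp(-0.1 * k).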

Basic TensorFlow model:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping


def create_model(input_dim):
    """
    input_dim: number of input features
    """
    model = Sequential([
        # Alternatively, pass input_shape=(16,) for 16 features
        Dense(64, activation='relu', input_dim=input_dim),
        Dropout(0.2),
        # Add regularization via kernel_regularizer and bias_regularizer
        Dense(32, activation='relu',
              kernel_regularizer=tf.keras.regularizers.l2(0.001),
              bias_regularizer=tf.keras.regularizers.l2(0.001)),
        Dropout(0.2),
        Dense(16, activation='relu',
              kernel_regularizer=tf.keras.regularizers.l2(0.001),
              bias_regularizer=tf.keras.regularizers.l2(0.001)),
        Dense(1)  # regression: no activation function on the output layer
    ])

    model.compile(
        optimizer=Adam(learning_rate=0.001),
        loss='mse',
        metrics=['mae']  # or ['mae', 'mse']
    )
    return model


# input_dim, X_train_scaled, y_train_scaled, early_stopping, and reduce_lr
# must be defined before this point
model = create_model(input_dim)
# Train the model
history = model.fit(
    X_train_scaled, y_train_scaled,
    epochs=100,
    batch_size=32,
    # Hold out the last 20% of the training data as the validation set;
    # in PyTorch this split has to be implemented manually
    validation_split=0.2,
    verbose=0,
    callbacks=[early_stopping, reduce_lr]
)
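
The validation_split argument takes the validation samples from the end of the arrays passed to fit, before any shuffling. The following plain-Python sketch shows the same idea and is roughly what has to be written by hand in frameworks without this option. split_last_fraction is a hypothetical helper, and the rounding may differ slightly from Keras for sizes that do not divide evenly.

```python
def split_last_fraction(samples, validation_split):
    """Hold out the last `validation_split` fraction of samples for validation."""
    n_val = int(len(samples) * validation_split)
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]


samples = list(range(10))  # stand-in for 10 training rows
train_part, val_part = split_last_fraction(samples, 0.2)
print(train_part)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val_part)    # [8, 9]
```

Because the split is positional, data that is sorted (for example by label or by date) should be shuffled beforehand if a random validation set is wanted.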

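The input layer's input_dim must equal the number of features.
The number of neurons in the output layer depends on the type of problem:

Regression: 1 (no activation function).
Binary classification: 1 (Sigmoid activation, which squashes the output to a value between 0 and 1,
interpreted as the probability that the input belongs to a given class).
Multi-class classification: the number of classes (Softmax activation, which converts the outputs into a probability distribution
in which the probabilities of all classes sum to 1).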
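
The two output activations mentioned above can be written out in a few lines of plain Python. This is a minimal sketch; the input values are arbitrary.

```python
import math


def sigmoid(z):
    # Maps any real number to a value between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtracting the maximum keeps exp() from overflowing
    # without changing the result
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


binary_prob = sigmoid(0.0)              # one output neuron, binary classification
class_probs = softmax([2.0, 1.0, 0.1])  # three output neurons, three classes
print(binary_prob)
print(class_probs, sum(class_probs))
```

The softmax outputs sum to 1 (up to floating-point rounding), and the largest raw output receives the largest probability.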

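4. Summary and practical advice

4.1 Choosing a strategy

For a simple model: use a regularized loss function (L1/L2 regularization).
For a complex model: combine Dropout with early stopping.
When training is unstable: use dynamic learning-rate adjustment.

4.2 Summary

Overfitting is a problem that cannot be ignored in deep learning. This article covered several techniques for preventing it, from the underlying principles of bias terms and weight regularization to concrete implementations. These methods can be used on their own or in combination and tuned to the needs of the model, improving its ability to generalize and, in turn, its performance on unseen data.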
