Python Machine Learning: how to write if ~ else ~ on a single line inside a lambda function; how to add a new column to a pandas.DataFrame with .apply(func); model = tensorflow.keras.models.Sequential() # normalizing the data


This post is a follow-up to the previous one;
see that post first for the details.
epochs is set to only 200.

Without normalizing the data,
the accuracy (score) is 0.9444.
After normalizing the data,
the accuracy reaches 1.0.

Normalization code:

lis_col = df_dp.columns.to_list()
for col in lis_col:
    # Scale each column by its maximum so that the largest value becomes 1
    max_value = df_dp[col].max()
    ratio = 1/max_value
    df_dp[col] = df_dp[col]*ratio

Strictly speaking, this is not how normalization is done:
dividing by the maximum gives the same result as min-max normalization
only when the minimum happens to be 0.
The correct approach is given near the end of this post.
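To see why dividing by the maximum only matches min-max normalization when the minimum is 0, here is a minimal sketch on a toy column. The column name `Pressure9am` is borrowed from the dataset, but the three values are made up for illustration.

```python
import pandas as pd

# Toy column with a nonzero minimum (values are invented for illustration).
s = pd.Series([1000.0, 1010.0, 1020.0], name="Pressure9am")

# Dividing by the maximum squeezes the values into a narrow band just below 1.
by_max = s / s.max()

# Min-max normalization maps the minimum to 0 and the maximum to 1.
min_max = (s - s.min()) / (s.max() - s.min())

print(by_max.round(4).tolist())   # [0.9804, 0.9902, 1.0]
print(min_max.tolist())           # [0.0, 0.5, 1.0]
```

With a minimum far from 0, the divide-by-max version barely spreads the values apart, while the min-max version uses the full 0 to 1 range.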

Full code:

# -*- coding: utf-8 -*-
"""
Created on Sun Nov 12 15:27:56 2023

@author: SavingKing
"""

import os
import pandas as pd
import numpy as np
dirname = r"P:\Python class\powen\AI人工智慧自然語言與語音語意辨識開發應用實務班\powen2_AI人工智慧自然語言與語音語意辨識開發應用實務班\day6\ch07-MLP-flower"
basename = "weather.csv"

fpath = os.path.join(dirname,basename)
#'C:\\Users\\SavingKing\\Downloads\\day6\\ch07-MLP-flower\\weather.csv'

df = pd.read_csv(fpath)

func = lambda strr: 1 if strr == "Yes" else 0

df["RainTomorrow_01"] = df["RainTomorrow"].apply( func )

lis_drop_col = ["WindGustDir", "WindDir9am","WindDir3pm","RainToday","RainTomorrow"]
# Drop the non-numeric columns first and keep only the numeric ones
# The WindSpeed9am column contains NA values

df_dp = df.drop(columns = lis_drop_col)
#[366 rows x 18 columns]


# The data contains NA values; clean it with dropna first, otherwise loss and predict come out as nan
df_dp.dropna(axis=0, how='any', subset=None, inplace=True)
#[354 rows x 18 columns]

lis_col = df_dp.columns.to_list()
for col in lis_col:
    max_value = df_dp[col].max()
    ratio = 1/max_value
    df_dp[col] =   df_dp[col]*ratio

X = df_dp.drop(columns="RainTomorrow_01").values
# 17 feature columns; 354 rows remain after dropna
Y = df_dp["RainTomorrow_01"].values
# .values converts the pandas Series into a numpy.ndarray
"""
Shapes after dropna:
X.shape -> (354, 17)
Y.shape -> (354,)
"""

from sklearn.model_selection import train_test_split
import tensorflow as tf

category=2   # rain / no rain, encoded as 1 / 0: two possible outcomes
dim=X.shape[1] # number of features
x_train , x_test , y_train , y_test = train_test_split(X,Y,test_size=0.05)
y_train2=tf.keras.utils.to_categorical(y_train, num_classes=(category))
y_test2=tf.keras.utils.to_categorical(y_test, num_classes=(category))
# One-hot encoding
# y_train and y_test are both one-dimensional arrays

print("x_train[:4]",x_train[:4])
print("y_train[:4]",y_train[:4])
print("y_train2[:4]",y_train2[:4])

# Build the model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=200,
    activation=tf.nn.relu,
    input_dim=dim)) # input_dim must equal the number of features
model.add(tf.keras.layers.Dense(units=200,
    activation=tf.nn.relu ))
model.add(tf.keras.layers.Dense(units=200,
    activation=tf.nn.relu ))
model.add(tf.keras.layers.Dense(units=200,
    activation=tf.nn.relu ))
model.add(tf.keras.layers.Dense(units=category,
    activation=tf.nn.softmax )) # units of the last layer must equal category
model.compile(optimizer='adam',
    loss=tf.keras.losses.categorical_crossentropy,
    metrics=['accuracy'])
model.fit(x_train, y_train2,
          epochs=200,
          batch_size=64)

# Test
model.summary()

score = model.evaluate(x_test, y_test2)
print("score:",score)

predict = model.predict(x_test)
print("predict:",predict)
lis_ans = [np.argmax(predict[i]) for i in range(predict.shape[0]) ]
print("Ans:\t",lis_ans)
# print("Ans:",np.argmax(predict[0]),np.argmax(predict[1]),np.argmax(predict[2]),np.argmax(predict[3]))

# =============================================================================
# predict2 = model.predict_classes(x_test)
# print("predict_classes:",predict2)
# print("y_test",y_test[:])
# =============================================================================
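The one-line if ~ else ~ inside a lambda and the `.apply(func)` call used in the full code can be tried in isolation. The sketch below uses a small made-up frame standing in for the `RainTomorrow` column of weather.csv; the names `to_01`, `via_compare` and `via_where` are only illustrative.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the RainTomorrow column of weather.csv.
df = pd.DataFrame({"RainTomorrow": ["Yes", "No", "No", "Yes"]})

# Conditional expression: <value_if_true> if <condition> else <value_if_false>
to_01 = lambda s: 1 if s == "Yes" else 0

# .apply calls the function once per element and returns a new Series,
# which is then assigned to a new column.
df["RainTomorrow_01"] = df["RainTomorrow"].apply(to_01)

# Vectorized alternatives that give the same result.
via_compare = (df["RainTomorrow"] == "Yes").astype(int)
via_where = np.where(df["RainTomorrow"] == "Yes", 1, 0)

print(df["RainTomorrow_01"].tolist())  # [1, 0, 0, 1]
```

On a few hundred rows the difference is negligible, but the vectorized forms avoid a Python-level call per row and tend to scale better on large frames.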

Data cleaning (dropna)
and normalization
both help quite a bit.
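The one-hot encoding and the argmax decoding in the full code can be illustrated without TensorFlow. This is a rough NumPy sketch under the assumption that labels are integers 0 and 1; the labels and the `predict` probabilities are made up, and `np.eye(category)[y]` only mimics the kind of array that `to_categorical` produces.

```python
import numpy as np

category = 2
y = np.array([0, 1, 1, 0])          # made-up integer labels

# Row i of the identity matrix is the one-hot vector for class i.
y_onehot = np.eye(category)[y]

# Made-up softmax outputs; each row sums to 1.
predict = np.array([[0.9, 0.1],
                    [0.2, 0.8],
                    [0.4, 0.6],
                    [0.7, 0.3]])

# argmax along axis 1 picks the most probable class of every row at once,
# equivalent to [np.argmax(predict[i]) for i in range(predict.shape[0])].
ans = np.argmax(predict, axis=1)
accuracy = float(np.mean(ans == y))

print(y_onehot.tolist())  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
print(ans.tolist())       # [0, 1, 1, 0]
print(accuracy)           # 1.0
```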

Recommended: learn Python online with hahow: https://igrape.net/30afN

Correct normalization:

lis_col = df_dp.columns.to_list()
for col in lis_col:
    max_value = df_dp[col].max()
    min_value = df_dp[col].min()
    rng = max_value-min_value
    # Min-max normalization: the minimum maps to 0 and the maximum to 1
    df_dp[col] = (df_dp[col]-min_value)/rng
df_dp.to_excel("./df_dp.xlsx")
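The loop above can also be written without an explicit loop. The following is a minimal sketch, not the author's code: the helper name `min_max_normalize` and its `exclude` parameter are hypothetical, and the guard for constant columns (whose range is 0) is an added assumption about how such columns should be handled.

```python
import pandas as pd

def min_max_normalize(df, exclude=()):
    """Return a copy of df with each non-excluded column scaled to [0, 1]."""
    out = df.copy()
    cols = [c for c in out.columns if c not in exclude]
    col_min = out[cols].min()
    rng = out[cols].max() - col_min
    # A constant column has a range of 0; divide by 1 instead so that
    # it becomes all zeros rather than NaN.
    rng = rng.replace(0, 1)
    out[cols] = (out[cols] - col_min) / rng
    return out

# Made-up demo data; only the column names echo the dataset.
demo = pd.DataFrame({"MinTemp": [8.0, 14.0, 20.0],
                     "Const": [3.0, 3.0, 3.0],
                     "RainTomorrow_01": [1, 0, 1]})
scaled = min_max_normalize(demo, exclude=["RainTomorrow_01"])
print(scaled["MinTemp"].tolist())  # [0.0, 0.5, 1.0]
```

Excluding the label column makes it explicit that only the features are rescaled; in the original loop the 0/1 label passes through unchanged because its minimum is 0 and its maximum is 1.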

Partial contents of df_dp.xlsx (all values lie between 0 and 1; shown comma-separated, with the row index first and values rounded to four decimal places):

,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RISK_MM,RainTomorrow_01
0,0.5076,0.5922,0,0.2353,0.4632,0.2,0.1463,0.3846,0.5079,0.1928,0.5918,0.5,0.875,0.875,0.5813,0.6293,0.0905,1
1,0.7366,0.6844,0.0905,0.3088,0.7132,0.3059,0.0976,0.3269,0.6984,0.2771,0.4056,0.3187,0.625,0.375,0.7073,0.7007,0.0905,1
2,0.7252,0.5603,0.0905,0.4118,0.2426,0.8471,0.1463,0.1154,0.7302,0.6747,0.3316,0.2857,1,0.875,0.6220,0.5136,1,1
3,0.7099,0.2801,1,0.5147,0.6691,0.4824,0.7317,0.4615,0.4127,0.5181,0.2296,0.2802,0.25,0.875,0.5447,0.3061,0.0704,1
4,0.4924,0.3014,0.0704,0.3971,0.7794,0.4353,0.4878,0.5385,0.5079,0.4337,0.5561,0.5962,0.875,0.875,0.4472,0.3503,0,0
5,0.4389,0.3298,0,0.4118,0.6029,0.3647,0.4878,0.4615,0.5397,0.5301,0.6964,0.6841,0.875,0.625,0.4390,0.3299,0.0050,0
6,0.4351,0.3759,0.0050,0.2941,0.6176,0.3529,0.4634,0.5,0.4286,0.4096,0.7168,0.6978,0.5,0.75,0.5,0.4150,0,0
7,0.5191,0.3333,0,0.3971,0.3382,0.3294,0.2683,0.4615,0.4603,0.5301,0.7577,0.7527,0.75,0.875,0.4878,0.3537,0,0
8,0.5382,0.4220,0,0.2794,0.3015,0.4118,0.4634,0.3269,0.5397,0.4217,0.7551,0.7115,0.875,0.875,0.5691,0.4694,0.4070,1
9,0.5229,0.5390,0.4070,0.3824,0.5662,0.2118,0.1707,0.1154,0.7302,0.2289,0.7041,0.6566,0.875,0.125,0.5366,0.5646,0,0
10,0.5496,0.6241,0,0.2941,0.875,0.2,0.1463,0.1731,0.6032,0.2530,0.7117,0.6676,0.125,0.25,0.5894,0.6429,0.0050,0
11,0.5267,0.6986,0.0050,0.5147,0.9191,0.3294,0.0488,0.2885,0.2857,0.2651,0.6964,0.6346,0,0.375,0.6789,0.7109,0,0
12,0.5878,0.7199,0,0.5147,0.9559,0.2,0.1463,0.1346,0.4127,0.1928,0.6505,0.5577,0,0.125,0.6870,0.7483,0,0
13,0.6641,0.8262,0,0.4412,0.9118,0.3647,0.1707,0.3846,0.4921,0.0843,0.5306,0.4478,0.125,0.5,0.7967,0.8707,0,0
14,0.5878,0.8369,0,0.6324,0.9632,0.3294,0.1463,0.3846,0.1429,0.0361,0.5536,0.4643,0,0.125,0.7561,0.8605,0,0
15,0.6756,0.8688,0,0.6029,0.8162,0.3882,0.1707,0.1731,0.5397,0.1084,0.5459,0.4396,0,0.375,0.7724,0.8707,0,0

Result of the run: accuracy 1.0.
