Association Rules: Market Basket Analysis
Preparing categorical (nominal-scale) data
https://pse.is/3y8qjp
Shopping list:
TID | BUY_ITEM |
1 | cola |
1 | egg |
1 | ham |
2 | cola |
2 | diaper |
2 | beer |
3 | cola |
3 | diaper |
3 | beer |
3 | ham |
4 | diaper |
4 | beer |
import pandas as pd
import numpy as np

fpath = r"C:\Python\P107\doc\things_saled.csv"
df = pd.read_csv(fpath)
seriesTID = df["TID"]
# type: pandas.core.series.Series
setTID = set(seriesTID)
# {1, 2, 3, 4}
lis = []
for tid in sorted(setTID):  # sorted so the baskets follow TID order
    seriesBool = seriesTID == tid
    boolAry = np.array(seriesBool)
    # boolAry = (seriesTID == tid).to_numpy() also works
    # boolAry = (seriesTID == tid).values also works
    lisTrueIdx = boolAry.nonzero()[0].tolist()  # row positions belonging to this TID
    ser = df.iloc[lisTrueIdx, 1]  # column 1 is BUY_ITEM
    lis.append(ser.tolist())
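The loop above can also be written with a pandas groupby. This is a minimal sketch rather than the author's method; it assumes the CSV holds exactly the twelve rows of the shopping list, which are typed in inline here so the snippet runs without the file.

```python
import pandas as pd

# Inline copy of the shopping list above (stands in for things_saled.csv)
df = pd.DataFrame({
    "TID": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    "BUY_ITEM": ["cola", "egg", "ham", "cola", "diaper", "beer",
                 "cola", "diaper", "beer", "ham", "diaper", "beer"],
})

# Group rows by transaction ID and collect each group's items into a list
lis = df.groupby("TID")["BUY_ITEM"].apply(list).tolist()
print(lis)
```

Each inner list is one basket, in TID order, which is the shape TransactionEncoder expects.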
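Reference solution:
Recommended: learn Python online with hahow: https://igrape.net/30afN
Transaction data encoding (TransactionEncoder):
A quick recap of what lis contains:
Continuing from the code above: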
from mlxtend.preprocessing import TransactionEncoder
tr = TransactionEncoder()
trAry = tr.fit_transform(lis)  # fits the encoder and returns trAry in one step
dfInput = pd.DataFrame(trAry, columns = tr.columns_)
You can also build the same DataFrame that
trAry = tr.fit_transform(lis)
produces by hand:
set1 = set(lis[0])
for i in range(1, len(lis)):
    setTemp = set(lis[i])
    set1 = set1 | setTemp  # union of all items seen so far
lisCol = sorted(set1)  # sorted for a deterministic (alphabetical) column order
# lis has length 4, lisCol has length 5
# lisCol = ['beer', 'cola', 'diaper', 'egg', 'ham']
tup = (len(lis), len(lisCol))
aryInput = np.zeros(shape=tup, dtype=bool)
dfInput = pd.DataFrame(aryInput,
                       index=range(len(lis)),
                       columns=lisCol)
for r in range(dfInput.shape[0]):
    for c in range(dfInput.shape[1]):
        if dfInput.columns[c] in lis[r]:  # the key test: is this item in basket r?
            dfInput.iat[r, c] = True
print("Input DataFrame:\n", dfInput)
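For comparison, the same one-hot table can be sketched directly from the long-format data with pandas.crosstab. This is an alternative shown under assumptions, not part of the original walkthrough: the data is typed in inline, and the resulting index is the TID (1 to 4) rather than 0 to 3.

```python
import pandas as pd

# Inline copy of the shopping list (stands in for things_saled.csv)
df = pd.DataFrame({
    "TID": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    "BUY_ITEM": ["cola", "egg", "ham", "cola", "diaper", "beer",
                 "cola", "diaper", "beer", "ham", "diaper", "beer"],
})

# crosstab counts how often each item occurs in each transaction;
# casting to bool turns the counts into present / absent flags
dfInput = pd.crosstab(df["TID"], df["BUY_ITEM"]).astype(bool)
print(dfInput)
```

If a 0-based index is preferred, dfInput.reset_index(drop=True) would give the same row labels as the hand-built version.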
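Compare the TransactionEncoder() table above
with the DataFrame that the apriori algorithm produces later on,
and the meaning of support becomes clear
(ham appears in 2 of 4 transactions, beer in 3 of 4, and so on):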
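Because the encoded table is all True/False, the support of each single item is just the mean of its column. A small sketch, assuming the four baskets listed earlier (rebuilt inline so it runs on its own):

```python
import pandas as pd

# The four baskets from the shopping list
lis = [
    ["cola", "egg", "ham"],
    ["cola", "diaper", "beer"],
    ["cola", "diaper", "beer", "ham"],
    ["diaper", "beer"],
]

# Build the boolean table: one row per basket, one column per item
items = sorted({item for basket in lis for item in basket})
dfInput = pd.DataFrame(
    [[item in basket for item in items] for basket in lis],
    columns=items,
)

# Mean of a boolean column = fraction of baskets containing that item
support = dfInput.mean()
print(support)
```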
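Bonus question: if I see a shopper whose basket contains diapers and cola,
should I recommend beer? Work out the confidence by hand and explain your answer.
Answer: the confidence turns out to be 100%. In other words, every shopper who bought diapers and cola
also bought beer, so not recommending it would be letting the boss down.
2 orders (index 1 and 2) contain both diapers and cola.
Both of those orders also contain beer (2/2 = 100%).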
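The hand calculation can be checked with a few lines of plain Python. This is only a sketch; the variable names (antecedent, consequent) are mine, not from the original post.

```python
# The four baskets from the shopping list
lis = [
    ["cola", "egg", "ham"],
    ["cola", "diaper", "beer"],
    ["cola", "diaper", "beer", "ham"],
    ["diaper", "beer"],
]

antecedent = {"diaper", "cola"}  # what we already see in the basket
consequent = {"beer"}            # what we are thinking of recommending

# Baskets that contain the antecedent, and those that also contain the consequent
n_antecedent = sum(antecedent <= set(basket) for basket in lis)
n_both = sum((antecedent | consequent) <= set(basket) for basket in lis)

confidence = n_both / n_antecedent
print(n_antecedent, n_both, confidence)
```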
Support / confidence:
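For reference, the usual textbook definitions can be written as follows. The symbols are my own notation, not the original post's: T is the set of all transactions, and X and Y are itemsets.

```latex
\[
\mathrm{support}(X) = \frac{\bigl|\{\, t \in T : X \subseteq t \,\}\bigr|}{|T|},
\qquad
\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
\]
```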
apriori algorithm:
from mlxtend.frequent_patterns import apriori
# "a priori": from cause to effect; from the general to the specific; deductive
#ap-rio-ri
import time

minSupport = 0.5
tick = time.time()
ap = apriori(dfInput,
             min_support=minSupport,
             use_colnames=True)
# type(ap) = pandas.core.frame.DataFrame
# ap.shape = (9, 2)
tock = time.time()
spentTime = tock - tick
print("The apriori DataFrame:\n", ap)
# Each row of ap is a frequent itemset: column 0 is support, column 1 is itemsets
for row in range(ap.shape[0]):
    print("\tRule", set(ap.iat[row, 1]),
          "found with support=", ap.iat[row, 0])
print("I spent", spentTime, "seconds finding",
      ap.shape[0], "rules")
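To see where the nine rows come from, here is a brute-force check in plain Python. It is not the Apriori algorithm itself (there is no candidate pruning, so it only suits tiny examples like this one); it simply tries every item combination and keeps those with support of at least 0.5.

```python
from itertools import combinations

# The four baskets from the shopping list
lis = [
    ["cola", "egg", "ham"],
    ["cola", "diaper", "beer"],
    ["cola", "diaper", "beer", "ham"],
    ["diaper", "beer"],
]
min_support = 0.5

baskets = [set(basket) for basket in lis]
items = sorted(set().union(*baskets))

# Try every non-empty combination of items and keep the frequent ones
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        support = sum(set(combo) <= basket for basket in baskets) / len(baskets)
        if support >= min_support:
            frequent[frozenset(combo)] = support

for itemset, support in frequent.items():
    print(sorted(itemset), support)
print(len(frequent), "frequent itemsets")
```

The count agrees with ap.shape = (9, 2): four single items, four pairs, and one triple.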
Reference solution:
Output:
ap = apriori(dfInput,
             min_support=minSupport,
             use_colnames=False)
use_colnames defaults to False.
In that case the itemsets are shown as column indices (0, 1, 2, 3, 4) instead of item names.
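If you do end up with index-based itemsets, they can be mapped back to item names through the column list. A tiny sketch with made-up inputs: the columns list assumes alphabetical column order, and idx_itemset is a hypothetical example value rather than actual apriori output.

```python
# Assumed column order of the encoded table (alphabetical)
columns = ["beer", "cola", "diaper", "egg", "ham"]

# Hypothetical index-based itemset, as shown when use_colnames=False
idx_itemset = frozenset({0, 2})

# Look each index up in the column list to recover the item names
named_itemset = frozenset(columns[i] for i in idx_itemset)
print(sorted(named_itemset))
```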