DataFrame.groupby(by=None, axis=0, level=None,
as_index=True, sort=True, group_keys=True,
observed=False, dropna=True)
實用的, by=label or list of labels
A label or list of labels may be passed to group by the columns in self.
實用的, level=0
(通常只有一層index,
依據第0層的index做分類)
官網範例:
df.groupby([‘Animal’]).mean()
#’Animal’為index
df.groupby([‘Animal’],as_index=False).mean()
#’Animal’ 變為columns,而非index
df.groupby([‘Animal’]).mean().reset_index()
#跟as_index=False同效果,
#把 ‘Animal’ 從index重新推回columns
#as_index=False 也有機會用到,
#但使用 reset_index() 更熟悉
Hierarchical Indexes #多層index
df.groupby(level=0).mean()
推薦hahow線上學習python: https://igrape.net/30afN
如果資料夾中有兩個屬於資料的csv,
以及其他非資料非csv的檔案,
如何將兩個csv的資料做平均?
code:
import pandas as pd
import numpy as np
import glob
from typing import List
import os
folder = r"C:\Python\z correction\X Cut"
#X Cut , Y Cut資料夾需要分開
fname = "*.csv"
cut = "X" # Y
def folder_fname2lis_fpath(folder,fname="*.csv"):
fpath = "\\".join([folder,fname])
lis_fpath = glob.glob(fpath)
return lis_fpath
lis_fpath:List[str] = folder_fname2lis_fpath(folder,fname="*.csv")
# ['C:\\Python\\z correction\\X Cut\\X cut 1st_-11cm to 11cm.csv',
# 'C:\\Python\\z correction\\X Cut\\X cut 2nd_-11cm to 11cm.csv']
def lis_fpath2lisDF(lis_fpath:List[str]):
lisDF:List[pd.DataFrame] = []
for fpath in lis_fpath:
df = pd.read_csv(fpath)
df = df[["X","Y","Z"]]
# [111 rows x 3 columns]
lisDF.append(df)
return lisDF #不要誤入for迴圈的縮排中
lisDF:List[pd.DataFrame] = lis_fpath2lisDF(lis_fpath)
#lisDF 中的兩個DF,都是[111 rows x 3 columns],
#index都是0~110一致的
#重要,需確定index都一樣,後面的groupby()才不會做錯
def lisDF2avg_df(lisDF:List[pd.DataFrame]):
if len(lisDF) >= 2:
# 计算平均值
avg_df = pd.concat(lisDF).groupby(level=0).mean()
else:
avg_df = lisDF[0]
return avg_df
avg_df:pd.DataFrame = lisDF2avg_df(lisDF)
#[111 rows x 3 columns] 正確
currentFolder = os.getcwd()
#'C:\\Python\\z correction'
exFolder = "\\".join([currentFolder,"export"])
#'C:\\Python\\z correction\\export'
if not os.path.exists(exFolder):
os.makedirs(exFolder)
xlsx_path = "\\".join([exFolder,f"avg_df for {cut} Cut.xlsx"])
#'C:\\Python\\z correction\\export\\avg_df for X Cut.xlsx'
avg_df.to_excel(xlsx_path)
print(f"xlsx檔案已經輸出到:\n{xlsx_path}\n")
#C:\Python\z correction\X Cut\X Cut Avg.ods #手工算Avg
#跟手工算的Avg 比對無誤
glob.glob()可以做濾除的動作:
看一下lisDF (list中的兩個DF,長度,index都一樣):
pd.concat(lisDF):
pd.concat(lisDF)
其index從0~110,再重複一次0~110,
注意rows已經加倍
可以多assert檢查資料,
不然,就算程式能跑
結果也是錯的
檢查徹底一點,
整個index是不是一樣?
推薦hahow線上學習python: https://igrape.net/30afN
如果不使用groupby()做樞紐分析,
或者除了mean以外,
還想要保留X1,X2….的資料,
大概如下作法:
輸出結果:
pd.set_option(“display.max_columns”,None)
#顯示所有columns
使用apply( lambda ) ,效果同上:
from pandas官網:
axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:
- 0 or ‘index’: apply function to each column.
- 1 or ‘columns’: apply function to each row.
官網範例:
推薦hahow線上學習python: https://igrape.net/30afN