How do you use chardet.detect() in Python to detect a file's encoding? #"charset detection"

by 儲蓄保險王 · 2023-03-08

The name chardet is short for "charset detection", that is, character-encoding detection.

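chardet is a Python package that can automatically detect the encoding of a text file.

It is typically used with plain-text files such as CSV, JSON, or XML: first use chardet to detect the file's encoding, then read the file with that encoding, which avoids parsing errors caused by an encoding mismatch.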
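To make the kind of parsing error mentioned above concrete, here is a minimal sketch that uses only the standard library. The sample text and the choice of Big5 are assumptions made for illustration; they do not come from the original post.

```python
# Illustrative only: the sample text and the Big5 encoding are assumptions.
text = "測試"
raw = text.encode("big5")  # the bytes as they would be stored in a Big5 file

# Decoding with the wrong encoding fails here
# (in other cases it can silently produce garbled text instead).
try:
    raw.decode("utf-8")
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True

# Decoding with the correct encoding recovers the original text.
recovered = raw.decode("big5")
```

This is why it helps to know the encoding before reading: the same bytes are only meaningful under the encoding they were written with.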

import os
import chardet
import pandas as pd

folder = r"C:\Temp"
fname = "test.txt"
fpath = os.path.join(folder, fname)
# 'C:\\Temp\\test.txt'

# Note: open the file in 'rb' (binary) mode, because chardet.detect()
# expects bytes. Passing a str raises a TypeError like this:
#
#   File C:\ProgramData\Anaconda\lib\site-packages\chardet\__init__.py:36 in detect
#     raise TypeError('Expected object of type bytes or bytearray, got: '
#
#   TypeError: Expected object of type bytes or bytearray, got: <class 'str'>
with open(fpath, 'rb') as f:
    result = chardet.detect(f.read())
    # "charset detection"

print("charset detection:\n", result)

# Read the file using the encoding that chardet detected
df = pd.read_csv(fpath, encoding=result['encoding'])
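
The code above passes result['encoding'] straight to pd.read_csv(). The dict returned by chardet.detect() also carries a 'confidence' value, and 'encoding' can be None when nothing could be detected (for example, an empty file). A small guard like the sketch below may help in those cases. The helper name pick_encoding, the "utf-8" default, and the 0.5 threshold are hypothetical choices for illustration, not part of chardet or of the original post.

```python
def pick_encoding(result, default="utf-8", min_confidence=0.5):
    """Return the detected encoding, or `default` when detection is unreliable.

    `result` is a dict shaped like the return value of chardet.detect():
    {'encoding': ..., 'confidence': ..., 'language': ...}.
    The default and threshold are arbitrary, hypothetical choices.
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return default
    return encoding


# Hand-written dicts standing in for chardet.detect() output (illustrative values).
confident = {"encoding": "Big5", "confidence": 0.99, "language": "Chinese"}
undetected = {"encoding": None, "confidence": 0.0, "language": None}

print(pick_encoding(confident))   # Big5
print(pick_encoding(undetected))  # utf-8
```

With this in place, the last line of the script could become df = pd.read_csv(fpath, encoding=pick_encoding(result)).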
