Renumbering Word Headings with a Python State Machine: A Robust Strategy for Updating Section Numbers Without Losing Images; str.isdigit() ; str.isspace() ; str.isalpha()

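1. Purpose and Scenario
When .docx test plans are generated or adjusted automatically, these problems come up often:

Deleting a paragraph mid-document leaves gaps in the numbering (1, 2, 4, ...)
Irregular spaces, tabs, or manually typed dots in front of the heading
A one-line heading split across several runs (formatting changes, copy and paste)
Heading paragraphs that contain images, hyperlinks, fields, or other embedded objects
Goal: fix only the prefix, that is, the section number plus its separator (normalized to a single tab), without damaging the content or embedded elements that follow.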

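2. DOCX Structure (the Minimum Needed for the Algorithm)
doc.element.body is the sequence that makes up the document body: <w:p> paragraphs, <w:tbl> tables, and <w:sectPr> section properties.
Each <w:p> is wrapped as a Paragraph; inside a paragraph the text is held in one or more runs (<w:r>).
A heading looks like a single line, but underneath it may be: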

[' ', '\t', '9', '.5.3 SDC test']
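# The title text "SDC test" shares a run with the tail of the old heading number
# Both run_idx and character_idx are needed to locate where the body starts
# There is also junk leading whitespace and a \t

The program has to look across runs to recognize: 9.5.3\tSDC test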
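
A minimal sketch of that cross-run bookkeeping, using plain Python strings in place of python-docx runs (the helper name `locate` is hypothetical, not part of any library). It joins the run texts to show what a reader sees, then maps a flat character offset back to the (run_idx, character_idx) pair that an in-place edit needs.

```python
from typing import List, Tuple

def locate(run_texts: List[str], flat_offset: int) -> Tuple[int, int]:
    """Map an offset in the joined text to (run_idx, character_idx)."""
    remaining = flat_offset
    for run_idx, text in enumerate(run_texts):
        if remaining < len(text):
            return run_idx, remaining
        remaining -= len(text)
    raise IndexError("offset is past the end of the paragraph text")

run_texts = [' ', '\t', '9', '.5.3 SDC test']
joined = ''.join(run_texts)          # what the reader sees as one line
body_start = joined.index('SDC')     # flat offset of the title body
position = locate(run_texts, body_start)
print(repr(joined), body_start, position)
```

The flat offset alone is not enough to edit the document, because each run is a separate XML element; the pair returned by `locate` says which run to cut and where.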

3. Why Not "Merge the String and Rebuild with a Regex"?
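
The body of this section is not given here, so what follows is only a sketch of the likely reasoning. In python-docx, assigning to paragraph.text replaces all existing paragraph content with a single run, so whatever lived in the old runs goes with them. The toy model below uses plain dicts instead of real runs (the dict shape and both helper names are hypothetical) to contrast a merge-and-rebuild with an in-place prefix edit.

```python
import re

# Toy stand-in for a run: its text plus a flag for an embedded drawing.
runs = [
    {"text": "9.5.3 ", "drawing": False},
    {"text": "", "drawing": True},        # image-only run
    {"text": "SDC test", "drawing": False},
]

def rebuild_with_regex(runs, new_number):
    """Merge all text, fix the prefix with a regex, emit ONE fresh run."""
    merged = "".join(r["text"] for r in runs)
    fixed = re.sub(r"^\s*[\d.]+\s*", new_number + "\t", merged)
    return [{"text": fixed, "drawing": False}]    # the old runs are gone

def edit_prefix_in_place(runs, new_number):
    """Touch only the run holding the prefix (run 0 in this toy case)."""
    out = [dict(r) for r in runs]
    out[0]["text"] = new_number + "\t"
    return out

rebuilt = rebuild_with_regex(runs, "7.1.2")
edited = edit_prefix_in_place(runs, "7.1.2")
print(any(r["drawing"] for r in rebuilt))   # is the drawing still there after a rebuild?
print(any(r["drawing"] for r in edited))    # and after an in-place edit?
```

Both approaches produce the same visible text; they differ only in what happens to the runs that carried no prefix text.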

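4. Core Idea of the State Machine
Goal: locate the region made of the section number (digits and '.') plus the separator after it (spaces / tabs), rewrite it as the new number + '\t', and leave everything else untouched.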

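Four states (as used in the code in section 5):
leading_ws: whitespace before any number; ignored
digits: the number itself (digits and '.')
sep: spaces or tabs that follow the number
done: body text reached; scanning stops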

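Notes:

Why doesn't whitespace in leading_ws switch to sep? Because sep means "the separator after the number". If whitespace seen before the first digit were treated as a separator, " 7.1 Title" would be misread as "no number, the body text is 7.1", and renumbering would fail.
Only whitespace met after digits enters sep, which keeps the meaning of the separator correct.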
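
The transitions can be sketched on plain strings, with a list of run texts standing in for python-docx runs. The function name `scan_prefix` and the `ws_enters_sep` flag are hypothetical; the flag exists only to model the rejected design so the two behaviors can be compared side by side.

```python
from typing import List, Optional, Tuple

def scan_prefix(run_texts: List[str], ws_enters_sep: bool = False):
    """Return (number, separator, cut_position) for a heading split into runs.

    ws_enters_sep=True models the rejected design in which leading
    whitespace switches straight to the 'sep' state.
    """
    digits: List[str] = []
    sep: List[str] = []
    phase = 'leading_ws'
    cut: Optional[Tuple[int, int]] = None
    for ri, text in enumerate(run_texts):
        for ci, ch in enumerate(text):
            is_num = ch.isdigit() or ch == '.'
            if phase == 'leading_ws':
                if is_num:
                    phase = 'digits'
                    digits.append(ch)
                elif ch.isspace():
                    if ws_enters_sep:
                        phase = 'sep'
                        sep.append(ch)
                    # otherwise: leading junk, stay in leading_ws
                else:
                    phase, cut = 'done', (ri, ci)
            elif phase == 'digits':
                if is_num:
                    digits.append(ch)
                elif ch.isspace():
                    phase = 'sep'
                    sep.append(ch)
                else:
                    phase, cut = 'done', (ri, ci)
            elif phase == 'sep':
                if ch.isspace():
                    sep.append(ch)
                else:
                    phase, cut = 'done', (ri, ci)
            if phase == 'done':
                break
        if phase == 'done':
            break
    return ''.join(digits).strip('.'), ''.join(sep), cut

print(scan_prefix([' ', '\t', '9', '.5.3 SDC test']))
print(scan_prefix([' 7.1 Title']))
print(scan_prefix([' 7.1 Title'], ws_enters_sep=True))
```

With the flag on, the leading space is consumed as a separator, the '7' that follows ends the scan, and no number is collected, which is the misparse described above.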

5. Core State Machine Scanning Code (with Detailed Comments)

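def renumber_headings_state_machine(doc, headings, separator='\t', verbose=True):
    """
    Safe renumbering: rewrite only the prefix; never delete runs or strip images/hyperlinks.
    headings: List[dict]; each entry must contain:
        - start: int    (index into the underlying body)
        - full_number_path: str  new number (e.g. '7.1.2')
        - level: int | None  (None means a non-standard Heading style -> skipped)
        - title: str    heading body text (for display)
        - raw: str      original text of the whole paragraph (optional)
    """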
    from docx.text.paragraph import Paragraph
    from docx.oxml.ns import qn

    body = list(doc.element.body)
    W_P = qn('w:p')
    modified, skipped = 0, 0

    for h in headings:
        start_idx = h.get('start')
        new_number = h.get('full_number_path')
        level = h.get('level')
        title = h.get('title', '')
        raw = h.get('raw', '')

        # Basic filtering
        if level is None or not new_number or start_idx is None or start_idx >= len(body):
            skipped += 1
            continue

        el = body[start_idx]
        if el.tag != W_P:      # only paragraph elements are processed
            skipped += 1
            continue

        para = Paragraph(el, doc._body)
        runs = para.runs
        if not runs:
            skipped += 1
            continue

        # State machine initialization
        prefix_digits = []          # the number itself: digits and dots
        prefix_sep = []             # separator after the number (spaces/tabs)
        phase = 'leading_ws'        # initial state
        cut_position = None         # (run_index, char_index) where the body text starts
        last_prefix_run_index = None

        # Scan run by run, character by character
        for ri, r in enumerate(runs):
            txt = r.text or ''
            for ci, ch in enumerate(txt):

                if phase == 'leading_ws':
                    # A digit or '.' is the first sign of a number
                    if ch.isdigit() or ch == '.':
                        phase = 'digits'
                        prefix_digits.append(ch)
                        last_prefix_run_index = ri
                    # Whitespace: leading junk, ignored. It does not enter sep,
                    # otherwise a number would be assumed to have ended already
                    elif ch.isspace():
                        continue
                    else:
                        # Body text right away (no number) -> stop
                        phase = 'done'
                        cut_position = (ri, ci)
                        break

                elif phase == 'digits':
                    if ch.isdigit() or ch == '.':
                        prefix_digits.append(ch)
                        last_prefix_run_index = ri
                    elif ch.isspace():
                        # First whitespace after the number -> separator starts
                        phase = 'sep'
                        prefix_sep.append(ch)
                        last_prefix_run_index = ri
                    else:
                        # Body text starts -> stop
                        phase = 'done'
                        cut_position = (ri, ci)
                        break

                elif phase == 'sep':
                    if ch.isspace():
                        # Still in the separator (mixed spaces and tabs are allowed)
                        prefix_sep.append(ch)
                        last_prefix_run_index = ri
                    else:
                        # Body text starts -> stop
                        phase = 'done'
                        cut_position = (ri, ci)
                        break

            if phase == 'done':
                break

        # No number at all (no digits collected) -> skip
        if not prefix_digits:
            skipped += 1
            continue

        # Clean the old number: strip stray leading/trailing dots (e.g. '.7.1.' -> '7.1')
        old_prefix_number = ''.join(prefix_digits).strip('.')

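        # If the body starts mid-run, keep the tail of that run so it is not lost.
        cut_ri, remaining_fragment = None, ''
        if cut_position:
            cut_ri, ci = cut_position
            remaining_fragment = runs[cut_ri].text[ci:]

        # Blank the text of every run touched by the prefix (run elements stay).
        # Runs without text are skipped on purpose: the python-docx run.text setter
        # replaces the run's whole content, so assigning '' to an image-only run
        # would delete its w:drawing.
        # Caveat: a run holding both text and a drawing still loses the drawing.
        target = None
        for ri in range(last_prefix_run_index + 1):
            if runs[ri].text:
                runs[ri].text = ''
                if target is None:
                    target = runs[ri]

        # Carry the fragment over only if its run was blanked above; when the body
        # starts in a later, untouched run, copying it would duplicate the title.
        if cut_ri is None or cut_ri > last_prefix_run_index:
            remaining_fragment = ''

        # Write "new number + separator + fragment" into the first blanked run.
        # The separator is dropped only when the heading has no body text at all.
        if cut_position is None:
            target.text = new_number
        else:
            target.text = f"{new_number}{separator}{remaining_fragment}"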

        modified += 1
        if verbose:
            print(f"[EDIT] idx={start_idx:3} {old_prefix_number} -> {new_number} | {title[:40]}")

    if verbose:
        print(f"[SUMMARY] modified={modified} skipped={skipped}")
    return doc
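
One way to reason about the rewrite step without opening a .docx file is to model it on plain strings. The sketch below is an assumption-laden stand-in, not the function above: `rewrite_prefix` is a hypothetical helper that takes the scan results as arguments and applies one conservative policy, namely to blank only runs that hold text and to carry the body fragment over only when its run is being blanked.

```python
from typing import List, Optional, Tuple

def rewrite_prefix(run_texts: List[str], last_prefix_run: int,
                   cut: Optional[Tuple[int, int]], new_number: str,
                   separator: str = '\t') -> List[str]:
    """Return new run texts with the old prefix replaced by new_number + separator."""
    out = list(run_texts)
    fragment = ''
    if cut is not None and cut[0] <= last_prefix_run:
        # Body starts inside a run that is about to be blanked: carry its tail over.
        fragment = out[cut[0]][cut[1]:]
    target = None
    for ri in range(last_prefix_run + 1):
        if out[ri]:                 # leave empty (for example image-only) runs alone
            out[ri] = ''
            if target is None:
                target = ri
    # No body at all: write the bare number without a dangling separator.
    out[target] = new_number if cut is None else new_number + separator + fragment
    return out

# Body shares a run with the prefix: the tail 'SDC test' moves to the first run.
a = rewrite_prefix([' ', '\t', '9', '.5.3 SDC test'], 3, (3, 5), '7.1.2')
# Body starts in its own run: that run is untouched, so nothing is duplicated.
b = rewrite_prefix(['9.5.3', ' ', 'SDC test'], 1, (2, 0), '7.1.2')
print(a)
print(b)
```

The second call is the case worth testing first: if the fragment were copied even though its run is left in place, the joined text would contain the title twice.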

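6. Usage Example

# Assumes doc is an already loaded python-docx Document and that
# find_heading_boundaries_detail (defined outside this section) returns the headings list.
headings = find_heading_boundaries_detail(doc, compute_number_path=True, verbose=False)
doc = renumber_headings_state_machine(doc, headings, separator='\t', verbose=True)
doc.save("renumbered.docx")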

7. Test Cases (Suggested Minimal Set)

Quick check that images were preserved:

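from docx.text.paragraph import Paragraph

# h is one entry of the headings list; has_pic is True if any run still holds a drawing
para = Paragraph(list(doc.element.body)[h['start']], doc._body)
has_pic = any(r._element.xpath('.//w:drawing') for r in para.runs)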

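Why whitespace in leading_ws does not switch to sep (the key point, once more)
sep means "the number has ended".
Before the first digit is seen, there is no basis for treating whitespace as the separator after a number.
If whitespace went straight to sep, " 7.1 Title" would be misread as "the body text is 7.1", and the heading would not be renumbered.
This boundary is what keeps the number parsing accurate.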
Common Extensions (Optional)