Deleting multiple sections with python-docx: a minimal tutorial (half-open intervals + element-reference deduplication)

Sample file used in the examples, test_document.docx:

Its Heading 1 headings:

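What you will learn
- How to locate the range of a heading section as a half-open interval [start, end)
- Why sections can be deleted safely without merging overlapping intervals
- How deduplicating element references avoids the index-shift problem
- How to keep, or restore, the <w:sectPr> element required at the end of the document body (without it, inserting a table later may fail)

Why not start with "delete by index + merge intervals"
The traditional approach: find every interval to delete, merge the overlapping and adjacent ones, then delete in reverse order so that earlier deletions do not shift later indices.
The simplified approach: store the XML element objects that each interval covers, then delete by element reference at the end. If an element shows up in more than one interval, deduplicating before deletion is enough, and index changes stop mattering.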
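To make the index-shift problem concrete, here is a minimal sketch that uses a plain Python list as a stand-in for doc.element.body. The function names and the labelled strings are illustrative assumptions, not part of python-docx.

```python
# Toy stand-in for doc.element.body: a plain list of uniquely labelled blocks.
def delete_by_index_ascending(items, indices):
    """Naive approach, buggy on purpose: delete positions in ascending order."""
    items = list(items)
    for i in indices:
        if i < len(items):
            del items[i]  # every deletion shifts the later items one slot left
    return items


def delete_by_reference(items, indices):
    """Resolve positions to objects first, then remove the objects themselves."""
    items = list(items)
    targets = [items[i] for i in indices]  # snapshot of references
    for obj in targets:
        # Labels are unique here, so list.remove (equality) hits the same object.
        items.remove(obj)
    return items


body = ["H1:A", "a1", "H1:B", "b1", "H1:C", "c1"]
wanted = [0, 1, 4, 5]  # sections A and C

naive = delete_by_index_ascending(body, wanted)
by_ref = delete_by_reference(body, wanted)
print(naive)   # the wrong blocks survive
print(by_ref)  # only section B remains
```

The naive version removes "H1:B" instead of "a1" and never reaches section C, which is why index-based deletion needs reverse ordering while reference-based deletion does not.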

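Core concept: the half-open interval [start, end)
When a heading section is described by the range [start, end):

- start = index of the heading element itself (within the doc.element.body sequence)
- end = index of the next heading of the same level; if there is none, end = len(body)
- Elements actually deleted: start, start+1, …, end-1 (end itself is excluded)
- Benefit: two adjacent sections (for example [10,20) and [20,30)) share no elements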
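The interval rule above can be sketched without python-docx at all. The helper name section_ranges and the sample indices are assumptions made for illustration.

```python
def section_ranges(heading_indices, total):
    """Pair each heading index with the next one (or total) as [start, end)."""
    ordered = sorted(heading_indices)
    ends = ordered[1:] + [total]  # the last section runs to the end of the body
    return list(zip(ordered, ends))


# Three headings at body positions 0, 10 and 20 in a body of 30 elements.
ranges = section_ranges([0, 10, 20], total=30)

# Expand each half-open interval into the set of positions it covers.
covered = [set(range(start, end)) for start, end in ranges]
print(ranges)
print(covered[1] & covered[2])  # adjacent sections share no position
```

Because end is excluded, the intervals tile the body exactly once, so adjacent sections never contribute the same element twice.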

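Overall flow (simplified)
1. Get the body: body = doc.element.body
2. Scan for the indices of all headings with the target style (for example Heading 1)
3. For each keyword to delete:
   - find the indices of all headings whose text contains that keyword
   - compute end for each heading (index of the next heading, or the end of the body)
   - this yields multiple (start, end, heading_text) tuples
4. Collect every element reference in body[start:end] into to_remove
5. Deduplicate the collected references by id (so nothing is removed twice)
6. Delete them (skipping the trailing <w:sectPr>)
7. If <w:sectPr> no longer exists (removed by accident), append an empty <w:sectPr> node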
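The collect, deduplicate, delete, and restore steps can be tried on a toy tree first. This sketch assumes the standard library's xml.etree.ElementTree as a stand-in for the lxml body that python-docx exposes; the tag names carry no namespace and are simplified on purpose.

```python
import xml.etree.ElementTree as ET

# Toy body: two heading sections followed by the trailing section properties.
body = ET.Element("body")
for tag in ["h1", "p", "h1", "p", "p", "sectPr"]:
    ET.SubElement(body, tag)

snapshot = list(body)  # indices refer to this frozen view
# Two keywords matched the same second heading, so its range appears twice.
sections = [(2, 6), (2, 6)]

# Collect element references for every range.
to_remove = []
for start, end in sections:
    to_remove.extend(snapshot[start:end])

# Deduplicate by object identity, keeping first-seen order.
seen, unique = set(), []
for el in to_remove:
    if id(el) not in seen:
        seen.add(id(el))
        unique.append(el)

# Delete by reference, but leave the trailing sectPr in place.
last = body[-1]
for el in unique:
    if el is last and el.tag == "sectPr":
        continue
    body.remove(el)

# Safety net: restore sectPr if it went missing.
if body.find("sectPr") is None:
    ET.SubElement(body, "sectPr")

remaining = [el.tag for el in body]
print(remaining)
```

The duplicate range costs nothing beyond a few extra set lookups, which is the reason interval merging can be skipped.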
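
Minimal version: heading boundary detection function
The tutorial version supports only:

- A single style (default "Heading 1")
- Keyword substring matching (case-insensitive)
- Return value: [(start, end, heading_text), …]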

from docx.oxml.text.paragraph import CT_P
from docx.text.paragraph import Paragraph

def find_heading_boundaries_min(doc, target_text="", style_name="Heading 1"):
    """
    Return the section boundaries (start, end, heading_text) of every heading
    of the given style whose text contains target_text.
    - Substring match (case-insensitive)
    - style_name is a single string
    - Half-open interval [start, end)
    """
    body = doc.element.body
    total = len(body)
    target_lower = target_text.lower().strip()

    # 1. Collect the body indices of all headings with this style
    heading_indices = []
    for i, el in enumerate(body):
        if isinstance(el, CT_P):  # <w:p> paragraph element
            p = Paragraph(el, doc)
            if p.style.name == style_name:
                heading_indices.append(i)

    # 2. Keep the headings that match and compute each section's end
    results = []
    for idx in heading_indices:
        p = Paragraph(body[idx], doc)
        text_lower = p.text.lower()
        # An empty target_text means "list every heading of this style"
        if (not target_lower) or (target_lower in text_lower):
            # End at the next heading of the same style, or at the end of the body
            end = next((h for h in heading_indices if h > idx), total)
            results.append((idx, end, p.text))
    return results

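Minimal version: deleting sections (all-in-one, self-contained apart from find_heading_boundaries_min above)
This version deliberately omits parameters such as exact and case_sensitive to keep the tutorial minimal.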

from docx.oxml import OxmlElement

def remove_sections_simple_min(doc, remove_texts, style_name="Heading 1"):
    """
    Delete multiple heading sections (heading plus content).
    - remove_texts: List[str] of keywords (substring, case-insensitive)
    - style_name: restricts the heading level, e.g. 'Heading 1'
    - If a keyword matches several headings, all of them are deleted
    - No interval merging; element references are deduplicated instead
    - The final <w:sectPr> is kept, and re-added if missing
    """
    if not remove_texts:
        return doc

    # ---- Gather all candidate (start, end, heading_text) tuples ----
    sections = []
    for text in remove_texts:
        sections.extend(find_heading_boundaries_min(doc, target_text=text, style_name=style_name))
    if not sections:
        return doc

    body = doc.element.body
    body_list = list(body)  # snapshot, so deletions cannot shift the indices

    # ---- Collect element references ----
    to_remove = []
    for start, end, _t in sections:
        s = max(0, start)
        e = min(end, len(body_list))
        if s < e:
            to_remove.extend(body_list[s:e])

    if not to_remove:
        return doc

    # ---- Deduplicate by identity, keeping order ----
    seen = set()
    uniq = []
    for el in to_remove:
        ident = id(el)
        if ident not in seen:
            seen.add(ident)
            uniq.append(el)

    # ---- Delete (but never the trailing sectPr) ----
    last_elem = body[-1] if len(body) else None
    def is_last_sectpr(node):
        tag = getattr(node, "tag", "")
        return node is last_elem and isinstance(tag, str) and tag.lower().endswith("sectpr")

    for el in uniq:
        if is_last_sectpr(el):
            continue
        try:
            body.remove(el)
        except ValueError:
            pass  # no longer a child of body (rare)

    # ---- Make sure a sectPr still exists ----
    if body.sectPr is None:
        body.append(OxmlElement('w:sectPr'))

    return doc

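7. Worked example

# %%
from docx import Document

# A raw string needs single backslashes only
doc = Document(r"D:\Temp\test_document.docx")

# Keywords must match the heading text in the sample file, so they stay in Chinese
remove_texts = [
    "環境測試要求",  # "Environmental test requirements"
    "品質標準",      # "Quality standards"
]

doc = remove_sections_simple_min(doc, remove_texts, style_name="Heading 1")
doc.save(r"D:\Temp\after_remove.docx")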

after_remove.docx

8. Visualization

9. Edge cases

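10. FAQ
Q: Why not delete by index in reverse order?
A: Because removal goes through the element reference (el) directly, it does not depend on indices, so nothing shifts.

Q: Why not merge the intervals?
A: Merging only reduces the number of loop iterations; it is not needed for correctness.

Q: Can Heading 1 and Heading 2 be handled together?
A: Yes. Change style_name to accept a list and test with if p.style.name in style_names:.

Q: What if I only want to delete the first matching heading?
A: Take sections[0:1] and run the remaining steps as usual. Note that sections is ordered by keyword first, so sort it by start beforehand if several keywords are given and "first" means first in the document.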
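One caveat when mixing heading levels, offered as an assumption rather than something the tutorial's code handles: a Heading 1 section should probably swallow the Heading 2 sections inside it, so "next heading in the list" is no longer the right end. The sketch below uses a hypothetical section_end helper and hand-written (body_index, level) pairs; mapping style names to numeric levels is left to the reader.

```python
def section_end(headings, pos, total):
    """
    headings: list of (body_index, level) sorted by body_index.
    Return the end of the section starting at headings[pos]: the index of the
    next heading whose level is the same or higher in the outline (numerically
    <=), or total when no such heading follows.
    """
    _, level = headings[pos]
    for idx, other_level in headings[pos + 1:]:
        if other_level <= level:
            return idx
    return total


# Heading 1 at 0, two Heading 2 at 3 and 6, another Heading 1 at 9.
headings = [(0, 1), (3, 2), (6, 2), (9, 1)]
total = 12
ends = [section_end(headings, i, total) for i in range(len(headings))]
print(ends)
```

Here the first Heading 1 section spans [0, 9) and overlaps both Heading 2 sections, which is exactly the case where deduplicating element references pays off.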

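Further exercises (optional)
- Add dry_run=True: only return the list of headings that would be deleted.
- Return a statistics dict: {'sections': len(sections), 'unique_elements': len(uniq)}
- Support multiple style levels: change style_name to a list.
- Support exact matching: replace the substring test with ==.
- Offer preserve_styles=True: record each heading's text and style before deleting.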
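As a starting point for the exact-matching exercise, here is one possible shape for the comparison. The heading_matches name and the exact parameter are hypothetical, and keeping the comparison case-insensitive in exact mode is an assumption; the tutorial does not specify it.

```python
def heading_matches(heading_text, keyword, exact=False):
    """Case-insensitive match: substring by default, whole text when exact=True."""
    heading = heading_text.strip().lower()
    key = keyword.strip().lower()
    if not key:
        return True  # mirrors the tutorial: an empty keyword matches every heading
    return heading == key if exact else key in heading


print(heading_matches("Quality Standards", "quality"))
print(heading_matches("Quality Standards", "quality", exact=True))
```

Swapping this helper in for the inline substring test would keep find_heading_boundaries_min unchanged apart from one extra parameter.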
Minimal comparison summary