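Infodemic Epistemic–Causal Layered Architecture (2026-03-09 10:00)

An integrated response pipeline, from decomposition into cognitive units to causal risk simulation

Epistemic Logic Causal DAG

💡 Core System Goals and Principles (No-Conflation Rule)

To diagnose complex infodemic phenomena accurately, the system first decomposes each sentence into a 'Claim' and its 'Evidence'. This lets it compute the credibility of the information itself and its social risk (reach and impact) as two fully separate, independent quantities.
Core principle: never mix “what is true” (veracity) with “what drives spread and harm” (propagation and impact causality) in the same layer. A false claim may pose little risk, while a true one can still trigger widespread panic.
This separation lets the system (1) build units that a machine can evaluate logically (CAU, Claim Assessment Unit), (2) model how the information amplifies risk at the system level (DAG), and (3) simulate intervention scenarios that keep side effects to a minimum.
Keep in mind: a link between a Claim and its Evidence is only an epistemic relation (one supports or contradicts the other); by itself it is not a causal relation that produces spread.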

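Layer 0. Ingestion & Normalization

Role and Outputs

Collects raw data in many forms (multimodal), including news articles and social media posts as well as video subtitles (STT) and text in images (OCR), and normalizes it into a machine-readable text format.
To keep the analysis transparent and reproducible, the original content as collected, the AI-extracted outputs, and the version of every model used are all preserved in full.

DB Design and Standard Objects

ContentItem (preserved source text) ExtractedArtifact (extracted output) FeatureVector (embedding vector)

⬇

Layer 1. Epistemic Layer: Claim–Evidence Structuring (Fact-Checking)

Creating the Unit of Judgment (CAU)

From long text, the layer isolates only the core 'Claim' units and matches each to a set of candidate official 'Evidence' that can verify it.
Rather than simple keyword matching, it analyzes the epistemic relation between claim and evidence (Supports or Contradicts) and produces the final Claim Assessment Unit (CAU).

Core Algorithm and Safeguards

Processing order: detect claims within sentences → retrieve trustworthy external evidence with a RAG model → classify the stance between claim and evidence → compute the final epistemic (credibility) score.
Safeguard: a CAU is not absolute truth but a point-in-time result based on the evidence currently available, so it is versioned and updated whenever new evidence appears.

Core Data Structure

claim_id (unique claim key) evidence_id (unique evidence key) cau_id (final assessment result) epistemic_score (credibility score)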
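
A minimal sketch of how a versioned CAU record might look, assuming an immutable record that is copied on every revision. Only claim_id, evidence_id, cau_id, and epistemic_score come from the slide; the `version` and `assessed_at` fields and the `revise` helper are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class CAU:
    """One versioned assessment of a claim (fields beyond the slide are assumptions)."""
    cau_id: str
    claim_id: str
    evidence_ids: tuple[str, ...]
    epistemic_score: float  # 0.0 = not credible, 1.0 = fully credible
    version: int = 1
    assessed_at: str = ""  # ISO-8601 timestamp of this version


def revise(cau: CAU, new_evidence_id: str, new_score: float) -> CAU:
    """Return a new version rather than mutating, so earlier assessments stay auditable."""
    return replace(
        cau,
        evidence_ids=cau.evidence_ids + (new_evidence_id,),
        epistemic_score=new_score,
        version=cau.version + 1,
        assessed_at=datetime.now(timezone.utc).isoformat(),
    )


v1 = CAU("cau-1", "claim-1", ("ev-1",), epistemic_score=0.40)
v2 = revise(v1, "ev-2", new_score=0.15)  # new contradicting evidence arrived
print(v1.version, v2.version, v2.evidence_ids)
```

Keeping every version as a separate immutable record is one way to support the point-in-time reading of a CAU: the earlier score stays retrievable after new evidence changes the assessment.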

Slide 1 / 24

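Propagation Impact Prediction and Simulation Layers

Detecting the risk factors that cause real social harm and deriving preemptive response strategies

Risk Detection do-calculus (intervention simulation)

Layer 2. Causal Structure Layer: Propagation and Impact Causality

Modeling Risk Amplification Factors

Moving beyond the previous stage's question of veracity, this layer analyzes the behavioral and observed variables that actually amplify harm: how fast the information spreads (Time-series), who spreads it (Source), and which recommendation algorithms carry it (Channel).
It combines these variables into a statistical causal hypothesis (Causal DAG) about what accelerates spread, and forecasts the propagation trajectory together with its uncertainty.
Beyond raw view counts, it integrates multidimensional risk factors such as bot involvement, how strongly the content provokes emotion, and regional sensitivity.

causal_graph_info (graph metadata) causal_graph_edge (causal weights between variables)

⚠️ Safeguards and Limitations

The causal map (DAG) produced here must never be used in reverse to conclude that “this information spreads widely, so it must be true (or false)”.
The sole purpose of this layer is to identify the most serious risk amplification factors at present, in order to prioritize the operational interventions of the next stage.

⬇

Layer 3. Intervention & Simulation Layer

Preemptive Response Scenarios Based on the do-Operator

“If we take a specific action (X), how much would the expected risk and harm decrease?” This layer estimates and tests the answer before any action is taken.
Going beyond simple prediction, it encodes the platform's actual enforcement measures (attaching fact-check labels, down-ranking in the exposure algorithm, blocking malicious accounts, actively distributing rebuttal content, and so on) as interventions via the do-operator and runs causal simulations.

Simulation Output Metrics (Effect Report)

expected_risk_delta: the expected average reduction in risk if the action is carried out
credible_interval: the uncertainty interval showing how much the estimate may vary
recommended_actions: the system's prioritized list of recommended actions, ranked by cost-effectiveness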
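
To make the Effect Report concrete, here is a small Monte Carlo sketch of a do-intervention on a toy structural causal model. Everything in it is an assumption for illustration: the three-variable model, its coefficients, the action names, and the use of a 5th to 95th percentile range of simulated deltas as a stand-in for a credible interval. It also ranks actions by effect only and ignores cost.

```python
import random
import statistics


def draw_noise(rng):
    """Sample the exogenous variables of a toy structural causal model."""
    return {
        "emotion": rng.uniform(0.0, 1.0),
        "boost": rng.uniform(0.5, 1.0),  # natural level of algorithmic boost
        "eps": rng.gauss(0.0, 0.05),
    }


def risk(noise, do_boost=None):
    """Risk proxy (exposure). `do_boost` replaces the boost mechanism: do(boost = value)."""
    boost = noise["boost"] if do_boost is None else do_boost
    sharing = 0.6 * noise["emotion"] + noise["eps"]
    return 0.5 * sharing + 0.5 * boost


def effect_report(do_boost, n=2000, seed=0):
    rng = random.Random(seed)
    deltas = []
    for _ in range(n):
        noise = draw_noise(rng)  # same noise in both worlds (paired comparison)
        deltas.append(risk(noise, do_boost) - risk(noise))
    deltas.sort()
    return {
        "expected_risk_delta": statistics.fmean(deltas),
        # 5th-95th percentile of simulated deltas, standing in for a credible interval
        "credible_interval": (deltas[int(0.05 * n)], deltas[int(0.95 * n) - 1]),
    }


actions = {"strong_downrank": 0.2, "mild_downrank": 0.4}  # hypothetical actions
reports = {name: effect_report(level) for name, level in actions.items()}
# Most negative delta (largest risk reduction) first.
recommended_actions = sorted(reports, key=lambda a: reports[a]["expected_risk_delta"])
print(recommended_actions, reports["strong_downrank"])
```

The intervention overrides the variable's own mechanism instead of conditioning on an observed value, which is the distinction the do-operator is meant to capture.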

Slide 2 / 24

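Operational Governance and XAI Extension Architecture

Productization (API/UI) for practical deployment and evolution toward a semantic-network-based judgment structure

Productization Semantic XAI (explainable AI)

Layer 4. Productization Layer

Evidence-Based UI and Governance

An integrated dashboard (UI) lets practitioners such as public health authorities and analysts make final decisions from transparently presented evidence rather than relying on an AI black box.
Main screens: Claim Workspace for comparing claims against evidence, Causal Map for visualizing risk amplification factors, and What-if Simulator for previewing the effect of actions.
Audit logs that record who made which judgment or intervention and when, together with a mandatory versioning system, make policy accountability explicit.

Core API Design for Integration with Other Systems

GET /cau/{id}: retrieve the credibility assessment and detailed evidence for a specific claim
GET /causal/graphs: retrieve causal graph visualization data for the spread of public opinion in a given time window
POST /interventions/simulate: request a simulation of the expected impact of applying a specific action (e.g., labeling)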

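Appendix A. Semantic Chunk-Based Extension Design (Advanced)

From Simple Morphological Analysis to 'Units of Judgment'

Where conventional text analysis amounted to matching individual words (tokens), this system makes the 'Semantic Chunk', the smallest unit in a sentence that is perceived as a tightly bound unit of meaning, a core element of the Epistemic Layer.
This lets the system quantify, with explicit rules, subtle linguistic nuances (Hedge, Attribution) that usually require human interpretation, such as “according to experts...” (citation), “it is not certain, but...” (speculation), and “never ...” (strong negation).
Beyond a simple sentiment (positive/negative) score, these chunks form the backbone of a Bayesian inference model that probabilistically updates how much certainty the information carries (Belief Weight).

Structural Cautions

When a chunk contains a speculative expression (modal hedge) such as “might” or “possibly”, the system automatically downscales the belief weight assigned to that claim.
Links between semantic chunks are epistemic connections for capturing context and meaning only; they must not be mixed into the same analysis as the causal graph (Causal DAG) that represents social spread.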
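
The hedge-driven downscaling could be sketched as below. The hedge lexicon and its factors are assumed values chosen for illustration, and plain token matching is a simplification (it would, for example, treat the month "May" as a hedge).

```python
import re

# Assumed downscaling factors; real values would be learned or calibrated.
HEDGE_FACTORS = {"might": 0.6, "possibly": 0.6, "may": 0.7}


def downscale_belief(belief_weight: float, chunk_text: str) -> float:
    """Multiply the belief weight by the strongest hedge factor found in the chunk."""
    tokens = re.findall(r"[a-z]+", chunk_text.lower())
    factor = min((HEDGE_FACTORS[t] for t in tokens if t in HEDGE_FACTORS), default=1.0)
    return belief_weight * factor


print(downscale_belief(0.8, "The vaccine might cause fever."))
print(downscale_belief(0.8, "The vaccine causes fever."))
```

A chunk without any hedge keeps its belief weight unchanged, so the rule only ever weakens a claim and never strengthens it.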

Slide 3 / 24

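Additional Design: Sentence Structure Parsing Layer

Sentence structure must be parsed before semantic chunking in order to compute the true scope and strength of a claim

Sentence Structure Clause Graph

Why This Layer Is Needed

The same words in a different sentence structure yield a claim with a completely different direction, strength, agent, and scope. Semantic chunks alone are therefore insufficient, and a sentence structure parsing layer is required in front of them.
Infodemic sentences in particular often interleave quotation, hearsay, speculation, condition, rebuttal, and concession, so distinguishing the main clause, embedded clause, and reporting clause is essential.

Parsing Pipeline

1) Syntactic parsing

Recover clause boundaries, main/subordinate clauses, and quotation structure with dependency parsing + constituency parsing

→

2) Scope computation

Compute the scope of negation, modality, attribution, and quantifiers, and attach it to each proposition

→

3) Proposition extraction

Separate only the evaluable clauses and embedded clauses and convert them into claim candidates or evidence candidates

Key Outputs

sentence_tree clause_graph operator_scope_map reference_chain predicate_argument_frame

clause_graph: stores the relations among main, subordinate, and reporting clauses as a graph
operator_scope_map: records which proposition each operator (not / may / reportedly / according to, etc.) applies to
predicate_argument_frame: structures who (agent) claimed what (theme) and how

Elements the Sentence Structure Must Capture

Clause type: declarative, interrogative, imperative, conditional, causal, concessive, reporting
Predicate–argument structure: agent, theme, cause, time, place
Scope resolution: negation scope, modality scope, attribution scope, quantifier scope
Coreference: linking referring expressions such as “it”, “this vaccine”, and “that claim” to their referents

Same Words, Different Structure, Different Interpretation

① “전문가들은 백신이 위험하다고 말했다.” (“Experts said that the vaccine is dangerous.”)
→ The main proposition is the experts' speech event; “is dangerous” is a quoted claim

② “백신이 위험하다는 주장은 전문가들에 의해 반박되었다.” (“The claim that the vaccine is dangerous was refuted by experts.”)
→ The main proposition is the refutation event; “is dangerous” is the claim under evaluation

③ “백신이 위험할 수 있다는 우려가 제기되었다.” (“Concerns were raised that the vaccine may be dangerous.”)
→ Modality and a report structure are present, so the strength is lower than a direct claim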
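
The three readings above can be written down as data. The sketch below is a hand-annotated illustration, not parser output: the `SentenceAnalysis` class, its field names, and the operator labels are assumptions modeled on the operator_scope_map output described earlier.

```python
from dataclasses import dataclass, field

P = "the vaccine is dangerous"  # embedded proposition shared by all three sentences


@dataclass
class SentenceAnalysis:
    main_event: str  # what the main clause asserts
    embedded: str    # the embedded proposition
    # operator -> proposition it scopes over
    operator_scope_map: dict[str, str] = field(default_factory=dict)


# Hand-annotated analyses of example sentences 1-3.
ANALYSES = {
    1: SentenceAnalysis("speech: experts said P", P, {"attribution": P}),
    2: SentenceAnalysis("refutation: experts refuted P", P, {}),
    3: SentenceAnalysis("report: concerns were raised that P", P,
                        {"modality": P, "evidentiality": P}),
}


def operators_on(analysis: SentenceAnalysis, proposition: str) -> list[str]:
    """List the operators whose scope covers the given proposition."""
    return sorted(op for op, target in analysis.operator_scope_map.items()
                  if target == proposition)


for number, analysis in ANALYSES.items():
    print(number, analysis.main_event, operators_on(analysis, P))
```

All three analyses share the same embedded proposition and differ only in the main event and the operators attached, which is the information a flat sentence embedding tends to lose.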

Slide 4 / 24

Proposition & Epistemic Operator Composition

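Convert clauses into evaluable propositions and attach quotation, speculation, negation, and hearsay as operators to compose the CAU

Proposition Unit Epistemic Operator CAU Composition

Revised Internal Structure of Layer 1

Layer 1A

Sentence Structure Parsing
constituency / dependency / clause segmentation / scope resolution

→

Layer 1B

Semantic Chunk & Proposition Layer
surface chunk + proposition unit + operator tagging

→

Layer 1C

Epistemic Composition Layer
compute belief, uncertainty, and attribution weight, then generate the CAU

Definition of a Proposition Unit

A surface chunk is a morphological/cognitive grouping, whereas a proposition unit is a clause-level unit of meaning that can be true or false, or otherwise evaluated.
A claim is not the whole sentence; only the propositions that are subject to evaluation are extracted as claims.
Evidence is likewise stored as an independent proposition, or as a supporting / contradicting proposition linked to an external source.

proposition_id proposition_type scope_signature operator_bundle

Epistemic Operator Model

Operator type	Example	Linguistic role	System handling
Attribution	“전문가들은 …라고 말했다” (“Experts said that …”)	Separates the speaker from the quoted content	Lowers claim directness; computed separately from source credibility
Modal / Hedge	may, might, ~일 수 있다 (may be), ~로 보인다 (appears to be)	Lowers certainty; marks speculation or hypothesis	Downscales belief weight; raises uncertainty
Negation	not, no evidence, 아니다 (is not), 못하다 (cannot)	Reverses the proposition's polarity within its scope	Recomputes polarity/stance based on scope
Evidentiality	~라고 한다 (it is said that), ~로 알려졌다 (it is reported that)	Marks hearsay or reporting rather than direct observation	Separated as reported speech; adjusts the evidence prior
Condition / Concession	만약 ~라면 (if …), ~에도 불구하고 (despite …)	Hypothetical or concessive structure, distinguished from a direct factual claim	Classified as a hypothetical claim or a context proposition

CAU Composition Formula (Conceptual)

CAU = f(Proposition, Operator Bundle, Evidence Set, Source Priors, Temporal Validity)

The final credibility is not a sum over words. It is computed from the proposition extracted from the sentence structure by
① attaching the quotation, hearsay, speculation, and negation operators,
② incorporating the stance distribution of external evidence and the source priors, and
③ accounting for validity over time.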
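
One possible shape for the composition function f is sketched below. The document gives f only conceptually, so the specifics here are assumptions: the multiplicative operator factors, the exponential half-life used for temporal validity, and the `1 / (1 + total)` uncertainty formula. Negation (polarity reversal) is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    stance: str          # "supports", "contradicts", or "uncertain"
    source_prior: float  # prior credibility of the source, 0..1
    age_days: float      # used for temporal validity


# Assumed multiplicative effects of operators on assertion strength.
OPERATOR_FACTORS = {"attribution": 0.8, "modality": 0.7, "evidentiality": 0.8}


def compose_cau(proposition, operators, evidence, half_life_days=180.0):
    support = contradict = 0.0
    for ev in evidence:
        # Source prior, decayed by the age of the evidence (temporal validity).
        weight = ev.source_prior * 0.5 ** (ev.age_days / half_life_days)
        if ev.stance == "supports":
            support += weight
        elif ev.stance == "contradicts":
            contradict += weight
    total = support + contradict
    strength = 1.0
    for op in operators:
        strength *= OPERATOR_FACTORS.get(op, 1.0)
    return {
        "proposition": proposition,
        "operators": sorted(operators),
        "belief": 0.5 if total == 0 else support / total,
        "uncertainty": 1.0 / (1.0 + total),  # shrinks as weighted evidence accumulates
        "assertion_strength": strength,
    }


cau = compose_cau(
    "vaccines cause side effects",
    ["attribution", "modality"],
    [Evidence("contradicts", 0.9, 0.0), Evidence("supports", 0.3, 0.0)],
)
print(cau)
```

The sketch keeps evidence-based belief and operator-based assertion strength as separate fields, so a hedged sentence about a well-supported proposition is not confused with a confident sentence about a poorly supported one.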

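Korean-Specific Analysis Points

Sentence-final endings: ~다 (plain declarative) / ~다더라 (hearsay) / ~인 듯하다 (seems to be) / ~아니다 (is not)
Quotation markers: ~라고, ~다는, ~라는
Speculation markers: ~것 같다, ~로 보인다, ~듯하다
Hearsay markers: ~라고 한다, ~라고 알려졌다
Negation/inability: 안 (not), 못 (cannot), 아니다 (is not)
Topic/contrast particles: 은/는, 도, 만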
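
A deliberately naive tagger over the markers listed above might look like this. It matches surface strings only, which is an assumption made for brevity: conjugated forms such as 보입니다 are missed, and a production system would work on morphological analysis instead. The label names are hypothetical.

```python
import re

# Surface patterns taken from the marker lists above; labels are assumed names.
MARKERS = {
    "hearsay": [r"라고 한다", r"라고 알려졌다", r"다더라"],
    "speculation": [r"것 같다", r"로 보인다", r"듯하다"],
    "quotation": [r"라고", r"다는", r"라는"],
    # 안 / 못 count only as standalone words, so 안전 ("safety") is not negation.
    "negation": [r"(?<!\S)(?:안|못)(?=\s)", r"아니다"],
}


def tag_operators(sentence: str) -> set[str]:
    """Return the operator labels whose surface markers occur in the sentence."""
    return {
        label
        for label, patterns in MARKERS.items()
        if any(re.search(pattern, sentence) for pattern in patterns)
    }


print(tag_operators("백신이 위험할 것 같다"))
print(tag_operators("백신이 위험한 것이라고 한다"))
print(tag_operators("백신은 안전하다"))
```

The second example is tagged as both hearsay and quotation because ~라고 한다 contains ~라고; resolving such overlaps is part of what the scope-resolution step would have to do.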

Slide 5 / 24

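Epistemic Relations vs Causal Relations

Separating knowledge-assessment relations from the real-world causal relations of spread and impact keeps the system philosophically sound

Epistemic Relation Causal Relation

Why They Must Be Kept Apart

A Claim–Evidence link is a knowledge-assessment relation: it states which evidence supports or contradicts which claim.
A causal link is a real-world cause–effect relation: it states which factors actually produce exposure, sharing, re-spreading, behavioral change, and harm.
If the two are treated as one graph, a rebuttal article or an official fact-check can be wrongly modeled as “part of the misinformation propagation network”.

Epistemic Graph = graph for judging veracity and credibility

Nodes: Claim, Evidence, Source, Proposition, CAU
Edges: supports / contradicts / mixed / uncertain
Outputs: belief score, uncertainty, rationale, stance summary
Question: “How credible is this claim given the evidence currently available?”

Claim: “Vaccines are dangerous”
Evidence: “WHO report: serious side effects are very rare”
Relation: Evidence → Claim (contradicts)

Causal Graph = graph for explaining spread and impact

Nodes: exposure, sharing_rate, algorithmic_boost, emotion_intensity, moderation_latency, etc.
Edges: directed cause→effect causal hypotheses
Outputs: risk driver, propagation path, intervention effect
Question: “What makes this information spread further and become more dangerous?”

Emotion intensity → Sharing rate
Algorithm boost → Exposure
Exposure → Claim frequency

The Same Content Means Different Things in the Two Graphs

“Experts stated that there is no basis for the claim that the vaccine is dangerous.”

• Epistemic: the experts' statement is evidence that contradicts the claim
• Causal: once reported in the news, the statement may fuel controversy and temporarily increase exposure

Design Principles

Epistemic relations are used only for judging veracity and credibility
Causal relations are used only for analyzing spread, impact, and intervention effects
The two graphs are never equated directly; they are connected only through an intermediate layer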
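
One way to enforce this separation in code is to give the two graphs distinct edge types and reject the wrong kind at insertion time. This is a sketch under that assumption; the class names and the runtime check are illustrative, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Literal

EpistemicRelation = Literal["supports", "contradicts", "mixed", "uncertain"]


@dataclass(frozen=True)
class EpistemicEdge:
    evidence_id: str
    claim_id: str
    relation: EpistemicRelation


@dataclass(frozen=True)
class CausalEdge:
    cause: str   # e.g. "emotion_intensity"
    effect: str  # e.g. "sharing_rate"


class TypedGraph:
    """A graph that accepts exactly one edge type."""

    def __init__(self, edge_type):
        self.edge_type = edge_type
        self.edges = []

    def add(self, edge):
        if not isinstance(edge, self.edge_type):
            raise TypeError(f"expected {self.edge_type.__name__}, got {type(edge).__name__}")
        self.edges.append(edge)


epistemic_graph = TypedGraph(EpistemicEdge)
causal_graph = TypedGraph(CausalEdge)

epistemic_graph.add(EpistemicEdge("who-report", "claim-1", "contradicts"))
causal_graph.add(CausalEdge("emotion_intensity", "sharing_rate"))
```

With this arrangement a fact-check can appear as evidence in the epistemic graph without ever becoming a node in the propagation network by accident.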

Slide 6 / 24

Epistemic-to-Causal Interface Layer

The Claim–Evidence graph and the Causal graph are not connected directly; they are linked through Epistemic State variables

Epistemic State Interface Layer Causal Input

Core Principle of the Connection

The Claim–Evidence graph computes “what to believe”, and the Causal graph computes “what spreads what”.
Instead of joining the two directly, an Epistemic State Vector is generated from the Claim–Evidence graph and used as an input variable to the Causal graph.
In short, the connection is “Claim–Evidence Graph → Epistemic State Interface → Causal Graph”.

Recommended Data Flow

1) Generate CAUs

Compute belief, uncertainty, and stance for each claim

→

2) Generate snapshots

Generate a feature vector summarizing the epistemic state at each point in time

→

3) Causal input

Use it as an input variable of the spread/impact DAG

Claim–Evidence Graph
  ↓
Epistemic State Vector
- belief_score
- uncertainty
- contradiction_ratio
- evidence_density
- source_credibility
- novelty
- emotional_charge
  ↓
Causal Graph

Intermediate Interface Object Design

Rather than connecting edges directly, introduce intermediate objects like the following; this is the safest choice for DB and system design.

CAU EpistemicSnapshot CausalObservation

CAU: the result of assessing a claim
EpistemicSnapshot: a point-in-time summary of belief, uncertainty, contradiction ratio, and evidence density
CausalObservation: measurements of exposure, share rate, comment volume, cross-platform jumps, and behavioral change

Example Epistemic State Variables

belief_score uncertainty contradiction_ratio evidence_count authoritative_evidence_count source_diversity temporal_decay ambiguity_score novelty emotional_intensity

These variables summarize not the claim itself but the epistemic state the claim is currently in.
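
As a rough sketch, a snapshot could be computed from the stance labels attached to one claim. The variable names follow the list above, but the formulas are assumptions: the document names the variables without defining them.

```python
from collections import Counter


def epistemic_snapshot(stances: list[str]) -> dict[str, float]:
    """Summarize the stance labels of the evidence linked to a single claim."""
    counts = Counter(stances)
    n = len(stances)
    decided = counts["supports"] + counts["contradicts"]
    return {
        # Share of decided evidence that supports the claim (0.5 when undecided).
        "belief_score": counts["supports"] / decided if decided else 0.5,
        # Share of evidence that takes no clear side.
        "uncertainty": 1.0 - decided / n if n else 1.0,
        "contradiction_ratio": counts["contradicts"] / n if n else 0.0,
        "evidence_count": float(n),
    }


snapshot = epistemic_snapshot(["contradicts", "contradicts", "supports", "uncertain"])
print(snapshot)
```

Only this flat vector, not the claim–evidence edges themselves, would be handed to the causal model as input.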

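Three Ways to Connect

Feature Bridge: use epistemic features directly as causal variables
State Transition Bridge: track how changes in belief/uncertainty affect propagation dynamics
Intervention Bridge: observe whether actions such as fact-checks, labels, and counter-messages change both graphs at once

More authoritative contradiction → lower uncertainty
But at the same time, higher controversy → possible short-term increase in exposure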

Slide 7 / 24

Research Design for ACL Submission

Linguistically Grounded Claim Modeling with Sentence Structure and Epistemic Operators

ACL Main Research Design

Proposed Paper Title (ACL)
From Sentence Structure to Epistemic State: Linguistically Grounded Claim Modeling for Fact Verification

Abstract (Draft)

Most claim verification systems treat claims as flat sentences and ignore the internal sentence structure that shapes epistemic meaning. In this work, we propose a linguistically grounded framework that models claims as structured propositions derived from sentence structure, clause hierarchy, and epistemic operators such as attribution, modality, negation, and evidentiality. Our approach decomposes sentences into proposition units and composes them into Claim Assessment Units (CAUs), enabling explicit modeling of belief, uncertainty, and rationale. Experiments on misinformation and fact‑verification datasets show that incorporating sentence structure and epistemic operators improves evidence reasoning and claim verification performance compared to sentence‑level baselines. We also introduce an annotation schema and dataset design tailored for epistemic claim modeling in misinformation analysis.

Core Research Questions

RQ1. Does a claim representation that reflects sentence structure, clause hierarchy, and operator scope yield more accurate claim verification than existing sentence-level approaches?
RQ2. Does modeling attribution / hedge / negation / evidentiality as separate epistemic operators improve evidence reasoning and uncertainty estimation?
Research positioning: not simple sentence classification, but a linguistically grounded verification framework that runs “Sentence → Clause → Proposition → Epistemic Operator → CAU”

Problem Definition and Proposed Method

Decompose the input sentence into a clause graph using constituency/dependency parsing
Extract evaluable proposition units and attach an operator bundle (negation, modality, attribution, evidentiality)
Combine stance toward external evidence (support/contradict/mixed/uncertain) with source priors to generate the CAU
Output the claim verification label, uncertainty, and rationale together

Clause Segmentation Scope Resolution Epistemic Operator Claim Verification Uncertainty Estimation

Dataset and Annotation Schema

Uses the NIPA/AX infodemic project dataset, scheduled to be built by the end of 2026
Annotation units: sentence, clause, proposition, claim, evidence, stance, operator, epistemic state
Korean-centered, with cross-evaluation on FEVER/AVeriTeC/SciFact where needed

claim evidence stance operator uncertainty rationale

Experimental Design

Category	Description
Baseline	sentence-level encoder, retrieval+NLI, model without operators
Proposed	structure-aware proposition encoder + epistemic operator composition
Ablation	remove sentence structure / remove operators / remove source prior / remove uncertainty head
Tasks	evidence retrieval, stance classification, claim verification, uncertainty calibration

Evaluation Metrics and Expected Contributions

Evaluation target	Metric
Evidence Retrieval	Recall@k, Precision@k, MRR
Stance Classification	Macro-F1, Accuracy
Claim Verification	Label Accuracy, Macro-F1
Uncertainty	ECE, Brier Score, calibration plots
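
For the uncertainty row, the two scalar metrics can be computed in a few lines. This sketch assumes a binary setting where each prediction is the probability of the positive class, and uses the equal-width-bin variant of ECE that compares mean predicted probability with the observed positive rate in each bin.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean probability and positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_prob - positive_rate)
    return ece


probs = [0.9, 0.9, 0.1, 0.1]
labels = [1, 1, 0, 0]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

In this toy example every prediction is on the correct side yet slightly underconfident, which shows up as a nonzero ECE even though accuracy is perfect.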

ACL Contribution Points
① Proposes linguistically grounded proposition-level claim modeling
② Proposes a verification framework that models epistemic operators explicitly
③ Presents a Korean infodemic verification dataset and annotation schema

Slide 8 / 24

Research Design for FEVER Submission

Fact Verification and Evidence Reasoning with the Claim–Evidence Graph and Epistemic State

FEVER Evidence Reasoning

Proposed Paper Title (FEVER)
Epistemic Graphs for Evidence Reasoning: Bridging Claim–Evidence Structures and Fact Verification

Abstract (Draft)

Fact verification tasks such as FEVER typically model claim–evidence reasoning using sentence‑pair classification pipelines. However, real-world misinformation often involves multiple pieces of evidence with varying credibility and contradictory signals. We propose an epistemic graph framework that represents claim–evidence relationships as structured graphs and aggregates them into an epistemic state capturing belief, uncertainty, contradiction ratio, and evidence density. This representation enables graph‑based evidence reasoning and improves interpretability by producing rationale subgraphs that explain verification decisions. Experiments demonstrate that incorporating epistemic state features enhances evidence selection, stance prediction, and verification robustness compared to standard FEVER pipelines. We further present a dataset design compatible with FEVER-style evaluation while supporting richer epistemic annotations.

Core Research Questions

RQ1. Does modeling Claim–Evidence relations as a graph structure rather than as simple sentence pairs improve evidence selection and stance reasoning?
RQ2. Does introducing epistemic state variables such as belief_score, contradiction_ratio, evidence_density, and source_credibility increase the explainability and stability of fact verification?
Research positioning: extend FEVER-style verification tasks from sentence-pair NLI to graph-based evidence reasoning + epistemic state modeling

Problem Definition and Proposed Method

Build a Claim–Evidence Graph connecting a Claim node to multiple Evidence nodes
Define edges as support / contradict / mixed / uncertain relations
Generate the Epistemic State Vector from the aggregated nodes and edges
Output a rationale subgraph along with the final verification label

Claim–Evidence Graph Stance Edge Belief Score Rationale Subgraph

Dataset Strategy

Design the in-house infodemic dataset in parallel to fit the FEVER format
Provide a schema compatible with the existing FEVER / Climate-FEVER / AVeriTeC datasets
Example fields: claim, candidate_evidence, gold_evidence_set, stance, label, source_credibility, uncertainty

gold_evidence_set support/contradict belief_score rationale_graph

Experimental Design

Category	Description
Baseline	standard FEVER pipeline, sentence-pair NLI, reranker-based evidence selection
Proposed	graph-based evidence reasoning + epistemic state aggregation
Ablation	remove graph / remove epistemic state / remove source credibility / remove rationale subgraph
Tasks	evidence selection, multi-evidence reasoning, label prediction, rationale explanation

Evaluation Metrics and Expected Contributions

Evaluation target	Metric
Evidence Selection	FEVER evidence score, Recall@k
Label Prediction	FEVER score, Accuracy, Macro-F1
Graph Quality	edge-level F1, rationale overlap
Explainability	human preference, rationale faithfulness
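
For reference, the strict FEVER score is commonly described as follows: a prediction counts only if the label is correct and, for SUPPORTS and REFUTES, the predicted evidence (at most five items) contains at least one complete gold evidence set. The sketch below follows that description; the record layout with `label`, `evidence`, and `evidence_sets` keys is an assumed format for illustration, not the official scorer's input format.

```python
def fever_score(predictions, gold, max_evidence=5):
    """Fraction of claims with a correct label and, where required, complete evidence."""
    hits = 0
    for pred, ref in zip(predictions, gold):
        if pred["label"] != ref["label"]:
            continue
        if ref["label"] == "NOT ENOUGH INFO":
            hits += 1  # no evidence requirement for this label
            continue
        retrieved = set(pred["evidence"][:max_evidence])
        # At least one gold evidence set must be fully covered.
        if any(set(group) <= retrieved for group in ref["evidence_sets"]):
            hits += 1
    return hits / len(gold)


predictions = [
    {"label": "SUPPORTS", "evidence": [("Page_A", 0), ("Page_B", 3)]},
    {"label": "REFUTES", "evidence": [("Page_C", 1)]},
    {"label": "NOT ENOUGH INFO", "evidence": []},
]
gold = [
    {"label": "SUPPORTS", "evidence_sets": [[("Page_A", 0)]]},
    {"label": "REFUTES", "evidence_sets": [[("Page_C", 1), ("Page_D", 2)]]},
    {"label": "NOT ENOUGH INFO", "evidence_sets": []},
]
score = fever_score(predictions, gold)
print(score)
```

The second prediction has the right label but only half of its two-sentence gold evidence set, so it does not count; this is the case where multi-evidence reasoning matters for the metric.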

FEVER Contribution Points
① Proposes a Claim–Evidence Graph-based evidence reasoning framework
② Produces verification results together with uncertainty through the epistemic state
③ Designs a FEVER-compatible infodemic verification dataset and rationale subgraphs

Slide 9 / 24

ACL Introduction Story

Why Epistemic Structure Matters for Claim Verification

Research Motivation ACL Story

Problem Background

Recent NLP research on misinformation and fact verification typically treats claims as flat sentences.
However, real-world claims often contain embedded clauses, attribution, modality, negation, and evidential markers.
These linguistic structures fundamentally change the epistemic meaning of a statement.

Limitation of Existing Approaches

Sentence-level encoding ignores internal claim structure.
Evidence reasoning often assumes simple sentence-pair relationships.
Uncertainty, attribution, and evidentiality are rarely modeled explicitly.

Typical pipeline: claim sentence → embedding → NLI classifier

Our Key Insight

Claims should be modeled as structured epistemic propositions.
Sentence structure reveals the scope of negation, modality, and attribution.
These signals can be used to compute a richer epistemic state for claim verification.

sentence → clause structure → proposition units → epistemic operators → claim assessment unit

Research Contributions

Introduce a linguistically grounded claim representation based on sentence structure.
Propose an epistemic operator framework capturing attribution, modality, negation, and evidentiality.
Design a claim–evidence graph with epistemic state modeling for fact verification.
Provide a dataset schema and evaluation setup tailored to misinformation analysis.

Expected Impact

Improved claim verification accuracy and uncertainty estimation.
More interpretable evidence reasoning through rationale graphs.
Bridging linguistic analysis with misinformation detection research.

Slide 10 / 24

FEVER Introduction Story

Why Claim–Evidence Graphs and Epistemic States Matter for Fact Verification

FEVER Research Story Evidence Reasoning

Problem Background

Fact verification tasks such as FEVER evaluate whether a claim is supported or contradicted by evidence.
However, most pipelines treat claim–evidence reasoning as a sentence-pair classification problem.
Real-world misinformation often involves multiple evidence sources, conflicting signals, and varying credibility.

Limitations of Standard FEVER Pipelines

Evidence reasoning is modeled as independent sentence pairs.
Contradictory evidence and uncertainty are not explicitly represented.
Source credibility and evidence density are typically ignored.

Typical pipeline: claim → retrieve evidence → NLI classifier

Our Key Idea

Represent verification as a Claim–Evidence Graph.
Each evidence node contributes support or contradiction signals.
Aggregate signals into an Epistemic State representing belief, uncertainty, and contradiction.

Claim–Evidence Graph → stance edges → epistemic aggregation → verification decision

Research Contributions

Introduce graph-based evidence reasoning for claim verification.
Model epistemic state features such as belief, uncertainty, contradiction ratio, and evidence density.
Generate interpretable rationale subgraphs explaining verification results.
Design a FEVER-compatible dataset schema supporting epistemic annotations.

Expected Impact

Improved evidence selection and stance reasoning.
More robust fact verification in the presence of conflicting evidence.
Better interpretability through graph-based rationale explanations.

Slide 11 / 24

Related Work Positioning (ACL)

Positioning the Research Across Claim Verification, Linguistics, and Graph-based Reasoning

ACL Related Work Research Positioning

Research Context

Our work sits at the intersection of claim verification, linguistically grounded NLP, and graph-based reasoning.
Existing work typically addresses these areas independently rather than integrating them into a unified epistemic modeling framework.
We position our work as a bridge between sentence-level claim verification and structure-aware epistemic reasoning.

Claim Verification

Previous research models verification as sentence-pair NLI tasks.
Systems rely on pretrained encoders (BERT, RoBERTa) with retrieval pipelines.
Limitation: the internal structure of a claim is barely modeled.

Evidence Reasoning

Recent work explores multi-hop reasoning and evidence aggregation.
However, reasoning is still largely sentence-based.
Our approach introduces epistemic graph aggregation.

Linguistically Grounded NLP

Prior work explores dependency parsing, semantic role labeling, and discourse analysis.
These techniques are rarely applied to fact verification tasks.
We integrate sentence structure and epistemic operators directly into claim modeling.

Graph-based NLP

Graph neural networks and knowledge graphs have been used for reasoning tasks.
Our work differs by modeling epistemic relations rather than entity relations.

Slide 12 / 24

Related Work Positioning (FEVER)

Connecting Evidence Retrieval, Verification Pipelines, and Epistemic Graph Modeling

FEVER Related Work Evidence Reasoning

Research Context

Fact verification research has largely focused on improving retrieval pipelines and NLI-based classification.
However, real-world verification often requires aggregating multiple evidence signals with different credibility and stance.
Our work introduces an epistemic graph perspective to represent these relationships.

FEVER-style Verification Pipelines

Most systems follow a 3-stage pipeline: retrieval → evidence selection → NLI classification.
These systems treat evidence independently rather than as a structured reasoning graph.

Multi-evidence Reasoning

Recent work attempts to combine multiple evidence sentences.
Our framework models evidence interactions through claim–evidence graphs.

Explainable Fact Verification

Explainability is typically limited to highlighting supporting evidence.
We propose rationale subgraphs representing epistemic reasoning paths.

Our Position

We extend FEVER-style verification with epistemic state modeling.
This allows capturing belief, uncertainty, contradiction ratio, and evidence density.

Slide 13 / 24

ACL Paper Introduction (Full 1‑Page Draft)

Linguistically Grounded Epistemic Modeling for Claim Verification

ACL Paper Introduction Draft

Draft Introduction

Recent advances in natural language processing have significantly improved automated fact verification and misinformation detection. Most existing approaches treat claim verification as a sentence‑pair classification problem, where a claim and a candidate evidence sentence are encoded and evaluated using neural entailment models. While effective in controlled benchmarks, such approaches assume that claims can be represented as flat sentences and that evidence reasoning occurs independently across sentence pairs. However, real‑world claims often contain complex linguistic structures—including attribution (“experts say”), modality (“may cause”), negation (“not proven”), and evidential markers (“according to reports”)—that fundamentally alter the epistemic meaning of the statement. Ignoring these structures can obscure what is actually being asserted, who is making the assertion, and how strongly the claim is expressed.

In this work, we argue that claim verification should be grounded in sentence structure and epistemic interpretation. We propose a linguistically informed framework that decomposes sentences into clause‑level proposition units and explicitly models epistemic operators such as attribution, modality, negation, and evidentiality. These elements are combined into structured Claim Assessment Units (CAUs) that capture not only the semantic content of a claim but also its epistemic context. By representing claims and evidence within a structured claim–evidence graph, our framework enables richer reasoning over multiple pieces of evidence and supports the computation of epistemic state variables such as belief strength, uncertainty, contradiction ratio, and evidence density.

We evaluate the proposed approach on misinformation and fact‑verification datasets and compare it against strong sentence‑level baselines. Our experiments show that incorporating sentence structure and epistemic operators improves evidence reasoning and claim verification performance, while also enabling calibrated uncertainty estimates and interpretable rationale subgraphs. Furthermore, we introduce an annotation schema and dataset design tailored for epistemic claim modeling, which facilitates future research on linguistically grounded verification. Overall, this work demonstrates that integrating linguistic structure with epistemic reasoning provides a promising direction for more robust and interpretable fact verification systems.

Slide 14 / 24

Figure 1 — Paper Overview

Sentence Structure → Epistemic Modeling → Claim Verification

ACL Figure 1 System Overview

Overview of the Proposed Framework

The system models claim verification as a pipeline that moves from linguistic structure to epistemic reasoning.
Sentence structure is decomposed into clause and proposition units.
Epistemic operators such as attribution, modality, negation, and evidentiality modify the epistemic interpretation.
The resulting Claim Assessment Units are linked with evidence to form a claim–evidence graph.
Finally, epistemic states derived from this graph guide the verification decision.

Sentence

  ↓

Clause Graph

  ↓

Proposition Units

  ↓

Epistemic Operators

  ↓

Claim Assessment Unit (CAU)

  ↓

Claim–Evidence Graph

  ↓

Epistemic State (belief / uncertainty / contradiction)

  ↓

Verification Decision

Slide 15 / 24

Figure 1 Deep Dive (1/7)

Sentence → Clause Graph

Figure 1 Explanation Stage 1

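Why Not Embed the Sentence Directly

Existing fact verification usually encodes the whole sentence as a single vector, but the actual claim often sits inside a subordinate, reporting, or embedded clause.
The goal of the Sentence stage is to decompose the sentence into a clause-level semantic structure, in preparation for finding “what is actually being claimed”.

Definition of the Clause Graph

A Clause Graph represents a sentence as a graph of clause-level propositions. Each node represents a clause, and each edge represents a relation such as attribution, condition, contrast, or complement.

Experts say that vaccines may cause side effects

[Clause 1] Experts say
    |
    | attribution
    v
[Clause 2] vaccines may cause side effects

Key Roles

Identify the clause that contains the claim
Separate the reporting clause from the proposition clause
Prepare the scope of negation / modality / attribution
Prevent the loss of meaning caused by sentence-level flattening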
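
The clause graph for the example sentence could be held in a structure like the one below. The `ClauseGraph` class and its method names are hypothetical, and the clauses are entered by hand here; in the pipeline they would come from the parser.

```python
from dataclasses import dataclass, field


@dataclass
class ClauseGraph:
    clauses: dict[int, str] = field(default_factory=dict)
    # Each edge is (head clause id, relation, dependent clause id).
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_clause(self, clause_id: int, text: str) -> None:
        self.clauses[clause_id] = text

    def link(self, head: int, relation: str, dependent: int) -> None:
        self.edges.append((head, relation, dependent))

    def dependents(self, head: int, relation: str) -> list[str]:
        """Texts of the clauses attached to `head` by the given relation."""
        return [self.clauses[d] for h, r, d in self.edges if h == head and r == relation]


graph = ClauseGraph()
graph.add_clause(1, "Experts say")
graph.add_clause(2, "vaccines may cause side effects")
graph.link(1, "attribution", 2)
print(graph.dependents(1, "attribution"))
```

Storing the relation on the edge keeps the reporting clause and the proposition clause as separate nodes, so later stages can evaluate Clause 2 on its own.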

Slide 16 / 24

Figure 1 Deep Dive (2/7)

Clause Graph → Proposition Units

Figure 1 Explanation Stage 2

Not the Whole Sentence but the Proposition Becomes the Claim Candidate

Verifiable proposition units are extracted from the Clause Graph.
The goal of this stage is to isolate the content inside the sentence that is actually subject to a true/false judgment.

Example

It is not proven that vaccines cause autism
Clause 1: it is not proven
Clause 2: vaccines cause autism
Proposition unit: [vaccines cause autism]

Here Clause 2 is the claim candidate, and Clause 1 acts as an epistemic modifier.

Key Roles

Separate the part under evaluation from the main and subordinate clauses
Distinguish claim candidates from context candidates
Avoid mistaking the whole sentence for the claim
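
A rule-based sketch of this extraction step, assuming the clauses and their relations have already been identified. The tiny lexicon of epistemic frames and the `complement` relation label are assumptions for the example.

```python
# Assumed mini-lexicon of main clauses that only frame an embedded proposition.
EPISTEMIC_FRAMES = {"it is not proven", "experts say", "it is reported"}


def extract_propositions(clauses, edges):
    """Turn complements of epistemic frames into claim candidates.

    clauses: {clause_id: text}; edges: (head_id, relation, dependent_id) triples.
    """
    results = []
    for head, relation, dependent in edges:
        if relation == "complement" and clauses[head].lower() in EPISTEMIC_FRAMES:
            results.append({
                "proposition": clauses[dependent],
                "modifier": clauses[head],  # kept as an epistemic modifier
                "role": "claim_candidate",
            })
    return results


clauses = {1: "it is not proven", 2: "vaccines cause autism"}
edges = [(1, "complement", 2)]
propositions = extract_propositions(clauses, edges)
print(propositions)
```

The frame clause is not discarded: it travels with the proposition as a modifier, so the negation in "not proven" can still be applied when the CAU is composed.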

Slide 17 / 24

Figure 1 Deep Dive (3/7)

Proposition Units → Epistemic Operators

Figure 1 Explanation Stage 3

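A Proposition Does Not Become a Claim As Is

The same proposition carries a different epistemic force depending on attribution, modality, negation, and evidentiality.
An epistemic operator bundle must therefore be attached to the proposition before it is interpreted.

Attribution

“Experts say ...”

Separates the speaker from the claimed content and lowers claim strength relative to a direct assertion.

Modality / Hedge

“may”, “might”, “~일 수 있다” (may be)

As markers of possibility or speculation, they lower belief and raise uncertainty.

Negation / Evidentiality

“not proven”, “according to reports”

Adjusts the interpretation of the proposition to reflect negation scope and information source.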

Slide 18 / 24

Figure 1 Deep Dive (4/7)

Epistemic Operators → Claim Assessment Unit (CAU)

Figure 1 Explanation Stage 4

The CAU Is the Basic Computational Unit of Verification

A CAU is an assessable claim object that combines a proposition, its operators, and source context.
Belief, uncertainty, and rationale can be computed only when evaluation is done per CAU rather than per sentence.

CAU Composition Example

CAU
├ claim: vaccines cause side effects
├ attribution: experts say
├ modality: may
└ source context

Outputs

belief score
uncertainty
stance-ready representation
rationale-ready object

Slide 19 / 24

Figure 1 Deep Dive (5/7)

CAU → Claim–Evidence Graph

Figure 1 Explanation Stage 5

Now Connect Claims and Evidence

Multiple pieces of evidence are linked around a CAU to form the claim–evidence graph.
Each edge here is an epistemic relation and denotes support / contradict / mixed / uncertain.

Example Graph

Claim: vaccines cause infertility
Evidence 1: WHO report → contradict
Evidence 2: blog post → support
Evidence 3: news article → uncertain

Key Roles

Multi-evidence reasoning
Aggregate support and contradiction together
Incorporate source credibility
Provide the basis for generating rationale subgraphs

Slide 20 / 24

Figure 1 Deep Dive (6/7)

Claim–Evidence Graph → Epistemic State → Verification

Figure 1 Explanation Stage 6

Summarize the Graph into Final Decision Variables

From the claim–evidence graph, compute an epistemic state that summarizes the claim's current state of knowledge.
The final verification decision is based on this epistemic state, not on a sentence embedding.

Epistemic State

belief_score
uncertainty
contradiction_ratio
evidence_density

Verification Output

SUPPORTED
REFUTED
NOT ENOUGH INFO

Additional Outputs

calibrated uncertainty
rationale subgraph
explainable evidence summary
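
The mapping from epistemic state to a verification label could be as simple as a thresholded rule. The thresholds below are placeholder assumptions; in practice they would be tuned on validation data or replaced by a learned classifier.

```python
def decide(state, belief_hi=0.7, belief_lo=0.3, max_uncertainty=0.5):
    """Map an epistemic state to SUPPORTED / REFUTED / NOT ENOUGH INFO."""
    if state["uncertainty"] > max_uncertainty:
        return "NOT ENOUGH INFO"  # too little decided evidence to commit
    if state["belief_score"] >= belief_hi:
        return "SUPPORTED"
    if state["belief_score"] <= belief_lo:
        return "REFUTED"
    return "NOT ENOUGH INFO"      # evidence is present but conflicting


print(decide({"belief_score": 0.1, "uncertainty": 0.2}))
print(decide({"belief_score": 0.9, "uncertainty": 0.8}))
```

Checking uncertainty first means a high belief score built on very little evidence still yields NOT ENOUGH INFO.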

Slide 21 / 24

Figure 1 Summary

Why This Pipeline Is Different from Standard Sentence-Level Verification

Figure 1 Explanation Summary

Key Differences

Existing approach: sentence → embedding → NLI
Proposed approach: sentence → clause graph → proposition → epistemic operators → CAU → claim–evidence graph → epistemic state → verification
In other words, verification is redefined from simple entailment classification to linguistically grounded epistemic reasoning.

Standard: sentence → embedding → NLI classifier
Proposed: sentence → clause graph → proposition units → epistemic operators → Claim Assessment Unit → claim–evidence graph → epistemic state → verification decision

Slide 22 / 24

Paper-Style Figure 1 Design

A paper-ready diagram design that ACL reviewers can grasp intuitively

Figure 1 Paper Diagram

Input Sentence

Complex claim sentence with attribution, modality, negation, and evidential markers

→

Sentence Structure Parser

Dependency / constituency parsing, clause segmentation, scope resolution

Clause Graph

Clause-level nodes linked by attribution, condition, complement, or contrast

→

Proposition Extractor

Extract verifiable proposition units from clause graph

Epistemic Operator Layer

Attach attribution, modality, negation, evidentiality to propositions

→

Claim Assessment Unit (CAU)

Structured claim object for belief, uncertainty, and rationale computation

Evidence Retriever & Linker

Retrieve candidate evidence and link them to CAUs with stance edges

→

Claim–Evidence Graph

Graph over claims and evidence with support / contradict / uncertain relations

Epistemic State Aggregator

Aggregate belief, uncertainty, contradiction ratio, evidence density, credibility

→

Verification Output

SUPPORTED / REFUTED / NEI + calibrated uncertainty + rationale subgraph

Recommended caption: Overview of the proposed linguistically grounded epistemic verification framework. The system transforms an input sentence into clause-level proposition structures, augments them with epistemic operators, constructs Claim Assessment Units (CAUs), links them to external evidence, and aggregates the resulting claim–evidence graph into epistemic states for final verification.

Slide 23 / 24

Research Plan → System Module Mapping Table

An execution architecture that maps paper/research concepts onto implementable service modules

System Mapping Implementation

Mapping Principles

Each concept in the research plan is decomposed into an independent service or internal engine module so that it can be implemented rather than left as abstract theory.
Priority order: ① immediately implementable ② core of the verification pipeline ③ explainability / operational refinement ④ long-term research topics.
The data flow between modules keeps Sentence → Clause/Proposition → Operator → CAU → Evidence Graph → Epistemic State as its reference axis.

System module	Research concept	Main role	Input / Output	Priority	Notes
sentence_structure_parser	Sentence Structure Parsing Layer	Sentence splitting, dependency/constituency parsing, clause boundary detection, generation of scope candidates	Input: raw text; Output: sentence_tree, clause_graph, operator_scope_map	Immediate	Core preprocessing engine
clause_graph_builder	Clause Graph	Builds a graph of main/subordinate/reporting clause relations and assigns edges such as attribution, condition, and contrast	Input: parse result; Output: clause_graph	Immediate	Key ACL novelty
proposition_extractor	Proposition Units	Extracts verifiable clause-level propositions; distinguishes claim candidates from context candidates	Input: clause_graph; Output: proposition_units, normalized_proposition	Immediate	Rules and parsing can be combined
epistemic_operator_tagger	Epistemic Operators	Detects attribution, modality, negation, evidentiality, and condition/concession and attaches their scope	Input: proposition, scope map; Output: operator_bundle, epistemic modifiers	Immediate	Strong demo value
cau_builder	Claim Assessment Unit (CAU)	Integrates proposition + operator + source context into the basic claim assessment object	Input: proposition, operators, metadata; Output: cau_id, claim_repr, belief_prior, uncertainty_prior	Immediate	Central DB object
evidence_retriever	Evidence Retrieval	Retrieves external evidence for a claim, ranks it by relevance, and collects source metadata	Input: CAU; Output: evidence candidates, retrieval score	Core	Combines RAG and search
stance_classifier	Support / Contradict Reasoning	Decides whether evidence supports, contradicts, or is uncertain about the claim	Input: CAU + evidence; Output: stance label, confidence	Core	Directly tied to FEVER
claim_evidence_graph_builder	Claim–Evidence Graph	Builds claim/evidence nodes and support, contradict, and uncertain edges	Input: CAU, evidence, stance; Output: graph json / graph tables	Core	Visualization-friendly
epistemic_state_aggregator	Epistemic State	Computes summaries of belief, uncertainty, contradiction ratio, evidence density, and credibility	Input: claim–evidence graph; Output: epistemic snapshot	Core	Can be exposed as dashboard metrics
verification_explainer	Rationale / Explanation	Generates an explanation of the final verdict's basis, key evidence, and operator effects	Input: snapshot, rationale graph; Output: explanation text, rationale subgraph	Refinement	Important for XAI and public-sector practice
claim_workspace_ui	Productization Layer	Provides the operational screen where analysts review claims, evidence, stance, and uncertainty	Input: graph/snapshot/API; Output: review UI, audit log	Refinement	Key for pilots and demos
causal_bridge_interface	Epistemic-to-Causal Interface	Converts the epistemic snapshot into a feature vector for spread/impact prediction	Input: belief, uncertainty, contradiction ratio, etc.; Output: causal input vector	Long-term	Layer connecting research and operations
propagation_modeler	Causal Graph / Risk Modeling	Estimates the causal structure of spread, exposure, and behavioral change and identifies risk drivers	Input: causal input vector, time-series; Output: causal DAG, risk drivers	Long-term	Requires accumulated data
intervention_simulator	Intervention & Simulation	Simulates the effect of interventions such as fact-checks, labels, and counter-messages from a do-operator perspective	Input: causal DAG, action plan; Output: expected_risk_delta, CI, recommended_actions	Long-term	Suited to the second half of project year 2

Slide 24 / 24