
Table 7 Influence of large language models (LLaMA 2 and ChatGPT) as second-stage retrievers that re-rank the top candidate claims retrieved by BM25 and UTDRM. UTDRM denotes the default UTDRM-MPNet performance from Table 3. UTDRM+ChatGPT signifies that UTDRM-MPNet performs the initial ranking and ChatGPT conducts the second-stage re-ranking. The highest scores for each dataset and metric are in bold.

From: UTDRM: unsupervised method for training debunked-narrative retrieval models
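As a rough illustration of the two-stage setup evaluated in this table, the sketch below pairs a generic MPNet sentence encoder (standing in for UTDRM-MPNet) with a stubbed LLM re-ranker. The `llm_rerank` helper and its prompt are illustrative assumptions, not the paper's actual prompting setup; the LLM call itself is left as a placeholder.

```python
# Minimal sketch of first-stage dense retrieval followed by LLM re-ranking,
# assuming a hypothetical llm_rerank() stub in place of LLaMA 2 / ChatGPT.
from sentence_transformers import SentenceTransformer, util


def first_stage_retrieve(query, claims, model, k=10):
    """Dense first-stage retrieval: embed query and claims, rank by cosine."""
    q_emb = model.encode(query, convert_to_tensor=True)
    c_embs = model.encode(claims, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, c_embs)[0]
    ranked = sorted(zip(claims, scores.tolist()), key=lambda x: x[1], reverse=True)
    return [claim for claim, _ in ranked[:k]]


def llm_rerank(query, candidates):
    """Hypothetical second-stage re-ranker. In the paper this role is played
    by LLaMA 2 or ChatGPT; here we only build a prompt and return the
    candidates unchanged as a placeholder."""
    prompt = (
        f"Query tweet: {query}\n"
        "Rank the following fact-checked claims by relevance:\n"
        + "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    )
    # response = call_your_llm(prompt)  # hypothetical LLM call
    return candidates  # placeholder: keep first-stage order


model = SentenceTransformer("all-mpnet-base-v2")
claims = ["5G towers spread COVID-19.", "Vaccines contain microchips."]
top_k = first_stage_retrieve("Does 5G cause coronavirus?", claims, model, k=2)
print(llm_rerank("Does 5G cause coronavirus?", top_k))
```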

| Datasets | Metrics | BM25+LLaMA2 | UTDRM+LLaMA2 | BM25+ChatGPT | UTDRM+ChatGPT | UTDRM |
|---|---|---|---|---|---|---|
| Snopes | MAP@1 | 0.460 | 0.657 | 0.667 | **0.841** | 0.831 |
| | MRR | 0.659 | 0.728 | 0.862 | **0.890** | **0.890** |
| CLEF 22 2A-EN | MAP@1 | 0.794 | 0.890 | 0.895 | 0.919 | **0.933** |
| | MRR | 0.835 | 0.913 | 0.916 | 0.936 | **0.948** |
| CLEF 21 2A-EN | MAP@1 | 0.782 | 0.911 | 0.906 | **0.926** | 0.906 |
| | MRR | 0.836 | 0.939 | 0.927 | **0.949** | 0.936 |
| CLEF 20 2A-EN | MAP@1 | 0.673 | 0.729 | 0.849 | 0.925 | **0.945** |
| | MRR | 0.724 | 0.762 | 0.894 | 0.948 | **0.961** |
| Average Twitter-based | MAP@1 | 0.679 | 0.819 | 0.822 | 0.895 | **0.904** |
| | MRR | 0.777 | 0.860 | 0.902 | 0.925 | **0.934** |
| Politifact | MAP@1 | 0.260 | 0.293 | 0.512 | **0.561** | 0.516 |
| | MRR | 0.333 | 0.417 | 0.607 | **0.680** | 0.627 |
| CLEF 22 2B-EN | MAP@1 | 0.285 | 0.346 | **0.400** | **0.400** | 0.392 |
| | MRR | 0.383 | 0.426 | 0.486 | **0.493** | 0.467 |
| CLEF 21 2B-EN | MAP@1 | 0.266 | 0.310 | **0.361** | **0.361** | 0.348 |
| | MRR | 0.347 | 0.388 | **0.445** | 0.425 | 0.422 |
| Average Political-based | MAP@1 | 0.270 | 0.316 | 0.424 | **0.441** | 0.419 |
| | MRR | 0.354 | 0.411 | 0.513 | **0.533** | 0.505 |
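For reference, the two metrics reported above reduce to simple computations under binary relevance: MAP@1 is the fraction of queries whose top-ranked candidate is a correct debunk, and MRR averages the reciprocal rank of the first correct debunk. A minimal sketch with toy relevance judgements (the data here is illustrative, not from the paper):

```python
def mrr(ranked_relevance):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant item."""
    total = 0.0
    for rels in ranked_relevance:
        rr = 0.0
        for rank, rel in enumerate(rels, start=1):
            if rel:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_relevance)


def map_at_1(ranked_relevance):
    """MAP@1: with a cutoff of 1, this reduces to the fraction of queries
    whose top-ranked candidate is relevant."""
    hits = sum(1.0 for rels in ranked_relevance if rels and rels[0])
    return hits / len(ranked_relevance)


# Three toy queries; each list marks relevance of the ranked candidates.
runs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(map_at_1(runs))  # 1/3 ~= 0.333
print(mrr(runs))       # (1 + 1/2 + 1/3) / 3 ~= 0.611
```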