NitiBench: A Comprehensive Study of LLM Framework Capabilities for Thai Legal Question Answering, by Pawitsapak Akarajaradwong and 6 other authors
Abstract: The application of large language models (LLMs) in the legal domain holds significant potential for information retrieval and question answering, yet Thai legal QA systems face challenges due to a lack of standardized evaluation benchmarks and the complexity of Thai legal structures. This paper introduces NitiBench, a benchmark comprising two datasets: NitiBench-CCL, covering general Thai financial law, and NitiBench-Tax, which includes real-world tax law cases requiring advanced legal reasoning. We evaluate retrieval-augmented generation (RAG) and long-context LLM-based approaches to address three key research questions: the impact of domain-specific components such as section-based chunking and cross-referencing, the comparative performance of different retrievers and LLMs, and the viability of long-context LLMs as an alternative to RAG. Our results show that section-based chunking significantly improves retrieval and end-to-end performance, that current retrievers struggle with complex queries, and that long-context LLMs still underperform RAG-based systems in Thai legal QA. To support fair evaluation, we propose tailored multi-label retrieval metrics and an LLM-as-judge method for coverage and contradiction detection. These findings highlight the limitations of current Thai legal NLP solutions and provide a foundation for future research in the field. We have also made our code and datasets publicly available.
Submission history
From: Pawitsapak Akarajaradwong
[v1] Sat, 15 Feb 2025 17:52:14 UTC (2,800 KB)
[v2] Tue, 4 Mar 2025 06:45:23 UTC (2,802 KB)
[v3] Sat, 8 Mar 2025 05:11:53 UTC (2,803 KB)
[v4] Thu, 21 Aug 2025 21:51:12 UTC (1,419 KB)