APPLICATION OF A LARGE LANGUAGE MODEL FOR AIR POLLUTION ANALYSIS BY USING A RAG SYSTEM
DOI: https://doi.org/10.17770/etr2025vol2.8603
Keywords: Air pollution, Artificial Intelligence, LLM, RAG System
Abstract
The paper examines the application of an LLM (Large Language Model) for analysing textual information about air pollution using a RAG (Retrieval-Augmented Generation) system. It describes what RAG systems are and how they increase the credibility of text generated by LLM models (reducing their hallucinations) by retrieving information from reliable sources. An approach to developing a RAG system capable of answering queries about text documents imported by the user is presented. The approach is realised as a RAG chatbot, implemented in Python, that answers queries (questions) against a database of user-uploaded text documents. Results from experiments on a selected database of scientific publications related to air pollution are presented.
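To illustrate the retrieve-then-generate loop the abstract describes, the sketch below pairs a toy bag-of-words retriever with a placeholder generation step. It is a minimal Python sketch under stated assumptions: the document snippets, the cosine-similarity scoring, and the generate() stub are illustrative only and do not represent the authors' actual chatbot, which relies on an LLM and its own retrieval components.

# Minimal sketch of a retrieve-then-generate (RAG) loop over user-uploaded texts.
# The documents, scoring scheme, and generate() stub are illustrative assumptions,
# not the implementation described in the paper.
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (math.sqrt(sum(v * v for v in a.values())) *
           math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

# "Database" of user-uploaded text documents (hypothetical content).
documents = [
    "Particulate matter concentrations rise during the winter heating season.",
    "Nitrogen dioxide levels near busy roads exceed recommended limits.",
    "Ozone formation is driven by sunlight acting on precursor emissions.",
]
doc_vectors = [Counter(tokenize(d)) for d in documents]

def retrieve(query, k=2):
    """Return the k documents whose word vectors are most similar to the query."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(zip(documents, doc_vectors),
                    key=lambda pair: cosine(q_vec, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query, context):
    """Placeholder for the LLM call: a real chatbot would send this prompt to a model."""
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # an actual system would return the model's completion

query = "Which pollutant increases in winter?"
context = "\n".join(retrieve(query))
print(generate(query, context))

In a full system the retriever would typically use dense embeddings and the prompt would be passed to an LLM; the bag-of-words scoring here only stands in for that retrieval step.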
License
Copyright (c) 2025 Svetlomir Stankov, Desislava Velinova

This work is licensed under a Creative Commons Attribution 4.0 International License.