1 FastRM: An efficient and automatic explainability framework for multimodal generative models While Large Vision Language Models (LVLMs) have become masterly capable in reasoning over human prompts and visual inputs, they are still prone to producing responses that contain misinformation. Identifying incorrect responses that are not grounded in evidence has become a crucial task in building trustworthy AI. Explainability methods such as gradient-based relevancy maps on LVLM outputs can provide an insight on the decision process of models, however these methods are often computationally expensive and not suited for on-the-fly validation of outputs. In this work, we propose FastRM, an effective way for predicting the explainable Relevancy Maps of LVLM models. Experimental results show that employing FastRM leads to a 99.8% reduction in compute time for relevancy map generation and an 44.4% reduction in memory footprint for the evaluated LVLM, making explainable AI more efficient and practical, thereby facilitating its deployment in real-world applications. 8 authors · Dec 2, 2024
- Explainability Paths for Sustained Artistic Practice with AI The development of AI-driven generative audio mirrors broader AI trends, often prioritizing immediate accessibility at the expense of explainability. Consequently, integrating such tools into sustained artistic practice remains a significant challenge. In this paper, we explore several paths to improve explainability, drawing primarily from our research-creation practice in training and implementing generative audio models. As practical provisions for improved explainability, we highlight human agency over training materials, the viability of small-scale datasets, the facilitation of the iterative creative process, and the integration of interactive machine learning as a mapping tool. Importantly, these steps aim to enhance human agency over generative AI systems not only during model inference, but also when curating and preprocessing training data as well as during the training phase of models. 3 authors · Jul 21, 2024
- Explainability as statistical inference A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground truth selection which allow for the evaluation of the features importance map. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations. 4 authors · Dec 6, 2022
- FOLD-SE: An Efficient Rule-based Machine Learning Algorithm with Scalable Explainability We present FOLD-SE, an efficient, explainable machine learning algorithm for classification tasks given tabular data containing numerical and categorical values. FOLD-SE generates a set of default rules-essentially a stratified normal logic program-as an (explainable) trained model. Explainability provided by FOLD-SE is scalable, meaning that regardless of the size of the dataset, the number of learned rules and learned literals stay quite small while good accuracy in classification is maintained. A model with smaller number of rules and literals is easier to understand for human beings. FOLD-SE is competitive with state-of-the-art machine learning algorithms such as XGBoost and Multi-Layer Perceptrons (MLP) wrt accuracy of prediction. However, unlike XGBoost and MLP, the FOLD-SE algorithm is explainable. The FOLD-SE algorithm builds upon our earlier work on developing the explainable FOLD-R++ machine learning algorithm for binary classification and inherits all of its positive features. Thus, pre-processing of the dataset, using techniques such as one-hot encoding, is not needed. Like FOLD-R++, FOLD-SE uses prefix sum to speed up computations resulting in FOLD-SE being an order of magnitude faster than XGBoost and MLP in execution speed. The FOLD-SE algorithm outperforms FOLD-R++ as well as other rule-learning algorithms such as RIPPER in efficiency, performance and scalability, especially for large datasets. A major reason for scalable explainability of FOLD-SE is the use of a literal selection heuristics based on Gini Impurity, as opposed to Information Gain used in FOLD-R++. A multi-category classification version of FOLD-SE is also presented. 2 authors · Aug 16, 2022 1
- Rethinking Explainability as a Dialogue: A Practitioner's Perspective As practitioners increasingly deploy machine learning models in critical domains such as health care, finance, and policy, it becomes vital to ensure that domain experts function effectively alongside these models. Explainability is one way to bridge the gap between human decision-makers and machine learning models. However, most of the existing work on explainability focuses on one-off, static explanations like feature importances or rule lists. These sorts of explanations may not be sufficient for many use cases that require dynamic, continuous discovery from stakeholders. In the literature, few works ask decision-makers about the utility of existing explanations and other desiderata they would like to see in an explanation going forward. In this work, we address this gap and carry out a study where we interview doctors, healthcare professionals, and policymakers about their needs and desires for explanations. Our study indicates that decision-makers would strongly prefer interactive explanations in the form of natural language dialogues. Domain experts wish to treat machine learning models as "another colleague", i.e., one who can be held accountable by asking why they made a particular decision through expressive and accessible natural language interactions. Considering these needs, we outline a set of five principles researchers should follow when designing interactive explanations as a starting place for future work. Further, we show why natural language dialogues satisfy these principles and are a desirable way to build interactive explanations. Next, we provide a design of a dialogue system for explainability and discuss the risks, trade-offs, and research opportunities of building these systems. Overall, we hope our work serves as a starting place for researchers and engineers to design interactive explainability systems. 5 authors · Feb 3, 2022
- Explainability in Deep Reinforcement Learning A large set of the explainable Artificial Intelligence (XAI) literature is emerging on feature relevance techniques to explain a deep neural network (DNN) output or explaining models that ingest image source data. However, assessing how XAI techniques can help understand models beyond classification tasks, e.g. for reinforcement learning (RL), has not been extensively studied. We review recent works in the direction to attain Explainable Reinforcement Learning (XRL), a relatively new subfield of Explainable Artificial Intelligence, intended to be used in general public applications, with diverse audiences, requiring ethical, responsible and trustable algorithms. In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box. We evaluate mainly studies directly linking explainability to RL, and split these into two categories according to the way the explanations are generated: transparent algorithms and post-hoc explainaility. We also review the most prominent XAI works from the lenses of how they could potentially enlighten the further deployment of the latest advances in RL, in the demanding present and future of everyday problems. 3 authors · Aug 15, 2020
1 From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation Deep reinforcement learning (RL) has shown remarkable success in complex domains, however, the inherent black box nature of deep neural network policies raises significant challenges in understanding and trusting the decision-making processes. While existing explainable RL methods provide local insights, they fail to deliver a global understanding of the model, particularly in high-stakes applications. To overcome this limitation, we propose a novel model-agnostic approach that bridges the gap between explainability and interpretability by leveraging Shapley values to transform complex deep RL policies into transparent representations. The proposed approach offers two key contributions: a novel approach employing Shapley values to policy interpretation beyond local explanations and a general framework applicable to off-policy and on-policy algorithms. We evaluate our approach with three existing deep RL algorithms and validate its performance in two classic control environments. The results demonstrate that our approach not only preserves the original models' performance but also generates more stable interpretable policies. 3 authors · Jan 16
- When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing In the study of trustworthy Natural Language Processing (NLP), a number of important research fields have emerged, including that of explainability and privacy. While research interest in both explainable and privacy-preserving NLP has increased considerably in recent years, there remains a lack of investigation at the intersection of the two. This leaves a considerable gap in understanding of whether achieving both explainability and privacy is possible, or whether the two are at odds with each other. In this work, we conduct an empirical investigation into the privacy-explainability trade-off in the context of NLP, guided by the popular overarching methods of Differential Privacy (DP) and Post-hoc Explainability. Our findings include a view into the intricate relationship between privacy and explainability, which is formed by a number of factors, including the nature of the downstream task and choice of the text privatization and explainability method. In this, we highlight the potential for privacy and explainability to co-exist, and we summarize our findings in a collection of practical recommendations for future work at this important intersection. 5 authors · Aug 14 2
- Improving Explainability of Sentence-level Metrics via Edit-level Attribution for Grammatical Error Correction Various evaluation metrics have been proposed for Grammatical Error Correction (GEC), but many, particularly reference-free metrics, lack explainability. This lack of explainability hinders researchers from analyzing the strengths and weaknesses of GEC models and limits the ability to provide detailed feedback for users. To address this issue, we propose attributing sentence-level scores to individual edits, providing insight into how specific corrections contribute to the overall performance. For the attribution method, we use Shapley values, from cooperative game theory, to compute the contribution of each edit. Experiments with existing sentence-level metrics demonstrate high consistency across different edit granularities and show approximately 70\% alignment with human evaluations. In addition, we analyze biases in the metrics based on the attribution results, revealing trends such as the tendency to ignore orthographic edits. Our implementation is available at https://github.com/naist-nlp/gec-attribute. 3 authors · Dec 17, 2024
- Utilizing Explainability Techniques for Reinforcement Learning Model Assurance Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Deep Reinforcement Learning (DRL) model and increase user trust and adoption in real-world use cases. By utilizing XRL techniques, researchers can identify potential vulnerabilities within a trained DRL model prior to deployment, therefore limiting the potential for mission failure or mistakes by the system. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and vulnerability analysis for a publicly available DRL model. The open-source code repository is available for download at https://github.com/mitre/arlin. 5 authors · Nov 27, 2023
- Evaluating explainability for machine learning predictions using model-agnostic metrics Rapid advancements in artificial intelligence (AI) technology have brought about a plethora of new challenges in terms of governance and regulation. AI systems are being integrated into various industries and sectors, creating a demand from decision-makers to possess a comprehensive and nuanced understanding of the capabilities and limitations of these systems. One critical aspect of this demand is the ability to explain the results of machine learning models, which is crucial to promoting transparency and trust in AI systems, as well as fundamental in helping machine learning models to be trained ethically. In this paper, we present novel metrics to quantify the degree of which AI model predictions can be easily explainable by its features. Our metrics summarize different aspects of explainability into scalars, providing a more comprehensive understanding of model predictions and facilitating communication between decision-makers and stakeholders, thereby increasing the overall transparency and accountability of AI systems. 4 authors · Feb 23, 2023
- ExpMRC: Explainability Evaluation for Machine Reading Comprehension Achieving human-level performance on some of Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, it is necessary to provide both answer prediction and its explanation to further improve the MRC system's reliability, especially for real-life applications. In this paper, we propose a new benchmark called ExpMRC for evaluating the explainability of the MRC systems. ExpMRC contains four subsets, including SQuAD, CMRC 2018, RACE^+, and C^3 with additional annotations of the answer's evidence. The MRC systems are required to give not only the correct answer but also its explanation. We use state-of-the-art pre-trained language models to build baseline systems and adopt various unsupervised approaches to extract evidence without a human-annotated training set. The experimental results show that these models are still far from human performance, suggesting that the ExpMRC is challenging. Resources will be available through https://github.com/ymcui/expmrc 5 authors · May 10, 2021
21 Multi-Domain Explainability of Preferences Preference mechanisms, such as human preference, LLM-as-a-Judge (LaaJ), and reward models, are central to aligning and evaluating large language models (LLMs). Yet, the underlying concepts that drive these preferences remain poorly understood. In this work, we propose a fully automated method for generating local and global concept-based explanations of preferences across multiple domains. Our method utilizes an LLM to identify concepts that distinguish between chosen and rejected responses, and to represent them with concept-based vectors. To model the relationships between concepts and preferences, we propose a white-box Hierarchical Multi-Domain Regression model that captures both domain-general and domain-specific effects. To evaluate our method, we curate a dataset spanning eight challenging and diverse domains and explain twelve mechanisms. Our method achieves strong preference prediction performance, outperforming baselines while also being explainable. Additionally, we assess explanations in two application-driven settings. First, guiding LLM outputs with concepts from LaaJ explanations yields responses that those judges consistently prefer. Second, prompting LaaJs with concepts explaining humans improves their preference predictions. Together, our work establishes a new paradigm for explainability in the era of LLMs. 3 authors · May 26 2
8 Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data Large multimodal models (LMMs) have shown impressive capabilities in a wide range of visual tasks. However, they often struggle with fine-grained visual reasoning, failing to identify domain-specific objectives and provide justifiable explanations for their predictions. To address this, we propose a novel visual rejection sampling framework to improve the cognition and explainability of LMMs using self-synthesized data. Specifically, visual fine-tuning requires images, queries, and target answers. Our approach begins by synthesizing interpretable answers that include human-verifiable visual features. These features are based on expert-defined concepts, carefully selected based on their alignment with the image content. After each round of fine-tuning, we apply a reward model-free filtering mechanism to select the highest-quality interpretable answers for the next round of tuning. This iterative process of data synthesis and fine-tuning progressively improves the model's ability to generate accurate and reasonable explanations. Experimental results demonstrate the effectiveness of our method in improving both the accuracy and explainability of specialized visual classification tasks. 5 authors · Feb 19 3
3 From Understanding to Utilization: A Survey on Explainability for Large Language Models This survey paper delves into the burgeoning field of explainability for Large Language Models (LLMs), a critical yet challenging aspect of natural language processing. With LLMs playing a pivotal role in various applications, their "black-box" nature raises concerns about transparency and ethical use. This paper emphasizes the necessity for enhanced explainability in LLMs, addressing both the general public's trust and the technical community's need for a deeper understanding of these models. We concentrate on pre-trained Transformer-based LLMs, such as LLaMA, which present unique interpretability challenges due to their scale and complexity. Our review categorizes existing explainability methods and discusses their application in improving model transparency and reliability. We also discuss representative evaluation methods, highlighting their strengths and limitations. The goal of this survey is to bridge the gap between theoretical understanding and practical application, offering insights for future research and development in the field of LLM explainability. 2 authors · Jan 23, 2024
2 Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods While research on applications and evaluations of explanation methods continues to expand, fairness of the explanation methods concerning disparities in their performance across subgroups remains an often overlooked aspect. In this paper, we address this gap by showing that, across three tasks and five language models, widely used post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. These disparities persist even when the models are pre-trained or fine-tuned on particularly unbiased datasets, indicating that the disparities we observe are not merely consequences of biased training data. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods, as these can lead to biased outcomes against certain subgroups, with particularly critical implications in high-stakes contexts. Furthermore, our findings underscore the importance of incorporating the fairness of explanations, alongside overall model fairness and explainability, as a requirement in regulatory frameworks. 4 authors · May 2
2 LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity Vision Transformers (ViTs), with their ability to model long-range dependencies through self-attention mechanisms, have become a standard architecture in computer vision. However, the interpretability of these models remains a challenge. To address this, we propose LeGrad, an explainability method specifically designed for ViTs. LeGrad computes the gradient with respect to the attention maps of ViT layers, considering the gradient itself as the explainability signal. We aggregate the signal over all layers, combining the activations of the last as well as intermediate tokens to produce the merged explainability map. This makes LeGrad a conceptually simple and an easy-to-implement tool for enhancing the transparency of ViTs. We evaluate LeGrad in challenging segmentation, perturbation, and open-vocabulary settings, showcasing its versatility compared to other SotA explainability methods demonstrating its superior spatial fidelity and robustness to perturbations. A demo and the code is available at https://github.com/WalBouss/LeGrad. 5 authors · Apr 4, 2024 1
2 SyntaxShap: Syntax-aware Explainability Method for Text Generation To harness the power of large language models in safety-critical domains we need to ensure the explainability of their predictions. However, despite the significant attention to model interpretability, there remains an unexplored domain in explaining sequence-to-sequence tasks using methods tailored for textual data. This paper introduces SyntaxShap, a local, model-agnostic explainability method for text generation that takes into consideration the syntax in the text data. The presented work extends Shapley values to account for parsing-based syntactic dependencies. Taking a game theoric approach, SyntaxShap only considers coalitions constraint by the dependency tree. We adopt a model-based evaluation to compare SyntaxShap and its weighted form to state-of-the-art explainability methods adapted to text generation tasks, using diverse metrics including faithfulness, complexity, coherency, and semantic alignment of the explanations to the model. We show that our syntax-aware method produces explanations that help build more faithful, coherent, and interpretable explanations for predictions by autoregressive models. 3 authors · Feb 14, 2024
2 TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery Temporal graphs are widely used to model dynamic systems with time-varying interactions. In real-world scenarios, the underlying mechanisms of generating future interactions in dynamic systems are typically governed by a set of recurring substructures within the graph, known as temporal motifs. Despite the success and prevalence of current temporal graph neural networks (TGNN), it remains uncertain which temporal motifs are recognized as the significant indications that trigger a certain prediction from the model, which is a critical challenge for advancing the explainability and trustworthiness of current TGNNs. To address this challenge, we propose a novel approach, called Temporal Motifs Explainer (TempME), which uncovers the most pivotal temporal motifs guiding the prediction of TGNNs. Derived from the information bottleneck principle, TempME extracts the most interaction-related motifs while minimizing the amount of contained information to preserve the sparsity and succinctness of the explanation. Events in the explanations generated by TempME are verified to be more spatiotemporally correlated than those of existing approaches, providing more understandable insights. Extensive experiments validate the superiority of TempME, with up to 8.21% increase in terms of explanation accuracy across six real-world datasets and up to 22.96% increase in boosting the prediction Average Precision of current TGNNs. 2 authors · Oct 30, 2023
1 Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability The development of effective explainability tools for Transformers is a crucial pursuit in deep learning research. One of the most promising approaches in this domain is Layer-wise Relevance Propagation (LRP), which propagates relevance scores backward through the network to the input space by redistributing activation values based on predefined rules. However, existing LRP-based methods for Transformer explainability entirely overlook a critical component of the Transformer architecture: its positional encoding (PE), resulting in violation of the conservation property, and the loss of an important and unique type of relevance, which is also associated with structural and positional features. To address this limitation, we reformulate the input space for Transformer explainability as a set of position-token pairs. This allows us to propose specialized theoretically-grounded LRP rules designed to propagate attributions across various positional encoding methods, including Rotary, Learnable, and Absolute PE. Extensive experiments with both fine-tuned classifiers and zero-shot foundation models, such as LLaMA 3, demonstrate that our method significantly outperforms the state-of-the-art in both vision and NLP explainability tasks. Our code is publicly available. 4 authors · Jun 2 3
1 Towards LLM-guided Causal Explainability for Black-box Text Classifiers With the advent of larger and more complex deep learning models, such as in Natural Language Processing (NLP), model qualities like explainability and interpretability, albeit highly desirable, are becoming harder challenges to tackle and solve. For example, state-of-the-art models in text classification are black-box by design. Although standard explanation methods provide some degree of explainability, these are mostly correlation-based methods and do not provide much insight into the model. The alternative of causal explainability is more desirable to achieve but extremely challenging in NLP due to a variety of reasons. Inspired by recent endeavors to utilize Large Language Models (LLMs) as experts, in this work, we aim to leverage the instruction-following and textual understanding capabilities of recent state-of-the-art LLMs to facilitate causal explainability via counterfactual explanation generation for black-box text classifiers. To do this, we propose a three-step pipeline via which, we use an off-the-shelf LLM to: (1) identify the latent or unobserved features in the input text, (2) identify the input features associated with the latent features, and finally (3) use the identified input features to generate a counterfactual explanation. We experiment with our pipeline on multiple NLP text classification datasets, with several recent LLMs, and present interesting and promising findings. 4 authors · Sep 23, 2023
- Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers In this study, we introduce the Multi-Head Explainer (MHEX), a versatile and modular framework that enhances both the explainability and accuracy of Convolutional Neural Networks (CNNs) and Transformer-based models. MHEX consists of three core components: an Attention Gate that dynamically highlights task-relevant features, Deep Supervision that guides early layers to capture fine-grained details pertinent to the target class, and an Equivalent Matrix that unifies refined local and global representations to generate comprehensive saliency maps. Our approach demonstrates superior compatibility, enabling effortless integration into existing residual networks like ResNet and Transformer architectures such as BERT with minimal modifications. Extensive experiments on benchmark datasets in medical imaging and text classification show that MHEX not only improves classification accuracy but also produces highly interpretable and detailed saliency scores. 2 authors · Jan 2
- Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper, we introduce novel explainability methods for state-of-the-art transformer-based audio deepfake detectors and open-source a novel benchmark for real-world generalizability. By narrowing the explainability gap between transformer-based audio deepfake detectors and traditional methods, our results not only build trust with human experts, but also pave the way for unlocking the potential of citizen intelligence to overcome the scalability issue in audio deepfake detection. 5 authors · Oct 9, 2024
- A Concept-Based Explainability Framework for Large Multimodal Models Large multimodal models (LMMs) combine unimodal encoders and large language models (LLMs) to perform multimodal tasks. Despite recent advancements towards the interpretability of these models, understanding internal representations of LMMs remains largely a mystery. In this paper, we present a novel framework for the interpretation of LMMs. We propose a dictionary learning based approach, applied to the representation of tokens. The elements of the learned dictionary correspond to our proposed concepts. We show that these concepts are well semantically grounded in both vision and text. Thus we refer to these as ``multi-modal concepts''. We qualitatively and quantitatively evaluate the results of the learnt concepts. We show that the extracted multimodal concepts are useful to interpret representations of test samples. Finally, we evaluate the disentanglement between different concepts and the quality of grounding concepts visually and textually. Our code is publicly available at https://github.com/mshukor/xl-vlms 5 authors · Jun 12, 2024
- What Sketch Explainability Really Means for Downstream Tasks In this paper, we explore the unique modality of sketch for explainability, emphasising the profound impact of human strokes compared to conventional pixel-oriented studies. Beyond explanations of network behavior, we discern the genuine implications of explainability across diverse downstream sketch-related tasks. We propose a lightweight and portable explainability solution -- a seamless plugin that integrates effortlessly with any pre-trained model, eliminating the need for re-training. Demonstrating its adaptability, we present four applications: highly studied retrieval and generation, and completely novel assisted drawing and sketch adversarial attacks. The centrepiece to our solution is a stroke-level attribution map that takes different forms when linked with downstream tasks. By addressing the inherent non-differentiability of rasterisation, we enable explanations at both coarse stroke level (SLA) and partial stroke level (P-SLA), each with its advantages for specific downstream tasks. 6 authors · Mar 14, 2024
- Fast and Simple Explainability for Point Cloud Networks We propose a fast and simple explainable AI (XAI) method for point cloud data. It computes pointwise importance with respect to a trained network downstream task. This allows better understanding of the network properties, which is imperative for safety-critical applications. In addition to debugging and visualization, our low computational complexity facilitates online feedback to the network at inference. This can be used to reduce uncertainty and to increase robustness. In this work, we introduce Feature Based Interpretability (FBI), where we compute the features' norm, per point, before the bottleneck. We analyze the use of gradients and post- and pre-bottleneck strategies, showing pre-bottleneck is preferred, in terms of smoothness and ranking. We obtain at least three orders of magnitude speedup, compared to current XAI methods, thus, scalable for big point clouds or large-scale architectures. Our approach achieves SOTA results, in terms of classification explainability. We demonstrate how the proposed measure is helpful in analyzing and characterizing various aspects of 3D learning, such as rotation invariance, robustness to out-of-distribution (OOD) outliers or domain shift and dataset bias. 2 authors · Mar 12, 2024
- On the Relationship Between Interpretability and Explainability in Machine Learning Interpretability and explainability have gained more and more attention in the field of machine learning as they are crucial when it comes to high-stakes decisions and troubleshooting. Since both provide information about predictors and their decision process, they are often seen as two independent means for one single end. This view has led to a dichotomous literature: explainability techniques designed for complex black-box models, or interpretable approaches ignoring the many explainability tools. In this position paper, we challenge the common idea that interpretability and explainability are substitutes for one another by listing their principal shortcomings and discussing how both of them mitigate the drawbacks of the other. In doing so, we call for a new perspective on interpretability and explainability, and works targeting both topics simultaneously, leveraging each of their respective assets. 2 authors · Nov 19, 2023
- Scene Text Recognition Models Explainability Using Local Features Explainable AI (XAI) is the study on how humans can be able to understand the cause of a model's prediction. In this work, the problem of interest is Scene Text Recognition (STR) Explainability, using XAI to understand the cause of an STR model's prediction. Recent XAI literatures on STR only provide a simple analysis and do not fully explore other XAI methods. In this study, we specifically work on data explainability frameworks, called attribution-based methods, that explain the important parts of an input data in deep learning models. However, integrating them into STR produces inconsistent and ineffective explanations, because they only explain the model in the global context. To solve this problem, we propose a new method, STRExp, to take into consideration the local explanations, i.e. the individual character prediction explanations. This is then benchmarked across different attribution-based methods on different STR datasets and evaluated across different STR models. 2 authors · Oct 14, 2023
- Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design Generative molecular design has moved from proof-of-concept to real-world applicability, as marked by the surge in very recent papers reporting experimental validation. Key challenges in explainability and sample efficiency present opportunities to enhance generative design to directly optimize expensive high-fidelity oracles and provide actionable insights to domain experts. Here, we propose Beam Enumeration to exhaustively enumerate the most probable sub-sequences from language-based molecular generative models and show that molecular substructures can be extracted. When coupled with reinforcement learning, extracted substructures become meaningful, providing a source of explainability and improving sample efficiency through self-conditioned generation. Beam Enumeration is generally applicable to any language-based molecular generative model and notably further improves the performance of the recently reported Augmented Memory algorithm, which achieved the new state-of-the-art on the Practical Molecular Optimization benchmark for sample efficiency. The combined algorithm generates more high reward molecules and faster, given a fixed oracle budget. Beam Enumeration shows that improvements to explainability and sample efficiency for molecular design can be made synergistic. 2 authors · Sep 25, 2023
- Exploring the Role of Explainability in AI-Assisted Embryo Selection In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryo for implantation, a process with large inter- and intra-clinician variability. Deep learning based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in the decision making is key. In this paper we analyze the current work in the explainability of AI-assisted embryo analysis models, identifying the limitations. We also discuss how these models could be integrated in the clinical context as decision support systems, considering the needs of clinicians and patients. Finally, we propose guidelines for the sake of increasing interpretability and trustworthiness, pushing this technology forward towards established clinical practice. 8 authors · Aug 1, 2023
- UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult to interpret by humans. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful to inspect the importance of each input token for the model's prediction, graph-based explanations from external Knowledge Graphs enable the users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease the research on trustworthy QA models. SQuARE is available on https://square.ukp-lab.de. 9 authors · Aug 19, 2022
- Xplique: A Deep Learning Explainability Toolbox Today's most advanced machine-learning models are hardly scrutable. The key challenge for explainability methods is to help assisting researchers in opening up these black boxes, by revealing the strategy that led to a given decision, by characterizing their internal states or by studying the underlying data representation. To address this challenge, we have developed Xplique: a software library for explainability which includes representative explainability methods as well as associated evaluation metrics. It interfaces with one of the most popular learning libraries: Tensorflow as well as other libraries including PyTorch, scikit-learn and Theano. The code is licensed under the MIT license and is freely available at github.com/deel-ai/xplique. 15 authors · Jun 9, 2022
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention mechanisms. These attention modules also play a role in other computer vision tasks including object detection and image segmentation. Unlike Transformers that only use self-attention, Transformers with co-attention require to consider multiple attention maps in parallel in order to highlight the information that is relevant to the prediction in the model's input. In this work, we propose the first method to explain prediction by any Transformer-based architecture, including bi-modal Transformers and Transformers with co-attentions. We provide generic solutions and apply these to the three most commonly used of these architectures: (i) pure self-attention, (ii) self-attention combined with co-attention, and (iii) encoder-decoder attention. We show that our method is superior to all existing methods which are adapted from single modality explainability. 3 authors · Mar 29, 2021
- Unsupervised Expressive Rules Provide Explainability and Assist Human Experts Grasping New Domains Approaching new data can be quite deterrent; you do not know how your categories of interest are realized in it, commonly, there is no labeled data at hand, and the performance of domain adaptation methods is unsatisfactory. Aiming to assist domain experts in their first steps into a new task over a new corpus, we present an unsupervised approach to reveal complex rules which cluster the unexplored corpus by its prominent categories (or facets). These rules are human-readable, thus providing an important ingredient which has become in short supply lately - explainability. Each rule provides an explanation for the commonality of all the texts it clusters together. We present an extensive evaluation of the usefulness of these rules in identifying target categories, as well as a user study which assesses their interpretability. 5 authors · Oct 19, 2020
- A Survey on Explainability in Machine Reading Comprehension This paper presents a systematic review of benchmarks and approaches for explainability in Machine Reading Comprehension (MRC). We present how the representation and inference challenges evolved and the steps which were taken to tackle these challenges. We also present the evaluation methodologies to assess the performance of explainable systems. In addition, we identify persisting open research questions and highlight critical directions for future work. 3 authors · Oct 1, 2020
3 Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era Explainable AI (XAI) refers to techniques that provide human-understandable insights into the workings of AI models. Recently, the focus of XAI is being extended towards Large Language Models (LLMs) which are often criticized for their lack of transparency. This extension calls for a significant transformation in XAI methodologies because of two reasons. First, many existing XAI methods cannot be directly applied to LLMs due to their complexity advanced capabilities. Second, as LLMs are increasingly deployed across diverse industry applications, the role of XAI shifts from merely opening the "black box" to actively enhancing the productivity and applicability of LLMs in real-world settings. Meanwhile, unlike traditional machine learning models that are passive recipients of XAI insights, the distinct abilities of LLMs can reciprocally enhance XAI. Therefore, in this paper, we introduce Usable XAI in the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems, and (2) how LLMs can contribute to the advancement of XAI. We introduce 10 strategies, introducing the key techniques for each and discussing their associated challenges. We also provide case studies to demonstrate how to obtain and leverage explanations. The code used in this paper can be found at: https://github.com/JacksonWuxs/UsableXAI_LLM. 11 authors · Mar 13, 2024
2 SketchXAI: A First Look at Explainability for Human Sketches This paper, for the very first time, introduces human sketches to the landscape of XAI (Explainable Artificial Intelligence). We argue that sketch as a ``human-centred'' data form, represents a natural interface to study explainability. We focus on cultivating sketch-specific explainability designs. This starts by identifying strokes as a unique building block that offers a degree of flexibility in object construction and manipulation impossible in photos. Following this, we design a simple explainability-friendly sketch encoder that accommodates the intrinsic properties of strokes: shape, location, and order. We then move on to define the first ever XAI task for sketch, that of stroke location inversion SLI. Just as we have heat maps for photos, and correlation matrices for text, SLI offers an explainability angle to sketch in terms of asking a network how well it can recover stroke locations of an unseen sketch. We offer qualitative results for readers to interpret as snapshots of the SLI process in the paper, and as GIFs on the project page. A minor but interesting note is that thanks to its sketch-specific design, our sketch encoder also yields the best sketch recognition accuracy to date while having the smallest number of parameters. The code is available at https://sketchxai.github.io. 6 authors · Apr 23, 2023
1 Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks Graph Neural Networks (GNNs) are neural models that leverage the dependency structure in graphical data via message passing among the graph nodes. GNNs have emerged as pivotal architectures in analyzing graph-structured data, and their expansive application in sensitive domains requires a comprehensive understanding of their decision-making processes -- necessitating a framework for GNN explainability. An explanation function for GNNs takes a pre-trained GNN along with a graph as input, to produce a `sufficient statistic' subgraph with respect to the graph label. A main challenge in studying GNN explainability is to provide fidelity measures that evaluate the performance of these explanation functions. This paper studies this foundational challenge, spotlighting the inherent limitations of prevailing fidelity metrics, including Fid_+, Fid_-, and Fid_Delta. Specifically, a formal, information-theoretic definition of explainability is introduced and it is shown that existing metrics often fail to align with this definition across various statistical scenarios. The reason is due to potential distribution shifts when subgraphs are removed in computing these fidelity measures. Subsequently, a robust class of fidelity measures are introduced, and it is shown analytically that they are resilient to distribution shift issues and are applicable in a wide range of scenarios. Extensive empirical analysis on both synthetic and real datasets are provided to illustrate that the proposed metrics are more coherent with gold standard metrics. The source code is available at https://trustai4s-lab.github.io/fidelity. 8 authors · Oct 3, 2023
- CRAFT: Concept Recursive Activation FacTorization for Explainability Attribution methods, which employ heatmaps to identify the most influential regions of an image that impact model decisions, have gained widespread popularity as a type of explainability method. However, recent research has exposed the limited practical value of these methods, attributed in part to their narrow focus on the most prominent regions of an image -- revealing "where" the model looks, but failing to elucidate "what" the model sees in those areas. In this work, we try to fill in this gap with CRAFT -- a novel approach to identify both "what" and "where" by generating concept-based explanations. We introduce 3 new ingredients to the automatic concept extraction literature: (i) a recursive strategy to detect and decompose concepts across layers, (ii) a novel method for a more faithful estimation of concept importance using Sobol indices, and (iii) the use of implicit differentiation to unlock Concept Attribution Maps. We conduct both human and computer vision experiments to demonstrate the benefits of the proposed approach. We show that the proposed concept importance estimation technique is more faithful to the model than previous methods. When evaluating the usefulness of the method for human experimenters on a human-centered utility benchmark, we find that our approach significantly improves on two of the three test scenarios. Our code is freely available at github.com/deel-ai/Craft. 8 authors · Nov 17, 2022
- "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction Despite the proliferation of explainable AI (XAI) methods, little is understood about end-users' explainability needs and behaviors around XAI explanations. To address this gap and contribute to understanding how explainability can support human-AI interaction, we conducted a mixed-methods study with 20 end-users of a real-world AI application, the Merlin bird identification app, and inquired about their XAI needs, uses, and perceptions. We found that participants desire practically useful information that can improve their collaboration with the AI, more so than technical system details. Relatedly, participants intended to use XAI explanations for various purposes beyond understanding the AI's outputs: calibrating trust, improving their task skills, changing their behavior to supply better inputs to the AI, and giving constructive feedback to developers. Finally, among existing XAI approaches, participants preferred part-based explanations that resemble human reasoning and explanations. We discuss the implications of our findings and provide recommendations for future XAI design. 5 authors · Oct 2, 2022
- TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insights of the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas which influence the decision of the algorithm most. Moreover, this research presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques for deep learning models and generate visual interpretations and explanations for clinicians to corroborate their clinical findings. In addition, this will aid in gaining confidence in such methods. The framework builds on existing interpretability and explainability techniques that are currently focusing on classification models, extending them to segmentation tasks. In addition, these methods have been adapted to 3D models for volumetric analysis. The proposed framework provides methods to quantitatively compare visual explanations using infidelity and sensitivity metrics. This framework can be used by data scientists to perform post-hoc interpretations and explanations of their models, develop more explainable tools and present the findings to clinicians to increase their faith in such models. The proposed framework was evaluated based on a use case scenario of vessel segmentation models trained on Time-of-fight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain. Quantitative and qualitative results of a comparative study of different models and interpretability methods are presented. Furthermore, this paper provides an extensive overview of several existing interpretability and explainability methods. 10 authors · Oct 15, 2021
5 A Comparative Study on Automatic Coding of Medical Letters with Explainability This study aims to explore the implementation of Natural Language Processing (NLP) and machine learning (ML) techniques to automate the coding of medical letters with visualised explainability and light-weighted local computer settings. Currently in clinical settings, coding is a manual process that involves assigning codes to each condition, procedure, and medication in a patient's paperwork (e.g., 56265001 heart disease using SNOMED CT code). There are preliminary research on automatic coding in this field using state-of-the-art ML models; however, due to the complexity and size of the models, the real-world deployment is not achieved. To further facilitate the possibility of automatic coding practice, we explore some solutions in a local computer setting; in addition, we explore the function of explainability for transparency of AI models. We used the publicly available MIMIC-III database and the HAN/HLAN network models for ICD code prediction purposes. We also experimented with the mapping between ICD and SNOMED CT knowledge bases. In our experiments, the models provided useful information for 97.98\% of codes. The result of this investigation can shed some light on implementing automatic clinical coding in practice, such as in hospital settings, on the local computers used by clinicians , project page https://github.com/Glenj01/Medical-Coding. 4 authors · Jul 18, 2024 2
3 Generic Token Compression in Multimodal Large Language Models from an Explainability Perspective Existing Multimodal Large Language Models (MLLMs) process a large number of visual tokens, leading to significant computational costs and inefficiency. Previous works generally assume that all visual tokens are necessary in the shallow layers of LLMs, and therefore token compression typically occurs in intermediate layers. In contrast, our study reveals an interesting insight: with proper selection, token compression is feasible at the input stage of LLM with negligible performance loss. Specifically, we reveal that explainability methods can effectively evaluate the importance of each visual token with respect to the given instruction, which can well guide the token compression. Furthermore, we propose to learn a mapping from the attention map of the first LLM layer to the explanation results, thereby avoiding the need for a full inference pass and facilitating practical deployment. Interestingly, this mapping can be learned using a simple and lightweight convolutional network, whose training is efficient and independent of MLLMs. Extensive experiments on 10 image and video benchmarks across three leading MLLMs (Qwen2-VL, LLaVA-OneVision, and VILA1.5) demonstrate the effectiveness of our approach, e.g., pruning 50% visual tokens while retaining more than 96% of the original performance across all benchmarks for all these three MLLMs. It also exhibits strong generalization, even when the number of tokens in inference far exceeds that used in training. 6 authors · Jun 1
2 Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system's performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. We train a neural model with this feedback data that can generate explanations and re-score answer candidates. We show that feedback data not only improves the accuracy of the deployed QA system but also other stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers. Project page: https://mcgill-nlp.github.io/feedbackqa/ 5 authors · Apr 6, 2022
1 Through a Compressed Lens: Investigating the Impact of Quantization on LLM Explainability and Interpretability Quantization methods are widely used to accelerate inference and streamline the deployment of large language models (LLMs). While prior research has extensively investigated the degradation of various LLM capabilities due to quantization, its effects on model explainability and interpretability, which are crucial for understanding decision-making processes, remain unexplored. To address this gap, we conduct comprehensive experiments using three common quantization techniques at distinct bit widths, in conjunction with two explainability methods, counterfactual examples and natural language explanations, as well as two interpretability approaches, knowledge memorization analysis and latent multi-hop reasoning analysis. We complement our analysis with a thorough user study, evaluating selected explainability methods. Our findings reveal that, depending on the configuration, quantization can significantly impact model explainability and interpretability. Notably, the direction of this effect is not consistent, as it strongly depends on (1) the quantization method, (2) the explainability or interpretability approach, and (3) the evaluation protocol. In some settings, human evaluation shows that quantization degrades explainability, while in others, it even leads to improvements. Our work serves as a cautionary tale, demonstrating that quantization can unpredictably affect model transparency. This insight has important implications for deploying LLMs in applications where transparency is a critical requirement. 8 authors · May 20
- Bridging the Gap in XAI-Why Reliable Metrics Matter for Explainability and Compliance This position paper emphasizes the critical gap in the evaluation of Explainable AI (XAI) due to the lack of standardized and reliable metrics, which diminishes its practical value, trustworthiness, and ability to meet regulatory requirements. Current evaluation methods are often fragmented, subjective, and biased, making them prone to manipulation and complicating the assessment of complex models. A central issue is the absence of a ground truth for explanations, complicating comparisons across various XAI approaches. To address these challenges, we advocate for widespread research into developing robust, context-sensitive evaluation metrics. These metrics should be resistant to manipulation, relevant to each use case, and based on human judgment and real-world applicability. We also recommend creating domain-specific evaluation benchmarks that align with the user and regulatory needs of sectors such as healthcare and finance. By encouraging collaboration among academia, industry, and regulators, we can create standards that balance flexibility and consistency, ensuring XAI explanations are meaningful, trustworthy, and compliant with evolving regulations. 2 authors · Feb 7
- GRAPHITE: Graph-Based Interpretable Tissue Examination for Enhanced Explainability in Breast Cancer Histopathology Explainable AI (XAI) in medical histopathology is essential for enhancing the interpretability and clinical trustworthiness of deep learning models in cancer diagnosis. However, the black-box nature of these models often limits their clinical adoption. We introduce GRAPHITE (Graph-based Interpretable Tissue Examination), a post-hoc explainable framework designed for breast cancer tissue microarray (TMA) analysis. GRAPHITE employs a multiscale approach, extracting patches at various magnification levels, constructing an hierarchical graph, and utilising graph attention networks (GAT) with scalewise attention (SAN) to capture scale-dependent features. We trained the model on 140 tumour TMA cores and four benign whole slide images from which 140 benign samples were created, and tested it on 53 pathologist-annotated TMA samples. GRAPHITE outperformed traditional XAI methods, achieving a mean average precision (mAP) of 0.56, an area under the receiver operating characteristic curve (AUROC) of 0.94, and a threshold robustness (ThR) of 0.70, indicating that the model maintains high performance across a wide range of thresholds. In clinical utility, GRAPHITE achieved the highest area under the decision curve (AUDC) of 4.17e+5, indicating reliable decision support across thresholds. These results highlight GRAPHITE's potential as a clinically valuable tool in computational pathology, providing interpretable visualisations that align with the pathologists' diagnostic reasoning and support precision medicine. 6 authors · Jan 7
- Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated As LLMs rapidly advance, increasing concerns arise regarding risks about actual authorship of texts we see online and in real world. The task of distinguishing LLM-authored texts is complicated by the nuanced and overlapping behaviors of both machines and humans. In this paper, we challenge the current practice of considering LLM-generated text detection a binary classification task of differentiating human from AI. Instead, we introduce a novel ternary text classification scheme, adding an "undecided" category for texts that could be attributed to either source, and we show that this new category is crucial to understand how to make the detection result more explainable to lay users. This research shifts the paradigm from merely classifying to explaining machine-generated texts, emphasizing need for detectors to provide clear and understandable explanations to users. Our study involves creating four new datasets comprised of texts from various LLMs and human authors. Based on new datasets, we performed binary classification tests to ascertain the most effective SOTA detection methods and identified SOTA LLMs capable of producing harder-to-detect texts. We constructed a new dataset of texts generated by two top-performing LLMs and human authors, and asked three human annotators to produce ternary labels with explanation notes. This dataset was used to investigate how three top-performing SOTA detectors behave in new ternary classification context. Our results highlight why "undecided" category is much needed from the viewpoint of explainability. Additionally, we conducted an analysis of explainability of the three best-performing detectors and the explanation notes of the human annotators, revealing insights about the complexity of explainable detection of machine-generated texts. Finally, we propose guidelines for developing future detection systems with improved explanatory power. 9 authors · Jun 26, 2024
- FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering Combating disinformation is one of the burning societal crises -- about 67% of the American population believes that disinformation produces a lot of uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows that disinformation can manipulate democratic processes and public opinion, causing disruption in the share market, panic and anxiety in society, and even death during crises. Therefore, disinformation should be identified promptly and, if possible, mitigated. With approximately 3.2 billion images and 720,000 hours of video shared online daily on social media platforms, scalable detection of multimodal disinformation requires efficient fact verification. Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR), the research community lacks substantial effort in multimodal fact verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3 million samples that pushes the boundaries of the domain of fact verification via a multimodal fake news dataset, in addition to offering explainability through the concept of 5W question-answering. Salient features of the dataset include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii) associated images, (iv) stable diffusion-generated additional images (i.e., visual paraphrases), (v) pixel-level image heatmap to foster image-text explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news stories. 18 authors · May 22, 2023
- Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis A variety of methods have been proposed to try to explain how deep neural networks make their decisions. Key to those approaches is the need to sample the pixel space efficiently in order to derive importance maps. However, it has been shown that the sampling methods used to date introduce biases and other artifacts, leading to inaccurate estimates of the importance of individual pixels and severely limit the reliability of current explainability methods. Unfortunately, the alternative -- to exhaustively sample the image space is computationally prohibitive. In this paper, we introduce EVA (Explaining using Verified perturbation Analysis) -- the first explainability method guarantee to have an exhaustive exploration of a perturbation space. Specifically, we leverage the beneficial properties of verified perturbation analysis -- time efficiency, tractability and guaranteed complete coverage of a manifold -- to efficiently characterize the input variables that are most likely to drive the model decision. We evaluate the approach systematically and demonstrate state-of-the-art results on multiple benchmarks. 7 authors · Feb 15, 2022
3 HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-source dataset of health rumor information, as well as effective and reliable rumor detection methods. This paper addresses this gap by constructing a dataset containing 1.12 million health-related rumors (HealthRCN) through web scraping of common health-related questions and a series of data processing steps. HealthRCN is the largest known dataset of Chinese health information rumors to date. Based on this dataset, we propose retrieval-augmented large language models for Chinese health rumor detection and explainability (HRDE). This model leverages retrieved relevant information to accurately determine whether the input health information is a rumor and provides explanatory responses, effectively aiding users in verifying the authenticity of health information. In evaluation experiments, we compared multiple models and found that HRDE outperformed them all, including GPT-4-1106-Preview, in rumor detection accuracy and answer quality. HRDE achieved an average accuracy of 91.04% and an F1 score of 91.58%. 8 authors · Jun 30, 2024
- Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability Phishing attacks remain one of the most prevalent and persistent cybersecurity threat with attackers continuously evolving and intensifying tactics to evade the general detection system. Despite significant advances in artificial intelligence and machine learning, faithfully reproducing the interpretable reasoning with classification and explainability that underpin phishing judgments remains challenging. Due to recent advancement in Natural Language Processing, Large Language Models (LLMs) show a promising direction and potential for improving domain specific phishing classification tasks. However, enhancing the reliability and robustness of classification models requires not only accurate predictions from LLMs but also consistent and trustworthy explanations aligning with those predictions. Therefore, a key question remains: can LLMs not only classify phishing emails accurately but also generate explanations that are reliably aligned with their predictions and internally self-consistent? To answer these questions, we have fine-tuned transformer based models, including BERT, Llama models, and Wizard, to improve domain relevance and make them more tailored to phishing specific distinctions, using Binary Sequence Classification, Contrastive Learning (CL) and Direct Preference Optimization (DPO). To that end, we examined their performance in phishing classification and explainability by applying the ConsistenCy measure based on SHAPley values (CC SHAP), which measures prediction explanation token alignment to test the model's internal faithfulness and consistency and uncover the rationale behind its predictions and reasoning. Overall, our findings show that Llama models exhibit stronger prediction explanation token alignment with higher CC SHAP scores despite lacking reliable decision making accuracy, whereas Wizard achieves better prediction accuracy but lower CC SHAP scores. 3 authors · Jun 16
- A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for Explainability Earlier diagnosis of Leukemia can save thousands of lives annually. The prognosis of leukemia is challenging without the morphological information of White Blood Cells (WBC) and relies on the accessibility of expensive microscopes and the availability of hematologists to analyze Peripheral Blood Samples (PBS). Deep Learning based methods can be employed to assist hematologists. However, these algorithms require a large amount of labeled data, which is not readily available. To overcome this limitation, we have acquired a realistic, generalized, and large dataset. To collect this comprehensive dataset for real-world applications, two microscopes from two different cost spectrums (high-cost HCM and low-cost LCM) are used for dataset capturing at three magnifications (100x, 40x, 10x) through different sensors (high-end camera for HCM, middle-level camera for LCM and mobile-phone camera for both). The high-sensor camera is 47 times more expensive than the middle-level camera and HCM is 17 times more expensive than LCM. In this collection, using HCM at high resolution (100x), experienced hematologists annotated 10.3k WBC types (14) and artifacts, having 55k morphological labels (Cell Size, Nuclear Chromatin, Nuclear Shape, etc.) from 2.4k images of several PBS leukemia patients. Later on, these annotations are transferred to other 2 magnifications of HCM, and 3 magnifications of LCM, and on each camera captured images. Along with the LeukemiaAttri dataset, we provide baselines over multiple object detectors and Unsupervised Domain Adaptation (UDA) strategies, along with morphological information-based attribute prediction. The dataset will be publicly available after publication to facilitate the research in this direction. 6 authors · May 17, 2024
- May I Ask a Follow-up Question? Understanding the Benefits of Conversations in Neural Network Explainability Research in explainable AI (XAI) aims to provide insights into the decision-making process of opaque AI models. To date, most XAI methods offer one-off and static explanations, which cannot cater to the diverse backgrounds and understanding levels of users. With this paper, we investigate if free-form conversations can enhance users' comprehension of static explanations, improve acceptance and trust in the explanation methods, and facilitate human-AI collaboration. Participants are presented with static explanations, followed by a conversation with a human expert regarding the explanations. We measure the effect of the conversation on participants' ability to choose, from three machine learning models, the most accurate one based on explanations and their self-reported comprehension, acceptance, and trust. Empirical results show that conversations significantly improve comprehension, acceptance, trust, and collaboration. Our findings highlight the importance of customized model explanations in the format of free-form conversations and provide insights for the future design of conversational explanations. 3 authors · Sep 25, 2023
- FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability We present FairX, an open-source Python-based benchmarking tool designed for the comprehensive analysis of models under the umbrella of fairness, utility, and eXplainability (XAI). FairX enables users to train benchmarking bias-mitigation models and evaluate their fairness using a wide array of fairness metrics, data utility metrics, and generate explanations for model predictions, all within a unified framework. Existing benchmarking tools do not have the way to evaluate synthetic data generated from fair generative models, also they do not have the support for training fair generative models either. In FairX, we add fair generative models in the collection of our fair-model library (pre-processing, in-processing, post-processing) and evaluation metrics for evaluating the quality of synthetic fair data. This version of FairX supports both tabular and image datasets. It also allows users to provide their own custom datasets. The open-source FairX benchmarking package is publicly available at https://github.com/fahim-sikder/FairX. 4 authors · Jun 20, 2024