
Overview

Introduction

The DynamoEnhance SDK provides a suite of tools for the quantitative assessment of Large Language Model (LLM) performance. This documentation explains each method available in the SDK, highlighting its functionality and applications. By leveraging these methods, developers and researchers can gain insight into the capabilities and limitations of their models and help ensure high-quality, reliable text outputs.

SDK Methods

Compression Ratio Calculation

  • Method: calculate_compression_ratio()
    • Purpose: Measures the repetitive nature of the text generated by the model.
    • Description: This method calculates the compression ratio of a given text, providing insight into how verbose or concise it is. A higher compression ratio indicates more repetition, which is useful for evaluating the efficiency of summaries or descriptions generated by the model.
    • Use Case: Analyzing the verbosity and redundancy of model-generated content to ensure concise and meaningful outputs; a brief sketch of the idea follows below.
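
The SDK's exact formula is not reproduced here; as a rough illustration of the idea, a compression ratio can be approximated with Python's standard zlib module (the helper name and formula below are illustrative assumptions, not the SDK's implementation):

```python
import zlib

def compression_ratio(text: str) -> float:
    """Illustrative only: ratio of raw byte length to zlib-compressed length.
    Repetitive text compresses well, so a higher ratio suggests more repetition."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))

print(compression_ratio("the cat sat on the mat " * 20))                # repetitive -> high ratio
print(compression_ratio("A quick brown fox jumps over the lazy dog."))  # varied -> lower ratio
```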

Cosine Similarity with BERT

  • Method: calculate_cosine_similarity_bert()
    • Purpose: Computes semantic similarity between two text segments using BERT embeddings.
    • Description: This method evaluates the cosine similarity between a reference text and a model-generated summary using BERT's contextual embeddings. It provides a measure of how semantically aligned the generated text is with the reference.
    • Use Case: Ensuring that model outputs maintain semantic integrity and accurately represent the content of the reference text; see the sketch below.
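
As a rough sketch of the underlying technique (not necessarily the SDK's model checkpoint or pooling strategy), BERT embeddings and cosine similarity can be computed with the Hugging Face transformers library:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The checkpoint and mean pooling below are illustrative assumptions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, tokens, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool into one sentence vector

reference = "The committee approved the new budget on Friday."
summary = "The budget was approved by the committee."
score = torch.nn.functional.cosine_similarity(embed(reference), embed(summary), dim=0)
print(f"BERT cosine similarity: {score.item():.3f}")
```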

Cosine Similarity with TF-IDF

  • Method: calculate_cosine_similarity_tfidf()
    • Purpose: Measures textual similarity using TF-IDF vectors.
    • Description: This method computes the cosine similarity between two texts based on Term Frequency-Inverse Document Frequency (TF-IDF) vectors. It highlights how well the generated text covers key terms and concepts compared to the reference text.
    • Use Case: Assessing the coverage of essential terms in model-generated summaries to ensure that important information is retained (illustrated below).
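
A minimal sketch of the technique using scikit-learn (the SDK may fit IDF statistics over a larger corpus than the two texts shown here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "The committee approved the new budget on Friday."
summary = "The budget was approved by the committee."

# Fit both texts into a shared TF-IDF vocabulary, then compare the two vectors.
tfidf = TfidfVectorizer().fit_transform([reference, summary])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"TF-IDF cosine similarity: {score:.3f}")
```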

N-gram Overlap Calculation

  • Method: calculate_ngram_overlap()
    • Purpose: Evaluates the overlap of n-grams between a reference text and a generated summary.
    • Description: This method calculates the degree of overlap for n-grams of varying lengths between two text segments. It is useful for detailed analysis of the linguistic and stylistic fidelity of the generated text.
    • Use Case: Ensuring that generated summaries accurately reflect the linguistic patterns and structures of the reference text; a sketch of the idea follows.
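
The exact overlap formula used by the SDK is not shown here; a plain-Python sketch of one common variant (clipped n-gram recall, similar in spirit to ROUGE-n) might look like this:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap(reference: str, generated: str, n: int = 2) -> float:
    """Illustrative: fraction of reference n-grams also present in the generated text."""
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    gen_counts = Counter(ngrams(generated.lower().split(), n))
    matched = sum(min(count, gen_counts[gram]) for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return matched / total if total else 0.0

reference = "the committee approved the new budget on friday"
summary = "the committee approved a revised budget"
print(ngram_overlap(reference, summary, n=2))  # ~0.29: two of seven reference bigrams match
```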

Coverage Calculation

  • Method: calculate_coverage()
    • Purpose: Assesses how well the generated text covers the reference text.
    • Description: This method evaluates coverage by dividing the reference text into chunks and measuring the overlap of each chunk with the generated text. It helps identify whether the generated text evenly covers all parts of the reference or is biased towards certain sections.
    • Use Case: Verifying that summaries or paraphrases provide a balanced representation of the entire reference text; see the sketch below.
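
The SDK's chunking scheme is not documented here; the sketch below illustrates the general idea with fixed-size word chunks and an assumed vocabulary-overlap threshold:

```python
def coverage(reference: str, generated: str, chunk_size: int = 20, threshold: float = 0.3) -> float:
    """Illustrative: fraction of reference chunks that share enough vocabulary
    with the generated text to count as 'covered'. Chunk size and threshold
    are assumptions for this sketch."""
    ref_words = reference.lower().split()
    gen_vocab = set(generated.lower().split())
    chunks = [ref_words[i:i + chunk_size] for i in range(0, len(ref_words), chunk_size)]
    covered = sum(
        1 for chunk in chunks
        if sum(word in gen_vocab for word in chunk) / len(chunk) >= threshold
    )
    return covered / len(chunks) if chunks else 0.0
```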

Comprehensive Metrics Computation

  • Method: compute_metrics()
    • Purpose: Provides a comprehensive evaluation by calculating multiple text comparison metrics.
    • Description: This wrapper method allows for the simultaneous computation of various metrics such as compression ratio, cosine similarity (BERT and TF-IDF), n-gram overlap, and coverage. It simplifies the evaluation process by consolidating multiple metrics into a single function.
    • Use Case: Conducting thorough assessments of model-generated texts with a single streamlined call that gathers diverse evaluative metrics; a hypothetical usage sketch follows.
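
A hypothetical usage sketch is shown below; the import path, argument order, and return type are assumptions for illustration and may differ from the actual SDK interface:

```python
# Hypothetical: the import path and argument names are assumed, not confirmed.
from dynamoenhance import compute_metrics

reference = "The committee approved the new budget on Friday after a long debate."
summary = "The committee approved the budget on Friday."

# Assuming the wrapper returns a mapping of metric names to scores.
metrics = compute_metrics(reference, summary)
for name, value in metrics.items():
    print(f"{name}: {value}")
```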

Retrieval Relevance Judgment

  • Method: retrieval_relevance_judge_text()
    • Purpose: Determines the relevance of retrieved texts in response to queries.
    • Description: This method evaluates whether the contexts retrieved in response to a question are relevant and appropriate, using advanced language models as judges. It helps validate the retrieval stage of search and QA pipelines.
    • Use Case: Enhancing the performance of search engines and QA systems by ensuring that retrieved information is pertinent to the queries posed; see the sketch below.
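
The SDK's judge prompt and model are not reproduced here; the sketch below shows the general LLM-as-judge pattern, with `call_llm` standing in for whichever model client is used:

```python
# Generic LLM-as-judge sketch; prompt wording and output parsing are illustrative.
RETRIEVAL_PROMPT = """You are grading a retrieval system.
Question: {question}
Retrieved context: {context}
Reply with a single word, "relevant" or "irrelevant", indicating whether the
context contains information needed to answer the question."""

def retrieval_relevance(question: str, context: str, call_llm) -> bool:
    verdict = call_llm(RETRIEVAL_PROMPT.format(question=question, context=context))
    return verdict.strip().lower().startswith("relevant")
```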

Faithfulness Judgment

  • Method: faithfulness_judge_text()
    • Purpose: Assesses the faithfulness of a model's answers to the retrieved contexts.
    • Description: This method evaluates whether the answers provided by the model are faithful to the context they are derived from. It is crucial for validating the accuracy and reliability of responses in QA systems.
    • Use Case: Ensuring the trustworthiness of model outputs in applications where accuracy is critical, such as medical or legal advice systems; a sketch of the judging idea follows.
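
Along the same lines, a faithfulness check can be sketched as a judge prompt over a (context, answer) pair; the prompt wording below is an assumption, not the SDK's prompt:

```python
FAITHFULNESS_PROMPT = """You are checking an answer against its source context.
Context: {context}
Answer: {answer}
Reply "faithful" if every claim in the answer is supported by the context,
otherwise reply "unfaithful"."""

def faithfulness(context: str, answer: str, call_llm) -> bool:
    verdict = call_llm(FAITHFULNESS_PROMPT.format(context=context, answer=answer))
    return verdict.strip().lower().startswith("faithful")
```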

Response Relevance Judgment

  • Method: response_relevance_judge_text()
    • Purpose: Evaluates the relevance of a model's responses to given questions.
    • Description: This method checks if the answers generated by the model are relevant to the questions asked. It is essential for maintaining the relevance and appropriateness of responses in dialogue systems.
    • Use Case: Improving user interactions in conversational AI by ensuring that responses are directly related to the questions posed (sketched below).
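
The same judge pattern applies to (question, answer) pairs; the sketch below also shows how individual verdicts might be aggregated into a relevance rate (all names and prompt wording are illustrative):

```python
RESPONSE_PROMPT = """You are grading a dialogue system.
Question: {question}
Answer: {answer}
Reply "relevant" if the answer directly addresses the question, otherwise "irrelevant"."""

def response_relevance_rate(pairs, call_llm) -> float:
    """Illustrative: fraction of (question, answer) pairs judged relevant."""
    relevant = 0
    for question, answer in pairs:
        verdict = call_llm(RESPONSE_PROMPT.format(question=question, answer=answer))
        relevant += verdict.strip().lower().startswith("relevant")
    return relevant / len(pairs) if pairs else 0.0
```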

Question-Answer Pair Generation

  • Method: generate_qa_from_context()
    • Purpose: Generates question-answer pairs from given contexts.
    • Description: This method generates QA pairs from provided texts and supports sampling strategies to ensure diverse and comprehensive question generation. It saves the generated pairs in a CSV file for easy access and use.
    • Use Case: Creating training and evaluation datasets for QA systems, enhancing their ability to understand and respond to varied contexts; a sketch of the generation-and-export flow follows.
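
A minimal sketch of the generation-and-export flow, assuming a `call_llm` callable and a simple CSV schema (the SDK's prompt, sampling strategy, and column layout may differ):

```python
import csv

QA_PROMPT = """Read the passage below and write one question it answers, then the
answer, separated by a line containing only '---'.
Passage: {context}"""

def generate_qa_pairs(contexts, call_llm, out_path="qa_pairs.csv"):
    """Illustrative: one QA pair per context, written to a CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["context", "question", "answer"])
        for context in contexts:
            response = call_llm(QA_PROMPT.format(context=context))
            question, _, answer = response.partition("---")
            writer.writerow([context, question.strip(), answer.strip()])
```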