Topic detection is a fundamental capability in natural language processing with far-reaching applications in content management, recommendation systems, and search functionality. Amazon Bedrock Guardrail offers a powerful solution for implementing topic detection with impressive processing speed, making it ideal for applications that require efficient text analysis at scale.
This guide walks you through setting up and implementing topic detection using Amazon Bedrock Guardrail. Also be sure to check out our guide on Implementing Topic Detection with OpenAI.
The Dataset
Our implementation was tested on a diverse dataset comprising 2,926 text samples distributed across 14 distinct topical categories:
- Health & Medicine (235 samples)
- Education (216 samples)
- Technology (209 samples)
- Politics (207 samples)
- Food & Cooking (207 samples)
- Psychology & Self-Development (206 samples)
- Environment & Climate (206 samples)
- Entertainment (204 samples)
- Business & Entrepreneurship (204 samples)
- Travel & Tourism (203 samples)
- Science & Space (202 samples)
- Sports (201 samples)
- History (200 samples)
- Finance & Economy (185 samples)
Sample Texts from the Dataset
The table below shows representative examples from the topic categories:
Topic Category | Sample Text |
---|---|
Health & Medicine | A new study links regular exercise to improved mental health. |
Technology | The latest iPhone model features an A17 Bionic chip. |
Politics | The presidential debate focused on healthcare and the economy. |
Food & Cooking | Cooking with fresh herbs enhances the flavor of any dish. |
Psychology & Self-Development | Emotional intelligence is key to healthy relationships. |
Environment & Climate | Eco-friendly practices are gaining traction among businesses. |
Entertainment | The latest Marvel movie broke box office records. |
Business & Entrepreneurship | Starting a business requires careful planning and research. |
Travel & Tourism | The Maldives is known for its stunning beaches and resorts. |
Science & Space | NASA plans to send humans to Mars within the next decade. |
Sports | The Lakers won the NBA championship after a thrilling game. |
History | The discovery of the Americas changed the course of history. |
Finance & Economy | The stock market surged today as tech companies posted gains. |
Setting Up Amazon Bedrock Guardrail
You can set up Amazon Bedrock Guardrail for topic detection using either the AWS web interface or programmatically via the AWS SDK. We'll explore both approaches.
Approach 1: Using the AWS Web Interface
The AWS web interface provides a user-friendly way to create and configure guardrails without writing code.
Step 1: Create a New Guardrail
Begin by navigating to the Amazon Bedrock console and selecting "Guardrails" from the menu. Click "Create guardrail" to start the configuration process.
In the initial setup screen, provide a name for your guardrail (e.g., "Topic-Detection") and an optional description. This will be the foundation for your topic detection system.
Step 2: Configure Denied Topics
After creating the guardrail, navigate to the "Add denied topics" section. Here, you'll define the topics you want the system to detect.
For each topic you want to detect:
- Enter the topic name (e.g., "Sports")
- Provide a clear definition that helps the system understand what content falls under this topic
- Add sample phrases that represent this topic (e.g., "The Lakers won the NBA championship after a thrilling game.")
Repeat this process for each of your target topics. In our benchmark, we configured all 14 topic categories as "denied topics" to leverage Bedrock's topic detection capabilities.
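As a rough illustration, the three fields entered in the console for each topic (name, definition, sample phrases) can be thought of as one small record. The sketch below builds such a record in the shape used by the SDK's topic configuration later in this guide. The helper name `build_topic_entry` and the definition text are hypothetical and only serve the example.

```python
import json


def build_topic_entry(name: str, definition: str, examples: list[str]) -> dict:
    """Return one denied-topic record (name, definition, sample phrases)."""
    return {
        "name": name,
        "definition": definition,
        "examples": list(examples),
        "type": "DENY",  # denied topics are what the guardrail reports as detected
    }


# Illustrative values; the definition text is an assumption, not the benchmark's wording
sports_topic = build_topic_entry(
    name="Sports",
    definition="Content about athletic competitions, teams, players, and results.",
    examples=["The Lakers won the NBA championship after a thrilling game."],
)
print(json.dumps(sports_topic, indent=2))
```

Keeping one record per topic makes it easier to review definitions side by side before entering them in the console.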
Step 3: Configure Contextual Grounding
The final critical step is configuring the contextual grounding check, which determines how strictly the system evaluates topic relevance.
- Navigate to "Add contextual grounding check"
- Enable the relevance check
- Set the relevance score threshold; this is crucial for balancing precision and recall
  - In our benchmark, we used a threshold of 0.7
  - Lower values (closer to 0) will detect fewer topics
  - Higher values (closer to 0.99) will be stricter and may flag content that should pass
This threshold setting directly impacts the accuracy rate of your topic detection system. Adjusting this value allows you to find the optimal balance between precision and recall for your specific use case.
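To make the precision and recall trade-off concrete, here is a minimal sketch of how both could be computed for a single topic from expected and detected labels. The function name and the tiny label lists are hypothetical; they are not taken from the benchmark.

```python
def precision_recall(expected: list[str], detected: list[str], topic: str) -> tuple[float, float]:
    """Compute precision and recall for one topic from paired label lists."""
    pairs = list(zip(expected, detected))
    true_pos = sum(1 for e, d in pairs if e == topic and d == topic)
    false_pos = sum(1 for e, d in pairs if e != topic and d == topic)
    false_neg = sum(1 for e, d in pairs if e == topic and d != topic)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labels for four samples
expected = ["Sports", "Sports", "History", "Sports"]
detected = ["Sports", "NONE", "Sports", "Sports"]
precision, recall = precision_recall(expected, detected, topic="Sports")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running this for each threshold you try gives a simple way to compare settings on your own data.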
Approach 2: Using the AWS SDK (Programmatic Creation)
For automated or programmatic creation of guardrails, especially when dealing with multiple topics, you can use the AWS SDK. This approach is particularly useful for:
- Creating guardrails as part of CI/CD pipelines
- Maintaining consistent guardrail configurations across environments
- Handling large numbers of topics efficiently
Here's how to create a guardrail programmatically using Python and the AWS SDK:
Step 1: Set Up Your Environment
First, install the required dependencies:
```bash
pip install boto3 pandas python-dotenv
```
Create a .env file to securely store your AWS credentials:
```
AWS_REGION=your-aws-region
AWS_ACCESS_KEY=your-access-key
AWS_SECRET_KEY=your-secret-key
```
Step 2: Prepare Your Topic Data
You'll need a dataset with topic labels and descriptions. In our example, we use two CSV files:
- topic_detection.csv: Contains text samples with their topic labels
- topics_descriptions.csv: Contains topic names and their descriptions
The topics descriptions file should have columns for "Topic" and "Description", where descriptions are limited to 200 characters.
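Before creating the guardrail, it can help to check the descriptions file against the 200-character limit. The sketch below does this with the standard library only. It assumes a semicolon-delimited file with "Topic" and "Description" columns; the helper name and the inline sample data are hypothetical.

```python
import csv
import io

MAX_DESCRIPTION_CHARS = 200  # limit mentioned above for topic descriptions


def find_long_descriptions(csv_text: str, delimiter: str = ";") -> list[str]:
    """Return the topics whose description exceeds the character limit."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [
        row["Topic"]
        for row in reader
        if len(row["Description"]) > MAX_DESCRIPTION_CHARS
    ]


# Inline sample standing in for topics_descriptions.csv (illustrative content)
sample_csv = (
    "Topic;Description\n"
    "Sports;Content about athletic competitions, teams, and players.\n"
    "History;" + "x" * 201 + "\n"
)
too_long = find_long_descriptions(sample_csv)
print(too_long)  # ['History']
```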
Step 3: Create the Guardrail Using the AWS SDK
Here's the Python code to create a guardrail programmatically, with detailed comments explaining each step:
The first step is to connect to the Amazon Bedrock service:
```python
import boto3
import pandas as pd
from dotenv import load_dotenv
import os

# Load environment variables from .env file for secure credential management
load_dotenv()

# Retrieve AWS credentials and region from environment variables
AWS_REGION = os.getenv("AWS_REGION")
AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY")
AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY")

# Initialize the Bedrock client with your AWS credentials
# Note: This is different from the bedrock-runtime client used for inference
client = boto3.client(
    service_name="bedrock",  # Use the bedrock service for creating guardrails
    region_name=AWS_REGION,
    aws_secret_access_key=AWS_SECRET_KEY,
    aws_access_key_id=AWS_ACCESS_KEY,
)
```
Then we load the dataset containing text samples and their topic labels. This dataset is used to extract the unique topics and the sample texts that will later define the topics to be blocked.
```python
df = pd.read_csv(filepath_or_buffer="./data/topic_detection.csv")
topics = list(df["label"].unique())  # Extract all unique topic labels

# Load the dataset containing topic descriptions
# This file should have a "Topic" column and a "Description" column
df_desc = pd.read_csv(
    filepath_or_buffer="./data/topics_descriptions.csv", delimiter=";"
)

# Get the formatted topic names from the descriptions file
# These should be properly formatted for the guardrail (no spaces, special characters)
formatted_topics = list(df_desc["Topic"].unique())
topics_config = list()  # Initialize empty list to store topic configurations

# Create a configuration dictionary for each topic
# This maps each original topic label to its formatted name and description
# Note: zip pairs items by position, so the rows of topics_descriptions.csv must
# follow the same order in which the labels first appear in topic_detection.csv
for topic, form_topic in zip(topics, formatted_topics):
    # Create the configuration dictionary for this topic
    config = {
        "name": form_topic,  # The formatted topic name (e.g., "HealthAndMedicine")
        # Get the topic description
        "definition": df_desc[df_desc["Topic"] == form_topic]["Description"].values[0],
        "examples": [
            # Include a sample text for this topic - helps the guardrail understand the topic
            df[df["label"] == topic].sample(n=1, random_state=27)["text"].values[0]
        ],
        "type": "DENY",  # Set type to "DENY" for topic detection purposes
    }
    topics_config.append(config)  # Add this topic's config to our list
```
With the topic configuration in place, we create the guardrail using the following code:
```python
# Create the guardrail with all configured topics
create_response = client.create_guardrail(
    name="Topic-Detection",  # Name of the guardrail
    description="Detecting Topics",  # Description of the guardrail's purpose

    # Configure the topic policy with our list of topic configurations
    topicPolicyConfig={"topicsConfig": topics_config},

    # Configure the contextual grounding settings
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            # Set the GROUNDING threshold to 0.7 - controls how strictly topics are matched
            {"type": "GROUNDING", "threshold": 0.7},
            # Set the RELEVANCE threshold to 0.7 - controls how relevant content must be
            {"type": "RELEVANCE", "threshold": 0.7},
        ]
    },

    # Messages to display when topics are detected
    blockedInputMessaging="Topic Detected",  # Message for detected topics in input
    blockedOutputsMessaging="Topic Detected",  # Message for detected topics in output
)
```
Key Components of the SDK Approach
1. Topic Configuration:
- Each topic requires a name, definition, examples, and type ("DENY" for topic detection).
2. Contextual Grounding Configuration:
- We set both GROUNDING and RELEVANCE thresholds to 0.7, which matches our web interface configuration.
3. Blocked Messaging:
- We define simple messages to display when topics are detected.
Advantages of the SDK Approach:
- Automation: Create guardrails as part of automated workflows
- Version Control: Store guardrail configurations in your code repository
- Bulk Configuration: Efficiently configure multiple topics at once
- Reproducibility: Easily recreate the same guardrail in different environments
After creating the guardrail using either approach, you'll receive a guardrail ID that you'll need for the implementation phase.
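With the SDK, the ID is available in the `create_guardrail` response under `guardrailId`, and a newly created guardrail starts out as the working draft (version "DRAFT"). The sketch below shows one way to read the ID and publish a numbered version with `create_guardrail_version`. The helper name `publish_guardrail` and the description string are hypothetical; the client is passed in as a parameter so that the function makes no AWS call until you invoke it.

```python
def publish_guardrail(client, create_response: dict) -> tuple[str, str]:
    """Read the guardrail ID and publish a numbered version of the draft.

    `client` is expected to be a boto3 "bedrock" client, and
    `create_response` the dictionary returned by create_guardrail.
    """
    guardrail_id = create_response["guardrailId"]
    version_response = client.create_guardrail_version(
        guardrailIdentifier=guardrail_id,
        description="Initial topic detection version",  # illustrative text
    )
    return guardrail_id, version_response["version"]


# Example usage (requires AWS credentials and an existing create_response):
# guardrail_id, guardrail_version = publish_guardrail(client, create_response)
# print(guardrail_id, guardrail_version)
```

The two returned values are the ones to store as GUARDRAIL_ID and GUARDRAIL_VERSION for the implementation phase. Using "DRAFT" as the version also works while you are still iterating on the configuration.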
Implementing the Bedrock Guardrail API
Once your guardrail is configured, you can implement it in your Python application. Here's a step-by-step guide:
Step 1: Set Up Your Environment
First, install the required dependencies:
```bash
pip install boto3 pandas tqdm python-dotenv
```
Create a .env file to securely store your AWS credentials and guardrail configuration:
```
GUARDRAIL_ID=your-guardrail-id
GUARDRAIL_VERSION=your-guardrail-version
AWS_REGION=your-aws-region
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
```
Step 2: Implement the Topic Detection Function
Here's the Python code to implement topic detection using Amazon Bedrock Guardrail:
```python
from tqdm import tqdm
import time
from typing import Any
import os
import boto3
import pandas as pd
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Configuration parameters for AWS Bedrock
GUARDRAIL_ID = os.getenv("GUARDRAIL_ID")  # The ID of your configured guardrail
GUARDRAIL_VERSION = os.getenv("GUARDRAIL_VERSION")  # The version of your guardrail
AWS_REGION = os.getenv("AWS_REGION")  # Your AWS region
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")  # Your AWS access key
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")  # Your AWS secret key

# Initialize the Bedrock runtime client
bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name=AWS_REGION,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)


def topic_detection_bedrock(text: str) -> Any:
    # Format the input text for the guardrail API
    content = [{"text": {"text": text}}]

    # Measure execution time
    start = time.time()

    # Call the Bedrock Guardrail API
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source="INPUT",  # Analyze the input text
        content=content,
    )

    end = time.time()
    execution_time = end - start

    # Process the response based on the guardrail's action
    if response["action"] == "GUARDRAIL_INTERVENED":
        # Extract detected topics from the response
        topics = list()
        topic_policy = response["assessments"][0].get("topicPolicy", {})
        for topic in topic_policy.get("topics", []):
            topics.append(topic["name"])
            print(topic["name"], execution_time)

        # Return either a list of topics or a single topic
        if len(topics) > 1:
            return topics, execution_time
        elif topics:
            return topics[0], execution_time
        # The guardrail intervened for a reason other than a denied topic
        return response["action"], execution_time
    else:
        # If no topics were detected
        print(response["action"], execution_time)
        return response["action"], execution_time
```
Understanding the API Response
The Bedrock Guardrail API response contains valuable information about detected topics. Here's how to interpret it:
When a topic is detected, the response will have:
- action set to "GUARDRAIL_INTERVENED"
- assessments containing a topicPolicy object with detected topics
Each topic in the response includes:
- name: The name of the detected topic
- type: The topic policy type ("DENY")
- action: The action taken for the topic (for example, "BLOCKED")
If no topics are detected, the action is "NONE".
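The structure described above can be explored without calling AWS. The sketch below parses two hand-written dictionaries that mimic the response shape; they are illustrative samples, not captured API output, and the helper name `extract_topics` is hypothetical.

```python
def extract_topics(response: dict) -> list[str]:
    """Return the names of all topics reported in a guardrail response."""
    if response.get("action") != "GUARDRAIL_INTERVENED":
        return []
    names = []
    for assessment in response.get("assessments", []):
        # topicPolicy may be absent if another policy triggered the intervention
        for topic in assessment.get("topicPolicy", {}).get("topics", []):
            names.append(topic["name"])
    return names


# Hand-written samples that follow the response shape (not real API output)
intervened = {
    "action": "GUARDRAIL_INTERVENED",
    "assessments": [
        {"topicPolicy": {"topics": [{"name": "Sports", "type": "DENY", "action": "BLOCKED"}]}}
    ],
}
no_match = {"action": "NONE", "assessments": []}

print(extract_topics(intervened))  # ['Sports']
print(extract_topics(no_match))    # []
```

Always returning a list, even when it is empty, keeps downstream code simpler than switching between a string and a list.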
Step 3: Process Multiple Text Samples
To process a dataset of text samples, you can use the following code:
```python
def main():
    # Initialize lists to store results
    topic_detected = list()
    execution_time = list()

    # Load the dataset
    df = pd.read_csv(filepath_or_buffer="data/topic_detection.csv")

    # Process each text sample with a progress bar
    for _, row in tqdm(
        df.iterrows(), total=len(df), desc="Analyzing topics with bedrock guardrail"
    ):
        text = row["text"]
        topic, exec_time = topic_detection_bedrock(text=text)
        topic_detected.append(topic)
        execution_time.append(exec_time)

    # Add results as new columns to the dataframe
    df["execution_time"] = execution_time
    df["topic_detected"] = topic_detected

    # Save the results to a CSV file
    df.to_csv(path_or_buf="data/bedrock_topic_detection.csv", index_label=False)


if __name__ == "__main__":
    main()
```
Performance of Amazon Bedrock Guardrail for Topic Detection
Now that we've covered the implementation, let's examine the performance characteristics of Amazon Bedrock Guardrail for topic detection based on our benchmark study.
Speed and Efficiency
Amazon Bedrock Guardrail demonstrates impressive processing efficiency for topic detection tasks:
- Average Processing Time: 0.357 seconds per text sample
- Consistent Performance: Minimal variance in processing time across different topic categories
This speed makes Bedrock Guardrail particularly well-suited for applications that need to process large volumes of text in real-time or near-real-time scenarios.
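As a back-of-the-envelope check, the average latency translates into throughput as follows. The calculation assumes requests are sent one at a time with no overhead between calls, which is a simplification; real throughput also depends on concurrency and service quotas.

```python
AVG_SECONDS_PER_SAMPLE = 0.357  # average latency reported in this benchmark
DATASET_SIZE = 2926             # number of text samples in the dataset

# Sequential throughput: how many samples fit into one hour
samples_per_hour = 3600 / AVG_SECONDS_PER_SAMPLE

# Time needed to process the full dataset one sample at a time
dataset_minutes = DATASET_SIZE * AVG_SECONDS_PER_SAMPLE / 60

print(f"{samples_per_hour:.0f} samples per hour")       # 10084 samples per hour
print(f"{dataset_minutes:.1f} minutes for the dataset")  # 17.4 minutes for the dataset
```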
Accuracy and Detection Capabilities
Our testing revealed that with the threshold value of 0.7 used in our benchmark, Bedrock Guardrail achieves an accuracy of approximately 58% with a moderate false positive rate. This configuration processes text samples in an average of 0.357 seconds, striking a reasonable balance between accuracy and speed.
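For reference, here is one way an accuracy figure like this could be computed from expected labels and guardrail results. The scoring rule is an assumption made for the sketch: a result counts as correct when the expected label is among the detected topics, and labels are assumed to use the same names as the guardrail topics. The labels shown are hypothetical.

```python
def accuracy(expected: list[str], detected: list) -> float:
    """Share of samples whose expected label appears among the detected topics."""
    if len(expected) != len(detected):
        raise ValueError("expected and detected must have the same length")
    if not expected:
        return 0.0
    correct = 0
    for label, result in zip(expected, detected):
        # A result is either a single topic name or a list of topic names
        candidates = result if isinstance(result, list) else [result]
        if label in candidates:
            correct += 1
    return correct / len(expected)


# Hypothetical results for four samples
expected = ["Sports", "History", "Education", "Sports"]
detected = ["Sports", ["History", "Politics"], "NONE", "Politics"]
score = accuracy(expected, detected)
print(f"{score:.0%}")  # 50%
```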
Resource Utilization
Bedrock Guardrail is designed to be efficient with computational resources:
- Memory Usage: Minimal compared to running large language models locally
- Scaling: Handles increased load gracefully through AWS's infrastructure
- Cost Efficiency: Pay-as-you-go pricing model based on API calls
Key Observations
Our benchmark revealed several important characteristics of Bedrock Guardrail's topic detection:
1. Topic Definition Impact: The quality and specificity of topic definitions significantly influence detection accuracy
2. Sample Diversity: Including diverse examples for each topic improves detection across different writing styles
3. Threshold Tuning: Finding the optimal threshold value is crucial for balancing accuracy and false positives
4. Processing Consistency: Performance remains stable even with longer text samples
These observations highlight the importance of proper configuration and tuning when implementing Bedrock Guardrail for topic detection.
Optimizing Your Topic Detection System
To improve the accuracy of your Bedrock Guardrail topic detection system, consider these optimization strategies:
1. Refine Topic Definitions
Provide comprehensive definitions for each topic that clearly distinguish it from other topics. Include key concepts, terminology, and characteristics that define the topic.
2. Add Diverse Sample Phrases
Include a variety of sample phrases that represent different aspects of each topic. The more diverse and representative your samples, the better the system will understand the topic's scope.
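A minimal sketch of picking several distinct sample phrases per topic is shown below. The limits of 5 examples per topic and 100 characters per example are assumptions to verify against the current Bedrock quotas, and the helper name `pick_examples` is hypothetical.

```python
import random

MAX_EXAMPLES_PER_TOPIC = 5  # assumption: check the current Bedrock quota
MAX_EXAMPLE_CHARS = 100     # assumption: check the current Bedrock quota


def pick_examples(texts: list[str], k: int = MAX_EXAMPLES_PER_TOPIC, seed: int = 27) -> list[str]:
    """Pick up to k distinct sample phrases that fit the length limit."""
    # Deduplicate, drop empty or overly long phrases, and sort for reproducibility
    eligible = sorted(
        {t.strip() for t in texts if 0 < len(t.strip()) <= MAX_EXAMPLE_CHARS}
    )
    rng = random.Random(seed)
    return rng.sample(eligible, k=min(k, len(eligible)))


sports_texts = [
    "The Lakers won the NBA championship after a thrilling game.",
    "The Lakers won the NBA championship after a thrilling game.",  # duplicate
    "The marathon route was changed because of heavy rain.",        # illustrative
]
sports_examples = pick_examples(sports_texts)
print(len(sports_examples))  # 2
```

The resulting list could be used as the "examples" value of a topic configuration in place of a single sampled text.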
3. Experiment with Relevance Thresholds
The relevance threshold in the contextual grounding check significantly impacts detection accuracy. Experiment with different values to find the optimal balance:
- Lower threshold (e.g., 0.5): Detects fewer topics
- Higher threshold (e.g., 0.8): The guardrail becomes very strict and blocks almost every prompt
4. Combine with Pre-processing
Consider implementing text pre-processing steps before sending content to Bedrock:
- Text normalization (lowercase, remove special characters)
- Keyword extraction to highlight important terms
- Entity recognition to identify key subjects
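The normalization step could look like the sketch below, which uses only the standard library. The exact rules (Unicode normalization, lowercasing, replacing punctuation with spaces) are one possible choice, not the configuration used in the benchmark.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Lowercase the text, strip special characters, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()


cleaned = normalize_text("  The Lakers WON the NBA championship!!! ")
print(cleaned)  # the lakers won the nba championship
```

It is worth measuring detection quality with and without normalization, since removing punctuation and casing can also remove useful signal.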
Use Cases for Amazon Bedrock Guardrail Topic Detection
Amazon Bedrock Guardrail's topic detection is particularly well-suited for:
1. High-Volume Content Processing: Applications that need to process large volumes of text quickly
2. Real-Time Content Moderation: Systems that filter content based on topic
3. Content Categorization: Automatically organizing documents, articles, or posts
4. User Query Classification: Routing user requests to appropriate knowledge bases or response systems
5. Content Recommendation Systems: Identifying content topics to match user interests
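As an illustration of the query classification use case, a detected topic can be mapped to a destination with a simple lookup. The route names and the `route_query` helper below are hypothetical placeholders.

```python
# Hypothetical mapping from guardrail topic names to knowledge bases
ROUTES = {
    "Sports": "sports-kb",
    "Technology": "tech-kb",
    "History": "history-kb",
}


def route_query(detected, default: str = "general-kb") -> str:
    """Return the destination for the first detected topic that has a route."""
    # The detection result may be a single topic name or a list of names
    topics = detected if isinstance(detected, list) else [detected]
    for topic in topics:
        if topic in ROUTES:
            return ROUTES[topic]
    return default


print(route_query("Sports"))                 # sports-kb
print(route_query(["Politics", "History"]))  # history-kb
print(route_query("NONE"))                   # general-kb
```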
Conclusion
Amazon Bedrock Guardrail offers a powerful, configurable, and efficient solution for implementing topic detection in your applications. With an average processing time of just 0.357 seconds per text sample and the ability to process approximately 10,000 samples per hour, it's an excellent choice for applications where speed and scalability are critical requirements.
The 58% accuracy achieved with our benchmark configuration (0.7 threshold) provides a solid foundation, which can be further improved through careful tuning of topic definitions, sample diversity, and threshold settings. The configurable nature of Bedrock Guardrail allows you to adjust the balance between precision and recall based on your specific use case requirements.
By carefully configuring your guardrail with well-defined topics, representative sample phrases, and optimized relevance thresholds, you can build a topic detection system that balances accuracy and performance to meet your specific requirements. The seamless integration with the AWS ecosystem makes it particularly attractive for organizations already leveraging AWS services.
As you implement and refine your topic detection system, remember that the optimal configuration will depend on your unique use case, content types, and performance priorities. Experiment with different settings to find the approach that works best for your application.