Troubleshooting
Common issues and solutions when using aio-azure-clients-toolbox.
Connection Pool Issues
ConnectionsExhausted Error
Symptom: ConnectionsExhausted: No connections available
Causes:
- Pool size too small for concurrent load
- Client limit per connection too low
- Connections not being properly released
Solutions:
# Increase pool size
client = ManagedCosmos(
# ... other params
max_size=20, # Increase from default 10
client_limit=200 # Increase from default 100
)
# Check for connection leaks
async with client.get_container_client() as container:
# Always use context manager
result = await container.create_item(body=document)
# Connection automatically released here
Connection Timeouts
Symptom: Operations timing out or hanging
Causes:
- Idle timeout too short
- Network connectivity issues
- Azure service throttling
Solutions:
# Increase timeouts
client = ManagedCosmos(
# ... other params
max_idle_seconds=600, # 10 minutes instead of 5
max_lifespan_seconds=7200 # 2 hours instead of 1
)
# Enable debug logging
import logging
logging.getLogger("aio_azure_clients_toolbox.connection_pooling").setLevel(logging.DEBUG)
Memory Usage Issues
Symptom: High memory consumption with pooled clients
Causes:
- Too many connections in pool
- Large idle timeouts
- Connection leaks
Solutions:
# Optimize for memory usage
client = ManagedCosmos(
# ... other params
max_size=5, # Smaller pool
max_idle_seconds=60, # Quick recycling
client_limit=50 # Fewer clients per connection
)
# Monitor pool health
print(f"Ready connections: {client.pool.ready_connection_count}")
Authentication Issues
DefaultAzureCredential Failures
Symptom: Authentication errors or permission denied
Causes:
- Missing environment variables
- Insufficient permissions
- Managed identity not configured
Solutions:
# Set environment variables
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"
export AZURE_TENANT_ID="your-tenant-id"
# Or use Azure CLI
az login
Check permissions:
- Cosmos DB: "Cosmos DB Account Contributor" or "Cosmos DB Data Contributor"
- Blob Storage: "Storage Blob Data Contributor"
- Service Bus: "Azure Service Bus Data Sender/Receiver"
- Event Hub: "Azure Event Hubs Data Sender"
- Event Grid: "Event Grid Data Sender"
Token Refresh Issues
Symptom: Intermittent authentication failures
Causes:
- Long-running processes with expired tokens
- Credential caching issues
Solutions:
# Force credential refresh by recreating clients periodically
async def refresh_clients():
await client.close()
client = create_new_client_instance()
# Or use shorter connection lifespans
client = ManagedCosmos(
# ... other params
max_lifespan_seconds=1800 # 30 minutes
)
Service-Specific Issues
Cosmos DB
Request Rate Too Large (429 errors)
Symptom: CosmosHttpResponseError with status code 429
Solutions:
# Implement retry with exponential backoff
from azure.cosmos import exceptions
import asyncio
async def cosmos_operation_with_retry(container, operation, max_retries=3):
for attempt in range(max_retries + 1):
try:
return await operation(container)
except exceptions.CosmosHttpResponseError as e:
if e.status_code == 429 and attempt < max_retries:
wait_time = 2 ** attempt
await asyncio.sleep(wait_time)
else:
raise
# Reduce connection pool size to limit concurrent requests
cosmos_client = ManagedCosmos(
# ... other params
max_size=3,
client_limit=25
)
Partition Key Issues
Symptom: Cross-partition queries failing or slow
Solutions:
# Always specify partition key when possible
async with cosmos_client.get_container_client() as container:
# Good: Uses partition key
item = await container.read_item(
item="item-id",
partition_key="partition-value"
)
# Avoid: Cross-partition query
items = list(container.query_items(
query="SELECT * FROM c WHERE c.id = @id",
parameters=[{"name": "@id", "value": "item-id"}],
enable_cross_partition_query=True # Required but inefficient
))
Blob Storage
Large File Upload Issues
Symptom: Timeouts or memory errors with large files
Solutions:
# Stream large files instead of loading into memory
import aiofiles
async def upload_large_file(blob_client, file_path: str):
async with aiofiles.open(file_path, 'rb') as file:
# Upload in chunks
chunk_size = 4 * 1024 * 1024 # 4MB chunks
async with blob_client.get_blob_client("large-file.bin") as client:
await client.upload_blob(
file,
overwrite=True,
max_single_put_size=chunk_size,
max_block_size=chunk_size
)
SAS Token Issues
Symptom: Access denied when using SAS URLs
Solutions:
# Check SAS token permissions and expiry
from azure.storage.blob import BlobSasPermissions
from datetime import datetime, timedelta
# Ensure proper permissions
permissions = BlobSasPermissions(read=True, write=True)
# Check token hasn't expired
sas_token = await blob_client.get_blob_sas_token(
"file.txt",
permission=permissions,
expiry_hours=24 # Ensure sufficient time
)
# Verify the URL works
sas_url = await blob_client.get_blob_sas_url("file.txt", permission=permissions)
print(f"SAS URL: {sas_url}")
Service Bus
Connection String vs Managed Identity
Symptom: Authentication issues with Service Bus
Note: This library uses DefaultAzureCredential, not connection strings.
# Correct: Using managed identity
service_bus = ManagedAzureServiceBusSender(
service_bus_namespace_url="https://namespace.servicebus.windows.net",
service_bus_queue_name="queue-name",
credential=DefaultAzureCredential()
)
# For connection strings, use Azure SDK directly:
# from azure.servicebus.aio import ServiceBusClient
# client = ServiceBusClient.from_connection_string(connection_string)
Performance Issues
High Latency
Symptoms: Slow response times
Diagnosis:
import time
import logging
# Enable performance logging
logging.getLogger("aio_azure_clients_toolbox.connection_pooling").setLevel(logging.DEBUG)
# Measure operation times
async def timed_operation(operation):
start = time.time()
try:
result = await operation()
duration = time.time() - start
print(f"Operation completed in {duration:.3f}s")
return result
except Exception as e:
duration = time.time() - start
print(f"Operation failed after {duration:.3f}s: {e}")
raise
# Check pool utilization
def check_pool_health(client):
pool = client.pool
print(f"Ready connections: {pool.ready_connection_count}")
for i, conn in enumerate(pool._pool):
print(f"Connection {i}: clients={conn.current_client_count}, ready={conn.is_ready}")
Solutions:
# Optimize pool configuration
client = ManagedCosmos(
# ... other params
client_limit=50, # Reduce contention
max_size=15, # More connections
max_idle_seconds=300, # Keep connections warm
)
# Pre-warm connections
async def warm_up_pool(client, operations=10):
"""Pre-warm connection pool."""
async def dummy_operation():
async with client.get_container_client() as container:
# Lightweight operation to establish connection
pass
await asyncio.gather(*[dummy_operation() for _ in range(operations)])
Debugging
Enable Debug Logging
import logging
# Configure detailed logging
logging.basicConfig(level=logging.DEBUG)
# Specific loggers
loggers = [
"aio_azure_clients_toolbox.connection_pooling",
"azure.core.pipeline.policies.http_logging_policy",
"azure.servicebus",
"azure.eventhub",
"azure.cosmos"
]
for logger_name in loggers:
logger = logging.getLogger(logger_name)
logger.setLevel(logging.DEBUG)
Network Connectivity
import aiohttp
import asyncio
async def test_connectivity(endpoint: str):
"""Test network connectivity to Azure endpoint."""
try:
async with aiohttp.ClientSession() as session:
async with session.get(f"https://{endpoint}") as response:
print(f"Connectivity to {endpoint}: {response.status}")
return response.status < 400
except Exception as e:
print(f"Connectivity test failed for {endpoint}: {e}")
return False
# Test Azure service endpoints
endpoints = [
"your-cosmos.documents.azure.com",
"yourstorage.blob.core.windows.net",
"your-namespace.servicebus.windows.net"
]
for endpoint in endpoints:
result = await test_connectivity(endpoint)
print(f"{endpoint}: {'✓' if result else '✗'}")
Configuration Validation
def validate_azure_config():
"""Validate Azure configuration."""
import os
required_vars = [
"AZURE_CLIENT_ID",
"AZURE_CLIENT_SECRET",
"AZURE_TENANT_ID"
]
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
print(f"Missing environment variables: {missing}")
return False
print("Azure configuration appears valid")
return True
# Run validation
validate_azure_config()