Advanced Usage
This document covers advanced usage patterns for AI-Suite.
Configuration Options
Temperature Control
Control the randomness of responses by adjusting the temperature:
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[{ role: 'user', content: 'Write a creative story' }],
{
responseFormat: 'text',
temperature: 0.2 // Lower temperature for more deterministic responses
}
);
Max Output Tokens
Limit the maximum number of tokens in the response:
const response = await aiSuite.createChatCompletion(
'anthropic/claude-3-5-sonnet-20241022',
[{ role: 'user', content: 'Write a short story.' }],
{
responseFormat: 'text',
maxOutputTokens: 500 // Limit response to 500 tokens
}
);
Structured Output
JSON Object Mode
Get any valid JSON response:
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[{ role: 'user', content: 'Generate a user profile with name, age, and email' }],
{
responseFormat: 'json_object'
}
);
if (response.success) {
console.log(response.content_object); // Parsed JSON object
}
JSON Schema Mode (Strongly Typed)
Use Zod schemas for type-safe, validated JSON responses:
import { z } from 'zod';
const UserSchema = z.object({
name: z.string(),
age: z.number().int().positive(),
email: z.string().email(),
interests: z.array(z.string())
});
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[{ role: 'user', content: 'Generate a sample user' }],
{
responseFormat: 'json_schema',
zodSchema: UserSchema
}
);
if (response.success) {
// response.content_object is typed according to your schema
const user = response.content_object;
console.log(`${user.name} is ${user.age} years old`);
}
This works with all providers that support JSON mode (OpenAI, Gemini, etc.).
Tool/Function Calling
Enable models to call functions:
const tools = [{
type: 'function' as const,
function: {
name: 'get_weather',
description: 'Get the current weather for a location',
parameters: {
type: 'object' as const,
properties: {
location: {
type: 'string' as const,
description: 'The city name'
},
unit: {
type: 'string' as const,
description: 'Temperature unit (celsius or fahrenheit)'
}
},
required: ['location'],
additionalProperties: false
},
strict: true
}
}];
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[{ role: 'user', content: 'What is the weather in Paris?' }],
{
responseFormat: 'text',
tools
}
);
if (response.success && response.tools) {
for (const tool of response.tools) {
console.log(`Calling ${tool.name} with:`, tool.content);
// Execute your function and send result back
}
}
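To complete the loop, the function's result goes back to the model as a tool-role message (see the Message Roles section). The exact message shape below is an assumption for illustration, not AI-Suite's confirmed API:

```typescript
// Hypothetical sketch: after running the requested function locally,
// append its result as a tool-role message and send a follow-up request.
interface ToolCall {
  name: string;
  content: string; // JSON-encoded arguments returned by the model
}

function toolResultMessage(call: ToolCall, result: unknown) {
  return {
    role: 'tool' as const,
    content: JSON.stringify({ tool: call.name, result })
  };
}

// Example: answer the get_weather call with a local lookup.
const call: ToolCall = { name: 'get_weather', content: '{"location":"Paris"}' };
const followUp = [
  { role: 'user' as const, content: 'What is the weather in Paris?' },
  toolResultMessage(call, { tempC: 18 })
];
```

The followUp array would then be passed to createChatCompletion again so the model can produce its final answer from the tool output.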
Retry Logic
Built-in retry mechanism with exponential backoff:
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[{ role: 'user', content: 'Hello!' }],
{
responseFormat: 'text',
retry: {
attempts: 5,
delay: (attempt) => {
// Exponential backoff: 100ms, 200ms, 400ms, 800ms, 1600ms
return Math.pow(2, attempt) * 100;
}
}
}
);
Default retry configuration:
- Attempts: 1 (no retry)
- Delay: Exponential backoff starting at 100ms
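The delay callback from the example above is a pure function, so its schedule can be checked in isolation. A standalone sketch of the same formula:

```typescript
// Same backoff formula as in the retry example: doubles each attempt,
// starting from 100ms at attempt 0.
const backoff = (attempt: number): number => Math.pow(2, attempt) * 100;

const schedule = [0, 1, 2, 3, 4].map(backoff);
// schedule is [100, 200, 400, 800, 1600]
```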
Langfuse Integration
AI-Suite provides built-in integration with Langfuse for tracking and monitoring AI interactions.
Setup
import { Langfuse } from 'langfuse';
const langfuse = new Langfuse({
publicKey: process.env.LANGFUSE_PUBLIC_KEY,
secretKey: process.env.LANGFUSE_SECRET_KEY,
});
const aiSuite = new AISuite(
{
openaiKey: process.env.OPENAI_API_KEY,
anthropicKey: process.env.ANTHROPIC_API_KEY
},
{
langFuse: langfuse
}
);
Adding Metadata
Track additional context with your requests:
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[{ role: 'user', content: 'Hello!' }],
{
responseFormat: 'text',
metadata: {
langFuse: {
userId: 'user-123',
sessionId: 'session-456',
environment: 'production',
name: 'greeting-interaction',
tags: ['customer-support', 'greeting']
},
// Custom metadata
customField: 'value'
}
}
);
What Gets Tracked
When Langfuse is integrated, AI-Suite automatically tracks:
- Model used for each request
- Input messages
- Output responses
- Token usage (input, output, cached, reasoning, thinking)
- Execution time
- Success/failure status
- Custom metadata
Comparing Multiple Providers
AI-Suite makes it easy to compare responses from different providers:
const responses = await aiSuite.createChatCompletionMultiResult(
[
'openai/gpt-4o',
'anthropic/claude-3-5-sonnet-20241022',
'gemini/gemini-2.5-flash'
],
[{ role: 'user', content: 'Explain quantum computing in simple terms.' }],
{
responseFormat: 'text',
temperature: 0.7
}
);
// Responses is an array in the same order as providers
const [openaiResponse, claudeResponse, geminiResponse] = responses;
if (openaiResponse.success) {
console.log('OpenAI:', openaiResponse.content);
console.log('Time:', openaiResponse.execution_time + 'ms');
}
if (claudeResponse.success) {
console.log('Claude:', claudeResponse.content);
console.log('Time:', claudeResponse.execution_time + 'ms');
}
if (geminiResponse.success) {
console.log('Gemini:', geminiResponse.content);
console.log('Time:', geminiResponse.execution_time + 'ms');
}
Reasoning Models (OpenAI o1/o3, Grok)
OpenAI’s reasoning models (o1, o3) and Grok support extended reasoning:
const response = await aiSuite.createChatCompletion(
'openai/o1',
[{ role: 'user', content: 'Solve this complex problem: ...' }],
{
responseFormat: 'text',
reasoning: {
effort: 'high' // 'low' | 'medium' | 'high'
}
}
);
if (response.success) {
console.log('Response:', response.content);
console.log('Reasoning tokens used:', response.usage?.reasoning_tokens);
console.log('Total tokens:', response.usage?.total_tokens);
}
The reasoning.effort parameter controls how much computational effort the model uses for reasoning:
- low: Faster, less thorough reasoning
- medium: Balanced reasoning
- high: Slower, more thorough reasoning
Thinking Mode (Gemini 2.5)
Gemini 2.5 models support thinking budget for extended reasoning:
const response = await aiSuite.createChatCompletion(
'gemini/gemini-2.5-pro',
[{ role: 'user', content: 'Analyze this complex problem deeply...' }],
{
responseFormat: 'text',
thinking: {
budget: 1024, // Token budget for thinking (0-16384)
output: true // Include thinking process in output
}
}
);
if (response.success) {
console.log('Response:', response.content);
console.log('Thinking tokens used:', response.usage?.thoughts_tokens);
}
Notes:
- budget: Number of tokens allocated for thinking (0-16384). Higher values allow more thorough analysis.
- output: Whether to include the thinking process in the response.
- Currently only works with gemini-2.5-pro.
Hooks System
Intercept and process requests/responses:
const aiSuite = new AISuite(
{
openaiKey: process.env.OPENAI_API_KEY,
},
{
hooks: {
handleRequest: async (req) => {
// Log or modify request before sending
console.log('Sending request:', JSON.stringify(req, null, 2));
// You can throw an error to abort the request
// throw new Error('Request aborted');
},
handleResponse: async (req, res, metadata) => {
// Process response
console.log('Received response:', res);
console.log('Metadata:', metadata);
// Log to your own tracking system
await myTrackingSystem.log({
request: req,
response: res,
metadata
});
},
failOnError: true // If false, hook errors won't abort the request
}
}
);
Use cases for hooks:
- Custom logging
- Request/response transformation
- Additional validation
- Integration with custom tracking systems
- A/B testing
- Request filtering/blocking
Error Handling
AI-Suite provides consistent error handling across all providers:
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[{ role: 'user', content: 'Hello, world!' }],
{
responseFormat: 'text'
}
);
if (response.success) {
console.log('Success:', response.content);
console.log('Tokens used:', response.usage?.total_tokens);
} else {
// Error handling
console.error('Error tag:', response.tag);
console.error('Error message:', response.error);
console.error('Raw error:', response.raw);
// Handle specific error types
switch (response.tag) {
case 'InvalidAuth':
console.error('Invalid API key');
break;
case 'RateLimitExceeded':
console.error('Rate limit hit, retry later');
break;
case 'InvalidRequest':
console.error('Invalid request parameters');
break;
case 'ServerError':
console.error('Provider server error');
break;
default:
console.error('Unknown error');
}
}
Error tags:
- InvalidAuth: Authentication/API key issues
- InvalidRequest: Malformed request
- InvalidModel: Model not found or not available
- RateLimitExceeded: Rate limit hit
- ServerError: Provider server error (5xx)
- ServerOverloaded: Server overloaded/capacity issues
- Unknown: Other errors
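As an illustration only, the tags roughly correspond to HTTP status classes returned by providers. AI-Suite performs this mapping internally; the exact rules sketched here are an assumption, not its actual implementation:

```typescript
// Hypothetical illustration of how provider HTTP statuses might map to the
// documented error tags. Not AI-Suite's actual implementation.
type ErrorTag =
  | 'InvalidAuth' | 'InvalidRequest' | 'InvalidModel'
  | 'RateLimitExceeded' | 'ServerError' | 'ServerOverloaded' | 'Unknown';

function tagForStatus(status: number): ErrorTag {
  if (status === 401 || status === 403) return 'InvalidAuth';
  if (status === 404) return 'InvalidModel';
  if (status === 429) return 'RateLimitExceeded';
  if (status === 400) return 'InvalidRequest';
  if (status === 503) return 'ServerOverloaded';
  if (status >= 500) return 'ServerError';
  return 'Unknown';
}
```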
Custom LLM Provider
Use any OpenAI-compatible API:
// Example: Using Ollama
const aiSuite = new AISuite({
customURL: 'http://localhost:11434/v1',
customLLMKey: 'not-needed' // Ollama doesn't require auth
});
const response = await aiSuite.createChatCompletion(
'custom-llm/llama3.2',
[{ role: 'user', content: 'Hello!' }],
{
responseFormat: 'text',
temperature: 0.7
}
);
// Example: Using vLLM
const vllmSuite = new AISuite({
customURL: 'http://your-vllm-server:8000/v1',
customLLMKey: 'optional-key'
});
// Example: Using LM Studio
const lmStudioSuite = new AISuite({
customURL: 'http://localhost:1234/v1',
});
This works with any server implementing the OpenAI Chat Completions API format.
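Concretely, a server counts as OpenAI-compatible if it accepts the Chat Completions request body at POST {baseURL}/chat/completions. A minimal sketch of that payload (the model name is a placeholder):

```typescript
// Minimal Chat Completions request body as defined by the OpenAI API.
// Compatible servers (Ollama, vLLM, LM Studio) accept this shape.
const body = {
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Hello!' }],
  temperature: 0.7
};

const request = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer not-needed' // some local servers ignore this header
  },
  body: JSON.stringify(body)
};
```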
Message Roles
AI-Suite supports different message roles:
const messages = [
{
role: 'developer', // System/developer instructions
content: 'You are a helpful assistant specialized in TypeScript'
},
{
role: 'user',
content: 'How do I define an interface?'
},
{
role: 'assistant',
content: 'You can define an interface like this: interface MyInterface { ... }'
},
{
role: 'user',
content: 'Can you show me an example?'
}
];
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
messages,
{ responseFormat: 'text' }
);
Role mapping per provider:
- developer: Mapped to the appropriate system instruction for each provider
- user: User message
- assistant: Assistant response
- tool: Tool/function call result
Image and File Support
AI-Suite supports sending images and files as part of your messages, enabling multimodal AI interactions. This feature is available for compatible providers (OpenAI, Anthropic, Google Gemini).
Supported Content Types
AI-Suite supports three content types in messages:
- Text: Plain text or structured text objects
- Images: Image data as Buffer or base64 string
- Files: Documents with specified media type (PDF, PNG, JPG, JPEG, GIF, WEBP)
Sending Images
Send images by using the InputContentImage format:
import { readFileSync } from 'fs';
const img = readFileSync('./path/to/image.jpg');
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[
{
role: 'user',
content: {
type: 'image',
image: img // readFileSync returns a Buffer, use it directly
}
},
{
role: 'user',
content: 'What do you see in this image?'
}
],
{ responseFormat: 'text' }
);
You can also send base64-encoded images:
const base64Image = 'iVBORw0KGgoAAAANSUhEUgAA...'; // Your base64 string
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[
{
role: 'user',
content: {
type: 'image',
image: base64Image
}
},
{
role: 'user',
content: 'Describe this image'
}
],
{ responseFormat: 'text' }
);
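If you start from a Buffer but need the base64 form, Node's Buffer API converts directly (shown here with inline stand-in bytes instead of a real file):

```typescript
// Stand-in bytes; in practice this would be readFileSync('./image.jpg').
const bytes = Buffer.from('hello');
const base64Image = bytes.toString('base64');
// base64Image is 'aGVsbG8='

// And back again, e.g. to verify a round trip:
const roundTrip = Buffer.from(base64Image, 'base64').toString('utf8');
// roundTrip is 'hello'
```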
Sending Files
Send documents using the InputContentFile format:
import { readFileSync } from 'fs';
const pdf = readFileSync('./document.pdf');
const response = await aiSuite.createChatCompletion(
'anthropic/claude-3-5-sonnet-20241022',
[
{
role: 'user',
content: {
type: 'file',
mediaType: 'application/pdf',
file: pdf, // readFileSync returns a Buffer, use it directly
fileName: 'document.pdf'
}
},
{
role: 'user',
content: 'Summarize the contents of this PDF'
}
],
{ responseFormat: 'text' }
);
Supported media types:
- application/pdf: PDF documents
- image/png: PNG images
- image/jpg: JPG images
- image/jpeg: JPEG images
- image/gif: GIF images
- image/webp: WebP images
Mixing Multiple Content Types
You can send multiple content items (text, images, files) in a single message:
import { readFileSync } from 'fs';
const img1 = readFileSync('./image1.jpg');
const img2 = readFileSync('./image2.jpg');
const response = await aiSuite.createChatCompletion(
'openai/gpt-4o',
[
{
role: 'user',
content: [
{
type: 'text',
text: 'Please analyze these images:'
},
{
type: 'image',
image: img1
},
{
type: 'image',
image: img2
},
{
type: 'text',
text: 'What are the differences between them?'
}
]
}
],
{ responseFormat: 'text' }
);
Important Notes
- Role Restrictions: Images and files can only be sent in user and developer role messages. Assistant and tool messages support only text content.
- Provider Support: Not all providers support all content types. Check your provider's documentation for specific capabilities.
- File Size Limits: Different providers have different file size limits. Consult provider documentation for specifics.
- Structured Output: Image and file inputs work with all response formats, including json_schema and json_object.