gpt-5.5
GPT-5.5 is OpenAI's frontier reasoning model, exposed through reAPI as a drop-in OpenAI-compatible Chat Completions endpoint. It offers a 1M-token context window, up to 128K output tokens per call, advanced reasoning with adjustable effort, and Tool Search for large agent workflows. Current rates live on the model page and on api.reapi.ai/pricing.
Quick example
curl https://api.reapi.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"group": "default",
"messages": [
{ "role": "user", "content": "Hello" }
],
"stream": true,
"temperature": 0.7,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}'

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)
stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    # `group` is a reAPI-specific field, so it goes through extra_body.
    extra_body={"group": "default"},
)
for chunk in stream:
    if not chunk.choices:
        continue  # skip any chunk that carries no choices
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

import OpenAI from "openai";
const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://api.reapi.ai/v1",
});
const stream = await client.chat.completions.create({
model: "gpt-5.5",
messages: [{ role: "user", content: "Hello" }],
stream: true,
temperature: 0.7,
top_p: 1,
frequency_penalty: 0,
presence_penalty: 0,
// `group` is a reAPI-specific extension. The Node SDK has no extra_body
// option; it sends unrecognized body fields as-is, so pass it inline.
// @ts-expect-error group is not part of the OpenAI types
group: "default",
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	body, err := json.Marshal(map[string]any{
		"model": "gpt-5.5",
		"group": "default",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello"},
		},
		"stream":            true,
		"temperature":       0.7,
		"top_p":             1,
		"frequency_penalty": 0,
		"presence_penalty":  0,
	})
	if err != nil {
		log.Fatal(err)
	}
	req, err := http.NewRequest("POST",
		"https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// With "stream": true the body is a server-sent event stream, so copy it
	// to stdout as it arrives instead of buffering the whole response.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		log.Printf("reading stream: %v", err)
	}
}

Authentication
Every request needs a Bearer token. The GPT-5.5 chat workspace lives on
the api.reapi.ai platform — sign in there to create a key and top up
tokens.
- Open api.reapi.ai and sign in (or create an account).
- Generate an API key under API Keys.
- Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).
Authorization: Bearer YOUR_API_KEY
The chat surface (api.reapi.ai) is a separate workspace from the
image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances
do not cross over — a key issued on reapi.ai/settings/apikeys will not
authenticate against api.reapi.ai/v1/chat/completions, and vice versa.
Endpoint
POST https://api.reapi.ai/v1/chat/completions
OpenAI-compatible. The same SDKs (openai-python, openai-node,
openai-go, …) work once the base URL is set to
https://api.reapi.ai/v1.
Request body
model — string, required
Must be "gpt-5.5". The value is echoed back in the response envelope.
messages — array, required
Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec:
{
"role": "system" | "user" | "assistant" | "tool",
"content": "string or content-parts array"
}
Multi-turn history is sent in chronological order; the last message is the one the model responds to.
stream — boolean, default false
When true, the response is streamed as server-sent events (SSE) with
Content-Type: text/event-stream. Each event is a JSON delta in the
OpenAI format, terminated by a data: [DONE] line. When false, the
full response body is returned in one HTTP response.
temperature — number, default 1
Range 0.0 – 2.0. Sampling temperature. Lower values make output more
deterministic; higher values increase randomness. OpenAI recommends
tuning either temperature or top_p, not both.
top_p — number, default 1
Range 0.0 – 1.0. Nucleus sampling cutoff — restricts sampling to the
smallest set of tokens whose cumulative probability mass exceeds
top_p.
frequency_penalty — number, default 0
Range -2.0 – 2.0. Penalises tokens by how often they've already
appeared in the response so far. Positive values discourage literal
repetition.
presence_penalty — number, default 0
Range -2.0 – 2.0. Penalises tokens that have appeared at all,
regardless of frequency. Positive values encourage the model to talk
about new topics.
group — string, default "default"
reAPI-specific extension. Selects a token group on the gateway, which
routes the request to a specific upstream channel pool. "default" is
the standard pool and covers nearly every workload — you can omit the
field if you don't need custom routing.
Other OpenAI parameters
Every other field on the OpenAI Chat Completions spec — max_tokens,
stop, n, seed, tools, tool_choice, response_format,
logprobs, top_logprobs, user, parallel_tool_calls,
reasoning_effort (none / low / medium / high / xhigh) —
passes through unchanged. The OpenAI SDKs do not need a reAPI-specific
shim.
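
The documented ranges for the sampling parameters can be checked on the client before a request is sent. The following is a minimal sketch in Python; the helper name `check_sampling_params` is hypothetical, and how the gateway itself reacts to out-of-range values is not specified here, so treat this as a local sanity check only.

```python
# Ranges as documented above for each sampling parameter.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_sampling_params(body):
    """Return a list of problems; an empty list means every range holds."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in body and not low <= body[name] <= high:
            problems.append(f"{name}={body[name]} is outside {low} to {high}")
    return problems


print(check_sampling_params({"model": "gpt-5.5", "temperature": 0.7, "top_p": 1}))
print(check_sampling_params({"model": "gpt-5.5", "temperature": 2.5}))
```

Fields that are absent from the body are skipped, since each one has a documented default.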
Response shape
Non-streaming (stream: false)
{
"id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
"object": "chat.completion",
"created": 1735000000,
"model": "gpt-5.5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 9,
"total_tokens": 21
}
}
usage.prompt_tokens and usage.completion_tokens are the inputs to
the bill; see api.reapi.ai/pricing for the live rate card.
Streaming (stream: true)
Content-Type: text/event-stream. Each data: line is a JSON delta:
data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1735000000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1735000000,"model":"gpt-5.5","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1735000000,"model":"gpt-5.5","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
The final event before [DONE] carries the finish_reason
(stop / length / tool_calls / content_filter). Usage stats are
omitted from the stream, so send the request with stream: false when
you need exact token counts per turn.
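
To make the event format concrete, here is a minimal sketch in Python that reassembles the assistant text from raw SSE lines. The function name `collect_stream` is hypothetical, and the sketch assumes each event arrives as one complete `data:` line; in practice the OpenAI SDKs handle this parsing for you.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join the content deltas from SSE lines; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not an event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Trimmed copies of the three events shown above, plus the terminator.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # ('Hello!', 'stop')
```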
Pricing
GPT-5.5 is billed pay-as-you-go in USD against your api.reapi.ai token balance. The live per-1M-token rate card lives on api.reapi.ai/pricing; top up tokens at api.reapi.ai.
The bill for a single call is:
input_cost = prompt_tokens × input_rate / 1,000,000
output_cost = completion_tokens × output_rate / 1,000,000
Failed requests are not charged.
Long-context tier (>272K input tokens)
When the input portion of a single request exceeds 272K tokens, the entire request is billed at 2× the input rate and 1.5× the output rate. A request with 270K input tokens stays at the standard rate; a request with 280K input tokens shifts the whole call (input and output) to the long-context rate. See api.reapi.ai/pricing for the resolved per-1M-token numbers in both tiers.
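
The billing rule can be sketched as a small estimator. This is an illustration under stated assumptions: the function name `estimate_cost` is hypothetical, "272K" is read as 272,000 tokens, and the rates in the example are placeholders rather than real prices (the live numbers are on api.reapi.ai/pricing).

```python
LONG_CONTEXT_THRESHOLD = 272_000  # assumption: "272K" read as 272,000 tokens


def estimate_cost(prompt_tokens, completion_tokens, input_rate, output_rate):
    """Estimate the USD cost of one call; rates are USD per 1M tokens."""
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        # The whole request moves to the long-context tier.
        input_rate *= 2.0
        output_rate *= 1.5
    input_cost = prompt_tokens * input_rate / 1_000_000
    output_cost = completion_tokens * output_rate / 1_000_000
    return input_cost + output_cost


# Placeholder rates for illustration only, not real prices.
IN_RATE, OUT_RATE = 1.0, 4.0
print(estimate_cost(270_000, 1_000, IN_RATE, OUT_RATE))  # standard tier
print(estimate_cost(280_000, 1_000, IN_RATE, OUT_RATE))  # long-context tier
```

With these placeholder rates, 10K extra input tokens roughly double the bill, because crossing the threshold re-prices the entire call and not only the tokens above it.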
Limits
| Limit | Value |
|---|---|
| Context window | 1M tokens |
| Max output per call | 128K tokens |
| Standard-rate input | ≤ 272K tokens |
Streams that hit the output cap finish with finish_reason: "length";
call again with a continuation message if you need more text.
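
One way to build such a continuation request is sketched below. The helper name `continuation_messages` and the wording of the follow-up prompt are assumptions for illustration; the gateway does not prescribe a particular continuation format.

```python
def continuation_messages(messages, partial_text,
                          prompt="Continue exactly where you left off."):
    """Return a new history that asks the model to resume a truncated answer."""
    return [
        *messages,
        # The truncated answer goes back in as the assistant's turn.
        {"role": "assistant", "content": partial_text},
        {"role": "user", "content": prompt},
    ]


history = [{"role": "user", "content": "Write a long report."}]
follow_up = continuation_messages(history, "Section 1 ...")
print(len(follow_up), follow_up[-1]["role"])  # 3 user
```

The original list is left unmodified, so the caller can keep it for logging or retries. Note that the resent history counts as input tokens on the follow-up call.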
Errors
The error envelope follows the OpenAI shape — HTTP status, plus a JSON body:
{
"error": {
"message": "...",
"type": "invalid_request_error",
"code": "..."
}
}
Common cases:
| Status | When | Notes |
|---|---|---|
| 400 | Bad request shape, unknown field, etc. | Same shape OpenAI returns |
| 401 | Missing / invalid API key | Re-issue a key at api.reapi.ai |
| 402 | Insufficient balance | Top up at api.reapi.ai |
| 429 | Per-group rate limit hit | Back off, or move to a different group |
| 500 | Upstream / gateway error | Safe to retry; failed calls are not charged |
api.reapi.ai does not internally retry chat requests. Every customer call maps to exactly one upstream POST. If a network error reaches you, that's a one-for-one wire failure and a retry from your side is safe; the upstream provider may have already produced output, but the gateway will not double-bill.
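
Because the gateway does not retry on your behalf, a client-side retry loop is a natural companion. The sketch below is one possible shape, not a prescribed policy: `call_with_retry` and its `send` callable are hypothetical, the retryable set (429 and 500) is a reading of the table above, and the attempt count and delays are arbitrary defaults.

```python
import random
import time

# Per the table above: 400/401/402 need a fix on the caller's side, not a retry.
RETRYABLE_STATUSES = {429, 500}


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable returning (status, body);
    it may raise ConnectionError for wire failures.
    """
    for attempt in range(max_attempts):
        last = attempt == max_attempts - 1
        try:
            status, body = send()
        except ConnectionError:
            if last:
                raise
        else:
            if status not in RETRYABLE_STATUSES or last:
                return status, body
        # Exponential backoff with jitter: at most about 1s, 2s, 4s, ...
        sleep(base_delay * 2 ** attempt * (0.5 + random.random() / 2))


# Demo with a fake sender that fails twice, then succeeds (no network involved).
fake_responses = iter([(429, "rate limited"), (500, "gateway error"), (200, "ok")])
print(call_with_retry(lambda: next(fake_responses), sleep=lambda s: None))  # (200, 'ok')
```

Passing `sleep` as a parameter keeps the loop testable without real delays.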
Recipes
Minimum request
{
"model": "gpt-5.5",
"messages": [
{ "role": "user", "content": "Summarise the OpenAI Chat Completions spec in three sentences." }
]
}
Full parameter set
{
"model": "gpt-5.5",
"group": "default",
"messages": [
{ "role": "system", "content": "You are a senior staff engineer." },
{ "role": "user", "content": "Walk me through a 1M-token codebase review strategy." }
],
"stream": true,
"temperature": 0.7,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}
Tool use (function calling)
{
"model": "gpt-5.5",
"messages": [
{ "role": "user", "content": "What's the weather in Tokyo today?" }
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Look up the current weather for a city.",
"parameters": {
"type": "object",
"properties": { "city": { "type": "string" } },
"required": ["city"]
}
}
}
],
"tool_choice": "auto"
}
Reasoning effort
{
"model": "gpt-5.5",
"reasoning_effort": "high",
"messages": [
{ "role": "user", "content": "Prove that the sum of the first n odd numbers is n^2." }
]
}
reasoning_effort accepts none / low / medium / high / xhigh.
Pick the lowest level that still produces correct output for the
workload to keep latency and token spend down.
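
Finding that lowest level can be done offline by stepping up through the levels against a known-good check. The following sketch assumes two hypothetical callables supplied by you: `ask(effort)`, which sends the prompt at the given reasoning_effort and returns the answer, and `is_correct(answer)`, which validates it. Neither is part of the API.

```python
EFFORT_LEVELS = ["none", "low", "medium", "high", "xhigh"]


def lowest_sufficient_effort(ask, is_correct, levels=EFFORT_LEVELS):
    """Return the first effort level whose answer passes is_correct, else None."""
    for effort in levels:
        if is_correct(ask(effort)):
            return effort
    return None


# Fake stand-ins so the sketch runs without a network call:
# pretend the answer only becomes correct from "medium" upward.
fake_ask = lambda effort: EFFORT_LEVELS.index(effort)
fake_check = lambda answer: answer >= 2
print(lowest_sufficient_effort(fake_ask, fake_check))  # medium
```

Each probe is a billed call, so this kind of sweep fits an evaluation set better than live traffic.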
Tips
- Stream by default for chat UX. Streaming responses cut perceived latency dramatically and let your UI render tokens as they're produced.
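- Watch the long-context boundary. Splitting a 300K-token prompt into a 270K-token request and a separate follow-up keeps you on the standard rate rather than paying the 2× / 1.5× long-context premium, provided each request's own input (including any history you resend) stays at or below 272K tokens.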
- Tune temperature or top_p, not both. Mixing them tends to produce results that are hard to reason about.
- reasoning_effort: high is the right default for agents. Reserve xhigh for the genuinely hard turns; it adds latency and token spend.
- Drop frequency_penalty and presence_penalty first when debugging weird output. Non-zero values can introduce artefacts that look like model bugs.