AI API Doctor FAQ

Understand the tool's capabilities, privacy policy, how relay API scorecards work, and how raw quota evidence is used.

  • API Key Stored Locally
  • No Model Authenticity Claims
  • Diagnosis Consumes Minimal Credits
  • Supports Custom Base URLs

Why Do You Need This Tool

Many AI API relay providers use a pre-charge mechanism. When a request fails, returns an empty reply, receives a 503 from upstream, times out, or hits the cache, it is hard for users to tell whether the charge was fair.

AI API Doctor helps by producing reproducible signals, so that users and providers can discuss billing based on evidence:

  • Check whether failed requests ultimately deduct quota
  • Check whether empty replies are charged
  • Check usage integrity issues
  • Check cache billing anomalies
  • Detect model performance anomalies

The goal is not to convict any site — it's to turn billing, usage, cache, and model performance into a reproducible diagnostic report.

Some AI API relay stations pre-charge quota at the start of a request, then settle and refund the difference after the request completes.

This mechanism itself is fine. But if failed requests, empty replies, upstream 503 errors, invalid models, or cache hits are not handled well, users may find that "they got no valid response but their quota changed".

A failed-request billing anomaly is flagged when:

  • HTTP status code ≥ 400, or upstream returns 503 / 502 / 504 / timeout
  • No valid output (completion_tokens = 0 or missing)
  • Raw quota still decreased after 10 seconds

Note: The 10-second wait distinguishes between a "temporary pre-charge" and a "final deduction". If the quota is restored after 10 seconds, the pre-charge was likely refunded.
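The conditions above can be sketched as a small predicate. This is a minimal illustration, not the tool's actual implementation: the function name and parameters are hypothetical, and it assumes the three conditions are combined with a logical AND and that a timeout is represented by a missing status code.

```python
from typing import Optional


def is_failed_request_anomaly(
    status_code: Optional[int],
    completion_tokens: Optional[int],
    quota_before: int,
    quota_after_wait: int,
) -> bool:
    """Return True when a failed request still cost quota after the wait.

    status_code is None when the request timed out before any response
    arrived (an assumption of this sketch).
    """
    # HTTP status >= 400 already covers upstream 502 / 503 / 504.
    request_failed = status_code is None or status_code >= 400
    # "not" covers both completion_tokens == 0 and a missing value (None).
    no_valid_output = not completion_tokens
    # Compare the raw quota read before the request with the value read
    # after the 10-second wait.
    quota_decreased = quota_after_wait < quota_before
    return request_failed and no_valid_output and quota_decreased
```

For example, a 503 with no completion tokens and a raw quota that drops from 1000 to 990 after the wait would be flagged, while the same failure with an unchanged quota would not.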

An empty-reply billing anomaly is flagged when:

  • HTTP status code is 200
  • Visible output is empty
  • completion_tokens = 0 or missing
  • No tool call / image / audio / search or other valid output
  • Raw quota still decreased after 10 seconds
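A rough sketch of this check against an OpenAI-style chat completion body is shown below. The helper names are hypothetical, and the set of message keys treated as "other valid output" (tool_calls, function_call, audio) is an assumption; a provider that returns images or search results in other fields would need additional keys.

```python
from typing import Any, Mapping

# Message keys treated as non-text output. This list is an assumption.
OTHER_OUTPUT_KEYS = ("tool_calls", "function_call", "audio")


def has_valid_output(response: Mapping[str, Any]) -> bool:
    """Return True if any choice carries visible text or another output type."""
    for choice in response.get("choices") or []:
        message = choice.get("message") or {}
        content = message.get("content")
        if isinstance(content, str) and content.strip():
            return True
        if isinstance(content, list) and content:  # structured content parts
            return True
        if any(message.get(key) for key in OTHER_OUTPUT_KEYS):
            return True
    return False


def is_empty_reply_anomaly(
    status_code: int,
    response: Mapping[str, Any],
    quota_before: int,
    quota_after_wait: int,
) -> bool:
    """Return True for a 200 response with no output that still cost quota."""
    usage = response.get("usage") or {}
    no_completion_tokens = not usage.get("completion_tokens")  # 0 or missing
    return (
        status_code == 200
        and not has_valid_output(response)
        and no_completion_tokens
        and quota_after_wait < quota_before
    )
```

A reply whose only output is a tool call is not treated as empty here, which matches the fourth condition in the list.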

If a response does not return prompt_tokens, completion_tokens, total_tokens, or cache fields, users cannot easily verify whether the theoretical cost matches the actual deduction.

Usage integrity checks verify the response includes:

  • prompt_tokens (input token count)
  • completion_tokens / output_tokens (output token count)
  • total_tokens (total token count)
  • prompt_tokens_details.cached_tokens (cache hit count)

Missing or incomplete fields are flagged as "usage incomplete" — contact the provider for verification.
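As an illustration, the field check could look like the following sketch. The function name is hypothetical; the field names are the ones listed above.

```python
from collections.abc import Mapping
from typing import List, Optional


def missing_usage_fields(usage: Optional[Mapping]) -> List[str]:
    """List the usage fields that a response did not report."""
    usage = usage or {}
    missing = []
    if usage.get("prompt_tokens") is None:
        missing.append("prompt_tokens")
    # Either naming variant counts as reporting the output token count.
    if usage.get("completion_tokens") is None and usage.get("output_tokens") is None:
        missing.append("completion_tokens / output_tokens")
    if usage.get("total_tokens") is None:
        missing.append("total_tokens")
    details = usage.get("prompt_tokens_details")
    if not isinstance(details, Mapping) or details.get("cached_tokens") is None:
        missing.append("prompt_tokens_details.cached_tokens")
    return missing
```

An empty list means the usage block is complete; any other result corresponds to "usage incomplete".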

cached_tokens (or cache_read_input_tokens) indicates how many input tokens in the current request hit the cache.

Cache hits typically mean:

  • Lower latency (reading from cache instead of recomputing)
  • Lower input cost (some providers charge discounted rates for cache reads)

Different providers have different support for cache fields and discount rules. Actual billing follows the provider's published rates.
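Because the cache count can appear under either name, a reader might normalize it as in the sketch below. The function name is hypothetical, and it assumes cached_tokens sits under prompt_tokens_details while cache_read_input_tokens sits at the top level of the usage object; providers may place these fields elsewhere.

```python
from collections.abc import Mapping
from typing import Optional


def cached_token_count(usage: Optional[Mapping]) -> Optional[int]:
    """Return the number of cached input tokens, or None if not reported."""
    usage = usage or {}
    details = usage.get("prompt_tokens_details")
    if isinstance(details, Mapping) and details.get("cached_tokens") is not None:
        return int(details["cached_tokens"])
    if usage.get("cache_read_input_tokens") is not None:
        return int(usage["cache_read_input_tokens"])
    # Distinguish "not reported" (None) from "reported as zero" (0).
    return None
```

Returning None rather than 0 when the field is absent keeps "no cache hit" separate from "the provider does not report cache fields".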

Model performance detection is a lightweight capability test covering 5 dimensions:

  • Instruction following
  • Basic reasoning
  • Number traps (e.g. which is larger: 9.11 or 9.9)
  • Code understanding
  • Context retention

It is not an official IQ test and cannot prove model authenticity. It is only used to detect obvious degradation or performance anomalies.

Results are part of the Model Performance Score, not the sole judgment criterion.

Web version:

  • Good for quick checks
  • No API Key entry required on the website
  • Supports manual report mode — fill in raw quota data to generate a report
  • Good for mobile sharing and customer support communication

Chrome extension (under review):

  • Can automatically read New API / One API raw quota — no manual entry needed
  • Good for precise desktop forensics
  • API Key stored locally in your browser, never uploaded to third parties
  • A download link will be shared once the review passes

No.

AI API Doctor does not recommend or rank any relay providers.

Reports show reproducible signals from this test and do not prove intent.

No.

AI API Doctor can help verify:

  • Whether the response returns complete usage information
  • Whether failed requests ultimately deduct quota
  • Whether empty replies are charged
  • Whether cache is billed correctly

Final balances, deductions, and billing are controlled by the provider's backend. AI API Doctor cannot directly access all providers' billing systems.

Conclusions should be understood as "usage signal verification", not a financial audit.

AI API Doctor does not recommend or rank any relay providers. The report only shows reproducible signals from this test, and does not prove intentional overcharging by the provider.

Basic Introduction

AI API Doctor is a local-first, black-box diagnostic tool for relay APIs, built for users of OpenAI-compatible APIs.

It helps you check whether API Keys, Base URLs, model permissions, group configurations, chat/completions interfaces, raw quota changes, and client configurations are working correctly.

It is suitable for the following scenarios:

  • API Key is filled in but the client cannot use it
  • Base URL is uncertain
  • Model list is visible but actual requests fail
  • Errors 401 / 403 / 404 / 429 appear
  • Relay station reports "no access to a certain group"
  • Want to verify whether usage is returned for a request
  • Want to export Cline / Continue / Cherry Studio configurations

AI API Doctor is not a model authenticity verification tool, nor is it a legal audit tool.

Privacy & API Key Safety

No. The Chrome extension stores your API Key locally in your browser's chrome.storage.local.

AI API Doctor does not proactively upload your API Key to any third-party servers.

Diagnostic requests are sent only to the Base URL you currently have selected. For example, if you selected a custom relay station, the diagnostic request will be sent to that relay station's API address.

It is recommended to use a test-only API Key and not a production key.

Raw quota is the quota value as recorded by the New API / One API backend. It is typically more granular than the balance shown in the frontend.

The frontend balance often only shows two decimal places, so a change as small as $0.0001 may not be visible.

Raw quota lets you observe much finer-grained quota changes and is the primary evidence used by AI API Doctor for billing anomaly detection.
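The arithmetic behind this can be illustrated with a short sketch. The conversion rate of 500,000 quota units per dollar is an assumption used only for the example (check the value your provider actually uses), and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed conversion rate for illustration only; confirm with the provider.
QUOTA_PER_DOLLAR = Decimal(500_000)


def displayed_balance(raw_quota: int) -> Decimal:
    """Convert raw quota to dollars and round to two decimals for display."""
    dollars = Decimal(raw_quota) / QUOTA_PER_DOLLAR
    return dollars.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


before = 2_502_000  # $5.0040 under the assumed rate
after = 2_501_950   # $5.0039 under the assumed rate

raw_delta = before - after                   # 50 quota units, i.e. $0.0001
same_display = displayed_balance(before) == displayed_balance(after)
```

Under these assumptions both readings display as 5.00, while the raw values differ by 50 units, so only the raw quota reveals the change.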

Some relay stations pre-charge quota at the start of a request, then settle and refund the difference after the request completes.

The 10-second wait is necessary to distinguish between a temporary pre-charge and a final deduction.

If the quota is restored after 10 seconds, the request likely had a pre-charge that was subsequently refunded. If the quota remains lower after 10 seconds, it may indicate a billing anomaly.
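The measurement procedure can be sketched as follows. The callables read_raw_quota and send_request are hypothetical stand-ins for whatever reads the provider's raw quota and sends the test request; the extra reading taken immediately after the request is an addition of this sketch, used only to tell a refunded pre-charge apart from no charge at all.

```python
import time
from typing import Callable


def classify_quota_change(
    read_raw_quota: Callable[[], int],
    send_request: Callable[[], None],
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Classify the quota effect of one request after a settling wait."""
    before = read_raw_quota()
    send_request()
    immediately_after = read_raw_quota()
    sleep(wait_seconds)  # give the provider time to settle and refund
    settled = read_raw_quota()

    if settled < before:
        return "final_deduction"
    if immediately_after < before:
        return "pre_charge_refunded"
    return "no_charge"
```

A "final_deduction" result is expected for a successful request; it only becomes a possible anomaly signal when the request produced no valid output. Passing sleep as a parameter lets the logic be exercised without actually waiting.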

Yes. AI API Doctor sends a small number of real API requests to confirm whether your Base URL, API Key, model permissions, and chat/completions are working.

Basic diagnostics typically require only 1 to 3 requests and consume a minimal amount of quota.

For safety, it is recommended to:

  • Use a test-only API Key
  • Set a low credit limit
  • Not use a production key
  • Compare results against the provider's backend billing after diagnosis

This means a request failed (e.g. HTTP status ≥ 400, upstream 503, timeout, or no valid output) but the raw quota did not decrease after 10 seconds.

This is a normal result: the provider did not ultimately charge you for the failed request.

The station may have pre-charged and then refunded, or simply did not charge for the failed request in the first place.

A billing anomaly means that a request produced no valid output (failed request, empty reply, timeout, or invalid model) but the raw quota was still lower after 10 seconds, meaning the provider ultimately deducted quota.

This is a reproducible signal. It indicates that the station charged for a request that did not produce valid output.

The report generated by AI API Doctor can be shared with the provider's support team for verification.

Note: This is diagnostic evidence, not a legal audit report.

The web version does not require you to enter an API Key and runs entirely in the browser.

It cannot automatically access the New API / One API raw quota endpoint the way the Chrome extension can.

The web version is designed for users who want to manually enter raw quota data they have collected (e.g. from the provider's dashboard) to generate a shareable diagnostic report.

For automatic raw quota reading, use the Chrome extension.

No. AI API Doctor can help you check usage information from individual requests, but it cannot prove that a provider intentionally overbilled.

It can detect:

  • Whether the response returns a usage field
  • Whether total_tokens is abnormally high
  • Whether a short request shows significantly abnormal token consumption
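One way such checks could be approximated is sketched below. The thresholds (max_tokens_per_char and fixed_overhead) are arbitrary assumptions chosen for illustration, not values used by AI API Doctor, and the consistency rule assumes total_tokens equals prompt_tokens plus completion_tokens, which may not hold for every provider.

```python
from typing import Any, List, Mapping


def usage_warnings(
    prompt_text: str,
    usage: Mapping[str, Any],
    max_tokens_per_char: float = 2.0,  # assumed generous upper bound
    fixed_overhead: int = 50,          # assumed allowance for message framing
) -> List[str]:
    """Return human-readable warnings about suspicious usage values."""
    warnings = []
    prompt_tokens = usage.get("prompt_tokens")
    completion_tokens = usage.get("completion_tokens")
    total_tokens = usage.get("total_tokens")

    # Consistency check, only when all three values are reported.
    if None not in (prompt_tokens, completion_tokens, total_tokens):
        if total_tokens != prompt_tokens + completion_tokens:
            warnings.append("total_tokens does not equal prompt_tokens + completion_tokens")

    # Rough ceiling: a short prompt should not report thousands of input tokens.
    if prompt_tokens is not None:
        ceiling = len(prompt_text) * max_tokens_per_char + fixed_overhead
        if prompt_tokens > ceiling:
            warnings.append("prompt_tokens is far above a rough estimate for this prompt")
    return warnings
```

A warning here is a prompt to ask the provider for clarification (for example, about injected system prompts), not evidence of overbilling.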

Final balances, deductions, and billing are controlled by the provider's backend. AI API Doctor cannot directly access all providers' real billing systems.

Its conclusions should be understood as "usage signal verification," not a financial audit in any legal sense.

Reports generated by AI API Doctor automatically hide full API Keys and display only a masked format such as:

sk-****abcd
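A masking step that produces this format could look like the sketch below. The function name and the choice to keep four trailing characters are assumptions based on the example above, not a description of the tool's code.

```python
def mask_api_key(api_key: str, visible_suffix: int = 4) -> str:
    """Hide all but the last few characters of an API key."""
    key = api_key.strip()
    prefix = "sk-" if key.startswith("sk-") else ""
    body = key[len(prefix):]
    # Very short keys are fully hidden so the suffix cannot leak the whole key.
    if len(body) <= visible_suffix:
        return prefix + "****"
    return f"{prefix}****{body[-visible_suffix:]}"
```

With this sketch, a key such as sk-1234567890abcd would be rendered as sk-****abcd.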

You can share the report with the provider's owner or support team to explain:

  • Base URL
  • Model ID
  • Error code
  • Provider-returned information
  • Failed step details
  • Usage details

Recommended steps:

  • Use a short prompt for testing to reduce per-test cost
  • Record multiple diagnostic reports to check result consistency
  • Contact the owner or support team for verification
  • Compare against the provider's official billing

Never send a full API Key, account password, or sensitive balance screenshot to strangers.

Common Error Codes

401: Invalid or expired API Key, or extra spaces introduced when copying the key.

403: Insufficient permissions, possibly no model access, no group access, an IP whitelist restriction, or a model that has not been added to the group.

404: Incorrect endpoint address, possibly a missing /v1, an extra /v1, or the provider's website address entered instead of the API address.

429: Too many requests, concurrency limit exceeded, quota exhausted, or provider rate limiting.

HTML response: The server returned a web page instead of API JSON, possibly because the Base URL points to the site homepage, a login page, or a Cloudflare page, or because the provider does not support this endpoint.

No usage: Response did not return a usage field, so token consumption cannot be verified from this response.
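The table of error codes above can be turned into a simple lookup, as in this sketch. The function name, the HTML detection heuristic (Content-Type or a leading HTML tag), and the order of the checks are assumptions made for illustration.

```python
import json

ERROR_HINTS = {
    401: "Invalid or expired API Key, or extra spaces copied with the key.",
    403: "Insufficient permissions: check model, group, and IP whitelist settings.",
    404: "Incorrect endpoint address: check for a missing or extra /v1.",
    429: "Too many requests, concurrency limit, exhausted quota, or rate limiting.",
}


def diagnose_response(status_code: int, content_type: str, body: str) -> str:
    """Map a raw HTTP response to one of the common diagnoses."""
    head = body.lstrip().lower()
    # HTML is checked first because login or Cloudflare pages may use any status.
    if "text/html" in content_type.lower() or head.startswith(("<!doctype html", "<html")):
        return "HTML response: the server returned a web page instead of API JSON."
    if status_code in ERROR_HINTS:
        return f"{status_code}: {ERROR_HINTS[status_code]}"
    if status_code != 200:
        return f"HTTP {status_code}: no hint available in this sketch."
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "Response body is not valid JSON."
    if isinstance(payload, dict) and "usage" not in payload:
        return "No usage: token consumption cannot be verified from this response."
    return "No known issue detected."
```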

Web Tool & Usage

The browser's same-origin policy blocks web pages from reading responses from third-party APIs unless the server explicitly allows it through CORS (Cross-Origin Resource Sharing) headers. When your Base URL is on a different origin from the web page and the provider does not send those headers, the browser refuses to expose the response to the page.

This does not mean the API is unavailable. Common solutions:

  • Use the Chrome extension to read New API / One API raw quota
  • Switch to manual report mode and fill in raw quota data
  • Contact the provider to allow cross-origin debugging

Yes. Cache hit detection sends two long test requests (1200 to 1500 tokens of fixed text) to observe whether the second request hits the cache.

Enable cache detection only when needed. Before enabling, make sure:

  • You are using a test-only API Key
  • A low credit limit is set
  • You understand the provider's cache billing rules
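The two-request probe described above can be sketched as follows. The send_chat callable is a hypothetical stand-in for the code that sends one chat request and returns the parsed JSON response; the sketch only compares the reported cache counts and does not build the fixed test text itself.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Optional


def _cached_tokens(usage: Mapping) -> Optional[int]:
    """Read the cache hit count from either supported field, if present."""
    details = usage.get("prompt_tokens_details")
    if isinstance(details, Mapping) and details.get("cached_tokens") is not None:
        return int(details["cached_tokens"])
    if usage.get("cache_read_input_tokens") is not None:
        return int(usage["cache_read_input_tokens"])
    return None


def run_cache_probe(
    send_chat: Callable[[str], Mapping],
    prompt: str,
) -> Dict[str, Any]:
    """Send the same prompt twice and report the cache counts of each reply."""
    first_usage = send_chat(prompt).get("usage") or {}
    second_usage = send_chat(prompt).get("usage") or {}
    second_cached = _cached_tokens(second_usage)
    return {
        "first_cached": _cached_tokens(first_usage),
        "second_cached": second_cached,
        # True only if the second reply reports a non-zero cache count.
        "cache_hit_observed": bool(second_cached),
    }
```

A result where second_cached is None means the provider did not report cache fields, which is different from a reported count of zero.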

If a response does not return prompt_tokens, completion_tokens, total_tokens, or cache fields, users cannot easily verify whether the theoretical cost matches the actual deduction.

Usage integrity checks verify whether the response includes:

  • prompt_tokens (input token count)
  • completion_tokens / output_tokens (output token count)
  • total_tokens (total token count)
  • prompt_tokens_details.cached_tokens (cache hit count)

If these fields are missing or incomplete, AI API Doctor flags it as "usage incomplete" — contact the provider for verification.

No. Reports only show reproducible technical signals and are suitable for communication with the provider or support team for troubleshooting.

AI API Doctor can help verify:

  • Whether the response returns complete usage information
  • Whether failed requests ultimately deduct quota
  • Whether empty replies are charged
  • Whether cache is billed correctly

Final balances, deductions, and billing are controlled by the provider's backend. AI API Doctor cannot directly access all providers' billing systems. Conclusions should be understood as "usage signal verification," not a financial audit.

Chrome Extension Status

Chrome Extension Under Review

Feature                                 AI API Doctor (Chrome Extension)  Web Version
Read New API raw quota automatically    Yes                               No (manual entry required)
Requires API Key on website             Yes, stored locally               No
Auto-run failed-request billing check   Yes                               No (manual report only)
Generate shareable report               Yes                               Yes
Suitable for support communication      Yes                               Yes
Suitable for precise desktop forensics  Yes                               Limited
Suitable for mobile sharing             No                                Yes

The Chrome extension is currently under review. A download link will be available once the review passes. The web version supports manual report generation and is suitable for customer support communication.

Diagnostic Report Example

AI API Doctor-generated reports are suitable for sending to relay station owners or customer support. Reports automatically hide full API Keys.

AI API Doctor Report
────────────────────────────────
Provider: Example Station
Base URL: https://api.example.com/v1
API Key: sk-****abcd
Model: claude-opus-4.7
Time: 2026-05-11 10:30
Result: 5 / 7 checks passed
Failed Step: Chat Completion
HTTP Status: 403
Provider: No access to GPT official group
Suggestion: Please check Key group, model group, and channel permissions.
────────────────────────────────
Note: This report does not prove intentional overbilling. It only shows configuration and usage signals from this test request.

About the Project

AI API Doctor was started by @norike0718 and is being built in public.

The author is an independent developer who has long built SaaS and AI tools, focused on content workflows, AI API configuration, automation pipelines, and developer tool experience.

AI API Doctor's goal is not to "judge" relay stations but to help users diagnose API Key, Base URL, model permissions, token usage, and client configuration issues in real-world usage scenarios.

The project strives to keep its diagnostic logic transparent and its boundaries clear, and it prioritizes avoiding false positives against legitimate providers.

No. AI API Doctor is a neutral diagnostic tool. It does not recommend or rank any relay providers.

Reports show reproducible signals from a local test and do not prove intent.

Many AI API services pre-charge quota at the start of a request and settle afterward based on actual usage.

If failed requests, empty replies, or upstream errors are not handled correctly, billing anomalies may occur that are difficult for users to notice.

Model Sanity Test

It is a lightweight model performance test, covering instruction following, basic reasoning, number traps, code understanding, and context retention.

It is not an official IQ test and cannot prove model authenticity. It is only used to detect obvious degradation or performance anomalies.

To enable: check "Run Model Sanity Test" in the advanced settings of the Diagnostic Lab.

This type of question tends to spark debate and can expose number-trap reasoning issues in some models.

However, a single question cannot represent a model's overall capability. AI API Doctor includes it as part of the Model Sanity Score rather than as the sole judgment.
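For reference, the expected answer to the 9.11 versus 9.9 question can be checked mechanically. The sketch below also shows one commonly suggested explanation for wrong answers, namely comparing the digits after the dot as whole numbers the way version numbers are compared; that explanation is an assumption, and the function names are hypothetical.

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Compare two numerals as decimal numbers (the intended reading)."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Compare two numerals part by part, as version numbers would be."""
    def key(text: str) -> tuple:
        return tuple(int(part) for part in text.split("."))
    return a if key(a) > key(b) else b
```

Read as decimals, 9.9 is larger because 9.90 is greater than 9.11; read as version numbers, 9.11 comes out larger because 11 is greater than 9. A model that answers 9.11 may be applying the second reading.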

The Model Sanity Score is calculated from 5 tests: instruction following, basic reasoning, number trap, code understanding, and context retention.

No.

AI API Doctor is a neutral tool. It does not require you to disclose the provider name, nor does it recommend or rank any relay station.

When sharing your score, you only need to report your health score, model sanity score, and one-line finding — no information about the provider is required.

No.

You can use the web version only. No Chrome extension installation required. No positive review required. No forced social media sharing required.

Just generate a diagnostic report when needed and post a one-line score if you want to participate in the challenge.

Start Diagnosing

Run a local diagnostic test to check whether your API Key, Base URL, and model configuration are working correctly.