Q: What does "failed request not charged" mean?

This means a request failed (e.g. HTTP status >= 400, upstream 503, timeout, or no valid output) but the raw quota did not decrease after 10 seconds. This is a normal result: the provider did not ultimately charge you for the failed request. The station may have pre-charged and then refunded, or simply did not charge for the failed request in the first place.

Question 1

What is AI API Doctor?

Accepted Answer

AI API Doctor is a local-first relay API black-box check tool for OpenAI-compatible API users.

It helps you check whether API Keys, Base URLs, model permissions, group configurations, chat/completions interfaces, raw quota changes, and client configurations are working correctly.

It is suitable for the following scenarios:

API Key is filled in but the client cannot use it
Base URL is uncertain
Model list is visible but actual requests fail
Errors 401 / 403 / 404 / 429 appear
Relay station reports "no access to a certain group"
Want to verify whether usage is returned for a request
Want to export Cline / Continue / Cherry Studio configurations

AI API Doctor is not a model authenticity verification tool, nor is it a legal audit tool.

Question 2

Does AI API Doctor upload my API Key?

Accepted Answer

No. The Chrome extension stores your API Key locally in your browser's chrome.storage.local.

AI API Doctor does not proactively upload your API Key to any third-party servers.

Diagnostic requests are sent only to the Base URL you currently have selected. For example, if you selected a custom relay station, the diagnostic request will be sent to that relay station's API address.

It is recommended to use a test-only API Key and not a production key.

Question 3

What is raw quota?

Accepted Answer

Raw quota is the raw quota value recorded by the New API / One API backend, typically more granular than the frontend balance display.

The frontend balance often only shows two decimal places, so a change as small as $0.0001 may not be visible.

Raw quota lets you observe much finer-grained quota changes and is the primary evidence used by AI API Doctor for billing anomaly detection.

Question 4

Why does the diagnosis wait 10 seconds?

Accepted Answer

Some relay stations pre-charge quota at the start of a request, then settle and refund the difference after the request completes.

The 10-second wait is necessary to distinguish between a temporary pre-charge and a final deduction.

If the quota is restored after 10 seconds, the request likely had a pre-charge that was subsequently refunded. If the quota remains lower after 10 seconds, it may indicate a billing anomaly.

Question 5

Does the diagnosis consume credits?

Accepted Answer

Yes. AI API Doctor sends a small number of real API requests to confirm whether your Base URL, API Key, model permissions, and chat/completions are working.

Basic diagnostics typically require only 1 to 3 requests, consuming a minimal amount.

For safety, it is recommended to:

Use a test-only API Key
Set a low credit limit
Not use a production key
Compare results against the provider's backend billing after diagnosis

Question 6

What does "failed request not charged" mean?

Accepted Answer

This means a request failed (e.g. HTTP status ≥ 400, upstream 503, timeout, or no valid output) but the raw quota did not decrease after 10 seconds.

This is a normal result: the provider did not ultimately charge you for the failed request.

The station may have pre-charged and then refunded, or simply did not charge for the failed request in the first place.

Question 7

What does "billing anomaly" mean?

Accepted Answer

A billing anomaly indicates that a request produced no valid output (failed request, empty reply, timeout, or invalid model) but the raw quota decreased after 10 seconds — meaning the provider ultimately deducted quota.

This is a reproducible signal. It shows that the station charged for a request that did not produce valid output.

The report generated by AI API Doctor can be shared with the provider's support team for verification.

Note: This is diagnostic evidence, not a legal audit report.

Question 8

Why can't the web version read raw quota automatically?

Accepted Answer

The web version does not require you to enter an API Key and runs entirely in the browser.

It cannot automatically access the New API / One API raw quota endpoint the way the Chrome extension can.

The web version is designed for users who want to manually enter raw quota data they have collected (e.g. from the provider's dashboard) to generate a shareable diagnostic report.

For automatic raw quota reading, use the Chrome extension.

Question 9

Does AI API Doctor prove intentional overbilling?

Accepted Answer

No. AI API Doctor can help you check usage information from individual requests, but it cannot prove that a provider intentionally overbilled. It can detect whether the response returns a usage field, whether total_tokens is abnormally high, and whether a short request shows significantly abnormal token consumption. However, final balances, deductions, and billing are controlled by the provider's backend. AI API Doctor cannot directly access all providers' real billing systems. Its conclusions should be understood as 'usage signal verification', not a financial audit in any legal sense.

Question 10

How should I send a report to my API provider?

Accepted Answer

AI API Doctor-generated reports automatically hide full API Keys, displaying only a desensitized format such as:

sk-****abcd

You can share the report with the provider's owner or support team to explain:

Base URL
Model ID
Error code
Provider-returned information
Failed step details
Usage situation

Recommended steps:

Use a short prompt for testing to reduce per-test cost
Record multiple diagnostic reports to check result consistency
Contact the owner or support team for verification
Compare against the provider's official billing

Always ensure you do not manually send a full API Key, account password, or sensitive balance screenshot to strangers.

Question 11

What is the difference between the Chrome extension and the web version?

Accepted Answer

The Chrome extension can automatically read the New API / One API raw quota without manual entry, making it suitable for precise desktop forensics. The web version does not require an API Key to be entered and is better suited for manual report generation, mobile sharing, and customer support communication. Both versions generate shareable diagnostic reports.

Question 12

What error codes does AI API Doctor detect?

Accepted Answer

AI API Doctor detects and reports on common API error codes: 401 (invalid or expired API Key, possible extra spaces when copying), 403 (insufficient permissions — possibly no model access, group access, IP whitelist restriction, or model not added to group), 404 (incorrect endpoint address — possibly missing /v1, extra /v1, or filled with the official website address), 429 (too many requests, concurrent limit exceeded, quota exhausted, or provider rate limiting), HTML response (server returned a webpage instead of API JSON — possibly filled with the site homepage, login page, Cloudflare page, or provider does not support this endpoint), and No usage (response did not return a usage field, so token consumption cannot be verified from this response).

Question 13

Why can't the web version directly test some Base URLs?

Accepted Answer

Browser CORS (Cross-Origin Resource Sharing) security policies block web pages from directly reading responses from third-party APIs. When your Base URL is on a different domain from the web page, the browser will refuse to read the response.

This does not mean the API is unavailable. Common solutions:

Use the Chrome extension to read New API / One API raw quota
Switch to manual report mode and fill in raw quota data
Contact the provider to allow cross-origin debugging

Question 14

Does cache hit detection consume credits?

Accepted Answer

Yes. Cache hit detection sends two long test requests (1200-1500 tokens of fixed text) to observe whether the second request hits the cache. Enable only when needed, using a test-only API Key.

Question 15

What is cached_tokens?

Accepted Answer

cached_tokens (or cache_read_input_tokens) indicates how many input tokens in the current request hit the cache. Cache hits typically mean lower latency and lower input cost. Different providers have different cache support and discount rules. Actual billing follows the provider's published rates.

Question 16

Why is usage integrity important?

Accepted Answer

If a response does not return prompt_tokens, completion_tokens, total_tokens, or cache fields, users cannot easily verify whether the theoretical cost matches the actual deduction. Usage integrity checks verify these fields are present. Missing or incomplete fields are flagged for provider verification.

Question 17

Does this report prove intentional overbilling?

Accepted Answer

No. Reports only show reproducible technical signals suitable for communication with the provider or support team. Final balances, deductions, and billing are controlled by the provider's backend. AI API Doctor cannot directly access all providers' billing systems. Conclusions should be understood as usage signal verification, not a financial audit.

Question 18

Does AI API Doctor recommend relay stations?

Accepted Answer

No. AI API Doctor is a neutral diagnostic tool. It does not recommend or rank any relay providers. Reports show reproducible signals from a local test and do not prove intent.

Question 19

Why explain pre-deduction billing?

Accepted Answer

Many AI API services pre-charge quota at the start of a request and settle after based on actual usage.

If failed requests, empty replies, or upstream errors are not handled correctly, billing anomalies may occur that are difficult for users to notice.

Feature	AI API Doctor (Chrome Extension)	Web Version
Read New API raw quota automatically	Yes	No (manual entry required)
Requires API Key on website	Yes, stored locally	No
Auto-run failed-request billing check	Yes	No (manual report only)
Generate shareable report	Yes	Yes
Suitable for support communication	Yes	Yes
Suitable for precise desktop forensics	Yes	Limited
Suitable for mobile sharing	No	Yes

AI API Doctor FAQ

Chrome Extension Under Review

Diagnostic Report Example

About the Project

Start Diagnosing