1. What Is Pre-Deduction Billing?

Some AI API relay stations pre-charge quota at the start of a request, then settle and refund the difference after the request completes.

This mechanism itself is fine. The quota change you see during a request is not necessarily the final amount. For example, the station may pre-charge $0.001 and then refund $0.0008 because the actual usage cost only $0.0002.

Problems arise when failed requests or empty replies are not refunded — in that case, users see their quota decrease without receiving valid output.
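The pre-charge and settlement arithmetic can be sketched as follows. This is a minimal illustration, not any station's actual implementation; `settle` and its parameters are hypothetical names, and amounts are in dollars.

```python
from decimal import Decimal


def settle(precharge: Decimal, actual_cost: Decimal) -> Decimal:
    """Return the refund owed once the actual cost is known.

    A failed or empty request has actual_cost == 0, so the whole
    pre-charge should come back.
    """
    # If the actual cost exceeds the pre-charge there is nothing to refund.
    return max(precharge - actual_cost, Decimal("0"))


# Pre-charge $0.001, actual usage $0.0002: $0.0008 is refunded.
print(settle(Decimal("0.001"), Decimal("0.0002")))  # 0.0008
# Failed request with no output: the full pre-charge is refunded.
print(settle(Decimal("0.001"), Decimal("0")))  # 0.001
```

Decimal arithmetic is used here because amounts this small are easy to distort with binary floating point.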

The frontend balance may only show two decimal places. A change of $0.0001 may be invisible on the page but observable in raw quota.

2. What Is Raw Quota?

Raw quota is the granular quota value stored by the New API / One API backend — far more precise than the frontend balance display.

AI API Doctor uses raw quota as the primary evidence for billing anomaly detection. The Chrome extension (under review) can read raw quota automatically. The web version supports manual entry.
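To see why raw quota is better evidence than the displayed balance, consider the sketch below. The conversion rate `QUOTA_PER_USD = 500_000` is an assumption for illustration only (stations can configure their own rate), and `displayed_balance` is a hypothetical helper, not part of AI API Doctor.

```python
QUOTA_PER_USD = 500_000  # assumption: check your station's actual setting


def displayed_balance(raw_quota: int) -> str:
    """Format raw quota the way a two-decimal frontend might."""
    return f"${raw_quota / QUOTA_PER_USD:.2f}"


before = 5_000_000    # $10.00 under the assumed rate
after = before - 50   # $0.0001 deducted

# Both values render identically on a two-decimal page...
print(displayed_balance(before), displayed_balance(after))  # $10.00 $10.00
# ...but the raw quota difference is still visible.
print(before - after)  # 50
```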

3. Why Wait 10 Seconds?

The 10-second wait is the core of the detection logic. It distinguishes:

Temporary pre-charge: Quota was deducted during the request but refunded within ~10 seconds. This is normal.
Final deduction: Quota is still lower after 10 seconds. For a request that produced no valid output, this is a billing anomaly signal.

Some stations settle within 3 seconds; others take longer. AI API Doctor uses 10 seconds as a conservative default.

If quota is still not restored after 10 seconds, record the final value and compare with the pre-test baseline.
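One way to implement the wait is to poll raw quota until it returns to the baseline or the 10-second window runs out. The sketch below assumes a caller-supplied `read_raw_quota` function (hypothetical; how you read quota depends on your station) and is not AI API Doctor's actual code.

```python
import time
from typing import Callable, Tuple


def wait_for_settlement(
    read_raw_quota: Callable[[], int],
    baseline: int,
    timeout_s: float = 10.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Poll raw quota until it is back at the baseline or the timeout elapses."""
    waited = 0.0
    quota = read_raw_quota()
    while quota < baseline and waited < timeout_s:
        sleep(poll_interval_s)
        waited += poll_interval_s
        quota = read_raw_quota()
    if quota >= baseline:
        return ("restored", quota)
    return ("final_deduction", quota)


# Simulated station that refunds the pre-charge on the third reading.
readings = iter([990, 990, 1000])
print(wait_for_settlement(lambda: next(readings), baseline=1000,
                          sleep=lambda s: None))  # ('restored', 1000)
```

The `sleep` parameter is injectable so the logic can be exercised without real delays.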

4. Failed Request Charges

A failed request billing anomaly is indicated when all of the following are true:

HTTP status ≥ 400, or upstream returns 503 / 502 / 504 / timeout
No valid output (completion_tokens = 0 or missing)
No tool_call / image / audio / search output
Raw quota still decreased after 10 seconds

This means the station charged for a request that produced no valid result.
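The four conditions above can be combined into a single check. This is a minimal sketch; `Observation` and its field names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    http_status: Optional[int]        # None models a timeout with no HTTP response
    completion_tokens: Optional[int]  # None when the usage field is missing
    has_other_output: bool            # tool_call / image / audio / search output
    baseline_quota: int               # raw quota before the request
    quota_after_wait: int             # raw quota 10 seconds after the request


def is_failed_request_charged(o: Observation) -> bool:
    failed = o.http_status is None or o.http_status >= 400
    no_tokens = not o.completion_tokens  # 0 or missing
    charged = o.quota_after_wait < o.baseline_quota
    return failed and no_tokens and not o.has_other_output and charged


obs = Observation(http_status=503, completion_tokens=None,
                  has_other_output=False,
                  baseline_quota=1000, quota_after_wait=990)
print(is_failed_request_charged(obs))  # True
```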

5. Empty Reply Charges

An empty reply billing anomaly is indicated when all of the following are true:

HTTP status is 200
Visible output is empty
completion_tokens = 0 or missing
No tool_call / image / audio / search output
Raw quota still decreased after 10 seconds

This means the station charged for a request that returned no usable output.
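The empty reply conditions can be expressed the same way. In this sketch the function and parameter names are hypothetical, and treating whitespace-only text as empty is an assumption of the example rather than a documented rule.

```python
from typing import Optional


def is_empty_reply_charged(
    http_status: int,
    visible_output: str,
    completion_tokens: Optional[int],
    has_other_output: bool,  # tool_call / image / audio / search output
    baseline_quota: int,
    quota_after_wait: int,
) -> bool:
    return (
        http_status == 200
        and visible_output.strip() == ""      # assumption: whitespace counts as empty
        and not completion_tokens             # 0 or missing
        and not has_other_output
        and quota_after_wait < baseline_quota  # still lower after the wait
    )


print(is_empty_reply_charged(200, "", 0, False, 1000, 995))  # True
```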

6. Usage Integrity

If a response does not return prompt_tokens, completion_tokens, total_tokens, or cache fields, you cannot verify whether the theoretical cost matches the actual deduction.

AI API Doctor flags responses as "usage incomplete" when these fields are missing or partial. Contact the provider for verification.
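A simple completeness check might look like the following sketch. It assumes an OpenAI-style response body with a top-level `usage` object; `missing_usage_fields` is a hypothetical helper, and cache fields are left out because not every provider returns them.

```python
from typing import List

REQUIRED_USAGE_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def missing_usage_fields(response: dict) -> List[str]:
    """Return the required usage fields that are absent or not integers."""
    usage = response.get("usage")
    if not isinstance(usage, dict):
        return list(REQUIRED_USAGE_FIELDS)
    return [k for k in REQUIRED_USAGE_FIELDS
            if not isinstance(usage.get(k), int)]


partial = {"usage": {"prompt_tokens": 12}}
print(missing_usage_fields(partial))  # ['completion_tokens', 'total_tokens']
```

An empty list means the three token counts are present; any other result corresponds to "usage incomplete".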

7. cached_tokens / cache_read_input_tokens

cached_tokens shows how many input tokens were served from cache instead of being recomputed. Cache hits typically mean lower latency and lower input cost.

Different providers handle cache fields and discount rates differently. Actual billing follows the provider's published rates.
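Because the cache field sits in different places depending on the provider, a reader of usage data may need to look in more than one location. The sketch below checks an OpenAI-style nested `prompt_tokens_details.cached_tokens`, a top-level `cached_tokens`, and an Anthropic-style `cache_read_input_tokens`; which of these a given relay actually returns is an assumption to verify against its responses.

```python
from typing import Optional


def cached_input_tokens(usage: dict) -> Optional[int]:
    """Return the cached input token count, or None if no cache field exists."""
    details = usage.get("prompt_tokens_details") or {}
    if "cached_tokens" in details:
        return details["cached_tokens"]
    for key in ("cached_tokens", "cache_read_input_tokens"):
        if key in usage:
            return usage[key]
    return None


print(cached_input_tokens({"prompt_tokens_details": {"cached_tokens": 800}}))  # 800
print(cached_input_tokens({"cache_read_input_tokens": 512}))  # 512
print(cached_input_tokens({"prompt_tokens": 100}))  # None
```

Returning `None` rather than 0 keeps "no cache field reported" distinct from "zero tokens served from cache".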

8. How to Read Reports

AI API Doctor reports classify each result as Normal, Warning, or Anomaly. The results include:

Normal: Failed request not charged. Request failed but raw quota was not deducted.
Normal: Pre-charge refunded. Quota was pre-deducted but fully refunded within 10 seconds.
Warning: Cannot read raw quota. Raw quota unavailable; report is for reference only.
Anomaly: Failed request charged. Request failed with no valid output but quota was deducted.
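For a failed-request test, the mapping from quota readings to these results can be sketched as below. The function name and parameters are hypothetical; `lowest_during` stands for the lowest raw quota seen while the request was in flight, if it was observed at all.

```python
from typing import Optional, Tuple


def failed_request_result(
    baseline: Optional[int],
    lowest_during: Optional[int],
    after_wait: Optional[int],
) -> Tuple[str, str]:
    """Map raw quota readings for a failed request to (level, result)."""
    if baseline is None or after_wait is None:
        return ("Warning", "Cannot read raw quota")
    if after_wait < baseline:
        return ("Anomaly", "Failed request charged")
    if lowest_during is not None and lowest_during < baseline:
        return ("Normal", "Pre-charge refunded")
    return ("Normal", "Failed request not charged")


print(failed_request_result(1000, 990, 1000))  # ('Normal', 'Pre-charge refunded')
print(failed_request_result(1000, 990, 990))   # ('Anomaly', 'Failed request charged')
```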

9. Quick Test Steps

If you want to run this test manually without AI API Doctor:

Record your raw quota value before the test
Send a request that is expected to fail (e.g., request a non-existent model or trigger a 503)
Wait 10 seconds
Record the raw quota value again
Compare: if quota decreased without valid output, this is a billing anomaly signal
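The manual steps above can be scripted along these lines. This is only a sketch: `read_raw_quota` and `send_failing_request` are hypothetical callables you would write for your own station, with the second returning `True` if any valid output came back.

```python
import time
from typing import Callable


def manual_billing_check(
    read_raw_quota: Callable[[], int],
    send_failing_request: Callable[[], bool],
    wait_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    baseline = read_raw_quota()                # record quota before the test
    got_valid_output = send_failing_request()  # send the request expected to fail
    sleep(wait_s)                              # wait 10 seconds
    final = read_raw_quota()                   # record quota again
    return {
        "baseline": baseline,
        "final": final,
        "delta": final - baseline,
        # quota decreased without valid output: billing anomaly signal
        "anomaly_signal": final < baseline and not got_valid_output,
    }


# Simulated station that keeps 10 quota units after a failed request.
quota_values = iter([1000, 990])
result = manual_billing_check(lambda: next(quota_values), lambda: False,
                              sleep=lambda s: None)
print(result["delta"], result["anomaly_signal"])  # -10 True
```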

10. How to Share Reports

AI API Doctor reports automatically sanitize API Keys, showing only sk-****abcd. You can share the report with the provider's support team.

Steps:

Use a short prompt to minimize per-test cost
Run the test multiple times to check result consistency
Share the report text (not a screenshot with your full balance)
Compare against the provider's official billing

Never send your full API Key, account password, or balance screenshots to strangers. Reports are auto-sanitized but always verify the text before sharing publicly.
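If you want to double-check text yourself before sharing, a masking pass like the one below reproduces the sk-****abcd form. The regular expression is an assumption about what keys look like, and this is not AI API Doctor's own sanitizer.

```python
import re

# Assumption: keys start with "sk-" followed by at least 8 key characters.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def mask_api_keys(text: str) -> str:
    """Replace each key with sk-**** plus its last four characters."""
    return KEY_PATTERN.sub(lambda m: "sk-****" + m.group(0)[-4:], text)


print(mask_api_keys("Authorization: Bearer sk-1234567890abcd"))
# Authorization: Bearer sk-****abcd
```

Keys in other formats will not match this pattern, so a manual read-through is still worthwhile.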

11. Owner-Side Fix Reference

If you operate a relay station, failed requests and empty replies should not result in final deductions. A reference fix:

if (completion_tokens == 0 && visible_output == "" && !has_tool_call && !has_image && !has_audio) {
    actual_cost = 0;
    refund_precharge();
    log("empty_response_no_charge");
}

12. Safety Boundaries

AI API Doctor only shows reproducible signals from a single test. It does not prove intentional overbilling by the provider. Reports are best suited for:

Troubleshooting with the provider's support team
Owner-side diagnosis and fix verification
Community comparison of billing behavior across stations

It is not a financial audit and should not be used as legal evidence.

AI API Doctor does not recommend or rank any relay providers.

