How the AI API Relay Black-Box Check Works

Read raw quota before and after test requests to verify billing integrity. Understand pre-deduction, empty-reply charges, missing usage, and relay API scorecards.

1. What Is Pre-Deduction Billing?

Some AI API relay stations pre-charge quota at the start of a request, then settle and refund the difference after the request completes.

This mechanism itself is fine, but it means the quota change you see during a request is not necessarily the final amount. For example, the station may pre-charge $0.001 and then refund $0.0008 if the actual usage was cheaper, leaving a net charge of $0.0002.

Problems arise when failed requests or empty replies are not refunded. In that case, users see their quota decrease without receiving valid output.

The frontend balance may only show two decimal places. A change of $0.0001 may be invisible on the page but observable in raw quota.
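The arithmetic above can be checked with a short sketch. The starting balance of $12.3480 and the half-up rounding rule are assumptions chosen for illustration; a real frontend may round or truncate differently.

```python
from decimal import Decimal, ROUND_HALF_UP


def displayed(balance: Decimal) -> str:
    """Format a balance the way a two-decimal frontend might (assumed rounding)."""
    return str(balance.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


before = Decimal("12.3480")          # hypothetical starting balance in USD
during = before - Decimal("0.001")   # balance right after the pre-charge
after = during + Decimal("0.0008")   # balance after the refund is settled
net_charge = before - after          # what the request finally cost

print(displayed(before), displayed(during), displayed(after))
print(net_charge)
```

With these numbers all three balances display as 12.35 even though the request had a net cost of 0.0002, which is why the page balance alone is weak evidence.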

2. What Is Raw Quota?

Raw quota is the fine-grained quota value stored by the New API / One API backend. It is far more precise than the frontend balance display.

AI API Doctor uses raw quota as the primary evidence for billing anomaly detection. The Chrome extension (under review) can read raw quota automatically. The web version supports manual entry.
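As a rough illustration of how raw quota relates to a dollar amount, the sketch below assumes a conversion rate of 500,000 quota units per US dollar. That rate is an assumption here (it is configurable per station), so check the setting on the station you are testing. The function names are hypothetical.

```python
from decimal import Decimal

# Assumed conversion rate; stations can configure a different value.
QUOTA_PER_USD = 500_000


def quota_to_usd(raw_quota: int) -> Decimal:
    """Convert a raw quota value to US dollars under the assumed rate."""
    return Decimal(raw_quota) / Decimal(QUOTA_PER_USD)


def quota_delta_usd(before: int, after: int) -> Decimal:
    """Dollar value of the quota consumed between two raw readings."""
    return quota_to_usd(before - after)


print(quota_delta_usd(1_000_000, 999_950))
```

Under this assumption a drop of 50 raw units corresponds to $0.0001, a change that a two-decimal balance display would not show.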

3. Why Wait 10 Seconds?

The 10-second wait is the core of the detection logic. It distinguishes a temporary pre-charge that is later refunded from a final deduction that stays on the account.

Some stations settle within 3 seconds; others take longer. AI API Doctor uses 10 seconds as a conservative default.

If quota is still not restored after 10 seconds, record the final value and compare with the pre-test baseline.
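One way to sketch the wait is a polling loop that stops early once quota is back at the baseline. This is a variation for illustration, not AI API Doctor's implementation: the `read_quota` callable, the one-second interval, and the function name are all assumptions.

```python
import time
from typing import Callable, Tuple


def wait_for_settlement(
    read_quota: Callable[[], int],
    baseline: int,
    timeout_s: float = 10.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bool]:
    """Poll raw quota until it returns to the baseline or the timeout expires.

    Returns (final_quota, restored). `read_quota` is a caller-supplied
    function that returns the current raw quota.
    """
    waited = 0.0
    final = read_quota()
    while final < baseline and waited < timeout_s:
        sleep(interval_s)
        waited += interval_s
        final = read_quota()
    return final, final >= baseline
```

If `restored` is false after the timeout, the final value is the one to record and compare with the pre-test baseline.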

4. Failed Request Charges

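A failed request billing anomaly is indicated when all of the following are true:

  1. The request failed (for example, with an error status such as 503)
  2. The response contained no valid output
  3. Raw quota is still below the pre-test baseline after the 10-second wait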

This means the station charged for a request that produced no valid result.

5. Empty Reply Charges

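An empty reply billing anomaly is indicated when all of the following are true:

  1. The response contained no visible output and no tool call, image, or audio
  2. completion_tokens was 0
  3. Raw quota is still below the pre-test baseline after the 10-second wait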

This means the station charged for a request that returned no usable output.
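A minimal sketch of an empty-reply check follows, assuming the relay returns an OpenAI-compatible chat completion body (`choices[0].message.content`, `message.tool_calls`, `usage.completion_tokens`). It mirrors the conditions in the owner-side reference fix later in this guide, but omits image and audio checks for brevity and treats a missing `completion_tokens` as 0. The function names are hypothetical.

```python
def is_empty_reply(response: dict) -> bool:
    """True if the response has no text, no tool call, and zero completion tokens."""
    usage = response.get("usage") or {}
    choices = response.get("choices") or []
    message = (choices[0].get("message") or {}) if choices else {}

    content = message.get("content")
    # Text content counts only if it is non-blank; other non-empty content counts as output.
    has_output = bool(content.strip()) if isinstance(content, str) else bool(content)
    has_tool_call = bool(message.get("tool_calls"))
    completion_tokens = usage.get("completion_tokens") or 0  # missing treated as 0

    return completion_tokens == 0 and not has_output and not has_tool_call


def empty_reply_charged(response: dict, quota_before: int, quota_after: int) -> bool:
    """True if an empty reply left raw quota below the pre-test value."""
    return is_empty_reply(response) and quota_after < quota_before
```

Here `quota_after` is meant to be the reading taken after the 10-second wait, so that a pre-charge that is later refunded is not counted.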

6. Usage Integrity

If a response does not return prompt_tokens, completion_tokens, total_tokens, or cache fields, you cannot verify whether the theoretical cost matches the actual deduction.

AI API Doctor flags responses as "usage incomplete" when these fields are missing or partial. Contact the provider for verification.
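The check can be sketched as follows, assuming an OpenAI-style `usage` object. The per-million-token prices are parameters you supply from the provider's published rates; the function names are hypothetical and this is not AI API Doctor's own code.

```python
from decimal import Decimal
from typing import List

REQUIRED_USAGE_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def missing_usage_fields(response: dict) -> List[str]:
    """Return the required usage fields that are absent or not integers."""
    usage = response.get("usage")
    if not isinstance(usage, dict):
        return list(REQUIRED_USAGE_FIELDS)
    return [f for f in REQUIRED_USAGE_FIELDS if not isinstance(usage.get(f), int)]


def theoretical_cost_usd(usage: dict, input_price: Decimal, output_price: Decimal) -> Decimal:
    """Expected cost given prices in USD per million tokens (cache discounts ignored)."""
    million = Decimal(1_000_000)
    return (
        Decimal(usage["prompt_tokens"]) * input_price / million
        + Decimal(usage["completion_tokens"]) * output_price / million
    )
```

When `missing_usage_fields` returns a non-empty list, the theoretical cost cannot be computed and the deduction cannot be cross-checked.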

7. cached_tokens / cache_read_input_tokens

cached_tokens shows how many input tokens were served from cache instead of being recomputed. Cache hits typically mean lower latency and lower input cost.

Different providers handle cache fields and discount rates differently. Actual billing follows the provider's published rates.
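Because the field location differs by provider, a reader has to look in more than one place. The sketch below checks the OpenAI-style `usage.prompt_tokens_details.cached_tokens` first and then the Anthropic-style `usage.cache_read_input_tokens`. A relay may rename or flatten these fields, so treat the lookup order as an assumption; the function name is hypothetical.

```python
from typing import Optional


def cached_input_tokens(usage: dict) -> Optional[int]:
    """Return the number of cached input tokens, or None if no cache field is present."""
    details = usage.get("prompt_tokens_details") or {}
    if isinstance(details.get("cached_tokens"), int):
        return details["cached_tokens"]          # OpenAI-style nested field
    if isinstance(usage.get("cache_read_input_tokens"), int):
        return usage["cache_read_input_tokens"]  # Anthropic-style top-level field
    return None


print(cached_input_tokens({"prompt_tokens": 2048, "prompt_tokens_details": {"cached_tokens": 1024}}))
```

A return value of None means the response carried no cache information, which is different from a reported value of 0.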

8. How to Read Reports

AI API Doctor reports pair a status (Normal, Warning, or Anomaly) with a result label and a short explanation. The results listed here are:

  Normal: Failed request not charged. Request failed but raw quota was not deducted.
  Normal: Pre-charge refunded. Quota was pre-deducted but fully refunded within 10 seconds.
  Warning: Cannot read raw quota. Raw quota unavailable; report is for reference only.
  Anomaly: Failed request charged. Request failed with no valid output but quota was deducted.
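For a failed test request, the result labels above can be derived from three raw quota readings: before the request, during it, and after the 10-second wait. The sketch below is one possible mapping, not AI API Doctor's actual logic; the function name and the use of None for an unreadable quota are assumptions.

```python
from typing import Optional, Tuple


def classify_failed_request(
    before: Optional[int], during: Optional[int], after: Optional[int]
) -> Tuple[str, str]:
    """Map raw quota readings for a failed request to a (status, label) pair."""
    if before is None or after is None:
        return ("Warning", "Cannot read raw quota")
    if after < before:
        return ("Anomaly", "Failed request charged")
    if during is not None and during < before:
        return ("Normal", "Pre-charge refunded")
    return ("Normal", "Failed request not charged")


print(classify_failed_request(1000, 990, 1000))
```

The `during` reading is optional: without it, a refunded pre-charge is indistinguishable from a request that was never charged, and both are reported as Normal.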

9. Quick Test Steps

If you want to run this test manually without AI API Doctor:

  1. Record your raw quota value before the test
  2. Send a request expected to fail (e.g., request a non-existent model or trigger a 503)
  3. Wait 10 seconds
  4. Record the raw quota value again
  5. Compare: if quota decreased without valid output, this is a billing anomaly signal
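The five steps can be scripted once you have a way to read raw quota and to send the failing request. In the sketch below both are passed in as callables, because how you obtain them depends on your station; `read_quota`, `send_failing_request`, and `run_manual_check` are hypothetical names.

```python
import time
from typing import Callable


def run_manual_check(
    read_quota: Callable[[], int],
    send_failing_request: Callable[[], bool],
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run the before/after quota comparison around one failing request.

    `send_failing_request` should return True only if valid output came back.
    """
    before = read_quota()                      # step 1: baseline
    got_valid_output = send_failing_request()  # step 2: request expected to fail
    sleep(wait_seconds)                        # step 3: let any refund settle
    after = read_quota()                       # step 4: final reading
    deducted = before - after                  # step 5: compare
    return {
        "before": before,
        "after": after,
        "deducted": deducted,
        "anomaly_signal": deducted > 0 and not got_valid_output,
    }
```

Avoid other traffic on the same key during the test, since any concurrent request would also lower the quota and distort the comparison.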

10. How to Share Reports

AI API Doctor reports automatically sanitize API Keys, showing only a masked form such as sk-****abcd. You can share the report with the provider's support team.

Steps:

Never send your full API Key, account password, or balance screenshots to strangers. Reports are auto-sanitized but always verify the text before sharing publicly.
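If you paste logs or text outside of a report, you can apply similar masking yourself before sharing. The sketch below assumes keys start with `sk-` followed by at least eight letters, digits, underscores, or hyphens; that pattern is an assumption and may not match every provider's key format, so still read the result before posting it.

```python
import re

# Assumed key shape: "sk-" plus 8 or more URL-safe characters.
API_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def mask_api_keys(text: str) -> str:
    """Replace each matched key with 'sk-****' plus its last four characters."""
    return API_KEY_PATTERN.sub(lambda m: "sk-****" + m.group(0)[-4:], text)


print(mask_api_keys("Authorization: Bearer sk-abcdefghijklmnop1234"))
```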

11. Owner-Side Fix Reference

If you operate a relay station, failed requests and empty replies should not result in final deductions. A reference fix:

if (completion_tokens == 0 && visible_output == "" &&
    !has_tool_call && !has_image && !has_audio) {
  // Empty reply: nothing billable was produced, so settle at zero
  actual_cost = 0;
  refund_precharge();
  log("empty_response_no_charge");
}

12. Safety Boundaries

AI API Doctor only shows reproducible signals from a single test. It does not prove intentional overbilling by the provider. Reports are best suited for:

It is not a financial audit and should not be used as legal evidence.

AI API Doctor does not recommend or rank any relay providers.