1. What Is Pre-Deduction Billing?
Some AI API relay stations pre-charge quota at the start of a request, then settle and refund the difference after the request completes.
The mechanism itself is fine, but it means the quota change you see during a request is not necessarily the final amount. For example, a station may pre-charge $0.001 and then refund $0.0008 once the actual usage is known, leaving a net charge of $0.0002.
Problems arise when failed requests or empty replies are not refunded. In that case, users see their quota decrease without receiving valid output.
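The settlement arithmetic can be sketched in a few lines. This is only an illustration of the pre-charge and refund idea described above; the function name net_charge is hypothetical and not part of any station's API.

```python
from decimal import Decimal


def net_charge(pre_charge: Decimal, refund: Decimal) -> Decimal:
    """Final amount billed once the pre-charge has been settled."""
    return pre_charge - refund


# The example from the text: pre-charge $0.001, refund $0.0008.
settled = net_charge(Decimal("0.001"), Decimal("0.0008"))
print(settled)  # 0.0002

# A failed request that is handled correctly is refunded in full.
print(net_charge(Decimal("0.001"), Decimal("0.001")) == 0)  # True
```

Decimal is used instead of float so that very small amounts compare exactly.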
2. What Is Raw Quota?
Raw quota is the fine-grained quota value stored by the New API / One API backend. It is far more precise than the balance shown in the frontend.
AI API Doctor uses raw quota as the primary evidence for billing anomaly detection. The Chrome extension (under review) can read raw quota automatically. The web version supports manual entry.
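To see why raw quota is better evidence than a rounded balance, here is a minimal sketch that converts a raw quota delta to dollars. The conversion rate of 500,000 units per US dollar is an assumption (a value often seen as a default in One API-style deployments); a station may configure it differently, so check your own station before relying on it. The function names are hypothetical.

```python
# Assumption: 500,000 raw quota units per US dollar. Stations can change this.
QUOTA_PER_USD = 500_000


def quota_delta(before: int, after: int) -> int:
    """Raw quota consumed between two readings (positive means consumed)."""
    return before - after


def quota_to_usd(raw_quota: int, quota_per_usd: int = QUOTA_PER_USD) -> float:
    """Convert raw quota units to US dollars under the assumed rate."""
    return raw_quota / quota_per_usd


delta = quota_delta(before=2_500_000, after=2_499_900)
print(delta)                           # 100 raw units
print(f"{quota_to_usd(delta):.4f}")    # 0.0002, invisible on a 2-decimal balance
```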
3. Why Wait 10 Seconds?
The 10-second wait is the core of the detection logic. It distinguishes:
- Temporary pre-charge: Quota was deducted during the request but refunded within ~10 seconds. This is normal.
- Final deduction: Quota is still lower after 10 seconds. For a request that produced no valid output, this is a billing anomaly signal.
Some stations settle within 3 seconds; others take longer. AI API Doctor uses 10 seconds as a conservative default.
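The distinction above can be expressed as a small classifier over three raw quota readings. This is a sketch of the idea, not AI API Doctor's actual code; the function name and the returned labels are hypothetical.

```python
def classify_quota_change(before: int, during: int, after_wait: int) -> str:
    """Classify a quota change from three raw quota readings.

    before     -- reading taken before the request
    during     -- reading taken while the request is in flight
    after_wait -- reading taken after the wait period (10 seconds by default)
    """
    if after_wait < before:
        # Quota never came back: the deduction is final.
        return "final_deduction"
    if during < before:
        # Quota dipped during the request but was refunded in time.
        return "temporary_pre_charge"
    return "no_change"


print(classify_quota_change(1000, 900, 1000))  # temporary_pre_charge
print(classify_quota_change(1000, 900, 900))   # final_deduction
```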
4. Failed Request Charges
A failed request billing anomaly is indicated when all of the following are true:
- HTTP status ≥ 400 (including upstream 502 / 503 / 504), or the request times out
- No valid output (completion_tokens = 0 or missing)
- No tool_call / image / audio / search output
- Raw quota still decreased after 10 seconds
This means the station charged for a request that produced no valid result.
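The four conditions above can be combined into a single check. The sketch below is one way to write it; the Observation class and its field names are hypothetical, and a timeout is represented here as http_status = None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    http_status: Optional[int]         # None means the request timed out
    completion_tokens: Optional[int]   # None means the field was missing
    has_non_text_output: bool          # tool_call / image / audio / search
    quota_before: int                  # raw quota before the request
    quota_after_wait: int              # raw quota after the 10-second wait


def is_failed_request_charge(obs: Observation) -> bool:
    """True only when all four conditions from the list hold."""
    failed = obs.http_status is None or obs.http_status >= 400
    no_tokens = not obs.completion_tokens  # covers both 0 and missing
    still_lower = obs.quota_after_wait < obs.quota_before
    return failed and no_tokens and not obs.has_non_text_output and still_lower


obs = Observation(http_status=503, completion_tokens=None,
                  has_non_text_output=False,
                  quota_before=1000, quota_after_wait=900)
print(is_failed_request_charge(obs))  # True
```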
5. Empty Reply Charges
An empty reply billing anomaly is indicated when all of the following are true:
- HTTP status is 200
- Visible output is empty
- completion_tokens = 0 or missing
- No tool_call / image / audio / search output
- Raw quota still decreased after 10 seconds
This means the station charged for a request that returned no usable output.
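The empty reply case differs from the failed request case only in the status code and the extra check on visible output. A minimal sketch, with a hypothetical function name and parameters:

```python
from typing import Optional


def is_empty_reply_charge(
    http_status: int,
    visible_output: str,
    completion_tokens: Optional[int],
    has_non_text_output: bool,
    quota_before: int,
    quota_after_wait: int,
) -> bool:
    """True only when all five conditions from the list hold."""
    return (
        http_status == 200
        and visible_output.strip() == ""      # whitespace counts as empty here
        and not completion_tokens              # 0 or missing
        and not has_non_text_output            # no tool_call / image / audio / search
        and quota_after_wait < quota_before    # quota still lower after the wait
    )


print(is_empty_reply_charge(200, "", 0, False, 1000, 950))       # True
print(is_empty_reply_charge(200, "Hello", 3, False, 1000, 950))  # False
```

Treating whitespace-only output as empty is a choice made for this sketch; a stricter check could compare against the empty string only.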
6. Usage Integrity
If a response does not return prompt_tokens, completion_tokens, total_tokens, or cache fields, you cannot verify whether the theoretical cost matches the actual deduction.
AI API Doctor flags responses as "usage incomplete" when these fields are missing or partial. Contact the provider for verification.
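A usage completeness check can be sketched as follows. It assumes an OpenAI-style usage object with prompt_tokens, completion_tokens, and total_tokens. Cache fields are left out of the required set in this sketch because not every provider returns them; that is an assumption, and the helper names are hypothetical.

```python
from typing import List, Optional

# Core fields needed to recompute a theoretical cost.
REQUIRED_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def missing_usage_fields(usage: Optional[dict]) -> List[str]:
    """Return the required usage fields that are absent or null."""
    if not usage:
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if usage.get(name) is None]


def is_usage_incomplete(usage: Optional[dict]) -> bool:
    """True when usage is missing entirely or only partially present."""
    return bool(missing_usage_fields(usage))


print(missing_usage_fields({"prompt_tokens": 12, "total_tokens": 12}))
# ['completion_tokens']
print(is_usage_incomplete(None))  # True
```

Note that a value of 0 counts as present: only an absent or null field is reported as missing.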
7. cached_tokens / cache_read_input_tokens
cached_tokens shows how many input tokens were served from cache instead of being recomputed. Cache hits typically mean lower latency and lower input cost.
Different providers handle cache fields and discount rates differently. Actual billing follows the provider's published rates.
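As a rough illustration of how a cache hit lowers input cost, the sketch below splits prompt tokens into cached and uncached parts. It assumes cached_tokens is a subset of prompt_tokens, which is not true for every provider. The price and the cached_price_ratio are made-up example values, not any provider's published rates.

```python
def estimate_input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    price_per_million: float,
    cached_price_ratio: float,
) -> float:
    """Estimate input cost in dollars when part of the prompt was cached.

    cached_price_ratio is the cached price as a fraction of the normal
    input price (0.5 means cached tokens cost half as much).
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    cached_price = price_per_million * cached_price_ratio
    total = uncached_tokens * price_per_million + cached_tokens * cached_price
    return total / 1_000_000


# Hypothetical rates: $2.00 per million input tokens, cached tokens at half price.
print(f"{estimate_input_cost(1000, 800, 2.0, 0.5):.4f}")  # 0.0012
print(f"{estimate_input_cost(1000, 0, 2.0, 0.5):.4f}")    # 0.0020
```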
8. How to Read Reports
AI API Doctor reports show five possible results:
9. Quick Test Steps
If you want to run this test manually without AI API Doctor:
- Record your raw quota value before the test
- Send a request that is expected to fail (e.g., request a non-existent model, or trigger a 503)
- Wait 10 seconds
- Record the raw quota value again
- Compare: if quota decreased without valid output, this is a billing anomaly signal
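The manual steps above can be wrapped in a small helper. How you read raw quota and how you send the failing request depend on your station, so both are passed in as callables; read_raw_quota and send_failing_request are hypothetical placeholders you would implement yourself.

```python
import time
from typing import Callable


def run_manual_check(
    read_raw_quota: Callable[[], int],
    send_failing_request: Callable[[], bool],
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run the before / request / wait / after comparison.

    send_failing_request should return True if the response contained
    valid output, and False otherwise.
    """
    before = read_raw_quota()
    got_valid_output = send_failing_request()
    sleep(wait_seconds)
    after = read_raw_quota()
    deducted = before - after
    return {
        "before": before,
        "after": after,
        "deducted": deducted,
        "anomaly_signal": deducted > 0 and not got_valid_output,
    }


# Dry run with fake readings and no real waiting.
readings = iter([1000, 900])
result = run_manual_check(
    read_raw_quota=lambda: next(readings),
    send_failing_request=lambda: False,
    sleep=lambda seconds: None,
)
print(result["deducted"], result["anomaly_signal"])  # 100 True
```

The sleep parameter exists only so the helper can be tried without waiting; leave it at the default for a real test.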
10. How to Share Reports
AI API Doctor reports automatically sanitize API Keys, showing only sk-****abcd. You can share the report with the provider's support team.
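If you paste logs or request details alongside a report, you may want to mask keys in that text yourself. The sketch below reproduces the sk-****abcd format. The key pattern is an assumption (keys starting with sk- followed by at least eight letters, digits, underscores, or hyphens), and it is not AI API Doctor's own sanitizer.

```python
import re

# Assumption: keys look like "sk-" plus at least 8 of these characters.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def mask_key(key: str) -> str:
    """Keep only the last four characters of a key."""
    return f"sk-****{key[-4:]}"


def sanitize(text: str) -> str:
    """Mask every key-like substring found in the text."""
    return KEY_PATTERN.sub(lambda match: mask_key(match.group(0)), text)


print(sanitize("Authorization: Bearer sk-1234567890abcd"))
# Authorization: Bearer sk-****abcd
```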
Steps:
- Use a short prompt to minimize per-test cost
- Run the test multiple times to check result consistency
- Share the report text (not a screenshot with your full balance)
- Compare against the provider's official billing
11. Owner-Side Fix Reference
If you operate a relay station, failed requests and empty replies should not result in final deductions. A reference fix:
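One possible shape for such a fix is sketched below. It is a simplified illustration, not actual New API / One API code: the Account class, the settle function, and its parameters are all hypothetical. The idea is that settlement charges the actual cost only when the request succeeded and produced valid output, and otherwise refunds the full pre-charge.

```python
from dataclasses import dataclass


@dataclass
class Account:
    quota: int  # raw quota remaining, already reduced by the pre-charge


def settle(
    account: Account,
    pre_charged: int,
    actual_cost: int,
    succeeded: bool,
    has_valid_output: bool,
) -> int:
    """Settle a pre-charged request and return the final amount charged."""
    # Failed requests and empty replies must end with a final charge of zero.
    final_charge = actual_cost if (succeeded and has_valid_output) else 0
    # Return the difference between what was held and what is actually owed.
    account.quota += pre_charged - final_charge
    return final_charge


# A failed request: the full pre-charge of 500 is refunded.
account = Account(quota=9_500)
print(settle(account, pre_charged=500, actual_cost=120,
             succeeded=False, has_valid_output=False))  # 0
print(account.quota)  # 10000
```

In a real backend this step would also need to run on every error path, including timeouts and upstream disconnects, so that no pre-charge is left unsettled.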
12. Safety Boundaries
AI API Doctor only shows reproducible signals from a single test. It does not prove intentional overbilling by the provider. Reports are best suited for:
- Troubleshooting with the provider's support team
- Owner-side diagnosis and fix verification
- Community comparison of billing behavior across stations
It is not a financial audit and should not be used as legal evidence.
AI API Doctor does not recommend or rank any relay providers.