Advice for Operating a Public-Facing API

I've been operating Pushover's public-facing API for over a decade now and I thought I'd pass on some advice for those creating a new API.

Pushover's API might be unusual in that it is used by a wide range of devices (embedded IoT things, legacy servers, security cameras, etc.) and HTTP libraries, rather than mostly being accessed from JavaScript in the latest web browsers. It also doesn't process sensitive financial information, so the advice given here may not be applicable to something operating like Stripe's API.

Host the API on its own hostname

Serve your API at api.example.com, never at example.com/api. As your API's usage grows, it will expand beyond your website/dashboard server and need to move to a separate server or many separate servers. You'll want to be able to move things around by just pointing the hostname's DNS record at a different IP rather than trying to proxy things from your dashboard server.

Your API may also have more relaxed security restrictions in terms of TLS versions and ciphers accepted that you don't want to relax on your dashboard website that handles sensitive information. Having your API at its own hostname means it can have its own TLS certificate and TLS restrictions.

Also, when it comes to blocking bots and poorly-written clients, a user should still be able to reach your main website for support even if their IP is blocked from reaching your API server.

Don't be too liberal in what you accept

Accepting a slightly non-conforming API request today from someone's ESP8266 buried in a forest might mean you'll have to keep accepting those same non-conforming requests being made years later, often at the cost of having to implement workarounds in your web framework or server code as it gets upgraded.

A lot of users will hack something together until it works with your API and when it suddenly stops working months or years down the road, you're going to have to deal with the fallout. Rather than bending over backwards trying to support poorly written code, don't let their bad code function properly in the first place so it doesn't get deployed.

Though you don't have to be pedantic about it. Pushover's API has a message size limitation of 1,024 characters. If the message parameter is larger than that, I could reject the request because it's not correct, but then the user's message is lost and they may not have any error handling. In this case, I truncate the message to 1,024 characters and process it anyway (assuming it wasn't so large that it hit the web server's request message size limit). The user still receives something, and if they care that it's truncated, they can properly implement continuation or smarter truncation.
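As a rough sketch of that truncate-instead-of-reject behavior (in Python, not Pushover's actual code; the function name and the returned flag are made up for illustration):

```python
MAX_MESSAGE_CHARS = 1024  # the limit mentioned above


def normalize_message(message: str) -> tuple[str, bool]:
    """Return (message, was_truncated), cutting the text at the limit.

    The flag is an illustrative extra so the caller can mention the
    truncation in its response if it wants to.
    """
    if len(message) <= MAX_MESSAGE_CHARS:
        return message, False
    return message[:MAX_MESSAGE_CHARS], True
```

The request is still processed either way; only the overflow is dropped.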

Avoid OAuth if you can

It's a confusing protocol that brings its own security problems and it introduces a lot of overhead for your users to get up and running. With OAuth your API can't be used from a simple curl request but has to be a custom multi-step process pulling in a whole OAuth library.

Use static API tokens if you can, but make it easy to rotate them. If possible, avoid using an authentication mechanism that requires custom HTTP headers (including basic auth) because some esoteric devices and plugins don't support them. Some don't even support HTTP POST properly and will only be able to put form parameters in the URL query string (though you should still insist on a proper POST method; don't be an animal).

Remember, not everyone is going to be sending HTTP requests from the ideal code you would have written; they're using what they have available.
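To illustrate how little a client needs when the token is just another form parameter, here is a sketch in Python using only the standard library. The hostname, token, and user values are placeholders, and the parameter names are borrowed from the log sample later in this post; nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint and credentials, for illustration only.
API_URL = "https://api.example.com/1/messages.json"


def build_request(token: str, user: str, message: str) -> Request:
    """Build a plain form-encoded POST with no custom headers."""
    form = urlencode({"token": token, "user": user, "message": message})
    # Supplying a body makes urllib use the POST method.
    return Request(API_URL, data=form.encode("ascii"))


req = build_request("appEXAMPLETOKEN", "userEXAMPLEKEY", "hello")
# urllib.request.urlopen(req) would send it; omitted here.
```

No OAuth dance and no special headers: anything that can send a form POST can use the API.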

Log a unique ID with every request

This is probably done by your web framework, but if not, generate a unique ID or UUID with every request, return it to the user in the message body somewhere, log it, and ask for it on your support form.
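If your framework doesn't do this for you, a minimal sketch might look like the following. The handler shape and the JSON field names ("status", "request") are assumptions for the example, not a prescribed format.

```python
import json
import logging
import uuid

logger = logging.getLogger("api")


def handle_request(params: dict) -> tuple[int, str]:
    """Tag the request with a UUID, log it, and echo it in the body."""
    request_id = str(uuid.uuid4())
    # Log only the parameter names here; values may be sensitive.
    logger.info("[%s] params=%s", request_id, sorted(params))
    body = {"status": 1, "request": request_id}
    return 200, json.dumps(body)
```

The same ID appears in the log line and in the response, so a user can paste it into a support form and you can grep for it.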

This will make your life much easier in the future when you need to track down requests in log files and correlate them to a user's support request. Oftentimes the user won't know the IP they're sending API requests from, or their parameters aren't getting parsed properly, so you can't search your logs for their API token.

Side note: you probably don't need a fancy centralized logging setup to capture every detail of every HTTP request. Filter out sensitive POST parameters (Pushover redacts title and message), log one line per request including the request's UUID to flat files, then rotate and compress the log daily and delete old ones according to your retention policy. In 99.9% of Pushover support requests where I needed to consult server logs, having the date/time, sending IP, method, timing information, and sanitized POST parameters logged was enough to resolve the issue for the user.

[2023-07-12 16:30:24] [62dd8009-3174-4b86-8078-18330a1e7b0e] [1.2.3.4] method=POST \
 path=/1/messages.json format=json controller=Api::One::MessagesController \
 action=create status=200 duration=66.14 view=0.1 db=14.73 \
 params={"token"=>"apn9u395cgrxxxxxxxxxxxxxxxxxxx", \
 "user"=>"uc65prcvfxxxxxxxxxxxxxxxxxxxxx", "device"=>"iphone", "sound"=>"carl", \
 "message"=>"[FILTERED]", "monospace"=>"1"} cf_ray=7e5c6320196e0252-ORD \
 time0=0.05 time1=3.15 time2=4.78 time3=4.81 timebuild=16.65 timesave=59.43 \
 message_ids=m749182415778658654 queued=1 server=hippocampus
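A line like the one above doesn't need much machinery. Here is a simplified Python sketch of the redact-then-log step. The function names and the reduced set of fields are my own choices for illustration; the redacted parameters are the ones mentioned above.

```python
from datetime import datetime, timezone

SENSITIVE_PARAMS = {"title", "message"}  # redacted, as described above


def filter_params(params: dict) -> dict:
    """Replace sensitive values so they never reach the log file."""
    return {
        key: "[FILTERED]" if key in SENSITIVE_PARAMS else value
        for key, value in params.items()
    }


def format_log_line(request_id, ip, method, path, status,
                    duration_ms, params, now=None) -> str:
    """Build one flat log line per request, keyed by the request's UUID."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"[{stamp}] [{request_id}] [{ip}] method={method} path={path} "
        f"status={status} duration={duration_ms:.2f} "
        f"params={filter_params(params)}"
    )
```

Writing these lines to a flat file and letting your usual log rotation handle compression and retention covers the rest.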

Be descriptive in your error responses

Assume a human will read them, even if it's unlikely. You might be surprised how far error messages propagate, so at least give the error a fighting chance to be seen.

When applicable, have your API errors include URLs to documentation to avoid support requests rather than just responding with a minimal "invalid xyz parameter" that the user has to decode.
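For example, an error helper could attach a documentation link to every message. This is only a sketch: the docs URL, the anchor names, and the response shape are placeholders.

```python
import json

DOCS_BASE = "https://example.com/docs"  # placeholder documentation URL


def error_response(param: str, problem: str, docs_anchor: str) -> tuple[int, str]:
    """Build a 400 response whose message says what is wrong and where to read more."""
    message = f"{param} parameter {problem}; see {DOCS_BASE}#{docs_anchor}"
    return 400, json.dumps({"status": 0, "errors": [message]})


status, body = error_response("priority", "must be an integer from -2 to 2", "priority")
```

Compare that with a bare "invalid priority parameter": the longer version costs nothing to generate and can save a support e-mail.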

Use prefixed tokens

This took me years to stumble upon, but use a short prefix for each type of random ID you create. Instead of generating an API token of Mk7vuCg9eptiV8qid4mn, make it appMk7vuCg9eptiV8qid4mn. Instead of a user key of zo2iD3x3J9, use userzo2iD3x3J9. Pushover uses "a" for API tokens, "u" for user keys, "g" for group keys, "s" for subscribed user keys, etc. This makes it easier for users to keep multiple keys/tokens straight when they all look like gibberish, and it makes it possible to automate helpful API error responses like "your token parameter has a user key instead of an API token".
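Here is a small Python sketch of both halves of that idea: generating prefixed IDs and using the prefix to produce a helpful error. The single-letter prefixes are the ones listed above; the alphabet, the default length, and the function names are assumptions for the example.

```python
import secrets
import string
from typing import Optional

ALPHABET = string.ascii_letters + string.digits

# Prefix -> human-readable kind (with its article, for error messages).
KINDS = {
    "a": "an API token",
    "u": "a user key",
    "g": "a group key",
    "s": "a subscribed user key",
}


def new_id(prefix: str, length: int = 29) -> str:
    """Generate a random ID that starts with its type prefix."""
    return prefix + "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_kind(param: str, value: str, expected_prefix: str) -> Optional[str]:
    """Return None if the value has the expected prefix, else an error message."""
    actual = value[:1]
    if actual == expected_prefix:
        return None
    expected = KINDS[expected_prefix]
    if actual in KINDS:
        return f"your {param} parameter has {KINDS[actual]} instead of {expected}"
    return f"your {param} parameter does not look like {expected}"
```

Passing a user key where an API token belongs then yields exactly the kind of message quoted above, without any database lookup.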

Stay on top of failures

Reportedly half of all e-mail processed is spam. That means a lot of money, servers, and administrative overhead is wasted just scaling up resources to deal with junk that no one wants. If you don't stay on top of error responses from your API, they will accumulate and you'll end up wasting a lot of your own money constantly serving bad requests.

Obviously you'll want instant alerts of 5xx errors from your API when your database falls over, but here I'm talking about the normal 4xx client errors generated in response to missing required parameters, expired tokens, and other conditions that you don't normally need to worry about.

Pushover's API generates about 1.5 million 4xx errors every day and each one increments an expiring counter in its database for the sending IP (for IPv6, rounded off to the /64). When that count reaches a certain limit in a short amount of time, the IP is banned for one hour and subsequent requests short-circuit all of the API logic and are responded to with a 429 status and a descriptive error message.
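The counting logic can be sketched in a few lines of Python. This version keeps everything in memory rather than in a database, and the window and limit values are invented for the example; only the one-hour ban and the /64 rounding come from the description above.

```python
import ipaddress
import time

WINDOW_SECONDS = 300  # assumed length of the "short amount of time"
ERROR_LIMIT = 100     # assumed threshold
BAN_SECONDS = 3600    # one hour, as described above


def counter_key(ip: str) -> str:
    """Count IPv4 addresses individually and IPv6 addresses per /64."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(addr)


class ErrorTracker:
    def __init__(self):
        self._errors = {}        # key -> timestamps of recent 4xx responses
        self._banned_until = {}  # key -> time when the ban ends

    def record_4xx(self, ip: str, now=None) -> None:
        now = time.time() if now is None else now
        key = counter_key(ip)
        # Drop errors that have aged out of the window, then add this one.
        recent = [t for t in self._errors.get(key, []) if now - t < WINDOW_SECONDS]
        recent.append(now)
        self._errors[key] = recent
        if len(recent) >= ERROR_LIMIT:
            self._banned_until[key] = now + BAN_SECONDS

    def is_banned(self, ip: str, now=None) -> bool:
        now = time.time() if now is None else now
        return now < self._banned_until.get(counter_key(ip), 0.0)
```

Checking is_banned at the very top of the request path is what lets banned clients skip all of the API logic and get a 429 straight away.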

When an IP block can be mapped back to a user by way of an API token in the request, that user is sent an automated e-mail explaining why they were blocked and that they need to shut down whatever is continuing to send the failed requests.

If that error count keeps increasing past another threshold, the IP is blocked harder by adding it to a pf table, blocking it on the IP level. After 1 hour, the block is automatically removed. If the IP gets blocked again in a short amount of time, it is blocked for 2 hours, then 4, then 8, etc. However, avoid doing this type of IP-level block too early or you'll get inundated with generic "I can't connect" complaints. Let the requests fail on the HTTP level with proper error responses for as long as possible so they get seen.
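The doubling schedule is easy to express in code. In this Python sketch the pf table name is a placeholder and the helpers only build the pfctl command line rather than running it; how "recently blocked" is tracked is left out.

```python
def block_hours(recent_blocks: int) -> int:
    """Hours to block an IP that was already hard-blocked recent_blocks times recently."""
    return 2 ** recent_blocks


def pf_command(action: str, ip: str, table: str = "api_blocked") -> list:
    """Build a pfctl command that adds an address to, or deletes it from, a table.

    The table name "api_blocked" is hypothetical.
    """
    if action not in ("add", "delete"):
        raise ValueError("action must be 'add' or 'delete'")
    return ["pfctl", "-t", table, "-T", action, ip]


# First offence: 1 hour, then 2, 4, 8 for repeat offences.
schedule = [block_hours(n) for n in range(4)]
```

A scheduled job can then run the "delete" form once the block period is over.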

Something I learned long ago is that automated things can have a hard time dealing with failure. Many people are also bad programmers and their method for dealing with "my HTTP query didn't get a 200 response" is to immediately hammer the API server again until it does (or it crashes, causing it to be restarted over and over). So a user's script sends Pushover normal API traffic all month until it gets to the final week, their API token reaches its monthly limit, and now the script is in a loop hitting the API with dozens of requests per second.

After dealing with this numerous times, my solution for Pushover's API was to add a step after the soft block (where requests are dropped with a 429 status), but before the hard block (where all traffic from the IP is dropped). After sending enough 429 responses that the IP's failure count is reaching the hard-block limit, Pushover's API will temporarily respond to rejected requests with a 200 status code but with a descriptive message in the body explaining why the message was rejected. The user's messages weren't going to go through anyway because they were already sending bad requests (their API token reached its usage limit or the input was bad), so responding with a 200 shouldn't cause too many problems and the descriptive error might be seen by someone once they check why their messages aren't being processed. If these temporary 200 responses fail to slow down the client, they'll eventually run into the hard-block limit and get cut off, but it was worth a try.
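Put together, the escalation is a ladder keyed on the IP's failure count. The thresholds in this Python sketch are invented; only the order of the steps comes from the description above.

```python
from enum import Enum

# Assumed thresholds, for illustration only.
SOFT_LIMIT = 100     # start rejecting with 429
FAKE_OK_LIMIT = 400  # nearing the hard limit: reject with a 200 instead
HARD_LIMIT = 500     # block at the IP level


class Action(Enum):
    PROCESS = "process the request normally"
    REJECT_429 = "reject with 429 and a descriptive error"
    REJECT_200 = "reject with 200 and a descriptive error in the body"
    HARD_BLOCK = "drop all traffic from the IP"


def choose_action(failure_count: int) -> Action:
    """Pick the step of the ladder for an IP's current failure count."""
    if failure_count >= HARD_LIMIT:
        return Action.HARD_BLOCK
    if failure_count >= FAKE_OK_LIMIT:
        return Action.REJECT_200
    if failure_count >= SOFT_LIMIT:
        return Action.REJECT_429
    return Action.PROCESS
```

The 200 step only applies to requests that were going to be rejected anyway, so a retry-until-200 loop gets the status code it wants without anything being delivered.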

Questions or comments?
Reply on Mastodon or e-mail me.