Side-by-side examples for gravity=auto, gravity=smart, gravity=face, and gravity=center live on Transforms → Smart cropping. This page goes deeper on focal points, entropy crops, and the analyze API.
When to use which mode
| Scenario | Recommended mode | Why |
|---|---|---|
| Default for unknown assets | gravity=auto | Fast saliency, great default. |
| Portrait / team photo / anything with people | gravity=face | Real face detection (TinyFaceDetector). Picks the highest-confidence face as the crop centre; falls through to smart crop if no face is found. |
| Subject is off-centre or close to an edge | gravity=smart (or crop=smart) | Runs full content-aware scoring; better than libvips for tricky layouts. |
| You already know where the subject is | fp=x,y | Pure pixel math. Zero extra cost on every render. |
| Marketing hero where art direction matters | fp=x,y + a one-time /api/images/analyze call to seed it | Storage cost is one row per asset; renders stay deterministic. |
| Detail / texture sources (no obvious subject) | gravity=entropy | Picks the densest region by entropy without looking for skin/saliency cues. |
| Hard-coded corner | gravity=center, gravity=northwest, etc. | When the source is already composed for you. |
When auto is not enough
On off-centre hero images, gravity=auto (default saliency) can miss the subject. gravity=smart runs a heavier content-aware pass that scores detail, saturation, and skin tone across the frame.

For side-by-side comparisons of these modes (auto, smart, face, center), see Transforms → Smart cropping.
Manual focal points (fp=x,y)
fp=x,y accepts a normalised coordinate where (0, 0) is the top-left of the source and (1, 1) is the bottom-right. Both percentages and decimals work.
The renderer extracts a window at the aspect ratio of the requested w/h, then resizes that extract down. The focal point lands at the centre of the crop window, clamped if the requested aspect would push the window past a source edge.
This is the cheapest crop mode — it’s pure pixel math with no analysis pass — and it’s the only mode that’s deterministic across renditions. If you serve the same image at 1200×800, 600×400, and 300×200 with the same focal point, all three crops centre on the same physical pixel.
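The crop-window arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions, not Convertly's actual implementation: the function name focal_crop_window is hypothetical, and the sketch assumes the renderer takes the largest window of the requested aspect ratio that fits inside the source.

```python
def focal_crop_window(src_w, src_h, out_w, out_h, fx, fy):
    """Return (left, top, width, height) of the extract, in source pixels.

    fx, fy are the normalised focal point in [0, 1].
    Assumption: the extract is the largest window of the requested
    aspect ratio that fits inside the source.
    """
    target_aspect = out_w / out_h

    if src_w / src_h > target_aspect:
        # Source is wider than the target: keep full height, trim width.
        crop_h = src_h
        crop_w = round(src_h * target_aspect)
    else:
        # Source is taller (or equal): keep full width, trim height.
        crop_w = src_w
        crop_h = round(src_w / target_aspect)

    # Centre the window on the focal point.
    left = round(fx * src_w - crop_w / 2)
    top = round(fy * src_h - crop_h / 2)

    # Clamp so the window never extends past a source edge.
    left = max(0, min(left, src_w - crop_w))
    top = max(0, min(top, src_h - crop_h))
    return left, top, crop_w, crop_h
```

Because the window depends only on the source size, the focal point, and the aspect ratio, renditions at 1200×800, 600×400, and 300×200 resolve to the same extract before resizing.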



Face-aware cropping (gravity=face)
gravity=face runs a real face-detection model on the source (TinyFaceDetector, a ~190 KB MIT-licensed network) and uses the highest-confidence face’s centre as the crop focus. If multiple faces are detected, the box with the highest score wins — group shots tend to lock onto whoever the model is most confident about, which is usually the closest or most-foreground person.
If no face is detected (most product shots, landscapes, screenshots), the request transparently falls through to gravity=smart, then to the saliency strategy. So gravity=face is always at least as good as gravity=auto — it just adds a face-aware step on top.
See gravity=face on Transforms for a side-by-side crop example.
The model loads once per process (~200 ms) and stays in memory. Detection on a typical 1080p portrait takes 50–200 ms on a single CPU core. Like every other smart-crop mode, this cost is paid once per cache miss — once the output bytes are cached, subsequent requests serve them for free.
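The face-selection rule (highest score wins, box centre becomes the focus, no face means fall through) can be sketched as follows. The FaceBox shape and the function name are hypothetical; the real detector output may use different field names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FaceBox:
    # Hypothetical box shape: source-pixel coordinates plus a confidence score.
    x: float
    y: float
    width: float
    height: float
    score: float


def focal_from_faces(
    faces: List[FaceBox], src_w: int, src_h: int
) -> Optional[Tuple[float, float]]:
    """Return the normalised centre of the highest-confidence face.

    Returns None when no face was detected, so the caller can fall
    through to the next strategy (smart crop, then saliency).
    """
    if not faces:
        return None
    best = max(faces, key=lambda f: f.score)
    centre_x = (best.x + best.width / 2) / src_w
    centre_y = (best.y + best.height / 2) / src_h
    return centre_x, centre_y
```

For a group shot, only the top-scoring box influences the crop; the other boxes are ignored in this sketch.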
Content-aware smart cropping (gravity=smart)
gravity=smart (or its alias crop=smart) runs the smartcrop algorithm on the source. smartcrop scores every candidate window for three signals — pixel-level detail, saturation, and skin-tone presence — then returns the rectangle with the highest aggregate score for the requested aspect ratio.
It’s slower than the saliency strategy (gravity=auto) by ~50–100ms on a typical 1080p source, but it picks the right region noticeably more often on:
- Hero shots where the subject is off-centre
- Portraits with significant negative space
- Product photography with the item in a corner
- Editorial imagery where the visual weight is asymmetric
See gravity=smart on Transforms for a side-by-side comparison with gravity=auto.
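To make "scores every candidate window" concrete, here is a deliberately simplified sketch. It is not the smartcrop implementation: it assumes the per-cell detail, saturation, and skin-tone signals have already been combined into a single score grid, and it only searches window positions at one fixed size. The function name best_window is hypothetical.

```python
def best_window(scores, win_w, win_h):
    """Find the window with the highest total score.

    scores: 2D list (rows of cells), each cell an already-combined score.
    Returns (left, top, total) for the best win_w x win_h window.
    """
    rows, cols = len(scores), len(scores[0])

    # Summed-area table with a zero border, so any window sum is O(1).
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                scores[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )

    best = None
    for top in range(rows - win_h + 1):
        for left in range(cols - win_w + 1):
            total = (
                sat[top + win_h][left + win_w]
                - sat[top][left + win_w]
                - sat[top + win_h][left]
                + sat[top][left]
            )
            if best is None or total > best[2]:
                best = (left, top, total)
    return best
```

With the score mass concentrated in one corner of the grid, the winning window lands in that corner, which is the behaviour that helps on off-centre subjects.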
Entropy cropping (gravity=entropy)
gravity=entropy picks the region with the highest pixel entropy — detail density — without saliency or face cues. Use it when there is no obvious subject: product flat-lays, fabric swatches, maps, or texture photography where you want the busiest area in frame.
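As a rough illustration of "highest pixel entropy", the sketch below computes Shannon entropy per tile of a greyscale image and picks the busiest tile. It is a toy model of the idea, not the production algorithm; the function names and the non-overlapping tile grid are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of a flat list of pixel values."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def busiest_tile(gray, tile):
    """Return (left, top) of the non-overlapping tile with the highest entropy.

    gray: 2D list of greyscale values; tile: tile edge length in pixels.
    """
    best = None
    for top in range(0, len(gray) - tile + 1, tile):
        for left in range(0, len(gray[0]) - tile + 1, tile):
            values = [
                gray[r][c]
                for r in range(top, top + tile)
                for c in range(left, left + tile)
            ]
            entropy = shannon_entropy(values)
            if best is None or entropy > best[0]:
                best = (entropy, left, top)
    return best[1], best[2]
```

A flat region scores zero bits, while a tile whose pixels all differ scores the maximum for its size, so the crop gravitates toward texture rather than toward faces or salient objects.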

Combining focal points with the API
The recommended pattern at scale:
- On upload, call /api/images/analyze once. Store the returned focal.suggested.{x, y} next to the file in your DB.
- On render, build the CDN URL with fp=x,y from the stored value. No per-request analysis. No CDN cache invalidation if art direction changes: you just update the stored focal point and refresh affected URLs.
- Art director override: any time someone wants to hand-tune a focal point, update the stored value. URLs continue to be deterministic.
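The render step of this pattern might look like the sketch below. The CDN host, the render_url helper, and the way the focal point is stored are all hypothetical; only the w, h, and fp parameters come from this page.

```python
from urllib.parse import urlencode

# Hypothetical CDN host, used only for illustration.
CDN_BASE = "https://cdn.example.com"


def render_url(path, width, height, focal):
    """Build a render URL from a stored (x, y) focal point.

    No analysis happens here: the focal point was computed once at
    upload time and read back from your own database.
    """
    fx, fy = focal
    query = urlencode(
        {"w": width, "h": height, "fp": f"{fx:g},{fy:g}"},
        safe=",",  # keep the comma in fp=x,y unescaped
    )
    return f"{CDN_BASE}/{path.lstrip('/')}?{query}"
```

An art-director override then amounts to writing a new (x, y) pair to storage; the next call to render_url produces the updated URL.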
The analyze endpoint
POST /api/images/analyze returns a suggested focal point (from face detection when a face is found, otherwise from smartcrop), the suggested crop window in pixel coordinates, and a full colour palette. Use it once per upload, persist the result, and reuse it forever.
Authenticate
Same auth as every other media-tools API: a regular API key (cvly_…) or a logged-in session cookie. Delivery keys are not accepted on this endpoint — it’s a workspace-write path, not a public CDN one.
Two request shapes
Stored file (recommended)
If the image is already in Convertly Storage, send a JSON body with the file’s id. No upload needed.
Fresh upload
If the image isn’t stored yet (onboarding, one-off analysis, “what colour is this?” calls), use multipart.
What analyze returns visually
Analyze runs face detection and palette extraction on the source once. Persist the JSON next to the file so that every CDN render with fp=x,y stays deterministic and free of per-request analysis.
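For the stored-file request shape, a request could be assembled roughly as follows. Several details here are assumptions rather than documented behaviour: the API host, the fileId field name, and the Bearer authorization scheme are all hypothetical, so check the API reference before relying on them. The sketch only builds the request object and does not send it.

```python
import json
from urllib.request import Request

# Hypothetical API host, used only for illustration.
API_BASE = "https://api.example.com"


def build_analyze_request(api_key, file_id):
    """Build (but do not send) a stored-file analyze request.

    Assumptions: the JSON field is named "fileId" and the API key is
    passed as a Bearer token. Neither is confirmed by this page.
    """
    body = json.dumps({"fileId": file_id}).encode("utf-8")
    return Request(
        f"{API_BASE}/api/images/analyze",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen would send it; reading the key from an environment variable keeps it out of source control.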
Response
- focal.suggested.{x, y} is normalised to [0, 1]. Paste straight into ?fp=x,y.
- focal.suggested.confidence is in [0, 1]. Values under ~0.3 mean the algorithm couldn’t find a strong subject; treat them as “centre crop is probably fine.”
- focal.suggested.source tells you which backend produced the focal point:
  - face: a face was detected; the focal point is the centre of the highest-confidence face box. Highly trustworthy.
  - smart-crop: no face found; smartcrop’s content-aware pick was used.
  - center: both passes failed (very small image, single-colour, or unsupported format); the centre is the safe fallback.
- focal.crop is the suggested square crop window in source pixel coordinates. Useful if you want to physically store a thumbnail rather than render one on demand.
- faces is the full list of detected face boxes, sorted by score descending. Each box is in source-image pixel coordinates with a confidence score. Empty array = no faces detected (a successful run); the field is present even when gravity=face would fall through to smart crop.
- palette.dominant is whichever named swatch has the highest pixel population. Renders nicely as the card-background colour in product UIs.
- palette.*.bodyTextOnColor is a WCAG-style “white or black for body text?” hint.
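One way to turn the focal fields above into an fp value is sketched below. The focal_param helper is hypothetical, and the dictionary in the usage example is a hand-written fragment containing only the fields described on this page, not a captured API response.

```python
def focal_param(analysis, min_confidence=0.3):
    """Turn an analyze result into a value for the fp= parameter.

    Falls back to the image centre when the source is "center" or the
    confidence is below the ~0.3 threshold described above.
    """
    suggested = analysis["focal"]["suggested"]
    if suggested["source"] == "center" or suggested["confidence"] < min_confidence:
        return "0.5,0.5"
    return f"{suggested['x']:g},{suggested['y']:g}"


# Illustrative fragment only; values are made up for the example.
example = {
    "focal": {
        "suggested": {"x": 0.62, "y": 0.35, "confidence": 0.91, "source": "face"}
    }
}
fp_value = focal_param(example)
```

Storing fp_value alongside the file means later renders never need to look at confidence or source again.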
Quota
One /api/images/analyze call = 1 media operation against your plan’s monthly quota. Free, Starter, and Pro all include the endpoint.
Comparison to imgix and Cloudinary
| Feature | imgix | Cloudinary | Convertly |
|---|---|---|---|
| Manual focal point in URL | fp-x / fp-y | g_xy_center,x_…,y_… | fp=x,y |
| Content-aware crop | crop=focalpoint (with fp) | g_auto | gravity=smart or crop=smart |
| Face-aware crop | crop=faces | g_face | gravity=face (TinyFaceDetector model, falls through to smart crop if no face) |
| Palette extraction in API | colour palette | colour metadata | /api/images/analyze returns six categorised swatches + dominant |
| Pricing for analysis | Per-image surcharge | Add-on tier | Counts as 1 op against the included monthly quota |
The fp=x,y syntax intentionally mirrors imgix’s fp-x / fp-y so URL translation is mechanical; see the migration guide for the full mapping.
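As a sketch of how mechanical the focal-point translation is, the helper below reads fp-x and fp-y from an imgix-style URL and emits the equivalent fp=x,y query fragment. The function name is hypothetical and it covers only the focal point; other parameters are left to the migration guide.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def imgix_focal_to_convertly(imgix_url):
    """Translate imgix fp-x / fp-y into an fp=x,y query fragment.

    Returns None when the URL has no complete focal point.
    """
    params = dict(parse_qsl(urlsplit(imgix_url).query))
    if "fp-x" not in params or "fp-y" not in params:
        return None
    return urlencode({"fp": f"{params['fp-x']},{params['fp-y']}"}, safe=",")
```

Both systems express the focal point as normalised coordinates, so the values pass through unchanged.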