This guide covers the ways Convertly’s image CDN can decide where and how to crop, and how to feed it the metadata you already have from a CMS, design tool, or art director. It also covers the standalone analyze endpoint, which emits focal points and palettes you can persist alongside every upload.
Side-by-side examples for gravity=auto, gravity=smart, gravity=face, and gravity=center live on Transforms → Smart cropping. This page goes deeper on focal points, entropy crops, and the analyze API.

When to use which mode

| Scenario | Recommended mode | Why |
| --- | --- | --- |
| Default for unknown assets | gravity=auto | Fast saliency, great default. |
| Portrait / team photo / anything with people | gravity=face | Real face detection (TinyFaceDetector). Picks the highest-confidence face as the crop centre; falls through to smart crop if no face is found. |
| Subject is off-centre or close to an edge | gravity=smart (or crop=smart) | Runs full content-aware scoring; better than libvips for tricky layouts. |
| You already know where the subject is | fp=x,y | Pure pixel math. Zero extra cost on every render. |
| Marketing hero where art direction matters | fp=x,y + a one-time /api/images/analyze call to seed it | Storage cost is one row per asset; renders stay deterministic. |
| Detail / texture sources (no obvious subject) | gravity=entropy | Picks the densest region by entropy without looking for skin/saliency cues. |
| Hard-coded corner | gravity=center, gravity=northwest, etc. | When the source is already composed for you. |

When auto is not enough

On off-centre hero images, gravity=auto (default saliency) can miss the subject. gravity=smart runs a heavier content-aware pass that scores detail, saturation, and skin tone across the frame.
Placeholder: gravity auto vs smart on off-centre hero
For the full gravity comparison grid (auto, smart, face, center), see Transforms → Smart cropping.

Manual focal points (fp=x,y)

fp=x,y accepts a normalised coordinate where (0, 0) is the top-left of the source and (1, 1) is the bottom-right. Both percentages and decimals work:
?w=600&h=400&fp=0.3,0.4    # decimals in [0, 1]
?w=600&h=400&fp=30%,40%    # percentages
Convertly takes a pre-resize extract whose aspect ratio matches the requested w/h, then resizes that extract down. The focal point lands at the centre of the crop window, clamped if the requested aspect would push the window past a source edge. This is the cheapest crop mode: it’s pure pixel math with no analysis pass, and it’s the only mode that’s deterministic across renditions. If you serve the same image at 1200×800, 600×400, and 300×200 with the same focal point, all three crops centre on the same source pixel.
Placeholder: source image with focal point marker
Placeholder: center crop misses subject
Placeholder: focal point crop keeps subject
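The crop arithmetic described above can be sketched in a few lines. This is a minimal illustration, not Convertly’s implementation: the helper names `parse_fp` and `crop_window` are hypothetical, and it assumes the extract is the largest window of the requested aspect ratio that fits inside the source, which this page does not state.

```python
def parse_fp(value: str) -> tuple[float, float]:
    """Parse an 'x,y' focal point given as decimals in [0, 1] or as percentages."""

    def one(part: str) -> float:
        part = part.strip()
        number = float(part[:-1]) / 100 if part.endswith("%") else float(part)
        if not 0.0 <= number <= 1.0:
            raise ValueError(f"focal coordinate out of range: {part}")
        return number

    x_text, y_text = value.split(",")
    return one(x_text), one(y_text)


def crop_window(src_w: int, src_h: int, out_w: int, out_h: int,
                fx: float, fy: float) -> tuple[int, int, int, int]:
    """Return (left, top, width, height) of the pre-resize extract.

    Assumption: the window is the largest one with the requested aspect
    ratio that fits in the source. It is centred on the focal point, then
    shifted back inside the source if it would cross an edge.
    """
    target = out_w / out_h
    if src_w / src_h > target:
        # Source is relatively wider than the output: keep full height.
        win_h = src_h
        win_w = round(src_h * target)
    else:
        # Source is relatively taller (or equal): keep full width.
        win_w = src_w
        win_h = round(src_w / target)

    left = round(fx * src_w - win_w / 2)
    top = round(fy * src_h - win_h / 2)

    # Clamp so the window never leaves the source image.
    left = max(0, min(left, src_w - win_w))
    top = max(0, min(top, src_h - win_h))
    return left, top, win_w, win_h
```

Because the window depends only on the source size, the aspect ratio, and the focal point, calling `crop_window` for 1200×800, 600×400, and 300×200 on the same source yields the same extract under these assumptions, which is the determinism the paragraph above describes.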
<img
  src="https://cdn.convertly.sh/cvly_pub_.../{fileId}?w=1200&h=800&fp=0.4,0.35"
  srcset="
    https://cdn.convertly.sh/cvly_pub_.../{fileId}?w=600&h=400&fp=0.4,0.35  600w,
    https://cdn.convertly.sh/cvly_pub_.../{fileId}?w=900&h=600&fp=0.4,0.35  900w,
    https://cdn.convertly.sh/cvly_pub_.../{fileId}?w=1200&h=800&fp=0.4,0.35 1200w
  "
  sizes="(min-width: 768px) 50vw, 100vw"
  alt="Product hero"
/>
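If you generate markup server-side, a small helper can keep the focal point identical across every srcset entry. The sketch below is illustrative only: `rendition_url` and `build_srcset` are hypothetical names, and the base URL in the usage note is a placeholder for your own delivery key and file id.

```python
from urllib.parse import urlencode


def rendition_url(base_url: str, width: int, height: int, fx: float, fy: float) -> str:
    """Build one rendition URL using the w, h, and fp parameters shown above."""
    # safe="," keeps the comma in fp=x,y readable instead of percent-encoding it.
    query = urlencode({"w": width, "h": height, "fp": f"{fx},{fy}"}, safe=",")
    return f"{base_url}?{query}"


def build_srcset(base_url: str, widths: list[int], aspect: float,
                 fx: float, fy: float) -> str:
    """Return a srcset string whose entries all share one focal point."""
    entries = []
    for width in widths:
        height = round(width / aspect)
        entries.append(f"{rendition_url(base_url, width, height, fx, fy)} {width}w")
    return ", ".join(entries)
```

For example, `build_srcset("https://cdn.example.test/key/file", [600, 900, 1200], 1.5, 0.4, 0.35)` produces three entries with heights 400, 600, and 800, each carrying `fp=0.4,0.35`.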

Face-aware cropping (gravity=face)

gravity=face runs a real face-detection model on the source (TinyFaceDetector, a ~190 KB MIT-licensed network) and uses the highest-confidence face’s centre as the crop focus. If multiple faces are detected, the box with the highest score wins, so group shots tend to lock onto whoever the model is most confident about, which is usually the closest or most-foreground person. If no face is detected (most product shots, landscapes, screenshots), the request transparently falls through to gravity=smart, then to the saliency strategy. Because of that fallback, gravity=face should not do worse than gravity=auto; it only adds a face-aware step on top.
<img src="https://cdn.convertly.sh/cvly_pub_.../{fileId}?w=400&h=400&fit=cover&gravity=face" alt="Team headshot" />
See gravity=face on Transforms for a side-by-side crop example. The model loads once per process (~200 ms) and stays in memory. Detection on a typical 1080p portrait takes 50–200 ms on a single CPU core. Like every other smart-crop mode, this cost is paid once per cache miss — once the output bytes are cached, subsequent requests serve them for free.
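The selection and fall-through behaviour described in this section can be sketched as a chain of strategies. This is a hedged illustration rather than the CDN’s code: `face_centre` and `first_hit` are hypothetical helpers, the face boxes are assumed to be dictionaries with pixel `x`, `y`, `width`, `height` and a `score`, and the smart and saliency strategies are stubbed as callables you would supply.

```python
from typing import Callable, Optional

Point = tuple[float, float]
Strategy = Callable[[], Optional[Point]]


def face_centre(faces: list[dict], src_w: int, src_h: int) -> Optional[Point]:
    """Return the normalised centre of the highest-score face, or None."""
    if not faces:
        return None
    best = max(faces, key=lambda face: face["score"])
    cx = (best["x"] + best["width"] / 2) / src_w
    cy = (best["y"] + best["height"] / 2) / src_h
    return cx, cy


def first_hit(strategies: list[tuple[str, Strategy]]) -> tuple[str, Point]:
    """Try each named strategy in order; fall back to the image centre."""
    for name, strategy in strategies:
        point = strategy()
        if point is not None:
            return name, point
    return "center", (0.5, 0.5)
```

A caller would pass the strategies in the order the section describes, for example `[("face", ...), ("smart", ...), ("saliency", ...)]`, so a source with no faces reaches the later entries without any special casing.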

Content-aware smart cropping (gravity=smart)

gravity=smart (or its alias crop=smart) runs the smartcrop algorithm on the source. smartcrop scores every candidate window on three signals (pixel-level detail, saturation, and skin-tone presence), then returns the rectangle with the highest aggregate score for the requested aspect ratio. It’s slower than the saliency strategy (gravity=auto) by ~50–100 ms on a typical 1080p source, but it picks the right region noticeably more often on:
  • Hero shots where the subject is off-centre
  • Portraits with significant negative space
  • Product photography with the item in a corner
  • Editorial imagery where the visual weight is asymmetric
<img src="https://cdn.convertly.sh/cvly_pub_.../{fileId}?w=600&h=600&fit=cover&gravity=smart" alt="…" />
The pass runs once per cache miss. Once the resulting bytes are cached, every subsequent request serves the already-encoded image with no analysis cost, so a popular hero pays the smart-crop cost once no matter how many times it is served. See gravity=smart on Transforms for a side-by-side comparison with gravity=auto.
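To make the "score every candidate window" idea concrete, here is a deliberately simplified toy. It is not the smartcrop algorithm: it assumes the three signals have already been computed as small per-pixel score maps, uses equal placeholder weights, and brute-forces every window position. The name `best_window` is hypothetical.

```python
def best_window(detail: list[list[float]],
                saturation: list[list[float]],
                skin: list[list[float]],
                win_w: int, win_h: int,
                weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
                ) -> tuple[int, int, int, int]:
    """Return (left, top, width, height) of the highest-scoring window.

    Each map holds one score per pixel. The window score is the weighted
    sum of all three maps over the window. Weights are placeholders.
    """
    rows, cols = len(detail), len(detail[0])
    w_detail, w_sat, w_skin = weights
    best_score = float("-inf")
    best_pos = (0, 0)
    for top in range(rows - win_h + 1):
        for left in range(cols - win_w + 1):
            score = 0.0
            for y in range(top, top + win_h):
                for x in range(left, left + win_w):
                    score += (w_detail * detail[y][x]
                              + w_sat * saturation[y][x]
                              + w_skin * skin[y][x])
            # Strict comparison keeps the first window on ties.
            if score > best_score:
                best_score = score
                best_pos = (left, top)
    return best_pos[0], best_pos[1], win_w, win_h
```

A real implementation works on downscaled images and avoids re-summing overlapping windows; the brute-force loop here is only meant to show why this pass costs more than a plain centre or focal-point crop.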

Entropy cropping (gravity=entropy)

gravity=entropy picks the region with the highest pixel entropy — detail density — without saliency or face cues. Use it when there is no obvious subject: product flat-lays, fabric swatches, maps, or texture photography where you want the busiest area in frame.
?w=600&h=600&fit=cover&gravity=entropy
Placeholder: entropy crop on texture flat-lay
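As a rough illustration of "highest pixel entropy", the sketch below computes Shannon entropy over the grey levels inside each candidate window and keeps the window with the largest value. It is an assumption-laden toy, not the CDN’s implementation: it works on a small 2D list of grey values, tries every window position, and the names `shannon_entropy` and `entropy_window` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values: list[int]) -> float:
    """Shannon entropy (in bits) of the value distribution in `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_window(gray: list[list[int]], win_w: int, win_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, width, height) of the window with the highest entropy."""
    rows, cols = len(gray), len(gray[0])
    best_score, best_left, best_top = float("-inf"), 0, 0
    for top in range(rows - win_h + 1):
        for left in range(cols - win_w + 1):
            pixels = [gray[y][x]
                      for y in range(top, top + win_h)
                      for x in range(left, left + win_w)]
            score = shannon_entropy(pixels)
            if score > best_score:
                best_score, best_left, best_top = score, left, top
    return best_left, best_top, win_w, win_h
```

A flat region has entropy 0 and a region where every pixel differs has the maximum, so the busiest patch wins regardless of whether it contains anything a saliency or face model would recognise.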

Combining focal points with the API

The recommended pattern at scale:
  1. On upload, call /api/images/analyze once. Store the returned focal.suggested.{x, y} next to the file in your DB.
  2. On render, build the CDN URL with fp=x,y from the stored value. No per-request analysis. No CDN cache invalidation if art direction changes — you just update the stored focal point and refresh affected URLs.
  3. Art director override: any time someone wants to hand-tune a focal point, update the stored value. URLs continue to be deterministic.
This keeps the hot path zero-cost while letting you serve smart-cropped renditions to every device.
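The three steps above might look like this in application code. This is a sketch under stated assumptions: `FocalStore` is an in-memory stand-in for your database, `on_upload` expects the already-parsed analyze response, and `render_url` takes a placeholder base URL; none of these names come from Convertly.

```python
from dataclasses import dataclass


@dataclass
class FocalPoint:
    x: float
    y: float


class FocalStore:
    """In-memory stand-in for a table keyed by file id (hypothetical)."""

    def __init__(self) -> None:
        self._rows: dict[str, FocalPoint] = {}

    def save(self, file_id: str, focal: FocalPoint) -> None:
        self._rows[file_id] = focal

    def get(self, file_id: str) -> FocalPoint:
        # Fall back to the image centre when nothing has been stored yet.
        return self._rows.get(file_id, FocalPoint(0.5, 0.5))


def on_upload(store: FocalStore, file_id: str, analysis: dict) -> None:
    """Step 1: persist focal.suggested.{x, y} from the analyze response."""
    suggested = analysis["focal"]["suggested"]
    store.save(file_id, FocalPoint(suggested["x"], suggested["y"]))


def render_url(store: FocalStore, base_url: str, file_id: str,
               width: int, height: int) -> str:
    """Step 2: build the CDN URL from the stored value, with no analysis call."""
    focal = store.get(file_id)
    return f"{base_url}/{file_id}?w={width}&h={height}&fp={focal.x},{focal.y}"
```

An art-director override (step 3) is just another `store.save(...)` call; the next `render_url` call emits a URL with the new `fp` value.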

The analyze endpoint

POST /api/images/analyze returns a suggested focal point (the centre of the best detected face, or smartcrop’s pick when no face is found), the suggested crop window in pixel coordinates, the detected face boxes, and a full colour palette. Use it once per upload, persist the result, and reuse it forever.

Authenticate

Same auth as every other media-tools API: a regular API key (cvly_…) or a logged-in session cookie. Delivery keys are not accepted on this endpoint — it’s a workspace-write path, not a public CDN one.

Two request shapes

Stored file

If the image is already in Convertly Storage, send a JSON body with the file’s id. No upload needed.
curl -X POST "https://convertly.sh/api/images/analyze" \
  -H "Authorization: Bearer $CONVERTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "fileId": "01h-xxx-...-uuid" }'
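The same stored-file request can be issued from Python with only the standard library. This is a minimal sketch mirroring the curl call above; the function names `build_analyze_request` and `analyze_stored_file` are hypothetical, and error handling (non-2xx responses, timeouts) is left out.

```python
import json
import urllib.request

ANALYZE_URL = "https://convertly.sh/api/images/analyze"


def build_analyze_request(api_key: str, file_id: str) -> urllib.request.Request:
    """Build the POST request for a file that is already in storage."""
    body = json.dumps({"fileId": file_id}).encode("utf-8")
    return urllib.request.Request(
        ANALYZE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def analyze_stored_file(api_key: str, file_id: str) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_analyze_request(api_key, file_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first function easy to unit-test without touching the network.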

Fresh upload

If the image isn’t stored yet (onboarding, one-off analysis, “what colour is this?” calls), use multipart:
curl -X POST "https://convertly.sh/api/images/analyze" \
  -H "Authorization: Bearer $CONVERTLY_API_KEY" \
  -F "file=@./hero.jpg"
Multipart uploads are capped at 25 MB.

What analyze returns visually

Analyze runs face detection and palette extraction on the source once. Persist the JSON next to the file — every CDN render with fp=x,y stays deterministic and free of per-request analysis.
Placeholder: analyze API face detection overlay
Placeholder: analyze API palette swatches

Response

{
  "width": 1920,
  "height": 1080,
  "focal": {
    "suggested": { "x": 0.42, "y": 0.38, "confidence": 0.92, "source": "face" },
    "crop": { "left": 320, "top": 0, "width": 1080, "height": 1080 }
  },
  "faces": [
    { "x": 740, "y": 280, "width": 220, "height": 220, "score": 0.94 },
    { "x": 1180, "y": 320, "width": 180, "height": 180, "score": 0.71 }
  ],
  "palette": {
    "vibrant":      { "hex": "#a23f1c", "rgb": [162, 63, 28], "population": 1240, "bodyTextOnColor": "#ffffff" },
    "muted":        { "hex": "#7a6a5e", "rgb": [122, 106, 94], "population": 980, "bodyTextOnColor": "#ffffff" },
    "lightVibrant": { "hex": "#f1c597", "rgb": [241, 197, 151], "population": 620, "bodyTextOnColor": "#000000" },
    "darkVibrant":  { "hex": "#3a1607", "rgb": [58, 22, 7], "population": 510, "bodyTextOnColor": "#ffffff" },
    "lightMuted":   { "hex": "#d4c4b1", "rgb": [212, 196, 177], "population": 740, "bodyTextOnColor": "#000000" },
    "darkMuted":    { "hex": "#403832", "rgb": [64, 56, 50], "population": 410, "bodyTextOnColor": "#ffffff" },
    "dominant":     { "hex": "#a23f1c", "rgb": [162, 63, 28], "population": 1240, "bodyTextOnColor": "#ffffff" }
  }
}
Field notes:
  • focal.suggested.{x, y} is normalised to [0, 1]. Paste straight into ?fp=x,y.
  • focal.suggested.confidence is in [0, 1]. Values under ~0.3 mean the algorithm couldn’t find a strong subject; treat them as “centre crop is probably fine.”
  • focal.suggested.source tells you which backend produced the focal point:
    • face — a face was detected; the focal point is the centre of the highest-confidence face box. Highly trustworthy.
    • smart-crop — no face found; smartcrop’s content-aware pick was used.
    • center — both passes failed (very small image, single-colour, or unsupported format); the centre is the safe fallback.
  • focal.crop is the suggested square crop window in source pixel coordinates. Useful if you want to physically store a thumbnail rather than render one on demand.
  • faces is the full list of detected face boxes, sorted by score descending. Each box is in source-image pixel coordinates with a confidence score. Empty array = no faces detected (a successful run); the field is present even when gravity=face would fall through to smart crop.
  • palette.dominant is whichever named swatch has the highest pixel population. Renders nicely as the card-background colour in product UIs.
  • palette.*.bodyTextOnColor is a WCAG-style “white or black for body text?” hint.
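One way to act on the confidence and source notes above is to fall back to a centre crop when the suggestion is weak. The helper below is a sketch: `focal_for_render` is a hypothetical name, and the 0.3 cut-off is taken from the approximate threshold mentioned in the field notes, so tune it for your own assets.

```python
def focal_for_render(analysis: dict, min_confidence: float = 0.3) -> str:
    """Return the value to use for ?fp=... given a parsed analyze response.

    Falls back to the image centre when the source is "center" or the
    confidence is below min_confidence.
    """
    suggested = analysis["focal"]["suggested"]
    if suggested["source"] == "center" or suggested["confidence"] < min_confidence:
        return "0.5,0.5"
    return f'{suggested["x"]},{suggested["y"]}'
```

With the sample response above this returns `0.42,0.38`, which can be appended to a CDN URL as `fp=0.42,0.38`.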

Quota

One /api/images/analyze call = 1 media operation against your plan’s monthly quota. Free, Starter, and Pro all include the endpoint.

Comparison to imgix and Cloudinary

| Feature | imgix | Cloudinary | Convertly |
| --- | --- | --- | --- |
| Manual focal point in URL | fp-x / fp-y | g_xy_center,x_…,y_… | fp=x,y |
| Content-aware crop | crop=focalpoint (with fp) | g_auto | gravity=smart or crop=smart |
| Face-aware crop | crop=faces | g_face | gravity=face (TinyFaceDetector model, falls through to smart crop if no face) |
| Palette extraction in API | colour palette | colour metadata | /api/images/analyze returns six categorised swatches + dominant |
| Pricing for analysis | Per-image surcharge | Add-on tier | Counts as 1 op against the included monthly quota |
The fp=x,y syntax intentionally mirrors imgix’s fp-x / fp-y so URL-translation is mechanical — see the migration guide for the full mapping.
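As an illustration of how mechanical the focal-point part of that translation is, the sketch below rewrites an imgix-style query string. It covers only the fp-x / fp-y pair: every other parameter is passed through unchanged, which is an assumption and not a substitute for the full mapping in the migration guide. The function name `translate_imgix_query` is hypothetical, and a missing coordinate is filled with 0.5, which is imgix’s documented default.

```python
from urllib.parse import parse_qsl, urlencode


def translate_imgix_query(query: str) -> str:
    """Rewrite fp-x / fp-y into a single fp=x,y parameter.

    Other parameters are copied as-is (assumption: they need no change).
    """
    params = dict(parse_qsl(query))
    fx = params.pop("fp-x", None)
    fy = params.pop("fp-y", None)
    if fx is not None or fy is not None:
        # Fill a missing coordinate with the centre value.
        params["fp"] = f"{fx or '0.5'},{fy or '0.5'}"
    # safe="," keeps the comma in fp=x,y unencoded.
    return urlencode(params, safe=",")
```

For example, `translate_imgix_query("w=600&h=400&fp-x=0.3&fp-y=0.4")` yields `w=600&h=400&fp=0.3,0.4`.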