Krawly Documentation

Krawly is an AI-powered web scraping platform. Describe what you want to extract, and Krawly generates a YAML configuration that handles pagination, detail pages, and data transformation — no code required.

🤖 AI-Powered: Describe your target in plain text. Krawly's AI generates the scraping config automatically.

🐍 Python SDK: pip install krawly — Full API client with progress tracking and local execution.

🧩 Chrome Extension: Run scrapers directly in your browser. Log in, import configs, and export results.

📡 REST API: Full API access on all plans. Generate configs, run jobs, and retrieve results programmatically.

Quickstart

Get started with Krawly in under 2 minutes.

1. Create an account

Sign up at krawly.io/accounts/register. The free plan includes 3 scraping jobs per month with full API access.

2. Get your API key

Go to your profile page and copy your API key (starts with sai_).

3. Generate a scraping config

Use the playground at krawly.io/dashboard or the API:

curl -X POST https://krawly.io/api/v1/generate/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://books.toscrape.com",
    "prompt": "Scrape all book titles, prices, ratings, and availability"
  }'

4. Or use the Python SDK

pip install krawly
from krawly import Krawly

client = Krawly(api_key="sai_your_api_key")

# Generate a config from natural language
result = client.generate(
    url="https://books.toscrape.com",
    prompt="Scrape all book titles, prices, and ratings"
)

# Or generate, run, and fetch results in a single call
results = client.scrape(
    url="https://books.toscrape.com",
    prompt="Scrape all book titles, prices, and ratings"
)
print(f"Scraped {len(results.data)} items")

Authentication

All API requests (except login) require authentication using your API key in the Authorization header:

Authorization: Bearer sai_your_api_key
💡 Tip: You can find your API key on your profile page. If you don't have one, it will be generated when you first use the login endpoint.
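
For example, a raw authenticated request with Python's requests library looks like the following sketch (the endpoint and header are as documented; everything else is illustrative):

import requests

# Any authenticated endpoint works the same way; /api/v1/me/ is a simple one to test with.
response = requests.get(
    "https://krawly.io/api/v1/me/",
    headers={"Authorization": "Bearer sai_your_api_key"},
)
response.raise_for_status()
print(response.json())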

You can also authenticate via the login endpoint using your email and password:

POST /api/v1/auth/login/
Content-Type: application/json

{
  "email": "you@example.com",
  "password": "your_password"
}

# Response:
{
  "api_key": "sai_abc123...",
  "email": "you@example.com",
  "plan": "starter"
}

API Reference

The Krawly API is a RESTful API with JSON request/response format. Base URL: https://krawly.io/api/v1/

Method | Endpoint | Description
POST | /api/v1/auth/login/ | Login with email/password, returns API key
GET | /api/v1/me/ | Get account info and subscription
GET | /api/v1/configs/ | List your YAML configs
POST | /api/v1/configs/ | Create a new config
GET | /api/v1/configs/{id}/ | Get a specific config
DELETE | /api/v1/configs/{id}/ | Delete a config
POST | /api/v1/generate/ | Generate YAML from URL + prompt
POST | /api/v1/run/ | Run a YAML config (server-side)
GET | /api/v1/jobs/ | List your scraping jobs
GET | /api/v1/jobs/{id}/status/ | Get job status and progress
GET | /api/v1/jobs/{id}/results/ | Get job results

POST /api/v1/auth/login/

Authenticate with email and password to receive your API key. No authentication header required.

Request Body

Field | Type | Required | Description
email | string | Yes | Your account email
password | string | Yes | Your account password

Response (200)

{
  "api_key": "sai_a1b2c3d4e5f6...",
  "email": "user@example.com",
  "plan": "starter"
}

Error Response (401)

{
  "error": "Invalid email or password."
}
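
The same call as a Python requests sketch (field names follow the request body table above; the response key matches the example response):

import requests

# Exchange email/password for your API key (no Authorization header needed here)
resp = requests.post(
    "https://krawly.io/api/v1/auth/login/",
    json={"email": "you@example.com", "password": "your_password"},
)
resp.raise_for_status()
api_key = resp.json()["api_key"]  # e.g. "sai_a1b2c3d4e5f6..."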

GET /api/v1/me/

Get your account information and subscription details.

Response

{
  "user": {
    "id": 1,
    "email": "user@example.com",
    "first_name": "John",
    "last_name": "Doe"
  },
  "subscription": {
    "plan": "starter",
    "yaml_generations_remaining": 15,
    "monthly_scrapes_remaining": 20,
    "has_api_access": true,
    "has_server_execution": false
  }
}

Configs CRUD

GET /api/v1/configs/

List all your YAML configurations.

{
  "count": 3,
  "results": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Books Scraper",
      "target_url": "https://books.toscrape.com",
      "created_at": "2026-01-15T10:30:00Z"
    }
  ]
}

POST /api/v1/configs/

Create a new YAML configuration.

Field | Type | Required | Description
name | string | Yes | Config name
target_url | string | Yes | Target website URL
yaml_content | string | Yes | Full YAML config content
prompt | string | No | Description of what to scrape
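
As an illustration, creating a config with a small inline YAML (the YAML below is a minimal made-up example, not a canonical template; the request fields follow the table above):

import requests

yaml_content = """\
version: "2.0"
name: "Books Scraper"
target:
  url: "https://books.toscrape.com"
extract:
  container: "article.product_pod"
  type: "list"
  fields:
    - name: "title"
      selector: "h3 a"
      type: "text"
"""

resp = requests.post(
    "https://krawly.io/api/v1/configs/",
    headers={"Authorization": "Bearer sai_your_api_key"},
    json={
        "name": "Books Scraper",
        "target_url": "https://books.toscrape.com",
        "yaml_content": yaml_content,
        "prompt": "Scrape all book titles",  # optional
    },
)
print(resp.status_code, resp.json())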

GET /api/v1/configs/{id}/

Get a specific configuration including the full YAML content.

DELETE /api/v1/configs/{id}/

Delete a configuration.
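
A short sketch of fetching a config (including its YAML) and then deleting it (the UUID is a placeholder):

import requests

headers = {"Authorization": "Bearer sai_your_api_key"}
config_id = "550e8400-e29b-41d4-a716-446655440000"  # placeholder UUID

# Fetch the full config, including yaml_content
config = requests.get(f"https://krawly.io/api/v1/configs/{config_id}/", headers=headers).json()

# Delete it
requests.delete(f"https://krawly.io/api/v1/configs/{config_id}/", headers=headers)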

POST /api/v1/generate/

Generate a YAML scraping configuration using AI. The AI agent analyzes the target website, discovers API endpoints, handles JavaScript-rendered pages, and produces an optimized config.

Request Body

Field | Type | Required | Description
url | string | Yes | Target website URL
prompt | string | Yes | Natural language description of what to scrape

Response (202 Accepted)

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "progress": 0,
  "job_type": "generate",
  "target_url": "https://books.toscrape.com",
  "created_at": "2026-01-15T10:30:00Z"
}
ℹ️ Async Operation: Generation runs asynchronously. Poll /api/v1/jobs/{id}/status/ to track progress.

Example

curl -X POST https://krawly.io/api/v1/generate/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "prompt": "Scrape all story titles, points, links, and comment counts"
  }'

POST /api/v1/run/

Execute a YAML config on Krawly's servers. Provide either a config_id or yaml_content.

Request Body

Field | Type | Required | Description
config_id | UUID | One of | ID of a saved config to run
yaml_content | string | One of | Raw YAML config content

Response (202 Accepted)

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "progress": 0,
  "job_type": "run"
}
⚠️ Pro Plan Required: Server-side execution is available on the Pro plan and above. Free and Starter plans can run scrapers locally via the SDK or Chrome Extension.
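
A minimal sketch of starting a server-side run from a saved config (assumes a plan with server execution; the config id is a placeholder):

import requests

resp = requests.post(
    "https://krawly.io/api/v1/run/",
    headers={"Authorization": "Bearer sai_your_api_key"},
    json={"config_id": "550e8400-e29b-41d4-a716-446655440000"},  # or {"yaml_content": "..."}
)
job = resp.json()  # 202 Accepted; the job starts in "pending" status
print(job["id"], job["status"])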

Jobs

GET /api/v1/jobs/

List all your scraping jobs with status and progress.

GET /api/v1/jobs/{id}/status/

Get the current status and progress of a specific job. Use this to poll for completion.

Job Status Values

Status | Description
pending | Job is queued and waiting to start
analyzing | AI is analyzing the target website
generating | AI is generating the YAML configuration
validating | Testing the generated config
running | Scraper is executing
completed | Job finished successfully
failed | Job failed with an error

Response

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "progress": 100,
  "status_message": "Scraped 250 items",
  "items_found": 250,
  "pages_scraped": 5,
  "api_discovered": true,
  "created_at": "2026-01-15T10:30:00Z",
  "completed_at": "2026-01-15T10:31:12Z",
  "results": { "data": [...], "row_count": 250 },
  "yaml_config": { "id": "...", "yaml_content": "..." }
}
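
A minimal polling loop against this endpoint might look like the sketch below (the poll interval is an arbitrary choice; the SDK's wait_for_completion wraps the same pattern):

import time
import requests

headers = {"Authorization": "Bearer sai_your_api_key"}
job_id = "550e8400-e29b-41d4-a716-446655440000"  # placeholder

while True:
    status = requests.get(
        f"https://krawly.io/api/v1/jobs/{job_id}/status/", headers=headers
    ).json()
    print(f'[{status["progress"]}%] {status["status"]}')
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(2)  # arbitrary poll interval

# Once completed, fetch the data from /api/v1/jobs/{id}/results/ (next section)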

GET /api/v1/jobs/{id}/results/

Get the scraped data from a completed job.

Response

[
  {
    "data": [
      { "title": "Book Title", "price": "£29.99", "rating": "4 stars" },
      { "title": "Another Book", "price": "£15.50", "rating": "5 stars" }
    ],
    "row_count": 250
  }
]

Python SDK

Installation

pip install krawly

The SDK requires Python 3.8+ and has minimal dependencies (requests and pyyaml).

Quick Start

from krawly import Krawly

# Initialize the client
client = Krawly(api_key="sai_your_api_key")

# Check your account
info = client.me()
print(f"Plan: {info.plan}")
print(f"Scrapes remaining: {info.scrapes_remaining}")

# One-liner: generate + run + get results
results = client.scrape(
    url="https://books.toscrape.com",
    prompt="Scrape all book titles, prices, and ratings"
)
print(f"Got {len(results.data)} items")

# Save results
import json
with open("results.json", "w") as f:
    json.dump(results.data, f, indent=2)

Client Reference

Constructor

Krawly(api_key="sai_...", base_url="https://krawly.io")
Param | Type | Default | Description
api_key | str | | Your API key (required)
base_url | str | https://krawly.io | API base URL

Account

# Get account info
info = client.me()  # Returns UserInfo

YAML Generation

# Generate a YAML config from natural language
result = client.generate(url="...", prompt="...")
# Returns GenerationResult with .job_id, .yaml_content, .config

Running Scrapers

# Run by config_id
job = client.run(config_id="uuid-here")

# Run raw YAML
job = client.run_yaml(yaml_content="version: '2.0'...")

# Wait for completion with progress
status = client.wait_for_completion(job.id, on_progress=lambda s: print(s.progress))

# Get results
results = client.job_results(job.id)

One-Liner Scraping

# Generate + run + wait + return results in one call
results = client.scrape(url="...", prompt="...")

# Same, but with a YAML file
results = client.scrape_with_file("config.yaml")

Config Management

# List configs
configs = client.list_configs()

# Get a specific config
config = client.get_config("uuid-here")

# Create a new config
config = client.create_config(
    name="My Scraper",
    target_url="https://example.com",
    yaml_content="...",
    prompt="..."
)

# Delete a config
client.delete_config("uuid-here")

# Download YAML to file
client.download_config("uuid-here", "output.yaml")

# Upload YAML from file
config = client.upload_config("config.yaml")

YAML Utilities

# Load YAML from file
config = Krawly.load_yaml("config.yaml")

# Parse YAML string
config = Krawly.parse_yaml(yaml_string)

# Save dict as YAML
Krawly.save_yaml(config_dict, "output.yaml")

Models

UserInfo

Attribute | Type | Description
email | str | Account email
plan | str | Current plan name
scrapes_remaining | int | Scrapes left this month
generations_remaining | int | Generations left this month
has_api_access | bool | Whether API access is enabled

JobStatus

Attribute | Type | Description
id | str | Job UUID
status | str | Current status
progress | int | Progress 0-100
is_running | bool | True if job is active
is_completed | bool | True if job finished
is_failed | bool | True if job failed
items_found | int | Number of items scraped
pages_scraped | int | Number of pages visited

ScrapingResult

Attribute | Type | Description
data | list[dict] | Scraped data rows
row_count | int | Total number of rows

Exceptions

Exception | Description
KrawlyError | Base exception for all SDK errors
AuthenticationError | Invalid or missing API key (401)
QuotaExceededError | Monthly limit reached (429)
RateLimitError | Too many requests (429)

Examples

Progress Tracking

from krawly import Krawly

client = Krawly(api_key="sai_...")

def on_progress(status):
    print(f"[{status.progress}%] {status.status} — {status.status_message}")

result = client.generate(
    url="https://news.ycombinator.com",
    prompt="Scrape story titles, points, authors, and links"
)

# Wait with real-time progress
final = client.wait_for_completion(result.job_id, on_progress=on_progress)
print(f"Done! Found {final.items_found} items")

Batch Scraping

from krawly import Krawly

client = Krawly(api_key="sai_...")
urls = [
    ("https://books.toscrape.com", "Get all book titles and prices"),
    ("https://quotes.toscrape.com", "Get all quotes, authors, and tags"),
]

for url, prompt in urls:
    try:
        results = client.scrape(url=url, prompt=prompt)
        print(f"✅ {url}: {len(results.data)} items")
    except Exception as e:
        print(f"❌ {url}: {e}")

Working with YAML Files

from krawly import Krawly

client = Krawly(api_key="sai_...")

# Upload a local YAML file as a config
config = client.upload_config("my_scraper.yaml")
print(f"Created config: {config.name} ({config.id})")

# Run the config on the server
results = client.scrape_with_file("my_scraper.yaml")
print(f"Got {len(results.data)} items")

Chrome Extension

Installation

The Krawly Chrome Extension lets you run scraping configs directly in your browser.

  1. Download the extension ZIP from your Krawly dashboard or the Krawly website
  2. Open chrome://extensions/ in Chrome
  3. Enable Developer mode (toggle in the top right)
  4. Click "Load unpacked" and select the extracted extension folder
  5. The Krawly icon (⚡) will appear in your toolbar
✅ Chrome Web Store: The extension will be available on the Chrome Web Store soon. For now, use the developer mode installation method above.

Login & Setup

Click the Krawly icon in your toolbar to open the extension popup.

  1. Email & Password: Enter your Krawly credentials and click "Sign In"
  2. API Key: Alternatively, paste your API key (starts with sai_) and click "Connect with API Key"

Once logged in, the extension will automatically sync your configs from Krawly and show your account info.

Usage

The extension has four tabs: ▶ Run, 📥 Configs, 💾 Local, and 👤 Account.

Exporting Data

After running a scraper, you can export results in three formats:

Format | Description
JSON | Standard JSON array with pretty-printing
CSV | Comma-separated values with UTF-8 BOM for Excel compatibility
XLSX | Native Excel spreadsheet format

You can also click 📋 Copy to copy all results as JSON to your clipboard.

YAML Configuration

Overview

Krawly uses YAML-based configurations to define scraping rules. Configs are portable, version-controlled, and can be generated automatically by AI or written manually.

💡 AI Generation: You don't need to write YAML manually! Use the playground at krawly.io or the client.generate() method to have AI create configs for you.

Basic Structure

version: "2.0"
name: "My Scraper"
description: "Scrapes product listings"

target:
  url: "https://example.com/products"
  engine: "heavy"             # "lightweight", "heavy", or "auto"
  browser:
    wait_selector: ".product"  # Wait for this element before scraping
    timeout: 15000

extract:
  container: ".product-card"   # CSS selector for each item
  type: "list"                 # "list", "single", or "table"
  fields:
    - name: "title"
      selector: "h2.title"
      type: "text"
    - name: "price"
      selector: ".price"
      type: "text"
      transform: ["strip_currency", "to_number"]
    - name: "url"
      selector: "a"
      type: "link"
    - name: "image"
      selector: "img"
      type: "image_url"

pagination:
  type: "click"
  selector: "a.next-page"
  max_pages: 5
  wait_after: 2000

output:
  format: "json"
  deduplicate: true
  deduplicate_fields: ["title", "url"]
  limit: 500

Field Types

Type | Description | Extra Params
text | Extract text content from element
attribute | Extract an HTML attribute | attribute: "href"
html | Extract inner HTML
link | Extract absolute URL from href
image_url | Extract image src or data-src
count | Count matching elements
exists | Check if element exists (true/false)
regex | Extract text via regex pattern | pattern: "\\d+"
eval | Execute JavaScript expression | script: "el.innerText"

Transforms

Apply data transformations to extracted values:

Transform | Description
strip | Trim whitespace
lowercase | Convert to lowercase
uppercase | Convert to uppercase
strip_currency | Remove currency symbols (£$€¥₺)
to_number | Parse as number
extract_number | Extract first number from text
strip_html | Remove HTML tags
trim_whitespace | Collapse multiple spaces
regex:PATTERN | Extract via regex
replace:OLD:NEW | Replace text
split:DELIM:INDEX | Split and take index
slice:START:END | Substring slice
template:TEXT{value}TEXT | Template string

Pagination

Krawly supports three pagination strategies:

Click Pagination

pagination:
  type: "click"
  selector: "button.load-more"    # CSS selector for next button
  max_pages: 10
  wait_after: 2000                # ms to wait after click

URL Pagination

pagination:
  type: "url"
  url_pattern: "https://example.com/page/{page}"
  max_pages: 10
  wait_after: 1500

Infinite Scroll

pagination:
  type: "scroll_infinite"
  max_pages: 5
  wait_after: 3000

Stop Conditions

pagination:
  type: "click"
  selector: ".next"
  max_pages: 20
  stop_condition:
    max_items: 500    # Stop after collecting this many items

Actions

Pre-scraping actions that run before data extraction:

actions:
  - type: "click"
    selector: "#accept-cookies"
  - type: "scroll"
    direction: "down"
    amount: "full"
  - type: "type"
    selector: "#search-input"
    value: "laptop"
  - type: "select"
    selector: "#sort-by"
    value: "price_low"
  - type: "hover"
    selector: ".dropdown-trigger"
  - type: "keyboard"
    key: "Enter"
  - type: "script"
    script: "window.scrollTo(0, 500)"

Complete Examples

E-Commerce Product Scraper

version: "2.0"
name: "Product Scraper"
description: "Scrapes product listings with prices and images"

target:
  url: "https://shop.example.com/category"
  engine: "heavy"
  browser:
    wait_selector: ".product-grid"

extract:
  container: ".product-card"
  type: "list"
  fields:
    - name: "title"
      selector: "h3.product-name"
      type: "text"
    - name: "price"
      selector: ".price-current"
      type: "text"
      transform: ["strip_currency", "to_number"]
    - name: "original_price"
      selector: ".price-original"
      type: "text"
      transform: ["strip_currency", "to_number"]
      default: null
    - name: "rating"
      selector: ".star-rating"
      type: "attribute"
      attribute: "data-rating"
    - name: "image_url"
      selector: "img.product-image"
      type: "image_url"
    - name: "product_url"
      selector: "a.product-link"
      type: "link"

detail_fields:
  fields:
    - name: "description"
      selector: ".product-description"
      type: "text"
    - name: "sku"
      selector: "#product-sku"
      type: "text"
    - name: "in_stock"
      selector: ".stock-status.available"
      type: "exists"

pagination:
  type: "click"
  selector: ".pagination .next"
  max_pages: 10
  wait_after: 2000

output:
  format: "json"
  deduplicate: true
  deduplicate_fields: ["product_url"]

News Article Scraper

version: "2.0"
name: "News Scraper"
description: "Scrapes news article headlines"

target:
  url: "https://news.ycombinator.com"
  engine: "lightweight"

extract:
  container: "tr.athing"
  type: "list"
  fields:
    - name: "rank"
      selector: ".rank"
      type: "text"
      transform: ["replace:.:"]
    - name: "title"
      selector: ".titleline > a"
      type: "text"
    - name: "url"
      selector: ".titleline > a"
      type: "link"
    - name: "site"
      selector: ".sitestr"
      type: "text"
      default: "news.ycombinator.com"

pagination:
  type: "click"
  selector: "a.morelink"
  max_pages: 3

Plans & Limits

Feature | Free | Starter ($15/mo) | Pro ($29/mo)
Monthly Scrapes | 3 | 20 | 100
API Access | ✓ | ✓ | ✓
Chrome Extension | ✓ | ✓ | ✓
Python SDK | ✓ | ✓ | ✓
Server Execution | ✗ | ✗ | ✓
Priority Support
💡 All plans include API access. You can use the REST API, Python SDK, and Chrome Extension on every plan — including Free.

Error Handling

The API uses standard HTTP status codes:

Code | Meaning | Action
200 | Success
201 | Created | Resource was created
202 | Accepted | Async job was queued
400 | Bad Request | Check your request data
401 | Unauthorized | Check your API key
404 | Not Found | Resource doesn't exist
429 | Too Many Requests | Rate limited or quota exceeded
500 | Server Error | Contact support
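
At the raw HTTP level, a 429 can mean either a temporary rate limit (worth retrying with a backoff) or an exhausted monthly quota (retrying won't help). A minimal sketch, with arbitrary backoff values:

import time
import requests

def get_with_retry(url, headers, retries=3):
    # Retry only transient 429s; a spent monthly quota will keep returning 429.
    for attempt in range(retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)  # arbitrary exponential backoff
    return resp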

SDK Error Handling

from krawly import Krawly, KrawlyError, AuthenticationError, QuotaExceededError

client = Krawly(api_key="sai_...")

try:
    results = client.scrape(url="https://example.com", prompt="Get products")
except AuthenticationError:
    print("Invalid API key. Check your credentials.")
except QuotaExceededError as e:
    print(f"Monthly limit reached: {e}")
except KrawlyError as e:
    print(f"Krawly error: {e}")

© 2026 Krawly. All rights reserved. — krawly.io · support@krawly.io