Krawly Documentation
Krawly is an AI-powered web scraping platform. Describe what you want to extract, and Krawly generates a YAML configuration that handles pagination, detail pages, and data transformation — no code required.
- AI-Powered: Describe your target in plain text. Krawly's AI generates the scraping config automatically.
- Python SDK: pip install krawly — full API client with progress tracking and local execution.
- Chrome Extension: Run scrapers directly in your browser. Log in, import configs, and export results.
- REST API: Full API access on all plans. Generate configs, run jobs, and retrieve results programmatically.
Quickstart
Get started with Krawly in under 2 minutes.
1. Create an account
Sign up at krawly.io/accounts/register. The free plan includes 3 scraping jobs per month with full API access.
2. Get your API key
Go to your profile page and copy your API key (starts with sai_).
3. Generate a scraping config
Use the playground at krawly.io/dashboard or the API:
curl -X POST https://krawly.io/api/v1/generate/ \
-H "Authorization: Bearer sai_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://books.toscrape.com",
"prompt": "Scrape all book titles, prices, ratings, and availability"
}'
4. Or use the Python SDK
pip install krawly
from krawly import Krawly
client = Krawly(api_key="sai_your_api_key")
# Generate a config from natural language
result = client.generate(
url="https://books.toscrape.com",
prompt="Scrape all book titles, prices, and ratings"
)
# Or do everything in one call: generate, run, and return results
results = client.scrape(
url="https://books.toscrape.com",
prompt="Scrape all book titles, prices, and ratings"
)
print(f"Scraped {len(results.data)} items")
Authentication
All API requests (except login) require authentication using your API key in the Authorization header:
Authorization: Bearer sai_your_api_key
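For example, an authenticated call to the account endpoint documented below looks like this:
curl https://krawly.io/api/v1/me/ \
  -H "Authorization: Bearer sai_your_api_key"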
You can also authenticate via the login endpoint using your email and password:
POST /api/v1/auth/login/
Content-Type: application/json
{
"email": "you@example.com",
"password": "your_password"
}
# Response:
{
"api_key": "sai_abc123...",
"email": "you@example.com",
"plan": "starter"
}
API Reference
The Krawly API is a RESTful API with JSON request/response format. Base URL: https://krawly.io/api/v1/
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/auth/login/ | Login with email/password, returns API key |
| GET | /api/v1/me/ | Get account info and subscription |
| GET | /api/v1/configs/ | List your YAML configs |
| POST | /api/v1/configs/ | Create a new config |
| GET | /api/v1/configs/{id}/ | Get a specific config |
| DELETE | /api/v1/configs/{id}/ | Delete a config |
| POST | /api/v1/generate/ | Generate YAML from URL + prompt |
| POST | /api/v1/run/ | Run a YAML config (server-side) |
| GET | /api/v1/jobs/ | List your scraping jobs |
| GET | /api/v1/jobs/{id}/status/ | Get job status and progress |
| GET | /api/v1/jobs/{id}/results/ | Get job results |
POST /api/v1/auth/login/
Authenticate with email and password to receive your API key. No authentication header required.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Yes | Your account email |
| password | string | Yes | Your account password |
Response (200)
{
"api_key": "sai_a1b2c3d4e5f6...",
"email": "user@example.com",
"plan": "starter"
}
Error Response (401)
{
"error": "Invalid email or password."
}
GET /api/v1/me/
Get your account information and subscription details.
Response
{
"user": {
"id": 1,
"email": "user@example.com",
"first_name": "John",
"last_name": "Doe"
},
"subscription": {
"plan": "starter",
"yaml_generations_remaining": 15,
"monthly_scrapes_remaining": 20,
"has_api_access": true,
"has_server_execution": false
}
}
Configs CRUD
GET /api/v1/configs/
List all your YAML configurations.
{
"count": 3,
"results": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Books Scraper",
"target_url": "https://books.toscrape.com",
"created_at": "2026-01-15T10:30:00Z"
}
]
}
POST /api/v1/configs/
Create a new YAML configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Config name |
| target_url | string | Yes | Target website URL |
| yaml_content | string | Yes | Full YAML config content |
| prompt | string | No | Description of what to scrape |
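A sketch of a create request using these fields — the yaml_content shown here is abbreviated; send your full YAML as an escaped JSON string:
curl -X POST https://krawly.io/api/v1/configs/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "Books Scraper",
  "target_url": "https://books.toscrape.com",
  "yaml_content": "version: \"2.0\"\nname: \"Books Scraper\"\n...",
  "prompt": "Scrape all book titles and prices"
  }'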
GET /api/v1/configs/{id}/
Get a specific configuration including the full YAML content.
DELETE /api/v1/configs/{id}/
Delete a configuration.
POST /api/v1/generate/
Generate a YAML scraping configuration using AI. The AI agent analyzes the target website, discovers API endpoints, handles JavaScript-rendered pages, and produces an optimized config.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target website URL |
| prompt | string | Yes | Natural language description of what to scrape |
Response (202 Accepted)
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"progress": 0,
"job_type": "generate",
"target_url": "https://books.toscrape.com",
"created_at": "2026-01-15T10:30:00Z"
}
Poll GET /api/v1/jobs/{id}/status/ to track progress.
Example
curl -X POST https://krawly.io/api/v1/generate/ \
-H "Authorization: Bearer sai_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news.ycombinator.com",
"prompt": "Scrape all story titles, points, links, and comment counts"
}'
POST /api/v1/run/
Execute a YAML config on Krawly's servers. Provide either a config_id or yaml_content.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| config_id | UUID | One of | ID of a saved config to run |
| yaml_content | string | One of | Raw YAML config content |
Response (202 Accepted)
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"progress": 0,
"job_type": "run"
}
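For example, running a previously saved config by its ID (pass yaml_content instead if you have no saved config):
curl -X POST https://krawly.io/api/v1/run/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"config_id": "550e8400-e29b-41d4-a716-446655440000"}'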
Jobs
GET /api/v1/jobs/
List all your scraping jobs with status and progress.
GET /api/v1/jobs/{id}/status/
Get the current status and progress of a specific job. Use this to poll for completion.
Job Status Values
| Status | Description |
|---|---|
| pending | Job is queued and waiting to start |
| analyzing | AI is analyzing the target website |
| generating | AI is generating the YAML configuration |
| validating | Testing the generated config |
| running | Scraper is executing |
| completed | Job finished successfully |
| failed | Job failed with an error |
Response
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"progress": 100,
"status_message": "Scraped 250 items",
"items_found": 250,
"pages_scraped": 5,
"api_discovered": true,
"created_at": "2026-01-15T10:30:00Z",
"completed_at": "2026-01-15T10:31:12Z",
"results": { "data": [...], "row_count": 250 },
"yaml_config": { "id": "...", "yaml_content": "..." }
}
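If you call the REST API directly instead of using the SDK's wait_for_completion helper, a minimal polling loop could look like the sketch below. It uses plain requests; the poll interval and timeout are arbitrary choices, not requirements of the API.
import time
import requests

API_KEY = "sai_your_api_key"
BASE_URL = "https://krawly.io/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def wait_for_job(job_id, interval=3, timeout=300):
    """Poll /jobs/{id}/status/ until the job completes or fails."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{BASE_URL}/jobs/{job_id}/status/", headers=HEADERS)
        resp.raise_for_status()
        status = resp.json()
        print(f"[{status['progress']}%] {status['status']}")
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")

# Example: poll a job returned by POST /api/v1/generate/ or /api/v1/run/
final = wait_for_job("550e8400-e29b-41d4-a716-446655440000")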
GET /api/v1/jobs/{id}/results/
Get the scraped data from a completed job.
Response
[
{
"data": [
{ "title": "Book Title", "price": "£29.99", "rating": "4 stars" },
{ "title": "Another Book", "price": "£15.50", "rating": "5 stars" }
],
"row_count": 250
}
]
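For example, once the status endpoint reports completed, you can download the results straight to a file:
curl https://krawly.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/results/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -o results.json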
Python SDK
Installation
pip install krawly
The SDK requires Python 3.8+ and has minimal dependencies (requests and pyyaml).
Quick Start
from krawly import Krawly
# Initialize the client
client = Krawly(api_key="sai_your_api_key")
# Check your account
info = client.me()
print(f"Plan: {info.plan}")
print(f"Scrapes remaining: {info.scrapes_remaining}")
# One-liner: generate + run + get results
results = client.scrape(
url="https://books.toscrape.com",
prompt="Scrape all book titles, prices, and ratings"
)
print(f"Got {len(results.data)} items")
# Save results
import json
with open("results.json", "w") as f:
json.dump(results.data, f, indent=2)
Client Reference
Constructor
Krawly(api_key="sai_...", base_url="https://krawly.io")
| Param | Type | Default | Description |
|---|---|---|---|
| api_key | str | — | Your API key (required) |
| base_url | str | https://krawly.io | API base URL |
Account
# Get account info
info = client.me() # Returns UserInfo
YAML Generation
# Generate a YAML config from natural language
result = client.generate(url="...", prompt="...")
# Returns GenerationResult with .job_id, .yaml_content, .config
Running Scrapers
# Run by config_id
job = client.run(config_id="uuid-here")
# Run raw YAML
job = client.run_yaml(yaml_content="version: '2.0'...")
# Wait for completion with progress
status = client.wait_for_completion(job.id, on_progress=lambda s: print(s.progress))
# Get results
results = client.job_results(job.id)
One-Liner Scraping
# Generate + run + wait + return results in one call
results = client.scrape(url="...", prompt="...")
# Same, but with a YAML file
results = client.scrape_with_file("config.yaml")
Config Management
# List configs
configs = client.list_configs()
# Get a specific config
config = client.get_config("uuid-here")
# Create a new config
config = client.create_config(
name="My Scraper",
target_url="https://example.com",
yaml_content="...",
prompt="..."
)
# Delete a config
client.delete_config("uuid-here")
# Download YAML to file
client.download_config("uuid-here", "output.yaml")
# Upload YAML from file
config = client.upload_config("config.yaml")
YAML Utilities
# Load YAML from file
config = Krawly.load_yaml("config.yaml")
# Parse YAML string
config = Krawly.parse_yaml(yaml_string)
# Save dict as YAML
Krawly.save_yaml(config_dict, "output.yaml")
Models
UserInfo
| Attribute | Type | Description |
|---|---|---|
| email | str | Account email |
| plan | str | Current plan name |
| scrapes_remaining | int | Scrapes left this month |
| generations_remaining | int | Generations left this month |
| has_api_access | bool | Whether API access is enabled |
JobStatus
| Attribute | Type | Description |
|---|---|---|
| id | str | Job UUID |
| status | str | Current status |
| progress | int | Progress 0-100 |
| is_running | bool | True if job is active |
| is_completed | bool | True if job finished |
| is_failed | bool | True if job failed |
| items_found | int | Number of items scraped |
| pages_scraped | int | Number of pages visited |
ScrapingResult
| Attribute | Type | Description |
|---|---|---|
| data | list[dict] | Scraped data rows |
| row_count | int | Total number of rows |
Exceptions
| Exception | Description |
|---|---|
| KrawlyError | Base exception for all SDK errors |
| AuthenticationError | Invalid or missing API key (401) |
| QuotaExceededError | Monthly limit reached (429) |
| RateLimitError | Too many requests (429) |
Examples
Progress Tracking
from krawly import Krawly
client = Krawly(api_key="sai_...")
def on_progress(status):
print(f"[{status.progress}%] {status.status} — {status.status_message}")
result = client.generate(
url="https://news.ycombinator.com",
prompt="Scrape story titles, points, authors, and links"
)
# Wait with real-time progress
final = client.wait_for_completion(result.job_id, on_progress=on_progress)
print(f"Done! Found {final.items_found} items")
Batch Scraping
from krawly import Krawly
client = Krawly(api_key="sai_...")
urls = [
("https://books.toscrape.com", "Get all book titles and prices"),
("https://quotes.toscrape.com", "Get all quotes, authors, and tags"),
]
for url, prompt in urls:
try:
results = client.scrape(url=url, prompt=prompt)
print(f"✅ {url}: {len(results.data)} items")
except Exception as e:
print(f"❌ {url}: {e}")
Working with YAML Files
from krawly import Krawly
client = Krawly(api_key="sai_...")
# Upload a local YAML file as a config
config = client.upload_config("my_scraper.yaml")
print(f"Created config: {config.name} ({config.id})")
# Run the config on the server
results = client.scrape_with_file("my_scraper.yaml")
print(f"Got {len(results.data)} items")
Chrome Extension
Installation
The Krawly Chrome Extension lets you run scraping configs directly in your browser.
- Download the extension ZIP from your Krawly dashboard or the Krawly website
- Open chrome://extensions/ in Chrome
- Enable Developer mode (toggle in the top right)
- Click "Load unpacked" and select the extracted extension folder
- The Krawly icon (⚡) will appear in your toolbar
Login & Setup
Click the Krawly icon in your toolbar to open the extension popup.
- Email & Password: Enter your Krawly credentials and click "Sign In"
- API Key: Alternatively, paste your API key (starts with sai_) and click "Connect with API Key"
Once logged in, the extension will automatically sync your configs from Krawly and show your account info.
Usage
The extension has four tabs:
▶ Run Tab
- Shows the current page's title and URL
- Paste or import a YAML config into the editor
- Click Run Full to execute the scraper on the current page
- Click Quick Test to scrape the first page only (limited to 20 items)
- Watch real-time progress with the progress bar
- View results in a table and download as JSON, CSV, or XLSX
📥 Configs Tab
- Shows all your configs synced from krawly.io
- Click any config to load it into the Run tab
- Import YAML manually by pasting into the text area
💾 Local Tab
- Locally saved configs (stored in browser storage)
- Configs imported from server or pasted manually are saved here
- Delete individual configs with the trash button
👤 Account Tab
- Shows your email, current plan, and API key
- Copy your API key with one click
- Clear local data or sign out
Exporting Data
After running a scraper, you can export results in three formats:
| Format | Description |
|---|---|
| JSON | Standard JSON array with pretty-printing |
| CSV | Comma-separated values with UTF-8 BOM for Excel compatibility |
| XLSX | Native Excel spreadsheet format |
You can also click 📋 Copy to copy all results as JSON to your clipboard.
YAML Configuration
Overview
Krawly uses YAML-based configurations to define scraping rules. Configs are portable, version-controlled, and can be generated automatically by AI or written manually.
Use the client.generate() method (or POST /api/v1/generate/) to have AI create configs for you.
Basic Structure
version: "2.0"
name: "My Scraper"
description: "Scrapes product listings"
target:
url: "https://example.com/products"
engine: "heavy" # "lightweight", "heavy", or "auto"
browser:
wait_selector: ".product" # Wait for this element before scraping
timeout: 15000
extract:
container: ".product-card" # CSS selector for each item
type: "list" # "list", "single", or "table"
fields:
- name: "title"
selector: "h2.title"
type: "text"
- name: "price"
selector: ".price"
type: "text"
transform: ["strip_currency", "to_number"]
- name: "url"
selector: "a"
type: "link"
- name: "image"
selector: "img"
type: "image_url"
pagination:
type: "click"
selector: "a.next-page"
max_pages: 5
wait_after: 2000
output:
format: "json"
deduplicate: true
deduplicate_fields: ["title", "url"]
limit: 500
Field Types
| Type | Description | Extra Params |
|---|---|---|
| text | Extract text content from element | — |
| attribute | Extract an HTML attribute | attribute: "href" |
| html | Extract inner HTML | — |
| link | Extract absolute URL from href | — |
| image_url | Extract image src or data-src | — |
| count | Count matching elements | — |
| exists | Check if element exists (true/false) | — |
| regex | Extract text via regex pattern | pattern: "\\d+" |
| eval | Execute JavaScript expression | script: "el.innerText" |
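An illustrative snippet combining several field types; the selectors and the data-sku attribute are placeholders, not taken from a real site:
fields:
  - name: "sku"
    selector: ".product-card"
    type: "attribute"
    attribute: "data-sku"
  - name: "review_count"
    selector: ".review-count"
    type: "regex"
    pattern: "\\d+"
  - name: "has_discount"
    selector: ".badge-sale"
    type: "exists"
  - name: "gallery_size"
    selector: ".gallery img"
    type: "count"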
Transforms
Apply data transformations to extracted values:
| Transform | Description |
|---|---|
| strip | Trim whitespace |
| lowercase | Convert to lowercase |
| uppercase | Convert to uppercase |
| strip_currency | Remove currency symbols (£$€¥₺) |
| to_number | Parse as number |
| extract_number | Extract first number from text |
| strip_html | Remove HTML tags |
| trim_whitespace | Collapse multiple spaces |
| regex:PATTERN | Extract via regex |
| replace:OLD:NEW | Replace text |
| split:DELIM:INDEX | Split and take index |
| slice:START:END | Substring slice |
| template:TEXT{value}TEXT | Template string |
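For example, a chained transform that cleans a price string, and a parameterized transform that reformats an extracted number (the selectors here are placeholders):
fields:
  - name: "price"
    selector: ".price"
    type: "text"
    transform: ["strip", "strip_currency", "to_number"]
  - name: "product_code"
    selector: ".sku"
    type: "text"
    transform: ["extract_number", "template:SKU-{value}"]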
Pagination
Krawly supports three pagination strategies:
Click Pagination
pagination:
type: "click"
selector: "button.load-more" # CSS selector for next button
max_pages: 10
wait_after: 2000 # ms to wait after click
URL Pagination
pagination:
type: "url"
url_pattern: "https://example.com/page/{page}"
max_pages: 10
wait_after: 1500
Infinite Scroll
pagination:
type: "scroll_infinite"
max_pages: 5
wait_after: 3000
Stop Conditions
pagination:
type: "click"
selector: ".next"
max_pages: 20
stop_condition:
max_items: 500 # Stop after collecting this many items
Actions
Pre-scraping actions that run before data extraction:
actions:
- type: "click"
selector: "#accept-cookies"
- type: "scroll"
direction: "down"
amount: "full"
- type: "type"
selector: "#search-input"
value: "laptop"
- type: "select"
selector: "#sort-by"
value: "price_low"
- type: "hover"
selector: ".dropdown-trigger"
- type: "keyboard"
key: "Enter"
- type: "script"
script: "window.scrollTo(0, 500)"
Complete Examples
E-Commerce Product Scraper
version: "2.0"
name: "Product Scraper"
description: "Scrapes product listings with prices and images"
target:
url: "https://shop.example.com/category"
engine: "heavy"
browser:
wait_selector: ".product-grid"
extract:
container: ".product-card"
type: "list"
fields:
- name: "title"
selector: "h3.product-name"
type: "text"
- name: "price"
selector: ".price-current"
type: "text"
transform: ["strip_currency", "to_number"]
- name: "original_price"
selector: ".price-original"
type: "text"
transform: ["strip_currency", "to_number"]
default: null
- name: "rating"
selector: ".star-rating"
type: "attribute"
attribute: "data-rating"
- name: "image_url"
selector: "img.product-image"
type: "image_url"
- name: "product_url"
selector: "a.product-link"
type: "link"
detail_fields:
fields:
- name: "description"
selector: ".product-description"
type: "text"
- name: "sku"
selector: "#product-sku"
type: "text"
- name: "in_stock"
selector: ".stock-status.available"
type: "exists"
pagination:
type: "click"
selector: ".pagination .next"
max_pages: 10
wait_after: 2000
output:
format: "json"
deduplicate: true
deduplicate_fields: ["product_url"]
News Article Scraper
version: "2.0"
name: "News Scraper"
description: "Scrapes news article headlines"
target:
url: "https://news.ycombinator.com"
engine: "lightweight"
extract:
container: "tr.athing"
type: "list"
fields:
- name: "rank"
selector: ".rank"
type: "text"
transform: ["replace:.:"]
- name: "title"
selector: ".titleline > a"
type: "text"
- name: "url"
selector: ".titleline > a"
type: "link"
- name: "site"
selector: ".sitestr"
type: "text"
default: "news.ycombinator.com"
pagination:
type: "click"
selector: "a.morelink"
max_pages: 3
Plans & Limits
| Feature | Free | Starter ($15/mo) | Pro ($29/mo) |
|---|---|---|---|
| Monthly Scrapes | 3 | 20 | 100 |
| API Access | ✅ | ✅ | ✅ |
| Chrome Extension | ✅ | ✅ | ✅ |
| Python SDK | ✅ | ✅ | ✅ |
| Server Execution | — | — | ✅ |
| Priority Support | — | ✅ | ✅ |
Error Handling
The API uses standard HTTP status codes:
| Code | Meaning | Action |
|---|---|---|
| 200 | Success | — |
| 201 | Created | Resource was created |
| 202 | Accepted | Async job was queued |
| 400 | Bad Request | Check your request data |
| 401 | Unauthorized | Check your API key |
| 404 | Not Found | Resource doesn't exist |
| 429 | Too Many Requests | Rate limited or quota exceeded |
| 500 | Server Error | Contact support |
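If you call the REST API directly without the SDK, you can branch on these status codes yourself; a minimal sketch using plain requests:
import requests

resp = requests.post(
    "https://krawly.io/api/v1/generate/",
    headers={"Authorization": "Bearer sai_your_api_key"},
    json={"url": "https://books.toscrape.com", "prompt": "Scrape all book titles"},
)

if resp.status_code == 202:
    job = resp.json()          # async job queued; poll /jobs/{id}/status/
elif resp.status_code == 401:
    raise SystemExit("Check your API key")
elif resp.status_code == 429:
    raise SystemExit("Rate limited or monthly quota exceeded")
else:
    resp.raise_for_status()    # raises for 400/404/500, etc.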
SDK Error Handling
from krawly import Krawly, KrawlyError, AuthenticationError, QuotaExceededError
client = Krawly(api_key="sai_...")
try:
results = client.scrape(url="https://example.com", prompt="Get products")
except AuthenticationError:
print("Invalid API key. Check your credentials.")
except QuotaExceededError as e:
print(f"Monthly limit reached: {e}")
except KrawlyError as e:
print(f"Krawly error: {e}")
© 2026 Krawly. All rights reserved. — krawly.io · support@krawly.io