Krawly Documentation
Krawly is an AI-powered web scraping platform. Describe what you want to extract, and Krawly generates a YAML configuration that handles pagination, detail pages, and data transformation — no code required.
- AI-Powered: Describe your target in plain text. Krawly's AI generates the scraping config automatically.
- Python SDK: pip install krawly — full API client with progress tracking and local execution.
- Chrome Extension: Run scrapers directly in your browser. Log in, import configs, and export results.
- REST API: Full API access on all plans. Generate configs, run jobs, and retrieve results programmatically.
Quickstart
Get started with Krawly in under 2 minutes.
1. Create an account
Sign up at krawly.io/accounts/register. The free plan includes 3 scraping jobs per month with full API access.
2. Get your API key
Go to your profile page and copy your API key (starts with sai_).
3. Generate a scraping config
Use the playground at krawly.io/dashboard or the API:
curl -X POST https://krawly.io/api/v1/generate/ \
-H "Authorization: Bearer sai_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://books.toscrape.com",
"prompt": "Scrape all book titles, prices, ratings, and availability"
}'
4. Or use the Python SDK
pip install krawly
from krawly import Krawly
client = Krawly(api_key="sai_your_api_key")
# Generate a config from natural language
result = client.generate(
url="https://books.toscrape.com",
prompt="Scrape all book titles, prices, and ratings"
)
# Or do everything in one call: generate, run, and return results
results = client.scrape(
url="https://books.toscrape.com",
prompt="Scrape all book titles, prices, and ratings"
)
print(f"Scraped {len(results.data)} items")
Authentication
All API requests (except login) require authentication using your API key in the Authorization header:
Authorization: Bearer sai_your_api_key
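For example, an authenticated call to the account endpoint documented below looks like this:
curl https://krawly.io/api/v1/me/ \
  -H "Authorization: Bearer sai_your_api_key"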
You can also authenticate via the login endpoint using your email and password:
POST /api/v1/auth/login/
Content-Type: application/json
{
"email": "you@example.com",
"password": "your_password"
}
# Response:
{
"api_key": "sai_abc123...",
"email": "you@example.com",
"plan": "starter"
}
API Reference
The Krawly API is a RESTful API with JSON request/response format. Base URL: https://krawly.io/api/v1/
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/auth/login/ | Login with email/password, returns API key |
| GET | /api/v1/me/ | Get account info and subscription |
| GET | /api/v1/configs/ | List your YAML configs |
| POST | /api/v1/configs/ | Create a new config |
| GET | /api/v1/configs/{id}/ | Get a specific config |
| DELETE | /api/v1/configs/{id}/ | Delete a config |
| POST | /api/v1/generate/ | Generate YAML from URL + prompt |
| POST | /api/v1/run/ | Run a YAML config (server-side) |
| GET | /api/v1/jobs/ | List your scraping jobs |
| GET | /api/v1/jobs/{id}/status/ | Get job status and progress |
| GET | /api/v1/jobs/{id}/results/ | Get job results |
POST /api/v1/auth/login/
Authenticate with email and password to receive your API key. No authentication header required.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Yes | Your account email |
| password | string | Yes | Your account password |
Response (200)
{
"api_key": "sai_a1b2c3d4e5f6...",
"email": "user@example.com",
"plan": "starter"
}
Error Response (401)
{
"error": "Invalid email or password."
}
GET /api/v1/me/
Get your account information and subscription details.
Response
{
"user": {
"id": 1,
"email": "user@example.com",
"first_name": "John",
"last_name": "Doe"
},
"subscription": {
"plan": "starter",
"yaml_generations_remaining": 15,
"monthly_scrapes_remaining": 20,
"has_api_access": true,
"has_server_execution": false
}
}
Configs CRUD
GET /api/v1/configs/
List all your YAML configurations.
{
"count": 3,
"results": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Books Scraper",
"target_url": "https://books.toscrape.com",
"created_at": "2026-01-15T10:30:00Z"
}
]
}
POST /api/v1/configs/
Create a new YAML configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Config name |
| target_url | string | Yes | Target website URL |
| yaml_content | string | Yes | Full YAML config content |
| prompt | string | No | Description of what to scrape |
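A sketch of a create request using these fields — the yaml_content shown here is abbreviated; send your full YAML as an escaped JSON string:
curl -X POST https://krawly.io/api/v1/configs/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "Books Scraper",
  "target_url": "https://books.toscrape.com",
  "yaml_content": "version: \"2.0\"\nname: \"Books Scraper\"\n...",
  "prompt": "Scrape all book titles and prices"
  }'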
GET /api/v1/configs/{id}/
Get a specific configuration including the full YAML content.
DELETE /api/v1/configs/{id}/
Delete a configuration.
POST /api/v1/generate/
Generate a YAML scraping configuration using AI. The AI agent analyzes the target website, discovers API endpoints, handles JavaScript-rendered pages, and produces an optimized config.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target website URL |
| prompt | string | Yes | Natural language description of what to scrape |
Response (202 Accepted)
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"progress": 0,
"job_type": "generate",
"target_url": "https://books.toscrape.com",
"created_at": "2026-01-15T10:30:00Z"
}
Poll GET /api/v1/jobs/{id}/status/ to track progress.
Example
curl -X POST https://krawly.io/api/v1/generate/ \
-H "Authorization: Bearer sai_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news.ycombinator.com",
"prompt": "Scrape all story titles, points, links, and comment counts"
}'
POST /api/v1/run/
Execute a YAML config on Krawly's servers. Provide either a config_id or yaml_content.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| config_id | UUID | One of | ID of a saved config to run |
| yaml_content | string | One of | Raw YAML config content |
Response (202 Accepted)
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"progress": 0,
"job_type": "run"
}
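For example, running a previously saved config by its ID (pass yaml_content instead if you have no saved config):
curl -X POST https://krawly.io/api/v1/run/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"config_id": "550e8400-e29b-41d4-a716-446655440000"}'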
Jobs
GET /api/v1/jobs/
List all your scraping jobs with status and progress.
GET /api/v1/jobs/{id}/status/
Get the current status and progress of a specific job. Use this to poll for completion.
Job Status Values
| Status | Description |
|---|---|
| pending | Job is queued and waiting to start |
| analyzing | AI is analyzing the target website |
| generating | AI is generating the YAML configuration |
| validating | Testing the generated config |
| running | Scraper is executing |
| completed | Job finished successfully |
| failed | Job failed with an error |
Response
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"progress": 100,
"status_message": "Scraped 250 items",
"items_found": 250,
"pages_scraped": 5,
"api_discovered": true,
"created_at": "2026-01-15T10:30:00Z",
"completed_at": "2026-01-15T10:31:12Z",
"results": { "data": [...], "row_count": 250 },
"yaml_config": { "id": "...", "yaml_content": "..." }
}
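If you call the REST API directly instead of using the SDK's wait_for_completion helper, a minimal polling loop could look like the sketch below. It uses plain requests; the poll interval and timeout are arbitrary choices, not requirements of the API.
import time
import requests

API_KEY = "sai_your_api_key"
BASE_URL = "https://krawly.io/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def wait_for_job(job_id, interval=3, timeout=300):
    """Poll /jobs/{id}/status/ until the job completes or fails."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{BASE_URL}/jobs/{job_id}/status/", headers=HEADERS)
        resp.raise_for_status()
        status = resp.json()
        print(f"[{status['progress']}%] {status['status']}")
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")

# Example: poll a job returned by POST /api/v1/generate/ or /api/v1/run/
final = wait_for_job("550e8400-e29b-41d4-a716-446655440000")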
GET /api/v1/jobs/{id}/results/
Get the scraped data from a completed job.
Response
[
{
"data": [
{ "title": "Book Title", "price": "£29.99", "rating": "4 stars" },
{ "title": "Another Book", "price": "£15.50", "rating": "5 stars" }
],
"row_count": 250
}
]
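For example, once the status endpoint reports completed, you can download the results straight to a file:
curl https://krawly.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/results/ \
  -H "Authorization: Bearer sai_your_api_key" \
  -o results.json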
Python SDK
Installation
pip install krawly
The SDK requires Python 3.8+ and has minimal dependencies (requests and pyyaml).
Quick Start
from krawly import Krawly
# Initialize the client
client = Krawly(api_key="sai_your_api_key")
# Check your account
info = client.me()
print(f"Plan: {info.plan}")
print(f"Scrapes remaining: {info.scrapes_remaining}")
# One-liner: generate + run + get results
results = client.scrape(
url="https://books.toscrape.com",
prompt="Scrape all book titles, prices, and ratings"
)
print(f"Got {len(results.data)} items")
# Save results
import json
with open("results.json", "w") as f:
json.dump(results.data, f, indent=2)
Client Reference
Constructor
Krawly(api_key="sai_...", base_url="https://krawly.io")
| Param | Type | Default | Description |
|---|---|---|---|
| api_key | str | — | Your API key (required) |
| base_url | str | https://krawly.io | API base URL |
Account
# Get account info
info = client.me() # Returns UserInfo
YAML Generation
# Generate a YAML config from natural language
result = client.generate(url="...", prompt="...")
# Returns GenerationResult with .job_id, .yaml_content, .config
Running Scrapers
# Run by config_id
job = client.run(config_id="uuid-here")
# Run raw YAML
job = client.run_yaml(yaml_content="version: '2.0'...")
# Wait for completion with progress
status = client.wait_for_completion(job.id, on_progress=lambda s: print(s.progress))
# Get results
results = client.job_results(job.id)
One-Liner Scraping
# Generate + run + wait + return results in one call
results = client.scrape(url="...", prompt="...")
# Same, but with a YAML file
results = client.scrape_with_file("config.yaml")
Config Management
# List configs
configs = client.list_configs()
# Get a specific config
config = client.get_config("uuid-here")
# Create a new config
config = client.create_config(
name="My Scraper",
target_url="https://example.com",
yaml_content="...",
prompt="..."
)
# Delete a config
client.delete_config("uuid-here")
# Download YAML to file
client.download_config("uuid-here", "output.yaml")
# Upload YAML from file
config = client.upload_config("config.yaml")
YAML Utilities
# Load YAML from file
config = Krawly.load_yaml("config.yaml")
# Parse YAML string
config = Krawly.parse_yaml(yaml_string)
# Save dict as YAML
Krawly.save_yaml(config_dict, "output.yaml")
Models
UserInfo
| Attribute | Type | Description |
|---|---|---|
| email | str | Account email |
| plan | str | Current plan name |
| scrapes_remaining | int | Scrapes left this month |
| generations_remaining | int | Generations left this month |
| has_api_access | bool | Whether API access is enabled |
JobStatus
| Attribute | Type | Description |
|---|---|---|
| id | str | Job UUID |
| status | str | Current status |
| progress | int | Progress 0-100 |
| is_running | bool | True if job is active |
| is_completed | bool | True if job finished |
| is_failed | bool | True if job failed |
| items_found | int | Number of items scraped |
| pages_scraped | int | Number of pages visited |
ScrapingResult
| Attribute | Type | Description |
|---|---|---|
| data | list[dict] | Scraped data rows |
| row_count | int | Total number of rows |
Exceptions
| Exception | Description |
|---|---|
| KrawlyError | Base exception for all SDK errors |
| AuthenticationError | Invalid or missing API key (401) |
| QuotaExceededError | Monthly limit reached (429) |
| RateLimitError | Too many requests (429) |
Examples
Progress Tracking
from krawly import Krawly
client = Krawly(api_key="sai_...")
def on_progress(status):
print(f"[{status.progress}%] {status.status} — {status.status_message}")
result = client.generate(
url="https://news.ycombinator.com",
prompt="Scrape story titles, points, authors, and links"
)
# Wait with real-time progress
final = client.wait_for_completion(result.job_id, on_progress=on_progress)
print(f"Done! Found {final.items_found} items")
Batch Scraping
from krawly import Krawly
client = Krawly(api_key="sai_...")
urls = [
("https://books.toscrape.com", "Get all book titles and prices"),
("https://quotes.toscrape.com", "Get all quotes, authors, and tags"),
]
for url, prompt in urls:
try:
results = client.scrape(url=url, prompt=prompt)
print(f"✅ {url}: {len(results.data)} items")
except Exception as e:
print(f"❌ {url}: {e}")
Working with YAML Files
from krawly import Krawly
client = Krawly(api_key="sai_...")
# Upload a local YAML file as a config
config = client.upload_config("my_scraper.yaml")
print(f"Created config: {config.name} ({config.id})")
# Run the config on the server
results = client.scrape_with_file("my_scraper.yaml")
print(f"Got {len(results.data)} items")
Chrome Extension
Installation
The Krawly Chrome Extension lets you run scraping configs directly in your browser.
- Download the extension ZIP from your Krawly dashboard or the Krawly website
- Open chrome://extensions/ in Chrome
- Enable Developer mode (toggle in the top right)
- Click "Load unpacked" and select the extracted extension folder
- The Krawly icon (⚡) will appear in your toolbar
Login & Setup
Click the Krawly icon in your toolbar to open the extension popup.
- Email & Password: Enter your Krawly credentials and click "Sign In"
- API Key: Alternatively, paste your API key (starts with sai_) and click "Connect with API Key"
Once logged in, the extension will automatically sync your configs from Krawly and show your account info.
Usage
The extension has four tabs:
▶ Run Tab
- Shows the current page's title and URL
- Paste or import a YAML config into the editor
- Click Run Full to execute the scraper on the current page
- Click Quick Test to scrape the first page only (limited to 20 items)
- Watch real-time progress with the progress bar
- View results in a table and download as JSON, CSV, or XLSX
📥 Configs Tab
- Shows all your configs synced from krawly.io
- Click any config to load it into the Run tab
- Import YAML manually by pasting into the text area
💾 Local Tab
- Locally saved configs (stored in browser storage)
- Configs imported from server or pasted manually are saved here
- Delete individual configs with the trash button
👤 Account Tab
- Shows your email, current plan, and API key
- Copy your API key with one click
- Clear local data or sign out
Exporting Data
After running a scraper, you can export results in three formats:
| Format | Description |
|---|---|
| JSON | Standard JSON array with pretty-printing |
| CSV | Comma-separated values with UTF-8 BOM for Excel compatibility |
| XLSX | Native Excel spreadsheet format |
You can also click 📋 Copy to copy all results as JSON to your clipboard.
YAML Configuration
Overview
Krawly uses YAML-based configurations to define scraping rules. Configs are portable, version-controlled, and can be generated automatically by AI or written manually.
Use the client.generate() method (or POST /api/v1/generate/) to have AI create configs for you.
Basic Structure
version: "2.0"
name: "My Scraper"
description: "Scrapes product listings"
target:
url: "https://example.com/products"
engine: "heavy" # "lightweight", "heavy", or "auto"
browser:
wait_selector: ".product" # Wait for this element before scraping
timeout: 15000
extract:
container: ".product-card" # CSS selector for each item
type: "list" # "list", "single", or "table"
fields:
- name: "title"
selector: "h2.title"
type: "text"
- name: "price"
selector: ".price"
type: "text"
transform: ["strip_currency", "to_number"]
- name: "url"
selector: "a"
type: "link"
- name: "image"
selector: "img"
type: "image_url"
pagination:
type: "click"
selector: "a.next-page"
max_pages: 5
wait_after: 2000
output:
format: "json"
deduplicate: true
deduplicate_fields: ["title", "url"]
limit: 500
Field Types
| Type | Description | Extra Params |
|---|---|---|
| text | Extract text content from element | — |
| attribute | Extract an HTML attribute | attribute: "href" |
| html | Extract inner HTML | — |
| link | Extract absolute URL from href | — |
| image_url | Extract image src or data-src | — |
| count | Count matching elements | — |
| exists | Check if element exists (true/false) | — |
| regex | Extract text via regex pattern | pattern: "\\d+" |
| eval | Execute JavaScript expression | script: "el.innerText" |
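An illustrative snippet combining several field types; the selectors and the data-sku attribute are placeholders, not taken from a real site:
fields:
  - name: "sku"
    selector: ".product-card"
    type: "attribute"
    attribute: "data-sku"
  - name: "review_count"
    selector: ".review-count"
    type: "regex"
    pattern: "\\d+"
  - name: "has_discount"
    selector: ".badge-sale"
    type: "exists"
  - name: "gallery_size"
    selector: ".gallery img"
    type: "count"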
Transforms
Apply data transformations to extracted values:
| Transform | Description |
|---|---|
| strip | Trim whitespace |
| lowercase | Convert to lowercase |
| uppercase | Convert to uppercase |
| strip_currency | Remove currency symbols (£$€¥₺) |
| to_number | Parse as number |
| extract_number | Extract first number from text |
| strip_html | Remove HTML tags |
| trim_whitespace | Collapse multiple spaces |
| regex:PATTERN | Extract via regex |
| replace:OLD:NEW | Replace text |
| split:DELIM:INDEX | Split and take index |
| slice:START:END | Substring slice |
| template:TEXT{value}TEXT | Template string |
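For example, a chained transform that cleans a price string, and a parameterized transform that reformats an extracted number (the selectors here are placeholders):
fields:
  - name: "price"
    selector: ".price"
    type: "text"
    transform: ["strip", "strip_currency", "to_number"]
  - name: "product_code"
    selector: ".sku"
    type: "text"
    transform: ["extract_number", "template:SKU-{value}"]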
Pagination
Krawly supports three pagination strategies:
Click Pagination
pagination:
type: "click"
selector: "button.load-more" # CSS selector for next button
max_pages: 10
wait_after: 2000 # ms to wait after click
URL Pagination
pagination:
type: "url"
url_pattern: "https://example.com/page/{page}"
max_pages: 10
wait_after: 1500
Infinite Scroll
pagination:
type: "scroll_infinite"
max_pages: 5
wait_after: 3000
Stop Conditions
pagination:
type: "click"
selector: ".next"
max_pages: 20
stop_condition:
max_items: 500 # Stop after collecting this many items
Actions
Pre-scraping actions that run before data extraction:
actions:
- type: "click"
selector: "#accept-cookies"
- type: "scroll"
direction: "down"
amount: "full"
- type: "type"
selector: "#search-input"
value: "laptop"
- type: "select"
selector: "#sort-by"
value: "price_low"
- type: "hover"
selector: ".dropdown-trigger"
- type: "keyboard"
key: "Enter"
- type: "script"
script: "window.scrollTo(0, 500)"
Complete Examples
E-Commerce Product Scraper
version: "2.0"
name: "Product Scraper"
description: "Scrapes product listings with prices and images"
target:
url: "https://shop.example.com/category"
engine: "heavy"
browser:
wait_selector: ".product-grid"
extract:
container: ".product-card"
type: "list"
fields:
- name: "title"
selector: "h3.product-name"
type: "text"
- name: "price"
selector: ".price-current"
type: "text"
transform: ["strip_currency", "to_number"]
- name: "original_price"
selector: ".price-original"
type: "text"
transform: ["strip_currency", "to_number"]
default: null
- name: "rating"
selector: ".star-rating"
type: "attribute"
attribute: "data-rating"
- name: "image_url"
selector: "img.product-image"
type: "image_url"
- name: "product_url"
selector: "a.product-link"
type: "link"
detail_fields:
fields:
- name: "description"
selector: ".product-description"
type: "text"
- name: "sku"
selector: "#product-sku"
type: "text"
- name: "in_stock"
selector: ".stock-status.available"
type: "exists"
pagination:
type: "click"
selector: ".pagination .next"
max_pages: 10
wait_after: 2000
output:
format: "json"
deduplicate: true
deduplicate_fields: ["product_url"]
News Article Scraper
version: "2.0"
name: "News Scraper"
description: "Scrapes news article headlines"
target:
url: "https://news.ycombinator.com"
engine: "lightweight"
extract:
container: "tr.athing"
type: "list"
fields:
- name: "rank"
selector: ".rank"
type: "text"
transform: ["replace:.:"]
- name: "title"
selector: ".titleline > a"
type: "text"
- name: "url"
selector: ".titleline > a"
type: "link"
- name: "site"
selector: ".sitestr"
type: "text"
default: "news.ycombinator.com"
pagination:
type: "click"
selector: "a.morelink"
max_pages: 3
Plans & Limits
| Feature | Free | Starter ($15/mo) | Pro ($29/mo) |
|---|---|---|---|
| Monthly Scrapes | 3 | 20 | 100 |
| API Access | ✅ | ✅ | ✅ |
| Chrome Extension | ✅ | ✅ | ✅ |
| Python SDK | ✅ | ✅ | ✅ |
| Server Execution | — | — | ✅ |
| Priority Support | — | ✅ | ✅ |
Error Handling
The API uses standard HTTP status codes:
| Code | Meaning | Action |
|---|---|---|
| 200 | Success | — |
| 201 | Created | Resource was created |
| 202 | Accepted | Async job was queued |
| 400 | Bad Request | Check your request data |
| 401 | Unauthorized | Check your API key |
| 404 | Not Found | Resource doesn't exist |
| 429 | Too Many Requests | Rate limited or quota exceeded |
| 500 | Server Error | Contact support |
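If you call the REST API directly without the SDK, you can branch on these status codes yourself; a minimal sketch using plain requests:
import requests

resp = requests.post(
    "https://krawly.io/api/v1/generate/",
    headers={"Authorization": "Bearer sai_your_api_key"},
    json={"url": "https://books.toscrape.com", "prompt": "Scrape all book titles"},
)

if resp.status_code == 202:
    job = resp.json()          # async job queued; poll /jobs/{id}/status/
elif resp.status_code == 401:
    raise SystemExit("Check your API key")
elif resp.status_code == 429:
    raise SystemExit("Rate limited or monthly quota exceeded")
else:
    resp.raise_for_status()    # raises for 400/404/500, etc.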
SDK Error Handling
from krawly import Krawly, KrawlyError, AuthenticationError, QuotaExceededError
client = Krawly(api_key="sai_...")
try:
results = client.scrape(url="https://example.com", prompt="Get products")
except AuthenticationError:
print("Invalid API key. Check your credentials.")
except QuotaExceededError as e:
print(f"Monthly limit reached: {e}")
except KrawlyError as e:
print(f"Krawly error: {e}")
© 2026 Krawly. All rights reserved. — krawly.io · support@krawly.io