# Scrape

**Scrape Content of a Given URL**

**POST /env/scrape**

This endpoint scrapes the content of a given URL within the session’s environment. If no session ID is provided, a new session is created. If a session ID is provided without a URL, the current page of that session is scraped.
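The two documented cases differ only in which fields the request body carries. The following is a minimal Python sketch of those bodies; `build_scrape_payload` is a hypothetical helper, not part of any official client, and it assumes that omitting a field is treated the same as sending `null`.

```python
import json
from typing import Optional


def build_scrape_payload(session_id: Optional[str] = None,
                         url: Optional[str] = None) -> dict:
    """Return a request body containing only the fields that were supplied."""
    payload = {}
    if session_id is not None:
        payload["session_id"] = session_id  # reuse an existing session
    if url is not None:
        payload["url"] = url  # page to scrape
    return payload


# No session ID: per the description, a new session is created.
new_session = build_scrape_payload(url="https://example.com")

# Session ID without a URL: the session's current page is scraped.
current_page = build_scrape_payload(session_id="1234567890abcdef")

print(json.dumps(new_session))
print(json.dumps(current_page))
```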

**Request Example:**

```bash
curl --location \
--request POST 'https://api.cros.one/env/scrape' \
--header 'Authorization: Bearer your-api-key' \
--header 'Content-Type: application/json' \
--data '{
  "session_id": "1234567890abcdef",
  "url": "https://example.com",
  "only_main_content": true,
  "scrape_images": false,
  "screenshot": true,
  "session_timeout_minutes": 10
}'
```

**Response Example:**

```json
{
  "metadata": {
    "url": "https://example.com",
    "page_title": "Example Website",
    "timestamp": "2025-02-06T14:00:00.000Z"
  },
  "session": {
    "session_id": "1234567890abcdef",
    "status": "active",
    "last_accessed_at": "2025-02-06T14:05:00.000Z",
    "timeout_minutes": 10
  },
  "data": {
    "main_content": "<div><h1>Example Page</h1><p>This is the main content of the page.</p></div>"
  },
  "screenshot": "base64_encoded_image_data_here",
  "space": {
    "description": "Available actions on the current page",
    "actions": [
      {
        "id": "I1",
        "description": "Click on the login button",
        "category": "User Interaction"
      }
    ]
  }
}
```

**Fields in the Request Body:**

* **session\_id** (string | null): The ID of the session. If not provided, a new session will be created.
* **url** (string | null): The URL to scrape. If not provided, the current page of the session is scraped.
* **keep\_alive** (boolean, default: `false`): If true, the session will not be closed after the operation is completed.
* **max\_nb\_actions** (integer, default: `100`): The maximum number of actions to list. The listing will stop after this number is reached.
* **min\_nb\_actions** (integer | null): The minimum number of actions to list before stopping. If not provided, the listing will continue until the maximum number of actions is reached.
* **only\_main\_content** (boolean, default: `true`): If true, only the main content of the page will be scraped, excluding elements like navbars, footers, etc.
* **scrape\_images** (boolean, default: `false`): If true, images will be scraped from the page.
* **screenshot** (boolean | null): Whether to include a screenshot in the response.
* **session\_timeout\_minutes** (integer, default: `5`): Session timeout in minutes. Cannot exceed the global timeout. Required range: `0 < x < 30`.
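As a rough illustration of the fields above, the sketch below checks `session_timeout_minutes` against the documented range on the client side and then builds the POST request using only the Python standard library. `build_scrape_request` and `API_URL` are hypothetical names introduced here; the endpoint and headers are copied from the request example, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.cros.one/env/scrape"  # endpoint from the request example


def build_scrape_request(api_key: str, *, session_timeout_minutes: int = 5,
                         **fields) -> urllib.request.Request:
    """Validate the timeout, then build (but do not send) the POST request."""
    # Documented range is 0 < x < 30, exclusive on both ends.
    if not 0 < session_timeout_minutes < 30:
        raise ValueError("session_timeout_minutes must satisfy 0 < x < 30")
    body = {"session_timeout_minutes": session_timeout_minutes, **fields}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scrape_request(
    "your-api-key",
    url="https://example.com",
    only_main_content=True,
    screenshot=True,
    session_timeout_minutes=10,
)
# Sending is left out here; urllib.request.urlopen(request) would perform the call.
```

Validating locally only catches the documented range early; the note that the value cannot exceed the global timeout still has to be enforced by the service.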

**Fields in the Response:**

* **metadata** (object): Metadata of the current page (e.g., URL, page title, and timestamp).
* **session** (object): Browser session information, including the session ID, status, last accessed time, and timeout.
* **data** (object | null): Data extracted from the page, such as the scraped main content.
* **screenshot** (file | null): A base64-encoded screenshot of the current page, if requested.
* **space** (object | null): Available actions on the current page, such as clickable elements or form submissions.
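A response with these fields could be handled along the following lines. This is a sketch under stated assumptions: `decode_screenshot` and `list_actions` are hypothetical helpers, the screenshot is assumed to be plain base64 text with no data-URI prefix, and the sample response is stand-in data shaped like the response example rather than real output.

```python
import base64
import json
from typing import List, Optional


def decode_screenshot(response: dict) -> Optional[bytes]:
    """Return the screenshot as raw bytes, or None if it was not included."""
    encoded = response.get("screenshot")
    if encoded is None:
        return None
    return base64.b64decode(encoded)


def list_actions(response: dict) -> List[str]:
    """Return 'id: description' strings for each action in `space`, if any."""
    space = response.get("space") or {}  # `space` may be null
    return [f"{a['id']}: {a['description']}" for a in space.get("actions", [])]


# Stand-in response; the screenshot value is placeholder data, not an image.
sample = json.loads("""
{
  "session": {"session_id": "1234567890abcdef", "status": "active"},
  "data": {"main_content": "<div><h1>Example Page</h1></div>"},
  "screenshot": "aGVsbG8=",
  "space": {"actions": [{"id": "I1",
                         "description": "Click on the login button",
                         "category": "User Interaction"}]}
}
""")

print(sample["session"]["session_id"])  # keep this ID to reuse the session
print(decode_screenshot(sample))
print(list_actions(sample))
```

Since `data`, `screenshot`, and `space` are all nullable, both helpers treat a missing or null field as "nothing to return" instead of raising.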
