Scrape
Scrape Content of a Given URL
POST /env/scrape
This endpoint scrapes the content of a given URL within the session’s environment. If a session ID is not provided, a new session is created. If a session ID is provided and no URL is provided, the current page in the session is scraped.
Request Example:
Response Example:
Fields in the Request Body:
session_id (string | null): The ID of the session. If not provided, a new session will be created.
url (string | null): The URL to scrape. If not provided, uses the current page URL.
keep_alive (boolean, default: false): If true, the session will not be closed after the operation is completed.
max_nb_actions (integer, default: 100): The maximum number of actions to list. The listing will stop after this number is reached.
min_nb_actions (integer | null): The minimum number of actions to list before stopping. If not provided, the listing will continue until the maximum number of actions is reached.
only_main_content (boolean, default: true): If true, only the main content of the page will be scraped, excluding elements like navbars, footers, etc.
scrape_images (boolean, default: false): If true, images will be scraped from the page.
screenshot (boolean | null): Whether to include a screenshot in the response.
session_timeout_minutes (integer, default: 5): Session timeout in minutes. Cannot exceed the global timeout. Required range: 0 < x < 30.
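As a rough illustration of the request body, here is a minimal Python sketch that assembles the documented fields with their documented defaults and checks the stated range for session_timeout_minutes before sending. The helper names build_scrape_payload and post_scrape, the base_url argument, and the JSON content type are assumptions made for this example; this page does not specify the host or any authentication headers, so add whatever your deployment requires.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_scrape_payload(
    url: Optional[str] = None,
    session_id: Optional[str] = None,
    keep_alive: bool = False,
    max_nb_actions: int = 100,
    min_nb_actions: Optional[int] = None,
    only_main_content: bool = True,
    scrape_images: bool = False,
    screenshot: Optional[bool] = None,
    session_timeout_minutes: int = 5,
) -> Dict[str, Any]:
    """Build a request body using the field names and defaults listed above."""
    # The documented required range is 0 < x < 30 (both bounds exclusive).
    if not 0 < session_timeout_minutes < 30:
        raise ValueError("session_timeout_minutes must satisfy 0 < x < 30")
    return {
        "session_id": session_id,
        "url": url,
        "keep_alive": keep_alive,
        "max_nb_actions": max_nb_actions,
        "min_nb_actions": min_nb_actions,
        "only_main_content": only_main_content,
        "scrape_images": scrape_images,
        "screenshot": screenshot,
        "session_timeout_minutes": session_timeout_minutes,
    }


def post_scrape(base_url: str, payload: Dict[str, Any], timeout: float = 60.0) -> Dict[str, Any]:
    """POST the payload to /env/scrape. base_url is a placeholder (assumption)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/env/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling build_scrape_payload(url="https://example.com") describes the case where a new session is created, while build_scrape_payload(session_id="...") with no url corresponds to scraping the current page of an existing session.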
Fields in the Response:
metadata (object): Metadata of the current page (e.g., URL, page title, and timestamp).
session (object): Browser session information, including the session ID, status, last accessed time, and timeout.
data (object | null): Extracted data from the page, such as the main content scraped or other information.
screenshot (file | null): A base64-encoded screenshot of the current page, if requested.
space (object | null): Available actions on the current page, such as clickable elements or form submissions.
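Since data, screenshot, and space may each be null, client code should check for them before use. The sketch below shows one way to do that in Python, relying only on the top-level field names listed above. The helper names present_fields and decode_screenshot are hypothetical, and the sketch assumes the screenshot arrives as a base64 string in the JSON body, as the field description suggests. The inner structure of metadata, session, data, and space is not described here, so the example does not read into them.

```python
import base64
from typing import Any, Dict, List, Optional

# Top-level response fields named in this section.
RESPONSE_FIELDS = ("metadata", "session", "data", "screenshot", "space")


def present_fields(response: Dict[str, Any]) -> List[str]:
    """Return the documented top-level fields that are present and not null."""
    return [name for name in RESPONSE_FIELDS if response.get(name) is not None]


def decode_screenshot(response: Dict[str, Any]) -> Optional[bytes]:
    """Decode the base64 screenshot to raw bytes, or return None if absent."""
    encoded = response.get("screenshot")
    if encoded is None:
        return None
    return base64.b64decode(encoded)
```

The decoded bytes can then be written to a file opened in binary mode. The image format is not stated on this page, so choose the file extension based on what your deployment actually returns.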