Any Article

A scraper that extracts the main article from any web page using an OpenAI model.

Example usage:

import asyncio
from webquest.browsers import Hyperbrowser
from webquest.scrapers import AnyArticle

async def main():
    # Create the scraper with a Hyperbrowser browser.
    scraper = AnyArticle(browser=Hyperbrowser())
    # Build a request for the target page and run the scraper.
    response = await scraper.run(
        scraper.request_model(url="https://example.com/article"),
    )
    # Print the extracted article as indented JSON.
    print(response.model_dump_json(indent=4))

if __name__ == "__main__":
    asyncio.run(main())

Settings

AnyArticleSettings

Configuration settings for the Any Article scraper.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| character_limit | int | 5000 | The maximum number of characters to parse. |
| parser_model | str | gpt-5-mini | The OpenAI model to use for parsing. |
| openai_api_key | SecretStr \| None | None | The API key for OpenAI. |
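
As a rough illustration of what character_limit implies, the sketch below clips page text to a maximum length before it would be handed to a parser. `SettingsSketch` and `clip_for_parsing` are hypothetical stand-ins written for this example and are not part of webquest; how the scraper actually applies the limit is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical stand-in mirroring two documented settings fields."""

    character_limit: int = 5000
    parser_model: str = "gpt-5-mini"


def clip_for_parsing(page_text: str, settings: SettingsSketch) -> str:
    """Keep at most `character_limit` characters (assumed behaviour)."""
    return page_text[: settings.character_limit]


# A 12,000-character page is clipped to the default limit.
clipped = clip_for_parsing("x" * 12_000, SettingsSketch())
print(len(clipped))
```

Under this reading, a lower limit would mean less text sent to the model, at the risk of cutting off long articles.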

Request

AnyArticleRequest

Represents a request to extract an article from a web page.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | Required | The URL of the web page to extract the article from. |
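
Because url is typed as a plain str, it may be worth checking the value before building a request. The helper below is a minimal sketch using only the standard library; `looks_like_http_url` is a hypothetical name, and whether the request model performs its own validation is not stated here.

```python
from urllib.parse import urlparse


def looks_like_http_url(url: str) -> bool:
    """Return True when the string has an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_http_url("https://example.com/article"))
print(looks_like_http_url("example.com/article"))
```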

Response

AnyArticleResponse

Represents the extracted article content.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| publisher | str | Required | The name of the publisher. |
| title | str | Required | The title of the article. |
| published_at | str | Required | The publication date of the article. |
| authors | list[str] | Required | The list of authors of the article. |
| content | str | Required | The main content of the article. |
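
The sketch below shows one way to consume the JSON printed by the usage example. The payload is made up for illustration and only mirrors the documented field names. Since published_at is a plain str with no stated format, the date parsing is an assumption: it tries ISO 8601 and falls back to None.

```python
import json
from datetime import datetime
from typing import Optional

# Made-up payload shaped like the documented response fields.
raw = json.dumps({
    "publisher": "Example News",
    "title": "An Example Headline",
    "published_at": "2024-05-01",
    "authors": ["Jane Doe", "John Roe"],
    "content": "First paragraph.\n\nSecond paragraph.",
})


def parse_published_at(value: str) -> Optional[datetime]:
    """Try ISO 8601; return None when the string uses another format."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


article = json.loads(raw)
published = parse_published_at(article["published_at"])
# Fall back to a placeholder when the authors list is empty.
byline = ", ".join(article["authors"]) or "Unknown author"
summary = f"{article['title']} ({article['publisher']}) by {byline}"
print(summary)
print(published)
```

Keeping the original string alongside the parsed value avoids losing information when the date cannot be parsed.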