# Any Article
Scraper to extract the main article from any web page using OpenAI.
Example usage:

```python
import asyncio

from webquest.browsers import Hyperbrowser
from webquest.scrapers import AnyArticle


async def main():
    scraper = AnyArticle(browser=Hyperbrowser())
    response = await scraper.run(
        scraper.request_model(url="https://example.com/article"),
    )
    print(response.model_dump_json(indent=4))


if __name__ == "__main__":
    asyncio.run(main())
```
## Settings
AnyArticleSettings
Configuration settings for the Any Article scraper.
| Name | Type | Default | Description |
|---|---|---|---|
| `character_limit` | `int` | `5000` | The maximum number of characters to parse. |
| `parser_model` | `str` | `gpt-5-mini` | The OpenAI model to use for parsing. |
| `openai_api_key` | `SecretStr \| None` | `None` | The API key for OpenAI. |
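The `character_limit` setting caps how much page text is handed to the parser model. The scraper's internal truncation logic is not shown in this reference; the sketch below only illustrates the effect of the default limit on an oversized page, using plain string slicing as a stand-in:

```python
# Illustrative only: simulate capping page text at the default
# character_limit of 5000 before it is passed to the parser model.
page_text = "word " * 2000  # ~10,000 characters of extracted page text
character_limit = 5000      # default from AnyArticleSettings

parsed_input = page_text[:character_limit]
print(len(parsed_input))  # 5000
```

Raising the limit lets the parser see more of a long article at the cost of a larger OpenAI request.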
## Request
AnyArticleRequest
Represents a request to extract an article from a web page.
| Name | Type | Default | Description |
|---|---|---|---|
| `url` | `str` | Required | The URL of the web page to extract the article from. |
## Response
AnyArticleResponse
Represents the extracted article content.
| Name | Type | Default | Description |
|---|---|---|---|
| `publisher` | `str` | Required | The name of the publisher. |
| `title` | `str` | Required | The title of the article. |
| `published_at` | `str` | Required | The publication date of the article. |
| `authors` | `list[str]` | Required | The list of authors of the article. |
| `content` | `str` | Required | The main content of the article. |
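The fields above describe the shape of the object returned by `scraper.run`. The real `AnyArticleResponse` is a model inside `webquest` (the usage example serializes it with `model_dump_json`); the dataclass below is a hypothetical stand-in that mirrors the documented fields, useful for seeing what the serialized output looks like:

```python
from dataclasses import asdict, dataclass
import json


# Hypothetical stand-in mirroring the AnyArticleResponse fields documented
# above; the actual class in webquest is not a dataclass.
@dataclass
class ArticleShape:
    publisher: str
    title: str
    published_at: str
    authors: list[str]
    content: str


example = ArticleShape(
    publisher="Example News",
    title="An Example Article",
    published_at="2024-01-01",
    authors=["A. Author"],
    content="Body text...",
)
print(json.dumps(asdict(example), indent=4))
```

All five fields are required, so consumers can read them without null checks; only the values themselves (e.g. an empty `authors` list) vary by page.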