early_learning.bulk_import

Attributes

SEARCH_URL

BASE_URL

MAX_RESULTS

HEADERS

plugin

records

Functions

_absolute_url(→ str)

Return an absolute URL, prepending BASE_URL when href is relative.

_text(→ str)

Return stripped text for the first matching CSS selector, or ''.

_text_list(→ list[str])

Return a list of stripped text values for all matching CSS selectors.

scrape_resource_page(...)

Fetch a single resource page and extract metadata.

scrape_search_results(→ list[str])

Fetch the search results page and return up to max_results resource URLs.

bulk_import(...)

Scrape the first max_results English-language resources from the Early Learning Resource Network search page.

Module Contents

early_learning.bulk_import.SEARCH_URL = 'https://www.earlylearningresourcenetwork.org/books/search?f%5B0%5D=language%3A712'
early_learning.bulk_import.BASE_URL = 'https://www.earlylearningresourcenetwork.org'
early_learning.bulk_import.MAX_RESULTS = 8
early_learning.bulk_import.HEADERS
early_learning.bulk_import.plugin
early_learning.bulk_import._absolute_url(href: str) → str

Return an absolute URL, prepending BASE_URL when href is relative.

early_learning.bulk_import._text(soup: bs4.BeautifulSoup, selector: str) → str

Return stripped text for the first matching CSS selector, or ''.

early_learning.bulk_import._text_list(soup: bs4.BeautifulSoup, selector: str) → list[str]

Return a list of stripped text values for all matching CSS selectors.
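The two text helpers can be sketched as below. This is an illustration built on Beautiful Soup's select_one and select methods, not the module's actual source; the sample HTML and its class names are made up for the example.

```python
from bs4 import BeautifulSoup


def _text(soup: BeautifulSoup, selector: str) -> str:
    """Stripped text of the first element matching selector, or '' if none matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else ""


def _text_list(soup: BeautifulSoup, selector: str) -> list[str]:
    """Stripped text of every element matching selector, in document order."""
    return [node.get_text(strip=True) for node in soup.select(selector)]


# Hypothetical markup, used only to exercise the helpers.
sample_html = """
<h1 class="page-title">  Counting Ducks  </h1>
<ul>
  <li class="tag"> numbers </li>
  <li class="tag">animals</li>
</ul>
"""
soup = BeautifulSoup(sample_html, "html.parser")

title = _text(soup, "h1.page-title")
missing = _text(soup, "p.summary")
tags = _text_list(soup, "li.tag")
```

Returning an empty string instead of None for a missing element lets callers assign the result directly to a string field without a separate check.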

early_learning.bulk_import.scrape_resource_page(client: httpx.Client, url: str) → server.plugins.early_learning.early_learning_models.EarlyLearningItem

Fetch a single resource page and extract metadata.

The Early Learning Resource Network is a Drupal site. Field markup follows the standard Drupal 9/10 pattern: div.field--name-field-<name>.
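To illustrate the div.field--name-field-<name> pattern, the sketch below builds the selector for a given field and reads values from sample markup. The field names (author, subject), the nested field__item wrappers, and the field_selector helper are assumptions made for the example; the fields the real scraper reads and the attributes of EarlyLearningItem are not listed in this reference.

```python
from bs4 import BeautifulSoup


def field_selector(name: str) -> str:
    """Build the CSS selector for a Drupal field wrapper (hypothetical helper)."""
    return f"div.field--name-field-{name}"


# Hypothetical page fragment in the style of a Drupal 9/10 theme.
SAMPLE_PAGE = """
<article>
  <div class="field field--name-field-author field--type-string">
    <div class="field__label">Author</div>
    <div class="field__item">Jane Doe</div>
  </div>
  <div class="field field--name-field-subject field--type-entity-reference">
    <div class="field__item">Counting</div>
    <div class="field__item">Animals</div>
  </div>
</article>
"""

soup = BeautifulSoup(SAMPLE_PAGE, "html.parser")

# Single-valued field: take the first item inside the wrapper, if any.
author_node = soup.select_one(field_selector("author") + " .field__item")
author = author_node.get_text(strip=True) if author_node else ""

# Multi-valued field: collect every item inside the wrapper.
subjects = [
    node.get_text(strip=True)
    for node in soup.select(field_selector("subject") + " .field__item")
]
```

Selecting the field__item children rather than the wrapper itself keeps the field label ("Author") out of the extracted value.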

early_learning.bulk_import.scrape_search_results(client: httpx.Client, url: str = SEARCH_URL, max_results: int = MAX_RESULTS) → list[str]

Fetch the search results page and return up to max_results resource URLs.

Drupal Views renders search results as <article> elements. We look for the canonical <h3 class="node__title"> / <h2 class="node__title"> title links used by many Drupal themes.
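The link extraction described above can be sketched as follows. To keep the example offline it parses an HTML string instead of fetching with httpx; extract_result_urls is a hypothetical name, and the sample markup is invented.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.earlylearningresourcenetwork.org"


def extract_result_urls(html: str, max_results: int = 8) -> list[str]:
    """Return up to max_results absolute resource URLs found in title links."""
    soup = BeautifulSoup(html, "html.parser")
    urls: list[str] = []
    # Match title links under either heading level, in document order.
    links = soup.select(
        "article h2.node__title a[href], article h3.node__title a[href]"
    )
    for link in links:
        url = urljoin(BASE_URL + "/", link["href"])
        if url not in urls:  # skip duplicate links to the same resource
            urls.append(url)
        if len(urls) >= max_results:
            break
    return urls


# Hypothetical search results fragment.
SAMPLE_RESULTS = """
<article><h3 class="node__title"><a href="/books/one">One</a></h3></article>
<article><h2 class="node__title"><a href="/books/two">Two</a></h2></article>
<article><h3 class="node__title"><a href="/books/three">Three</a></h3></article>
"""

first_two = extract_result_urls(SAMPLE_RESULTS, max_results=2)
```

In the real function the HTML would presumably come from a client.get(url) response; that step is omitted here so the sketch runs without network access.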

early_learning.bulk_import.bulk_import(url: str = SEARCH_URL, max_results: int = MAX_RESULTS) → list[server.plugins.early_learning.early_learning_models.EarlyLearningItem]

Scrape the first max_results English-language resources from the Early Learning Resource Network search page and return them as a list of EarlyLearningItem objects.

Results are cached locally in early_learning_resources.json so that subsequent runs do not re-fetch the site.
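The caching behavior can be pictured with a small load-or-build helper. This is a sketch under assumptions: records are plain dicts rather than EarlyLearningItem objects, load_or_build and fake_scrape are hypothetical names, and the demo writes to a temporary directory instead of the real cache location.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_build(cache_path: Path, build: Callable[[], list[dict]]) -> list[dict]:
    """Return cached records if the file exists, otherwise build and cache them."""
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    records = build()
    cache_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


calls = []


def fake_scrape() -> list[dict]:
    """Stand-in for the real scrape; records each invocation."""
    calls.append(1)
    return [{"title": "Counting Ducks", "url": "/books/one"}]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "early_learning_resources.json"
    first = load_or_build(cache_file, fake_scrape)   # builds and writes the cache
    second = load_or_build(cache_file, fake_scrape)  # served from the cache file
```

With a cache of this kind, deleting the JSON file is the simplest way to force a fresh scrape; whether the real module offers another refresh mechanism is not documented here.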

early_learning.bulk_import.records