early_learning.bulk_import¶
Attributes¶
Functions¶
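| _absolute_url(href) | Return an absolute URL, prepending BASE_URL when href is relative. |
| _text(soup, selector) | Return stripped text for the first matching CSS selector, or ''. |
| _text_list(soup, selector) | Return a list of stripped text values for all matching CSS selectors. |
| scrape_resource_page(client, url) | Fetch a single resource page and extract metadata. |
| scrape_search_results(client[, url, max_results]) | Fetch the search results page and return up to max_results resource URLs. |
| bulk_import([url, max_results]) | Scrape the first max_results English-language resources from the Early Learning Resource Network search page. |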
Module Contents¶
- early_learning.bulk_import.SEARCH_URL = 'https://www.earlylearningresourcenetwork.org/books/search?f%5B0%5D=language%3A712'¶
- early_learning.bulk_import.BASE_URL = 'https://www.earlylearningresourcenetwork.org'¶
- early_learning.bulk_import.MAX_RESULTS = 8¶
- early_learning.bulk_import.HEADERS¶
- early_learning.bulk_import.plugin¶
- early_learning.bulk_import._absolute_url(href: str) → str¶
Return an absolute URL, prepending BASE_URL when href is relative.
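One way such a helper could behave, as a minimal sketch only: it uses `urllib.parse.urljoin` from the standard library, which may differ from the module's actual implementation. The function name `absolute_url` and the example path `/books/example` are illustrative assumptions.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.earlylearningresourcenetwork.org"


def absolute_url(href: str) -> str:
    """Return href unchanged if it is absolute, else resolve it against BASE_URL."""
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the base; the trailing slash keeps the base treated as a root.
    return urljoin(BASE_URL + "/", href)


# "/books/example" is a hypothetical path used only for illustration.
print(absolute_url("/books/example"))
print(absolute_url("https://example.org/already-absolute"))
```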
- early_learning.bulk_import._text(soup: bs4.BeautifulSoup, selector: str) → str¶
Return stripped text for the first matching CSS selector, or ''.
- early_learning.bulk_import._text_list(soup: bs4.BeautifulSoup, selector: str) → list[str]¶
Return a list of stripped text values for all matching CSS selectors.
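The two text helpers might look roughly like the sketch below, which relies on Beautiful Soup's `select_one` and `select` methods. The names `text` and `text_list` and the sample markup are assumptions made for illustration, not the module's actual code.

```python
from bs4 import BeautifulSoup


def text(soup: BeautifulSoup, selector: str) -> str:
    """Return stripped text of the first element matching selector, or ''."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else ""


def text_list(soup: BeautifulSoup, selector: str) -> list[str]:
    """Return stripped text of every element matching selector."""
    return [node.get_text(strip=True) for node in soup.select(selector)]


# Hypothetical markup, used only to exercise the helpers.
SAMPLE_HTML = """
<div class="title">  A Sample Book  </div>
<ul><li class="tag"> counting </li><li class="tag">rhymes</li></ul>
"""
sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(text(sample_soup, "div.title"))
print(text_list(sample_soup, "li.tag"))
```

Returning an empty string rather than `None` for a missing element keeps downstream string handling free of extra checks.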
- early_learning.bulk_import.scrape_resource_page(client: httpx.Client, url: str) → server.plugins.early_learning.early_learning_models.EarlyLearningItem¶
Fetch a single resource page and extract metadata.
The Early Learning Resource Network is a Drupal site. Field markup follows the standard Drupal 9/10 pattern:
div.field--name-field-<name>.
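The field pattern above can be turned into a CSS selector with a small helper. This is a sketch under stated assumptions: the helper name `field_selector` and the field names `author` and `age_range` are hypothetical, and the underscore-to-hyphen conversion reflects how Drupal typically cleans class names rather than anything confirmed for this site.

```python
def field_selector(field_name: str) -> str:
    """Build a CSS selector for a Drupal field wrapper div.

    field_name is the part after the "field_" prefix of the machine name.
    """
    # Assumption: Drupal replaces underscores with hyphens when it
    # generates CSS classes from field machine names.
    return "div.field--name-field-" + field_name.replace("_", "-")


# Hypothetical field names, for illustration only.
print(field_selector("author"))
print(field_selector("age_range"))
```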
- early_learning.bulk_import.scrape_search_results(client: httpx.Client, url: str = SEARCH_URL, max_results: int = MAX_RESULTS) → list[str]¶
Fetch the search results page and return up to max_results resource URLs.
Drupal Views renders search results as <article> elements. We look for the canonical <h3 class="node__title"> / <h2 class="node__title"> title links used by many Drupal themes.
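The link extraction described above could be sketched as follows. The function name `extract_result_urls`, the exact selector string, and the sample markup are assumptions, and the real function also performs the HTTP fetch, which is omitted here.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.earlylearningresourcenetwork.org"


def extract_result_urls(html: str, max_results: int = 8) -> list[str]:
    """Return up to max_results absolute URLs from node-title links."""
    soup = BeautifulSoup(html, "html.parser")
    urls: list[str] = []
    # Match title links under either heading level, in document order.
    links = soup.select(
        "article h2.node__title a[href], article h3.node__title a[href]"
    )
    for link in links:
        url = urljoin(BASE_URL + "/", link["href"])
        if url not in urls:  # skip duplicate links to the same resource
            urls.append(url)
    return urls[:max_results]


# Hypothetical search-results markup, for illustration only.
SAMPLE_RESULTS = """
<article><h3 class="node__title"><a href="/books/one">One</a></h3></article>
<article><h2 class="node__title"><a href="/books/two">Two</a></h2></article>
<article><h3 class="node__title"><a href="/books/three">Three</a></h3></article>
"""
print(extract_result_urls(SAMPLE_RESULTS, max_results=2))
```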
- early_learning.bulk_import.bulk_import(url: str = SEARCH_URL, max_results: int = MAX_RESULTS) → list[server.plugins.early_learning.early_learning_models.EarlyLearningItem]¶
Scrape the first max_results English-language resources from the Early Learning Resource Network search page and return them as a list of EarlyLearningItem objects.
Results are cached locally in early_learning_resources.json so that subsequent runs do not re-fetch the site.
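The caching behaviour might be implemented along these lines. This is a minimal sketch that stores plain dictionaries rather than EarlyLearningItem objects; the helper name `load_or_fetch` and the `fetch` callback are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_fetch(
    cache_path: Path, fetch: Callable[[], list[dict]]
) -> list[dict]:
    """Return cached records if the cache file exists, else fetch and cache."""
    if cache_path.exists():
        # Cache hit: read the stored records and skip the network entirely.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    records = fetch()
    cache_path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return records
```

A caller would pass the path to early_learning_resources.json and a function that performs the actual scrape. Deleting the file forces a fresh fetch on the next run.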
- early_learning.bulk_import.records¶