early_learning.bulk_import
==========================

.. py:module:: early_learning.bulk_import


Attributes
----------

.. autoapisummary::

   early_learning.bulk_import.SEARCH_URL
   early_learning.bulk_import.BASE_URL
   early_learning.bulk_import.MAX_RESULTS
   early_learning.bulk_import.HEADERS
   early_learning.bulk_import.plugin
   early_learning.bulk_import.records


Functions
---------

.. autoapisummary::

   early_learning.bulk_import._absolute_url
   early_learning.bulk_import._text
   early_learning.bulk_import._text_list
   early_learning.bulk_import.scrape_resource_page
   early_learning.bulk_import.scrape_search_results
   early_learning.bulk_import.bulk_import


Module Contents
---------------

.. py:data:: SEARCH_URL
   :value: 'https://www.earlylearningresourcenetwork.org/books/search?f%5B0%5D=language%3A712'


.. py:data:: BASE_URL
   :value: 'https://www.earlylearningresourcenetwork.org'


.. py:data:: MAX_RESULTS
   :value: 8


.. py:data:: HEADERS

.. py:data:: plugin

.. py:function:: _absolute_url(href: str) -> str

   Return an absolute URL, prepending BASE_URL when href is relative.


.. py:function:: _text(soup: bs4.BeautifulSoup, selector: str) -> str

   Return stripped text for the first matching CSS selector, or ''.


.. py:function:: _text_list(soup: bs4.BeautifulSoup, selector: str) -> list[str]

   Return a list of stripped text values for all matching CSS selectors.


.. py:function:: scrape_resource_page(client: httpx.Client, url: str) -> server.plugins.early_learning.early_learning_models.EarlyLearningItem

   Fetch a single resource page and extract metadata.

   The Early Learning Resource Network is a Drupal site. Field markup
   follows the standard Drupal 9/10 pattern: ``div.field--name-field-``.


.. py:function:: scrape_search_results(client: httpx.Client, url: str = SEARCH_URL, max_results: int = MAX_RESULTS) -> list[str]

   Fetch the search results page and return up to *max_results* resource URLs.

   Drupal Views renders each search result as a separate element. We look
   for the canonical title links used by many Drupal themes.


.. py:function:: bulk_import(url: str = SEARCH_URL, max_results: int = MAX_RESULTS) -> list[server.plugins.early_learning.early_learning_models.EarlyLearningItem]

   Scrape the first *max_results* English-language resources from the
   Early Learning Resource Network search page and return them as a list
   of ``EarlyLearningItem`` objects.

   Results are cached locally in ``early_learning_resources.json`` so that
   subsequent runs do not re-fetch the site.


.. py:data:: records
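
The URL handling and local caching described above can be illustrated with a minimal, standard-library-only sketch. It is not the module's actual implementation: ``absolute_url`` only mirrors the documented behaviour of ``_absolute_url`` (here via ``urllib.parse.urljoin``, which may differ from what the module does internally), and ``load_or_fetch`` is a hypothetical helper showing one way a cache file such as ``early_learning_resources.json`` could short-circuit a re-fetch. Records are assumed to be plain JSON-serialisable dictionaries rather than ``EarlyLearningItem`` objects.

```python
import json
from pathlib import Path
from typing import Callable
from urllib.parse import urljoin

# Same value as the documented BASE_URL attribute.
BASE_URL = "https://www.earlylearningresourcenetwork.org"


def absolute_url(href: str) -> str:
    """Resolve href against BASE_URL; absolute URLs pass through unchanged."""
    # urljoin keeps an already-absolute href and resolves a relative one
    # against the base. This is an assumed approach, not the module's code.
    return urljoin(BASE_URL + "/", href)


def load_or_fetch(cache_path: Path, fetch: Callable[[], list[dict]]) -> list[dict]:
    """Hypothetical helper: return cached records, or fetch and cache them."""
    if cache_path.exists():
        # Cache hit: read the stored records and skip the network entirely.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    # Cache miss: run the (potentially slow) fetch once and persist the result.
    records = fetch()
    cache_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

In this sketch the scraping step is passed in as a callable, so the caching logic can be exercised without touching the network; deleting the cache file forces a fresh fetch on the next call.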