early_learning.bulk_import
==========================

.. py:module:: early_learning.bulk_import


Attributes
----------

.. autoapisummary::

   early_learning.bulk_import.SEARCH_URL
   early_learning.bulk_import.BASE_URL
   early_learning.bulk_import.MAX_RESULTS
   early_learning.bulk_import.HEADERS
   early_learning.bulk_import.plugin
   early_learning.bulk_import.records


Functions
---------

.. autoapisummary::

   early_learning.bulk_import._absolute_url
   early_learning.bulk_import._text
   early_learning.bulk_import._text_list
   early_learning.bulk_import.scrape_resource_page
   early_learning.bulk_import.scrape_search_results
   early_learning.bulk_import.bulk_import


Module Contents
---------------

.. py:data:: SEARCH_URL
   :value: 'https://www.earlylearningresourcenetwork.org/books/search?f%5B0%5D=language%3A712'


.. py:data:: BASE_URL
   :value: 'https://www.earlylearningresourcenetwork.org'


.. py:data:: MAX_RESULTS
   :value: 8


.. py:data:: HEADERS

.. py:data:: plugin

.. py:function:: _absolute_url(href: str) -> str

   Return an absolute URL, prepending ``BASE_URL`` when *href* is relative.
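
   A minimal sketch of this behaviour, assuming the standard library's
   ``urllib.parse.urljoin`` is an acceptable stand-in. The module's actual
   implementation is not shown here, and ``absolute_url_sketch`` is a
   hypothetical name used only for illustration:

   .. code-block:: python

      from urllib.parse import urljoin

      BASE_URL = "https://www.earlylearningresourcenetwork.org"


      def absolute_url_sketch(href: str) -> str:
          """Hypothetical stand-in for ``_absolute_url``."""
          # urljoin returns href unchanged when it is already absolute
          # and resolves it against BASE_URL otherwise.
          return urljoin(BASE_URL, href)


      # "/books/example" is a made-up path, not a real resource.
      print(absolute_url_sketch("/books/example"))
      print(absolute_url_sketch("https://example.org/elsewhere"))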


.. py:function:: _text(soup: bs4.BeautifulSoup, selector: str) -> str

   Return the stripped text of the first element matching the CSS selector,
   or ``''`` if nothing matches.


.. py:function:: _text_list(soup: bs4.BeautifulSoup, selector: str) -> list[str]

   Return a list with the stripped text of every element matching the CSS
   selector.
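
   The two text helpers, ``_text`` and ``_text_list``, could plausibly be
   written along the following lines. This is only a sketch built on
   Beautiful Soup's ``select_one`` and ``select``; the function names and the
   sample markup are assumptions, not the module's actual code:

   .. code-block:: python

      from bs4 import BeautifulSoup


      def text_sketch(soup: BeautifulSoup, selector: str) -> str:
          """Hypothetical stand-in for ``_text``."""
          node = soup.select_one(selector)
          # Fall back to an empty string when the selector matches nothing.
          return node.get_text(strip=True) if node is not None else ""


      def text_list_sketch(soup: BeautifulSoup, selector: str) -> list[str]:
          """Hypothetical stand-in for ``_text_list``."""
          return [node.get_text(strip=True) for node in soup.select(selector)]


      # Made-up markup, for illustration only.
      html = "<ul><li class='tag'> Counting </li><li class='tag'>Shapes</li></ul>"
      soup = BeautifulSoup(html, "html.parser")

      first = text_sketch(soup, "li.tag")
      all_tags = text_list_sketch(soup, "li.tag")
      missing = text_sketch(soup, "p.summary")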


.. py:function:: scrape_resource_page(client: httpx.Client, url: str) -> server.plugins.early_learning.early_learning_models.EarlyLearningItem

   Fetch a single resource page and extract metadata.

   The Early Learning Resource Network is a Drupal site.  Field markup follows
   the standard Drupal 9/10 pattern: ``div.field--name-field-<name>``.
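
   To illustrate the field pattern, the sketch below builds a selector for a
   given field name and reads its text with Beautiful Soup. The helper
   ``field_selector``, the field names, and the sample markup are all
   hypothetical; which fields the real function reads is not documented
   here:

   .. code-block:: python

      from bs4 import BeautifulSoup


      def field_selector(name: str) -> str:
          """Build the CSS selector for a Drupal field wrapper (hypothetical)."""
          return f"div.field--name-field-{name}"


      # Made-up markup; "author" and "age-range" are illustrative field names.
      html = """
      <article>
        <div class="field field--name-field-author">Jane Doe</div>
        <div class="field field--name-field-age-range">3-5 years</div>
      </article>
      """
      soup = BeautifulSoup(html, "html.parser")

      metadata = {}
      for name in ("author", "age-range", "publisher"):
          node = soup.select_one(field_selector(name))
          # Fields missing from the page are recorded as empty strings.
          metadata[name] = node.get_text(strip=True) if node is not None else ""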


.. py:function:: scrape_search_results(client: httpx.Client, url: str = SEARCH_URL, max_results: int = MAX_RESULTS) -> list[str]

   Fetch the search results page and return up to *max_results* resource URLs.

   Drupal Views renders search results as ``<article>`` elements.  This
   function looks for the title links inside ``<h3 class="node__title">`` or
   ``<h2 class="node__title">`` headings, a convention used by many Drupal
   themes.
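
   The link extraction described above might look roughly like this. It is a
   sketch that parses an HTML string rather than calling the network;
   ``extract_result_urls`` and the sample paths are hypothetical, and the
   exact selector used by the real function is an assumption:

   .. code-block:: python

      from urllib.parse import urljoin

      from bs4 import BeautifulSoup

      BASE_URL = "https://www.earlylearningresourcenetwork.org"


      def extract_result_urls(html: str, max_results: int) -> list[str]:
          """Return up to max_results absolute URLs from node title links."""
          soup = BeautifulSoup(html, "html.parser")
          # Match title links under either heading level, in document order.
          links = soup.select(
              "article h2.node__title a[href], article h3.node__title a[href]"
          )
          return [urljoin(BASE_URL, a["href"]) for a in links[:max_results]]


      # Made-up search results page, for illustration only.
      html = """
      <article><h3 class="node__title"><a href="/books/one">One</a></h3></article>
      <article><h2 class="node__title"><a href="/books/two">Two</a></h2></article>
      <article><h3 class="node__title"><a href="/books/three">Three</a></h3></article>
      """
      urls = extract_result_urls(html, max_results=2)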


.. py:function:: bulk_import(url: str = SEARCH_URL, max_results: int = MAX_RESULTS) -> list[server.plugins.early_learning.early_learning_models.EarlyLearningItem]

   Scrape the first *max_results* English-language resources from the Early
   Learning Resource Network search page and return them as a list of
   ``EarlyLearningItem`` objects.

   Results are cached locally in ``early_learning_resources.json`` so that
   subsequent runs do not re-fetch the site.
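
   The caching behaviour can be pictured with a small load-or-fetch helper.
   This is a sketch under stated assumptions: ``load_or_fetch`` is a
   hypothetical name, items are represented as plain dictionaries, and how
   the real function serialises ``EarlyLearningItem`` objects is not shown
   here:

   .. code-block:: python

      import json
      from pathlib import Path
      from typing import Callable


      def load_or_fetch(
          cache_path: Path, fetch: Callable[[], list[dict]]
      ) -> list[dict]:
          """Return cached items if the cache file exists, else fetch and store."""
          if cache_path.exists():
              # Cache hit: read the stored items and skip the fetch entirely.
              return json.loads(cache_path.read_text(encoding="utf-8"))
          items = fetch()
          cache_path.write_text(json.dumps(items, indent=2), encoding="utf-8")
          return items

   With a helper like this, deleting the cache file is enough to force a
   fresh scrape on the next run.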


.. py:data:: records