Scraper
richy.core.scraper contains the classes responsible for the actual
web-content scraping. It wraps the Rug library (stock-market data) and
Karpet (crypto data), with yfinance as a fallback where applicable.
CurrentPrice
Frozen dataclass returned by get_current_price_and_change(). It holds
the current price, the day’s change in both absolute and percentage
form, and the market state (open / closed / pre-market / post-market).
When the market is in pre- or post-market trading, it also carries a
closed_market snapshot of the most recent regular-session close.
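A minimal sketch of such a frozen dataclass, for illustration only. Apart from closed_market, every field and class name here (price, change, change_percentage, market_state, MarketState, ClosedMarketSnapshot) is a hypothetical placeholder, not the module's actual definition.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum
from typing import Optional


class MarketState(Enum):
    """Hypothetical enum for the four documented market states."""
    OPEN = "open"
    CLOSED = "closed"
    PRE_MARKET = "pre-market"
    POST_MARKET = "post-market"


@dataclass(frozen=True)
class ClosedMarketSnapshot:
    """Hypothetical shape of the regular-session close snapshot."""
    price: float
    change: float
    change_percentage: float


@dataclass(frozen=True)
class CurrentPrice:
    # Only closed_market is a documented name; the rest are placeholders.
    price: float
    change: float  # absolute day change
    change_percentage: float  # day change in percent
    market_state: MarketState
    closed_market: Optional[ClosedMarketSnapshot] = None  # set in pre/post-market


snapshot = ClosedMarketSnapshot(price=101.0, change=1.0, change_percentage=1.0)
quote = CurrentPrice(102.5, 1.5, 1.49, MarketState.POST_MARKET, closed_market=snapshot)

try:
    quote.price = 0.0  # frozen: any assignment after construction is rejected
except FrozenInstanceError:
    print("CurrentPrice is immutable")
```

Because the dataclass is frozen, a price snapshot can be handed to several consumers without any of them mutating it.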
Manager
Central scraping facade. Every periodic task in
Tasks ultimately delegates to one of the methods
below. Methods are static: Manager carries no per-instance
state.
- class richy.core.scraper.Manager
Bases: object
Main manager class for scraping. All scraping methods are placed here.
Basic info
Per-item basic info lookups (name, market, market cap, P/E, EPS,
year low / high, holdings count, etc.). Results are written into
ItemData by the relevant celery task.
Fetches share basic info via the Rug library.
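The flow described above (a periodic task delegating to a static Manager method and persisting the result into ItemData) can be sketched as follows. Manager.fetch_basic_info, update_basic_info_task and the ITEM_DATA dict are all hypothetical stand-ins: the real method's name is not shown on this page, and the real task writes to the ItemData model rather than to a dict.

```python
# Hypothetical stand-in for the ItemData table: item id -> basic-info fields.
ITEM_DATA: dict[int, dict] = {}


class Manager:
    """Hypothetical stand-in for richy.core.scraper.Manager."""

    @staticmethod
    def fetch_basic_info(item_id: int) -> dict:
        # The real method would call Rug/Karpet; this returns fixed sample data.
        return {"name": f"Item {item_id}", "market_cap": 1_000_000, "pe_ratio": 15.2}


def update_basic_info_task(item_id: int) -> None:
    """Hypothetical periodic task: delegate to Manager, then persist."""
    # No Manager instance is created because the method is static.
    info = Manager.fetch_basic_info(item_id)
    ITEM_DATA.setdefault(item_id, {}).update(info)


update_basic_info_task(7)
```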
Current price
Price history
Bulk price fetchers. Each returns a pandas.DataFrame ready to
upsert into Price. fetch_intraday_prices
returns the past-day, fine-grained data used by the Eye component.
Downloads all prices for the share. Returns a DataFrame with the following columns:
Date (index)
Open
High
Low
Close
Volume
Dividends
Stock Splits
- Parameters:
share (Share) – Share model instance we want prices for.
- Returns:
Pandas dataframe.
- Return type:
pandas.DataFrame
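Turning one of these frames into rows for an upsert is mostly a rename. A sketch, assuming hypothetical lowercase field names on the Price model (the FIELD_MAP values and to_price_rows are illustrative, not part of richy). reindex() fills columns a frame lacks with NaN, so the same helper also accepts the OHLC-only index frames.

```python
import pandas as pd

# Source column -> hypothetical Price field name.
FIELD_MAP = {
    "Date": "date",
    "Open": "open",
    "High": "high",
    "Low": "low",
    "Close": "close",
    "Volume": "volume",
    "Dividends": "dividends",
    "Stock Splits": "stock_splits",
}


def to_price_rows(df: pd.DataFrame) -> list[dict]:
    """Flatten a price frame (Date index) into one dict per trading day."""
    # Move the Date index into a regular column, then rename every column.
    flat = df.reset_index().rename(columns=FIELD_MAP)
    # reindex() adds any missing field (e.g. volume for index frames) as NaN.
    flat = flat.reindex(columns=list(FIELD_MAP.values()))
    return flat.to_dict("records")
```

If the models are Django models (an assumption; this page does not say), a list of dicts like this maps onto bulk_create(..., update_conflicts=True, unique_fields=..., update_fields=...), available since Django 4.1.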
- static Manager.fetch_etf_prices(etf)
Downloads all prices for the ETF. Returns a DataFrame with the following columns:
Date (index)
Open
High
Low
Close
Volume
Dividends
Stock Splits
- Parameters:
etf (Etf) – Etf model instance we want prices for.
- Returns:
Pandas dataframe.
- Return type:
pandas.DataFrame
- static Manager.fetch_index_prices(index)
Downloads all prices for the index. Returns a DataFrame with the following columns:
Date (index)
Open
High
Low
Close
- Parameters:
index (Index) – Index model instance we want prices for.
- Returns:
Pandas dataframe.
- Return type:
pandas.DataFrame
- static Manager.fetch_coin_prices(coin)
Downloads all prices for the coin since settings.COIN_EPOCH. Returns a DataFrame with the following columns:
date (index)
price
market_cap
total_volume
- Parameters:
coin (Coin) – Coin model instance we want prices for.
- Returns:
Pandas dataframe.
- Return type:
pandas.DataFrame
- static Manager.fetch_intraday_prices(item)
Fetches intraday market prices for shares, indexes and ETFs. For coins, prices for the past 24 hours are fetched at 30-minute intervals.
- Parameters:
item (Item) – Item model instance we want prices for.
- Returns:
Pandas dataframe.
- Return type:
pandas.DataFrame
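For the coin case, 24 hours at 30-minute spacing means 48 sample times. The sketch below builds that grid; intraday_sample_times is a hypothetical helper, and snapping the window end down to the half hour is an assumption made for illustration, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

INTERVAL = timedelta(minutes=30)
WINDOW = timedelta(hours=24)


def intraday_sample_times(now: datetime) -> list[datetime]:
    """Start times of the 30-minute buckets covering the 24 hours before `now`."""
    # Snap the window end down to the half hour, e.g. 14:47 -> 14:30.
    end = now.replace(minute=now.minute - now.minute % 30, second=0, microsecond=0)
    times = []
    t = end - WINDOW
    while t < end:
        times.append(t)
        t += INTERVAL
    return times


samples = intraday_sample_times(datetime(2024, 1, 2, 14, 47, tzinfo=timezone.utc))
print(len(samples))            # 48
print(samples[0].isoformat())  # 2024-01-01T14:30:00+00:00
```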
Financials and ratings
Per-share financial data feeds. Each call upserts a row into
Asset with the appropriate type.
- static Manager.fetch_financials(share)
Fetches all financial data for the given share and writes it directly to the database.
- Parameters:
share (Share) – Share whose financials will be downloaded.
Dividends
Holdings
Downloader
General-purpose HTTP client builder. Downloader.get_client()
returns a pre-configured requests.Session for code that fetches
URLs directly (outside of what Rug / Karpet / yfinance already wrap).
The session ships with:
- Automatic retries on connection errors and on configurable HTTP status codes (defaults: 429, 500, 502, 503, 504).
- Exponential backoff between attempts, with Retry-After honored.
- A per-request default timeout enforced through a custom HTTP adapter (TimeoutHTTPAdapter), so callers that forget timeout= cannot hang indefinitely.
- A desktop-browser User-Agent header to avoid naive bot filters.
Module-level constant DEFAULT_TIMEOUT (= 10 seconds) is the
fallback timeout used when no explicit value is passed.
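A sketch of how a session with these properties can be assembled from requests and urllib3. build_session, USER_AGENT and the exact User-Agent string are illustrative assumptions; the real implementation is Downloader.get_client() with its nested TimeoutHTTPAdapter, whose source is not reproduced here.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_TIMEOUT = 10  # seconds, the default documented above

# Illustrative desktop-browser string; not necessarily the one richy sends.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that fills in a timeout when the caller passes none."""

    def __init__(self, *args, timeout=None, **kwargs):
        self.timeout = DEFAULT_TIMEOUT if timeout is None else timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send forwards timeout=None when the caller omits it.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def build_session(retries=5, backoff_factor=1.0,
                  status_forcelist=(429, 500, 502, 503, 504),
                  timeout=DEFAULT_TIMEOUT) -> requests.Session:
    """Hypothetical equivalent of Downloader.get_client()."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
        # allowed_methods is left at urllib3's default (idempotent verbs).
    )
    adapter = TimeoutHTTPAdapter(max_retries=retry, timeout=timeout)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers["User-Agent"] = USER_AGENT
    return session
```

Mounting the adapter on both schemes means every request made through the session gets the retry policy and the fallback timeout, while an explicit timeout= passed by the caller still wins.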
Note
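urllib3’s default allowed_methods is preserved, so retries on error
statuses and read errors apply only to idempotent verbs (HEAD, GET, PUT,
DELETE, OPTIONS, TRACE). POST and PATCH are not retried in those cases;
urllib3 still retries connection failures, which happen before the
request reaches the server, for any method.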
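The effect of the default allowed_methods can be checked with urllib3's Retry.is_retry(), which decides whether a response status should trigger another attempt. The snippet uses the same status list as get_client()'s default; it does not exercise connection errors, which urllib3 counts on a separate path.

```python
from urllib3.util.retry import Retry

# Same status list as the documented get_client() default.
retry = Retry(total=5, backoff_factor=1.0, status_forcelist=(429, 500, 502, 503, 504))

# Idempotent verb + listed status: eligible for another attempt.
print(retry.is_retry("GET", 503))   # True
# POST is not in the default allowed_methods, so the same status is not retried.
print(retry.is_retry("POST", 503))  # False
```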
- class richy.core.scraper.Downloader
Bases: object
Common application downloader based on the requests library.
- class richy.core.scraper.Downloader.TimeoutHTTPAdapter(*args, timeout: float | None = None, **kwargs)
Bases: HTTPAdapter
HTTPAdapter that applies a default timeout if the caller doesn’t set one.
- static Downloader.get_client(retries: int = 5, backoff_factor: float = 1.0, status_forcelist: tuple = (429, 500, 502, 503, 504), timeout: float = 10) → Session
Builds a requests.Session with automatic retries, exponential backoff, and a default per-request timeout.
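Backoff sleep between attempts is:
{backoff_factor} * (2 ** (retry_number - 1))
where retry_number is the number of consecutive failed attempts so far; urllib3 does not sleep before the first retry. With backoff_factor=1.0 and the default 5 retries the sleeps are therefore 0s, 2s, 4s, 8s, 16s (each capped at urllib3’s backoff maximum of 120s). Retry also honors the Retry-After header on 429/503 responses.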
Note: uses urllib3’s default allowed_methods, so status-code and read-error retries apply to idempotent methods only (HEAD, GET, PUT, DELETE, OPTIONS, TRACE). POST/PATCH are not retried in those cases; only connection failures are retried for every method.
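A stdlib-only sketch that mirrors urllib3's backoff computation (Retry.get_backoff_time() with no jitter), handy for sanity-checking a chosen backoff_factor before passing it to get_client(). backoff_schedule is a hypothetical helper; the 120-second cap is urllib3's default backoff maximum.

```python
def backoff_schedule(backoff_factor: float, retries: int,
                     backoff_max: float = 120.0) -> list[float]:
    """Sleep before each retry, mirroring urllib3's Retry.get_backoff_time().

    n counts consecutive failed attempts: urllib3 sleeps 0 while n <= 1,
    otherwise backoff_factor * 2 ** (n - 1), capped at backoff_max.
    """
    sleeps = []
    for n in range(1, retries + 1):  # n = failed attempts so far
        if n <= 1:
            sleeps.append(0.0)
        else:
            sleeps.append(min(backoff_max, backoff_factor * 2 ** (n - 1)))
    return sleeps


print(backoff_schedule(1.0, 5))  # [0.0, 2.0, 4.0, 8.0, 16.0]
```

Summing the list gives the worst-case time spent sleeping across all retries (30 seconds for the defaults), not counting the per-request timeouts themselves.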