Scraper

richy.core.scraper contains the classes responsible for scraping web content. It wraps the Rug library (stock-market data) and Karpet (crypto data), plus yfinance as a fallback where applicable.

CurrentPrice

Frozen dataclass returned by get_current_price_and_change(). Holds the current price plus the day’s change in both absolute and percentage form, the market state (open / closed / pre-market / post-market), and – when the market is in pre/post-market – a closed_market snapshot of the most recent regular-session close.

class richy.core.scraper.CurrentPrice(price: float, change_value: float, change_percents: float, state: str = 'open', closed_market: typing.Dict = <factory>)

Bases: object

Current market price state.

TODO: add a description of how to use current_market.

CurrentPrice.is_closed()

CurrentPrice.round(item)

Returns a dict with all numbers rounded according to item type.

Parameters:

item (Item) – Item model instance.

Returns:

Dict with rounded numbers.

Return type:

dict
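For readers without the richy package at hand, the sketch below mirrors the documented CurrentPrice fields in a standalone frozen dataclass. The class name CurrentPriceSketch, the state strings, the is_closed() rule (anything other than 'open' counts as closed) and the per-type rounding precision are assumptions for illustration; this round() takes a type string rather than an Item instance.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass, field
from typing import Dict


@dataclass(frozen=True)
class CurrentPriceSketch:
    """Hypothetical stand-in with the same fields as richy's CurrentPrice."""

    price: float
    change_value: float
    change_percents: float
    state: str = "open"  # assumed values: open / closed / pre-market / post-market
    # default_factory gives every instance its own dict (rendered as <factory> above)
    closed_market: Dict = field(default_factory=dict)

    def is_closed(self) -> bool:
        # Assumption: any state other than a regular open session counts as closed.
        return self.state != "open"

    def round(self, item_type: str) -> Dict:
        # Assumption: coins keep more decimals than shares, ETFs and indexes.
        digits = 6 if item_type == "coin" else 2
        data = asdict(self)  # deep copy, so the frozen instance itself is untouched
        for key in ("price", "change_value", "change_percents"):
            # Inside a method body, `round` resolves to the builtin, not this method.
            data[key] = round(data[key], digits)
        return data
```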

Manager

Central scraping facade. Every periodic task in Tasks ultimately delegates to one of the methods below. Methods are static – Manager carries no per-instance state.

class richy.core.scraper.Manager

Bases: object

Main manager class for scraping. All scraping methods are placed here.

Basic info

Per-item basic info lookups (name, market, market cap, P/E, EPS, year low / high, holdings count, etc.). Results are written into ItemData by the relevant Celery task.

static Manager.get_share_basic_info(share)

Fetches share basic info via the Rug library.

Parameters:

share (Share) – Share model instance.

Returns:

Basic info as a dict.

Return type:

dict

static Manager.get_etf_basic_info(etf)

Fetches ETF basic info via the Rug library.

Parameters:

etf (Etf) – Etf model instance.

Returns:

Basic info as a dict.

Return type:

dict

static Manager.get_coin_basic_info(coin)

Fetches coin basic info via the Karpet library.

Parameters:

coin (Coin) – Coin model instance.

Returns:

Basic info as a dict.

Return type:

dict
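The three basic-info methods are split by item type, so a caller such as a Celery task has to pick the right one. A minimal sketch of that routing, where the stand-in classes, the fetcher bodies, FETCHERS and refresh_basic_info are all hypothetical and a plain dict stands in for ItemData:

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins for richy's Share / Etf / Coin models.
class Share: ...
class Etf: ...
class Coin: ...


# Hypothetical stand-ins for the three Manager methods; the real ones call Rug or Karpet.
def get_share_basic_info(share: Share) -> Dict[str, Any]:
    return {"source": "rug", "kind": "share"}


def get_etf_basic_info(etf: Etf) -> Dict[str, Any]:
    return {"source": "rug", "kind": "etf"}


def get_coin_basic_info(coin: Coin) -> Dict[str, Any]:
    return {"source": "karpet", "kind": "coin"}


# Exact-type lookup table: one fetcher per item model.
FETCHERS: Dict[type, Callable[[Any], Dict[str, Any]]] = {
    Share: get_share_basic_info,
    Etf: get_etf_basic_info,
    Coin: get_coin_basic_info,
}


def refresh_basic_info(item: Any, item_data: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fetcher for the item's type and merge its result into item_data."""
    fetcher = FETCHERS.get(type(item))
    if fetcher is None:
        raise TypeError(f"no basic-info fetcher for {type(item).__name__}")
    item_data.update(fetcher(item))
    return item_data
```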

Current price

static Manager.get_current_price_and_change(item)

Fetches the current market price, the market state and the price change, both as an absolute value and as a percentage.

Parameters:

item (Item) – Item model instance to fetch the price for.

Returns:

Dataclass with price, state and change values.

Return type:

CurrentPrice
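A minimal sketch of the arithmetic behind change_value and change_percents, assuming both are measured against the previous regular-session close (the reference price the real method uses is not stated here). compute_change is a hypothetical helper:

```python
from typing import Tuple


def compute_change(price: float, previous_close: float) -> Tuple[float, float]:
    """Return (change_value, change_percents) relative to previous_close."""
    if previous_close == 0:
        # A zero reference price would make the percentage undefined.
        raise ValueError("previous_close must be non-zero")
    change_value = price - previous_close
    change_percents = change_value / previous_close * 100
    return change_value, change_percents
```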

Price history

Bulk price fetchers. Each returns a pandas.DataFrame ready to upsert into Price. fetch_intraday_prices returns the past-day, fine-grained data used by the Eye component.
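Since the fetchers download whole histories, repeated runs overlap with rows that are already stored; an upsert updates the existing days and inserts only the new ones. The sketch below shows that merge on plain dicts; the in-memory table, its (item id, day) key and upsert_prices are hypothetical stand-ins for the Price model. With a real DataFrame, the rows could come from df.reset_index().to_dict("records").

```python
from datetime import date
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical in-memory stand-in for the Price table, keyed by (item id, day).
PriceTable = Dict[Tuple[int, date], Dict[str, float]]


def upsert_prices(table: PriceTable, item_id: int, rows: Iterable[Mapping]) -> int:
    """Insert rows for new days, overwrite rows for days already stored.

    Returns the number of newly inserted days.
    """
    inserted = 0
    for row in rows:
        key = (item_id, row["Date"])
        if key not in table:
            inserted += 1
        # Everything except the index column becomes the stored values.
        table[key] = {col: val for col, val in row.items() if col != "Date"}
    return inserted
```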

static Manager.fetch_share_prices(share, history='max')

Downloads all prices for the share. Returns a dataframe with the following columns:

  • Date (index)

  • Open

  • High

  • Low

  • Close

  • Volume

  • Dividends

  • Stock Splits

Parameters:

share (Share) – Share model instance we want prices for.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame

static Manager.fetch_etf_prices(etf)

Downloads all prices for the ETF. Returns a dataframe with the following columns:

  • Date (index)

  • Open

  • High

  • Low

  • Close

  • Volume

  • Dividends

  • Stock Splits

Parameters:

etf (Etf) – Etf model instance we want prices for.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame

static Manager.fetch_index_prices(index)

Downloads all prices for the index. Returns a dataframe with the following columns:

  • Date (index)

  • Open

  • High

  • Low

  • Close

Parameters:

index (Index) – Index model instance we want prices for.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame

static Manager.fetch_coin_prices(coin)

Downloads all prices for the coin since settings.COIN_EPOCH. Returns a dataframe with the following columns:

  • date (index)

  • price

  • market_cap

  • total_volume

Parameters:

coin (Coin) – Coin model instance we want prices for.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame

static Manager.fetch_intraday_prices(item)

Fetches intraday market prices for shares, indexes and ETFs. For coins, prices for the past 24 hours are fetched at 30-minute intervals.

Parameters:

item (Item) – Item model instance we want prices for.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame
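For coins the window is 24 hours with a 30-minute step, so a complete response holds 48 rows. A small sketch that builds the expected timestamp grid, for example to spot gaps in the returned data; intraday_slots is hypothetical, and the real method may align or label its timestamps differently:

```python
from datetime import datetime, timedelta, timezone
from typing import List


def intraday_slots(now: datetime, hours: int = 24, step_minutes: int = 30) -> List[datetime]:
    """Timestamps from `now - hours` up to (but excluding) `now`, every step_minutes."""
    step = timedelta(minutes=step_minutes)
    t = now - timedelta(hours=hours)
    slots = []
    while t < now:
        slots.append(t)
        t += step
    return slots
```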

Financials and ratings

Per-share financial data feeds. Each call upserts a row into Asset with the appropriate type.

static Manager.fetch_financials(share)

Fetches all financial data for the share and writes it directly to the database.

Parameters:

share (Share) – Share whose financials will be downloaded.

static Manager.fetch_ratings(share)

Fetches analyst ratings and saves them as an Asset model record.

Parameters:

share (Share) – Share whose ratings will be downloaded.

static Manager.fetch_price_ratings(share)

Fetches price ratings for the share and writes them directly to the database.

Parameters:

share (Share) – Share whose price ratings will be downloaded.

Dividends

static Manager.get_dividends(share_or_etf)

Fetches dividends for a share or ETF via the Rug library.

Parameters:

share_or_etf (Share or Etf) – Share or Etf model instance.

Returns:

Dividends as a list.

Return type:

list

Holdings

static Manager.fetch_etf_holdings(etf)

Fetches ETF holdings via the Rug library.

Parameters:

etf (Etf) – Etf model instance we want holdings for.

Returns:

List of objects with the keys name, symbol (may be None), instrument and weight (in %).
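Because symbol may be None, anything that labels holdings needs a fallback. A sketch that ranks holdings by weight, assuming each entry supports dict-style access to the documented keys; top_holdings is a hypothetical helper:

```python
from typing import Any, List, Mapping, Tuple


def top_holdings(holdings: List[Mapping[str, Any]], n: int = 3) -> List[Tuple[str, float]]:
    """Return (label, weight) for the n largest positions, heaviest first."""
    ranked = sorted(holdings, key=lambda h: h["weight"], reverse=True)
    # symbol can be None, so fall back to the holding's name for the label.
    return [(h["symbol"] or h["name"], h["weight"]) for h in ranked[:n]]
```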

Downloader

General-purpose HTTP client builder. Downloader.get_client() returns a pre-configured requests.Session for code that fetches URLs directly (outside of what Rug / Karpet / yfinance already wrap). The session ships with:

  • Automatic retries on connection errors and configurable HTTP status codes (defaults: 429, 500, 502, 503, 504).

  • Exponential backoff between attempts; the Retry-After header is honored.

  • A per-request default timeout enforced through a custom HTTP adapter (TimeoutHTTPAdapter) so callers that forget timeout= cannot hang indefinitely.

  • A desktop browser User-Agent header to avoid naive bot filters.

The module-level constant DEFAULT_TIMEOUT (10 seconds) is the fallback timeout used when no explicit value is passed.

Note

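urllib3’s default allowed_methods is preserved, so retries after read errors and on status codes in status_forcelist apply only to idempotent verbs (HEAD, GET, PUT, DELETE, OPTIONS, TRACE). POST and PATCH are retried only when the connection itself could not be established.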
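A minimal sketch of how a client with these properties can be assembled, assuming requests (and its urllib3 dependency) is installed. The function body and the exact User-Agent string are assumptions that follow the description above rather than richy's source; Retry, HTTPAdapter, Session.mount and Session.get_adapter are standard requests/urllib3 APIs.

```python
from typing import Optional

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_TIMEOUT = 10  # seconds, mirrors the documented module constant

# Placeholder desktop-browser User-Agent (assumption: richy's actual string is not shown).
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout if the caller doesn't set one."""

    def __init__(self, *args, timeout: Optional[float] = None, **kwargs):
        self.timeout = DEFAULT_TIMEOUT if timeout is None else timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.request forwards timeout=None when the caller omitted it.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def get_client(retries: int = 5, backoff_factor: float = 1.0,
               status_forcelist: tuple = (429, 500, 502, 503, 504),
               timeout: float = DEFAULT_TIMEOUT) -> requests.Session:
    """Build a Session with retries, exponential backoff and a default timeout."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
        # allowed_methods stays at urllib3's default (idempotent verbs only);
        # Retry-After is honored because respect_retry_after_header defaults to True.
    )
    adapter = TimeoutHTTPAdapter(max_retries=retry, timeout=timeout)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    session.headers["User-Agent"] = USER_AGENT
    return session
```

With this setup, get_client().get(url) falls back to the 10-second default, while get_client().get(url, timeout=30) overrides it for that call only.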

class richy.core.scraper.Downloader

Bases: object

Common application downloader based on the requests library.

class richy.core.scraper.Downloader.TimeoutHTTPAdapter(*args, timeout: float | None = None, **kwargs)

Bases: HTTPAdapter

HTTPAdapter that applies a default timeout if the caller doesn’t set one.

static Downloader.get_client(retries: int = 5, backoff_factor: float = 1.0, status_forcelist: tuple = (429, 500, 502, 503, 504), timeout: float = 10) -> Session

Builds a requests.Session with automatic retries, exponential backoff, and a default per-request timeout.

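Backoff sleep between attempts is:

{backoff_factor} * (2 ** (n - 1))

where n is the number of consecutive failed attempts so far; urllib3 applies no delay before the first retry. E.g. with backoff_factor=1.0 and the default 5 retries, the sleeps are 0s, 2s, 4s, 8s, 16s. Retry also honors the Retry-After header on 429/503 responses.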

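Note: uses urllib3’s default allowed_methods, so retries after read errors and on status codes apply to idempotent methods only (HEAD, GET, PUT, DELETE, OPTIONS, TRACE). POST/PATCH are NOT retried in those cases by default; only connection failures are retried for them.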
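The delays can be checked with a few lines that mirror the rule in urllib3's Retry.get_backoff_time: no delay for the first retry, then backoff_factor * 2 ** (n - 1), capped at urllib3's default maximum of 120 seconds. Retry-After handling and the optional jitter added in urllib3 2.x are left out; backoff_schedule is a hypothetical helper.

```python
from typing import List

BACKOFF_MAX = 120.0  # urllib3's default cap on a single backoff sleep, in seconds


def backoff_schedule(backoff_factor: float, retries: int) -> List[float]:
    """Sleep before each retry, following urllib3's get_backoff_time rule."""
    sleeps = []
    for consecutive_errors in range(1, retries + 1):
        if consecutive_errors <= 1:
            sleeps.append(0.0)  # no delay before the first retry
        else:
            delay = backoff_factor * (2 ** (consecutive_errors - 1))
            sleeps.append(min(BACKOFF_MAX, delay))
    return sleeps
```

With the defaults of get_client, the worst-case total backoff before giving up is 30 seconds, not counting any Retry-After waits or request timeouts.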