The Math

How Snake SAT classification works in procurement

The Dana Theorem

Any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time.

For classification, this means: given a dataset with features and labels, Snake constructs a CNF (Conjunctive Normal Form) Boolean formula directly from the data. Each clause in the formula captures a decision boundary. No gradient descent, no backpropagation, no matrix algebra. The formula IS the model.

Training data (features + labels)
  → SAT clause construction (polynomial time)
    → CNF formula φ
      → φ(x) evaluates to class predictions

Complexity: O(L × n × m × b) where L = layers, n = samples, m = features, b = bucket size. Linear in samples and features.
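The evaluation half of the pipeline above can be sketched in a few lines. The clause representation and function names here are illustrative assumptions, not Snake's actual internals:

```python
# Illustrative CNF representation: a clause is a list of literals, and a
# literal is (feature, expected_value, polarity). A CNF formula (a list
# of clauses) is satisfied iff every clause has at least one true literal.

def evaluate_cnf(clauses, x):
    return all(
        any((x.get(feat) == val) == pol for feat, val, pol in clause)
        for clause in clauses
    )

def predict(formulas, x):
    # One formula per class; return the first class whose formula x satisfies.
    for label, clauses in formulas.items():
        if evaluate_cnf(clauses, x):
            return label
    return None
```

For instance, a one-clause formula {"BL": [[("has_date_livraison", "oui", True)]]} makes predict return "BL" for any sample carrying a delivery date.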

doc_type_model — 97.1% Accuracy

4 classes: Facture, Confirmation, Avoir, BL. 8 boolean/integer features.

Feature                   Facture   Confirmation   Avoir   BL
has_total_ttc             oui       non            oui     non
has_tva                   oui       non            oui     non
has_conditions_paiement   85%       30%            non     non
has_rib                   oui       non            non     non
has_date_livraison        non       non            non     oui
nb_lignes                 3-15      1-12           1-5     1-20

What Snake learned

The SAT clauses read like business rules because they ARE business rules, learned from data:

IF has_date_livraison = oui THEN BL
IF has_tva = oui AND has_rib = oui THEN Facture
IF has_tva = oui AND has_conditions_paiement = non AND nb_lignes ≤ 5 THEN Avoir
IF has_tva = non AND has_date_livraison = non THEN Confirmation

Snake doesn't know these are "business rules." It finds the minimal set of Boolean clauses that separate the classes. The result happens to be human-readable because the features map to domain concepts.
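Transcribed directly, the four clauses amount to this if-chain. This is a sketch: the evaluation order and the fall-through behavior are assumptions, since the source does not say how overlapping clauses are resolved:

```python
def classify_doc(f):
    # The four learned clauses, checked in the order listed in the text.
    if f["has_date_livraison"] == "oui":
        return "BL"
    if f["has_tva"] == "oui" and f["has_rib"] == "oui":
        return "Facture"
    if (f["has_tva"] == "oui" and f["has_conditions_paiement"] == "non"
            and f["nb_lignes"] <= 5):
        return "Avoir"
    if f["has_tva"] == "non" and f["has_date_livraison"] == "non":
        return "Confirmation"
    return None  # no clause fired; the real model carries more clauses
```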

The V1 → V2 iteration

First training: 78.6%. Confirmation and BL both lacked TVA/TTC — the model couldn't separate them. The confusion matrix showed 11 Confirmation→BL and 5 BL→Confirmation errors.

Fix: make has_date_livraison the clean BL separator. Remove date_livraison from Confirmation training data entirely. Result: 97.1%. BL and Confirmation now at 100%. Remaining 4 errors: Avoir↔Facture boundary (avoirs with unusual nb_lignes).

This is the data iteration loop working. Not hyperparameter tuning — feature engineering on the training distribution.
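One reading of the V2 fix as a data transform (the row format and field names are hypothetical; the source only says date_livraison was removed from the Confirmation training data):

```python
def apply_v2_fix(rows):
    # Force has_date_livraison to "non" on every Confirmation training row,
    # so the feature becomes a clean BL-only separator.
    return [
        {**r, "has_date_livraison": "non"} if r["label"] == "Confirmation" else r
        for r in rows
    ]
```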

price_anomaly_model — 98.6% Accuracy

3 classes: Normal, Alerte, Critique. 5 numeric features.

Feature                 Description
ecart_pct               Price deviation vs reference (%)
ecart_historique_pct    Price deviation vs historical average (%)
nb_lignes_ecart         Number of lines with deviations in same invoice
montant_ecart           Total monetary impact of deviation (€)
fournisseur_fiabilite   Supplier reliability score (0-1)

What Snake learned

The model converges on threshold-like decisions — but with nuance:

|ecart_pct| < 5%  AND  fournisseur_fiabilite > 0.70 → Normal
5% ≤ |ecart_pct| ≤ 15%  OR  (|ecart_pct| < 5% AND fournisseur_fiabilite < 0.60) → Alerte
|ecart_pct| > 15%  OR  nb_lignes_ecart ≥ 3 → Critique

The fournisseur_fiabilite interaction is the interesting finding: a low-reliability supplier with even a small price deviation triggers an Alerte. Snake learned a business rule that wasn't explicitly programmed: trust the supplier's track record, not just the numbers.
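The three thresholds read as a severity cascade. A sketch, checking the most severe class first; the evaluation order and the default for the fiabilite gap between 0.60 and 0.70 are assumptions not stated in the source:

```python
def classify_price(f):
    e = abs(f["ecart_pct"])
    # Most severe first: large deviation, or repeated deviations on one invoice.
    if e > 15 or f["nb_lignes_ecart"] >= 3:
        return "Critique"
    # Mid-band deviation, or a small one from a low-reliability supplier.
    if e >= 5 or f["fournisseur_fiabilite"] < 0.60:
        return "Alerte"
    if f["fournisseur_fiabilite"] > 0.70:
        return "Normal"
    return "Alerte"  # fiabilite in the 0.60-0.70 gap: assumed conservative default
```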

Ref Matching Delegation

Article reference matching (VS-TMP10 → #33107 "Trempe 10mm clair") is not rebuilt here. snake.aws.monce.ai hosts a factory-specific global_model with a 3-tier cascade:

  1. Exact match (ref code lookup)
  2. Snake SAT classification (description → article)
  3. Fuzzy fallback (Levenshtein/Jaccard)

Procurement classifier delegates ref matching and owns anomaly detection. Separation of concerns: one service classifies articles, another detects price anomalies on classified articles.
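The cascade can be sketched as follows. The function name and signature are hypothetical, and difflib's ratio stands in for the Levenshtein/Jaccard fallback:

```python
from difflib import SequenceMatcher

def match_article(ref, description, ref_index, catalog, classify=None):
    # Tier 1: exact ref-code lookup, e.g. "VS-TMP10" -> article id.
    if ref in ref_index:
        return ref_index[ref]
    # Tier 2: model classification on the description (Snake SAT in the
    # real service; here any callable description -> article id or None).
    if classify is not None:
        hit = classify(description)
        if hit is not None:
            return hit
    # Tier 3: fuzzy fallback over known descriptions.
    best = max(catalog, key=lambda d: SequenceMatcher(
        None, description.lower(), d.lower()).ratio())
    return catalog[best]
```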

Dual-Mode Architecture

The /comprendre endpoint (full docs) operates in two modes:

  1. Default (anthropic=false): Regex parser extracts quantities, prices, product keywords, supplier names. Snake doc_type model classifies the document. Snake price_anomaly detects ecarts. No external API call. ~5ms. Always works.
  2. Haiku-enhanced (anthropic=true): Claude Haiku parses free text into structured JSON — handles ambiguity, context, normalization. Snake models then classify the Haiku-extracted data. ~1.2s. Requires ANTHROPIC_API_KEY.

The key insight: Snake classification accuracy is the same in both modes. The models see features, not raw text. The quality difference is entirely in extraction precision — whether Haiku or regex builds the feature dict. On well-structured text with clear numbers and product keywords, regex + Snake gets you 80% of the Haiku result at 250x the speed.
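A sketch of the default-mode extraction that builds the feature dict the Snake models consume. The keyword list and patterns are illustrative; the real regex parser is not shown in the source:

```python
import re

def extract_features_regex(text):
    low = text.lower()
    present = lambda kw: "oui" if kw in low else "non"
    return {
        "has_tva": present("tva"),
        "has_total_ttc": present("ttc"),
        "has_rib": present("rib"),
        "has_date_livraison": present("livraison"),
        # Count product lines by a "qty x item" pattern, e.g. "3 x Vitrage".
        "nb_lignes": len(re.findall(r"\b\d+\s*x\s+\S", low)),
    }
```

Either extractor (regex or Haiku) must emit this same dict shape, which is why classification accuracy is identical across modes.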

Charles Dana — Monce SAS — 2026