# How Snake SAT classification works in procurement
Any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time.
For classification, this means: given a dataset with features and labels, Snake constructs a CNF (Conjunctive Normal Form) Boolean formula directly from the data. Each clause in the formula captures a decision boundary. No gradient descent, no backpropagation, no matrix algebra. The formula IS the model.
```
Training data (features + labels)
  → SAT clause construction (polynomial time)
  → CNF formula φ
  → φ(x) evaluates to class predictions
```
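As a minimal sketch of the last step, a CNF model can be represented and evaluated directly. The clause structure, class ordering, and names below are illustrative assumptions, not Snake's actual internals:

```python
# A formula is a list of clauses; a clause is a list of (feature, value)
# literals. CNF semantics: the formula holds iff every clause contains at
# least one true literal.

def satisfies(formula, x):
    return all(any(x.get(feat) == val for feat, val in clause)
               for clause in formula)

# Toy model: one CNF formula per class, checked in order.
model = {
    "BL":      [[("has_date_livraison", True)]],
    "Facture": [[("has_tva", True)], [("has_rib", True)]],
}

def predict(model, x):
    for cls, formula in model.items():
        if satisfies(formula, x):
            return cls
    return None  # no formula satisfied: abstain
```

Prediction is pure clause evaluation, which is why inference needs no matrix algebra.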
Complexity: O(L × n × m × b) where L = layers, n = samples, m = features, b = bucket size. Linear in samples and features.
4 classes: Facture (invoice), Confirmation (order confirmation), Avoir (credit note), BL (bon de livraison, delivery note). 8 boolean/integer features.
| Feature | Facture | Confirmation | Avoir | BL |
|---|---|---|---|---|
| has_total_ttc | yes | no | yes | no |
| has_tva | yes | no | yes | no |
| has_conditions_paiement | 85% | 30% | no | no |
| has_rib | yes | no | no | no |
| has_date_livraison | no | no | no | yes |
| nb_lignes | 3-15 | 1-12 | 1-5 | 1-20 |
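SAT operates over Boolean literals, so an integer feature like nb_lignes presumably has to be bucketized into threshold literals first, which is one plausible reading of the bucket size b in the complexity formula above. The thresholds here are illustrative, borrowed from the table's ranges, and not Snake's real buckets:

```python
# Hypothetical bucketization: one Boolean literal per threshold.
def bucketize(n, thresholds=(5, 12, 15)):
    return {f"nb_lignes_le_{t}": n <= t for t in thresholds}

# A typical Facture row (nb_lignes = 7, inside the 3-15 range):
features = {"has_total_ttc": True, "has_tva": True, "has_rib": True,
            "has_date_livraison": False, **bucketize(7)}
```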
The SAT clauses read like business rules because they ARE business rules, learned from data:
```
IF has_date_livraison = yes THEN BL
IF has_tva = yes AND has_rib = yes THEN Facture
IF has_tva = yes AND has_conditions_paiement = no AND nb_lignes ≤ 5 THEN Avoir
IF has_tva = no AND has_date_livraison = no THEN Confirmation
```
Snake doesn't know these are "business rules." It finds the minimal set of Boolean clauses that separate the classes. The result happens to be human-readable because the features map to domain concepts.
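Transcribed as ordinary predicates, the rules execute as a plain if-chain. This is a sketch of the prose rules above, not an export of the learned formula:

```python
def classify_doc(f):
    """f: feature dict with the keys from the table above."""
    if f["has_date_livraison"]:
        return "BL"
    if f["has_tva"] and f["has_rib"]:
        return "Facture"
    if f["has_tva"] and not f["has_conditions_paiement"] and f["nb_lignes"] <= 5:
        return "Avoir"
    if not f["has_tva"] and not f["has_date_livraison"]:
        return "Confirmation"
    return None  # uncovered region: abstain rather than guess
```

Rule order matters: the BL check runs first so a delivery date short-circuits everything else.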
First training run: 78.6% accuracy. Confirmation and BL both lacked TVA/TTC, so the model couldn't separate them: the confusion matrix showed 11 Confirmation→BL and 5 BL→Confirmation errors.
The fix: make has_date_livraison the clean BL separator by removing date_livraison from the Confirmation training data entirely. Result: 97.1%, with BL and Confirmation at 100%. The remaining 4 errors sit on the Avoir↔Facture boundary (credit notes with unusual nb_lignes).
This is the data iteration loop working. Not hyperparameter tuning — feature engineering on the training distribution.
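The fix described above amounts to a one-pass filter over the training rows. The (features, label) tuple format is an assumption for illustration, not Snake's actual training schema:

```python
# Strip the date_livraison signal from Confirmation rows so
# has_date_livraison becomes a clean BL separator.
def clean_training_set(rows):
    return [
        ({k: v for k, v in feats.items() if k != "has_date_livraison"}, label)
        if label == "Confirmation" else (feats, label)
        for feats, label in rows
    ]
```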
3 classes: Normal, Alerte, Critique. 5 numeric features.
| Feature | Description |
|---|---|
ecart_pct | Price deviation vs reference (%) |
ecart_historique_pct | Price deviation vs historical average (%) |
nb_lignes_ecart | Number of lines with deviations in same invoice |
montant_ecart | Total monetary impact of deviation (€) |
fournisseur_fiabilite | Supplier reliability score (0-1) |
The model converges on threshold-like decisions — but with nuance:
```
|ecart_pct| < 5%  AND fournisseur_fiabilite > 0.70                  → Normal
|ecart_pct| 5-15% OR (|ecart_pct| < 5% AND fiabilite < 0.60)        → Alerte
|ecart_pct| > 15% OR nb_lignes_ecart ≥ 3                            → Critique
```
The fournisseur_fiabilite interaction is the interesting finding: a low-reliability supplier with even a small price deviation triggers an Alerte. Snake learned a business rule that wasn't explicitly programmed: trust the supplier's track record, not just the numbers.
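The threshold rules read as an executable sketch, transcribed from the prose rather than exported from the model. Check order matters: Critique runs first so a large deviation is never downgraded, and the uncovered gap (small deviation, fiabilite between 0.60 and 0.70) defaults conservatively to Alerte, which is my assumption, not a stated rule:

```python
def classify_ecart(ecart_pct, fiabilite, nb_lignes_ecart=0):
    if abs(ecart_pct) > 15 or nb_lignes_ecart >= 3:
        return "Critique"
    if 5 <= abs(ecart_pct) <= 15 or (abs(ecart_pct) < 5 and fiabilite < 0.60):
        return "Alerte"
    if abs(ecart_pct) < 5 and fiabilite > 0.70:
        return "Normal"
    return "Alerte"  # assumed conservative default for the uncovered gap
```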
Article reference matching (VS-TMP10 → #33107 "Trempe 10mm clair") is not rebuilt here. snake.aws.monce.ai hosts a factory-specific global_model with a 3-tier cascade.
Procurement classifier delegates ref matching and owns anomaly detection. Separation of concerns: one service classifies articles, another detects price anomalies on classified articles.
The /comprendre endpoint (full docs) operates in two modes:
- **Without Claude (`anthropic=false`)**: Regex parser extracts quantities, prices, product keywords, supplier names. The Snake doc_type model classifies the document; the price_anomaly model detects deviations. No external API call. ~5ms. Always works.
- **With Claude (`anthropic=true`)**: Claude Haiku parses free text into structured JSON, handling ambiguity, context, and normalization. Snake models then classify the Haiku-extracted data. ~1.2s. Requires ANTHROPIC_API_KEY.

The key insight: Snake classification accuracy is the same in both modes. The models see features, not raw text. The quality difference is entirely in extraction precision, i.e. whether Haiku or regex builds the feature dict. On well-structured text with clear numbers and product keywords, regex + Snake gets you 80% of the Haiku result at 250x the speed.
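The regex path can be sketched like this. Patterns and field names are illustrative, not the service's actual parser; the point is that both paths emit the same feature-dict shape, which is why Snake accuracy is mode-independent:

```python
import re

def extract_features_regex(text):
    """Hypothetical fast-path extractor building the classifier's feature dict."""
    return {
        "has_tva": bool(re.search(r"\bTVA\b", text, re.I)),
        "has_total_ttc": bool(re.search(r"\bTTC\b", text, re.I)),
        "has_rib": bool(re.search(r"\bRIB\b|\bIBAN\b", text, re.I)),
        "has_date_livraison": bool(re.search(r"livraison", text, re.I)),
        # Count lines that look like "qty x product" order lines.
        "nb_lignes": sum(1 for l in text.splitlines()
                         if re.match(r"\s*\d+\s*[xX×]", l)),
    }
```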
Charles Dana — Monce SAS — 2026