Six languages, real morphology, growing every day.
Romanian, English, German, French, Spanish, Italian. Singular ⇄ plural, diacritics, umlauts, accent shifts — all wired in. The platform vocabulary self-grows from real client catalogues and from real user search patterns. A new client onboarded tomorrow inherits everything Skryx has learned so far.
// Live platform vocabulary (Jun 2026)
RO · 439 pairs · 107 trusted
EN · 353 pairs · 329 trusted
DE · 234 pairs · 89 trusted
FR · 462 pairs · 438 trusted
ES · 437 pairs · 378 trusted
IT · 369 pairs · 294 trusted
─────────────────────────────────
2,294 pairs · 1,635 trusted
// Trusted pairs auto-apply at onboarding;
// the new tenant doesn't click anything.
One pluggable interface. Six languages today.
Every language plugs in via MorphologyServiceInterface, the same contract that
Coach, synonym generation, and search-time form detection all consume. Adding a seventh
language takes one constructor line in MorphologyManager plus one service class.
Each service ships with a curated table of irregular plurals, suffix rules with
confidence scoring, and a normaliser that handles diacritics / accents / umlauts.
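As a rough illustration of the plug-in contract described above, here is a minimal Python sketch. The method names (pluralize, singularize, normalize), the Inflection record, and the toy EnglishMorphology class are assumptions made for this example; only the names MorphologyServiceInterface and MorphologyManager come from the text, and the real signatures may differ.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Inflection:
    form: str          # the suggested form
    confidence: float  # 0.0 to 1.0, taken from the table or rule that fired


class MorphologyServiceInterface(Protocol):
    """Hypothetical shape of the per-language contract."""
    locale: str

    def pluralize(self, word: str) -> Optional[Inflection]: ...
    def singularize(self, word: str) -> Optional[Inflection]: ...
    def normalize(self, word: str) -> str: ...


class EnglishMorphology:
    """Toy service: a tiny irregular table plus two suffix rules."""
    locale = "en"
    _irregular = {"mouse": "mice", "man": "men", "datum": "data"}

    def normalize(self, word: str) -> str:
        return word.strip().lower()

    def pluralize(self, word: str) -> Optional[Inflection]:
        key = self.normalize(word)
        if key in self._irregular:
            return Inflection(self._irregular[key], 1.0)
        if key.endswith(("s", "x", "ch", "sh")):
            return Inflection(key + "es", 0.9)
        return Inflection(key + "s", 0.7)

    def singularize(self, word: str) -> Optional[Inflection]:
        key = self.normalize(word)
        for singular, plural in self._irregular.items():
            if plural == key:
                return Inflection(singular, 1.0)
        if key.endswith("es") and key[:-2].endswith(("s", "x", "ch", "sh")):
            return Inflection(key[:-2], 0.9)
        if key.endswith("s") and len(key) > 1:
            return Inflection(key[:-1], 0.7)
        return None


class MorphologyManager:
    def __init__(self) -> None:
        # Adding a language means adding one more instance to this tuple.
        services = (EnglishMorphology(),)
        self._services = {s.locale: s for s in services}

    def for_locale(self, locale: str) -> MorphologyServiceInterface:
        return self._services[locale]
```

Every consumer then asks the manager for a locale and calls the same three methods, whatever language sits behind them.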
The native ground truth.
Where Skryx was born. Romanian morphology covers vowel-mutation feminines (roată ⇄ roți, lampă ⇄ lămpi), neuter -uri plurals (timp ⇄ timpuri), -tor → -toare patterns, -ie → -ii, and the famous irregulars (om ⇄ oameni, copil ⇄ copii). Diacritics fold for normalisation but the original spelling is preserved in suggestions.
roată ⇄ roți
lampă ⇄ lămpi
om ⇄ oameni
motor ⇄ motoare
supapă ⇄ supape
rulment ⇄ rulmenți
încărcător ⇄ încărcătoare
From mouse ⇄ mice to analysis ⇄ analyses.
Irregular classics (man → men, foot → feet, child → children), -f / -fe → -ves (wife → wives, leaf → leaves), invariable plurals (sheep, fish, deer), and the Latin/Greek borrowings (analysis → analyses, datum → data, matrix → matrices). Regular -s / -es / -ies suffix rules with per-pattern confidence weights.
mouse ⇄ mice
wolf ⇄ wolves
city ⇄ cities
box ⇄ boxes
analysis ⇄ analyses
datum ⇄ data
sheep ⇄ sheep
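The layering described above (curated tables first, then suffix rules that each carry a confidence weight) might look like the following sketch. The table contents are a small subset taken from the examples on this page, and the confidence numbers are placeholders chosen for illustration, not Skryx's real weights.

```python
import re

# Curated tables are consulted before any rule (contents are illustrative).
IRREGULAR = {
    "mouse": "mice", "man": "men", "foot": "feet", "child": "children",
    "analysis": "analyses", "datum": "data", "matrix": "matrices",
}
INVARIABLE = {"sheep", "fish", "deer"}
# -f / -fe words are listed explicitly because the pattern is not universal.
F_TO_VES = {"wolf", "wife", "leaf"}

# (pattern, replacement, confidence); the first matching rule wins.
SUFFIX_RULES = [
    (re.compile(r"([^aeiou])y$"), r"\1ies", 0.95),  # city -> cities
    (re.compile(r"(s|x|ch|sh)$"), r"\1es", 0.95),   # box -> boxes
    (re.compile(r"$"), "s", 0.80),                  # fallback: append -s
]


def pluralize(word):
    """Return (plural, confidence) for an English noun."""
    w = word.lower()
    if w in INVARIABLE:
        return w, 1.0
    if w in IRREGULAR:
        return IRREGULAR[w], 1.0
    if w in F_TO_VES:
        return re.sub(r"fe?$", "ves", w), 1.0
    for pattern, replacement, confidence in SUFFIX_RULES:
        if pattern.search(w):
            return pattern.sub(replacement, w, count=1), confidence
    return w, 0.0
```

A downstream consumer could then compare the returned confidence against a threshold before treating a generated form as a synonym candidate.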
Mann ⇄ Männer, Buch ⇄ Bücher, Auto ⇄ Autos.
Eight plural classes with frequent umlaut mutation (Mann → Männer, Apfel → Äpfel). Normalisation handles ß ↔ ss and folds umlauts to base vowels so a tenant whose catalogue uses Strasse and another uses Straße both find the right products. Loanword -s plurals (Autos, Hotels) supported.
Mann ⇄ Männer
Apfel ⇄ Äpfel
Buch ⇄ Bücher
Auto ⇄ Autos
Frage ⇄ Fragen
Schlüssel ⇄ Schlüssel
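One common way to get the folding behaviour described above (ß treated as ss, umlauts and other diacritics reduced to base letters, original spelling kept for display) is Unicode case folding followed by NFD decomposition. The sketch below uses only the Python standard library; the function names fold and build_index are hypothetical, and this is not necessarily how Skryx's normaliser is written.

```python
import unicodedata


def fold(word):
    """Return a comparison key: case-folded, with combining marks removed."""
    folded = word.casefold()                       # also maps ß to ss
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(words):
    """Map each folded key to the original spellings that produced it."""
    index = {}
    for word in words:
        index.setdefault(fold(word), []).append(word)
    return index
```

Lookups go through the folded key, while suggestions are drawn from the stored originals, so Straße and Strasse match each other without either spelling being rewritten. Ligatures such as œ have no canonical decomposition and would need an explicit mapping on top of this.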
French, Spanish, Italian — the patterns that aren't just +s.
French -al → -aux (cheval → chevaux, journal → journaux), the famous -ou → -oux seven (bijou → bijoux, chou → choux, hibou → hiboux), œil → yeux. Spanish accent shifts (nación → naciones, jardín → jardines), -z → -ces (luz → luces, vez → veces). Italian gender-aware -o → -i, -a → -e, plus the irregulars (uomo → uomini, amico → amici).
// French
cheval ⇄ chevaux
œil ⇄ yeux
// Spanish
nación ⇄ naciones
luz ⇄ luces
// Italian
uomo ⇄ uomini
amico ⇄ amici
Three channels feed the vocabulary. All on schedule.
The hand-curated tables are the floor, not the ceiling. Three discovery channels run automatically and add pairs to a shared platform-wide vocabulary, so new clients inherit everything Skryx has learned before — at signup, with zero configuration.
Real client catalogues teach the engine.
MorphologyDiscoveryService walks each client catalogue, builds a
word-frequency map, finds Levenshtein-close word pairs that look like
singular/plural variants, and sends each candidate to the Skryx AI for
validation. Confirmed pairs join the platform-wide vocabulary.
- Bucket-by-prefix lookup — handles 50k+ document walks fast
- Top-3 scoring per source word, not first-match-only
- Skips pairs already known to the static or learned set
// Wed 02:30, UTB Romanian catalogue
candidates: 3,162
validated: 235
rejected: 2,379
skipped_known: 548
// Top validated pairs (sample)
rolă ⇄ role 296×
tijă ⇄ tije 129×
cupă ⇄ cupe 118×
dinte ⇄ dinți 94×
valvă ⇄ valve 94×
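The candidate-generation step described above can be sketched roughly as follows: count words, bucket them by prefix so that only words sharing a prefix are compared, keep the closest matches per source word, and skip pairs that are already known. The prefix length, distance limit, and function names here are assumptions for illustration, and the AI validation step is left out entirely.

```python
from collections import Counter, defaultdict


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def discover(documents, known_pairs, prefix_len=3, max_distance=2, top_k=3):
    """Return candidate (word_a, word_b, combined_frequency) rows."""
    freq = Counter(w for doc in documents for w in doc.lower().split())

    # Bucket by prefix: only words in the same bucket are ever compared.
    buckets = defaultdict(list)
    for word in freq:
        if len(word) > prefix_len:
            buckets[word[:prefix_len]].append(word)

    found = {}
    for words in buckets.values():
        for src in words:
            scored = []
            for other in words:
                pair = frozenset((src, other))
                if other == src or pair in known_pairs:
                    continue  # skip self-matches and already-known pairs
                dist = levenshtein(src, other)
                if dist <= max_distance:
                    scored.append((dist, -freq[other], other))
            # Keep the top_k closest (then most frequent) partners per word.
            for dist, _, other in sorted(scored)[:top_k]:
                found[frozenset((src, other))] = freq[src] + freq[other]

    return sorted(((*sorted(p), n) for p, n in found.items()), key=lambda row: -row[2])
```

Each returned row would then be sent on for validation; nothing in this sketch decides which word is the singular and which the plural.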
Hundreds of pairs per language with one command.
MorphologySeedingService asks the Skryx AI to generate
domain-aware noun pairs for each language (e-commerce, household, fashion,
auto, electronics) and verifies each one against the language's morphology
rules. Bootstraps English, German, French, Spanish, and Italian with roughly 230 to 460
accepted pairs each in minutes, without waiting for the first client to upload a catalogue.
// Seeding output
locale   generated   accepted   duplicates
RO             198         96          102
EN             651        353          298
DE             442        234          208
FR             579        462          117
ES             743        437          306
IT             557        369          188
─────────────────────────────────
             3,170      1,951        1,219
Real user behaviour. The strongest signal.
When a customer types supape at 14:02 and supapă at
14:03 in the same session, that's better evidence than any heuristic.
MorphologyReformulationMiner walks search_events
daily, finds consecutive same-session queries that look morphologically related,
aggregates by frequency, validates with the Skryx AI, and adds confirmed pairs
to the platform vocabulary.
// 583k searches mined, last 30 days
candidates: 300
validated: 32
rejected: 265
batches: 12
// Sample reformulation-mined pair
{
  "singular": "supapă",
  "plural": "supape",
  "source": "reformulation_mining",
  "signal": "8 sessions, 30d"
}
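The mining pass described above could be approximated like this: group events by session, look at consecutive queries, keep the ones that look morphologically related, and count distinct sessions per pair. The event tuple layout, the two-minute gap, the shared-prefix heuristic in looks_related, and the min_sessions cut-off are all assumptions made for the sketch; AI validation is again omitted.

```python
from collections import defaultdict
from datetime import timedelta


def looks_related(a, b):
    """Crude stand-in heuristic: single words sharing most of a prefix."""
    if " " in a or " " in b:
        return False
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared >= max(3, min(len(a), len(b)) - 2)


def mine(events, max_gap=timedelta(minutes=2), min_sessions=2):
    """events: iterable of (session_id, timestamp, query) tuples."""
    by_session = defaultdict(list)
    for session_id, ts, query in events:
        by_session[session_id].append((ts, query.strip().lower()))

    sessions_per_pair = defaultdict(set)
    for session_id, rows in by_session.items():
        rows.sort()  # chronological order within the session
        for (t1, q1), (t2, q2) in zip(rows, rows[1:]):
            if q1 != q2 and t2 - t1 <= max_gap and looks_related(q1, q2):
                sessions_per_pair[frozenset((q1, q2))].add(session_id)

    # Aggregate by the number of distinct sessions showing the reformulation.
    return {
        pair: len(sessions)
        for pair, sessions in sessions_per_pair.items()
        if len(sessions) >= min_sessions
    }
```

Counting distinct sessions rather than raw events keeps one user retyping the same query from inflating the signal.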
New clients inherit what everyone learned before them.
Pairs that cross a trust threshold (a high-confidence AI seed, OR multiple tenants
applying the pair independently, OR strong catalogue frequency) get marked
is_platform_trusted. At new-tenant onboarding, Auto-Pilot reads the
trusted set for the detected locale, filters to pairs whose words appear in this
client's catalogue, and applies them as synonyms directly, with no operator click required.
Operators can revoke any pair individually from the Synonyms page.
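In code, the trust rule and the onboarding filter described above might reduce to something like the sketch below. The field names and all three threshold values are placeholders invented for the example, and matching on either word of a pair (rather than both) is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    singular: str
    plural: str
    locale: str
    ai_confidence: float = 0.0    # hypothetical field
    tenants_applied: int = 0      # hypothetical field
    catalogue_frequency: int = 0  # hypothetical field


def is_platform_trusted(pair, min_conf=0.9, min_tenants=2, min_freq=50):
    """Any one of the three signals is enough (thresholds are placeholders)."""
    return (
        pair.ai_confidence >= min_conf
        or pair.tenants_applied >= min_tenants
        or pair.catalogue_frequency >= min_freq
    )


def pairs_for_new_tenant(vocabulary, locale, catalogue_words):
    """Trusted pairs for this locale that touch the tenant's catalogue."""
    return [
        p for p in vocabulary
        if p.locale == locale
        and is_platform_trusted(p)
        and (p.singular in catalogue_words or p.plural in catalogue_words)
    ]
```

Filtering against the catalogue keeps a tenant's synonym list limited to words that tenant actually sells.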
152 assertions across 31 tests. 100% precision on pinned vocabulary.
Every language ships with a fixture set — 50-100 hand-verified pairs per locale stored as JSON. PHPUnit runs the pluralisation, singularisation, form detection, confidence threshold, and round-trip checks across all six languages on every PR. Locale detection eval runs across 20 real-world catalogue sentences. No regression ships unnoticed.
Operator dismissals become platform-wide signal.
If a Romanian e-commerce operator dismisses cablu ⇄ case as a bad
pair, Skryx remembers — for THEM. If three independent operators in the same
locale dismiss the same pair, it gets marked globally bad and auto-filtered
from all future suggestions for everyone. The platform learns from rejection
as much as from acceptance.
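The two-level dismissal rule above (personal at first, global once three independent operators in the same locale agree) can be sketched with a small ledger. The class and method names are hypothetical; only the threshold of three comes from the text.

```python
from collections import defaultdict

GLOBAL_THRESHOLD = 3  # three independent operators, per the text above


class DismissalLedger:
    def __init__(self):
        # (locale, unordered pair) -> set of operator ids who dismissed it
        self._operators = defaultdict(set)

    def dismiss(self, operator_id, locale, word_a, word_b):
        # A set makes repeated dismissals by the same operator count once.
        self._operators[(locale, frozenset((word_a, word_b)))].add(operator_id)

    def is_hidden_for(self, operator_id, locale, word_a, word_b):
        operators = self._operators.get((locale, frozenset((word_a, word_b))), set())
        dismissed_personally = operator_id in operators
        dismissed_globally = len(operators) >= GLOBAL_THRESHOLD
        return dismissed_personally or dismissed_globally
```

Keying on the locale keeps a dismissal in one language from suppressing a pair that happens to be spelled the same way in another.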