About the Method

Archetypes first. Then synergy, uncertainty, and draft pressure.

The method presented on this website builds directed graph networks from MTG Limited data. It learns archetypes from decklists, then measures single-card strength, pair synergy, triplet motifs, and draft urgency inside each learned shell.

The formulas below describe the current implementation directly.

1. Archetypes from full decklists

Let \(X\) be the deck-by-card matrix built from the deck_* columns of the game export. Each row is one deck build and each column is one card. Cards that appear in more than 55% of decks are removed before clustering because they are too common to separate one archetype from another. A card that appears in almost every deck contributes very little information about which shell a deck belongs to.

The weighting used before clustering is a classic TF-IDF transform. TF-IDF stands for term frequency-inverse document frequency. Here the “term” is a card and the “document” is a deck. The first factor grows with the number of copies of a card in a deck, but only logarithmically, so the fourth copy adds less than the second; the second factor downweights cards that are common across many decks.

\[ w(d,c) = (1 + \log x(d,c)) \cdot \log\!\left(\frac{1 + D}{1 + df(c)}\right) \]

\(d\): one deck build.
\(c\): one card.
\(D\): total number of deck builds in the dataset.
\(x(d,c)\): number of copies of card \(c\) in deck \(d\).
\(df(c)\): number of deck builds that contain card \(c\).
\(w(d,c)\): weighted deck feature used by the clustering model.

The weighted matrix is reduced with truncated SVD and then clustered with MiniBatch K-means.
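
As a rough illustration of the weighting step only (the SVD and clustering stages are left out), the sketch below applies the 55% prevalence filter and the formula for \(w(d,c)\) to a toy input. The function name `tfidf_weights`, the dict-based deck representation, and the card names are hypothetical. It also assumes that a card absent from a deck simply gets no weight, since \(\log x(d,c)\) is undefined at zero copies.

```python
import math

def tfidf_weights(decks, max_share=0.55):
    """decks: list of dicts mapping card name -> number of copies (hypothetical layout)."""
    n_decks = len(decks)  # D in the formula

    # df(c): number of decks that contain card c at least once.
    df = {}
    for deck in decks:
        for card, copies in deck.items():
            if copies > 0:
                df[card] = df.get(card, 0) + 1

    # Drop cards that appear in more than max_share of all decks.
    kept = {card for card, count in df.items() if count / n_decks <= max_share}

    weighted = []
    for deck in decks:
        row = {}
        for card, copies in deck.items():
            # ASSUMPTION: absent cards (0 copies) contribute no feature at all.
            if copies > 0 and card in kept:
                tf = 1.0 + math.log(copies)
                idf = math.log((1.0 + n_decks) / (1.0 + df[card]))
                row[card] = tf * idf
        weighted.append(row)
    return weighted

# Toy data: "filler" is in every deck, so the 55% filter removes it.
toy_decks = [
    {"filler": 3, "card_a": 2, "card_b": 1},
    {"filler": 2, "card_a": 1},
    {"filler": 4, "card_c": 2},
    {"filler": 3, "card_c": 1, "card_b": 1},
]
toy_rows = tfidf_weights(toy_decks)
```

In this toy input, two copies of `card_a` receive a larger weight than one copy, but less than double, which is the sublinear behaviour of the first factor.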

2. Signposts

For archetype \(k\), the signpost statistic asks whether a card appears much more often inside that archetype than outside it.

\[ \ell(c) = \log\!\left(\frac{a_c + \alpha_c}{N_k - a_c + \beta_c}\right) - \log\!\left(\frac{b_c + \alpha_c}{N_{\neg k} - b_c + \beta_c}\right) \]

\(c\): one card.
\(a_c\): number of decks inside archetype \(k\) that contain card \(c\).
\(b_c\): number of decks outside archetype \(k\) that contain card \(c\).
\(N_k\): number of decks inside archetype \(k\).
\(N_{\neg k}\): number of decks outside archetype \(k\).
\(\ell(c)\): smoothed log-odds contrast for card \(c\).

A positive value of \(\ell(c)\) means the card is concentrated inside the archetype. A larger value means that concentration is stronger.

The prior is centered on the card's global prevalence. In plain terms, the method starts from the card's average frequency in the whole format and then asks how much the archetype deviates from that average. This keeps the statistic stable when sample sizes are modest.

\[ p_{0,c} = \frac{a_c + b_c}{D}, \qquad \alpha_c = 1 + 28\,p_{0,c}\,w_c, \qquad \beta_c = 1 + 28\,(1-p_{0,c}) \]

\(p_{0,c}\): global deck prevalence of card \(c\).
\(D\): total number of deck builds in the whole dataset.
\(\alpha_c\) and \(\beta_c\): prior pseudo-counts that smooth the in-archetype and out-of-archetype log-odds.
\(w_c\): rarity weight used inside the prior.

The rarity weight slightly decreases the influence of rare and mythic cards. The practical goal is to reduce the chance that a rare card rises to the top of the signpost list mainly because it is scarce and powerful.

\[ z(c) = \frac{\ell(c)}{\sqrt{\mathrm{Var}(\ell(c))}}, \qquad signpost\_score(c) = z(c)\sqrt{a_c} \]

\(z(c)\): standardized signpost strength.
\(signpost\_score(c)\): final signpost score used in the exported data.
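
The smoothed log-odds contrast can be sketched in a few lines. This is a minimal illustration, not the exported pipeline: the function name and argument names are hypothetical, \(D\) is taken to be \(N_k + N_{\neg k}\), and the rarity weight \(w_c\) is passed in as a plain number because its actual values are not given here. The standardization step is omitted because the form of \(\mathrm{Var}(\ell(c))\) is not stated.

```python
import math

def signpost_log_odds(a_c, b_c, n_in, n_out, rarity_weight=1.0, strength=28.0):
    """Smoothed log-odds contrast l(c) for one card and one archetype.

    a_c, b_c    : decks containing the card inside / outside the archetype
    n_in, n_out : deck counts inside / outside the archetype
    rarity_weight is a placeholder for w_c (hypothetical default of 1.0).
    """
    total_decks = n_in + n_out              # D, assuming D = N_k + N_not_k
    p0 = (a_c + b_c) / total_decks          # global deck prevalence
    alpha = 1.0 + strength * p0 * rarity_weight
    beta = 1.0 + strength * (1.0 - p0)
    inside = math.log((a_c + alpha) / (n_in - a_c + beta))
    outside = math.log((b_c + alpha) / (n_out - b_c + beta))
    return inside - outside

# A card in 60 of 100 archetype decks but only 40 of 900 other decks.
concentrated = signpost_log_odds(60, 40, 100, 900)
# A card with the same 10% prevalence inside and outside the archetype.
neutral = signpost_log_odds(10, 90, 100, 900)
```

The first call yields a clearly positive contrast, while the second stays close to zero, as expected for a card that does not distinguish the archetype.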

3. Deck-level pairs and triplets

Once the archetypes are learned, the method asks which cards co-occur in completed decklists more often than independence would predict.

\[ E_{\mathrm{deck}}(a,b) = N_k\,\pi(a)\,\pi(b), \qquad L^{\mathrm{deck}}_{ab} = \log_2\!\left(\frac{O_{\mathrm{deck}}(a,b)+1}{E_{\mathrm{deck}}(a,b)+1}\right) \]

\[ E_{\mathrm{deck}}(a,b,c) = N_k\,\pi(a)\,\pi(b)\,\pi(c), \qquad L^{\mathrm{deck}}_{abc} = \log_2\!\left(\frac{O_{\mathrm{deck}}(a,b,c)+1}{E_{\mathrm{deck}}(a,b,c)+1}\right) \]

\(a,b,c\): cards.
\(\pi(a)\): fraction of decks in archetype \(k\) that contain card \(a\). The same definition applies to \(\pi(b)\) and \(\pi(c)\).
\(O_{\mathrm{deck}}(a,b)\): observed number of decks in archetype \(k\) that contain both \(a\) and \(b\).
\(O_{\mathrm{deck}}(a,b,c)\): observed number of decks in archetype \(k\) that contain all three cards.
\(E_{\mathrm{deck}}\): independence expectation inside the archetype.
\(L^{\mathrm{deck}}_{ab}\) and \(L^{\mathrm{deck}}_{abc}\): deck-level log-lift for the pair or triplet.

A positive deck-level lift means the cards appear together in finished decklists more often than independent deck inclusion would predict.
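
The pair and triplet formulas share the same shape, so a single helper can cover both. The sketch below is illustrative; `deck_log_lift` is a hypothetical name, and the inputs are assumed to be precomputed counts and prevalences for one archetype.

```python
import math

def deck_log_lift(observed, n_decks, *prevalences):
    """log2 lift of observed co-occurrence over the independence expectation.

    observed    : decks in the archetype containing all the cards
    n_decks     : N_k, number of decks in the archetype
    prevalences : pi(a), pi(b) for a pair, plus pi(c) for a triplet
    """
    expected = n_decks * math.prod(prevalences)
    return math.log2((observed + 1.0) / (expected + 1.0))

# Two cards each in half of 200 decks: independence predicts 50 shared decks.
pair_at_expectation = deck_log_lift(50, 200, 0.5, 0.5)
pair_above = deck_log_lift(99, 200, 0.5, 0.5)
triplet_below = deck_log_lift(24, 200, 0.5, 0.5, 0.5)
```

The added 1 in numerator and denominator keeps the ratio finite when a combination is never observed and damps the lift for very rare combinations.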

When the code finds a strong deck triplet, it pushes part of that information back into the pair graph. For each retained triplet, it computes a triplet quality score and then gives each of the three pairs inside that triplet a bonus equal to 12% of that quality. If several triplets contain the same pair, the code keeps the largest bonus for that pair.
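
The bonus propagation described above can be sketched as follows. The data layout (a dict from card triples to quality scores) and the function name are assumptions made for illustration.

```python
from itertools import combinations

def triplet_pair_bonuses(triplet_quality, share=0.12):
    """Map each pair to the largest bonus inherited from any retained triplet.

    triplet_quality: dict mapping a 3-tuple of card names to its quality score
    (hypothetical layout). Each pair inside a triplet receives share * quality.
    """
    bonuses = {}
    for cards, quality in triplet_quality.items():
        bonus = share * quality
        # Sort so that (a, b) and (b, a) land on the same key.
        for pair in combinations(sorted(cards), 2):
            bonuses[pair] = max(bonuses.get(pair, 0.0), bonus)
    return bonuses

example_bonuses = triplet_pair_bonuses({
    ("a", "b", "c"): 10.0,
    ("a", "b", "d"): 5.0,
})
```

Here the pair `("a", "b")` belongs to both triplets and keeps the larger of the two bonuses.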

4. Game-level pair synergy

The game-side statistics use the chosen game zone from the 17Lands game export. In the current implementation the chosen zone is the drawn zone, taken from the drawn_* columns. So the card-level rate here is a drawn-zone win rate, which is very close in spirit to a game-in-hand style statistic.

\[ standalone\_delta(a) = p_a - p_k \]

\(p_k\): baseline game win rate of archetype \(k\).
\(p_a\): win rate of games in archetype \(k\) where card \(a\) appears in the chosen game zone.
\(standalone\_delta(a)\): card-level lift above the archetype baseline.

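The expected pair win rate is computed in logit space. The reason is that independent effects multiply the odds of winning, so after taking the logarithm of the odds they add cleanly. Subtracting \(\operatorname{logit}(p_k)\) once removes the archetype baseline that would otherwise be counted twice.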

\[ \operatorname{logit}(\hat p_{ab}) = \operatorname{logit}(p_a) + \operatorname{logit}(p_b) - \operatorname{logit}(p_k) \]

\[ synergy\_delta\_logit(a,b) = \operatorname{logit}(p_{ab}) - \operatorname{logit}(\hat p_{ab}) \]

\(p_{ab}\): win rate of games in archetype \(k\) where both cards \(a\) and \(b\) appear in the drawn zone.
\(\hat p_{ab}\): expected pair win rate under the additive log-odds model.
\(synergy\_delta\_logit(a,b)\): amount by which the observed pair beats or trails its expectation in logit space.
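
A small worked sketch of the additive log-odds model, with hypothetical function names and made-up win rates, may help make the expectation concrete.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def expected_pair_win_rate(p_a, p_b, p_k):
    """Expected pair win rate under the additive log-odds model."""
    return inv_logit(logit(p_a) + logit(p_b) - logit(p_k))

def synergy_delta_logit(p_ab, p_a, p_b, p_k):
    """Observed pair logit minus the expected pair logit."""
    return logit(p_ab) - (logit(p_a) + logit(p_b) - logit(p_k))

# Illustrative numbers: baseline 50%, each card alone at 60%.
p_hat = expected_pair_win_rate(0.60, 0.60, 0.50)
delta = synergy_delta_logit(0.75, 0.60, 0.60, 0.50)
```

With a 50% baseline, each card multiplies the odds by 1.5, so the expected pair odds are 2.25 and the expected win rate is 9/13, about 69.2%. An observed pair win rate of 75% therefore gives a positive synergy delta.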

5. Standard deviation and confidence

The uncertainty model starts from the usual binomial variance and propagates it to the synergy statistic.

\[ \mathrm{Var}(p) = \frac{p(1-p)}{n}, \qquad \mathrm{Var}(\operatorname{logit}(p)) = \frac{1}{n\,p(1-p)} \]

\[ SE_{ab} = \sqrt{\mathrm{Var}(\operatorname{logit}(p_{ab})) + \mathrm{Var}(\operatorname{logit}(\hat p_{ab}))} \]

\[ \sigma_{ab} = \frac{|synergy\_delta\_logit(a,b)|}{SE_{ab}}, \qquad confidence(a,b) = \operatorname{erf}\!\left(\frac{\sigma_{ab}}{\sqrt{2}}\right) \]

\(n\): number of samples supporting the corresponding rate.
\(SE_{ab}\): standard error of the pair-synergy statistic.
\(\sigma_{ab}\): signal-to-noise ratio for the pair effect.
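erf: the Gauss error function. \(\operatorname{erf}(\sigma/\sqrt{2})\) is the probability that a standard normal variable falls within \(\pm\sigma\) of zero, which is how confidence is expressed on a 0-100% scale.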

The percentage shown on the site is the Gaussian central probability corresponding to the estimated signal-to-noise ratio. Higher confidence means the estimated effect is large compared with its estimated standard deviation.
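
The following sketch shows one way the uncertainty could be propagated. The formulas for the logit variance and the erf-based confidence come from this section. The variance of the expected pair logit is an assumption: the text does not say how it is obtained, so the sketch treats \(p_a\), \(p_b\), and \(p_k\) as independent estimates and sums their logit variances. All function and argument names are hypothetical.

```python
import math

def logit_variance(p, n):
    """Delta-method variance of logit(p) for a rate p estimated from n samples."""
    return 1.0 / (n * p * (1.0 - p))

def confidence_from_sigma(sigma):
    """Gaussian central probability for a given signal-to-noise ratio."""
    return math.erf(sigma / math.sqrt(2.0))

def pair_confidence(delta_logit, p_ab, n_ab, p_a, n_a, p_b, n_b, p_k, n_k):
    var_observed = logit_variance(p_ab, n_ab)
    # ASSUMPTION: the three component rates are treated as independent,
    # so the variance of the expected logit is the sum of their variances.
    var_expected = (logit_variance(p_a, n_a)
                    + logit_variance(p_b, n_b)
                    + logit_variance(p_k, n_k))
    se = math.sqrt(var_observed + var_expected)
    sigma = abs(delta_logit) / se
    return sigma, confidence_from_sigma(sigma)

sigma_example, conf_example = pair_confidence(
    0.29, 0.75, 150, 0.60, 900, 0.60, 800, 0.50, 20000)
```

As a sanity check on the scale, a signal-to-noise ratio of 1 maps to roughly 68% confidence and a ratio of 1.96 to roughly 95%.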

6. Pair quality, triplet quality, and draft corroboration

The graph needs one ranking score to decide which links deserve visual emphasis. The score is a product of several terms. Each term measures a different ingredient: game-level lift, support, deck-level corroboration, triplet structure, draft corroboration, and confidence. A pair receives a large score when several of these signals agree.

\[ pair\_quality_{raw}(a,b) = \max(0, synergy\_delta\_logit(a,b)) \sqrt{n_{ab}} (0.02 + \max(0, p_{ab} - p_k)) (1 + 0.25\max(0, L^{\mathrm{deck}}_{ab})) (1 + 0.18\max(0, triplet\_bonus(a,b))) (1 + 0.05\log(1 + draft\_pair\_events(a,b))) \]

\[ pair\_quality(a,b) = pair\_quality_{raw}(a,b) \cdot \min(1, \sigma_{ab}) \]

\[ triplet\_quality_{raw}(a,b,c) = synergy\_delta\_logit(a,b,c) \sqrt{n_{abc}} (1 + 0.20\max(0, L^{\mathrm{deck}}_{abc})) (1 + 0.06\log(1 + draft\_triplet\_events(a,b,c))) \]

\(n_{ab}\): number of games where both cards appear in the drawn zone inside archetype \(k\).
\(n_{abc}\): number of games where all three cards appear in the drawn zone inside archetype \(k\).
\(triplet\_bonus(a,b)\): best bonus inherited by pair \((a,b)\) from retained deck triplets.
\(draft\_pair\_events(a,b)\): number of projected drafts in archetype \(k\) that contain both cards.
\(draft\_triplet\_events(a,b,c)\): number of projected drafts in archetype \(k\) that contain all three cards.
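
The pair-quality product translates almost directly into code. The sketch below is a transcription of the two pair formulas under hypothetical argument names; it is meant to show how a single zero factor, such as a non-positive synergy delta, removes the pair from the ranking.

```python
import math

def pair_quality(synergy, n_ab, p_ab, p_k, deck_lift,
                 triplet_bonus, draft_pair_events, sigma):
    """Pair quality: raw product of signals, damped when sigma is below 1."""
    def pos(x):
        return max(0.0, x)

    raw = (pos(synergy)
           * math.sqrt(n_ab)
           * (0.02 + pos(p_ab - p_k))
           * (1.0 + 0.25 * pos(deck_lift))
           * (1.0 + 0.18 * pos(triplet_bonus))
           * (1.0 + 0.05 * math.log(1.0 + draft_pair_events)))
    return raw * min(1.0, sigma)

# Only the game-level terms are active in this illustrative call.
baseline_quality = pair_quality(0.10, 100, 0.55, 0.55, 0.0, 0.0, 0, 2.0)
```

Because the confidence term is \(\min(1, \sigma_{ab})\), it can only shrink the score: a pair with \(\sigma_{ab} = 0.5\) keeps half of its raw quality, while any pair at or above one standard error keeps all of it.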

7. Draft pressure, ALSA, and wheeling

ALSA (average last seen at) is measured globally across the whole draft table.

\[ ALSA(c) = \frac{1}{N_c}\sum_i last\_seen_i(c) \]

\[ ALSA_{sd}(c) = \max\!\left(0.65, \sqrt{\frac{1}{N_c}\sum_i last\_seen_i(c)^2 - ALSA(c)^2}\right) \]

\[ \tau(c) = \max\!\left(1.1, 0.60\,ALSA(c) + 0.80\,ALSA_{sd}(c)\right) \]

\[ wheel\_model(c) = \frac{1}{4}\sum_{s=1}^{4} \exp\!\left(-\frac{\max(0, (s+8)-ALSA(c))}{\tau(c)}\right) \]

\(N_c\): number of drafts where card \(c\) was observed.
\(last\_seen_i(c)\): last pick at which card \(c\) was seen in draft \(i\).
\(ALSA(c)\): average last seen at for card \(c\).
\(ALSA_{sd}(c)\): empirical spread of the last-seen positions.
\(\tau(c)\): decay scale used by the wheel model.
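\(wheel\_model(c)\): model-based probability that the card comes back, averaged over first sightings at picks \(s = 1\) through \(4\). In the formula, a card first seen at pick \(s\) would return at pick \(s+8\); the term equals 1 when that pick is not later than the card's ALSA and decays exponentially otherwise.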
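
The four draft-pressure formulas can be chained in one short function. The sketch below takes a plain list of last-seen picks for one card; the function name is hypothetical, and the `max(0.0, variance)` guard is an addition to protect against tiny negative values from floating-point rounding.

```python
import math

def wheel_model(last_seen):
    """Return (ALSA, ALSA_sd, tau, wheel probability) for one card.

    last_seen: list of last-seen pick numbers, one per draft.
    """
    n = len(last_seen)
    alsa = sum(last_seen) / n
    variance = sum(x * x for x in last_seen) / n - alsa * alsa
    # Guard against small negative rounding error before the square root.
    alsa_sd = max(0.65, math.sqrt(max(0.0, variance)))
    tau = max(1.1, 0.60 * alsa + 0.80 * alsa_sd)
    wheel = sum(
        math.exp(-max(0.0, (s + 8) - alsa) / tau) for s in range(1, 5)
    ) / 4.0
    return alsa, alsa_sd, tau, wheel

# A card that is always still around at pick 12 versus one gone after pick 2.
late_card = wheel_model([12, 12, 12, 12])
early_card = wheel_model([2, 2, 2, 2])
```

The first card has a wheel value of 1 because every return pick from 9 to 12 is within its ALSA. The second card is far from wheeling, so each exponential term is small.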

8. Exported scores

Dependency score

Weighted mean of positive incident pair synergy and triplet bonus. A high value means the card usually gains value from neighboring cards in the same shell.

Analysis score

\[ analysis_{raw}(c) = signpost\_score(c) + 18\,dependency(c) + 140\max(0, standalone\_delta(c)) \]

\(signpost\_score(c)\): signpost strength of card \(c\).
\(dependency(c)\): dependency score of card \(c\).
\(standalone\_delta(c)\): win-rate lift of card \(c\) above the archetype baseline.

The raw score is transformed with \(\operatorname{sign}(x)\log(1+|x|)\), clipped between the 5th and 95th percentiles inside the archetype, and rescaled to 0-10.
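
The transform, clip, and rescale step can be sketched as below. Two details are assumptions, because the text does not specify them: percentiles are computed with linear interpolation between sorted values, and an archetype whose clipped range collapses to a single value gets a score of 0 for every card. The function names are hypothetical.

```python
import math

def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 100] (assumed convention)."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    position = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

def rescale_scores(raw_values, low_q=5.0, high_q=95.0):
    """sign(x) * log(1 + |x|), clip to [low_q, high_q] percentiles, map to 0-10."""
    transformed = [math.copysign(math.log1p(abs(x)), x) for x in raw_values]
    ordered = sorted(transformed)
    lo = percentile(ordered, low_q)
    hi = percentile(ordered, high_q)
    if hi <= lo:
        # ASSUMPTION: degenerate range, return a flat score.
        return [0.0 for _ in transformed]
    return [10.0 * (min(max(t, lo), hi) - lo) / (hi - lo) for t in transformed]

example_scores = rescale_scores([float(x) for x in range(21)])
```

Clipping at the 5th and 95th percentiles means a single extreme card cannot compress everyone else toward zero: the top and bottom few cards simply share the end values 10 and 0.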

Pick priority score

\[ pick\_priority_{raw}(c) = 100\big(\max(0,draft\_win\_lift(c)) + 0.55\max(0,standalone\_delta(c))\big) \sqrt{\max(1,draft\_drafts(c))} \big(1 + 0.10\max(0, signpost\_lift(c)-1)\big) \]

\(draft\_win\_lift(c)\): smoothed draft match win-rate lift of card \(c\) inside projected drafts of the archetype.
\(draft\_drafts(c)\): number of projected drafts in the archetype that contain card \(c\).
\(signpost\_lift(c)\): ratio between the card's prevalence inside the archetype and its global deck prevalence.

The raw value is transformed with \(\log(1+x)\), clipped between the 5th and 95th percentiles inside the archetype, and rescaled to 0-10.
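
The raw pick-priority product, before the log transform and rescaling, can be written out as a short function. The argument names mirror the symbols above; the function name and the example numbers are hypothetical, and win-rate lifts are assumed to be fractions (0.02 for two percentage points).

```python
import math

def pick_priority_raw(draft_win_lift, standalone_delta, draft_drafts, signpost_lift):
    """Raw pick priority: performance times support times signpost factor."""
    performance = max(0.0, draft_win_lift) + 0.55 * max(0.0, standalone_delta)
    support = math.sqrt(max(1.0, draft_drafts))
    signpost_factor = 1.0 + 0.10 * max(0.0, signpost_lift - 1.0)
    return 100.0 * performance * support * signpost_factor

plain = pick_priority_raw(0.02, 0.0, 100, 1.0)
signposted = pick_priority_raw(0.02, 0.0, 100, 3.0)
```

A card with no positive lift on either measure scores exactly zero regardless of how many drafts contain it, because the support term only scales the performance term.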

Pick urgency

\[ pick\_urgency(c) = 10\left(\frac{0.90\,market(c) + 0.10\,pick\_priority(c) - 0.58}{0.42}\right)_{[0,1]}^{1.15} \]

\(market(c)\): normalized market-pressure score derived from ALSA, average taken-at, and wheel probability.
\(pick\_priority(c)\): normalized archetype performance score from the previous formula.
\((\cdot)_{[0,1]}\): clamp to the interval from 0 to 1.

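This score is a recommendation scale. Higher values mean the card is contested at the table and also performs well inside the archetype. Because market pressure carries 90% of the weight, the score mostly reflects how quickly the card disappears from packs.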
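
A minimal sketch of the urgency formula follows. It assumes that both `market` and `pick_priority` are supplied on a 0-1 scale, which is what the constants 0.58 and 0.42 suggest; if the 0-10 pick priority were used directly, it would need to be divided by 10 first. The function name is hypothetical.

```python
def pick_urgency(market, pick_priority):
    """Pick urgency on a 0-10 scale; both inputs are assumed to lie in [0, 1]."""
    blended = (0.90 * market + 0.10 * pick_priority - 0.58) / 0.42
    clamped = min(1.0, max(0.0, blended))
    return 10.0 * clamped ** 1.15

top = pick_urgency(1.0, 1.0)
middle = pick_urgency(0.8, 0.8)
bottom = pick_urgency(0.5, 0.5)
```

Under that assumption, any blended value at or below 0.58 maps to an urgency of 0, so only cards in roughly the upper 40% of the blended scale receive a non-zero urgency.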

Shell dependence

\[ shell\_dependence_{raw}(c) = \frac{\max(0,support\_delta(c))\sqrt{\max(1,supported\_games(c))}(1 + 4\,dependency(c))(1 + 0.18\max(0,signpost\_lift(c)-1))}{1 + 16\max(0,standalone\_delta(c))} \]

\(support\_delta(c)\): supported win rate minus solo-drawn win rate for card \(c\).
\(supported\_games(c)\): games in the archetype where card \(c\) appears with at least one other mapped card in the drawn zone.
\(dependency(c)\), \(signpost\_lift(c)\), and \(standalone\_delta(c)\): same quantities defined above.

The raw value is transformed with \(\log(1+x)\), clipped, and rescaled to 0-10.
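
The raw shell-dependence ratio can be sketched the same way. The function name and example values are hypothetical, and deltas are assumed to be fractions rather than percentage points.

```python
import math

def shell_dependence_raw(support_delta, supported_games, dependency,
                         signpost_lift, standalone_delta):
    """Raw shell dependence: support gain, discounted by standalone strength."""
    numerator = (max(0.0, support_delta)
                 * math.sqrt(max(1.0, supported_games))
                 * (1.0 + 4.0 * dependency)
                 * (1.0 + 0.18 * max(0.0, signpost_lift - 1.0)))
    denominator = 1.0 + 16.0 * max(0.0, standalone_delta)
    return numerator / denominator

needs_shell = shell_dependence_raw(0.02, 100, 0.0, 1.0, 0.0)
good_anyway = shell_dependence_raw(0.02, 100, 0.0, 1.0, 0.05)
```

The denominator is what separates shell-dependent cards from cards that are simply strong: the same support gain counts for less when the card already wins on its own.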

Build-around detector

A card is flagged as a build-around when all four conditions hold: supported games at least 80, support delta at least 1.8 percentage points, dependency at least 0.075, and standalone delta at most 1.4 percentage points.
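
The four conditions reduce to a single boolean test. In this sketch the thresholds given in percentage points are converted to fractions (1.8 points becomes 0.018), which assumes that deltas are stored as fractions; the function name is hypothetical.

```python
def is_build_around(supported_games, support_delta, dependency, standalone_delta):
    """True when all four build-around conditions hold.

    support_delta and standalone_delta are assumed to be fractions,
    so 1.8 percentage points is 0.018 and 1.4 percentage points is 0.014.
    """
    return (supported_games >= 80
            and support_delta >= 0.018
            and dependency >= 0.075
            and standalone_delta <= 0.014)

flagged = is_build_around(120, 0.025, 0.10, 0.005)
too_strong_alone = is_build_around(120, 0.025, 0.10, 0.030)
```

The last condition is an upper bound: a card that already performs well on its own is not treated as a build-around, even if it also gains from support.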

Core score

\[ core_{raw}(c) = signpost\_score(c) + 52\,deck\_prevalence(c) + 85\max(0,standalone\_delta(c)) + 14\,dependency(c) \]

\(deck\_prevalence(c)\): fraction of decks in the archetype that contain card \(c\).

The raw value is transformed with \(\log(1+x)\), clipped, and rescaled to 0-10.

9. Network construction rules

Object	Current rule
Candidate card pool	Up to 18 cards with the highest signpost score, then cards with the largest standalone lift, then cards that appear frequently in games of the archetype, then anchor neighbors: cards that co-occur strongly in deck motifs with the leading signposts. Lands are removed and the final list is truncated to 28 cards.
Node support floor	At least 110 appearances in the chosen game zone inside the specific cluster archetype.
Pair support floor	At least 90 games inside the specific cluster archetype where both cards appear in the chosen game zone.
Triplet support floor	At least 30 games inside the specific cluster archetype where all three cards appear in the chosen game zone.
Edge threshold	`synergy_delta_logit >= 0.05`.
Base confidence threshold	Seed edges use a confidence level around 90%.
Knot threshold	Retained knots use a confidence level around 80%.
Knot promotion	Once a knot is established, nearby edges can enter at a lower confidence threshold when they reinforce that knot and stay within the same established structure.
Displayed solo/support stats	At least 30 supporting games are required in the relevant bucket.
Arrow direction	The directed graph points from the higher-win-rate card to the lower-win-rate card. The symmetric pair statistic stays unchanged.
