The Hidden Logic of Sudoku

Denis Berthier

book cover


for advanced players (not the good place to start with)


1) Part of the material presented here is more complex than what you can find in the book and it is for advanced players. The basics are in the book. Although some definitions are recalled here, this is not done systematically. Said otherwise, this is not the good place to start with.

2) Most of the material here is a compilation of ideas that I have already published on Web Forums and/or in international conferences. Some of it already appears in the second edition of the book.
I compiled this material here in order to make it more easily available than lots of posts disseminated in Web forums, which are always plagued with tons of unrelated stuff.
Notice however that, although a part of it can be understood independently by advanced players, this is not designed as a fully self-contained set of pages but as a set of supplements to the book (but, with time and several re-writings, this has become more self-contained).

3) WORK IN PROGRESS. At a few places, the writing may be sketchy, but all the definitions are precise.

4) Part of the material below is superseded by the contents of my more recent books: "Constraint Resolution Theories" and "Pattern-Based Constraint Satisfaction". But the book itself is not superseded by these newer ones, because it contains Sudoku specific aspects that are not recalled in them. These new books are best understood as a sequel to "The Hidden Logic of Sudoku".

1) General concepts and techniques

1.1) The concept of a resolution rule
discusses the general concept of a resolution rule and its specificity with respect to the more general notion of a "resolution technique".
The concepts of a resolution theory and a resolution path are also defined, together with the very important property a resolution theory can have: the confluence property.

Resolution rules can be classified into various large families. The classification introduced in the resolution rules page is not exhaustive, new families might be discovered. But it is useful to understand that there are several very different types of rules.

Basic patterns (such as naked or hidden subsets and "fish") have been known for a long time. But when basic patterns are not enough to solve a puzzle, chain patterns become the next tool. This is why much emphasis will be put on chains (although, of course, my conceptual framework is not limited to chains).

The resolution rules page introduces various important properties a resolution rule for chains can have, such as reversibility, non-anticipativeness (or no look-ahead), composability.
All the chains, whips and braids introduced below are non-anticipative and composable. In addition, the chains and whips that do not contain the t- extension (xy, xyz, nrc, nrcz) are reversible.

The most basic chains are the standard xy-chains, i.e. sequences of bivalue cells linked in rc-space, and with contents {x1 y1} - {x2 y2} - {x3 y3} - {x4 y4} - ..... - {xn yn}, such that for all n>1, xn+1=yn and yn=x1. A target is a candidate z equal to both x1 and yn and located in a cell linked to both the first and the last cells; the xy-chain rule says that any target can be eliminated.

The (very standard) logic of an xy-chain (i.e. the once and for all proof of the xy-chain theorem) is as follows:

If target
  then not  x1
    then y1
      then not x2
        then y2
             then not xn
                then yn
                   then not target.
Therefore not target.

Once the logic of an xy-chain is understood, then the logic of all the other types of chains, whips and braids is easily understood, because all of them are conceptually simple generalisations of xy-chains.
Chains with the t- extension (xyt, hxyt, xyzt, hxyzt, nrct, nrczt) can also be called oriented chains, because, contrary to xy, xyz, nrc or nrcz chains, they cannot be reversed from head to tail.

Here are the most basic definitions relevant to chains:
Definition: two candidates n1r1c1 and n2r2c2 are nrc-linked (or liknked, for short) if they are different and:
- either n1=n2 and the two rc-cells (r1, c1) and (r2, c2) share a unit in rc-space,
- or n1 ≠ n2 and the rc-cells (r1, c1) and (r2, c2) are the same.
Being nrc-linked is the most general, fully super-symmetric support for the immediate detection of a contradiction between two nrc-candidates. It is quasi "physical" in the sense that it depends only on the grid structure.

Definition: a chain is a sequence (i.e. a strictly ordered set) of candidates, such that the first and the last candidates in the sequence are different (there is no global loop) and any two consecutive candidates are nrc-linked.
It can easily be checked that the above definition of an xy-chain can be re-written in this format.

A chain in this purely descriptive sense thus satisfies the two basic conditions without which the notion of a chain would loose any meaning: linearity and nrc-continuity.
There is also an implicit third general condition that any specific type of chain should satisfy: consistency between the factual description and the proof of the eliminations, i.e. one should be able to prove the associated resolution rule in a way that consistently follows the linear structure (e.g. without having to anticipate on later parts of the sequence). All the types of chains/lassos and whips we shall define will satisfy all these conditions.

Definition: a target of a chain is a candidate that does not belong to the chain and that is nrc-linked to both endpoints of the chain.

Of course, chains of specific types will satisfy additional conditions allowing to prove associated resolution rules. These resolution rules will always conclude that any target can de deleted.

1.2) xyt-, xyz- and xyzt- chains have been defined in the first edition of the book.

1.3) nrczt-chains and lassos have been defined in the second edition of the book. They are the "3D" (or "super-symmetric") counterpart of the xy-, xyt-, xyz- and xyzt- chains.

1.4) Whips are mainly another view of chains and lassos, unifying them into a single concept.

1.5) Subsumption theorems prove that nrczt-whips subsume (are more general than) many previously known techniques.

1.6) Braids are an extension of whips and are precisely related to the Trial and Error procedure via the braids vs T&E theorem. Naturally defined nrczt-braid resolution theories have the vezry important confluence property.

1.7) x2y2-belts are my interpretation of Steve's "hidden pairs loops" (introduced to solve EasterMonster). But this page also contains indications on how to define formally a non ambiguous resolution rule when we only have examples of what it should do.

1.9) The strategic level discusses different ways of defining resolution strategies based on a set of resolution rules

2) Techniques for exceptionnally hard puzzles

Basic nrczt-whips are enough for solving all the puzzles that a human solver is able to solve - and much more, as shown by my classification results, based on ~ 10,000,000 random puzzles.
However, as there has been some interest for exceptionally hard puzzles, I have developed a few additional techniques for dealing with them.

2.1) On the notion of a strong T-backdoor and its application to EasterMonster
introduces the notion of a strong T-backdoor, where T is any resolution theory, and shows how it can be applied, with the EasterMonster example.

2.2) The general zt-ing principle; generalised whips and braids
defines the general zt-ing principle, gives definitions of zt-whips(FP) and zt-braids(FP) for any family of patterns FP, states the general zt-braids(FP) vs T&E(FP) theorem , the confluence property for some zt-braid(FP) resolution theories and gives classification results for the hardest known puzzles (gsf's collection) with respect to an increasing sequence of FP families.

3) Rating resolution rules and puzzles; classification results

3.1) Rating resolution rules and puzzles; ordering the rules
introduces several ratings of puzzles, studies their correlation and justifies the choice of length as a measure of complexity of a chain rule.
This page is not yet revised and it keeps a strong flavour of its historical development on the Sudoku Player's Forum.

3.2) Classification results
gives various classification results, shows that different kinds of generators of minimal puzzles lead to different statitics (and thus proves that there is some bias), introduces the notion of a controlled-bias generator, uses it to produce unbiased statistics. As a collateral result, it gives a close estimate of the real distribution of minimal puzzles.

3.3) The nrczt-rating of nrczt-whips as a guide for defining the rating of any chain or any pattern
How can one rate chains based on subsets (e.g. ALS-chains) with different lengths and different maximal sizes for the subsets they use, in a way consistent with the rating of nrczt chains or whips?

Remember first that the complexity of a resolution rule is in its condition pattern, not in its conclusion.
Remember also that the kind of complexity I am considering is mainly complexity of resolution rules and that it is extended to complexity of puzzles in the SER style (= complexity of the most complex rule necessary to solve it).
The question then is mainly: how can the presence of Subsets in a chain be taken into account in the definition of chain length? For ALS-chains, e.g., there are two kinds of parameters: chain length and size of each of the ALSs; how should they be combined?

After all the time and work on the subject of ratings, the answer now seems very clear to me: as the nrczt-whips can solve almost all the puzzles (in more than 10 million minimal puzzles randomy generated by various kinds of generators - bottom-up, top-down, conrollled-bias - no puzzle was found that couldn't be solved by nrczt-whips), the NRCZT-rating constitutes an almost universal rating, to which all the other ratings can be compared statistically.
It has very good properties, the first three of which are not shared with the widely used SER:
- pure logic definition,
- full supersymmetry,
- very strong correlation (0.95) with log(number of chains), which shows that it is statistically a logarithmic measure of complexity,
- strong correlation (0.895) with the well established SER for the human solvable puzzles, in spite of their a priori very different definitions,
- almost confluence of the underlying resolution theories (i.e. almost independence of the implementation); if one wants strict confluence and indepence, just replace erywwhere in this section whips by braids and NRCZT by B-NRCZT.

It is therefore justified to take it as a reference for defining ratings based on different kinds of chains or patterns.

3.3.1) Consider first chains of ALS, AHS and A-Fish.
- define the length of such a chain to be the sum of the sizes of all its defining Subsets, a Single being considered as a subset of length 1 (see remarks below for precisions);
- define the LS-rating of a puzzle as the length of the longest such chain necessary to solve it.
- the "restricted commons" don't count in the subset sizes, they more or less play the roles of left-linking candidates,
- for the most classical complementarity reasons: NS(5) = HS(4), HS(5) = NS(4), NS(6) = HS(3) ... (here, NS= Naked Subset, HS=Hidden Subset)
- for supersymmetry reasons: for n = 1, 2, 3, 4, NS(n), HS(n) and SHS(n) (i.e. Fish(n)) are all counted as a NS(n).

This rating is supersymmetric and consistent with the NRCZT-rating: whenever a chain can be viewed according to the two points of view (which, according to the general subsumption results, is the case for almost all the chains of ALS, AHS and A-Fish), its length will be the same according to the two points of view.
Although there is currently no program computing ratings of ALS-chains in a way consistent with this definiton, it is almost certain (due to the subsumption theorems) that this LS-rating is very strongly correlated with the NRCZT-rating.

3.3.2) Now, consider the more general case of nrczt-whips(Subsets), as defined in section 2 of the zt-ing page. In such chains, a naked, hidden and super-hidden (fish) Subset (modulo the target and all the previous right-linking candidates) can be taken as a right-linking object, in lieu of a mere candidate.
Define the length of such a whip(Subsets) as the sum of all the sizes of the right-linking objects (Subsets) it contains (a Single candidate being still considered as a subset of length 1).
Define the NRCZT(Subsets)-rating as the length of the maximal whip(Subsets) necessary to solve it.
Again, this rating is supersymmetric and consistent: any chain that can be viewed according to several points of view (whether some parts of it are considered as subsets modulo the target and the previous rlc's or as mere nrczt-chains) will have the same ratings for all these points of view.

We know that, most of the extremely rare puzzles that cannot be solved with simple whips (fewer than one in 10 millions) can be solved with zt-whips(Subsets).
It is thus very likely that most of the puzzles that can be solved by mere nrczt-whips will have the same ratings when they can be considered according to the two points of view.
The nrczt(Subsets)-rating is thus almost an extension of the nrczt-rating.

3.3.3) This definition obviously applies to Paul Isaacson's whips with ALS inserts, which are a secial case of zt-whips(Subsets).

3.3.4) Finally, this definition should apply to Allan Barker's set cover patterns (networks of 2D-cells, i.e. rc-, rn-, cn- and bn- cells - "base sets" in set cover terminology, renamed "truths" in his vocabulary), for which one could take the number of 2D-cells as the "length" of the pattern, disregarding the number of "cover sets" (in set cover terminology - renamed "linksets" in Allan's vocabulary).

This definition of the NRCZT-rating, thus extended to other patterns in a consistent way, allows comparisons of the complexities of the solutions obtained with the corresponding patterns.

4) Additional examples

With a complete listing of the full resolution paths of the 66 hardest puzzles in the famous Magictour/Top1465 collection, we explore the limits of pure nrczt-whips.

5) Extension of the techniques developed for Sudoku
to the general Constraints Satisfaction Problem (CSP)

Many real world problems, such as resource allocation, temporal reasoning or scheduling, naturally appear as Constraint Satisfaction Problems (CSP).
Constraints Satisfaction is a main sub-area of Artificial Intelligence.

A finite CSP is defined by a finite number of variables with values in some fixed finite domains and a finite set of constraints (i.e. of relations they must satisfy); it consists of finding a value for each of these variables, such that they globally satisfy all the constraints.

Sudoku is obviously a CSP. The "natural" variables are the Xrc (one variable for each row r and each column c), with domains the 9 numbers that can a priori occupy these cells; but it is convenient to add auxiliary variables Xbs, Xrn, Xcn, Xbn to take into account the natural symmetries and super-symmetries of the problem.

Variants of Sudoku and many other "logic games" are CSPs.

Most of what I've done for Sudoku can be extended to any finite CSP:

<> the whole general conceptual framework in Multi-Sorted First Order Logic (MS-FOL);
<> definition of candidates and elaboration of their logical epistemic status;
<> definition of candidates being linked (nrc-linked in Sudoku);
<> definitions of a resolution rule, a resolution theory, the solution of a CSP according to a resolution theory, a resolution strategy;
<> definition of a chain and a target;
<> definition of a bivalue variable and a bivalue chain (which corresponds to an xy-chain or to an nrc-chain, depending on which set of variables we take as primitive);
<> definitions of t-, z- and zt- chains, whips and braids;
<> proof of the zt- chain, whip and braid elimination theorems;
<> proof of confluence property for naturally defined zt-braids theories;
<> proof of the T&E vs zt-braids elimination theorem.

Details can be found in two presentations I have done at the CISSE'08 international conference (cisse-conference-2008):

From Constraints to Resolution Rules. Part I: Conceptual Framework

From Constraints to Resolution Rules. Part II: Chains, Braids, Confluence and T&E

<> Moreover, even though the numerical values of the classification results obtained for Sudoku can obviously not be transferred to any CSP, the general ideas about top-down, bottom-up and controlled-bias generators suggest that it may be very difficult to obtain unbiased statistics for any CSP. See another presentation I have done at the CISSE'09 international conference (cisse-conference-2009):

Unbiased Statistics of a CSP - a Controlled-Bias Generator

<> I have now published a whole book about these generalisations: Constraint Resolution Theories and a more recent, revised and largely extended version of it: Pattern-Based Constraint Constraint Satisfaction and Logic Puzzles in which applications to other logic puzzles (N-Queens, Futoshiki, Kakuro, Numbrix, Hidato, graph colouring) are also dealt with in detail.

Home(The Hidden Logic of Sudoku)         Home(Denis Berthier)