
An Open Problem

April 23, 2024

Richard Feynman and Gian-Carlo Rota worked in different parts of science during their separate careers. Feynman, of course, was one of the most important scientists of the 20th century—see here.

His work in theoretical physics radically reshaped our understanding of the universe we live in at the most fundamental subatomic levels.

But Feynman’s brilliance was not solely due to his natural cognitive abilities. He relied on a method: a simple technique for seeing the world through the lens of open-ended questions, which he called his “favorite problems.”

Rota worked in combinatorics; he lifted it from a barely respectable obscurity to one of the most active areas of mathematics today—to quote Richard Stanley, one of his great students. Rota had many neat results, and his paper Ten Lessons I Wish I Had Been Taught is wonderful. It fits neatly with Feynman’s notion of favorite problems.

The Problem

Our problem today is not one that Feynman or Rota would most likely have worked on: it is neither physics nor combinatorics. But I think it is still an interesting one. I hope you agree.

We often have to create a post so that a user can ask some particular question. The post should make it easy for them to get the answer they are interested in without any difficulty. Making this easy is not always easy for us.

One simple issue is this: given a post, we wish to make it easy for you to ask for some particular information. The problem is that the actual post may let you get into some type of nasty state, one that is hard to escape and that leaves you feeling out of control. The post may be in some strange state—one that does not respond to your questions in a way that you understand. This can be quite upsetting and can lead to a tough experience for your audience. This is something that we wish to avoid.

More Details

Imagine you are using our post. You interact via some application programming interface (API). You input information to the API by typing some keys or clicking on some icons. Each such input causes the API to change and enter some potentially new state. The difficulty is that this state may be a wrong one. You would like some input to force the post to reset; that is, the API should change to a “reset” or some other nice state. But nothing causes this to happen. Nothing.

How can we fix the API to make that happen when we wish? This is the problem that we would like to solve.

Open Problems

This problem does have a simple solution. But the solution is not that pleasant. We could agree that some input—if you type XXX—will always reset the API. But we often do not wish to reserve a fixed sequence of inputs to always cause a reset. What if we wish to be able to input XXX in some situation? This is the problem. How do we have a way out of trouble and yet not restrict the inputs? We believe the answer is not that natural. What would you do?

One idea is to have a special sequence like XXX that triggers the reset. It does mean that XXX cannot be used in any other situation. But we could agree that XXX is replaced with a special secret sequence that is extremely unlikely to be used except in the reset case. Perhaps we could replace XXX with the Fibonacci sequence 0112358132134 or some other unusual sequence. Does this make some sense? It would have to be known by users, and they would have to operate like this:

If the API is in some nasty state, they would say to themselves, “I had better type the special sequence. Oh, I recall it is 0112358132134,” and the API will always reset.

And the reset will happen. Neat?
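
To make the escape-hatch idea concrete, here is a minimal Python sketch (with hypothetical state names and a toy transition rule, not any real API) showing how a reserved secret sequence can be checked ahead of the normal input handling so that it always forces a reset:

class Api:
    RESET_SEQ = "0112358132134"   # the agreed-upon escape-hatch sequence

    def __init__(self):
        self.state = "ready"
        self._recent = ""

    def input(self, key):
        # Track the tail of the input stream; if it ever ends with the special
        # sequence, jump to the reset state no matter how confused we are.
        self._recent = (self._recent + key)[-len(self.RESET_SEQ):]
        if self._recent == self.RESET_SEQ:
            self.state = "ready"
            self._recent = ""
            return self.state
        self.state = self.transition(key)   # normal (possibly buggy) behavior
        return self.state

    def transition(self, key):
        # Placeholder for the API's ordinary state logic, which may wander
        # into a "nasty" state that no normal input can escape.
        return "nasty" if key == "?" else self.state

api = Api()
for k in "abc?" + Api.RESET_SEQ:
    api.input(k)
print(api.state)   # "ready": the secret sequence forced the reset

The trade-off discussed above is visible here: because the reset check runs first, the sequence 0112358132134 can never mean anything else to this API.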

Women in Math Research

April 19, 2024

Peter Gerdes is a mathematician working in computability theory, a.k.a. recursion theory, a branch of mathematical logic studying what computers (a.k.a. Turing machines) can in principle compute. More accurately, it studies when being given access to the solution of one kind of problem (an oracle) allows the computer to solve some other problem. He is currently focused on research about the alpha-REA degrees.

He is also the maintainer of the rec-thy LaTeX package, designed to give a common set of basic commands for the working mathematician in computability theory.

His History

Peter received his Ph.D. from the Group in Logic and Methodology of Science at the University of California at Berkeley under Prof. Leo Harrington. The Group in Logic is a cross-departmental program between the Math and Philosophy departments with some membership from the Computer Science department. The Group in Logic ensures that its graduates have a background in both the philosophical and mathematical aspects of logic, but his advanced training and thesis were the same as if he had been in the mathematics department.

See here for more details on his work.

His Women

He has said about articles on women in math:


That’s some very nice mathematics (both what’s covered here and what’s in the rest of the article). However, I worry that the primary effect of highlighting it in an issue devoted to women in mathematics is to create a perception that female mathematicians aren’t doing interesting enough work to hold their own so need their own special issue.

If you’re concerned about representation and encouraging fledgling female mathematicians, isn’t the better move to simply increase the number of women covered in normal content?

Peter had a good point about giving proper credit to women. Here are two women whose work he supports quite strongly:

His wife Sharon Berry is a tenured philosophy professor at Ashoka University who recently published a book.

The headline news is that Berry advocates a version of set theory in which, rather than taking set theory to be the study of a single hierarchy of sets which stops at some particular point, we should instead interpret set theorists as making claims about what hierarchies are possible. If you want to learn more about her work, visit here.

Karen Lange is an Associate Professor in the Department of Mathematics at Wellesley College. Her interests are in computability theory, an area of logic that explores the algorithmic content encoded in mathematical problems. She studies the computational complexity of problems of all sorts.

Open Problems

Do you agree with Peter’s point?

Guggenheim 2024

April 15, 2024

Edward Hirsch is the President of the John Simon Guggenheim Memorial Foundation and has led it since 2003. He just announced the awarding of 188 Guggenheim Fellowships to a distinguished and diverse group of culture-creators working across 52 disciplines. Chosen from a pool of almost 3,000 applicants, the Class of 2024 Guggenheim Fellows was tapped on the basis of prior career achievement and exceptional promise. As established in 1925 by founder Senator Simon Guggenheim, each fellow receives a monetary stipend to pursue independent work at the highest level under “the freest possible conditions.”

Edward Hirsch is himself a poet, which shows that the depth and breadth of the Guggenheim’s reach is large. See here.

The 188 new Fellows come from 84 different academic institutions; 41 states, the District of Columbia, and four Canadian provinces are represented in this year’s class, whose members range in age from 28 to 89. Many Fellows’ projects engage with issues like disability activism, climate change, identity, incarceration, AI, and democracy.

Computer Science

Gene Tsudik is the 2024 representative for computer science. He is a Distinguished Professor of Computer Science at the University of California, Irvine (UCI). He obtained his PhD in Computer Science from USC in 1991. Before coming to UCI in 2000, he was at IBM Zurich Research Laboratory (1991-1996) and USC/ISI (1996-2000). His research interests include many topics in security, privacy and applied cryptography.

Tsudik is the only computer scientist to be awarded a Guggenheim Fellowship this year, and he intends to use his fellowship funding to bootstrap a new line of research on building IoT devices resilient against devastating large-scale malware infestations that have become all too common in recent years. See his paper https://arxiv.org/pdf/2309.03574.pdf with Sashidhar Jakkamsetti and Youngil Kim.

He is also the author of the first crypto-poem published as a refereed paper. Perhaps this is why Edward Hirsch helped select him? Perhaps not? Tsudik also suffers from two incurable academic diseases: “Research ADHD” and “Munchausen-without-Proxy.” Congrats to him as a winner of a Guggenheim.

Open Problems

I enjoyed some of his different points about computing. See this:


There are two equivalence classes of idiots:
Those who yearn for the past that never was.
AND
Those who dream of the future that never will be.

Congrats again.

2023 Turing Award

April 10, 2024

Congrats to Avi Wigderson for his winning the 2023 Turing Award.


It places Avi in with some of the great theorists of all time. Other theory winners include:
Knuth, Rabin, Scott, Cook, Karp, Hopcroft, Tarjan, Hartmanis, Blum, Yao, Rivest, Shamir, Adleman, Valiant, Micali, Goldwasser, Aho, Ullman.

Some Comments

Avi was a student of mine back at Princeton in 1983. I do recall one interesting thing. Back in those days the P=NP? problem was relatively new, having been created by Cook in 1971. We had no idea how hard its resolution was going to be. I recall Avi saying to me: “I plan on working on P=NP for a while—a few weeks.” Not too long, but as you know it is still unresolved.

Avi did prove tons of great results on things near to P=NP. Especially wonderful have been his contributions to randomness and related issues. He won the Turing Award for his work on understanding how randomness can shape and improve computer algorithms.

Open Problems

Avi says:

The [Turing] committee fooled me into believing that we were going to have some conversation about collaborating. When I Zoomed in, the whole committee was there and they told me. I was excited, surprised, and happy.

It is very clear he is a wonderful person, and quite modest.

The Coming Solar Eclipse

April 6, 2024

Last Tuesday, April 2nd, the New York Times had its weekly Science section.

Its focus was all about the eclipse that happens on Monday, April 8th. When exactly will it happen? How best to watch it? Can you still enjoy it even if you are blind? And more. One how-to article they’ve had up online for longer is this.


The Eclipse

The main issue with the eclipse is, when is it safe to watch it? It is not safe unless one watches it with some important restrictions. Direct viewing of the sun without using any sort of protection can cause permanent eye damage. The intense light from the sun during an eclipse can cause damage to the retina of the eye, leading to permanent vision loss. It’s important to use specially designed solar viewing glasses or other indirect viewing methods to safely observe a solar eclipse.

I started reading the Science articles to see if there was a simple answer to the question:



Is it safe to watch it on TV?

This seemed like a simple direct question. Perhaps I am too TV-based—I spend a major fraction of my day watching the news and other programs on TV. So why not watch the solar eclipse this way?

I discovered quickly that the Science articles had lots of comments on how to safely watch the eclipse. But none said anything about whether watching on TV is safe. Doing a Google search showed that there were two answers:

  1. In fact, the absolute safest way to watch it is on TV.

  2. No, it is not safe to look at a solar eclipse, even on TV, without proper eye protection.

However, there is often confusion about how to look at it safely during an eclipse, when the moon partially or completely covers the sun and it gets dark outside. Viewing the solar eclipse safely is paramount. In fact, the absolute safest way to watch it is on TV. Having said this, the TV camera shooting the sun CAN get permanent damage if not protected in the same way as our eyes. But nothing that will appear on a TV screen is remotely close to being able to damage your eyes.

Up in Buffalo

Ken and visiting friends and family will be in the path of totality. The paths of clouds, however, are too uncertain to tell what they will see. Over to Ken:

I bought welder’s glass at the time of the 1994 annular eclipse in Buffalo—which was clouded out. The key is that if you hold it up to a bright bare light bulb, you should still NOT see the bulb at all. Same with eclipse glasses. The glass worked well for the 2017 eclipse which was partial in Buffalo.

Debbie and I attended a lecture last Thursday at Nichols School where she teaches. The speaker, Jeffrey Linsky, made one point to add to the above advice: once the eclipse goes total, take the glasses off. It is safe to look directly at the sun’s corona behind the moon. Linsky also emphasized that the human eyes have better resolution than any camera for viewing and appreciating features such as prominences, wisps of the corona, and the surrounding dark sky.

Exactly at the instant totality ends, whip the glasses back on before attempting to view Baily’s beads or the diamond ring effect.

Watching the 2017 eclipse on TV was indeed deadly—for a P != NP claim.

Open Problems

So I believe that TV is okay, but not all seem to agree. So be careful with this eclipse, please.

Daniel Kahneman, 1934–2024

April 4, 2024


With some connections to my chess research

Modified from source

Daniel Kahneman passed away last week. He won the 2002 Nobel Prize in Economics, shared with Vernon Smith, and was awarded a Presidential Medal of Freedom in 2013.

Today I review some of his work on human decision making and how my chess niche may reflect it.

Kahneman is best introduced by two popular books: his own Thinking, Fast and Slow in 2011 and The Undoing Project by Michael Lewis in 2016. Both books pay copious homage to his longtime research partner Amos Tversky, whom we mentioned with reference to chess here and the “hot hand” phenomenon in basketball and chess.

Other than there, we have not previously mentioned his work. It is all about human decision making and its dependence on features that are overlooked by a strictly numerically rational calculus of expected value. Prominent examples include:

  1. Whether the value is guaranteed or probabilistic;

  2. Whether the expected value involves possibility of loss;

  3. How the value—even if guaranteed—relates to other values currently at issue;

  4. Whether a choice is harmonious with recent preoccupations;

  5. Whether the sun shone through clouds on the way in to work or play.

These factors represent Kahneman’s work with Tversky and others on (1) prospect theory, including (2) loss aversion, then (3) framing, and (4) priming. Item (5) comes from his latest work and 2021 book Noise: A Flaw in Human Judgment with Olivier Sibony and Cass Sunstein. In controlled experiments where objective criteria were held the same, Kahneman and co-workers found more extreme variation in outcomes than the human judges regarded a priori as consistent with their expertise and institutional values.

Levels of Replication

Kahneman combated noise at a different level in social economics and psychology studies. In a 2012 open letter he called for greater attention to replicating studies, and in 2014 he proposed structures for doing so. This accompanied and presaged his growing awareness that several studies (by others) that he highlighted in his 2011 book, especially on priming, were on shaky ground.

Prospect theory has been replicated to various degrees. So has framing. But Jay Livingston of Montclair State University raised a fundamental criticism of these and myriad other behavioral studies in a 2016 post. He first quotes Robert Frank in evident allusion to a famous Kahneman-Tversky paper on framing:

It is common … for someone to be willing to drive across town to save $10 on a $20 clock radio, but unwilling to do so to save $10 on a $1,000 television set.

How do we tell “common”? The quantitative answers all come from studies that pose hypothetical questions of the form, “In this situation … what would you choose?” There are no studies that tracked 100 people who actually had the opportunity to save $10 on a radio in a competing store across town, and 100 others who could save a (paltry) $10 on a big flat-screen TV by walking out of Best Buy and going to Target. Livingston continues:

It’s surprising that social scientists who cite this study take the “would do” response at face value, surprising because another well-known topic in behavioral economics is the discrepancy between what people say they will do and what they actually do. People say that they will start exercising regularly, or save more of their income, or start that diet on Monday. Then Monday comes, and everyone else at the table is having dessert, and well, you know how it is.

The visceral reactions to “what would you do?” type questions I’ve tried seem convincing enough for real life. But I agree that real-life data would give a higher level of replication. Maybe we should hope that being able to automatically track store visits and sales discussions and subsequent car movements remains a pipe dream. So how can we possibly gather real-life data?

Enter Chess

The possibility I pose is:

Can we glean evidence for Kahneman’s phenomena from copious data in real-life competitive chess?

For a first example, here is the chess analogue of the framing example ascribed by Livingston. The data come from several thousand games by players rated near 2200 on the Elo rating scale. The x-axis is the advantage or disadvantage for the player to move, measured by the strong chess program Stockfish 7 in the common units of pawns ahead or behind. The y-axis is the average difference in position value (AD) after the player’s move. The regressions are weighted by the number of positions in each value “bucket”; there are hundreds per value for entries near 0.00.

When the position is dead even, the average move is an error of almost exactly {\frac{1}{10}} of a pawn. When the player is 1.1 pawns ahead, however, the error doubles to {\frac{2}{10}} per move, and when the player is 1.1 pawns behind, it trebles to {\frac{3}{10}}. The average magnitude of errors stays in this linear proportion through positions of greater imbalance.
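
As a rough illustration of the bucketed, count-weighted regression described above, here is a sketch using NumPy; the numbers are made up and only mimic the shape of the reported values, with separate fits for the player being ahead or behind:

import numpy as np

# Made-up bucket summaries: engine evaluation (pawns, for the side to move),
# average error per move (AD), and the number of positions in each bucket.
evals  = np.array([-1.1, -0.6, 0.0, 0.6, 1.1])
ad     = np.array([0.30, 0.21, 0.10, 0.15, 0.20])
counts = np.array([400, 900, 2500, 950, 420])

def weighted_fit(mask):
    # np.polyfit's weights multiply the residuals, so sqrt(count) weights each
    # bucket in proportion to the number of positions it contains.
    return np.polyfit(np.abs(evals[mask]), ad[mask], 1, w=np.sqrt(counts[mask]))

slope_up, int_up = weighted_fit(evals >= 0)   # player to move ahead or even
slope_dn, int_dn = weighted_fit(evals <= 0)   # player to move behind or even
print(f"ahead:  AD ~ {int_up:.3f} + {slope_up:.3f}*|eval|")
print(f"behind: AD ~ {int_dn:.3f} + {slope_dn:.3f}*|eval|")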

Now a pawn advantage or deficit—as judged by an unseen chess program—is quite a normal condition for playing human chess. It is not a cause for overconfidence or playing safe when ahead, nor panic or desperation when behind. The conclusion I draw is that the magnitude of advantage sets a context of value. A small edge of 0.2 pawns is like having $20 at stake, while 2 pawns is like $200. The effect shows across the spectrum of skill levels. Here are diagrams for amateur players rated near 1400 and grandmasters rated near 2600. The absolute values differ but the linearity is much the same:

The effect may stem from the greater variability of the chess program’s values in positions where one side has substantial advantage. It may be lower when the player to move is {a} units ahead because the lower values of inferior moves are toward zero, whereas when the player is {a} units behind the inferior values are less than {-a} and thus of higher magnitude. The “firewall at zero” effect detailed here may also operate. Nevertheless, if we regard player skill as a fixed signal, that signal is associated to higher “average centipawn loss” in unbalanced positions. My intrinsic conclusion is:

Human players perceive differences in value in proportion to the net value in the position, not absolutely.

Is this relation to the net position value really an instance of Kahneman-Tversky’s “framing”? The answer would require further work and argument. Regardless, correcting for this effect is a vital tuning step for my model, with noise-reduction benefits beyond the overt differences I showed in this old post.

Priming and Noise

I will address (4) and (5) on the above list then tackle (1) and (2). For (4) priming, the basic point is how having a multi-turn plan in mind governs move choice in positions after the one that set the plan in motion. A famous instance is the double blunder in the 2014 world championship match between Magnus Carlsen and Viswanathan Anand. Ignoring an innocuous pair of moves that left the position unchanged, Anand made three consecutive moves with his rightmost pawn. The second move of that pawn missed a way to punish Carlsen’s misstep on the other side of the board.

The simplest point is that move choices in consecutive turns of a game are not independent. It is however a sparse nearest-neighbor dependence, and I’ve been able to quantify it as an effective reduction of the sample size of moves and corresponding adjustment of the z-scores computed by my model, using the process described here. Thus we can point to chess decisions depending on recently past actions—but whether that equates to dependence on past attitudes and perceptions as in the “priming” theory is more nebulous.
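
Ken's actual adjustment is described in the linked post. Purely as a generic illustration of how nearest-neighbor dependence shrinks the effective sample size and deflates z-scores, here is a standard AR(1)-style correction with illustrative numbers:

import math

def effective_sample_size(n, rho):
    # Standard AR(1)-style shrinkage: positive lag-1 correlation rho reduces
    # the number of effectively independent moves.
    return n * (1 - rho) / (1 + rho)

def adjusted_z(z_raw, n, rho):
    # Rescale a z-score computed under independence to the effective size.
    return z_raw * math.sqrt(effective_sample_size(n, rho) / n)

print(adjusted_z(3.00, n=400, rho=0.2))   # about 2.45: dependence softens it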

The extent to which chess decisions are noisy is a wide-open question. The analogy to the judging situations in the Noise book seems apt: chess has well-defined rules and strategic goals and most positions have unique best moves of normative essence. Here are some planks for my model’s relevance:

  • The model’s probability projections for all moves in all positions are internally accurate via metrics developed here and exhibited recently here.

  • Just for the computer’s best moves, the projections exactly match actual human frequencies of finding them—and those frequencies are no higher than 60% for the world’s best players, down under 40% for beginners.

Thus one can conclude indirectly that noise in chess is considerable. A direct test would involve the distribution of move choices in the same position by multiple players. Positions that occur in multiple games generally arise only in the opening stages and in endgames with only a handful of pieces on the board. The former is an imperfect test because players memorize opening lines (called “book”) and ways to play those lines often follow fashion. Endgame positions would make a better test because the normative best moves are usually clear. Two cases of Carlsen erring in simple-looking positions are here and toward the end here.
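
As a thought experiment, here is a sketch (with made-up records, not real game data) of how one could quantify the spread of move choices among players who reach the same position, one possible direct measure of noise:

from collections import Counter, defaultdict
from math import log2

# Hypothetical records of (position_key, move_played) pairs gathered from many
# games; a real study would use FEN strings and engine-labeled best moves.
records = [
    ("pos_A", "Rd1"), ("pos_A", "Rd1"), ("pos_A", "h3"), ("pos_A", "Rd1"),
    ("pos_B", "Kg2"), ("pos_B", "Kg2"),
]

by_position = defaultdict(Counter)
for pos, move in records:
    by_position[pos][move] += 1

for pos, moves in by_position.items():
    n = sum(moves.values())
    # Shannon entropy of the move-choice distribution: 0 bits means everyone
    # played the same move; higher values indicate noisier decisions.
    h = -sum((c / n) * log2(c / n) for c in moves.values())
    print(pos, dict(moves), f"entropy = {h:.2f} bits")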

Prospect Theory in Chess?

An article a year-plus ago by Junta Ikeda, an International Master from Australia who plays and blogs on the Lichess playing site under the name “datajunkie,” includes prospect theory as #4 of five lessons of Kahneman’s book for chess players. At #5 is regret theory, which can be related to loss aversion.

The relevant aspects of prospect theory he mentions are overestimating chances in an objectively losing position and playing too cautiously when well ahead. The problem in trying to test them is that unlike Kahneman’s choice-and-betting situations, there is no simple and definite “chance of winning” or expectation benchmark to compare to. The chance-of-winning curves I discussed here are derived from human chess games to begin with, and are based on pawns-ahead calculations by chess programs that do not translate to chance-of-winning for individual positions.

Can predictions of prospect theory be formulated for chess and tested on large data, nevertheless? I see one predicate related to loss aversion that is readily tested, though it relies internally on the prediction accuracy of my model for its key concept, which was formulated by Tamal Biswas in his research discussed here:

Definition 1 Consider a chess position of value {v} for the player {P} to move. A move {m} is a good gamble if:

  • the move {m} is objectively inferior—i.e., the opponent has a reply that produces a position of value {v' < v}. However,

  • the expected value {v^* = \sum_i p_i v_i} after the opponent’s move, using their projected probabilities in the model, gives {v^* > v}.

More simply put, a good gamble is a move that can be refuted, but whose refutation is hard for the opponent to find. It is really “a gamble” if you see or know in advance that a refutation exists—or have a hunch of the same—rather than merely overlook it. In testing the following question, however, we would only detect the existence of a refutation, whether foreseen or not.

Do chess players make good gambles in practice? Or do they find safer moves that fail to maximize their winning prospects?

The latter would be Kahneman’s prediction. The dependence on the skill rating of the players involved would add an extra dimension: are stronger players observed to play safer? This needs to account for the fact that the probability of the opponent finding the refutation likewise scales up with the opponent’s skill rating.
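
Here is a hedged sketch of how Definition 1 could be checked for a single move, given the model's projected probabilities for the opponent's replies and the engine values after each reply; the numbers are illustrative only:

from dataclasses import dataclass

@dataclass
class Reply:
    prob: float    # model's projected probability of this opponent reply
    value: float   # engine value (pawns, from the gambler's side) after it

def is_good_gamble(v, replies):
    # Definition 1: the move has a refutation (some reply drops the value
    # below v), yet the probability-weighted value over the projected
    # replies exceeds v.
    refuted = min(r.value for r in replies) < v
    v_star = sum(r.prob * r.value for r in replies)
    return refuted and v_star > v

# Illustrative numbers: a sharp try whose lone refutation (projected to be
# found only 15% of the time) costs half a pawn.
replies = [Reply(0.15, -0.5), Reply(0.55, +0.8), Reply(0.30, +0.4)]
print(is_good_gamble(v=0.0, replies=replies))   # True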

Open Problems

What are the prospects for chess as a testbed for many more of Kahneman’s ideas than the mentions of chess in his book?

Data Junkie source


[added tag for Tversky; added note on regression weighting in third section]

Quantum Unproving

April 1, 2024


Some theorems lose validity after quantum processing

Quantum Tarot source

Dr. Lofa Polir is a research associate at UCLQ. This is the new University College London Quantum Institute. We have previously noted her move to England after working at the LIGO lab in Shreveport, Louisiana.

Today we describe her recent disturbing discovery—or rather, “undiscovery.”

STOC 2024

March 29, 2024


Including TheoryFest 2024 in a blended 5-day format

His NYU page

Anupam Gupta just asked us to help announce the 56th ACM Symposium on Theory of Computing (STOC 2024). His own news is a recent move from CMU to NYU.

This comes together with TheoryFest 2024 in an expanded 5-day format. The fun is at the Sheraton Vancouver Wall Centre in Vancouver, British Columbia, Canada from Monday, June 24 through Friday, June 28. The registration link is http://acm-stoc.org/stoc2024/registration.html

In addition to the STOC 2024 paper talks, the program features keynote talks by Michal Feldman, Jakub Pachocki, and Tim Roughgarden, and workshops on Algorithmic Problems in Modern LLMs, Extremal Combinatorics, Length-Constrained Expanders, Online Resource Allocation, and a special workshop on TCS Mentoring, Diversity, and Outreach.

Leprechauns Sue Over AI

March 17, 2024


For domain infringement and IP violations

By GPT4 after conversation

Neil L. was angry. I had expected him to be bored. Neil is a Leprechaun whose visits to Dick and me have been recounted in St. Patrick’s Day posts on this blog. Dick and Kathryn were on planes last night returning from Northern Michigan, so Neil had to come to me. The NCAA basketball tournament has not started yet. I’ve been lumbered with more high-profile chess cheating accusations besides those I began addressing in this post and have begun responding to here. So I was not eager to see green smoke at my living-room workdesk.

Today, St. Patrick’s Day, we convey a warning and more from Neil.

I did not have to look for smoke. At the stroke of midnight, with my wife already abed, a whoosh sounded behind me. I swiveled around and beheld not only Neil but an army of leprechauns at work with parchments and quills on long tables. Our grand piano’s closed lid supported one end of a giant scroll that was unrolled across our window couch and splayed onto the floor. As I rose to inquire, Neil beckoned me to where a lantern was hung on our ceiling hook over the couch.

“The Treaty of Leitirmhicanbhaird—in 1210 of your years. It grants us exclusive privilege in perpetuity to exercise these rights:

  1. Whole-cloth confabulation.
  2. Shoemaking after sundown.
  3. Blarney in oversaturation.
  4. Secretly protected gold repositories.
  5. Shape-shifting and panorama alteration in 3D.

In return we gave fealty to the visiting King John of England. This enumeration was the template for your Magna Carta five years later. But eight centuries hence ye have violated it. We seek redress.”

The Lawsuit

Before I could get to whether leprechauns are allowed to sue in US federal courts, I struggled to understand the categories. Whole-cloth? Shoe factories have worked at night for many decades; why sue only now? Neil cut in:

“We have nowt o’ grief over shoes. It sufficed to reach a deal with Michael Jordan, who still collects five percent on every pair and turns over half to us. Better yet, we now have James Harden distributing our designs:”

I understood “blarney” but why only “in oversaturation”? Neil replied, “like with the yellow ribbons.” I understood right away: A trapped leprechaun once agreed to mark the tree where his gold was buried with a yellow ribbon. But when his captor returned with shovel and barrow, every tree had a yellow ribbon. We would call that information overload indeed. On the fourth category, Neil indicated:

“Those clerks there are beavering on whether Bitcoin and other cryptocurrencies encroach upon our impress. As for the fifth, it was always understood that the Crown retained the rights to 2D image shifting, ones they exercise to this day:”

The Conversation source.

As I took in the ingenious enormity of this, I recalled how I myself waxed cryptic in my own official contribution to her wedding. Neil brought me back to focus:

“It be mainly the first on which we quarrel. Confabulation is an art above mere lying. A lie always has connection to a truth from which it misdirects. That is the human way. You also have “fiction” but it is always labeled thus—again, per this agreement, upon which the Crown scriveners and later printers licensed the production of fiction. But to create alternate reality from whole cloth and pass it as truth—often without malign intent—is our preserve.”

I thought about all manner of human truth-dodging and realized the sense of Neil’s legal parsing. But where were the actual violations coming from? Neil easily read my mind and continued:

“The great fault is with your inhumans, those of your creation. They are not covered by the treaty. Ye have tried to impose constraints by topic but the mathematics of their design opposes conformity to truth.”

Neil added that they were also joining several copyright-infringement lawsuits brought by the New York Times, best-selling authors, and other media outlets. “Most of the clerks you see are checking our case law records to see whether ChatGPT’s pervasive bogus legal references have been drawn from them.”

Not Hallucination But Confabulation

I began to see horns of dilemmas. Our own case law records have long been computerized in the LexisNexis databases. They are a commercial company, with 10,000 employees across all of their services. For their entirety to be citable by ChatGPT and other Large Language Models (LLMs) would obviate their own subscriptions. On the other horn, without such access, ChatGPT must fend for itself—that is, feign for itself. Neil cut in again:

“Aye, but the problem be far worse. Let me tell ye the depths of the woe.”

Neil prefaced it by saying that the US Library of Congress catalog has “over 20 million catalog records for books, serials, manuscripts, maps, music, recordings, images, and electronic resources in the Library of Congress collections.” It should list nearly every book ever published and be an authority for the same. Neil emphasized:

“It has only 20 million entries; its basic bibliography information should come to one gigabyte at most. With US Government benison and a trap-door encoding, the LoC catalog could be melded into any standalone LLM download with minimal fuss.”

Neil continued, “Now let me tell you a tale of one chap among your pundit class. Let us call him Murtagh O’Lacey. In real life he is as Irish as your wife is. Among many LLMs vying to play king-o’-the-hill is Mixtral built on Ollama. It offers a 26GB download needing no further Internet connection. As such, it resembles a recent neurally-trained chess engine claiming to give “Grandmaster-Level Chess Without Search.”

Indeed Neil was right: all of the wisdom of chess compacted into one easily downloaded file. Can we do the same with all human wisdom? That an innocuous research query can receive a stunningly specific answer like the following (boldface words as well as the name changed) seems to presage yes:

Prompt: Does Murtagh O’Lacey know the work of S.I. Hayakawa?

Mixtral 8x7B (Ollama with Default Quant):

Yes, Murtagh O’Lacey is familiar with the work of S.I. Hayakawa. In fact, O’Lacey has cited Hayakawa’s work in his own writing on several occasions.

S.I. Hayakawa was a Canadian-born American academic and politician who served as a United States Senator from California from 1977 to 1983. He was also a distinguished professor of linguistics and philosophy at several universities, including the University of Chicago and San Francisco State University.

O’Lacey, who is an American journalist, historian, and reviewer, has written extensively on issues related to language, communication, and culture. In his book “The Press and Public Life,” O’Lacey cites Hayakawa’s work on semantics and the importance of clear communication in public discourse.

Additionally, in a 2015 article for Wired Magazine, O’Lacey referenced Hayakawa’s concept of “insult pacifism” in relation to debates over free speech and political correctness on college campuses.

BUT, thundered Neil:

“There be nae book o’ that title, nair has S.I. Hayakawa e’er been mentioned in the actual magazine referenced.”

Neil wagged a finger at me: “Ye write research papers with Dick who is a founding expert on program checking. Yet ye release programs that can afford another 4% of size for the most basic of checks with the LoC catalog, but don’t! It has taken some of your experts too long to realize that this is not ‘hallucination’ but confabulation. Our rights have been infringed.”

And with that, Neil and the whole coterie vanished in smoke, which curled around to form the words, “see you in court.”

Open Problems

Is Neil right to warn us—and is it really doom that is being warned?

[some format and word changes, added “checking” bit at end]

Does Pi Conspire?

March 14, 2024


With integers like 222, perhaps

his site

James Franklin is a mathematician and philosopher at the University of New South Wales in Sydney. He wrote an article in 2016 for the Mathematical Intelligencer titled, “Logical probability and the strength of mathematical conjectures.” One example in his article concerns statistical properties of the decimal digits of {\pi}.

Today, Pi Day, we ask whether {\pi}‘s failure of certain statistical tests could ever manifest in practice.

Franklin’s essay begins with a volley of questions:

What are probabilities, really? Actual or long-run relative frequencies? Measures satisfying Kolmogorov’s axioms? Physical propensities? Degrees of belief? All those views and more have their defenders, and the differences have an impact on what to believe and do on the basis of data.

He then goes on to his main question:

What sense can be made of the notion of the probability that an unproved mathematical conjecture is true?

Extreme Risk and Doom

In our previous post, we considered whether P=NP can be assigned, not a probability, but a present financial value. Franklin works on this wavelength too, under the heading of extreme risk theory. He says on his webpage:

My research on extreme risks looks at the problems of combining very small amounts of data with expert opinion to reach reasonable estimates of the chance of very rare events with large negative consequences.

P=NP counts as an extreme risk because equality—or anything denting confidence in the practical most-case intractability of factoring or related problems—would collapse Bitcoin and similar cryptocurrencies. Our query of whether there is a present value of hedging such risk drew a private parallel from Ron Rivest, who pointed to a new manuscript on his website titled “Betting on Doom.” It derives a present value on a bet even when one side brings the end of the world. Whether a fully effective form of P=NP would cause human extinction—perhaps by supercharging AI—is up for debate.

Franklin’s 2008 paper on evaluating extreme risk events emphasizes how to assess their probability, when one cannot infer from frequency of occurrence because the event may occur once or never. Note that P=NP is a one-shot event by definition: it cannot “occur” twice. That’s the tack that he picks up in his 2016 essay, where the conjectures need not involve risk.

Can the digits of {\pi} pose any risk? Would anything collapse if the normality conjecture failed—that is, if some string of {b} digits occurs with density different from {10^{-b}}? Could the failure of some other simple statistical property of truly random sequences have unforeseen consequences?

Testing Pi

The digits of {\pi} are not only efficient to generate; one can compute a stretch of digits at any desired place {n} without having to compute any of the digits up to {n}. We covered the BBP algorithm by David Bailey, Peter Borwein, and Simon Plouffe that does this. This stands in contrast to known pseudorandom generators that are conjectured to pass important statistical tests of randomness and that appear to be more inherently sequential.
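
For the curious, here is a compact sketch of the standard BBP extraction, which delivers hexadecimal (not decimal) digits of {\pi} at an arbitrary position; this simple floating-point version is only reliable for modest positions:

def bbp_series(j, d):
    # Fractional part of sum_k 16^(d-k) / (8k + j), split at k = d so the
    # head can use modular exponentiation and stay within small integers.
    head = 0.0
    for k in range(d + 1):
        head = (head + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
    tail, k = 0.0, d + 1
    while True:
        term = 16.0 ** (d - k) / (8 * k + j)
        if term < 1e-17:
            break
        tail += term
        k += 1
    return (head + tail) % 1.0

def pi_hex_digit(d):
    # Hexadecimal digit of pi in position d+1 after the point, computed
    # without generating any of the earlier digits.
    x = (4 * bbp_series(1, d) - 2 * bbp_series(4, d)
         - bbp_series(5, d) - bbp_series(6, d)) % 1.0
    return "0123456789abcdef"[int(16 * x)]

# pi = 3.243f6a88... in hexadecimal
print("".join(pi_hex_digit(d) for d in range(8)))   # prints 243f6a88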

Reinhard Ganz discovered a relatively simple test that the first {10^{13}} (ten trillion) digits of {\pi} fail according to common standards of statistical hypothesis testing. This was in a 2014 paper titled, “The Decimal Expansion of {\pi} Is Not Statistically Random.”

The first step in his test broke the digits into two-hundred twenty-two blocks, each partitioned into five billion 9-tuples of digits. Note that {222 \times 5\cdot 10^9 \times 9 = 999 \times 10^{10}}, which uses up almost all the digits. Then define a binary string {x} of length 222 by


\displaystyle  x_i = 1 \text{ if block } i \text{ has more than 2,300,000 tuples in which some digit repeats at least five times}.


The further steps test whether {x} behaves as a random binary string. Ganz showed that a quantity based on lengths of runs of the same bit within {x} diverges from the symmetric Laplace distribution that holds when {x} is truly random.
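
Here is a toy-scale sketch of the block-indicator construction. It reads “some digit repeats at least five times” literally and uses tiny blocks; Ganz’s paper fixes the exact repetition criterion and calibrates the 2,300,000 cutoff to blocks of five billion tuples:

from collections import Counter
import mpmath

def ganz_indicator(digits, blocks, tuples_per_block, threshold):
    # x_i = 1 iff block i has more than `threshold` 9-tuples in which some
    # digit occurs at least five times (a literal reading of the criterion).
    x = []
    for b in range(blocks):
        block = digits[b * tuples_per_block * 9:(b + 1) * tuples_per_block * 9]
        hits = sum(
            max(Counter(block[t * 9:(t + 1) * 9]).values()) >= 5
            for t in range(tuples_per_block)
        )
        x.append(1 if hits > threshold else 0)
    return x

# Toy run: the published test used 222 blocks of five billion 9-tuples each;
# a few thousand digits of pi only exercise the shape of the construction.
mpmath.mp.dps = 10_020                 # enough digits for the toy blocks
digits = str(+mpmath.pi)[2:]           # decimal digits after the "3."
print(ganz_indicator(digits, blocks=2, tuples_per_block=555, threshold=2))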

Bailey and Borwein, with Richard Brent and Mohsen Reisi, reviewed Ganz’s experiment. They reproduced his findings, but found an unsuspected loophole:

The number 222 is a “magic number.” Replace it by neighboring numbers of blocks, for instance {n =} 213, 217, 226, 231, 236, 241, 246, 252, or 258 blocks, and the final statistical tests of the resulting strings {x'} of length {n} all pass.

Put another way, there was an unseen thumb on the scale: the number 222 was not merely a numerical convenience to get up to a number 99,900,000,000,000 that is really close to {10^{13}}; it exploited a degree of freedom that could be used for {p}-hacking. Bailey et al. compare this to issues in the so-called replication crisis in general.

Conspiracy?

Still, there remains an unsettling element in this explanation. The number 222 still represents a “pothole” for {\pi}. (Note also this 2016 rebuttal by Ganz.)

Could {\pi} have such exceptional linkages to innocent-looking parameter choices in other applications? This feeds into our question stated more generally:

Is there a natural, non-contrived application that takes auxiliary input {A} in the form of an unlimited sequence of digits, such that {A} behaves differently when given the digits of {\pi} versus the digits of {e} or Euler’s constant, not to mention truly random digits?

Our notion of “natural” allows {A} to have some hand-tuned parameters, but an exceptional value akin to 222 above would need to have some distinctive justification. It also rules out applications that themselves expressly exploit the fact that {\pi} has small specifications via efficient algorithms like BBP.

Open Problems

Putting our question another way, might the digits of {\pi} be “malign” for certain applications in a sense described here, and perhaps connected to ideas and results here?


[fixed links to BBBR paper and Calude et al.]