Skip to content

Advancing and Counting

March 7, 2021


Announcing tomorrow’s Women in Data Science workshop (global start tonight 8pm ET), plus a US State Department event for International Women’s Day (also March 8)

Santa Fe Inst. external faculty src

Dana Randall is an ADVANCE Professor of Computing and also is an Adjunct Professor of Mathematics at the Georgia Institute of Technology. She is a terrific speaker and teacher and leader. Her class ratings are off the charts. See also AMS for her past special talks.

Today I thought we would discuss her research and its connections to complexity theory and to physics and to math in general.

Dana does research into the boundary between math and physics. At the highest level Dana seeks to understand random processes, especially those connected to physical systems. The difficulty, in my opinion, is that sometimes the random system is not artificial. This means that we have no control over the system, and this makes the analysis of its behavior that much harder.

Another way to say this is: we often fare better when we can control the exact random process. When someone else gets to decide on what the process is, we are often in trouble. The system might behave badly, or even worse, might be hard to understand. Nature is often that way—not always thinking about making the analysis of a system easy.

Shuffling

Let’s make this concrete. Suppose that you or Dana were presented with three methods for shuffling a deck of {n} cards—when we play cards {n} is {52}. Imagine the methods are:

  1. This method selects {n} numbers in the range {[1,n]} and checks for repeats. If there are, then try again. Use the permutation to shuffle.
  2. This method takes a deck of {n} numbers and repeatedly shuffles them.
  3. This method executes the following code:

The task of understanding these methods is before us. The first (1) is slow, even for modest size {n}. The chance of getting a permutation on a given trial is {\frac{n!}{n^n}}, which for {n= 52} is {4.7257911 \times 10^{-22}}, which equals

\displaystyle  0.00000000000000000000047257911.

But it does generate a fair shuffle—all possible ones are equally likely. And the proof of this is easy. The second (2) is more complicated. The final shuffle depends on the number and manner we use to shuffle the deck. The final analysis is messy.

The third one (3) is due to Ronald Fisher and Frank Yates, who discovered it in 1938. It has an elegant, but nontrivial, analysis. It is both exact in that all orderings are equally likely, and it takes time linear in {n}. The history of it is:

The modern version of the Fisher-Yates shuffle, designed for computer use, was introduced by Richard Durstenfeld in 1964 and popularized by Donald Knuth in The Art of Computer Programming as “Algorithm P (Shuffling)” […apparently unawares; Fisher and Yates were acknowledged in later editions of Knuth’s text…]

My guess is that Dana, if given these methods, would not be interested in (1): too slow on one hand and trivial on the other. Nor interested in (2): not elegant and messy. Perhaps (3) would accord with her work: it has a nice analysis and runs in linear time.

Advancing

As we said earlier, Dana is part of ADVANCE at Georgia Tech:

Georgia Tech’s ADVANCE Program seeks to develop systemic and institutional approaches that increase the representation, full participation, and advancement of women and minorities in academic STEM careers—thus contributing to a more diverse workforce, locally and nationally.

Dana has been and is a leader in helping advance these goals. It is especially relevant since this Monday, March 8th is special. It is Celebrate International Women’s Day with WiDS. See this for details:

Join us for the 24-hour virtual WiDS Worldwide Conference. We’ll follow the sun, bringing you speakers from around the world on International Women’s Day beginning at 1:00 am GMT March 8 (5:00 pm PST March 7).

In making a collage of their speaker page, we have compressed and rearranged it somewhat. And we have added the logo for the regional events happening around the globe (some already past) and one for their sponsors.


If you could not take time to follow all the talks, you might take a few for a sample. Maybe you would figure that taking the first four speakers in alphabetical order, or the last four, would be as random a sample as any—after all, what’s in a name? Well, if you took the last four, you would actually get three of the four keynote speakers. Sometimes procedures that we hope would give “random” samples in fact give special ones. That takes us back to one more topic in Randall’s work.

Counting

One benefit of a random process is that under good conditions it can give us an accurate small sampling of a large and complex system. We have mentioned dimension reduction in this context. A simpler task is just to get an approximate count of entities in the system. Sometimes one can control the system, but sometimes not.

One success is represented by a paper with Sarah Miracle of the University of St. Thomas. It is about counting colorings in multigraphs {G = (V,E)} that do not violate simple constraints on the edges {(u,v)}. Each edge has a forbidden pair {(c,c')} of colors, and a coloring {\chi} defined on {V} is legal provided it does not have both {\chi(u) = c} and {\chi(v) = c'}. Multiple edges between {u} and {v} can enforce multiple such constraints. Several natural problems can be represented via this one. The paper is one where their Markov Chain methods do well.

A second recent paper uses Markov chains to count elements of a given rank in finite partially ordered sets. The chains should be biased according to the structure of the Hasse diagram of the poset. The trick in the paper is a way to balance the bias so as to prevent states that need to be counted from have too low frequency. This enables a direct analysis of the mixing time. The notable application was the first provably efficient ways to sample uniformly from certain kinds of partitions of {n}, for {n} quite large.

The flip side—a phase change being a kind of a flip—is represented by other work with myriad collaborators that is summarized in a wonderful talk she gave at a workshop marking the 30th anniversary of DIMACS. As with an earlier version at a Schloss Dagstuhl workshop, the talk was titled, “Phase Transitions and Emergent Phenomena in Algorithms and Applications”:

Markov chain Monte Carlo methods have become ubiquitous across science and engineering as a means of exploring large configuration spaces. The idea is to walk among the configurations so that even though you explore a very small part of the space, samples will be drawn from a desirable distribution. Over the last 30 years there have been tremendous advances in the design and analysis of efficient sampling algorithms for this purpose, largely building on insights from statistical physics. One of the striking discoveries has been the realization that many natural Markov chains undergo a phase transition where they change from being efficient to inefficient as some parameter of the system is varied.

Here are a video and slides. The first main slide is about programmable active matter and it interests me especially to see DNA computing included. These systems can have emergent behavior, and while that can ruin randomized procedures that would bank on the system staying stable, it opens other opportunities.

I, Dick, have run into this type of issue before. I have been on the wrong side, with theorems that were weak because we assumed the process could not be changed. Others who followed us changed the process—got stronger results with easier proofs. Is there a name for this?

Open Problems

Take a look at the talks for the WiDS Worldwide Conference this Monday.

Ken also notes another event happening tomorrow: a 10am ceremony for the International Women of Courage Award. His friend the Iranian chess arbiter Shohreh Bayat is among the honorees, after a story told in her own words here in the Washington Post. Ken was working with her, on statistical assurance against cheating in the championship match, at the time. The ceremony starts at 10am ET hosted by the US Department of State, with opening remarks by Dr. Jill Biden.

We must add that there is something that is hard about the type of random processes that Dana studies. We tried to explain what makes her work deep, but perhaps we did not properly explain it.


[added global start time of workshop to subtitle]

2 Comments leave one →
  1. March 7, 2021 3:09 pm

    I’m proud to say that Dana is also an External Faculty member at the Santa Fe Institute.

    • March 7, 2021 3:27 pm

      Indeed, the last edit (I, Ken) made to this post was to source that fact to the photo at top.

Leave a Reply

Discover more from Gödel's Lost Letter and P=NP

Subscribe now to keep reading and get access to the full archive.

Continue reading