Some time ago I wrote about volumes of spheres in multi-dimensional phase space – as needed in integrals in statistical mechanics.

The post was primarily about the curious fact that the ‘bulk of the volume’ of such spheres is contained in a thin shell beneath their hyperspherical surfaces. The trick to calculate something reasonable is to spot expressions you can Taylor-expand in the exponent.

Large numbers ‘do not get much bigger’ if multiplied by a factor, to be demonstrated again by Taylor-expanding such a large number in the exponent; I used this example:

Assuming N is about 10^{25 }then its natural logarithm is about 58 and , then 58 can be neglected compared to N itself.

However, in the real world numbers associated with locations and momenta of particles come with units. Calling the unit ‘length’ in phase space the large volume can be written as , and the impact of an additional factor N also depends on the unit length chosen.

I did not yet mention the related issues with the definition of entropy. In this post I will follow the way Landau and Lifshitz introduce entropy in Statistical Physics, Volume 5 of their Course of Theoretical Physics.

Landau and Lifshitz introduce statistical mechanics top-down, starting from fundamental principles and from Hamiltonian classical mechanics: no applications, no definitions of ‘heat’ and ‘work’, nor historical references needed for motivation. Classical phenomenological thermodynamics is only introduced after their are done with the statistical foundations. Both entropy and temperature are *defined *– these are useful fundamental properties spotted in the mathematical derivations and thus deserve special names. They cover both classical and quantum statistics in small number of pages – LL’s style has been called terse or elegant.

The behaviour of a system with a large number of particles is encoded in a probability distribution function in phase space, a density. In the classical case this is a continuous function of phase-space co-ordinates. In the quantum case you consider distinct states – whose energy levels are densely packed together though. Moving from classical to quantum statistics means to count those states rather than to integrate the smooth density function over a volume. There are equivalent states created by permutations of identical particles – but factoring in that is postponed and not required for a first definition of entropy. A *quasi-classical* description is sufficient: using a minimum cell in phase space, whose dimensions are defined by Planck’s constant h that has a dimension of action – length times momentum.

**Entropy as statistical weight**

Entropy S is *defined* as the logarithm of the *statistical weight* – the number of quantum states associated with the part of phase phase used by the (sub)-system. (Landau and Lifshitz use the concept of a – still large – subsystem embedded in a larger volume most consequentially, in order to avoid reliance on the ergodic hypothesis as mentioned in the preface). In the quasi-classical view the statistical weight is the volume in phase space occupied by the system divided by the size of the minimum unit cell defined by Planck’s constant h. Denoting momenta by *p*, positions by *q*, the ‘volume’ in phase space by times , and degrees of freedom by *s *

An example from solid state physics: if the system is considered a rectangular box in the physical world, possible quantum states related to vibrations can be visualized in terms of possible standing waves that ‘fit’ into the box. The statistical weight would then single out those bunch of states the system actually ‘has’ / ‘uses’ / ‘occupies’ in the long run.

Different sorts of statistical functions are introduced, and one reason for writing this article to emphasize the difference between them: The density function associates each point in phase space – each possible configuration of a system characterized by the momenta and locations of all particles – with a probability. These points are also called microstates. Taking into account the probabilities to find a system in any of these microstates gives you the so-called macrostate characterized by the statistical weight: How large or small a part of phase space the system will use when watched for a long time.

The canonical example is an ideal gas in a vessel: The most probable spatial distribution of particles is to find them spread out evenly, the most unlikely configuration is to have them concentrated in (nearly) the same location, like one corner of the box. The density function assigns probabilities to these configurations. As the even distribution is so much much more likely, the part of the statistical weight would cover all of the physical volume available. The statistical weight function has to obtain a maximum value in the most likely case, in equilibrium.

**The significance of energies – and why there are logarithms everywhere.**

Different sufficiently large subsystems of one big system are statistically independent – as their properties are defined by their bulk volume rather than their surfaces interfacing with other subsystems – and the larger the volume, the larger the ratio of volume and surface. Thus the probability density function for the combined system – as a function of momenta and locations of all particles in the total phase phase – has to be equal to the product of the densities for each subsystem. Denoting the classical density with and adding a subscript for the set of momenta and positions referring to a subsystem:

(Since these are probability densities, the actual probability is always obtained by multiplying with the differential(s) ).

This means that the logarithm of the composite density is equal to the sum of the logarithms of the individual densities. This the root cause of having logarithms show up everywhere in statistical mechanics.

A mechanical system of particles is characterized by only 7 ‘meaningful’ additive integrals: *Energy*, *momentum* and *angular momentum* – they add up when you combine systems, in contrast to all the other millions of integration constants that would appear when solving the equations of motions exactly. Momentum and angular momentum are not that interesting thermodynamically, as one can change to a frame moving and rotating with the system (LL also cover rotating systems). So energy remains as the integral of outstanding importance.

**From counting states to energy levels
**

What we want is to relate entropy to energy, so assertions about numbers of states covered need to be translated to statements about energy and energy ranges.

LL denote the probability to find a system in (micro-)state n with energy as – the quantum equivalent of density . has to be a linear function of the energy of this micro-state as per the additivity just mentioned above, and thus LL omit the subscript n for w:

(They omit any symbol ever if possible to keep their notation succinct ;-))

A thermodynamic system has an enormous number of (mechanical) degrees of freedom. Fluctuations are small as per the law of large numbers in statistics, and the probability to find a system with a certain energy can be approximated by a sharp delta-function-like peak at the system’s energy E. So in thermal equilibrium its energy has a very sharp peak. It occupies a very thin ‘layer’ of thickness in config space – around the hyperplane that characterizes its average energy E.

Statistical weight can be considered the width of the related function: Energy-wise broadening of the macroscopic state needs to be translated to a broadening related to the number of quantum states.

We change variables, so the connection between Γ and E is made via the derivative of Γ with respect to E. E is an integral, statistical property of the whole system, and the probability for the system to have energy E in equilibrium is . E is not discrete so this is again a probability density. It is capital W now – in contrast to which says something about the ‘population’ of each quantum state with energy .

A quasi-continuous number of states per energy Γ is related to E by the differential:

.

As E peaks so sharply and the energy levels are packed so densely it is reasonable to use the function (small) w but calculate it for an argument value E. Capital W(E) is a probability density as a function of total energy, small w(E) is a function of discrete energies denoting states – so it has to be multiplied by the number of states in the range in question:

Thus…

.

The delta-function-like functions (of energy or states) have to be normalized, and the widths ΔΓ and ΔE multiplied by the respective heights W and w taken at the average energy have to be 1, respectively:

(… and the ‘average’ energy is what is simply called ‘the’ energy in classical thermodynamics).

So is inversely proportional to the probability of the most likely state (of average energy). This can also be concluded from the quasi-classical definition: If you imagine a box full of particles, the least possible state is equivalent to all particles occupying a single cell in phase space. The probability for that is (size of the unit cell) over (size of the box) times smaller than the probability to find the particles evenly distributed on the whole box … which is exactly the definition of .

The statistical weight is finally:

.

… the broadening in , proportional to the broadening in

**The more familiar (?) definition of entropy**

From that, you can recover another familiar definition of entropy, perhaps the more common one. Taking the logarithm…

.

As log w is linear in E, the averaging of E can be extended to the whole log function. Then the definition of ‘averaging over states n’ can be used: To multiply the value for each state n by probability and sum up:

.

… which is the first statistical expression for entropy I had once learned.

**LL do not introduce Boltzmann’s constant k here**

It is effectively set to 1 – so entropy is defined without a reference to k. k is is only mentioned in passing later: In case one wishes to measure energy and temperature in different units. But there is no need to do so, if you defined entropy and temperature based on first principles.

**Back to units**

In a purely classical description based on the volume in phase space instead of the number of states there would be no cell of minimum size, and then instead of the statistical weight we had simply this volume: But then entropy would be calculated in a very awkward unit, the logarithm of action. Every change of the unit for measuring volumes in phase space would result in an additive constant – the deeper reason why entropy in a classical context is only defined up to such a constant.

So the natural unit called above should actually be Planck’s constant taken to the power defined by the number of particles.

**Temperature**

The first task to be solved in statistical mechanics is to find a general way of formulating a proper density function small as a function of energy . You can either assume that the system has a clearly defined energy upfront – the system lives on a ‘energy-hyperplane in phase space’ – or you can consider it immersed in a larger system later identified with a ‘heat bath’ which causes the system to reach thermal equilibrium. These two concepts are called the micro-canonical and the canonical distribution (or Gibbs distribution) and the actual distribution functions don’t differ much because the energy peaks so sharply also in the canonical case. It’s that type of calculations where those hyperspheres are actually needed.

Temperature as a concept emerges from a closer look at these distributions, but LL introduce it upfront from simpler considerations: It is sufficient to know that 1) entropy only depends on energy, 2) both are additive functions of subsystems, and 3) entropy is a maximum in equilibrium. You divide one system in two subsystems. The total change in entropy has to be zero as this is a maximum (in equilibrium), and what energy leaves one system has to be received as by the other system. Taking a look at the total entropy S as a function of the energy of one subsystem:

So has to be the same for each subsystem x. Cutting one of the subsystems in two you can use the same argument again. So there is one very interesting quantity that is the same for every subsystem – . Let’s call it 1/T and let’s call T the* temperature*.