In classical mechanics you want to understand the motion of all constituents of a system in detail. The trajectory of each ‘particle’ can be calculated from the forces between them and from their initial positions and velocities. In statistical mechanics you try to work out what can still be said about a system even though – or because – you cannot solve these equations anymore and you do not (want to) know each particle’s initial condition.
Phase Space and Probability Density
Describing mechanical systems statistically means assigning a certain probability to a ‘state’ of a system. From a microscopic perspective, the state is determined by the positions and velocities of all particles. The velocities (more exactly: momenta) are needed because the equations of motion are second-order differential equations relating accelerations to forces. Thus positions (more exactly: degrees of freedom) and their first derivatives have to be provided at some point in time. But positions and momenta are real numbers that can change continuously. This is not like coins or dice, where you can assign a probability value to each of a few possible outcomes. You cannot assign a probability to a state of, say, all particles being placed exactly at the corners of a cubic lattice and having specific velocities. The probability for that exact state would be zero(*). The probabilities for all possible states still need to add up to unity, so the way to treat this is by introducing probability densities. The probability of the system to be approximately in some state is a probability density times a small volume surrounding the point we want to pin down. If positions and momenta are denoted symbolically as q and p, then the probability of the state ‘q,p’ is equal to the probability density ρ times the ‘volume’: ρ(q,p)dqdp.
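Here is a minimal numerical sketch of ‘density times volume’. It assumes a single particle in one dimension, with a hypothetical Gaussian density ρ(q,p) chosen only because it is easy to sample. Two things should show up. The fraction of random states falling into a small box should come out close to ρ(q,p)·dq·dp. A single exact point should essentially never be hit.

```python
import math
import random

def rho(q, p):
    """Hypothetical example density: standard Gaussians in q and p, normalized to 1."""
    return math.exp(-(q * q + p * p) / 2) / (2 * math.pi)

random.seed(42)
# Draw many 'states' (q, p) distributed according to rho.
samples = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200_000)]

# Small box with lower-left corner (q0, p0) and side lengths dq, dp.
q0, p0, dq, dp = 0.5, -0.3, 0.1, 0.1
hits = sum(1 for q, p in samples if q0 <= q < q0 + dq and p0 <= p < p0 + dp)
estimate = hits / len(samples)

# Probability predicted by 'density times volume', evaluated at the box center.
predicted = rho(q0 + dq / 2, p0 + dp / 2) * dq * dp

# An exact point is essentially never hit by continuous random numbers.
exact_hits = sum(1 for q, p in samples if q == q0 and p == p0)

print(f"box estimate {estimate:.5f}, rho*dq*dp {predicted:.5f}, exact-point hits {exact_hits}")
```

Shrinking dq and dp makes both numbers go to zero together. Only their ratio, the density, stays finite.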
This raises the question: What is this ‘space’ in which a ‘state’ of this system of trillions of particles would be described by ‘a point’? Every point in that space would represent the positions and velocities of all the particles. It is called phase space – an abstract space whose number of dimensions equals 6 times the number of particles: 3 for the spatial coordinates of each point particle and 3 for the components of its momentum. Every possible detailed combination of all positions and momenta of all particles is stuffed into one giant vector, one point in phase space.
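A small sketch of that ‘giant vector’. The function name is hypothetical, and so is the ordering: the three position components of each particle, followed by its three momentum components. Any fixed ordering would do. Three particles already give a point in an 18-dimensional space.

```python
def phase_space_point(positions, momenta):
    """Stack 3 position and 3 momentum components of every particle into one flat list.

    positions, momenta: lists of (x, y, z) tuples, one tuple per particle.
    Returns 6 * N numbers: the coordinates of one point in 6N-dimensional phase space.
    """
    if len(positions) != len(momenta):
        raise ValueError("need exactly one momentum per position")
    point = []
    for r, p in zip(positions, momenta):
        point.extend(r)   # x, y, z of this particle
        point.extend(p)   # px, py, pz of this particle
    return point

# Three particles with made-up example values.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
momenta = [(0.1, 0.0, 0.0), (0.0, -0.2, 0.0), (0.3, 0.3, 0.3)]
point = phase_space_point(positions, momenta)
print(len(point))  # 18 = 6 * 3
```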
When a mechanical system is left to itself, the point representing its state wanders through phase space with time. It will visit some places much more often than others. For example, it will very rarely touch a point where the spatial coordinates of all particles are nearly the same, that is, where all particles huddle together in a corner. The probability density is high in places where the wandering system spends more time.
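How rare is ‘all particles huddle together in a corner’? Here is a rough sketch. It assumes non-interacting particles whose positions are independent and uniformly distributed in a unit cube, so only positions matter. The probability that all N particles sit in one given corner octant is then (1/8)^N. The code compares this with a quick Monte Carlo count for three particles and evaluates it for a hundred particles.

```python
import random

def prob_all_in_corner(n_particles):
    """Exact probability that n independent, uniformly placed particles all sit in one
    given corner octant of a unit cube (every coordinate in the lower half)."""
    return (1 / 8) ** n_particles

def simulate_corner(n_particles, trials, rng):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(trials):
        # A particle is in the corner octant if its largest coordinate is below 0.5.
        if all(max(rng.random(), rng.random(), rng.random()) < 0.5
               for _ in range(n_particles)):
            hits += 1
    return hits / trials

rng = random.Random(7)
mc_3 = simulate_corner(3, 200_000, rng)
print(prob_all_in_corner(3))     # 0.001953125
print(mc_3)                      # Monte Carlo estimate, close to the value above
print(prob_all_in_corner(100))   # roughly 5e-91: practically never
```

For a macroscopic number of particles the exponent is astronomically large. This is why such regions of phase space are practically never visited.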
Instead of thinking of one system walking through phase space, you can also think of a large number of identical systems: dots thrown into phase space randomly (with random initial conditions for the particles they contain). The probability density is encoded in how densely this ‘patch’ of millions of dots fills each region. If all these systems are let loose and start to wander, each of them will visit the more probable places in phase space more often. Which places are more popular depends on the system’s total energy. A single constraint, such as a fixed value of the energy, defines a ‘surface’ with one dimension less than the whole phase space. If energy is constant, the system is restricted to a thin layer near this hypersurface of constant energy.
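A toy version of the cloud of dots uses the simplest system I could think of as an assumption: a one-dimensional harmonic oscillator with mass and frequency set to 1, so E = (q² + p²)/2. Its phase space is a plane. The hypersurface of constant energy is a circle, and the ‘thin layer’ is a ring. The dots are thrown randomly into the ring and then moved with the exact solution of the equations of motion. Every dot stays on its own energy circle.

```python
import math
import random

def energy(q, p):
    """Hamiltonian of a 1D harmonic oscillator with m = omega = 1 (assumption)."""
    return 0.5 * (q * q + p * p)

def evolve(q, p, t):
    """Exact solution of dq/dt = p, dp/dt = -q: a rotation of (q, p) by angle t."""
    c, s = math.cos(t), math.sin(t)
    return q * c + p * s, -q * s + p * c

rng = random.Random(3)
E0, dE = 1.0, 0.05
ensemble = []
while len(ensemble) < 1000:
    # Throw dots randomly into a square and keep those inside the thin energy shell.
    q, p = rng.uniform(-2, 2), rng.uniform(-2, 2)
    if E0 <= energy(q, p) < E0 + dE:
        ensemble.append((q, p))

later = [evolve(q, p, t=12.3) for q, p in ensemble]
drift = max(abs(energy(*a) - energy(*b)) for a, b in zip(ensemble, later))
print(drift)  # of the order of rounding error: no dot leaves its energy circle
```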
The shape of the probability density function will not change with time, just as the shape of a river does not change despite the steady flow of water. This comparison is not a metaphor: The behavior of probability density matches that of the density of an incompressible fluid exactly. The system dots cannot be destroyed or created; they just move around. You can set up an equation for the local conservation of this ‘material’. If density changes with time in a selected volume, it can only do so because material flows into or out of the volume. This flow is steady, so the density does not depend on time explicitly. There are no ‘sources’: the divergence of the current density associated with probability density is zero. You can combine this statement with what you know about the motion of each particle, namely how the changes of position and momentum can be derived from the system’s total energy (Hamilton’s equations). These equations make the velocity field of the dots in phase space divergence-free, which is the defining property of an incompressible flow. The final result is: The probability density does not depend on time at all, not even indirectly via the underlying motions of the particles. It stays constant along every trajectory.
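For readers who prefer symbols, here is the same chain of reasoning written out. It is Liouville’s theorem, sketched in standard textbook notation that the words above do not use. It assumes f degrees of freedom q_i, momenta p_i and a Hamiltonian H(q,p).

```latex
% Continuity equation: system dots are neither created nor destroyed
\[
\frac{\partial \rho}{\partial t}
+ \sum_{i=1}^{f}\left[\frac{\partial(\rho\,\dot q_i)}{\partial q_i}
+ \frac{\partial(\rho\,\dot p_i)}{\partial p_i}\right] = 0
\]

% Hamilton's equations make the phase-space velocity field divergence-free
\[
\dot q_i = \frac{\partial H}{\partial p_i}, \qquad
\dot p_i = -\frac{\partial H}{\partial q_i}
\quad\Longrightarrow\quad
\sum_{i=1}^{f}\left(\frac{\partial \dot q_i}{\partial q_i}
+ \frac{\partial \dot p_i}{\partial p_i}\right)
= \sum_{i=1}^{f}\left(\frac{\partial^2 H}{\partial q_i\,\partial p_i}
- \frac{\partial^2 H}{\partial p_i\,\partial q_i}\right) = 0
\]

% Hence the density is constant along each trajectory (Liouville)
\[
\frac{d\rho}{dt} = \frac{\partial \rho}{\partial t}
+ \sum_{i=1}^{f}\left(\dot q_i\,\frac{\partial \rho}{\partial q_i}
+ \dot p_i\,\frac{\partial \rho}{\partial p_i}\right) = 0
\]

% Steady flow: \partial\rho/\partial t = 0, so the sum vanishes as well,
% i.e. \rho(q,p) is a constant of motion.
```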
So probability density is a constant of motion as long as the system’s energy is constant. The system dots move but stay close to the energy hypersurface(*). If the system’s total energy changed at all, it would have to change very slowly compared to the intrinsic motion of the dots, which change their places all the time. This can be achieved by imagining every system as part of a larger system, that is, as a subsystem. Each subsystem contains so many particles that statistical averaging still makes sense. Different subsystems interact: Particles assigned to different subsystems meet and mingle at the interfaces. But if subsystems are large, their interfaces are small compared to their volume. Surface area grows with the square of the linear size and volume with its cube. So subsystems can be considered to be rather independent and have well-defined energies.
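The surface-versus-volume argument can be made concrete with a crude stand-in. Model a subsystem as particles sitting on an n × n × n cubic lattice, and count those on the outer layer as ‘interface particles’. This is a hypothetical illustration, not a physical model of the interface. The fraction of interface particles falls off roughly like 6/n.

```python
def surface_fraction(n):
    """Fraction of sites of an n x n x n cubic lattice that lie on its outer surface.

    Surface sites scale like n**2, all sites like n**3, so the fraction drops like 6/n.
    """
    if n < 2:
        return 1.0          # a single site is all surface
    return (n**3 - (n - 2)**3) / n**3

for n in (10, 100, 1000):
    print(n, surface_fraction(n), 6 / n)
```

For n = 10**6, which already means 10**18 particles, only about six in a million particles would sit at the interface.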
This statistical independence, together with the invariance of the probability density, naturally explains why logarithms (or exponential functions) are at the heart of statistical mechanics and thermodynamics. Factors like e raised to the power of a ratio of a characteristic energy to the energy equivalent of a certain temperature can ultimately be traced back to this. Trying to pin down the origin of these logarithms and exponential functions in thermodynamics and statistical physics was my motivation for writing this article.
If you separate a large system into two subsystems with half the total number of particles each, each of them ‘lives’ in a phase space defined by the positions and momenta of its own particles. The combined system has a phase space of twice the dimensions. As the subsystems only interact weakly at their interfaces, they are statistically independent. There is one probability for subsystem 1 to occupy configuration 1 and another for subsystem 2 to occupy configuration 2. Sealing the systems together at the interfaces leaves the internal states of each subsystem (nearly) unchanged. The probability to find the exact combination of part 1 being in state 1 and part 2 being in state 2 is the product of the independent individual probabilities:
ρ12(q1,p1,q2,p2) dq1dp1dq2dp2 = ρ1(q1,p1)dq1dp1 · ρ2(q2,p2)dq2dp2
= ρ1(q1,p1) ρ2(q2,p2) dq1dp1dq2dp2
ρ12(q1,p1,q2,p2) = ρ1(q1,p1) ρ2(q2,p2)
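The product rule can be checked with a quick Monte Carlo experiment. I assume, only for simplicity, that each subsystem has a single degree of freedom and a Gaussian density, and that the two states are drawn independently. The regions chosen for subsystem 1 and subsystem 2 are arbitrary. The joint frequency comes out as the product of the individual frequencies.

```python
import random

rng = random.Random(11)
N = 200_000
count_1 = count_2 = count_12 = 0
for _ in range(N):
    # Each subsystem draws its own state; nothing couples the two draws.
    q1, p1 = rng.gauss(0, 1), rng.gauss(0, 1)
    q2, p2 = rng.gauss(0, 1), rng.gauss(0, 1)
    in_1 = 0 <= q1 < 1 and 0 <= p1 < 1   # subsystem 1 in its chosen region
    in_2 = q2 > 0 and p2 > 0             # subsystem 2 in its chosen region
    count_1 += in_1
    count_2 += in_2
    count_12 += in_1 and in_2
P1, P2, P12 = count_1 / N, count_2 / N, count_12 / N
print(P12, P1 * P2)   # close to each other: the joint probability factorizes
```

If the draws were coupled, for instance if subsystem 2 copied part of subsystem 1’s state, the two printed numbers would drift apart.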
Additive Constants of Motion
Probability density is a function of all the p’s and q’s, yet it does not change with time. The same is true for seven other properties of closed systems in classical mechanics: energy and the three components each of the momentum and angular momentum vectors. The latter vectors can be ‘transformed away’ by hopping into another coordinate system that moves and rotates with the system. What remains as a truly internal property of a system is energy, hence its overarching importance in statistical mechanics.
Actually, more constants of motion can be found: Nearly every integration constant of the Newtonian equations of motion is a constant of motion. For f degrees of freedom there are 2f of them. This is typically stated as: There are 2f − 1 constants of motion, because one of the integration constants is just the choice of a start time. Shifting this start time does not change anything for a closed system, so one of the constants does not count. The 2f − 1 constants can be combined to form other constants, for example the total energy or momentum. But only the seven(**) constants mentioned above have the special simple property of additivity: The property of a combined system is the sum of the respective properties calculated for the subsystems alone.
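Additivity is a real restriction. For instance, the squared magnitude |P|² of the total momentum is also a constant of motion of a closed system, because it is a function of the conserved P. Yet it is not additive. Here is a tiny check with made-up momentum vectors for two subsystems.

```python
def add(u, v):
    """Component-wise sum of two 3-vectors given as tuples."""
    return tuple(a + b for a, b in zip(u, v))

def square(u):
    """Squared magnitude of a 3-vector."""
    return sum(a * a for a in u)

# Total momentum vectors of two subsystems (made-up example numbers).
P1 = (1.0, 0.0, 2.0)
P2 = (-0.5, 3.0, 1.0)

P12 = add(P1, P2)                              # additive: combined momentum = sum
print(P12)                                     # (0.5, 3.0, 3.0)
print(square(P12), square(P1) + square(P2))    # 18.25 15.25: |P|^2 is not additive
```

The difference between the two squared values is the cross term 2 P1·P2. This term is what rules out |P|² as an additive constant.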
There is a new additive constant of motion lurking in the definition of probability density. An additive constant can be defined by applying the logarithm function, because from …
ρ12 = ρ1 ρ2
logρ12 = logρ1 + logρ2
So logρ is a function of the p’s and the q’s that is an additive constant of the motion. But only seven(**) independent additive functions of that type exist for a mechanical system, and this has to hold for any division of the system into subsystems. So this new constant of motion can only be a linear combination of the other additive constants. Setting aside the momentum and angular momentum vectors and keeping only energy E, this means that for some subsystem …
logρ(p,q) = α + βE(p,q)
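where α and β are numbers. α is characteristic for the subsystem and is fixed by normalization, because all probabilities have to add up to one. β, however, must be the same for all subsystems of the closed system. Otherwise logρ12 could not equal logρ1 + logρ2 for every possible way of sharing the total energy E1 + E2 between the two parts.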
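Additivity of α + βE can be tested with a few numbers. This is a sketch with hypothetical values for α, β and the energies. With one shared β and α12 = α1 + α2, the log of the combined density equals the sum of the separate logs. With two different β’s, you can shift energy from one subsystem to the other. The total energy, and therefore any expression α12 + β12(E1 + E2), then stays the same, while the sum of the separate logarithms changes. No single combined β could reproduce that.

```python
import math

def log_rho(alpha, beta, E):
    """log of a density of the form alpha + beta * E."""
    return alpha + beta * E

# Hypothetical numbers, chosen only for illustration.
alpha1, alpha2 = -1.2, -0.7
E1, E2 = 0.8, 1.5

# Case 1: one beta shared by both subsystems, alpha12 = alpha1 + alpha2.
beta = -2.0
combined = log_rho(alpha1 + alpha2, beta, E1 + E2)
separate = log_rho(alpha1, beta, E1) + log_rho(alpha2, beta, E2)
print(math.isclose(combined, separate))   # True

# Case 2: different betas. Moving energy 0.5 from subsystem 2 to subsystem 1
# leaves E1 + E2 (and hence any alpha12 + beta12 * (E1 + E2)) unchanged,
# but the sum of the separate logarithms changes by (beta1 - beta2) * 0.5.
beta1, beta2 = -2.0, -3.0
before = log_rho(alpha1, beta1, E1) + log_rho(alpha2, beta2, E2)
after = log_rho(alpha1, beta1, E1 + 0.5) + log_rho(alpha2, beta2, E2 - 0.5)
print(after - before)                     # about 0.5, not 0
```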
Applying the exponential function to both sides of this equation reveals a familiar pattern: A probability (density) is equal to an exponential function that has energy in its exponent. The factor β will eventually be related to temperature, and something closely related to the logarithm of probability density will deserve a nicer name … entropy.
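To see the exponential form and the role of α at work, take again a one-dimensional harmonic oscillator with mass and frequency set to 1 (an assumption made only for this example), so E = (q² + p²)/2. For ρ = exp(α + βE) with β < 0, requiring a total probability of one fixes α = log(−β/(2π)). The sketch sums ρ·dq·dp over a grid for a few values of β. Each sum comes out close to 1.

```python
import math

def density(q, p, beta):
    """rho = exp(alpha + beta * E) for a 1D oscillator, E = (q^2 + p^2) / 2 (m = omega = 1).

    alpha is fixed by normalization: alpha = log(-beta / (2 * pi)), valid for beta < 0.
    """
    alpha = math.log(-beta / (2 * math.pi))
    E = 0.5 * (q * q + p * p)
    return math.exp(alpha + beta * E)

def total_probability(beta, half_width=10.0, steps=200):
    """Midpoint-rule sum of rho * dq * dp over a square grid in phase space."""
    h = 2 * half_width / steps
    grid = [-half_width + (k + 0.5) * h for k in range(steps)]
    return sum(density(q, p, beta) for q in grid for p in grid) * h * h

for beta in (-0.5, -1.0, -2.0):
    print(beta, total_probability(beta))   # each close to 1
```

The ratio of the densities at two points depends only on their energy difference, exp(β(Ea − Eb)). α drops out of such ratios, which is why it only matters for normalization.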
This post comprises some ideas that form a prequel to my older article about entropy, which includes many formulas and derivations. In both posts I mainly followed Landau and Lifshitz’s textbook on statistical physics, part of their Course of Theoretical Physics. David Tong aptly describes the book as “Russian style: terse, encyclopedic, magnificent.” Much of this book comes across as remarkably modern given that it was first published in 1958.
This post was also an experiment in describing math with words, not formulas, yet without metaphors. But a long time ago I tried to find a metaphor for points in phase space and applied the result to an election: a Political State Vector.
(*) The Delta Function introduced in my previous posts would help with describing point-like distributions that still give unity when integrated. However, even when systems are described classically, you (need to) add a seemingly artificial ‘cell’ of minimum size in phase space, whose size can only be understood using quantum mechanics. This minimum quantum cell is one way to understand why entropy can only be defined up to an additive constant in classical physics, and why its value at absolute zero temperature is zero. The minimum cell can also explain (in part) why the points in the patch occupied by the systems in phase space can move a bit without changing the density. You count points per cell but do not pin them down at ever-growing resolution. The area covered with dots will turn into a somewhat finer ‘foam’, but there won’t be large holes, so the idea of an incompressible fluid remains justified.
(**) You can form one more vector, which is often not mentioned: the ‘initial position of the center of gravity’, calculated as the actual location minus time times the center’s velocity. The reason why it is less important: All the additive constants are tied to a fundamental symmetry of the equations of motion. Energy is tied to invariance under shifts in time, momentum to displacements, and angular momentum to rotations. This ‘initial position of the center of gravity’ would be tied to Galilean boosts (changing inertial frames), so it only holds in non-relativistic classical mechanics.
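A small numerical check of that extra constant follows. The example is hypothetical: two particles in one dimension coupled by a spring, so only internal forces act. It is integrated with the velocity Verlet scheme, a choice of mine rather than anything from the text. The code calls the center of gravity the center of mass; the two coincide in a uniform gravitational field. The quantity X_cm − t·V_cm stays constant up to rounding error, even though both particles oscillate.

```python
def simulate(steps=10_000, dt=0.001):
    """Two particles in 1D coupled by a spring (internal force only), velocity Verlet.

    Returns the values of X_cm - t * V_cm recorded along the trajectory.
    Masses, spring constant and initial values are made-up example numbers.
    """
    m = [1.0, 3.0]
    x = [0.0, 1.5]
    v = [0.4, -0.1]
    k, rest = 2.0, 1.0
    M = sum(m)

    def forces(x):
        f = k * ((x[1] - x[0]) - rest)   # spring force on particle 0, opposite on particle 1
        return [f, -f]

    t = 0.0
    f = forces(x)
    record = []
    for _ in range(steps):
        x_cm = (m[0] * x[0] + m[1] * x[1]) / M
        v_cm = (m[0] * v[0] + m[1] * v[1]) / M
        record.append(x_cm - t * v_cm)
        # One velocity Verlet step: half kick, drift, full kick with new forces.
        v_half = [v[i] + 0.5 * dt * f[i] / m[i] for i in range(2)]
        x = [x[i] + dt * v_half[i] for i in range(2)]
        f = forces(x)
        v = [v_half[i] + 0.5 * dt * f[i] / m[i] for i in range(2)]
        t += dt
    return record

values = simulate()
print(min(values), max(values))   # equal up to rounding error
```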