Assume there are $V$ independent states available for storing information. A digit in base $N$ uses up $N$ of those states, so we can represent approximately $V/N$ digits in base $N$.
The amount of information we can represent is $I = N^{V/N}$.
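As a concrete illustration, here is a minimal sketch that evaluates this count for a small budget of states. It assumes that each base-$N$ digit consumes $N$ states, so $V$ states yield $V/N$ whole digits. The function name `capacity` and the choice $V = 12$ (picked only because 2, 3, 4 and 6 all divide it evenly) are illustrative assumptions, not part of the argument above.

```python
def capacity(num_states: int, base: int) -> int:
    """Count the distinct values representable with num_states states.

    Each digit in the given base uses `base` states, so we get
    num_states // base whole digits (leftover states are ignored).
    """
    digits = num_states // base
    return base ** digits


V = 12  # arbitrary illustrative budget of states
for N in (2, 3, 4, 6):
    print(f"base {N}: {V // N} digits, I = {capacity(V, N)}")
# base 2: 6 digits, I = 64
# base 3: 4 digits, I = 81
# base 4: 3 digits, I = 64
# base 6: 2 digits, I = 36
```

With this particular budget, base 3 represents more values than base 2, 4 or 6.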
The value of $N$ that maximizes $I$ is the most "efficient" base. The maximum occurs either where the derivative is 0 (if the second derivative there is negative) or at infinity (if the second derivative is positive).
Since the logarithm is increasing, maximizing $\ln(I)$ also maximizes $I$, so we take the natural log: $\ln(I) = \frac{V}{N}\ln(N)$
And take the derivative with respect to $N$: $(\ln(I))' = \frac{V}{N^2}\,(1 - \ln(N))$.
We then set $(\ln(I))' = 0$. Because $V$ and $N$ are positive, this requires $\ln(N) = 1$. Solving, $N = e$.
Take the second derivative: $(\ln(I))'' = \frac{V}{N^3}\,(2\ln(N) - 3)$
When $N = e$, $(\ln(I))'' = -\frac{V}{e^3}$, which is negative (recall that $V$ is positive), so $I$ reaches its maximum when $N = e$.
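The result can be sanity-checked numerically. The sketch below is only a rough check, not a proof: it works with $\ln(I)/V = \ln(N)/N$, which has the same maximizer as $I$ for any positive $V$. The helper names, the grid range of 1.5 to 6, and the finite-difference step are all arbitrary choices made for illustration.

```python
import math


def log_info_per_state(base: float) -> float:
    """Return ln(I)/V = ln(N)/N, the quantity maximized in the derivation."""
    return math.log(base) / base


def derivative(base: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of d/dN [ln(N)/N]."""
    return (log_info_per_state(base + h) - log_info_per_state(base - h)) / (2 * h)


# Coarse grid search over real-valued bases from 1.5 to 6.0 in steps of 0.001.
grid = [1.5 + 0.001 * k for k in range(4501)]
best = max(grid, key=log_info_per_state)

# Restrict to whole-number bases, since practical digits are integers.
best_int = max(range(2, 11), key=log_info_per_state)

print(f"grid maximum near N = {best:.3f} (e = {math.e:.3f})")
print(f"finite-difference derivative at e: {derivative(math.e):.2e}")
print(f"best integer base in 2..10: {best_int}")
```

The grid maximum lands next to $e$, and the derivative there is zero up to rounding error. Among whole-number bases, 3 scores highest, while 2 and 4 tie because $\ln(4)/4 = \ln(2)/2$.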