Gini coefficient

The Gini coefficient, developed by the Italian statistician Corrado Gini, is a frequently used measure of economic inequality. The principle can be applied to income, wealth or other metrics of economic well-being. Gini was apparently a supporter of Italian fascism and eugenics, so one shouldn’t take all of his work too seriously. But the coefficient is often used as an informative metric of inequality even by contemporary economists with much more progressive values.

The first step in calculating the Gini coefficient (for income) is to divide the population in question into income groups and arrange the groups by income from poorest to richest, left to right. One then constructs a line plot of the cumulative income share held by each group together with all poorer groups (on the y axis) against the cumulative share of the population (on the x axis). The plot fits in a box running from 0 to 1 (in proportion terms) or 0 to 100 (in percentage terms). So for example, in a society with complete income equality divided into even population quintiles, the “x” points are at 20%, 40%, 60%, 80% and 100% and the cumulative income shares are the same (20%, 40%, 60%, 80% and 100%).
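As a rough sketch of this construction in Python, the points of the plot can be built by sorting the group shares and accumulating them. The helper name `lorenz_points` is hypothetical, and the sketch assumes population groups of equal size.

```python
import numpy as np

def lorenz_points(shares):
    """Return (x, y) plot points in percent for equal-sized population groups.

    `shares` holds each group's share of total income, in any order.
    Hypothetical helper for illustration; it assumes groups of equal size.
    """
    shares = np.sort(np.asarray(shares, dtype=float))        # poorest to richest
    y = 100 * np.cumsum(shares) / shares.sum()               # cumulative income share
    x = 100 * np.arange(1, len(shares) + 1) / len(shares)    # cumulative population share
    # prepend the origin so the curve starts at (0, 0)
    return np.insert(x, 0, 0.0), np.insert(y, 0, 0.0)

# perfect equality across quintiles: every point lies on the line y = x
x, y = lorenz_points([20, 20, 20, 20, 20])
print(x)
print(y)
```

With unequal shares, the same call returns y values that fall below the corresponding x values everywhere except at the two end points.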

In this artificial case of perfect equality, these five points lie on a straight line with a slope of 1, or y=x. More generally, the curve connecting the values of cumulative income (or wealth) is the “Lorenz curve”, named after the American economist Max Lorenz. In real economies, the Lorenz curve always sags below the y=x line (in mathematical terms it is convex, not concave): to keep with the example of quintiles, the poorest fifth of the population always has less than 1/5th of the total income and the richest fifth always has more.

If one takes the area between the y=x line and the observed Lorenz curve and divides that quantity by the total area under the y=x line, one has a quotient that is bounded by 0 and 1 (although sometimes people multiply by 100 to represent this in percentage terms). This quotient is the Gini coefficient. With infinitely many quantiles, the area can be written as an integral; with a finite number of groups, the Lorenz curve is drawn by interpolating between the data points. The income Gini coefficient of the USA is around 42%, which is fairly high, especially relative to other affluent nations. The countries with the highest Gini coefficients are mostly in southern Africa and those with the lowest Gini coefficients are in Europe. Here’s a link to Gini coefficients in countries around the world:

https://worldpopulationreview.com/country-rankings/gini-coefficient-by-country

The highest income Gini coefficients are in the neighborhood of 60% and the lowest are around 25%.
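Returning to the definition, the area ratio can be written out explicitly. This is the standard textbook formulation, not something specific to this article, and the notation is introduced here for illustration: A is the area between the y=x line and the Lorenz curve, B is the area under the Lorenz curve, L(p) is the Lorenz curve in proportion terms, and (x_i, y_i) are the plotted points with x_0 = y_0 = 0 and x_n = y_n = 1.

```latex
% Area-ratio definition; A + B = 1/2 is the area under the line y = x
G = \frac{A}{A + B} = 2A = 1 - 2B = 1 - 2\int_0^1 L(p)\,dp

% With linear interpolation between the n plotted points (trapezoid rule)
G = 1 - \sum_{i=1}^{n} (x_i - x_{i-1})\,(y_i + y_{i-1})
```

For quintiles, every width x_i - x_{i-1} is 0.2, so the sum reduces to 0.2 times the sum of adjacent pairs of cumulative shares.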

It should be noted that wealth Gini coefficients are almost always higher than income Gini coefficients, sometimes by a large margin. Affluent families tend to pass on their wealth, whereas poorer families live a much more tenuous life and may not have any wealth to pass on from generation to generation. As countries develop from poverty, their Gini coefficients tend to rise at first (money comes into the hands of relatively few people as a society moves from a subsistence economy to one based more on goods and services). But for middle and higher income countries, more advanced economic development tends to level the playing field and reduce poverty a bit, and there is some evidence that for affluent countries, lower Gini coefficients are better for overall growth and prosperity. Of course, the details depend on the policies, laws and power dynamics of each individual country. While the US is a prosperous country, its Gini coefficient has increased over time and its economic gains are going very disproportionately to those who are already wealthy.

Consider two imaginary societies divided into population quintiles. In society 1, the poorest quintile has 10% of the income, the second poorest has 15%, the middle quintile has 20%, the second richest has 25% and the richest has 30%. In society 2, the poorest quintile has 3% of the income, the second poorest has 6%, the middle quintile has 11%, the second richest has 20% and the richest has 60%. Clearly, society 1 is more equitable than society 2 and has a lower Gini coefficient. The Gini coefficient for society 1 is 20%, whereas for society 2 it is 51.2%. The Lorenz curves of the two societies are plotted below using linear interpolation.

The scripts below (in R, Python and Matlab) use a slightly different approach to calculating the Gini coefficient: they work from the height of the Lorenz curve at the middle of each quantile, rather than from the coordinate point at the right edge of each quantile. If one interpolates linearly between data values, the two methods are equivalent, because the midpoint height of a straight segment equals the average of its two end points. If one were to use a line plot approach with a curving interpolation technique, the Gini coefficients would tend to be higher. One should also note that two Lorenz curves that produce the same Gini coefficient may have different shapes.
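Before the full scripts, here is a short numerical check of the two claims above, as a sketch in Python. It assumes equal-sized quantiles and shares given in percent. The function names are hypothetical, and `society_3` is an invented distribution chosen only because it has the same Gini coefficient as society 1.

```python
import numpy as np

def gini_right_edge(shares):
    """Gini (percent) from trapezoids under the linearly interpolated Lorenz curve."""
    c = np.concatenate(([0.0], np.cumsum(shares)))   # curve heights at the quantile edges
    width = 100.0 / len(shares)
    area_under = np.sum(width * (c[1:] + c[:-1]) / 2)
    area_equal = 100.0 * 100.0 / 2                   # area under the y = x line
    return 100 * (area_equal - area_under) / area_equal

def gini_midpoint(shares):
    """Gini (percent) from curve heights at the middle of each quantile."""
    n = len(shares)
    c = np.concatenate(([0.0], np.cumsum(shares)))
    ave = (c[1:] + c[:-1]) / 2                       # Lorenz curve height at each midpoint
    perfect = (np.arange(n) + 0.5) * 100.0 / n       # y = x height at each midpoint
    return 100 * np.sum(perfect - ave) / np.sum(perfect)

society_1 = [10, 15, 20, 25, 30]
society_2 = [3, 6, 11, 20, 60]
society_3 = [15, 15, 15, 15, 40]   # hypothetical: same Gini as society 1, different curve

for name, s in [("society 1", society_1), ("society 2", society_2), ("society 3", society_3)]:
    print(name, gini_right_edge(s), gini_midpoint(s))
```

Both functions should agree for every society, up to floating-point rounding. Societies 1 and 3 share a Gini coefficient of about 20%, yet the curve for society 3 is a straight line up to the 80% mark and then rises steeply, whereas the curve for society 1 bends gradually.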

R script

# This script calculates the Gini coefficient from a matrix p (n rows, m columns) of
# income or wealth shares in percent. Each column is one dataset, ordered from the poorest
# to the richest group. Column 1 here is a highly stratified economy; column 2 is more equal.
n = 5
m = 2
p <- c(3, 6, 11, 20, 60, 10, 15, 20, 25, 30)
dim(p) <- c(n,m)
print(p)
cump <- c(cumsum(p[,1]),cumsum(p[,2]))
dim(cump) <- c(n,m)
print(cump)
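# ave holds the height of each Lorenz curve at the midpoint of each quantile
ave <- matrix(NA,n,m)
ave[1,] <- cump[1,]/2
for (i in 2:n) {ave[i,] <- (cump[i,]+cump[i-1,])/2}
print(ave)
# midpoint heights of the y=x line (perfect equality) for n = 5 quintiles
perfect <- c(10, 30, 50, 70, 90)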
PERFECT <- matrix(replicate(m,perfect),nrow=n)
print(PERFECT)
diff <- PERFECT-ave
print(diff)
Gini <- 100*c((sum(diff[,1])/sum(PERFECT[,1])), (sum(diff[,2])/sum(PERFECT[,2])))
print(paste('Gini', Gini))

# x: cumulative population share in percent (0, 20, ..., 100)
x <- seq(0, 100, length.out = n+1)
print(x)
# cumpr: cumulative shares with a leading row of zeros so each curve starts at the origin
cumpr <- rbind(0, cump)
plot(x, x, type="l", col="black", pch="o", lty=1, ylim=c(0,100), main = "Gini plot", xlab = "percentage of population", ylab = "percentage of income")
lines(x,cumpr[,1],col="blue") 
lines(x,cumpr[,2],col="green")
legend(0, 100, legend=c("perfect equality", "Gini=51.2","Gini=20"), fill = c("black","blue","green"))

Python script

# This script calculates the Gini coefficient from a matrix P of income
# or wealth shares. n is the number of income or wealth groups in the analysis,
# m is the number of datasets. Values in P are expressed as a percentage of the total,
# one column per dataset, ordered from the poorest to the richest group.
import numpy as np
import matplotlib.pyplot as plt

n = 5
m = 2
P = np.transpose(np.array([[3, 6, 11, 20, 60], [10, 15, 20, 25, 30]]))
print(P)
C = np.cumsum(P,axis=0)
print(C)

ave = np.zeros((n,m)) 
ave[0,] = (C[0,])/2
for i in range(1,n):
    ave[i,] = (C[i,]+C[i-1,])/2
#print(ave)

perfect = np.array([10, 30, 50, 70, 90])
PERFECT = (np.ones((m,n))*perfect).transpose()
#print(PERFECT)
diff = PERFECT-ave
#print(diff)
# column sums give one Gini coefficient (in percent) per dataset
Gini = 100 * diff.sum(axis=0) / PERFECT.sum(axis=0)
print('Gini',Gini)

# prepend a row of zeros so each Lorenz curve starts at the origin
Cstar = np.vstack([np.zeros((1, m)), C])
print(Cstar)

x = np.linspace(0, 100, n + 1)  # cumulative population share in percent
plt.title('Gini Coefficient')
plt.xlabel('percentage of population')
plt.ylabel('percentage of income')
plt.plot(x, x, label = "perfect equality")
plt.plot(x, Cstar[:,0], label = "Gini = 51.2")
plt.plot(x, Cstar[:,1], label = "Gini = 20")
plt.legend()
plt.show()

Matlab script

% This script calculates the Gini coefficient from a matrix p of income or
% wealth shares in percent, one row per dataset, ordered from the poorest to
% the richest group. The first row here is a fairly stratified economy.
n = 5
m = 2
p = [3 6 11 20 60; 10 15 20 25 30]
P = p'
% cumulative shares down each column
cump = cumsum(P,1)
% height of each Lorenz curve at the midpoint of each quantile
ave = (cump + [zeros(1,m); cump(1:n-1,:)])/2
perfect = [10 30 50 70 90];
PERFECT = repmat(perfect',1,m)
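% named gap to avoid shadowing the built-in diff function
gap = PERFECT - ave
% column sums give one Gini coefficient (in percent) per dataset
Gini = 100*sum(gap,1)./sum(PERFECT,1)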

x = linspace(0,100,n+1)
% prepend a row of zeros so each Lorenz curve starts at the origin
cumpstar = [zeros(1,m); cump]

plot(x,x,x,cumpstar)
title('Gini Coefficient')
xlabel('percentage of population')
ylabel('percentage of income')
legend('perfect equality','Gini = 51.2','Gini = 20')