Info-metrics and Firm Size Distribution

March 20, 2022

In his "Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information," Amos Golan suggests a multi-parameter model for making inferences about the firm size distribution. The example is meant to illustrate cases where the information available about a system is insufficient for exact inference. Such problems are widespread in social science, so I believe it is instructive to follow this simple example.

Accordingly, consider a country, Uniformia, about which we know only that it produces a single output with ten different sizes of firms. This is a very typical case, since governments rarely make such micro-level data available out of concern for the privacy of their constituents.

Golan models the firms with a simple approach: for input X and output Y, the production function of a firm of type A may be y_A = f_A(x_A). Different firms with the same amount of inputs may therefore still differ in their outputs, since their production functions are not necessarily the same.

The inference problem is to find the distribution of firm sizes, which is highly skewed. Golan suggests not incorporating this information for now, since (i) if added as a constraint inappropriately, it may bias the inference, and (ii) how to add such information into the inference process is less than trivial for an introductory-level exercise.

So what is known now? First, the conservation laws are assumed to be linear, that is:

< X > = Sum_k=1^10 x_k*p_k

< Y > = Sum_k=1^10 y_k*p_k

The above constraints state that the arithmetic means of inputs (x_k) and outputs (y_k) are known and denoted as < X > and < Y >, respectively. The inference problem can then be written as:

Max H(P) = -Sum_k=1^10 p_k*log(p_k)

subject to

< X > = Sum_k=1^10 x_k*p_k

< Y > = Sum_k=1^10 y_k*p_k

Sum_k=1^10 p_k = 1

This gives the solution: p*_k = exp(-L1*x_k - L2*y_k) / Sum_j=1^10 exp(-L1*x_j - L2*y_j), where L1 and L2 are the Lagrange multipliers on the two mean constraints.
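For completeness, here is a short sketch of where this exponential form comes from. The notation is my own: \lambda_1 and \lambda_2 stand for L1 and L2, and \mu is the multiplier on the normalization constraint.

```latex
\mathcal{L}(P,\lambda_1,\lambda_2,\mu)
  = -\sum_{k=1}^{10} p_k \log p_k
    + \lambda_1\Big(\langle X\rangle - \sum_{k=1}^{10} x_k p_k\Big)
    + \lambda_2\Big(\langle Y\rangle - \sum_{k=1}^{10} y_k p_k\Big)
    + \mu\Big(1 - \sum_{k=1}^{10} p_k\Big)

\frac{\partial \mathcal{L}}{\partial p_k}
  = -\log p_k - 1 - \lambda_1 x_k - \lambda_2 y_k - \mu = 0
  \quad\Longrightarrow\quad
  p_k = e^{-1-\mu}\, e^{-\lambda_1 x_k - \lambda_2 y_k}

\sum_{k=1}^{10} p_k = 1
  \quad\Longrightarrow\quad
  p_k^{*} = \frac{e^{-\lambda_1 x_k - \lambda_2 y_k}}
                 {\sum_{j=1}^{10} e^{-\lambda_1 x_j - \lambda_2 y_j}}
```

The denominator is the normalization factor; \lambda_1 and \lambda_2 are then pinned down by requiring that p* reproduces < X > and < Y >.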

If you are familiar with the Quantal Response Statistical Equilibrium framework, you know that the normalization (denoted as Sigma(L1, L2) in Golan's work) is rarely readily available, so one uses MCMC or other algorithms to estimate the parameters, which in turn give p*_k. In the code available on the book's website, Golan wrote the constrained optimization problem in Matlab and solved it with fmincon. I was able to reproduce the results, yet not everyone has Matlab, so I wanted to write the code in R.

He suggested a production function of the form y_k = a_k*x_k^beta_k + e_k, where a_k and beta_k are firm-specific constants and e_k is a (somewhat) firm-specific error term. I've followed his description in the R code below.
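To make the functional form concrete, here is a minimal sketch in Python (not the R code of this post). The input levels, the ranges for a_k and beta_k, and the error scale are all hypothetical values I picked for illustration; they are not Golan's.

```python
import random

def simulate_firms(n_types=10, seed=0):
    """Build one (x_k, y_k) pair per firm type using y_k = a_k * x_k**beta_k + e_k."""
    rng = random.Random(seed)
    firms = []
    for k in range(1, n_types + 1):
        x = float(k)                     # hypothetical input level of type k
        a = rng.uniform(0.8, 1.2)        # hypothetical firm-specific constant a_k
        beta = rng.uniform(0.5, 0.9)     # hypothetical firm-specific exponent beta_k
        e = rng.gauss(0.0, 0.05)         # hypothetical firm-specific error e_k
        y = a * x ** beta + e
        firms.append({"k": k, "x": x, "a": a, "beta": beta, "e": e, "y": y})
    return firms

firms = simulate_firms()
for f in firms:
    print(f["k"], round(f["x"], 3), round(f["y"], 3))
```

Because a_k and beta_k differ across types, two firm types with similar inputs can end up with quite different outputs, which is the point of the setup.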

The problem, therefore, has a true population distribution denoted as "p_pop," and we are trying to make an inference about this distribution based on the mean constraints < X > and < Y > that we observe, which are calculated using the true population proportions. Again, the situation should be familiar to many researchers: we have insufficient information about each unit in the system while knowing the aggregates of the system.
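As a small illustration of how the observed aggregates relate to the unobserved p_pop, here is a sketch in Python (not the R code of this post). The values of x, y, and p_pop are hypothetical; in particular, y = x^0.7 is just a stand-in for the production function.

```python
# Hypothetical inputs and outputs for the ten firm types
x = [float(k) for k in range(1, 11)]
y = [xk ** 0.7 for xk in x]

# Hypothetical, skewed "true" population proportions (they sum to 1)
p_pop = [0.30, 0.20, 0.15, 0.10, 0.08, 0.06, 0.05, 0.03, 0.02, 0.01]

# The only information the researcher observes: the two weighted means
mean_x = sum(xk * pk for xk, pk in zip(x, p_pop))
mean_y = sum(yk * pk for yk, pk in zip(y, p_pop))

print("<X> =", round(mean_x, 4))
print("<Y> =", round(mean_y, 4))
```

Ten unknown probabilities are summarized by just two numbers (plus normalization), which is why the problem is underdetermined and needs a criterion such as maximum entropy.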

Next, we define the entropy function and introduce the constraints.
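For readers without R or Matlab at hand, this step can be sketched in plain Python. This is not the R code of this post. Instead of calling a constrained optimizer, the sketch minimizes the dual of the problem, log Z(L1, L2) + L1*< X > + L2*< Y >, with a damped Newton iteration, which is a technique I swapped in because it needs only the standard library. The data (x, y, p_pop) are hypothetical values chosen for illustration.

```python
import math

def entropy(p):
    """Shannon entropy H(P) = -sum_k p_k log p_k, with 0*log(0) taken as 0."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def log_partition(l1, l2, x, y):
    """log of sum_k exp(-l1*x_k - l2*y_k), computed stably."""
    expo = [-l1 * xk - l2 * yk for xk, yk in zip(x, y)]
    m = max(expo)
    return m + math.log(sum(math.exp(e - m) for e in expo))

def maxent_probs(l1, l2, x, y):
    """p_k = exp(-l1*x_k - l2*y_k) / normalization."""
    lz = log_partition(l1, l2, x, y)
    return [math.exp(-l1 * xk - l2 * yk - lz) for xk, yk in zip(x, y)]

def dual(l1, l2, x, y, mx, my):
    """Convex dual objective whose minimizer gives the multipliers."""
    return log_partition(l1, l2, x, y) + l1 * mx + l2 * my

def solve_maxent(x, y, mx, my, iters=200, tol=1e-10):
    """Damped Newton iteration on the dual; returns (l1, l2, p)."""
    l1 = l2 = 0.0
    for _ in range(iters):
        p = maxent_probs(l1, l2, x, y)
        ex = sum(pk * xk for pk, xk in zip(p, x))
        ey = sum(pk * yk for pk, yk in zip(p, y))
        g1, g2 = mx - ex, my - ey              # gradient of the dual
        if abs(g1) + abs(g2) < tol:
            break
        # The Hessian of the dual is the covariance matrix of (x, y) under p
        vxx = sum(pk * (xk - ex) ** 2 for pk, xk in zip(p, x))
        vyy = sum(pk * (yk - ey) ** 2 for pk, yk in zip(p, y))
        vxy = sum(pk * (xk - ex) * (yk - ey) for pk, xk, yk in zip(p, x, y))
        det = vxx * vyy - vxy ** 2
        d1 = (vyy * g1 - vxy * g2) / det       # Newton direction H^{-1} g
        d2 = (vxx * g2 - vxy * g1) / det
        # Backtracking: halve the step until the dual decreases enough
        t, base, slope = 1.0, dual(l1, l2, x, y, mx, my), g1 * d1 + g2 * d2
        while t > 1e-8 and dual(l1 - t * d1, l2 - t * d2, x, y, mx, my) > base - 1e-4 * t * slope + 1e-12:
            t *= 0.5
        l1, l2 = l1 - t * d1, l2 - t * d2
    return l1, l2, maxent_probs(l1, l2, x, y)

# Hypothetical data: ten firm types, skewed true proportions
x = [float(k) for k in range(1, 11)]
y = [xk ** 0.7 for xk in x]
p_pop = [0.30, 0.20, 0.15, 0.10, 0.08, 0.06, 0.05, 0.03, 0.02, 0.01]
mean_x = sum(xk * pk for xk, pk in zip(x, p_pop))
mean_y = sum(yk * pk for yk, pk in zip(y, p_pop))

l1, l2, p_hat = solve_maxent(x, y, mean_x, mean_y)
print("multipliers:", l1, l2)
print("inferred p:", [round(pk, 4) for pk in p_hat])
print("H(p_hat) =", entropy(p_hat), " H(p_pop) =", entropy(p_pop))
```

The inferred distribution reproduces the two observed means but generally does not coincide with p_pop: among all distributions consistent with the constraints, it is the one with the largest entropy, so H(p_hat) is at least as large as H(p_pop).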

Compared with the Matlab and Mathematica solutions, the "solnl" function works pretty well for this problem, which is a nonlinear optimization problem with linear constraints. I'm very thankful to Oriol Vallès Codina for sharing his notes about this subject and introducing me to this R package.

Finally, the result produced by the Matlab code from the book's website is below:

Compared with the results using R:

Hope it will be helpful for future use.
