1 Introduction

One of the most widespread forms of tax evasion is the underreporting of income by self-employed workers. Because their incomes are not subject to third-party reporting, the probability that the tax authorities detect hidden earnings is lower than for salary workers, and this leads to higher levels of tax evasion. The issue is especially relevant in a country like Spain, where high unemployment clearly affects individuals’ decisions to work in the underground economy (Ahn and De la Rica 1997) and insufficiently flexible labor market rules may encourage the informal economy (Bentolila et al. 1994).

As with other forms of tax evasion, the first challenge lies in measuring the extent of the concealment. The standard method is based on the seminal paper by Pissarides and Weber (1989), which uses Engel curves for food demand. The underlying idea is simple. Both salary and self-employed workers are assumed to report their food expenditures accurately in household budget surveys. By contrast, when they are asked about their earnings, only the salary workers report their true income. The underreporting of income by self-employed workers is then estimated by comparing the food expenditures of both groups as a function of declared income, controlling for other economic and demographic characteristics. A detailed explanation of this method is provided in the next section.

On this basis, a number of papers have offered estimates of underreporting for different samples. In essence, what is computed is the number by which the reported income of the self-employed has to be multiplied to obtain their true income. For the UK economy, Pissarides and Weber (1989) give a central value of 1.55 in 1982. Taking a different approach, Lyssiotou et al. (2004), using a complete demand system and non-parametric estimation methods, suggest that the extent of underreporting by self-employed workers in the UK in 1993 ranges from 118 % for households whose head is in a blue collar occupation to 64 % for white collar jobs.

With Canadian data, Schuetze (2002) finds, for several years between 1969 and 1992, estimates ranging from 11 to 23 % as average values of the lower and upper bound estimates, respectively. For the period 1994–1996, Johansson (2005) gives a range of estimates between 16 and 40 % of underreporting in Finland, depending on the definition used for the self-employed household. More recently, Engstrom and Holmlund (2009) conclude that Swedish households with at least one self-employed member underreported their income by around 30 % in the early 2000s. Hurst et al. (2011), using three data samples for the US covering the 1980s, 1990s and early 2000s, estimate the degree of underreporting at between 25 and 35 %.

From the very beginning of this literature, most papers have assumed that employees do not hide any part of their income and that underreporting is concentrated exclusively among self-employed workers. But this simplifying assumption is weak from both a theoretical point of view (see, for instance, Kolm and Nielsen (2008) for a model with concealment of income by firms and salary workers) and an empirical one. In this sense, the 2007 Eurobarometer shows that 5 % of all dependent employees in a representative sample of individuals in the EU admitted having received all or part of their salary as envelope or cash-in-hand wages.

Our own sample offers several indications that salary workers also partly hide their earnings. For instance, about 12 % of households declaring that their main source of income is self-employment (“cuenta propia”) classify themselves as salary workers. Furthermore, the number of employees who do not report their monthly income in the survey is about twice the number who do (this ratio is, unsurprisingly, higher for self-employed workers, at about three to one, but both figures may reveal concealment of income).

This paper applies the methodology of Pissarides and Weber (1989) to estimate the extent of underreporting by the Spanish self-employed over the period 2006–2009. Our data come from the Spanish Household Budget Surveys. The robustness of our results has been checked by using alternative specifications, testing for non-linearities in the relationship between income and food expenditure, and dealing with potential endogeneity problems. Different measures of the key variables have been examined as well.

In this context, the main contributions of the paper can be summarized as follows. Firstly, we replicate the well-known approach of estimating food demand functions to obtain an explicit measure of income concealment in a sample that has never been exploited in this regard, with the particular characteristics of the Spanish case in terms of data, period considered and other features. Secondly, the interpretation of the results considers the possibility that salary workers also conceal part of their income; in fact, this can be seen not only as a realistic assumption but also as a reasonable reading of our findings.

After the Introduction, we set up the theoretical framework used to measure the extent of underreporting of income. Section 3 explains the main features of data and the criteria followed to build the sample. Section 4 gives details of estimation procedures and shows the results. Finally, Sect. 5 concludes.

2 The model

This section builds an analytical framework to estimate the degree of underreporting of income by households headed by self-employed workers. The approach is based on two main assumptions: (i) food expenditures are correctly reported by households in budget surveys; but (ii) this is not the case for income. Previous studies have qualified the second assumption by positing that salary workers report their income honestly while self-employed workers hide part of their earnings. However, we think that a picture closer to the real world is one in which salary workers (at least some of them) also conceal part of their income, although to a lesser degree than self-employed workers. Consequently, a natural test for measuring the relative extent of underreporting by self-employed workers consists of comparing the food demand functions of both groups, which depend on income.

Our starting point is the model of Pissarides and Weber (1989), which we retain almost entirely while allowing for underreporting by salary workers. This innovation keeps the empirical estimation manageable and also broadens the interpretation of the results. In particular, our measure of underreporting by the self-employed is a relative measure that takes as its reference a given, strictly positive level of hidden income by salary workers.

Let \(Y_{i}\) be the true income of household \(i\). We distinguish two types of households, denoted by \(SW\) and \(SE\), which refer to salary worker and self-employed worker households, respectively. As usual in the definition of consumption functions, a relation between actual income \(Y_{i}\) and permanent income \(Y_{i}^{p}\) has to be set up:

$$\begin{aligned} Y_{i}=p_{i}Y_{i}^{p}, \end{aligned}$$
(1)

where \(p_{i}\) is a random variable that captures the deviations of actual income from its permanent, long-run value. The mean of \(p_{i}\) is assumed to be the same for all households in the economy, but its variance is assumed to be higher for self-employed households than for salary workers. This is a reasonable assumption because self-employed workers face more risks and, consequently, a more volatile income is to be expected in their case.

Let \(Y_{i}^{\prime }\) be the disposable income reported by households in budget expenditure surveys. As noted above, previous papers have assumed that salary workers report all their income correctly. In our framework, by contrast, a slight modification of the Pissarides and Weber model allows for underreporting of income by salary workers as well. True income \(Y_{i}\) and reported income \(Y_{i}^{\prime }\) are related as follows:

$$\begin{aligned} Y_{i}=k_{i}Y_{i}^{\prime },\quad \text{with } k_{i}>1. \end{aligned}$$
(2)

\(k_{i}\) is a random variable that indicates the extent to which household \(i\) hides part of its true income \(Y_{i}\). In other words, \(k_{i}\) is the number by which the reported income \(Y_{i}^{\prime }\) must be multiplied to obtain the true income \(Y_{i}\). Both types of workers hide part of their income but in different proportions: \(k_{SE}>k_{SW}\), that is, self-employed households underreport a larger share of their disposable income than salary households.
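To make the reading of \(k_{i}\) concrete, the following minimal sketch converts a multiplier into two equivalent percentages. It is only an illustration: the function names are hypothetical, the value 1.55 is the central estimate of Pissarides and Weber (1989) quoted in the Introduction, and the values 1.30 and 1.05 are invented for the example.

```python
def excess_over_reported(k: float) -> float:
    """Share by which true income exceeds reported income: (Y - Y') / Y'."""
    return k - 1.0


def hidden_share_of_true(k: float) -> float:
    """Share of true income that goes unreported: (Y - Y') / Y."""
    return 1.0 - 1.0 / k


def relative_multiplier(k_se: float, k_sw: float) -> float:
    """Multiplier of self-employed households relative to salary households."""
    return k_se / k_sw


k = 1.55  # central UK estimate cited in the Introduction
print(f"true income exceeds reported income by {excess_over_reported(k):.1%}")
print(f"{hidden_share_of_true(k):.1%} of true income is hidden")

# Hypothetical values, chosen only to illustrate k_SE > k_SW > 1.
print(f"relative multiplier: {relative_multiplier(1.30, 1.05):.3f}")
```

The two percentages answer different questions (relative to reported income versus relative to true income), which is worth keeping in mind when comparing estimates across studies.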

Combining Eqs. (1) and (2) and taking logarithms, the log of permanent income is:

$$\begin{aligned} \ln Y_{i}^{p}=\ln Y_{i}^{\prime }-\ln p_{i}+\ln k_{i}, \end{aligned}$$
(3)

which is one of the key variables in estimating the following food expenditure function:

$$\begin{aligned} \ln F_{i}={\varvec{\alpha }}\mathbf{X}^{\prime }+\beta \ln Y_{i}^{p}+\varepsilon _{i}, \end{aligned}$$
(4)

where \(F_{i}\) is the food expenditure of household \(i\), \({\varvec{\alpha }}\) is a vector of parameters common to salary and self-employed worker households, \(\mathbf X \) is a vector of household characteristics, \(\beta \) is a scalar that, given the log-log form, measures the income elasticity of food expenditure (interpreted in this literature as the propensity to consume food), and \(\varepsilon _{i}\) is a white noise error term. In short, expression (4) represents a log-linear Engel curve for food consumption.

At this point, the main difficulty in estimating the above Engel curve is that we have no data on \(p_{i}\) and \(k_{i}\) (in fact, the latter is the measure of underreporting that we are looking for). Thus, we need to make some assumptions about their distribution over the sample. As is usual in the literature, we set:

$$\begin{aligned} \ln p_{i}=\mu _{i}^{p}+u_{i} \end{aligned}$$
(5)
$$\begin{aligned} \ln k_{i}=\mu _{i}^{k}+v_{i}, \end{aligned}$$
(6)

that is, both variables are log-normally distributed, with values of \(\mu ^{p}\) and \(\mu ^{k}\) specific to salary and self-employed workers. The disturbances \(u_{i}\) and \(v_{i}\) are assumed to have zero means and constant variances \(\sigma _{u_{i}}^{2}\) and \(\sigma _{v_{i}}^{2}\), which may differ between the two types of workers.

Substituting (5) and (6) into (3), and in turn into (4), we get:

$$\begin{aligned} \ln F_{i}={\varvec{\alpha }}\mathbf{X}^{\prime }+\beta \ln Y_{i}^{\prime }-\beta (\mu _{i}^{p}-\mu _{i}^{k})-\beta (u_{i}-v_{i})+\varepsilon _{i}. \end{aligned}$$
(7)

When this Engel curve is adjusted with a dummy variable to reflect the appropriate distribution of \(k\) and \(p\) across groups \(i=SE,SW\), one finds

$$\begin{aligned} \ln F_{i}&= {\varvec{\alpha }}\mathbf{X}^{\prime }+\beta \ln Y_{i}^{\prime }-\beta (\mu _{SW}^{p}+DSE_{i}(\mu _{SE}^{p}-\mu _{SW}^{p})\\&-(\mu _{SW}^{k}+DSE_{i}(\mu _{SE}^{k}-\mu _{SW}^{k})))+\eta _{i}, \end{aligned}$$

where \(DSE_{i}\) is a dummy variable that takes the value 1 if the head of household \(i\) is a self-employed worker and 0 if a salary worker, and \(\eta _{i}\) is the regression error that, by construction, includes not only unexplained variations in household food expenditures but also deviations of actual income from permanent income and of reported income from true income. Estimating this equation requires further algebraic manipulation using the properties of log-normal distributions. In particular,

$$\begin{aligned} \ln \overset{-}{p_{i}}=\mu _{i}^{p}+\frac{1}{2}\sigma _{u_{i}}^{2} \end{aligned}$$
(8)
$$\begin{aligned} \ln \overset{-}{k_{i}}=\mu _{i}^{k}+\frac{1}{2}\sigma _{v_{i}}^{2}, \end{aligned}$$
(9)

where a bar over a variable denotes its mean. Assuming that the mean of \( p_{i}\) is the same for salary and self-employed workers (\(\ln \overset{-}{p} _{SE}= \ln \overset{-}{p}_{SW}\)), after substituting for \(\mu _{SE}^{p}-\mu _{SW}^{p}\), the above Engel curve can be written as

$$\begin{aligned} \ln F_{i}={\varvec{\alpha }}\mathbf{X}_i^{\prime }+\beta \ln Y_{i}^{\prime }+\gamma DSE_{i}+\eta _{i}, \end{aligned}$$
(10)

where \(\gamma =\beta \left[ \theta -\frac{1}{2}\left( \sigma _{v_{_{SE}}}^{2}-\sigma _{v_{_{SW}}}^{2}\right) +\frac{1}{2}\left( \sigma _{u_{SE}}^{2}-\sigma _{u_{SW}}^{2}\right) \right] \) and \(\theta =\ln \overset{-}{k}_{SE}-\ln \overset{-}{k}_{SW}\). As can be seen from the expression relating \(\gamma , \beta \) and \(\theta \), the estimated extent of underreporting of income is an interval whose limits depend upon the extreme values of the variances of \(u\) and \(v\) for each type of household. The usual approach to estimating these variances involves computing the residual variances in the following regression for income:

$$\begin{aligned} \ln Y_{i}^{\prime }={\varvec{\Omega }} \mathbf{X}_{i}^{\prime }+{\varvec{\Gamma }} \mathbf{Z}_{i}^{\prime }+\xi _{i}, \end{aligned}$$
(11)

where \(\mathbf Z _{i}\) is a vector of variables used as instruments in the IV-2SLS estimates of expression (10), given the potential endogeneity of \(Y_{i}^{\prime }\). Again, the error term \(\xi _{i}\) has three components: unexplained variations in household permanent income, deviations of actual income from permanent income, and deviations of reported income from true income. If the variance of the first component is assumed to be the same for salary and self-employed workers (a reasonable assumption, given that the risk of omitting variables related to the distinction between self-employed and salary workers is null when a dummy is included or a separate estimation by type of household is carried out), we can write

$$\begin{aligned} \sigma _{\xi _{SE}}^{2}-\sigma _{\xi _{SW}}^{2}=\sigma _{u_{SE}}^{2}+\sigma _{v_{SE}}^{2}-2\,\mathrm{cov}(u,v)_{SE}-\sigma _{u_{SW}}^{2}-\sigma _{v_{SW}}^{2}+2\,\mathrm{cov}(u,v)_{SW}. \end{aligned}$$
(12)

On the other hand, given the value of \(\gamma \) above, the relative underreporting of income by self-employed households is then given by

$$\begin{aligned} \theta =\frac{\gamma }{\beta }+\frac{1}{2}(\sigma _{v_{_{SE}}}^{2}-\sigma _{u_{SE}}^{2}+\sigma _{u_{SW}}^{2}-\sigma _{v_{SW}}^{2}). \end{aligned}$$
(13)
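For readers who want the intermediate algebra, the step from the dummy-variable Engel curve to (13) can be sketched as follows. It uses only (8), (9) and the assumption \(\ln \overset{-}{p}_{SE}=\ln \overset{-}{p}_{SW}\); no new symbols are introduced.

$$\begin{aligned} \mu _{SE}^{p}-\mu _{SW}^{p}&=-\frac{1}{2}\left( \sigma _{u_{SE}}^{2}-\sigma _{u_{SW}}^{2}\right) ,\\ \mu _{SE}^{k}-\mu _{SW}^{k}&=\theta -\frac{1}{2}\left( \sigma _{v_{SE}}^{2}-\sigma _{v_{SW}}^{2}\right) ,\\ \gamma&=-\beta \left[ (\mu _{SE}^{p}-\mu _{SW}^{p})-(\mu _{SE}^{k}-\mu _{SW}^{k})\right] \\&=\beta \left[ \theta -\frac{1}{2}\left( \sigma _{v_{SE}}^{2}-\sigma _{v_{SW}}^{2}\right) +\frac{1}{2}\left( \sigma _{u_{SE}}^{2}-\sigma _{u_{SW}}^{2}\right) \right] . \end{aligned}$$

Dividing the last line by \(\beta \) and solving for \(\theta \) gives (13).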

Note that (13) is quite similar to expression (18) of Pissarides and Weber (1989), where the level of underreporting of income by salary workers is fixed at zero and, consequently, the term \(\sigma _{v_{SW}}^{2}\) does not appear. If we assume that the covariance between \(u\) and \(v\) is zero for both types of households, lower and upper bounds for the relative underreporting of income by self-employed households can be obtained. Taking the variances for salary workers as parameters, the minimum value of \(\theta \) is obtained when \(\sigma _{v_{SE}}^{2}\) reaches its lowest value, that is, when it is equal to \(\sigma _{v_{SW}}^{2}\). In that case,

$$\begin{aligned} \theta =\frac{\gamma }{\beta }-\frac{1}{2}(\sigma _{\xi _{SE}}^{2}-\sigma _{\xi _{SW}}^{2}), \end{aligned}$$
(14)

where (12) has been used. By contrast, (13) reaches its maximum value when \(\sigma _{u_{SE}}^{2}\) is at its minimum feasible value, which in our model, as in Pissarides and Weber (1989), is \(\sigma _{u_{SE}}^{2}=\sigma _{u_{SW}}^{2}\). This gives an upper bound for the extent of underreporting of income by self-employed households:

$$\begin{aligned} \theta =\frac{\gamma }{\beta }+\frac{1}{2}(\sigma _{\xi _{SE}}^{2}-\sigma _{\xi _{SW}}^{2}) \end{aligned}$$
(15)
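As a minimal sketch of how (14) and (15) are applied, the function below takes estimates of \(\gamma \) and \(\beta \) from (10) and the residual variances from (11), and returns the lower bound, the midpoint \(\gamma /\beta \) and the upper bound for \(\theta \). The function name and the numerical inputs are hypothetical, and treating \(\gamma /\beta \) as the central value is an assumption made here for illustration. Since \(\theta \) is a difference of logs, \(\exp (\theta )\) is printed as the implied relative multiplier.

```python
import math


def theta_bounds(gamma, beta, var_xi_se, var_xi_sw):
    """Return (lower, midpoint, upper) for theta following Eqs. (14)-(15).

    Assumes zero covariance between u and v in both groups, as in the text.
    """
    gap = var_xi_se - var_xi_sw      # sigma^2_{xi,SE} - sigma^2_{xi,SW}
    midpoint = gamma / beta          # gamma / beta
    lower = midpoint - 0.5 * gap     # Eq. (14)
    upper = midpoint + 0.5 * gap     # Eq. (15)
    return lower, midpoint, upper


# Hypothetical inputs, not taken from the paper's tables.
lower, midpoint, upper = theta_bounds(gamma=0.10, beta=0.50,
                                      var_xi_se=0.40, var_xi_sw=0.30)
for label, theta in [("lower", lower), ("midpoint", midpoint), ("upper", upper)]:
    print(f"{label}: theta = {theta:.3f}, multiplier = {math.exp(theta):.3f}")
```

The ordering lower, midpoint, upper holds only when the residual variance of income is larger for self-employed households than for salary worker households.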

Given that salary workers also partially hide their income, the interpretation of these lower, central and upper values of the degree of underreporting differs from that of Pissarides and Weber (1989), because we explicitly take into consideration the likely concealment of income by employees. In essence, our approach closely follows theirs: we estimate their standard food demand functions, but our interpretation of the results is consistent with the underreporting of income by self-employed households being measured relative to a given degree of underreporting by salary workers. Section 4 offers empirical evidence reinforcing this point, since the estimated extent of concealment by the self-employed hinges dramatically on the particular group of salary worker households used in the comparison.

3 The data

The data are drawn from the Spanish Household Budget Surveys (EPF in Spanish) from 2006 to 2009, compiled by the Spanish National Institute of Statistics (INE in Spanish). The sample size is approximately 24,000 households per year, with half of the sample renewed each year. The food consumption expenditures recorded in the EPF cover both monetary payments for goods and the value of household consumption in the form of self-consumption and self-supply. In this paper, we work with the sum of both, not only because the econometric estimates become worse if self-consumption and self-supply are left out but also because salary and self-employed workers differ in these items.

In a number of cases (about 25 % of households), the INE imputes food expenditures to correct missing values, errors, non-response, etc. Our estimates distinguish between imputed and non-imputed observations. In any case, the differences between self-employed and salary workers in the share of imputed values in total food expenditures are practically null. We also show results with and without meals away from home included in household food expenditures.

There are two variables of interest regarding household income in the EPF. The first is the net (after-tax) income of the household as a whole and the second is the net income of the household head. Both are measured in nominal terms. Since food expenditures and household incomes, as nominal variables, could be affected by price changes, we have also deflated the former using the food CPI and the latter using the GDP deflator. The estimates change only negligibly, so we report here only the regressions with nominal data.

In a large number of cases (around 70 % for salary workers and almost 80 % for self-employed workers), the INE imputes the monthly net total income received by the household. This is because a very large number of households do not report how much they earn. All observations based on imputed values have been removed from our sample. This is not the case for the net income of the household head, where all the available data come from the participants’ answers.

A salary worker household is defined as one in which the household head self-reports as a salary worker, and correspondingly for a self-employed worker household. Other criteria have also been considered for this key distinction (such as the main source of household income) in order to avoid a number of inconsistencies. As is usual in this type of paper, families in which the household head works in agriculture, cattle farming or fishing have been removed from the sample; in this way, we aim to prevent the relationship between food consumption and income from being affected by the particular consumption pattern of these households.

It is not straightforward to establish a clear correspondence between our data and those from other statistical sources. Unfortunately, the Spanish National Accounts do not distinguish between self-employed and salary workers in terms of the primary generation and allocation of income. However, the composition of our sample is very similar to that of the Economically Active Population Survey (EAPS; EPA in Spanish). While the shares of self-employed and salary workers in total non-agricultural jobs are about 15 and 84 %, respectively, in the EAPS, the corresponding weights in our sample are 17 and 82 %.

Table 1 shows the main intuition behind this paper. Households headed by a self-employed worker declare practically the same food spending as households headed by a salary worker. But households with a self-employed head systematically report less income in the EPF than the corresponding salary worker households. These differences are statistically significant (column (3)). In line with previous research, this is a clear indication that self-employed households underreport part of their income. On the other hand, the standard deviation of income (whatever the definition used) is always higher for self-employed households than for salary worker households, reflecting a positive sign for the difference \(\sigma _{\xi _{SE}}^{2}-\sigma _{\xi _{SW}}^{2}\); this is compatible with a more volatile pattern for self-employed income, as assumed in the theoretical framework.

Table 1 Differences between SE and SW households in food expenditure and income

As we are interested in isolating the effect of the self-employed condition on the extent of underreporting, we need to control for the factors which are involved in determining the food demand function of both groups. Table 2 gives information about some economic and demographic variables with some expected impact on household food expenditures. On this basis we can characterize the average self-employed household in relation to the salary worker family.

Table 2 Differences in economic and demographic variables between SE and SW households

Although self-employed households have fewer members, dependent children and economically active members than salary worker households, they have a slightly higher number of income recipients. Self-employed households are also headed by an older person than the corresponding salary worker family; the head is predominantly Spanish and male (with very small differences with respect to salary worker households). Human capital accumulation is greater in employee households.

Regarding housing characteristics, the average self-employed household is more likely to live in a town of fewer than 10,000 inhabitants, relies less on mortgages, and owns slightly more houses (other than the main one) than the average salary worker household. As for other types of expenditure, self-employed households spend less on alcoholic drinks, meals away from home, cars and durable goods for housing or leisure than salary worker households. Finally, the variable “compliance” is defined so that the higher its value, the less the household is involved in providing the information required by the survey; in this sense, self-employed households are less cooperative than employee households.

4 Estimations and results

The model of Sect. 2 suggests an equation which allows us to obtain an estimate of underreporting of income by self-employed worker households in relation to salary workers households. In essence, expression (10) states that the food consumption of both types of households depends on reported income, on a dummy distinguishing whether the household head of the family is a self-employed worker or not, and a number of variables controlling for different socio-economic and demographic characteristics.

On the basis of Eq. (10), we have run a number of regressions under several specifications and methods. In particular, we report 2SLS estimates in which both reported household income and the SE dummy variable have been instrumented. By assumption, the former is measured with error, given the existence of transitory variations around its permanent value and the concealment phenomenon itself. The dummy for self-employed workers is also treated as endogenous because we have evidence of potential misclassification of self-employed households as families with a salary worker head; left untreated, this would likely lead to a downward bias in the estimated coefficient of the self-employed dummy (Schuetze 2002). Not surprisingly, the Hausman specification test supports the idea that OLS estimates are inconsistent and that the IV approach is required. The instruments have been selected using the Sargan test; a complete list of those used can be found in Appendix A. Moreover, Hausman-type tests (not reported here) show no correlation between the regressors and the unobserved individual effects in any of the specifications, which leads to a random effects model.
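The mechanics of the 2SLS step can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the estimation behind the tables: the data-generating process, the two instruments (`z_income`, `z_status`), the function names and all parameter values are hypothetical, the panel structure and random effects are ignored, and no Hausman or Sargan test is performed. In the simulated data the dummy is in fact exogenous; it is instrumented only to mirror the specification described above.

```python
import numpy as np


def two_stage_least_squares(y, exog, endog, instruments):
    """2SLS of y on [exog, endog], using [exog, instruments] as instruments."""
    X = np.column_stack([exog, endog])          # all regressors
    W = np.column_stack([exog, instruments])    # full instrument set
    # First stage: project every regressor on the instrument set.
    first_stage, *_ = np.linalg.lstsq(W, X, rcond=None)
    X_hat = W @ first_stage
    # Second stage: regress y on the fitted regressors.
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return coef


def simulate(n=50_000, beta=0.5, theta=0.25, seed=0):
    """Synthetic households; all values are invented for illustration."""
    rng = np.random.default_rng(seed)
    z_income = rng.normal(size=n)               # hypothetical instrument for income
    z_status = rng.normal(size=n)               # hypothetical instrument for SE status
    dse = (2.0 * z_status + rng.normal(size=n) > 1.0).astype(float)
    ln_perm = 7.0 + z_income + 0.3 * rng.normal(size=n)   # ln Y^p
    u = 0.3 * rng.normal(size=n)                           # ln p, transitory shock
    ln_reported = ln_perm + u - theta * dse     # ln Y' = ln Y^p + ln p - ln k
    ln_food = 1.0 + beta * ln_perm + 0.2 * rng.normal(size=n)
    return ln_food, ln_reported, dse, z_income, z_status


ln_food, ln_reported, dse, z_income, z_status = simulate()
const = np.ones_like(ln_food)
coef = two_stage_least_squares(
    ln_food,
    const,
    np.column_stack([ln_reported, dse]),
    np.column_stack([z_income, z_status]),
)
beta_hat, gamma_hat = coef[1], coef[2]
theta_hat = gamma_hat / beta_hat                # variance terms are zero by design
print(f"beta = {beta_hat:.3f}, gamma = {gamma_hat:.3f}, theta = {theta_hat:.3f}")
```

Because the simulated variances of \(u\) and \(v\) are identical across groups, \(\gamma /\beta \) recovers \(\theta \) directly here; with real data the variance corrections in (14) and (15) would still be needed.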

We have used different definitions of the dependent variable: the log of total food expenditures (food purchases plus meals away from home) per household or the log of food expenditures (food purchases only) per household. Similarly, two measures of income have been considered: the log of net total income (called total income in the tables) or the log of net income earned by the household head (called HH income).

In all specifications the control variables are the age of the household head and its square, the number of household members (or the number of dependent children), a dummy for marital status (1 if married, 0 otherwise), a time dummy for 2009, and a constant. Other specifications were estimated, but those reported here are the best in terms of econometric properties and economic sense. In particular, regional dummies, the logs of expenditures on clothes, cars, health and other household spending items, dummies controlling for city size, and time dummies for other years were included but were not statistically significant. Additionally, the variables rejected as potential instruments include dummies for housing tenure (financed with a mortgage, rented), the number of economically active members, a dummy for the sex of the household head, the log of expenditures on durable goods for housing and the log of expenditures on durable goods for leisure.

Table 3 reports IV estimates of expression (10) and shows that the degree of underreporting by Spanish self-employed workers ranges between 20 and 30 %. Table 3 also displays the lower and upper values of the degree of underreporting, according to expressions (14) and (15), respectively, and the variances in Table 9 in Appendix B. Recall that, under standard assumptions, the values of \(\sigma _{\xi _{SE}}^{2}\) and \(\sigma _{\xi _{SW}}^{2}\) can be obtained as the residual variances of (11) when a separate estimation is carried out for each type of household.
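The computation of the two residual variances can be sketched as follows, assuming a plain OLS fit of (11) on each subsample. The helper names, the single control variable and the simulated data are hypothetical and serve only to show the mechanics.

```python
import numpy as np


def residual_variance(y, X):
    """Residual variance of an OLS fit of y on X (degrees-of-freedom adjusted)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    return float(resid @ resid / dof)


def variance_gap(ln_income, controls, dse):
    """sigma^2_{xi,SE} - sigma^2_{xi,SW} from separate regressions by group."""
    X = np.column_stack([np.ones(len(ln_income)), controls])
    se = dse == 1
    return (residual_variance(ln_income[se], X[se])
            - residual_variance(ln_income[~se], X[~se]))


# Synthetic data: income noise is larger for self-employed households.
rng = np.random.default_rng(1)
n = 20_000
age = rng.uniform(25, 65, size=n)
dse = (rng.random(n) < 0.2).astype(int)
noise_sd = np.where(dse == 1, 0.6, 0.4)
ln_income = 6.0 + 0.01 * age + noise_sd * rng.normal(size=n)

gap = variance_gap(ln_income, age, dse)
print(f"estimated variance gap: {gap:.3f}")
```

In the simulation the true gap is \(0.6^{2}-0.4^{2}=0.20\); the resulting number is what enters the bounds in (14) and (15).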

Table 3 IV estimates of Engel curve for food

When the dependent variable is the log of food purchases, the estimated concealment of earnings is higher than when meals away from home are also taken into account; it makes sense to find less underreporting when explicit expenditures in bars and restaurants are considered, especially as receipts and invoices may be involved. In the first two columns, we also see higher levels of underreporting when only the income earned by the self-employed household head is considered. By contrast, the concealment of earnings by households with a self-employed head is lower if we take all family income regardless of its source. In other words, the higher the share of self-employment income in total family income, the higher the level of underreporting.

These results are in line with previous papers, although at the low end of the range. Engstrom and Holmlund (2009), focusing on the difference between underreporting in self-employed households and underreporting of self-employment income in self-employed households, find estimates ranging from 30 % to around 35 % in Sweden. Kleven et al. (2011), using experimental methods, find that the evasion rate for total positive self-employment income is 17.7 % in Denmark, while the corresponding value for third-party reported income (which includes, among other things, salary income) is below 1 %. Hurst et al. (2011), with a methodology very close to ours, find that self-employed workers underreport their income by between 25 and 35 % in the US.

Regarding the impact of the control variables on the dependent variable, the results exhibit reasonable patterns that are again similar to previous studies. Food expenditures are positively affected by the age of the household head but negatively by its square; the dummies for marital status of the household head and for the year 2009, when the economic crisis was especially severe, also have a negative impact on food expenditure. By contrast, the number of members or dependent children in the household has a positive effect on food spending.

One of the contributions of this paper lies in the broader interpretation of the estimates of underreporting, which takes into consideration the possibility that salary workers hide income too. To what extent do changes in the set of salary workers used in the estimations affect the degree of underreporting by self-employed workers? How sensitive is this measure to different subsets of salary workers, which can in a sense be seen as the control group? Table 4 gives a suggestive answer by reporting meaningful differences in the degree of underreporting of income by self-employed households depending on the group of salary workers with which they are compared. When families headed by salary workers receive no pensions, the degree of concealment found for self-employed workers is about 45 % higher than in the standard approach (1.809 vs 1.253); by contrast, if the estimates are computed relative to households with salary workers that also have earnings from pensions, the level of underreporting by the self-employed decreases by 11 % in comparison with the standard estimate (1.111 vs 1.253). Similarly, the extent of hidden earnings by the self-employed rises notably when only salary households with no income from unemployment benefits are considered.
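The percentage comparisons quoted above are ratios of the reported figures, as the following arithmetic shows:

$$\begin{aligned} \frac{1.809}{1.253}-1\approx 0.444,\qquad \frac{1.111}{1.253}-1\approx -0.113. \end{aligned}$$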

Table 4 IV estimates of Engel curve for food and different sets of SW

Using individuals’ own assessment of their employment status could be problematic, since the status of some workers may not be clear. Although the potential misclassification of self-employed households as families with a salary worker head has been dealt with through the IV approach above, we now include an additional check. Column (1) in Table 5 shows estimates of the food expenditure equation when households stating that they are self-employed are also required to declare that self-employment is their main source of income. The degree of underreporting and the coefficients of the control variables are practically the same as in the canonical specification.

Table 5 IV estimates of Engel curve for food with different definitions of SE

These results underline an issue that seems to be relevant for the Spanish case, namely, that the measures of the degree of underreporting by self-employed workers are relative to a given level of underreporting by salary workers, which can hardly be assumed to be zero. Since employees also hide part of their income, our estimates of concealment by self-employed workers are relative to these hidden earnings of salary workers. Furthermore, these estimates are very sensitive to whether or not the households are entitled to receive social benefits.

Substantial differences do arise when the group of self-employed workers is filtered to take into account some particular, relevant features. Column (2) in Table 5 reports the level of concealment of income when the self-employed worker has hired at least one employee. The central value of \(\theta \) is 1.428, notably higher than for the whole sample of self-employed; the remaining regressors hardly differ from the previous ones. This effect is especially strong if the self-employed workers with employees are technicians or professionals such as lawyers, doctors, architects and so on (column (3) in Table 5); in this case their reported earnings should be multiplied by 2.4 to obtain the true income, though this result should be interpreted with caution due to the weak statistical significance of the coefficient on the self-employed dummy. By contrast, households with a skilled self-employed head (with no distinction now made as to whether they have employees or not) are found to underreport less than in the standard case (1.147 vs 1.253).

Additional robustness analyses have been carried out to verify whether the chosen specification is the most appropriate. Firstly, we have checked the assumption of equal propensity to consume for self-employed and salary workers. Columns (1) and (2) of Table 6 report the estimates obtained by regressing the log of food expenditures on the income of self-employed households and of salary worker households separately, using both household income and household head income. The differences between the two relevant coefficients are negligible, and the remaining coefficients are practically unchanged. This result is in line with previous findings by other authors, for instance Pissarides and Weber (1989).

Regarding the potential existence of non-linear relationships between food consumption and income (Lyssiotou et al. 2004; Tedds 2010; both using non-parametric techniques), we have run regressions in which the log of income is allowed to have both first- and second-order effects on consumption, in a log-quadratic version of the Engel curve (4). Columns (3) and (4) of Table 6 clearly show that the quadratic coefficients are statistically insignificant, which rejects the presence of non-linear relationships, in line with other papers (Pissarides and Weber 1989; Hurst et al. 2011).
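As an illustration only, a log-quadratic Engel curve of this kind can be written as follows. The symbols are assumptions chosen for this sketch and need not coincide with the notation of Eq. (4): \(c_i\) is food expenditure, \(y_i\) is reported income, \(SE_i\) is the self-employed dummy and \(Z_i\) collects the control variables.

\[
\ln c_i = \alpha + \beta_1 \ln y_i + \beta_2 \left( \ln y_i \right)^2 + \gamma \, SE_i + Z_i' \delta + \varepsilon_i , \qquad H_0 : \beta_2 = 0 .
\]

Failing to reject \(H_0\) means that the second-order term adds nothing, so the specification collapses to the log-linear form.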

Table 6 IV estimates of Engel curve for food with changes in specification

An interesting extension to make the estimates more reliable consists of applying quantile regression to see whether the underreporting varies across quantiles. Indeed, the extent of concealment may depend on the conditional distribution of food expenditures. Table 7 shows how the value of \(\theta \) changes across the 0.25, 0.50 and 0.75 quantiles. While the degree of underreporting increases substantially between the 0.25 and the 0.50 quantiles, the estimate stabilizes at the last quantile. In other words, we find an increasing relationship between the level of underreporting and food expenditures: the higher a household’s expenditure on food, the higher its underreporting of income. But this holds only for the first two quantiles. Beyond the median household, this relationship disappears, following the standard pattern of decreasing marginal propensity to consume food (Footnote 13).
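For readers less familiar with the technique, the sketch below shows the check (pinball) loss that quantile regression minimises, in the simplest intercept-only case where the minimiser is just a sample quantile. It is not the estimator used for Table 7, which includes regressors; the data are made-up placeholder values, and the function names are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau:
    positive residuals are weighted by tau, negative ones by (1 - tau)."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(values, tau):
    """Intercept-only quantile regression: return the sample value that
    minimises the total check loss at quantile level tau."""
    return min(
        sorted(values),
        key=lambda c: sum(check_loss(v - c, tau) for v in values),
    )


# Made-up placeholder data standing in for log food expenditures.
log_food = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

for tau in (0.25, 0.50, 0.75):
    print(tau, constant_quantile_fit(log_food, tau))
```

A full quantile regression replaces the constant with a linear function of the regressors and minimises the same loss, so each quantile level can yield a different coefficient on the self-employed dummy and hence a different \(\theta \).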

Table 7 Quantile regression of Engel curve for food

Finally, it is reasonable to think that the business cycle may affect tax evasion and, more generally, the extent of the informal economy. Although this is not the central point of this paper, Table 8 gives some insights into this potential link for several different specifications. Contrary to what intuition might suggest, the degree of underreporting in our sample is higher in the years of expansion (2006–2007) than when the economy shrinks (2008–2009).

Table 8 IV estimates of Engel curve for food by subperiods

5 Concluding remarks

At first sight, one might expect the extent of underreporting of income by Spanish self-employed workers to be above the estimates found for the USA, Sweden or the UK. This view would be supported, firstly, by the fact that tax morale in Spain is not as strong as in other OECD countries (Alm and Torgler 2006). Secondly, as in other Mediterranean countries, the Spanish self-employment rate is higher than in northern European countries (Torrini 2005), which makes it more difficult and costlier for the tax authorities to control this source of income.

This paper provides evidence on the extent of underreporting by the self-employed in a sample that has never been used for this purpose. Our estimates put this magnitude at around 25 % of the income reported by households headed by self-employed workers. These figures are very close to those for other countries such as Sweden or the USA. Our result has been obtained using data drawn from the Spanish Household Budget Surveys over the period 2006–2009, after running a number of regressions to control for changes in specification, non-linearities and endogeneity.

Having said that, we have proposed here a broader interpretation of the standard Pissarides and Weber (1989) model. Instead of assuming that salary workers honestly report all their income, we have allowed for the possibility that employees also hide earnings. In this context, our measure of the income underreported by self-employed workers must be interpreted as a relative measure of such concealment, taking as reference a given level of underreporting of income by salary workers.

In other words, our estimate of 25 % underreporting by self-employed households is measured relative to the degree of underreporting of income by salary workers, which is strictly positive in our approach. Indeed, the extent of concealment is highly sensitive to the type of salary worker household included in the sample. Recall that the range of underreporting goes from 1.111 when households with an employee head also receive pension income to 1.809 in the opposite case, where no pension income is received.

Consequently, our estimates must be seen as lower bounds on the absolute extent of underreporting of income, beyond the standard maximum and minimum thresholds derived from the canonical approach. In this context, the size of the black economy stemming from underreporting by the self-employed is around 2.5 % of GDP (Footnote 14). Previous estimates of the informal economy in Spain are substantially higher than those reported here. Arrazola et al. (2011) find that the size of the Spanish underground economy in the 2000s is around 20 % of GDP, using both the currency demand and electricity models. Schneider (2012) reaches practically the same figure for the period 2003–2010. In any case, the comparison between those studies and this paper is only intended to place our estimates within the general framework of previous work; indeed, the methodologies are quite different and the focus is completely distinct (the whole economy versus underreporting by the self-employed relative to employees).

A line for further research could be motivated by the consequences of this concealment of income for tax revenues and progressivity. While the effect of progressivity on tax evasion has been examined by some authors, the inverse effect (the impact of the concealment of income by self-employed workers on progressivity) has hardly been studied. Although there are some theoretical papers dealing with this issue (see, for instance, Freire-Seren and Panades 2008), the scope for empirical work is wide. In this regard, it is clear that the basic principles of vertical and horizontal equity are damaged in the presence of underreporting.

Additionally, since salary workers have to pay more taxes than self-employed workers, other things equal, an inefficient incentive arises to allocate more resources than is socially optimal to self-employment activities. As a result, individuals’ choice between paid employment and self-employment is distorted in favour of the latter.