An individual-tree linear mixed-effects model for predicting the basal area increment of major forest species in Southern Europe

Aim of study: Assessment of growth is essential to support sustainability of forest management and forest policies. The objective of the study was to develop a species-specific model to predict the annual increment of tree basal area through variables recorded by forest surveys, to assess forest growth directly or in the context of more complex forest growth and yield simulation models. Area of study: Italy. Material and methods: Data on 34638 trees of 31 different forest species collected in 5162 plots of the Italian National Forest Inventory were used; the data were recorded between 2004 and 2006. To account for the hierarchical structure of the data due to trees nested within plots, a two-level mixed-effects modelling approach was used. Main results: The final result is an individual-tree linear mixed-effects model with species as dummy variables. Tree size is the main predictor, but the model also integrates geographical and topographic predictors and includes competition. The model fitting is good (Mc Fadden’s Pseudo-R 2 0.536), and the variance of the random effect at the plot level is significant (intra-class correlation coefficient 0.512). Compared to the ordinary least squares regression, the mixed-effects model allowed reducing the mean absolute error of estimates in the plots by 64.5% in average. Research highlights: A single tree-level model for predicting the basal area increment of different species was developed using forest inventory data. The data used for the modelling cover 31 species and a great variety of growing conditions, and the model seems suitable to be applied in the wider context of Southern Europe.


Introduction
Assessment of growth is essential in forestry. Together with disturbances influence, tree growth is primary evidence of forest dynamics (Pretzsch, 2009). Knowledge of growth supports sustainability, prevents wood shortages, allows conscious decisions in forest planning and hence it is crucial to support forest policies. Tree growth models are the most important components in forest planning systems, together with models for stand establishment, harvest regimes and natural mortality (Andreassen & Tomter, 2003). Countries that have signed the United Nation Framework Convention on Climate Change (UNFCCC) in 1992 are required to assess forest growth also for reporting on carbon sequestration, under the Land Use, Land Use Changes and Forestry (LULUCF) activities. In forest land remaining forest land, most of the living biomass carbon stock increase is due to biomass growth, which is usually estimated converting the net volume increment using biomass conversion and expansion factors (IPCC, 2003;Di Cosmo et al., 2016). The increment and growth of forests is used in monitoring programmes to survey the response of forests to changing environmental conditions (Dobbertin, 2005;Solberg et al., 2009;Gschwantner et al., 2016). The analysis of growth patterns offers the opportunity to quantify the influence of environmental factors on trees and stands, since they reflect the conditions that trees or stands have been exposed to (Yue et al., 2014).
Despite its importance, increment data are expensive to measure because these are almost exclusively obtained through field surveys, since the recent progresses in forest data retrieval through remote sensing techniques, included the use of airborne laser scanning, have not changed the prospect of increment estimation from the remote sensing instruments (Gasparini et al., 2017).
Volume increment is one of the main attributes that has been estimated by National Forest Inventories (NFIs) since their initiation in the nineteenth century , as the primary variables assessed were those related to wood and timber stock and production, i.e. variables of economic relevance. NFIs are one of the three main sources of data for growth studies together with dendrochronological studies and yield plot networks (Spiecker, 1999). In Europe, sample-based NFIs originated in the twenties of the last century and since then, most of the European countries have established their NFI to provide accurate information for forest management and decision-making Bosela et al., 2016). In forest inventories, growth measurements derive from consecutive measurements in permanent plots using calliper or tape, or from more precise measurements on increment borings; one advantage of using forest inventory data is that the whole range of site conditions, stand structure and treatment history is sampled (Wykoff, 1990).
Italy is currently carrying out its third NFI (INFC2015). The first NFI (IFNI85) was carried out in 1985; the second NFI (INFC2005) was carried out between 2002 and 2006 . The Italian NFI provides estimates of the periodic annual increment of volume as the increase in volume of the trees measured at the time of the survey. Estimates are computed at country level and for the administrative regions at the highest resolution. As a first contribution to fulfil the gap of information that may occur at more local level (e.g. in the provinces, the associations of municipalities, the consortia of forest owners) a stand-level model for predicting the periodic annual volume increment per hectare has been made available using the second Italian NFI data by Gasparini et al., 2017. The aim of this work was to develop a species-specific, tree-level, mixed-effect prediction model for the annual increment in basal area using the data collected in the Italian NFI. Compared to stand-level models, individual-tree models are more flexible and allow a higher resolution (Burkhart & Tomé, 2012). Basal area is a variable directly computed from the diameter at breast height (DBH), an attribute easy to measure with accuracy, and it is highly correlated with other tree characteristics. Diameter or basal area increment models are key sub-models in individual tree-based forest growth and yield simulation models (Pokharel & Dech, 2012;Vospernik, 2017). The inventory design is a three-phase sampling for stratification (Gasparini et al., 2010). In the first phase, the land cover/land use of approximately 301000 sample units is classified by photo-interpretation of aerial images. The second phase is mainly used to classify the forest type (21 forest categories and 77 forest sub-categories) and to collect data on qualitative attributes (site features, vegetation structure, management regime and others). The third phase is used to record quantitative data on trees, shrubs and deadwood. The second phase is carried out in the field, on a subsets of sampling units defined by land cover/land use and administrative region; also the third phase is carried out in the filed, on a subset of sample units further stratified by forest type. In INFC2005, the second phase sample consisted of approximately 30000 units, while the third phase sample of about 7000 units.

Data description
Relevant to this article, forest category, aspect (with a compass) and slope (with a clinometer) were recorded in the second phase (years 2004-2005) within a 25 m radius circular plot centred on the NFI sampling point. All trees with DBH ≥ 4.5 cm and ≥ 9.5 cm were callipered in two concentric circular plots with radius 4 m and 13 m, respectively, during the third phase (year 2006). Some of those were sampled for additional measurements, i.e. total tree height, crown base height (distance from ground surface to the lowest living branch) and tree coring. Sample trees were chosen following a protocol based on both random and representative criteria but discarding trees with visible faults. The five trees closer to the plot centre, the three larger ones in DBH and two of the less represented species (in mixed forests) or DBH class (in monospecific 3 A model for predicting the basal area increment of major forest species in Southern Europe woods) were finally selected. As the trees in the last two categories may also be among the closer to the plot centre, the total number of sample trees ranged between five and ten per plot. One core per sample tree was taken at 1.30 m above ground level with a Pressler increment borer. The core was taken along the plot centre direction, to avoid systematic annual rings eccentricity due to site features, such as slope, prevailing winds and other. The width of the outermost 5 rings (excluding the current year ring) was measured as the five-year, inside bark, DBH increment (DI5). The DBH corresponding to five years before the survey year (DBH0) was calculated by subtracting DI5 from the measured DBH (DBHt) and the five-year periodic basal area increment (BAI) was then computed for each sample tree. Bark growth was considered null during the five-year interval; in European NFIs, the increment of bark is often omitted due to its difficult assessment (Gschwantner at al., 2016). The model developed in this study is based on the information recorded in 5162 NFI plots with complete data for all the variables tested as explanatory variables, and where no cuttings had occurred during the year before the survey; based on these selection criteria, the dataset resulted in 34638 sample trees.

Statistical analysis
The sample trees dataset was analysed and the existence of outliers checked as recommended by Zuur et al. (2010). We first assessed singular observations graphically for each species through Cleveland dotplots and excluded less than 200 sample trees from the overall dataset that contained more than 35000 observations. We then discarded all the species with less than 200 observations remaining. During the statistical analysis, fourteen additional singular observations were identified and excluded. The remaining data were randomly divided into two subsets, one for model calibration (23091 sample trees) and the other for model evaluation (11547 sample trees). After the division, 4630 plots had either trees used for calibration and trees used for evaluation, 486 plots had only sample trees used for calibration, 46 plots had only trees for model evaluation. The model was constructed for 31 species that altogether constitute the 92% of the growing stock volume of the Italian forests; apart from Sorbus aria Crantz., they are all within the fortieth position in the list of the most important species in terms of tree volume (Gasparini & Tabacchi, 2011).
A set of potential predictors was built with variables related to plot location (geographical coordinates), site features (altitude, slope, aspect), stand attributes (basal area per hectare; dominant height, as the average height of the three largest trees in a plot), tree variables (DBH, total height, crown ratio) and with distance independent competition indices. The following competition indices were tested: basal area of trees larger than the subject tree (BALT); ratio of subject tree species basal area to stand basal area (BASPratio); ratio of subject tree basal area to stand basal area (BASTratio). Summary statistics on plot location and site features are reported in Table 1; summary statistics on trees and stand variables, as well as competition indices, are shown in Table 2. Fig. S1 [suppl.] shows histograms for all the variables analysed. The stand attributes, also used for computing the distance independent indices, were those recorded at the time of measurement so assuming that stand and competition conditions remained unchanged in the 5-year period. For some variables, also transformed values (log, squared or square root) were tested. The predictors were selected using OLS multiple regression analysis (stepwise forward procedures; Variance Inflation Factor (VIF) = 5; F-to-enter/remove = 0.05); the analysis was undertaken using Systat 12.0 (Systat Software Inc., San Jose, CA, USA).
The observed individual trees are nested within plots, and hence do not satisfy the OLS basic assumption of independence of observations. Failing to address this issue may lead to downward biases of the standard errors of the estimates, which in turn can result in wrong inferential conclusions about the model parameters. Therefore, a two-level mixed-effects model with trees i nested within plots j has been estimated using R software (R Core Team, 2014). The number of trees observed per plot varies between 1 and 10. Such a small average group size at the second level is not an obstacle for estimating the plot effect because what matters is to have a sufficient sample size of plots (e.g. Snijders, 2005) and, in the calibration subset, the total number of sampled plots is 5116. The natural logarithm of BAI was selected as the dependent variable and the model formally described as where is the vector containing the values of the dependent variable, X is a matrix containing the tree level and plot level predictors, is the random effect vector at plot level and is the model residual vector at the tree level. According to the model specification expressed by Equation (1), the only coefficient allowed to vary randomly was the intercept. In other words, the average tree basal area increment can vary across the different plots. This modelling framework allows not only having correct inference, but also control over the non-measured unobserved plot-level local conditions. The fixed-effects component is a composite linear model (Wykoff, 1990) with tree, stand and site variables as predictors as shown in the following more explicit form: where a 0 is the intercept, b k , c l , d m , e n f o and g p are the vectors of the regression coefficients for the k-tree variables, l-stand variables, m-competition variables, n-site variables, o-plot location variables and p-dummies dummies (the species), respectively. Interactions among variables were also tested. The models resulting of different sets of predictors were compared according to their goodnessof-fit (adjusted R 2 ) and performance assessment using the evaluation data. The variables that were finally retained for model calibration are shown in Tables 1 and 2. While the OLS methodology has helped selecting the predictors, the linear mixed model framework was required to consider the hierarchical structure of the data. When retransforming the predicted lnBAI ij back from natural log scale, the logarithmic bias correction factor suggested by Snowdon (1991) was used. Among the various alterna-tive corrections proposed in the literature, this approach has proved to be the most efficient for this empirical circumstance. Snowdon's multiplicative correction factor, which represents an estimate of the proportional bias  Table 2. Summary statistics of fitting data. Number of trees (n), five-year periodic basal area increment (BAI, cm 2 ), initial DBH (DBH0, cm), crown ratio (CRATIO); dominant height (HDOM, m); basal area per hectare (BA, m 2 ha -1 ); basal area of trees larger than the subject tree (BALT, m 2 ha -1 ); ratio of subject tree species basal area to stand basal area (BASPratio); ratio of subject tree basal area to stand basal area (BASTratio) Forest Systems December 2020 • Volume 29 • Issue 3 • e019 in logarithmic regressions, consists of the ratio between the sample mean of BAI ij and the mean of the back-transformed values predicted from the model, say ̂ .
Model performance was assessed through cross-validation on the evaluation data computing the following statistics: the mean error (ME), the mean absolute error (MAE), the root mean squared error (RMSE), the mean percent deviation (MPD) and the mean percent standard error (MPSE) (Zeng, 2015). These were calculated on back-transformed and real scale (cm 2 ) BAI values, as follows: where ̂= −̂, are observed values, ̂ are values estimated with the model. The mean absolute error was also computed at plot level. For the 46 plots that had just evaluation sample trees, BAI predictions were obtained using the unconditional (population-level) intercept.
Regression of model residuals against explanatory variables is non-significant for any covariate as exemplified in Fig. S2 [suppl.], demonstrating a general good performance of the model, although residuals slightly increase with latitude. The model developed provides a good level of fit, as indicated by the McFadden's (1979) Pseudo-R 2 that equals to 0.708. In similar studies, values between 0.2 and 0.4 are already considered a good fit (Laubhann et al., 2009;Cienciala et al., 2016). Moreover, the variance of the random effect at plot level is significant and, according to the intra-class correlation coefficient (equal to 0.512), indicates that half of the variability of diameter growth depends on the plot. Inclusion of random effects for other predictors did not improve the model fit and might have led to overfitting (results not shown). Compared to the OLS regression, the mixed-effects model allowed reducing the mean absolute error of estimates in the sample plots by 64.5% in average, as shown in Table 4. Considering the tree validation data, the MAE for the species was reduced by 20.8% in average; average MAE by species decreased from 7.2% to 37.3% (data not shown).
The great majority of the species included in the model developed in this study characterizes a forest (sub-) category adopted by the Italian NFI (INFC, 2007); the results of the cross-validation, by species, are presented in Table 5. Fig. 1 shows the observed values and the residuals plotted against the predicted values, for the evaluation data. Fig. 2 shows the observed values plotted against the predicted ones, for six selected species, three conifers and three broadleaves, for the evaluation data. Fig. 3 shows, for each species, the BAI plotted against DBH0 for a hypothetical tree growing in a location pinpointed by longitude and latitude medians and under average conditions defined by the mean values of the remaining explanatory variables.

Discussion
To our knowledge, this is the first study that developed a single model for predicting the basal area increment for a so relevant number of species considered as dummy variables. The model is realistic in general, since the signs of the parameter estimates are mostly consistent with biological understanding. Unexpected signs observed for some species and certain parameters should be better evaluated under the awareness that individual drivers vary with species and site-specific realization of the driver of concern . Rohner et al. (2018) developed individual tree growth models for nine species and found that only few explanatory variables displayed generalized consistent patterns, while the others reflected properties of individual species. Models like our, i.e. developed  Table 3. Fixed parameters and the corresponding standard error estimate, degrees of freedom, t value and significance level. For the dummies, only the values significant for intercept variation are reported; interactions between the species and the quantitative variables are omitted for reasons of space (they are given in Table S1 [suppl.], which reports also the variance components) Acronyms -DBH0: initial tree breast height diameter; CRATIO: crown ratio; BALT: basal area of trees larger than the subject tree; sqrt: square root. for a variety of species, may still be helpful to depict the variables that play a foremost role in tree growth. Although the model is empiric and was developed for predictive purposes, its fit statistics are remarkable. The model explained 71% of the variance; in a study at the European level, variance explained ranged between 10%

MAE-OLS
(Quercus ilex L. and Quercus suber L.) and 53% (Other conifers group) (Shelhaas et al., 2018). In the results, the model developed in this study endorses the findings of similar studies. The calibration of the full mixed model for the plots with only evaluation data could have further improved the results. However, the number of trees  Table 5. Results of the cross-validation computed after back-transforming the predicted to BAI (cm 2 ). Number of evaluation observations (n) by species; mean error (ME, cm 2 ); mean absolute error (MAE, cm 2 ); root mean squared error (RMSE, cm 2 ); mean percent deviation (MPD); mean average percent standard error (MPSE) in these plots was limited for the number of parameters in the model, making the estimation of random effects for calibration unfeasible. Tree size is known to be the most significant predictor of tree's growth (Bevilacqua, 1999). Accordingly, even considering the measurement scales of the predictors, current DBH resulted to be the most important explanatory variable of our model. DBH, as a size factor, is crucial for future increment; Laubhann et al. (2009) found DBH to be one of the main factors for modelling the growth of four different species although these models differed notably in structure. In this study DBH was the only explanatory variable whose sign remained the same for all the species studied. Crown ratio proved to be directly correlated with increment for the great majority of species. Such a variable is a measure of foliage quantity and it is indicative of tree vigour (Wykoff, 1990); long and healthy crowns allow greater increments (Bueno & Bevilacqua, 2010). Competition was also crucial for predicting tree growth. Experience with tree-level modelling suggests that tree size and competition variables are dominant in explaining the observed tree growth patterns (Laubhann et al., 2009). Among the three competition indices tested and the basal area per hectare, also a density measure, the model retained only the BALT. BALT has been used in several studies (Jõgiste, 2000) and expresses the status of the subject tree with regard to the surrounding trees within the plot, particularly quantifying the competition effects from above (Zhang et al., 2004) being also considered a surrogate of competition for light (Adame et al., 2008). This study has evidenced that BALT is a decisive distance independet competition index, among others, in particular when developing multi-species models based on national forest inventory data. After tree size, slope was the variable that most met our expectations in terms of effects on growth, as it showed negative parameters for all species but one; this result is consistent with those in Rohner et al. (2018) for the nine species in their study. Relationship with latitude and altitude is primarily commented considering that, in general, trees grow faster in warmer climate conditions and slower moving to the north and or higher altitude. However, along with spatial variability in climate conditions, the wide distribution of plot locations also accounts for other factors affecting growth that display great variability at the country scale. Particularly worth of notice are soil properties (Weiskittel et al., 2011) since many species in this study grow either in the Alps or Apennines on soils that differ even in the parental materials.
The model presented covers a considerable number of tree species and has been developed using data statistically representative of the growing conditions all over the country. Thus the data range is remarkably wide. Latitude spans 10 degrees, from 36.9976 to 47.0449 degree; altitude ranges from sea level to timberline in either the Apennines (e.g. for beech) or the Alps (up to 2255 m a.s.l.) and all the species were sampled across their full altitudinal range. Latitude and altitude are two variables that correlate with climate (Schelhaas et al., 2018) but their role in the variability of expected growing condition is further increased under a bioclimatic perspective, considering that, due to its geographical position, Italy extends overs two biogeographic regions, the Mediterranean and the middle-European ones (Pignatti, 1995). The data also represent all possible options for the management and the silvicultural systems, such as high forests vs coppice, even-aged vs uneven-aged stands, etc. One of the major limitations of using the model is a consequence of the range of DBH values in the modelling data subset, especially for the species whose BAI always increases. Predictions for DBHs exceeding the range explored might be severely overestimated. However, within this range, predictions are unbiased as exemplified by Pinus pinaster Ait. and Quercus petraea Liebl. in Fig. 2. In this respect, it is important to stress that the model was developed for estimating the five-year periodic BAI and so it is appropriate for shortterm predictions, even for the species that show a desirable behaviour in Fig. 3. Bueno & Bevilacqua (2010) showed that adding consecutive annual increments, each predicted by the increment model in hand, to estimate growth over longer periods, is not recommendable and that a model is more valuable in estimating well the response variable it was developed for.
The model presented may serve a number of purposes, depending on the needs and the alternatives available. In our opinion, once a model has been made available, its usefulness is never thoroughly predictable. An example is the international reporting on the removals of CO2 for the Land Use, Land-Use Change and Forestry (LULUCF) sector for which the IPCC (2003) adopted three hierarchical tiers of methods to accommodate national circumstances that clearly reach different levels of accuracy. In this respect, the broad range of abiotic (e.g. climate), biotic (e.g. forest type) and management (e.g. silvicultural system) regimes captured might make the model suitable for use also in other countries, at least the Southern European. Besides use in future modelling studies, the most appropriate use of the model developed in this study is probably within the NFI, to estimate the growth of the trees not cored. However, the performance obtained for the plots only used for evaluation showed possible general use, e.g. for assessing growth in sub-regional networks of plots, especially if it is possible to record data for performing model calibration (e.g. Crecente-Campo et al., 2010). In fact, at present, the annual increment estimation is not mandatory in some Italian regions and/or forest types, especially those with limited economic importance (Gasparini et al., 2017) although they are still valuable for carbon accumulation in an ecosystem services perspective.
A model for predicting the basal area increment of major forest species in Southern Europe Figure 3. BAI plotted vs DBH0 for a hypothetical tree growing under average conditions and in a location pinpointed by the median of the latitude and longitude range for the species in the country. Average conditions were those marked by mean value of the explanatory variables.