We compare state-of-the-art semi-analytic models of galaxy formation, as well as advanced sub-halo abundance matching models, with a large sample of early-type galaxies from SDSS at $z < 0.3$. We focus on the dependence of the median sizes of central galaxies on host halo mass. At fixed stellar mass, the data show no difference in the structural properties of early-type galaxies with environment. All hierarchical models considered in this work instead tend to predict a moderate to strong environmental dependence, with the median size increasing by a factor of $\sim 1.5$-$3$ when moving from low-mass to high-mass host haloes. At face value the discrepancy with the data is highly significant, especially at the cluster scale, for haloes with $\log M_{\rm halo} \gtrsim 14$. Convolving the model predictions with (correlated) observational errors reduces some of the tension. Despite the observational uncertainties, the data tend to disfavour hierarchical models characterized by a significant contribution of disc instabilities to the formation of spheroids, strong gas dissipation in (major) mergers, short dynamical friction timescales, and very short quenching timescales in infalling satellites. We also discuss a variety of related issues, such as the slope and scatter of the local size-stellar mass relation, the gas fraction in local early-type galaxies, and the general predictions for satellite galaxies.