We present the evolution of the stellar mass function ( SMF ) of galaxies from z = 4.0 to z = 1.3 measured from a sample constructed from the deep NIR MUSYC , the FIRES , and the GOODS-CDFS surveys , all having very high-quality optical to mid-infrared data . This sample , unique in that it combines data from surveys with a large range of depths and areas in a self-consistent way , allowed us to 1 ) minimize the uncertainty due to cosmic variance and empirically quantify its contribution to the total error budget ; 2 ) simultaneously probe the high-mass end and the low-mass end ( down to \sim 0.05 times the characteristic stellar mass ) of the SMF with good statistics ; and 3 ) empirically derive the redshift-dependent completeness limits in stellar mass . We provide , for the first time , a comprehensive analysis of random and systematic uncertainties affecting the derived SMFs , including the effect of metallicity , extinction law , stellar population synthesis model , and initial mass function . We find that the mass density evolves by a factor of \sim 17 ^ { +7 } _ { -10 } since z = 4.0 , mostly driven by a change in the normalization \Phi ^ { \star } . If only random errors are taken into account , we find evidence for mass-dependent evolution , with the low-mass end evolving more rapidly than the high-mass end . However , we show that this result is no longer robust when systematic uncertainties due to the SED-modeling assumptions are taken into account . Another significant uncertainty is the contribution to the overall stellar mass density of galaxies below our mass limit ; future studies with WFC3 will provide better constraints on the SMF at masses below 10 ^ { 10 } M _ { \sun } at z > 2 . Taking our results at face value , we find that they are in conflict with semi-analytic models of galaxy formation . The models predict SMFs that are in general too steep , with too many low-mass galaxies and too few high-mass galaxies . The discrepancy at the high-mass end is susceptible to uncertainties in the models and the data , but the discrepancy at the low-mass end may be more difficult to explain .