We present 3D "zoom-in" simulations of the formation of two molecular clouds out of the galactic interstellar medium. We model the clouds, identified from the SILCC simulations, with a resolution of up to 0.06 pc using adaptive mesh refinement in combination with a chemical network to follow heating, cooling, and the formation of H$_2$ and CO including (self-)shielding. The two clouds are assembled within a few million years with mass growth rates of up to $\sim 10^{-2}$ M$_{\sun}$ yr$^{-1}$ and final masses of $\sim 50\,000$ M$_{\sun}$. A spatial resolution of $\lesssim 0.1$ pc is required for convergence with respect to the mass, velocity dispersion, and chemical abundances of the clouds, although these properties also depend on how the cloud is defined, e.g. by a density threshold or by the H$_2$ or CO mass fraction. To avoid grid artefacts, the resolution has to be increased progressively within the free-fall time of the densest structures (1–1.5 Myr), and $\gtrsim 200$ time steps should be spent on each refinement level before the resolution is increased further. This avoids the formation of spurious, large-scale, rotating clumps from unresolved turbulent flows. While CO is a good tracer for the evolution of dense gas with number densities $n \geq 300$ cm$^{-3}$, H$_2$ is also found for $n \lesssim 30$ cm$^{-3}$ due to turbulent mixing and becomes dominant at column densities around 30–50 M$_{\sun}$ pc$^{-2}$. The CO-to-H$_2$ ratio steadily increases within the first 2 Myr, whereas $X_{\mathrm{CO}} \simeq 1$–$4 \times 10^{20}$ cm$^{-2}$ (K km s$^{-1}$)$^{-1}$ is approximately constant since the CO(1-0) line quickly becomes optically thick.
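As a rough illustration of the free-fall time scale quoted above, the following minimal sketch evaluates the standard uniform-sphere expression $t_\mathrm{ff} = \sqrt{3\pi / (32 G \rho)}$ for a given number density. The function name `free_fall_time_myr`, the mean molecular weight `mu = 2.35`, and the reading of $n$ as a total particle number density are assumptions made for this example only; they are not taken from the simulations.

```python
import math

# Physical constants in CGS units
G_CGS = 6.674e-8        # gravitational constant [cm^3 g^-1 s^-2]
M_PROTON = 1.6726e-24   # proton mass [g]
SEC_PER_MYR = 3.156e13  # seconds in one million years


def free_fall_time_myr(n_cm3, mu=2.35):
    """Free-fall time in Myr of a uniform sphere with number density n_cm3.

    mu is an assumed mean molecular weight per particle (hypothetical
    choice for mostly molecular gas); it is not a value from the paper.
    """
    rho = mu * M_PROTON * n_cm3  # mass density [g cm^-3]
    t_ff_sec = math.sqrt(3.0 * math.pi / (32.0 * G_CGS * rho))
    return t_ff_sec / SEC_PER_MYR


if __name__ == "__main__":
    # Illustrative densities only; the source does not state which
    # density defines the "densest structures" at each refinement stage.
    for n in (300.0, 500.0, 1000.0):
        print(f"n = {n:7.1f} cm^-3 -> t_ff = {free_fall_time_myr(n):.2f} Myr")
```

Under these assumptions, number densities of roughly 500 to 1000 cm$^{-3}$ correspond to free-fall times of about 1.5 to 1 Myr, the same order as the 1–1.5 Myr window quoted for the refinement schedule. Since $t_\mathrm{ff} \propto n^{-1/2}$, a different choice of `mu` or density definition shifts these numbers only moderately.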