We tested the performance of photometric redshifts for galaxies in the Hubble Ultra Deep Field down to 30 ^ { \mathrm { th } } magnitude . We compared photometric redshift estimates from three spectral fitting codes from the literature ( EAZY , BPZ and BEAGLE ) to high-quality redshifts for 1227 galaxies from the MUSE integral field spectrograph . All these codes can return photometric redshifts with bias \left| ( z _ { \mathrm { MUSE } } - pz ) / ( 1 + z _ { \mathrm { MUSE } } ) \right| < 0.05 down to \mathrm { m } _ { \mathrm { F 775 W } } = 30 , and spectroscopic incompleteness is unlikely to strongly modify this statement . We have , however , identified clear systematic biases in the determination of photometric redshifts : in the 0.4 < z < 1.5 range , photometric redshifts are systematically biased low by as much as ( z _ { \mathrm { MUSE } } - pz ) / ( 1 + z _ { \mathrm { MUSE } } ) = -0.04 in the median , and at z > 3 they are systematically biased high by up to ( z _ { \mathrm { MUSE } } - pz ) / ( 1 + z _ { \mathrm { MUSE } } ) = 0.05 , an offset that can in part be explained by adjusting the amount of intergalactic absorption applied . In agreement with previous studies , we find little difference in the performance of the different codes , but in contrast to those studies we find that adding extensive ground-based and IRAC photometry can actually worsen photo-z performance for faint galaxies . We find an outlier fraction , defined through \left| ( z _ { \mathrm { MUSE } } - pz ) / ( 1 + z _ { \mathrm { MUSE } } ) \right| > 0.15 , of 8 % for BPZ and 10 % for EAZY and BEAGLE , and show explicitly that this is a strong function of magnitude . While these outlier fractions are high relative to numbers presented in the literature for brighter galaxies , they are comparable to literature results when the depth of the data is taken into account . 
Finally , we demonstrate that while a redshift might be of high confidence , the association of a spectrum with the photometric object can be very uncertain and lead to a contamination of a few percent in spectroscopic training samples that does not show up as catastrophic outliers , a problem that must be tackled in order to have sufficiently accurate photometric redshifts for future cosmological surveys .
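
The bias and outlier statistics quoted above can be illustrated with a minimal Python sketch. It assumes arrays of spectroscopic and photometric redshifts are already matched object by object; the function names, the argument names, and the magnitude bin edges are hypothetical and are not taken from EAZY, BPZ, BEAGLE, or the analysis itself. Only the normalized residual (z_MUSE - pz) / (1 + z_MUSE) and the 0.15 outlier threshold follow the definitions in the text.

```python
import numpy as np


def normalized_residual(z_spec, z_phot):
    """Return (z_spec - z_phot) / (1 + z_spec) for matched redshift arrays."""
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    return (z_spec - z_phot) / (1.0 + z_spec)


def photoz_metrics(z_spec, z_phot, outlier_threshold=0.15):
    """Median normalized residual and fraction of objects beyond the threshold."""
    dz = normalized_residual(z_spec, z_phot)
    return {
        "median_bias": float(np.median(dz)),
        "outlier_fraction": float(np.mean(np.abs(dz) > outlier_threshold)),
    }


def outlier_fraction_by_magnitude(z_spec, z_phot, mag, bin_edges,
                                  outlier_threshold=0.15):
    """Outlier fraction in each magnitude bin; NaN where a bin is empty."""
    dz = normalized_residual(z_spec, z_phot)
    mag = np.asarray(mag, dtype=float)
    # np.digitize returns 1 for the first bin, so shift to 0-based indices.
    bin_index = np.digitize(mag, bin_edges) - 1
    fractions = []
    for i in range(len(bin_edges) - 1):
        in_bin = bin_index == i
        if in_bin.any():
            is_outlier = np.abs(dz[in_bin]) > outlier_threshold
            fractions.append(float(np.mean(is_outlier)))
        else:
            fractions.append(float("nan"))
    return fractions
```

Calling `photoz_metrics(z_muse, pz)` on a matched catalogue would return the median bias and the overall outlier fraction, while `outlier_fraction_by_magnitude` gives one way to look at the magnitude dependence; the choice of the median and of fixed magnitude bins here is an illustrative assumption.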