The photometer section of SPIRE is one of the key instruments on board Herschel .
Its legacy depends very much on how well the scanmap observations that it carried out during the Herschel mission can be converted to high quality maps .
To obtain a comprehensive assessment of the current status of SPIRE map-making , and to provide guidance for future development of the SPIRE scan-map data reduction pipeline , we carried out a test campaign on SPIRE map-making .
In this report , we present results of the tests in this campaign .
The goals are : ( 1 ) Compare the map-makers in the SPIRE pipeline with other mapmakers .
( 2 ) In particular , identify the strengths and limitations of different mapmakers in dealing with the known SPIRE map-making issues , such as the cooler burp effect .
( 3 ) Assess the resolution-enhancement capabilities of the super-resolution mappers , as compared to the destriper ( the pipeline default ) , and investigate their applicability to various kinds of data as well as caveats or pitfalls to avoid .
( 4 ) Enable users to choose the right map-maker for their science .
( 5 ) Provide guidance for future development of the SPIRE scan-map data reduction pipeline .

For these purposes , 13 test cases were generated , including data sets obtained in different observational modes and scan speeds , with different map sizes , source brightness , and levels of complexity of the extended emission .
They also include observations suffering from the “ cooler burp ” effect , and those having strong large-scale gradients in the background radiation .
The input data for these test cases are time-ordered data ( TODs ) .
In this report , TOD is used in the broad sense of a collection of samples containing time , flux density and position information .
The data were not formatted as a single HIPE Tod product , but rather consisted of many FITS files , one per scan .
Each file is known within HIPE as a Photometer Scan Product ( PSP ) and contains tables of the calibrated signal , right ascension and declination , with each row corresponding to a time sample and with separate columns for each bolometer .
The map-making process turns the TODs into maps .
Among the test cases , 8 are simulated and 5 are real observations .
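
As a rough illustration of what turning TODs into a map involves, the sketch below bins samples onto a pixel grid and averages them, which is the basic idea behind a naive mapper. It is a minimal sketch, not the SPIRE pipeline code: the function name `naive_map`, the use of pixel coordinates as input, and the omission of any baseline removal are all assumptions made for the example.

```python
import numpy as np

def naive_map(x_pix, y_pix, signal, shape):
    """Average all TOD samples that fall in each map pixel (hypothetical helper).

    x_pix, y_pix : sample positions already projected to pixel coordinates
    signal       : calibrated flux density of each sample
    shape        : (ny, nx) of the output map
    """
    ny, nx = shape
    ix = np.floor(x_pix).astype(int)
    iy = np.floor(y_pix).astype(int)
    # Discard samples that fall outside the map area.
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    flat = iy[inside] * nx + ix[inside]
    # Sum of signal and number of samples (hits) per pixel.
    total = np.bincount(flat, weights=signal[inside], minlength=ny * nx)
    hits = np.bincount(flat, minlength=ny * nx)
    # Pixels with no samples are left undefined (NaN).
    image = np.where(hits > 0, total / np.maximum(hits, 1), np.nan)
    return image.reshape(shape), hits.reshape(shape)
```

In practice the samples from all bolometers and all scans would be concatenated before binning, and a baseline would be removed from each scan first.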

Compared with real observations , a simulated test case has the advantage of possessing the “ truth ” , namely the sky model on which the simulation is based .
The truth map provides an unbiased standard against which test maps made by different map-makers are to be compared .
Allowing for the effects of noise in a given map , deviations from the truth can be used as objective measures for the bias introduced by the map-making process .
In the simulations , TODs were generated using two layers of data : a noise layer taken from real SPIRE observations of dark fields ( this allows the simulation to include both instrumental noise and confusion noise ) , and a truth layer which is a sky-model map based either on a real Spitzer 24 \mu m map or a map of artificial sources .
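
The two-layer construction can be sketched as follows. This is only an illustration under simplifying assumptions: the function name `simulate_tod` is hypothetical, the truth map is sampled at the nearest pixel (the actual simulations may have interpolated or applied the beam differently), and the noise layer is assumed to be supplied as a TOD of the same length.

```python
import numpy as np

def simulate_tod(truth_map, x_pix, y_pix, noise_tod):
    """Build a simulated TOD as truth layer + noise layer (hypothetical helper).

    truth_map : 2-D sky model (e.g. a map in Jy/beam)
    x_pix, y_pix : scan positions in pixel coordinates of truth_map
    noise_tod : noise samples, e.g. taken from a dark-field observation
    """
    ny, nx = truth_map.shape
    # Nearest-pixel sampling of the sky model (an assumption of this sketch).
    ix = np.clip(np.rint(x_pix).astype(int), 0, nx - 1)
    iy = np.clip(np.rint(y_pix).astype(int), 0, ny - 1)
    truth_layer = truth_map[iy, ix]
    return truth_layer + noise_tod
```

Because the noise layer comes from real data, the resulting TOD carries realistic instrumental and confusion noise while the underlying sky remains exactly known.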

Seven map-makers participated in this test campaign , including ( 1 ) Naive mapper ( default of the SPIRE standard pipeline until HIPE 8 ) ; ( 2 ) Destriper in two flavors : ( i ) Destriper-P0 : Destriper with polynomial-order = 0 ( default of SPIRE standard pipeline since HIPE 9 ) , and ( ii ) Destriper-P1 : Destriper with polynomial-order = 1 ; ( 3 ) Scanamorphos ; ( 4 ) SANEPIC ( GLS mapmaker ) ; ( 5 ) Unimap ( GLS mapmaker ) ; ( 6 ) HiRes ( super-resolution map-maker ) ; ( 7 ) SUPREME ( super-resolution map-maker ) .
Because of time constraints , not all map-makers processed all the test cases ( see Table 0.1 for details ) .
Table 0.1 : Test Cases Processed by Different Map-Makers
[Table Here]


Results of tests are presented in the framework of four sets of metrics :

( 1 ) Deviation from the truth .
These metrics include : ( i ) visual examinations of the difference map Map - Map _ { true } ; ( ii ) a scatter plot of ( S - S _ { true } ) vs S _ { true } for individual pixels ; ( iii ) slopes of these plots ; ( iv ) absolute deviations : mean and standard deviation of S - S _ { true } ; ( v ) relative deviations : mean and standard deviation of ( S - S _ { true } ) / S _ { true } .
They are applied to maps of 5 simulated test cases ( Cases 2 , 4 , 6 , 9 , 10 ) that are based on real MIPS 24 \mu m maps ( simulated cases based on artificial sources are excluded ) .
The results clearly demonstrate the applicability and limitation of individual map-makers .
Destriper-P0 produces the least deviations in most cases , but its maps show artificial stripes for the cases with “ cooler burp effect ” .
Scanamorphos , running with the “ galactic option ” and without the “ relative gain corrections ” , can minimize the “ cooler burp effect ” .
However , bright pixels in Scanamorphos maps display large deviations , likely due to a slight positional offset introduced by the mapper , and a slight change in the beam size .
Destriper-P1 , SANEPIC , and Unimap introduce different types of large spatial scale noise .
For SANEPIC , this is likely due to mismatches between the assumptions made in the map-maker and the properties of the test data .
For example , SANEPIC assumes that data are circulant , which is not true for Case 9 .
For Unimap , the large scale distortion in maps of Case 6 is triggered by the “ cooler burp effect ” , which the map-maker does not know how to handle .
For Naive-mapper ( with simple median background removal ) , many maps show large deviations due to the over-subtraction of the background when extended emission is present .
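
The deviation-from-truth statistics listed under items ( iii ) to ( v ) can be computed per pixel as in the sketch below. It is a minimal sketch rather than the code used in the campaign: the function name `deviation_metrics`, the dictionary keys, and the `min_truth` threshold (used here to avoid dividing by zero or by very faint truth values) are assumptions.

```python
import numpy as np

def deviation_metrics(test_map, truth_map, min_truth=0.0):
    """Per-pixel deviation statistics of a test map against the truth map.

    Only pixels that are finite in both maps and have S_true > min_truth
    are used, so that the relative deviation is well defined.
    """
    good = (np.isfinite(test_map) & np.isfinite(truth_map)
            & (truth_map > min_truth))
    s = test_map[good]
    s_true = truth_map[good]
    diff = s - s_true                      # S - S_true
    # Slope of the (S - S_true) vs S_true scatter plot, from a linear fit.
    slope, intercept = np.polyfit(s_true, diff, 1)
    rel = diff / s_true                    # (S - S_true) / S_true
    return {
        "slope": slope,
        "intercept": intercept,
        "mean_dev": diff.mean(),
        "std_dev": diff.std(),
        "mean_rel": rel.mean(),
        "std_rel": rel.std(),
    }
```

A non-zero slope indicates a flux-dependent bias (for example a gain-like error), whereas a non-zero mean deviation with zero slope points to an additive offset such as an over-subtracted background.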

( 2 ) Spatial ( 2-D ) power spectra .
These metrics include ( i ) plots and comparisons of power spectra of maps made by different map-makers ; ( ii ) for simulated cases , plots and comparisons of the divergence from the truth power spectrum of the maps by different map-makers .
Most of the power spectra , whether from real or simulated data , noise-only or with extended emission , show very similar results .
In the “ middle part ” ( k = [ 0.1 , 1 ] arcmin ^ { -1 } ) , results among different map-makers vary little : \sim 1 \% for cases where a truth map was available as benchmark .
At smaller scales ( k > 1 arcmin ^ { -1 } ) , the standard Naive mapper produces higher powers than other map-makers , presumably due to the fine-stripes ( baseline removal errors ) found in its maps .
Meanwhile , at the same scale , results of Destriper-P0 , Destriper-P1 , Unimap , and SANEPIC are always very close , and those of Scanamorphos are usually lower .
The low power at high spatial frequencies in Scanamorphos maps is likely due to the fact that , unlike other map-makers , Scanamorphos distributes the signal measured at a sky position among multiple adjacent map pixels .
This is equivalent to smoothing the map , which removes power at high spatial frequencies .
At larger scales ( k < 0.1 arcmin ^ { -1 } ) , again the Naive mapper produces higher powers because of the poor baseline removal , while the results of other map-makers are all comparable .
In the special cases with the “ cooler burp ” , the power spectra of Naive and Destriper-P0 maps are clearly affected , showing much higher power at k < 0.1 arcmin ^ { -1 } and a peak at k \sim 1.5 arcmin ^ { -1 } in the PLW map .
No significant effects due to the “ cooler burp ” are found in results of the other map-makers .
It should be noted that the Naive mapper and Destriper were not designed to treat the cooler burp ; earlier stages of the standard pipeline will do this in future HIPE versions .
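
For readers who want to reproduce this kind of comparison, an azimuthally averaged 2-D power spectrum can be sketched as below. This is an illustration only: the function name `radial_power_spectrum`, the linear binning in k, and the normalization are assumptions and not necessarily those used for the results above. The input image must not contain NaN pixels.

```python
import numpy as np

def radial_power_spectrum(image, pixel_arcmin, n_bins=20):
    """Azimuthally averaged power spectrum of a 2-D map.

    image        : 2-D array without NaNs
    pixel_arcmin : pixel size in arcmin, so k is in arcmin^-1
    Returns (bin centers in k, mean power per bin).
    """
    ny, nx = image.shape
    # Remove the mean so the k = 0 mode does not dominate the first bin.
    power = np.abs(np.fft.fft2(image - np.mean(image))) ** 2 / (nx * ny)
    kx = np.fft.fftfreq(nx, d=pixel_arcmin)
    ky = np.fft.fftfreq(ny, d=pixel_arcmin)
    k = np.hypot(kx[np.newaxis, :], ky[:, np.newaxis])
    # Linear bins in |k| from 0 to the largest frequency on the grid.
    edges = np.linspace(0.0, k.max(), n_bins + 1)
    which = np.clip(np.digitize(k.ravel(), edges) - 1, 0, n_bins - 1)
    total = np.bincount(which, weights=power.ravel(), minlength=n_bins)
    count = np.bincount(which, minlength=n_bins)
    spectrum = np.where(count > 0, total / np.maximum(count, 1), np.nan)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, spectrum
```

For a simulated case, the divergence from the truth can then be examined by dividing the spectrum of each test map by the spectrum of the truth map computed with the same binning.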

( 3 ) Point source and extended source photometry .
These metrics include ( i ) astrometry of point sources ; ( ii ) point source and extended source photometry ; ( iii ) detection rates of faint point sources , obtained using Starfinder ( a point source extractor ) ; ( iv ) PSF profiles .
They are applied to the simulated test cases with artificial sources ( Cases 1 , 5 and 8 ) .
The results show that bright sources in maps made by Scanamorphos have systematically larger position errors ( \gtrsim 0.1 pixel ) than those in maps made by other map-makers , consistent with the results on position offsets in Scanamorphos maps found in Metrics ( 1 ) for the deviation from the truth .
Photometry for bright point sources in all maps has small errors , indicating good energy conservation by all map-makers .
On the other hand , photometry of extended sources in Naive-mapper maps is significantly affected by a known bias due to the over-subtraction of baselines , while other maps have no such issue .
For faint point sources ( f = 30 mJy ) , no significant difference is found among the results for different map-makers in either detection rate or photometry .
Also , there is no significant difference between beam profiles of sources in maps made by different map-makers .
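
A position error of the kind discussed above can be measured, in its simplest form, by comparing a source centroid with the known input position. The sketch below uses a flux-weighted centroid on a small cutout; this is an assumption chosen for brevity and is not the Starfinder-based procedure used in the tests. The helper names `gaussian_source`, `centroid`, and `position_offset` are hypothetical.

```python
import numpy as np

def gaussian_source(shape, x0, y0, fwhm_pix, peak=1.0):
    """Artificial Gaussian point source centered at (x0, y0), in pixels."""
    yy, xx = np.indices(shape)
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    return peak * np.exp(-r2 / (2.0 * sigma ** 2))

def centroid(cutout):
    """Flux-weighted centroid (x, y) of a background-subtracted cutout."""
    yy, xx = np.indices(cutout.shape)
    weights = np.clip(cutout, 0.0, None)   # ignore negative noise pixels
    total = weights.sum()
    return (xx * weights).sum() / total, (yy * weights).sum() / total

def position_offset(cutout, x_true, y_true):
    """Distance in pixels between the measured centroid and the truth."""
    cx, cy = centroid(cutout)
    return np.hypot(cx - x_true, cy - y_true)
```

For example, a noiseless source placed 0.1 pixel away from its nominal position yields an offset close to 0.1 pixel, comparable to the systematic shift reported for bright sources in Scanamorphos maps.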

( 4 ) Metrics for super-resolution maps .
These metrics are applied to maps made by HiRes and SUPREME , the two super-resolution mappers , and compare them to maps made by the destriper ( the pipeline default ) .
They include : ( i ) visual examinations of the maps ; ( ii ) spatial power spectra ; ( iii ) point source profiles .
The results show that SUPREME and HiRes yield similar resolution enhancements ( factors of 2-3 ) at spatial frequencies around 2 arcmin ^ { -1 } for the limited datasets tested at 250 microns .
At higher spatial frequencies corresponding to spatial scales smaller than the beam size , there is less power in the SUPREME maps ( intentionally , to smooth and reduce the noise at scales smaller than the beam ) .
HiRes maps contain more power than either SUPREME or Destriper-P0 maps at spatial scales between 15 and 20 arcseconds .
The differences between SUPREME and HiRes arise mainly because SUPREME is tuned to enhance extended emission features , whereas HiRes is essentially performing a deconvolution in image space .

Summary of Results :

• The Destriper with polynomial order of 0 ( Destriper-P0 ) , which is the default map-maker in the SPIRE scanmap pipeline since HIPE 9 , performed remarkably well and compared favorably among all map-makers in all test cases except for those suffering from the “ cooler burp ” effect , as it does not have a mechanism to deal with this effect .
In particular , it handles observations with complex extended emission structures and with large-scale background gradients very well .

• In contrast , the Destriper with a polynomial order of 1 ( Destriper-P1 ) compared poorly with its peers , introducing significant artificial large-scale gradients in many cases .

• Scanamorphos showed noticeable differences in all comparisons .
On the positive side , its maps have the smallest deviation from the truth for faint pixels ( f < 0.2 Jy ) in nearly all cases .
Particularly , as shown in both the difference maps and in the power-spectra , it can handle the “ cooler burp ” effect very well .
On the negative side , for bright pixels ( f > 0.2 Jy ) , its maps show significant deviations from the truth , likely due to a slight positional offset introduced by the mapper as well as a slight change in the beam size .
This effect is also seen in the astrometric errors of the bright sources .
However , the offset is very small ( \sim 0.1 pixel ) ; it therefore affects the photometry of neither point sources nor extended sources , and does not show up in the comparison between beam profiles ( resolution : 0.2 pixels ) .
The power spectrum analysis indicates some smoothing of the data compared to the other mapmakers .

• The GLS mapper SANEPIC can also minimize the “ cooler burp ” effect .
It performed quite well in most cases .
However , for those cases with strong variations on very large scales ( i.e. , comparable to the map size ) , its maps show significant deviations from the truth .
This is because some of its assumptions ( e.g. , that TODs are circulant ) are invalid for the data .

• Unimap , another participating GLS mapper , is among the best performers in most cases .
However , because it does not include a mechanism for handling the “ cooler burp ” , its maps show significant deviations from the truth in the cases affected by the artifact .

• The Naive-mapper ( with simple median background removal ) generally performs worse than its peers .
The most severe bias it introduces is the over-subtraction of the background when extended emission is present .
In cases where the extended emission has a complex structure , this bias cannot be avoided by simple masks in the background removal .

• The two super-resolution mapmakers , SUPREME and HiRes , yield similar resolution enhancements ( factors of 2-3 ) at spatial frequencies around 2 arcmin ^ { -1 } for the limited datasets tested at 250 microns .
At higher spatial frequencies corresponding to spatial scales smaller than the beam size , there is less power in the SUPREME maps ( intentionally , to smooth and reduce the noise at scales smaller than the beam ) .
HiRes maps contain more power than either SUPREME or Destriper-P0 maps at spatial scales between 15 and 20 arcseconds .
The differences between SUPREME and HiRes arise mainly because SUPREME is tuned to enhance extended emission features , whereas HiRes is essentially performing a deconvolution in image space .
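
The over-subtraction bias attributed to the Naive mapper in the summary can be illustrated with a toy one-dimensional scan. This is a minimal sketch under strong assumptions: a noiseless scan, a true background of exactly zero, and a baseline estimated as the plain median of the whole scan; the function name `median_baseline_removed` is hypothetical.

```python
import numpy as np

def median_baseline_removed(scan):
    """Subtract the median of a scan as a simple baseline estimate."""
    return scan - np.median(scan)

# Scan dominated by extended emission: 60 of 100 samples sit on a
# 1 Jy plateau, so the median is pulled up to the emission level.
extended = np.zeros(100)
extended[20:80] = 1.0
extended_out = median_baseline_removed(extended)

# Scan with a single bright sample (point-like source): the median
# stays at the true background level and the source is preserved.
point = np.zeros(100)
point[50] = 1.0
point_out = median_baseline_removed(point)
```

In the extended case the plateau is removed entirely and the true background becomes negative, whereas the point-like source is unaffected. Masking the emission before taking the median helps only when enough emission-free samples remain in each scan, which is not the case for complex extended structure.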