Proceedings of the 2nd International conference on Electronics, Communication and Aerospace Technology (ICECA 2018) IEEE Conference Record # 42487; IEEE Xplore ISBN:978-1-5386-0965-1

Geographical Area Mapping and Classification Utilizing Multispectral Satellite Imagery Processing Based On Machine Learning Algorithms
Classifying Land based on its use for different purposes

Apurva Saksena, Anushka Ringshia, Arnnava Sharma, Aparna Halbe
saksena.apurva@gmail.com, anushkaringshia@gmail.com, arnnava31596@gmail.com, aparna_halbe@spit.ac.in
Sardar Patel Institute of Technology, Andheri, Mumbai

Abstract: Geographically, a city is characterized as a patchwork of intensive land-uses. Land-use is the rational and judicious approach of allocating available land resources to different activities (such as settlements, arable fields, pastures, and managed woods) within a city. It is a way of utilizing the land, including the allocation, planning, and management of its resources. The use of a particular patch of land and its physical character are linked. However, research that establishes this link is lacking despite the proliferation of geospatial data. Linking a city's physical form with its function is the goal of this paper.

Keywords: Land-cover; land-use; sustainability; development; remote sensing; mapping

I. INTRODUCTION
No global city can be understood without reference to its spatial forms, such as commercial, residential, industrial, marsh/lake and defence areas [1]. Hence, our goal is to use remote sensing to analyze environmental variables such as vegetation, impervious surfaces and soil, to encode them into numerical categories and classify them, and finally to link a city's physical form with its functions. The aim is to ensure the highest and best use of land resources by promoting more efficient utilization, acquisition, and disposition of land [2].

Land cover: Land cover is the physical material at the surface of the earth. It comprises vegetation and resources like grass, asphalt, trees, bare ground, water, etc. Land cover data document how much of a region is covered by forests, wetlands, impervious surfaces, agriculture, and other land and water types such as open water [3]. The land cover can be determined by analyzing satellite and aerial imagery. Identification, delineation and mapping of this land cover establish the baseline from which global monitoring activities such as change detection, further studies, resource management, and planning can take place. The land cover also provides the ground cover information for baseline thematic maps [4].

Land use: Land use is the set of functions that people apply to the land available to them. Its practices have a significant impact on natural resources such as soil, water and vegetation. Deforestation of the temperate regions, urban sprawl, soil erosion or degradation, salinization, and desertification are some of the major effects of land use on land cover. Therefore, quantifying these land resources and mapping them to measure current conditions and how they are changing is critical [5].

There are several types of land use, namely:
● Residential - includes housing areas
● Commercial - for businesses and factories
● Defence
● Recreational - comprising fun and non-essential uses like gardens, parks and tourism
● Transit - roads and highways, railways, airports and even seaways
● Agricultural - arable farmlands and pastures
● Managed woods
● Mining refineries - for coal, petroleum, electricity generation and more

Land use shows how people use the landscape, whether for development, conservation, or mixed uses. The different types of land cover can be managed or used quite differently [6].

Fig.1.
Overview of Land cover to Land Use (Images show Landsat imagery, Overlay point grid, Image interpretation and Land use to Land cover maps)

978-1-5386-0965-1/18/$31.00 ©2018 IEEE

II. METHODOLOGY
The uniqueness of land-use to particular locations can be exploited by combining human expertise with the advantages offered by machine learning algorithms. The assumption behind this method is that land-use can be modelled in terms of three environmental variables: vegetation, impervious surfaces and soil (VIS). Urban ecosystems are a composite of these three variables and can therefore be observed, quantified and measured from satellite images [7]. In this project, the morphology of impervious surfaces is measured and characterized per land-use category. Impervious and pervious (vegetation and soil) surfaces can be encoded into numerical categories and classified using machine learning algorithms. The VIS can be modelled by taking advantage of the linear correlation of impervious and pervious surfaces in very high resolution (0.5 m x 0.5 m pixels) and medium resolution (30 m x 30 m pixels) satellite images. Impervious surfaces can then be further characterized according to their morphology within arbitrarily defined land-use boundaries and classified into land-use categories [8].

A. Pre-processed Satellite Images
A Landsat image is a satellite image acquired under the Landsat program, which was started primarily to build a global archive of satellite images [9]. WorldView-2 is a commercial Earth observation satellite. It provides commercially available panchromatic imagery of 0.46 m (18 in) resolution and eight-band multispectral imagery of 1.84 m (72 in) resolution [4]. These images are then preprocessed to give an intensity image.
Intensity images are required to prevent unwanted distortions. The WorldView-2 image has a finer resolution and hence it is used for land-cover classification as well [10]. The preprocessed image is classified using the Support-Vector-Machine algorithm in the Spatial Analyst toolbox of the ArcGIS software [11]. The land-cover map is thus obtained, which will eventually lead to the land-use map.

B. Computing the NDVI index
The normalized difference vegetation index (NDVI) is a simple graphical indicator that can be used to analyze remote sensing measurements and assess whether the target being observed contains live green vegetation [12]. Live green plants absorb solar radiation in the photosynthetically active radiation (PAR) spectral region, which they use as a source of energy in the process of photosynthesis. Leaf cells have also evolved to re-emit solar radiation in the near-infrared spectral region, because the photon energy at wavelengths longer than about 700 nanometers is not large enough to synthesize organic molecules. The pigment in plant leaves, chlorophyll, strongly absorbs visible light (from 0.4 to 0.7 µm) for use in photosynthesis. The cell structure of the leaves, on the other hand, strongly reflects near-infrared light (from 0.7 to 1.1 µm). The more leaves a plant has, the more these wavelengths of light are affected. It is thus possible to exploit the strong differences in plant reflectance to determine their spatial distribution in these satellite images [13]. The index is computed as

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$

where Red and NIR stand for the spectral reflectance measurements acquired in the red (visible) and near-infrared regions, respectively. Using this index, a certain area on the map can be concluded to be suitable for agriculture [14].

Fig.2. General Flowchart

Negative values of NDVI (values approaching -1) correspond to water. Values close to zero (-0.1 to 0.1) generally correspond to barren areas of rock, sand, or snow.
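A minimal sketch of the per-pixel index computation described above, assuming the standard definition NDVI = (NIR - Red) / (NIR + Red) applied to co-registered red and near-infrared reflectance bands. The function names `ndvi` and `ndvi_map`, the sample reflectance values, and the choice to return 0 when both bands are zero are illustrative assumptions; they are not part of the ArcGIS workflow used in this paper.

```python
from typing import List


def ndvi(red: float, nir: float) -> float:
    """Return (NIR - Red) / (NIR + Red) for a single pixel."""
    total = nir + red
    if total == 0:
        # Assumption: a pixel with no signal in either band is treated as neutral.
        return 0.0
    return (nir - red) / total


def ndvi_map(red_band: List[List[float]], nir_band: List[List[float]]) -> List[List[float]]:
    """Apply ndvi() pixel by pixel to two bands of identical shape."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Hypothetical 2x2 reflectance values, chosen only for illustration.
red_band = [[0.10, 0.30], [0.05, 0.00]]
nir_band = [[0.50, 0.30], [0.02, 0.00]]

index_map = ndvi_map(red_band, nir_band)
for row in index_map:
    print([round(value, 2) for value in row])
```

In this toy input, the first pixel reflects far more near-infrared than red light and gets a clearly positive index, while the third pixel reflects more red than near-infrared and gets a negative one.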
Lastly, low positive values represent shrub and grassland (approximately 0.2 to 0.4), while high values indicate temperate and tropical rainforests (values approaching 1) [15].

Fig.3. The green regions indicate areas with an NDVI between 0 and 1, whereas the blue regions have an NDVI between -1 and 0

In this way, the NDVI for both the Landsat and WorldView-2 images is computed. We hence obtain an intensity map, a pixelized version of the map showing areas with different NDVI values. The difference between the Landsat and WorldView-2 images is one of resolution: WorldView-2 has a finer resolution of about 0.5 m, while Landsat images provide a resolution of 30 m.

C. Image Masking
ArcGIS software is used for image masking. The Image Masking tool takes an input image, masks it, and produces a new image that is a copy of the input image, except that its pixel intensity values are set to zero (or some other chosen background intensity value) according to the mask and the masking operations performed [16]. In the related sense of a convolution mask (kernel), masking can also be used for sharpening, blurring, embossing or edge detection. Here the input is the satellite image and, after masking, we should get a clear intensity diagram [17].

D. Encoding into numerical categories
Impervious and pervious (vegetation and soil) surfaces can be encoded into numerical categories and classified using machine learning algorithms. Urban ecosystems are an agglomeration of these three fundamental variables (V-I-S) and can therefore be observed, quantified and measured from satellite images. The basic idea is to encode the surfaces into numerical categories, to which machine learning algorithms can then be applied for classification [18].

Fig.4. Vegetation-impervious surface-soil model

E. Classifying using Support-Vector-Machine Algorithm
The Support-Vector-Machine algorithm in the Spatial Analyst toolbox of ArcGIS software version 10.3.1 is used for classification [11]. The working of any machine learning algorithm depends on the training data set; the initial input from which a machine learns is crucial to its success. We classify land into cloud, water, white roofs, blue roofs, green roofs, dark green roofs, light red roofs, brown roofs, light grey surface, dark grey surface, bare soil, grassland, tree canopy, shadow and farmland [19]. Whenever a satellite image is input, it is classified into these categories, as shown in Fig.6 [20].

Fig.5. Preliminary 2015 Land-Cover Map of a city
Fig.6. Land Cover Classes

F. Land Use map
The final output displaying the land-use is obtained by masking the land-cover map into five categories: agricultural, commercial, industrial, mixed-use and residential. The respective colour codes are green, red, violet, ochre and yellow [21]. By displaying the land-use map, the utility of various areas is revealed and a rough estimate of the soil condition is obtained. This method is found to be fairly accurate; its accuracy is quantified in Section III. By learning from the training data set, all input satellite images can be effectively converted to land-use maps [22].

Fig.7. Land-Use map output for the same city
Fig.8. Land Use category

III. OBSERVATION AND RESULTS
Here we have used pre-processed Landsat and WorldView-2 images. Following the methodology given above, we observe that a Land-Cover map can be obtained from a raw satellite image using the Support-Vector-Machine algorithm [10]. We have classified land based on its cover, shown as blue, yellow and green covers, but the main aim of this paper is to obtain a Land-Use map [9]. For this, we have used image masking on the Land-Cover map to derive the categories agricultural, commercial, industrial, mixed-use and residential, represented by the colours green, red, violet, ochre and yellow respectively. We can thus observe how a raw satellite image is converted to a Land-Use map, which can help us classify land according to its use [12].

A. Land-Use Classification Accuracy
TABLE 1. Fig.9. Accuracy report 1

After performing the Land-Use classification, we need to check its accuracy to see whether it is viable. The sum of the values in a row gives the total number of retrieved instances for that class. The sum of the values in a column gives the number of relevant instances. For example, the total number of retrieved instances for commercial is 4 and the number of relevant instances is just 2. However, the number of correctly retrieved relevant instances is just 1: of the 2 relevant commercial instances, only one was correctly detected [14].

TABLE 2. Fig.10. Accuracy report 2

Precision is the fraction of correctly retrieved relevant instances among the total number of retrieved instances. Recall is the fraction of correctly retrieved relevant instances among the total number of relevant instances. Support is the total number of relevant instances. The F1 score is a measure of a test's accuracy that considers both the precision and the recall of the test [20]:

$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

For example, the precision for Commercial is 1/4 = 0.25, as the number of correctly retrieved relevant instances is 1 and the total number of retrieved instances is 4. The recall for Commercial is 1/2 = 0.50, as the number of correctly retrieved relevant instances is 1 and the number of relevant instances is 2. Using the above formula, the F1 score for Commercial is 2(0.25)(0.50)/(0.25 + 0.50) ≈ 0.33. Support is 2, as the number of relevant instances is 2. Similarly, we do the same for the remaining classes and find the average precision, recall and F1 score, and the support, which are 0.71, 0.60, 0.62 and 20 respectively. This tells us that the Land-Use map obtained by masking the Land-Cover map gives fairly accurate results, which will help us correctly classify land based on its use [23].

IV. FUTURE WORK
The accuracy of the Land-Use map classification should ideally be around 90 percent to get near-perfect results. Therefore, more efficient and accurate classification algorithms need to be developed in order to increase this accuracy. Also, better image processing software could be developed and used in order to correctly capture and process a geospatial or hyperspectral image.

V. CONCLUSION
The ways in which municipalities, states, and nations plan the physical arrangement or land use of our communities are critical to sustainability. The land use patterns shared by cities across the world have given rise to complex problems created by urban sprawl that are faced by all: growing traffic congestion and lengthening commute times, air pollution, wasteful energy consumption and greater reliance on petroleum, elimination of open space and wildlife habitat, unfair distribution of economic assets, and the loss of community consciousness. Increasing the sustainability of communities will require a shift from poorly managed sprawl to land use planning that can build and maintain efficient infrastructure, encourage close-knit neighborhoods and community consciousness, and preserve the environment.

VI. ACKNOWLEDGMENT
The authors gratefully acknowledge the contributions of the entire I.T. department and Sardar Patel Institute of Technology along with its staff for their work.

REFERENCES
[1] O. Arino, P. Bicheron, F. Achard, J. Latham, R. Witt, J.-L. Weber, "GLOBCOVER: The most detailed portrait of Earth", European Space Agency ESA Bulletin, pp. 24-31, 2008.
[2] M. C. Hansen, T. R. Loveland, "A review of large area monitoring of land cover change using Landsat data", Remote Sens. Environ., 2012.
[3] Yang, Yi, and Shawn Newsam. "Geographic image retrieval using local invariant features." IEEE Transactions on Geoscience and Remote Sensing 51.2 (2013): 818-832.
[4] Comber, A., Fisher, P., & Wadsworth, R. (2005). What is land cover? Environment and Planning B: Planning and Design, 32(2), 199-209. https://doi.org/10.1068/b31135
[5] Cihlar, J., & Jansen, L. (2001). From Land Cover to Land Use: A Methodology for Efficient Land Use Mapping over Large Areas. The Professional Geographer, 53(2), 275-289. https://doi.org/10.1111/0033-0124.00285
[6] D. Hoiem, A. A. Efros, M. Hebert, "Recovering surface layout from an image", Int. J. Comput. Vis., vol. 75, no. 1, pp. 151-172, Oct. 2007.
[7] J. Feranec, G. Hazeu, S. Christensen, G. Jaffrain, "Corine land cover change detection in Europe (case studies of the Netherlands and Slovakia)", Land Use Policy, vol. 24, pp. 234-247, 2007.
[8] Ridd, M. K. (1995). Exploring a V-I-S (vegetation-impervious surface-soil) model for urban ecosystem analysis through remote sensing: comparative anatomy for cities. International Journal of Remote Sensing, 16(12), 2165-2185. https://doi.org/10.1080/01431169508954549
[9] J. Knorn, A. Rabe, V. C. Radeloff, T. Kuemmerle, J. Kozak, P. Hostert, "Land cover mapping of large areas using chain classification of neighboring Landsat satellite images", Remote Sens. Environ., vol. 113, pp. 957-964, 2009.
[10] W. B. Cohen, S. N. Goward, "Landsat's role in ecological applications of remote sensing", Bioscience, vol. 54, pp. 535-545, 2004.
[11] Griffiths, Patrick, et al. "A pixel-based Landsat compositing algorithm for large area land cover mapping." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6.5 (2013): 2088-2101.
[12] M. Baumann, M. Ozdogan, T. Kuemmerle, K. J. Wendland, E. Esipova, V. C. Radeloff, "Using the Landsat record to detect forest-cover changes during and after the collapse of the Soviet Union in the temperate zone of European Russia", Remote Sens. Environ., vol. 124, pp. 174-184, 2012.
[13] J. C.-W. Chan, D. Paelinckx, "Evaluation of random forest and adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery", Remote Sens. Environ., vol. 112, no. 6, pp. 2999-3011, Jun. 2008.
[14] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, New York: Springer-Verlag, 2009.
[15] C. E. Woodcock, S. A. Macomber, M. Pax-Lenney, W. B. Cohen, "Monitoring large areas for forest change using Landsat: Generalization across space, time and Landsat sensors", Remote Sens. Environ., vol. 78, pp. 194-203, 2001.
[16] Senthilnath, J., et al. "Hierarchical clustering algorithm for land cover mapping using satellite images." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5.3 (2012): 762-768.
[17] S. Z. Li, Markov Random Field Modeling in Image Analysis, New York: Springer-Verlag, 2009.
[18] C. Tomasi, R. Manduchi, "Bilateral filtering for gray and color images", Proc. 6th Int. Conf. Computer Vis., pp. 839-846, 1998.
[19] T. M. Lillesand, R. W. Kiefer, J. W. Chipman, Remote Sensing and Image Interpretation, Hoboken, NJ: Wiley, 2003.
[20] Schindler, Konrad. "An overview and comparison of smooth labeling methods for land-cover classification." IEEE Transactions on Geoscience and Remote Sensing 50.11 (2012): 4534-4545.
[21] P. Coppin, I. Jonckheere, K. Nackaerts, B. Muys, E. Lambin, "Digital change detection methods in ecosystem monitoring: A review", Int. J. Remote Sens., vol. 25, pp. 1565-1596, 2004.
[22] B. Sirmacek, C. Unsalan, "Urban-area and building detection using SIFT keypoints and graph theory", IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1156-1167, Apr. 2009.
[23] Storper, M., & Scott, A. J. (2016). Current debates in urban theory: A critical assessment. Urban Studies, 53(6), 1114-1136. https://doi.org/10.1177/0042098016634002
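As a supplementary sketch for the accuracy figures in Section III-A, the per-class scores can be recomputed from three counts: correctly retrieved instances, total retrieved instances, and total relevant instances. The example uses the Commercial counts quoted in the text (1, 4 and 2) and assumes the standard definition of the F1 score as the harmonic mean of precision and recall. The function name `precision_recall_f1` and its parameter names are hypothetical; the paper's reports themselves were not produced with this code.

```python
from typing import Tuple


def precision_recall_f1(correct: int, retrieved: int, relevant: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for one class from raw counts."""
    # Precision: correct detections out of everything labelled as this class.
    precision = correct / retrieved if retrieved else 0.0
    # Recall: correct detections out of everything that truly is this class.
    recall = correct / relevant if relevant else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Commercial class as described in the text:
# 1 correct detection, 4 retrieved instances, 2 relevant instances.
precision, recall, f1 = precision_recall_f1(correct=1, retrieved=4, relevant=2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} support=2")
# prints "precision=0.25 recall=0.50 f1=0.33 support=2"
```

Working from counts rather than from a full confusion matrix keeps the sketch independent of how the matrix rows and columns are laid out in the accuracy report.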