# How to find the distance between start of line and start of raster with arcpy/python?

I'am creating a python script tool for ArcGIS Desktop 10.1 to extract profiles. I'm using stack profile to extract values to a table but stack profile only outputs values where the line and raster overlaps.

My solution to create profiles that will start at the beginning of my line is to find the distance between the start of the line and the start of the raster and then offset the x-axis. But how do I find this distance (marked with blue in picture below) with arcpy/python?

Update: the solution needs to work with floating point raster dataset

My suggestion would be to use the Raster to Polygon (Conversion) tool on each of the two rasters, using non-simplified output. You could then use each of the resulting feature classes to erase the overlap from the original line features calculate the length of the remaining end portions.

The Path Distance tools are comparable to the Cost Distance tools in that both determine the minimum accumulative travel cost from or to a source for each location on a raster surface. However, the Path Distance tools add more complexity to the analysis by being able to accommodate for the actual surface distance as well as other horizontal and vertical factors.

The input source data can be a feature class or raster.

When the input source data is a raster, the set of source cells consists of all cells in the source raster that have valid values. Cells that have NoData values are not included in the source set. The value 0 is considered a legitimate source. A source raster can be created using the extraction tools.

When the input source data is a feature class, the source locations are converted internally to a raster before performing the analysis. The resolution of the raster can be controlled with the Cell Size environment. By default, if no other rasters are specified in the tool, the resolution will be determined by the shorter of the width or height of the extent of the input feature, in the input spatial reference, divided by 250.

When using feature data for the input source data, care must be taken with how the output cell size is handled when it is coarse, relative to the detail present in the input. The internal rasterization process uses the same default Cell assignment type as the Feature to Raster tool, which is the cell center method. This means that data not located at the center of the cell will not be included in the intermediate rasterized source output, so it will not be represented in the distance calculations. For example, if your sources are a series of small polygons (such as building footprints) that are small relative to the output cell size, it is possible that only a few will fall under the centers of the output raster cells, seemingly causing most of the others to be lost in the analysis.

To avoid this situation, as an intermediate step, you could rasterize the input features directly with the Feature to Raster tool and set the Field parameter. Then use the resulting output as input to the particular distance tool you want to use. Alternatively, you could select a small cell size to capture the appropriate amount of detail from the input features.

To calculate allocation, source locations can have an associated value, which can be specified by the Source field parameter. If the input source is an integer raster, the default field is VALUE . If it is a feature, it will be the first integer field in the attribute table. If the input source data is a floating-point raster, an integer value raster parameter must be specified.

Cells with NoData act as barriers in the Path Distance tools. The cost distance for cells behind NoData values is calculated by the accumulative cost necessary to move around the NoData barrier. Any cell location that is assigned NoData on any one of the input rasters will receive NoData on all output rasters.

If the input source data and the cost raster are different extents, the default output extent is the intersection of the two. To get a cost distance surface for the entire extent, choose the Union of Inputs option on the output Extent environment settings.

The output of the Aspect tool can be used as input for the Input horizontal raster .

The Maximum distance is specified in the same cost units as those on the cost raster.

For the output distance raster, the least-cost distance (or minimum accumulative cost distance) of a cell from or to a set of source locations is the lower bound of the least-cost distances from the cell to all source locations.

The default values for the Horizontal factor modifiers are the following:

The default values for the Vertical factor modifiers are the following:

The characteristics of the source, or the movers from or to a source, can be controlled by specific parameters. The Source cost multiplier parameter specifies the mode of travel or magnitude at the source, Source start cost sets the starting cost before the movement begins, Source resistance rate is a dynamic adjustment accounting for the impact of accumulated cost, for example, simulating how much a hiker is getting fatigued, and Source capacity sets how much cost a source can assimilate before reaching its limit. Travel direction identifies whether the mover is starting at a source and moving to nonsource locations or starting at nonsource locations and moving back to a source.

If any of the source characteristics parameters are specified using a field, the source characteristic will be applied on a source-by-source basis, according to the information in the given field for the source data. When a keyword or a constant value is given, it will be applied to all sources.

If Source start cost is specified and Travel direction is Travel from source , the source locations on the output cost distance surface will be set to the Source start cost value otherwise, the source locations on the output cost distance surface will be set to zero.

This tool supports parallel processing. If your computer has multiple processors or processors with multiple cores, better performance may be achieved, particularly on larger datasets. The Parallel processing with Spatial Analyst help topic has more details on this capability and how to configure it.

When using parallel processing, temporary data will be written to manage the data chunks being processed. The default temp folder location will be on your local C: drive. You can control the location of this folder by setting up a system environment variable named TempFolders and specifying the path to a folder to use (for example, E:RasterCache ). If you have admin privileges on your machine, you can also use a registry key (for example, [HKEY_CURRENT_USERSOFTWAREESRIArcGISProRaster]).

By default, this tool will use 50 percent of the available cores. If the input data is smaller than 5,000 by 5,000 cells in size, fewer cores may be used. You can control the number of cores the tool uses with the Parallel processing factor environment.

See Analysis environments and Spatial Analyst for additional details on the geoprocessing environments that apply to this tool.

## Syntax

The input source locations.

This is a raster or feature dataset that identifies the cells or locations to which the least accumulated cost distance for every output cell location is calculated.

For rasters, the input type can be integer or floating point.

A raster defining the impedance or cost to move planimetrically through each cell.

The value at each cell location represents the cost-per-unit distance for moving through the cell. Each cell location value is multiplied by the cell resolution while also compensating for diagonal movement to obtain the total cost of passing through the cell.

The values of the cost raster can be integer or floating point, but they cannot be negative or zero (you cannot have a negative or zero cost).

A raster defining the elevation values at each cell location.

The values are used to calculate the actual surface distance covered when passing between cells.

Defines the threshold that the accumulative cost values cannot exceed.

If an accumulative cost distance value exceeds this value, the output value for the cell location will be NoData. The maximum distance defines the extent for which the accumulative cost distances are calculated.

The default distance is to the edge of the output raster.

The output path distance raster.

The output path distance raster identifies, for each cell, the least accumulative cost distance, over a cost surface to the identified source locations, while accounting for surface distance as well as horizontal and vertical surface factors.

A source can be a cell, a set of cells, or one or more feature locations.

The output raster is of floating-point type.

A raster defining the horizontal direction at each cell.

The values on the raster must be integers ranging from 0 to 360, with 0 degrees being north, or toward the top of the screen, and increasing clockwise. Flat areas should be given a value of -1. The values at each location will be used in conjunction with the

The Horizontal Factor object defines the relationship between the horizontal cost factor and the horizontal relative moving angle.

There are several factors with modifiers from which to select that identify a defined horizontal factor graph. Additionally, a table can be used to create a custom graph. The graphs are used to identify the horizontal factor used in calculating the total cost of moving into a neighboring cell.

In the explanations below, two acronyms are used: HF stands for horizontal factor, which defines the horizontal difficulty encountered when moving from one cell to the next and HRMA stands for horizontal relative moving angle, which identifies the angle between the horizontal direction from a cell and the moving direction.

Indicates that if the HRMA is less than the cut angle, the HF is set to the value associated with the zero factor otherwise, it is infinity.

Establishes that only forward movement is allowed. The HRMA must be greater than or equal to 0 and less than 90 (0 <= HRMA < 90). If the HRMA is greater than 0 and less than 45 degrees, the HF for the cell is set to the value associated with the zero factor. If the HRMA is greater than or equal to 45 degrees, then the side value modifier value is used. The HF for any HRMA equal to or greater than 90 degrees is set to infinity.

Specifies that the HF is a linear function of the HRMA.

Specifies that the HF is an inverse linear function of the HRMA.

Identifies that a table file will be used to define the horizontal factor graph used to determine the HFs.

—Establishes the horizontal factor used when the HRMA is 0. This factor positions the y intercept for any of the horizontal factor functions. —Defines the HRMA angle beyond which the HF will be set to infinity. —Establishes the slope of the straight line used with the HfLinear and HfInverseLinear horizontal-factor keywords. The slope is specified as a fraction of rise over run (for example, 45 percent slope is 1/45, which is input as 0.02222). —Establishes the HF when the HRMA is greater than or equal to 45 degrees and less than 90 degrees when the HfForward horizontal-factor keyword is specified. - inTable —Identifies the name of the table defining the HF.

A raster defining the z-values for each cell location.

The values are used for calculating the slope used to identify the vertical factor incurred when moving from one cell to another.

The Vertical factor object defines the relationship between the vertical cost factor and the vertical relative moving angle (VRMA).

There are several factors with modifiers from which to select that identify a defined vertical factor graph. Additionally, a table can be used to create a custom graph. The graphs are used to identify the vertical factor used in calculating the total cost for moving into a neighboring cell.

In the explanations below, two acronyms are used: VF stands for vertical factor, which defines the vertical difficulty encountered in moving from one cell to the next and VRMA stands for vertical relative moving angle, which identifies the slope angle between the FROM or processing cell and the TO cell.

The definitions and parameters of these are the following:

Specifies that if the VRMA is greater than the low-cut angle and less than the high-cut angle, the VF is set to the value associated with the zero factor otherwise, it is infinity.

Indicates that the VF is a linear function of the VRMA.

Indicates that the VF is an inverse linear function of the VRMA.

Specifies that the VF is a linear function of the VRMA in either the negative or positive side of the VRMA, respectively, and the two linear functions are symmetrical with respect to the VF (y) axis.

Specifies that the VF is an inverse linear function of the VRMA in either the negative or positive side of the VRMA, respectively, and the two linear functions are symmetrical with respect to the VF (y) axis.

Identifies the VF as the cosine-based function of the VRMA.

Identifies the VF as the secant-based function of the VRMA.

Indicates that the VF is the cosine-based function of the VRMA when the VRMA is negative and the secant-based function of the VRMA when the VRMA is nonnegative.

Specifies that the VF is the secant-based function of the VRMA when the VRMA is negative and the cosine-based function of the VRMA when the VRMA is nonnegative.

Identifies that a table file will be used to define the vertical-factor graph used to determine the VFs.

The Modifiers to the vertical parameters are the following:

—Establishes the vertical factor used when the VRMA is zero. This factor positions the y-intercept of the specified function. By definition, the zero factor is not applicable to any of the trigonometric vertical functions (Cos, Sec, Cos_Sec, or Sec_Cos). The y-intercept is defined by these functions. —Defines the VRMA angle below which the VF will be set to infinity. —Defines the VRMA angle above which the VF will be set to infinity. —Establishes the slope of the straight line used with the VfLinear and VfInverseLinear parameters. The slope is specified as a fraction of rise over run (for example, 45 percent slope is 1/45, which is input as 0.02222). - inTable —Identifies the name of the table defining the VF.

Multiplier to apply to the cost values.

Allows for control of the mode of travel or the magnitude at a source. The greater the multiplier, the greater the cost to move through each cell.

The values must be greater than zero. The default is 1.

The starting cost from which to begin the cost calculations.

Allows for the specification of the fixed cost associated with a source. Instead of starting at a cost of zero, the cost algorithm will begin with the value set by source_start_cost .

The values must be zero or greater. The default is 0.

The rate to multiply the accumulative cost to determine the resistance adjustment.

This parameter simulates the increase in the effort to overcome costs as the accumulative cost increases. This is used to model fatigue of the traveler. The growing accumulative cost to reach a cell is multiplied by the resistance rate and added to the cost to move into the subsequent cell.

The greater the resistance rate, the more additional cost is added to reach the next cell, which is compounded for each subsequent movement. Since the resistance rate is similar to a compound rate and generally the accumulative cost values are very large, small resistance rates are suggested, such as 0.02, 0.005, or even smaller, depending on the accumulative cost values.

The values must be zero or greater. The default is 0.

Defines the cost capacity for the traveler from a source.

Each source grows to the specified capacity.

The values must be greater than zero. The default capacity is to the edge of the output raster.

### Return Value

The output cost back-link raster.

The back-link raster contains values of 0 through 8, which define the direction or identify the next neighboring cell (the succeeding cell) along the least accumulative cost path from a cell to reach its least cost source, while accounting for surface distance as well as horizontal and vertical surface factors.

If the path is to pass into the right neighbor, the cell will be assigned the value 1, 2 for the lower right diagonal cell, and continuing clockwise. The value 0 is reserved for source cells.

## Syntax

The input source locations.

This is a raster or feature dataset that identifies the cells or locations to which the least accumulated cost distance for every output cell location is calculated.

For rasters, the input type can be integer or floating point.

A raster defining the impedance or cost to move planimetrically through each cell.

The value at each cell location represents the cost-per-unit distance for moving through the cell. Each cell location value is multiplied by the cell resolution while also compensating for diagonal movement to obtain the total cost of passing through the cell.

The values of the cost raster can be integer or floating point, but they cannot be negative or zero (you cannot have a negative or zero cost).

A raster defining the elevation values at each cell location.

The values are used to calculate the actual surface distance covered when passing between cells.

A raster defining the horizontal direction at each cell.

The values on the raster must be integers ranging from 0 to 360, with 0 degrees being north, or toward the top of the screen, and increasing clockwise. Flat areas should be given a value of -1. The values at each location will be used in conjunction with the

The Horizontal Factor object defines the relationship between the horizontal cost factor and the horizontal relative moving angle.

There are several factors with modifiers from which to select that identify a defined horizontal factor graph. Additionally, a table can be used to create a custom graph. The graphs are used to identify the horizontal factor used in calculating the total cost of moving into a neighboring cell.

In the explanations below, two acronyms are used: HF stands for horizontal factor, which defines the horizontal difficulty encountered when moving from one cell to the next and HRMA stands for horizontal relative moving angle, which identifies the angle between the horizontal direction from a cell and the moving direction.

Indicates that if the HRMA is less than the cut angle, the HF is set to the value associated with the zero factor otherwise, it is infinity.

Establishes that only forward movement is allowed. The HRMA must be greater than or equal to 0 and less than 90 (0 <= HRMA < 90). If the HRMA is greater than 0 and less than 45 degrees, the HF for the cell is set to the value associated with the zero factor. If the HRMA is greater than or equal to 45 degrees, then the side value modifier value is used. The HF for any HRMA equal to or greater than 90 degrees is set to infinity.

Specifies that the HF is a linear function of the HRMA.

Specifies that the HF is an inverse linear function of the HRMA.

Identifies that a table file will be used to define the horizontal factor graph used to determine the HFs.

—Establishes the horizontal factor used when the HRMA is 0. This factor positions the y intercept for any of the horizontal factor functions. —Defines the HRMA angle beyond which the HF will be set to infinity. —Establishes the slope of the straight line used with the HfLinear and HfInverseLinear horizontal-factor keywords. The slope is specified as a fraction of rise over run (for example, 45 percent slope is 1/45, which is input as 0.02222). —Establishes the HF when the HRMA is greater than or equal to 45 degrees and less than 90 degrees when the HfForward horizontal-factor keyword is specified. - inTable —Identifies the name of the table defining the HF.

A raster defining the z-values for each cell location.

The values are used for calculating the slope used to identify the vertical factor incurred when moving from one cell to another.

The Vertical factor object defines the relationship between the vertical cost factor and the vertical relative moving angle (VRMA).

There are several factors with modifiers from which to select that identify a defined vertical factor graph. Additionally, a table can be used to create a custom graph. The graphs are used to identify the vertical factor used in calculating the total cost for moving into a neighboring cell.

In the explanations below, two acronyms are used: VF stands for vertical factor, which defines the vertical difficulty encountered in moving from one cell to the next and VRMA stands for vertical relative moving angle, which identifies the slope angle between the FROM or processing cell and the TO cell.

The definitions and parameters of these are the following:

Specifies that if the VRMA is greater than the low-cut angle and less than the high-cut angle, the VF is set to the value associated with the zero factor otherwise, it is infinity.

Indicates that the VF is a linear function of the VRMA.

Indicates that the VF is an inverse linear function of the VRMA.

Specifies that the VF is a linear function of the VRMA in either the negative or positive side of the VRMA, respectively, and the two linear functions are symmetrical with respect to the VF (y) axis.

Specifies that the VF is an inverse linear function of the VRMA in either the negative or positive side of the VRMA, respectively, and the two linear functions are symmetrical with respect to the VF (y) axis.

Identifies the VF as the cosine-based function of the VRMA.

Identifies the VF as the secant-based function of the VRMA.

Indicates that the VF is the cosine-based function of the VRMA when the VRMA is negative and the secant-based function of the VRMA when the VRMA is nonnegative.

Specifies that the VF is the secant-based function of the VRMA when the VRMA is negative and the cosine-based function of the VRMA when the VRMA is nonnegative.

Identifies that a table file will be used to define the vertical-factor graph used to determine the VFs.

The Modifiers to the vertical parameters are the following:

—Establishes the vertical factor used when the VRMA is zero. This factor positions the y-intercept of the specified function. By definition, the zero factor is not applicable to any of the trigonometric vertical functions (Cos, Sec, Cos_Sec, or Sec_Cos). The y-intercept is defined by these functions. —Defines the VRMA angle below which the VF will be set to infinity. —Defines the VRMA angle above which the VF will be set to infinity. —Establishes the slope of the straight line used with the VfLinear and VfInverseLinear parameters. The slope is specified as a fraction of rise over run (for example, 45 percent slope is 1/45, which is input as 0.02222). - inTable —Identifies the name of the table defining the VF.

Defines the threshold that the accumulative cost values cannot exceed.

If an accumulative cost distance value exceeds this value, the output value for the cell location will be NoData. The maximum distance defines the extent for which the accumulative cost distances are calculated.

The default distance is to the edge of the output raster.

The output cost back-link raster.

The back-link raster contains values of 0 through 8, which define the direction or identify the next neighboring cell (the succeeding cell) along the least accumulative cost path from a cell to reach its least cost source, while accounting for surface distance as well as horizontal and vertical surface factors.

If the path is to pass into the right neighbor, the cell will be assigned the value 1, 2 for the lower right diagonal cell, and continuing clockwise. The value 0 is reserved for source cells.

Multiplier to apply to the cost values.

Allows for control of the mode of travel or the magnitude at a source. The greater the multiplier, the greater the cost to move through each cell.

The values must be greater than zero. The default is 1.

The starting cost from which to begin the cost calculations.

Allows for the specification of the fixed cost associated with a source. Instead of starting at a cost of zero, the cost algorithm will begin with the value set by source_start_cost .

The values must be zero or greater. The default is 0.

The rate to multiply the accumulative cost to determine the resistance adjustment.

This parameter simulates the increase in the effort to overcome costs as the accumulative cost increases. This is used to model fatigue of the traveler. The growing accumulative cost to reach a cell is multiplied by the resistance rate and added to the cost to move into the subsequent cell.

The greater the resistance rate, the more additional cost is added to reach the next cell, which is compounded for each subsequent movement. Since the resistance rate is similar to a compound rate and generally the accumulative cost values are very large, small resistance rates are suggested, such as 0.02, 0.005, or even smaller, depending on the accumulative cost values.

The values must be zero or greater. The default is 0.

Defines the cost capacity for the traveler from a source.

Each source grows to the specified capacity.

The values must be greater than zero. The default capacity is to the edge of the output raster.

### Return Value

The output path distance raster.

The output path distance raster identifies, for each cell, the least accumulative cost distance, over a cost surface to the identified source locations, while accounting for surface distance as well as horizontal and vertical surface factors.

A source can be a cell, a set of cells, or one or more feature locations.

The output raster is of floating-point type.

## Methods

A polygon's boundary is a polyline. A polyline's boundary is a multipoint, corresponding to the endpoints of the line. A point or multipoint's boundary is an empty point or multipoint.

The buffer distance is in the same units as the geometry that is being buffered.

A negative distance can only be specified against a polygon geometry.

The buffered polygon geometry.

An extent object used to define the clip extent.

An output geometry clipped to the specified extent.

A return Boolean value of True indicates this geometry contains the second geometry.

The resulting geometry. The convex hull of a single point is the point itself.

A return Boolean value of True indicates the two geometries intersect in a geometry of a lesser shape type.

## 2 Answers 2

By angle between two points, I am assuming you mean angle between two vectors defined by endpoints (and assuming the start is the origin).

The example you used was designed around only a single pair of points, with the t ranspose used only on this principle. It is however robust enough to work in more than 2 dimensions.

Your function should be vectorised as your distance function is, as it is expecting a number of pairs of points (and we are only considering 2 dimensional points).

## 1.5 The history of R-spatial

There are many benefits of using recent spatial packages such as **sf**, but it also important to be aware of the history of R’s spatial capabilities: many functions, use-cases and teaching material are contained in older packages. These can still be useful today, provided you know where to look.

R’s spatial capabilities originated in early spatial packages in the S language (Bivand and Gebhardt 2000) . The 1990s saw the development of numerous S scripts and a handful of packages for spatial statistics. R packages arose from these and by 2000 there were R packages for various spatial methods “point pattern analysis, geostatistics, exploratory spatial data analysis and spatial econometrics”, according to an article presented at GeoComputation 2000 (Bivand and Neteler 2000) . Some of these, notably **spatial**, **sgeostat** and **splancs** are still available on CRAN (Rowlingson and Diggle 1993, 2017 Venables and Ripley 2002 Majure and Gebhardt 2016) .

A subsequent article in R News (the predecessor of The R Journal) contained an overview of spatial statistical software in R at the time, much of which was based on previous code written for S/S-PLUS (Ripley 2001) . This overview described packages for spatial smoothing and interpolation, including **akima** and **geoR** (Akima and Gebhardt 2016 Ribeiro Jr and Diggle 2016) , and point pattern analysis, including **splancs** (Rowlingson and Diggle 2017) and **spatstat** (Baddeley, Rubak, and Turner 2015) .

The following R News issue (Volume 1/3) put spatial packages in the spotlight again, with a more detailed introduction to **splancs** and a commentary on future prospects regarding spatial statistics (Bivand 2001) . Additionally, the issue introduced two packages for testing spatial autocorrelation that eventually became part of **spdep** (Bivand 2017) . Notably, the commentary mentions the need for standardization of spatial interfaces, efficient mechanisms for exchanging data with GIS, and handling of spatial metadata such as coordinate reference systems (CRS).

**maptools** (written by Nicholas Lewin-Koh Bivand and Lewin-Koh 2017) is another important package from this time. Initially **maptools** just contained a wrapper around shapelib and permitted the reading of ESRI Shapefiles into geometry nested lists. The corresponding and nowadays obsolete S3 class called “Map” stored this list alongside an attribute data frame. The work on the “Map” class representation was nevertheless important since it directly fed into **sp** prior to its publication on CRAN.

In 2003 Roger Bivand published an extended review of spatial packages. It proposed a class system to support the “data objects offered by GDAL”, including ‘fundamental’ point, line, polygon, and raster types. Furthermore, it suggested interfaces to external libraries should form the basis of modular R packages (Bivand 2003) . To a large extent these ideas were realized in the packages **rgdal** and **sp**. These provided a foundation for spatial data analysis with R, as described in *Applied Spatial Data Analysis with R* (ASDAR) (Bivand, Pebesma, and Gómez-Rubio 2013) , first published in 2008. Ten years later, R’s spatial capabilities have evolved substantially but they still build on ideas set-out by Bivand (2003) : interfaces to GDAL and PROJ, for example, still power R’s high-performance geographic data I/O and CRS transformation capabilities (see Chapters 6 and 7, respectively).

**rgdal**, released in 2003, provided GDAL bindings for R which greatly enhanced its ability to import data from previously unavailable geographic data formats. The initial release supported only raster drivers but subsequent enhancements provided support for coordinate reference systems (via the PROJ library), reprojections and import of vector file formats (see Chapter 7 for more on file formats). Many of these additional capabilities were developed by Barry Rowlingson and released in the **rgdal** codebase in 2006 (see Rowlingson et al. 2003 and the R-help email list for context) .

**sp**, released in 2005, overcame R’s inability to distinguish spatial and non-spatial objects (Pebesma and Bivand 2005) . **sp** grew from a workshop in Vienna in 2003 and was hosted at sourceforge before migrating to R-Forge. Prior to 2005, geographic coordinates were generally treated like any other number. **sp** changed this with its classes and generic methods supporting points, lines, polygons and grids, and attribute data.

**sp** stores information such as bounding box, coordinate reference system and attributes in slots in Spatial objects using the S4 class system, enabling data operations to work on geographic data (see Section 2.2.2). Further, **sp** provides generic methods such as summary() and plot() for geographic data. In the following decade, **sp** classes rapidly became popular for geographic data in R and the number of packages that depended on it increased from around 20 in 2008 to over 100 in 2013 (Bivand, Pebesma, and Gómez-Rubio 2013) . As of 2018 almost 500 packages rely on **sp**, making it an important part of the R ecosystem. Prominent R packages using **sp** include: **gstat**, for spatial and spatio-temporal geostatistics **geosphere**, for spherical trigonometry and **adehabitat** used for the analysis of habitat selection by animals (E. Pebesma and Graeler 2018 Calenge 2006 Hijmans 2016) .

While **rgdal** and **sp** solved many spatial issues, R still lacked the ability to do geometric operations (see Chapter 5). Colin Rundel addressed this issue by developing **rgeos**, an R interface to the open-source geometry library (GEOS) during a Google Summer of Code project in 2010 (Bivand and Rundel 2018) . **rgeos** enabled GEOS to manipulate **sp** objects, with functions such as gIntersection() .

Another limitation of **sp** — its limited support for raster data — was overcome by **raster**, first released in 2010 (Hijmans 2017) . Its class system and functions support a range of raster operations as outlined in Section 2.3. A key feature of **raster** is its ability to work with datasets that are too large to fit into RAM (R’s interface to PostGIS supports off-disc operations on vector geographic data). **raster** also supports map algebra (see Section 4.3.2).

In parallel with these developments of class systems and methods came the support for R as an interface to dedicated GIS software. **GRASS** (R. S. Bivand 2000) and follow-on packages **spgrass6** and **rgrass7** (for GRASS GIS 6 and 7, respectively) were prominent examples in this direction (Bivand 2016a, 2016b) . Other examples of bridges between R and GIS include **RSAGA** (Brenning, Bangs, and Becker 2018, first published in 2008) , **RPyGeo** (A. Brenning 2012a, first published in 2008) , and **RQGIS** (Muenchow, Schratz, and Brenning 2017, first published in 2016) (see Chapter 9).

Visualization was not a focus initially, with the bulk of R-spatial development focused on analysis and geographic operations. **sp** provided methods for map making using both the base and lattice plotting system but demand was growing for advanced map making capabilities, especially after the release of **ggplot2** in 2007. **ggmap** extended **ggplot2**’s spatial capabilities (Kahle and Wickham 2013) , by facilitating access to ‘basemap’ tiles from online services such as Google Maps. Though **ggmap** facilitated map-making with **ggplot2**, its utility was limited by the need to fortify spatial objects, which means converting them into long data frames. While this works well for points it is computationally inefficient for lines and polygons, since each coordinate (vertex) is converted into a row, leading to huge data frames to represent complex geometries. Although geographic visualization tended to focus on vector data, raster visualization is supported in **raster** and received a boost with the release of **rasterVis**, which is described in a book on the subject of spatial and temporal data visualization (Lamigueiro 2018) . As of 2018 map making in R is a hot topic with dedicated packages such as **tmap**, **leaflet** and **mapview** all supporting the class system provided by **sf**, the focus of the next chapter (see Chapter 8 for more on visualization).

## 13.1 Global Moran’s I

### 13.1.1 Computing the Moran’s I

Let’s start with a working example: 2010 per capita income for the state of Maine.

Figure 13.1: 2010 median per capita income aggregated at the county level.

It may seem apparent that, when aggregated at the county level, the income distribution follows a north-south trend (i.e. high values appear clustered near the southern end of the state and low values seem clustered near the north and east). But a qualitative description may not be sufficient we might want to quantify the degree to which similar (or dissimilar) counties are clustered. One measure of this type or relationship is the Moran’s I statistic.

The Moran’s I statistic is the correlation coefficient for the relationship between a variable (like income) and its surrounding values. But before we go about computing this correlation, we need to come up with a way to define a neighbor. One approach is to define a neighbor as being any contiguous polygon. For example, the northern most county (Aroostook), has four contiguous neighbors while the southern most county (York) has two contiguous counties. Other neighborhood definitions include distance bands (e.g. counties within 100 km) and k nearest neighbors (e.g. the 2 closest neighbors). Note that distance bands and k nearest neighbors are usually measured using the polygon’s centroids and not their boundaries.

Figure 13.2: Maps show the links between each polygon and their respective neighbor(s) based on the neighborhood definition. A contiguous neighbor is defined as one that shares a boundary or a vertex with the polygon of interest. Orange numbers indicate the number of neighbors for each polygon. Note that the top most county has no neighbors when a neighborhood definition of a 100 km distance band is used (i.e. no centroids are within a 100 km search radius)

Once we define a neighborhood for our analysis we identify the neighbors for each polygon in our dataset then summaries the values for each neighborhood cluster (by computing their mean values, for example). This summarized neighbor value is sometimes referred to as a lagging value (*X _{lag}*). In our working example, we adopt a contiguity neighborhood and compute the average neighboring income value (

*Income*) for each county in our dataset. We then plot

_{lag}*Income*vs.

_{lag}*Income*for each county. The Moran’s I coefficient between

*Income*and

_{lag}*Income*is nothing more than the slope of the least squares regression line that best fits the points

*after*having equalized the spread between both sets of data.

* vs. *Income*. If we equalize the spread between both axes (i.e. convert to a z-score) the slope of the regression line represents the Moran" />

Figure 13.3: Scatter plot of *Income _{lag}* vs.

*Income*. If we equalize the spread between both axes (i.e. convert to a z-score) the slope of the regression line represents the Moran’s I statistic.

If there is no relationship between *Income* and *Income _{lag}*, the slope will be close to flat (resulting in a Moran’s I value near 0). In our working example, the Moran’s I value is 0.377. So this begs the question, how significant is this Moran’s I value (i.e. is the computed slope significantly different from 0)? There are two approaches to estimating the significance: an analytical solution and a Monte Carlo solution. The analytical solution makes some restrictive assumptions about the data and thus cannot always be reliable. Another approach (and the one favored here) is a Monte Carlo test which makes

*no*assumptions about the dataset including the shape and layout of each polygon.

### 13.1.2 Monte Carlo approach to estimating significance

In a Monte Carlo test (a permutation bootstrap test, to be exact), the attribute values are randomly assigned to polygons in the data set and for each permutation of the attribute values, a Moran’s I value is computed. The output is a sampling distribution of Moran’s I values under the (null) hypothesis that attribute values are randomly distributed across the study area. We then compare our observed Moran’s I value to this sampling distribution.

Figure 13.4: Results from 199 permutations. Left plot shows Moran’s I slopes (in gray) from each random permutation of income values superimposed with the observed Moran’s I slope (in red). Right plot shows the distribution of Moran’s I values for all 199 permutations red vertical line shows our observed Moran’s I value of 0.377.

In our working example, 199 simulations indicate that out observed Moran’s I value of 0.377 is not a value we would expect to compute if the income values were randomly distributed across each county. A (pseudo) *P-value* can easily be computed from the simulation results:

where (N_*there is a 1% probability that we would be wrong in rejecting the null hypothesis H _{o}.*”

## QGIS: How to find the shortest distance between two line layers

I am working with two layers in QGIS, which are both line layers. Essentially, I would like to add an attribute in one layer indicating the shortest distance to a feature in the other layer, as well as that other feature's ID. Is there a tool to do this without converting one of the two layers to points first? Or some relatively simple python process?

For example, I have a river segments layer and a road segments layer. I would like to add the distance to the nearest road & the name of that road to the river layer.

## Methoden

A polygon's boundary is a polyline. A polyline's boundary is a multipoint, corresponding to the endpoints of the line. A point or multipoint's boundary is an empty point or multipoint.

The buffer distance is in the same units as the geometry that is being buffered.

A negative distance can only be specified against a polygon geometry.

The buffered polygon geometry.

An extent object used to define the clip extent.

An output geometry clipped to the specified extent.

Der boolesche Rückgabetyp True gibt an, dass diese Geometrie die zweite Geometrie enthält.

The resulting geometry. The convex hull of a single point is the point itself.

Der boolesche Rückgabetyp "True" gibt an, dass sich die beiden Geometrien in einer Geometrie mit einem geringeren Shape-Typ schneiden.