TSP in R Part 1

I’ve been playing around recently with some Travelling Salesperson Problems (TSP), and by extension some Vehicle Routing Problems (VRP). For example, when people come to visit Ottawa, are they being the most optimal with visiting a list of sites, that is, spending the least time or distance in their cars travelling between places? If their trip takes more than one day, does that change the order they see things?

They’d like to know that a) their daily routes are optimized, and b) their stops are optimized to the day. The first question is one of the typical TSP, and the second is a basic VRP. Instead of having, say, 3 vehicles running routes from a single starting point (like a hotel), there’s one vehicle running a route each of the 3 days of their visit.

So, we’ll start from a list of things to see, and go from there. First, we need to process the data and get it into a usable format

loadLocationData <- function(f) {
  locations <- read.csv(f, stringsAsFactors = FALSE)
  locations$Number <- as.numeric(locations$Number)
  return(locations)
}
 
locations<-loadLocationData("./_data/museums.csv")
head(locations)
##                                     Name Number               Street    City
## 1             Prime Minister's Residence     24           Sussex Dr.  Ottawa
## 2 Billings Estate National Historic Site   2100            Cabot St.  Ottawa
## 3                   Britannia Yacht Club   2777          Cassels St.  Ottawa
## 4                          Bytown Museum      1           Canal Lane  Ottawa
## 5                          Byward Market     55 Byward Market Square  Ottawa
## 6   Cameron Highlanders of Ottawa Museum      2  Queen Elizabeth Dr.  Ottawa

The list of museums is given as Name, Street Number, Street, City.

We’ll use the Google Maps Distance Matrix API to calculate the distance between each point. We can then optimize the route visiting each one offline before asking Google for the actual route.

We’ll stitch every part of the location frame together to give a list of places for Google. It does better without a place name. We can then feed this into our API URL creator. Note I add the API key, you’ll have to get your own:

#dm_api_key = [YOUR API KEY HERE]
 
constructDistanceUrl <- function(origins, destinations, dm_api_key) {
  root <- "https://maps.googleapis.com/maps/api/distancematrix/"
  u <- paste(root, "json", "?origins=", origins, "&destinations=", destinations, "&key=",
    dm_api_key, sep = "")
  return(u)
}

But hold on. The free tier of the Google API only allows for a maximum of 25 origins or destinations at once. And a total of 100 elements per request (where elements = origins x destinations). And a total of 2500 elements in a day . We have to a) make a way to break up our requests, and b) prevent going over 2500/day. We can also only call

We’ll request in chunks:

Request Origin Destination
1 1:10 1:10
2 1:10 11:20
k 11:20 1:10
k+1 11:20 11:20
n 40:48 40:48
buildRequestList <- function(nlocations) {
  s_main <- 1:10
  s_last <- (1:(nlocations%%10)) + (10 * (nlocations%/%10))
  s_reps <- nlocations%/%10 - 1
 
  call_list <- list()
  counter <- 1
  for (i in 0:s_reps) {
    for (j in 0:s_reps) {
      r <- list(origin = s_main + i * 10, destin = s_main + j * 10)
      call_list[counter] <- list(r)
      counter <- counter + 1
    }
    r <- list(origin = s_last, destin = s_main + i * 10)
    call_list[counter] <- list(r)
    counter <- counter + 1
    r <- list(origin = s_main + i * 10, destin = s_last)
    call_list[counter] <- list(r)
    counter <- counter + 1
  }
  r <- list(origin = s_last, destin = s_last)
  call_list[counter] <- list(r)
  return(call_list)
}

This gives us a set of requests, by which point we’ll have all of the distance matrices we need. Then we’ll sew them together for one giant matrix.

getAPIurls <- function(locations, api_key_file = "./_data/apikey.txt") {
  # Reformat location information for API Calls
  places <- apply(locations, 1, function(x) paste(x[2], x[3], x[4]))
  places <- trimws(places)
  places <- gsub(" ", "+", places)
 
  # Get Request Indices
  request_indices <- buildRequestList(length(places))
 
  urllist <- character()
 
  for (i in 1:length(request_indices)) {
    o <- request_indices[[i]]$origin
    origins <- paste0(places[o], collapse = "|")
    d <- request_indices[[i]]$destin
    destinations <- paste0(places[d], collapse = "|")
    urllist <- c(urllist, constructDistanceUrl(origins = origins, destinations = destinations, dm_api_key = dm_api_key))
  }
 
  return(urllist)
}
 
urllist<-getAPIurls(locations)
substring(urllist[1],1,400)
## [1] "https://maps.googleapis.com/maps/api/distancematrix/json?origins=24+Sussex+Dr.+Ottawa|2100+Cabot+St.+Ottawa|2777+Cassels+St.+Ottawa|1+Canal+Lane+Ottawa|55+Byward+Market+Square+Ottawa|2+Queen+Elizabeth+Dr.+Ottawa|901+Prince+of+Wales+Dr.+Ottawa|11+Aviation+Parkway+Ottawa|2421+Lancaster+Rd.+Ottawa|7800+Golf+Club+Way+Ashton&destinations=24+Sussex+Dr.+Ottawa|2100+Cabot+St.+Ottawa|2777+Cassels+St.+Ottaw"

We can process the returned json with this function. It looks gnarly but will suffice until I rewrite with applys:

processDistanceMatrixJson <- function(jdata, value = "distance", ...) {
  stopifnot(value %in% c("distance", "duration"))
  norigin <- length(jdata$origin_addresses)
  ndestin <- length(jdata$destination_addresses)
  dm <- matrix(NA, nrow = norigin, ncol = ndestin)
  for (i in 1:norigin) {
    for (j in 1:ndestin) {
      q <- jdata$rows[[i]]$elements[[j]]
      if (q$status == "OK") {
        dm[i, j] <- as.numeric(q[[value]]$value)
      }
    }
  }
  rownames(dm) <- jdata$origin_addresses
  colnames(dm) <- jdata$destination_addresses
  return(dm)
}

We can do this for all calculated request groups:

getGDMResults <- function(urllist) {
  dmat_list <- tmat_list <- rawurl_list <- json_list <- error_list <- list(rep(NA, length(urllist)))
  for (i in 1:length(urllist)) {
    rawurl_list[[i]] <- RCurl::getURL(urllist[i])
    json_list[[i]] <- rjson::fromJSON(rawurl_list[[i]])
    if (json_list[[i]]$status == "OK") {
      dmat_list[[i]] <- processDistanceMatrixJson(json_list[[i]], value = "distance")
      tmat_list[[i]] <- processDistanceMatrixJson(json_list[[i]], value = "duration")
    }
    Sys.sleep(2)
  }
  return(list(dmat = dmat_list, tmat = tmat_list, rawurl = rawurl_list, json = json_list,
    errors = error_list))
}
 
matlist<-getGDMResults(urllist = urllist)

Time is given in seconds, and distance in meters. I had to hide the names of each place because they made the table too big. Note that these are not symmetrical, because sometimes there are one way streets or other limitations on routes.

unname(matlist$dmat[[1]])
##        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
##  [1,]     0  8400 16953  4879  2569  3406  9393  5708  9452 46980
##  [2,]  8370     0 19858 12693  7789  5386  6006 10883  5007 48822
##  [3,] 20202 20489     0 14284 17985 16302 12238 24150 21269 34148
##  [4,]  2480  8473 14727     0  1554  2132  8576  8189  9253 46163
##  [5,]  2224  7719 16445  5841     0  1661  7822  7933  8499 45409
##  [6,]  3509  5291 14793  5199  2095     0  7248 11713  8832 43757
##  [7,]  9218  7065 11045  7575 10121  7281     0 16286 13404 40009
##  [8,]  5613 11051 22613 10493  6238  9019 13990     0 11255 51577
##  [9,]  9066  5037 20281 13843  8211  9120 10594  7959     0 49245
## [10,] 47179 47466 34085 44475 44962 43279 39215 51127 48246     0

Once we have everything available, in lists of matrices, we can stitch them into a majorly big matrix. Again, gnarly until rewritten with applys:

assembleMatrix <- function(mlist) {
  locations <- unique(as.character(unlist(sapply(mlist, rownames))))
  nloc <- length(locations)
  mat_total <- matrix(NA, nrow = nloc, ncol = nloc)
  rownames(mat_total) <- colnames(mat_total) <- locations
  for (i in 1:length(mlist)) {
    d <- mlist[[i]]
    if (!is.na(d) && !is.null(d)) {
        mat_total[match(rownames(d), rownames(mat_total),  nomatch=0), match(colnames(d), colnames(mat_total), nomatch = 0)]<-d
    }
  }
  return(mat_total)
}
 
distance_matrix <- assembleMatrix(matlist$dmat)
time_matrix <- assembleMatrix(matlist$tmat)
 
#Save the distance and times matrices to file for later use
saveRDS(distance_matrix, "./_data/distance_matrix.RDS")
saveRDS(time_matrix, "./_data/time_matrix.RDS")

Now that we have the distance matrices, we can start the Travelling Salesperson Problem work with the R package TSP. More in the next post.

Written on March 28, 2017