Identifying Neighborhoods in Toronto Similar to Barrhaven, Ottawa, Using Machine Learning

1.    Introduction (Business Problem)

 

Background:

 

Toronto is the commercial hub of Canada and the capital of the province of Ontario. It is a metropolitan city with a booming economy that offers promising opportunities both for people relocating from elsewhere in Canada and for immigrants from other parts of the world.

Moving to Toronto is not easy. Because Toronto is a major commercial hub with a great deal of activity, it can be difficult to find residential neighborhoods that are calm and free of the hassles of big-city living. One of the major challenges facing prospective migrants, especially those moving from a calm city or neighborhood, is finding residential areas in Toronto that are similar to where they currently live.

In this analysis, we focus on migration from Ottawa to Toronto, using the specific case study of a move from Barrhaven, Ottawa, to Toronto. We apply machine learning to identify Toronto neighborhoods that are similar to Barrhaven. The same approach could also be applied to other pairs of cities in any part of the world, given comparable neighborhood and venue data.

This analysis groups the neighborhoods of Ottawa and Toronto into clusters and compares them, in order to find the Toronto neighborhoods that fall into the same cluster as Ottawa neighborhoods such as Barrhaven.

 

2.    Data

This analysis has two major data requirements:

a.     List of neighborhoods in Toronto and Ottawa, with the latitude and longitude coordinates of each neighborhood. This data was not readily available, so an alternative source was identified and used: the lists of neighborhoods in Ottawa and Toronto were scraped from Wikipedia, and the corresponding coordinates were obtained using the geopy geocoding Python library. Geocoded coordinates are not guaranteed to be 100% accurate, but their accuracy is sufficient for the objectives of this project. Below is a sample of the neighborhood data after acquisition and wrangling:

In total, 239 neighborhoods were extracted for Ottawa and 191 for Toronto.
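The geocoding step can be sketched as below. This is a minimal sketch, not the project's code: it assumes a geocoder object with a geopy-style `geocode(query)` method that returns an object with `latitude` and `longitude` attributes, or `None` when nothing matches (geopy's `Nominatim` geocoder behaves this way). The query format and the helper name `geocode_neighborhoods` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def geocode_neighborhoods(names: Iterable[str], city: str, geocoder: Any) -> List[Dict[str, Any]]:
    """Look up coordinates for each neighborhood name (hypothetical helper).

    `geocoder` is assumed to expose a geopy-style `geocode(query)` method that
    returns an object with `latitude`/`longitude`, or None if nothing matched.
    """
    rows = []
    for name in names:
        # Adding the city and province narrows the search and reduces false matches.
        location = geocoder.geocode(f"{name}, {city}, Ontario, Canada")
        if location is None:
            # Keep the row so unmatched names can be fixed or dropped during wrangling.
            lat, lon = None, None
        else:
            lat, lon = location.latitude, location.longitude
        rows.append({"Neighborhood": name, "City": city, "Latitude": lat, "Longitude": lon})
    return rows
```

With geopy installed, `geocoder` could be `Nominatim(user_agent="neighborhood-study")`, where the user-agent string is an arbitrary example. Nominatim's public service asks for no more than one request per second, so pausing between calls is advisable for a few hundred names.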

b.     List of top venues and amenities within 1,000 meters (1 km) of each neighborhood in the two cities, segmented by category. This is necessary to understand the character of each neighborhood. This data was obtained using the Foursquare location data API. Across the two cities, the API returned a total of 14,051 venues spread across 383 unique categories. These categories became the attributes for modeling, using one-hot encoding.
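One-hot encoding turns each venue's category into a set of 0/1 indicator columns, one per category (383 in this project). A common next step is to average those indicators per neighborhood, giving the share of its venues in each category as a single feature vector. That averaging step is an assumption here, as are the helper name `one_hot_frequencies` and the use of plain Python instead of pandas; the sketch below shows the idea on (neighborhood, category) rows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def one_hot_frequencies(
    venues: List[Tuple[str, str]]
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Build per-neighborhood category frequencies from (neighborhood, category) rows.

    Each venue is one-hot encoded over the sorted list of all categories, and the
    encoded rows are averaged per neighborhood (an assumed aggregation step).
    """
    categories = sorted({category for _, category in venues})
    index = {category: i for i, category in enumerate(categories)}

    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(categories))
    counts: Dict[str, int] = defaultdict(int)
    for neighborhood, category in venues:
        one_hot = [0.0] * len(categories)
        one_hot[index[category]] = 1.0  # exactly one indicator is set per venue
        sums[neighborhood] = [s + v for s, v in zip(sums[neighborhood], one_hot)]
        counts[neighborhood] += 1

    # Divide by the venue count so each neighborhood's vector sums to 1.
    features = {name: [s / counts[name] for s in sums[name]] for name in sums}
    return categories, features
```

For toy rows `[("Neighborhood A", "Gym"), ("Neighborhood A", "Pool"), ("Neighborhood A", "Gym")]`, the categories are `['Gym', 'Pool']` and Neighborhood A's vector is roughly `[0.667, 0.333]`. Encoding the venues of both cities in one pass keeps the category columns identical for Ottawa and Toronto.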


3.    Methodology & Analysis

 

To identify neighborhoods in Toronto that are similar to Barrhaven, Ottawa, the key considerations drawn from our data attributes are:

  • The character of each neighborhood, based on the venues and amenities returned by the Foursquare location data API.
  • The distance of each candidate neighborhood from Downtown Toronto. Proximity to Downtown matters because many people moving to Toronto are likely to work Downtown, or to want to live as close to it as possible while still enjoying a relaxed, calm neighborhood.

Based on this, the distance of each neighborhood from Downtown Toronto was calculated in kilometers using the formula below:

 

This calculation was applied to every neighborhood in Toronto, and an additional attribute was created that shows at a glance the neighborhood name and its distance to Downtown Toronto.
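Assuming the distance formula is the haversine (great-circle) formula, a common choice for latitude/longitude pairs, the calculation can be sketched as follows with a mean Earth radius of 6,371 km. The downtown reference coordinates are a hypothetical input, and `label_with_distance` is a hypothetical helper that builds the combined name-and-distance attribute in the same form as the list in the conclusion.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def label_with_distance(name: str, lat: float, lon: float,
                        downtown_lat: float, downtown_lon: float) -> str:
    """Combine a neighborhood name with its distance to downtown, rounded to whole km."""
    km = haversine_km(lat, lon, downtown_lat, downtown_lon)
    return f"{name} (Distance to Downtown: {round(km)} km)"
```

Haversine treats the Earth as a sphere, which is accurate to well under 1% at city scale; geopy's `geodesic` distance is an ellipsoidal alternative if more precision were wanted.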

 

 

3.1   Modeling (clustering)

 

The K-Means algorithm was used to segment the neighborhoods of Toronto and Ottawa into clusters. To do this, the neighborhood data of both cities were combined into a single dataset and fed into the model as one unit, so that neighborhoods from both cities were assigned to one shared set of clusters and could be compared directly.
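K-Means cluster numbers are arbitrary, so labels from two separate runs could not be compared; clustering the combined data in one run gives both cities a shared set of labels. The sketch below illustrates this with a plain-Python version of Lloyd's algorithm (the standard K-Means iteration) and seeded random initialization. The names `kmeans` and `cluster_combined` are hypothetical, and this is not the project's code; a library implementation such as scikit-learn's `KMeans` would normally be used. Both cities' vectors must share the same category columns.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (seeded for repeatability).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_combined(ottawa: Dict[str, Vector], toronto: Dict[str, Vector],
                     k: int, seed: int = 0) -> Dict[Tuple[str, str], int]:
    """Cluster both cities in one run and key each label by (city, neighborhood)."""
    keys = [("Ottawa", name) for name in ottawa] + [("Toronto", name) for name in toronto]
    points = list(ottawa.values()) + list(toronto.values())
    labels, _ = kmeans(points, k, seed=seed)
    return dict(zip(keys, labels))
```

Keying the result by (city, neighborhood) keeps same-named neighborhoods in different cities apart and makes the later cluster lookup a simple dictionary filter.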

To select the optimal value of k, the elbow method was used: the algorithm was run repeatedly with different values of k, and the cost (the within-cluster sum of squared distances) was plotted against k. The algorithm was run 50 times, and the cost decreased continuously as k increased, without a clear elbow. Because of the computation time and cost involved, k = 40 was selected, and the model was built to cluster the neighborhoods into 40 segments.
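The elbow procedure can be sketched as follows, assuming the cost is the within-cluster sum of squared distances (scikit-learn exposes this as `KMeans.inertia_`). The optimal cost never increases as k grows, so the curve tends to fall steadily; the elbow is where the drop visibly levels off. The self-contained plain-Python K-Means and the names `kmeans_cost` and `elbow_costs` are hypothetical; with real data one would pass the neighborhood feature vectors with `max_k=50` and plot the costs against k.

```python
import random
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: List[List[float]], k: int, seed: int = 0,
                iterations: int = 100) -> float:
    """Run Lloyd's K-Means and return the sum of squared distances to the nearest centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Recompute centroids; an empty cluster keeps its old centroid.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break  # centroids stopped moving: converged
        centroids = updated
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def elbow_costs(points: List[List[float]], max_k: int) -> Dict[int, float]:
    """Cost for each k from 1 to max_k, ready to be plotted against k."""
    return {k: kmeans_cost(points, k) for k in range(1, max_k + 1)}
```

A single random start can land in a poor local minimum, so library implementations typically run several initializations per k and keep the best; that refinement is omitted here for brevity.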

 

4.    Results & Discussion

 

Barrhaven, Ottawa, was assigned to cluster 25, together with 49 other Ottawa neighborhoods.

Of the Toronto neighborhoods, five were assigned to cluster 25.

 

  1. These neighborhoods have a high concentration of sports and gym facilities (including swimming pools).
  2. They offer a variety of restaurants, ranging from European and Asian to Middle Eastern cuisine.
  3. They also appear to have good transportation links, given the presence of train stations.
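Observations like these are typically drawn by ranking venue categories by their average frequency across a cluster's neighborhoods. A minimal sketch of that ranking, assuming each neighborhood has a category-frequency vector and a cluster label; the function name `top_categories` and its inputs are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_categories(
    categories: Sequence[str],
    features: Dict[str, Sequence[float]],
    labels: Dict[str, int],
    cluster: int,
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank venue categories by mean frequency across one cluster's neighborhoods."""
    members = [name for name, lab in labels.items() if lab == cluster]
    if not members:
        return []  # no neighborhood carries this label
    # Average each category column over the cluster's members.
    means = [
        sum(features[name][i] for name in members) / len(members)
        for i in range(len(categories))
    ]
    ranked = sorted(zip(categories, means), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]
```

Looking at the top 10 or so categories of a cluster, rather than just the top 3, makes patterns such as many gyms, pools, or train stations easier to spot.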

5.    Conclusion

 

Through this analysis, we identified five neighborhoods in Toronto that are similar to Barrhaven, Ottawa:

  • Port Lands, Toronto (Distance to Downtown: 3 km)
  • Downsview, North York, Toronto (Distance to Downtown: 12 km)
  • Milliken, Scarborough, Toronto (Distance to Downtown: 20 km)
  • Brown's Corners (historical), Scarborough, Toronto (Distance to Downtown: 21 km)
  • Highland Creek, Scarborough, Toronto (Distance to Downtown: 22 km)
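Given cluster labels for the combined data and each Toronto neighborhood's distance to downtown, producing a candidate list like the one above amounts to a filter and a sort. A sketch under the assumption that labels are keyed by (city, neighborhood) pairs; the function name `similar_neighborhoods` and its defaults are hypothetical.

```python
from typing import Dict, List, Tuple


def similar_neighborhoods(
    labels: Dict[Tuple[str, str], int],
    distance_km: Dict[str, float],
    reference: Tuple[str, str] = ("Ottawa", "Barrhaven"),
    target_city: str = "Toronto",
) -> List[str]:
    """Neighborhoods in target_city sharing the reference's cluster, nearest to downtown first."""
    cluster = labels[reference]
    matches = [name for (city, name), lab in labels.items()
               if city == target_city and lab == cluster]
    # Sort so that candidates closest to downtown come first.
    matches.sort(key=lambda name: distance_km[name])
    return [f"{name} (Distance to Downtown: {round(distance_km[name])} km)" for name in matches]
```

Sorting by distance puts the trade-off named in the methodology, similarity first and proximity to downtown second, directly into the order of the output.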

 

Future work could extend the variables used in this analysis to include, for example, the cost of accommodation in each neighborhood, crime rate and security, and population density. These would support a more decisive choice of preferred location among the five candidate neighborhoods.

References

  • Wikipedia - Lists of neighborhoods in Ottawa and Toronto
  • Foursquare location data API - Venue location data and venue categories
  • Google Maps