Applying Data Science to analyze neighborhoods in Chennai and determine a place for a new Shopping Mall

Read Time: 10 mins (approx.)

A little introduction about the genesis of the idea first.

A famous landmark in Chennai — Chennai Central

Malls are not only places to shop but also places to relax, socialize and be entertained. Big retail stores put everything under one roof, from branded clothes, groceries and electronics to footwear. Without a doubt, malls have changed the shopping experience of Indians: shopping in the scorching heat has been replaced by shopping in air-conditioned comfort. Many young people treat a mall visit as a status symbol, and buying branded products feeds their desire for a better quality of life. Teenagers also come to show off. Shopping malls are clearly bringing a new shopping culture to India, one that differs from the traditional one. Chennai offers an immense market opportunity because of the rising incomes and changing lifestyles of middle-class families, and property developers are taking advantage of this trend by building more shopping malls to meet the demand.

As a result, there are currently many shopping malls in the city of Chennai and many more are being built. Opening shopping malls allows property developers to earn consistent rental income. Of course, as with any business decision, opening a new shopping mall requires serious consideration and is a lot more complicated than it seems. Particularly, the location of the shopping mall is one of the most important decisions that will determine whether the mall will be a success or a failure.

The objective of this capstone project is to analyze and select the best locations in the city of Chennai to open a new shopping mall. Using data science methodology and machine learning techniques like clustering, this project aims to provide solutions to answer the business question: In the city of Chennai, if a property developer is looking to open a new shopping mall, where would you recommend that they open it?

This project is particularly useful to property developers and investors looking to open or invest in new shopping malls in the capital city of Tamil Nadu i.e. Chennai. This project is timely as the city is currently suffering from oversupply of shopping malls. Data from last year showed that an additional 15 per cent will be added to existing mall space, and the agency predicted that total occupancy may dip below 86 per cent.

To answer this question, the project needs the following data:

· List of neighborhoods in Chennai. This defines the scope of this project, which is confined to the city of Chennai, the capital of the state of Tamil Nadu in India.

· Latitude and longitude coordinates of those neighborhoods. This is required in order to plot the map and also to get the venue data.

· Venue data, particularly data related to shopping malls. We will use this data to perform clustering on the neighborhoods.

We will use the Foursquare API to get the venue data for those neighborhoods. Foursquare has one of the largest databases of places, with more than 105 million entries, and is used by over 125,000 developers.

The Foursquare API provides many categories of venue data, but we are particularly interested in the Shopping Mall category, since it helps us solve the business problem put forward. This project makes use of many data science skills: web scraping (Wikipedia), working with an API (Foursquare), data cleaning, data wrangling, machine learning (K-means clustering) and map visualization (Folium, a Python library). The Methodology section that follows discusses the steps taken in this project, the data analysis that we did and the machine learning technique that was used.

Next, we will use the Foursquare API to get the top 100 venues within a radius of 2000 meters of each neighborhood. We need to register a Foursquare Developer Account in order to obtain the Foursquare ID and Foursquare secret key. We then make API calls to Foursquare, passing in the geographical coordinates of the neighborhoods in a Python loop. Foursquare returns the venue data in JSON format, from which we extract the venue name, venue category, venue latitude and longitude. With this data, we can check how many venues were returned for each neighborhood and how many unique categories appear across all the returned venues. Then, we analyze each neighborhood by grouping the rows and taking the mean of the frequency of occurrence of each venue category. By doing so, we are also preparing the data for use in clustering. Since we are analyzing the “Shopping Mall” data, we keep only the “Shopping Mall” venue category for each neighborhood and set the other categories aside.
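The extraction and frequency steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code. It assumes the JSON has the nested `response` / `groups` / `items` / `venue` layout of Foursquare's legacy explore endpoint, and the function names and the sample payload are made up for the example. The HTTP request itself is left out. Counting a category's share of a neighborhood's venues gives the same number as one-hot encoding the categories and taking the mean per neighborhood.

```python
from collections import Counter


def extract_venues(neighborhood, payload):
    """Flatten one explore-style JSON payload into
    (neighborhood, name, category, lat, lng) rows.
    The payload layout is an assumption, see the note above."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # A venue may come back without a category; label it "Unknown".
            categories = venue.get("categories") or [{}]
            rows.append((
                neighborhood,
                venue["name"],
                categories[0].get("name", "Unknown"),
                venue["location"]["lat"],
                venue["location"]["lng"],
            ))
    return rows


def category_frequency(rows, category="Shopping Mall"):
    """Share of each neighborhood's venues that belong to `category`."""
    totals, hits = Counter(), Counter()
    for neighborhood, _name, venue_category, _lat, _lng in rows:
        totals[neighborhood] += 1
        if venue_category == category:
            hits[neighborhood] += 1
    return {n: hits[n] / totals[n] for n in totals}


# Made-up payload for a hypothetical neighborhood "A": one mall, one cafe.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Mall X",
               "location": {"lat": 13.05, "lng": 80.25},
               "categories": [{"name": "Shopping Mall"}]}},
    {"venue": {"name": "Cafe Y",
               "location": {"lat": 13.06, "lng": 80.26},
               "categories": [{"name": "Café"}]}},
]}]}}

rows = extract_venues("A", sample_payload)
mall_freq = category_frequency(rows)
```

In a real run, `extract_venues` would be called once per neighborhood inside the loop that makes the API calls, and the rows from all neighborhoods would be concatenated before computing the frequencies.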

Finally, we perform clustering on the data by using k-means clustering. The k-means algorithm identifies k centroids and allocates every data point to the nearest one, while keeping each cluster as compact as possible, that is, minimizing the distance between the points and their cluster's centroid. It is one of the simplest and most popular unsupervised machine learning algorithms and is particularly suited to the problem in this project. We will cluster the neighborhoods into 3 clusters based on their frequency of occurrence for “Shopping Mall”. The results will allow us to identify which neighborhoods have a higher concentration of shopping malls and which have fewer. Based on the occurrence of shopping malls in different neighborhoods, we can then answer the question of which neighborhoods are most suitable for new shopping malls.
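To make the assign-and-update loop concrete, here is a small pure-Python sketch of k-means on a single feature (the “Shopping Mall” frequency). It is an illustration only: the project most likely used a library implementation, and the function name, the deterministic initialization and the sample frequencies below are assumptions made for the example.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (labels, centroids)."""
    distinct = sorted(set(values))
    if len(distinct) < k:
        raise ValueError("need at least k distinct values")
    # Deterministic start: spread the initial centroids over the sorted
    # distinct values (libraries usually use a randomized start instead).
    step = max(k - 1, 1)
    centroids = [distinct[i * (len(distinct) - 1) // step] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Made-up mall frequencies for seven hypothetical neighborhoods.
frequencies = [0.0, 0.0, 0.01, 0.05, 0.06, 0.12, 0.13]
labels, centroids = kmeans_1d(frequencies, k=3)
```

Because of the sorted initialization, this sketch happens to number its clusters from lowest to highest frequency. A library implementation generally assigns label numbers arbitrarily, so the meaning of each cluster has to be read off its members.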

The results from the k-means clustering show that we can categorize the neighborhoods into 3 clusters based on the frequency of occurrence for “Shopping Mall”:

· Cluster 0: Neighborhoods with few or no shopping malls

· Cluster 1: Neighborhoods with high concentration of shopping malls

· Cluster 2: Neighborhoods with moderate number of shopping malls
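Cluster numbers carry no meaning of their own, so a reading like the one above comes from inspecting the values inside each cluster. A minimal sketch of that check, assuming one label and one mall frequency per neighborhood (the helper name and the numbers are made up for illustration):

```python
def rank_clusters(labels, frequencies):
    """Return (cluster label, mean frequency) pairs, fewest malls first."""
    sums, counts = {}, {}
    for label, freq in zip(labels, frequencies):
        sums[label] = sums.get(label, 0.0) + freq
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    return sorted(means.items(), key=lambda item: item[1])


# Made-up labels and frequencies for six hypothetical neighborhoods.
example_labels = [0, 1, 2, 0, 1, 2]
example_frequencies = [0.0, 0.12, 0.05, 0.01, 0.14, 0.06]
ranking = rank_clusters(example_labels, example_frequencies)
```

With these made-up numbers the ranking runs from cluster 0 (lowest mean) through cluster 2 to cluster 1 (highest mean), the same ordering as in the list above.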

The results of the clustering are visualized in the map below with cluster 0 in red color, cluster 1 in purple color, and cluster 2 in mint green color.

Kindly note: The data about the count of the shopping malls is quite old. Hence, certain recently constructed malls may not be counted. But, the algorithm is working perfectly fine!

As the map in the Results section shows, most of the shopping malls are concentrated in the central area of Chennai, with the highest number in cluster 1 and a moderate number in cluster 2. Cluster 0, on the other hand, has very few or no shopping malls in its neighborhoods. These neighborhoods represent a great opportunity and are high-potential areas for new shopping malls, as there is little to no competition from existing malls.

Meanwhile, shopping malls in cluster 1 are likely suffering from intense competition due to oversupply and the high concentration of malls. From another perspective, this also shows that the oversupply of shopping malls has mostly happened in the central area of the city, while the suburbs still have very few shopping malls.

Therefore, this project recommends that property developers capitalize on these findings and open new shopping malls in the cluster 0 neighborhoods, where there is little to no competition. Property developers with unique selling propositions that stand out from the competition can also open new shopping malls in the cluster 2 neighborhoods, where competition is moderate.

Lastly, property developers are advised to avoid the cluster 1 neighborhoods, which already have a high concentration of shopping malls and are suffering from intense competition.

In this project, only one factor was considered, i.e. the frequency of occurrence of shopping malls. Other factors, such as the population and income of residents, could also influence the location decision for a new shopping mall. However, to the best knowledge of this researcher, such data are not available at the neighborhood level required by this project. Future research could devise a methodology to estimate such data and use them in the clustering algorithm to determine the preferred locations for a new shopping mall. In addition, this project made use of the free Sandbox Tier Account of the Foursquare API, which comes with limits on the number of API calls and results returned. Future research could use a paid account to get past these limits and obtain more results.

In this project, we have gone through the process of identifying the business problem, specifying the data required, extracting and preparing the data, performing machine learning by clustering the data into 3 clusters based on their similarities, and lastly providing recommendations to the relevant stakeholders, i.e. property developers and investors, regarding the best locations to open a new shopping mall. To answer the business question raised in the introduction, the answer proposed by this project is: the neighborhoods in cluster 0 are the most preferred locations to open a new shopping mall. The findings of this project will help the relevant stakeholders capitalize on the opportunities in high-potential locations while avoiding overcrowded areas when deciding where to open a new shopping mall.
