How much does a house's asking price depend on its square footage, bedrooms, and bathrooms?

Justin Anderson
May 4, 2021

Recently, I have been very interested in real estate. So, I decided to examine real estate within Philadelphia County, which is where I currently live.

I began by scraping about 400 house listings from Zillow, collecting each one's asking price, address, square footage, number of bedrooms, and number of bathrooms. Once the data was scraped, I cleaned it up so that I could analyze it with graphs and regressions. From there, I could draw conclusions about which single factor was most influential on asking price, and which combination of variables was most influential.
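To give a rough idea of what a cleanup step like this can involve, here is a minimal Python sketch. It is not the code used for this project: the field names (price, sqft, beds, baths, address), the string formats, and the two sample rows are all made up for illustration. The idea is simply to turn scraped text into numbers and drop listings with a missing value.

```python
import re

def parse_number(text):
    """Turn strings such as '$349,900' or '1,250 sqft' into a float.

    Returns None when the text is missing or contains no digits.
    """
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

def clean_listings(raw_rows):
    """Keep only rows where all four numeric fields could be parsed."""
    fields = ("price", "sqft", "beds", "baths")  # hypothetical field names
    cleaned = []
    for row in raw_rows:
        parsed = {name: parse_number(row.get(name)) for name in fields}
        if all(value is not None for value in parsed.values()):
            parsed["address"] = row.get("address", "")
            cleaned.append(parsed)
    return cleaned

# Made-up rows for illustration only, not actual scraped listings.
raw = [
    {"address": "123 Example St", "price": "$349,900",
     "sqft": "1,250 sqft", "beds": "3 bds", "baths": "1.5 ba"},
    {"address": "456 Sample Ave", "price": "$215,000",
     "sqft": "--", "beds": "2 bds", "baths": "1 ba"},
]
listings = clean_listings(raw)  # the second row is dropped: no square footage
```

Dropping incomplete rows is only one option; depending on how many listings are affected, filling in missing values could be a reasonable alternative.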

Philadelphia County is outlined in light blue above. It may be a small county in PA, but it has a high population, so it is worth examining.

Beginning with price against square footage, we can see a positive, though weak, correlation between the two variables:

There is a positive correlation between the size of a house and its listing price on Zillow.

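Although the correlation here is low, approximately 0.14, there is still a positive relationship between the square footage of a house and its price. A correlation of 0.14 means that square footage alone accounts for only about 2% of the variance in price (0.14² ≈ 0.02).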
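For readers who want to see how a correlation like this is computed, here is a small Python sketch of the Pearson correlation coefficient. The function name and the sample numbers are my own for illustration; they are not the scraped Zillow data and will not reproduce the 0.14 figure.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration only.
sqft = [900, 1200, 1500, 2000, 2400]
price = [150000, 310000, 180000, 420000, 260000]
r = pearson_r(sqft, price)

# In a one-predictor regression, r squared is the share of variance explained.
share_explained = r ** 2
```

The result always falls between -1 and 1, and squaring it gives the r-squared of the corresponding one-variable regression.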

Next, I will examine how well bedrooms alone can predict the price of a house.

Moving on to the next solo predictor, bedrooms, we can see that there is also a positive correlation between house price and the number of bedrooms (also excluding one outlier), coming in at around 0.314. With a p-value very close to 0 (8.43e-10), the data provide strong evidence for rejecting the null hypothesis that the number of bedrooms has no linear relationship with price.
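To show where such a small p-value comes from: in a regression with a single predictor, testing the slope is equivalent to testing the correlation, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The Python sketch below plugs in the reported correlation of 0.314. The sample size of 400 is an assumption based on the "about 400" listings scraped; the exact count after cleaning is not stated, so this will not reproduce the reported p-value exactly.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for testing H0: the true correlation is zero.

    r is the sample correlation and n is the number of (x, y) pairs;
    the statistic has n - 2 degrees of freedom.
    """
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# r = 0.314 is the reported bedroom correlation.
# n = 400 is an assumed sample size, not a figure from the analysis.
t_bedrooms = correlation_t_statistic(0.314, 400)
```

Under that assumption t comes out at roughly 6.6, far out in the tail of the t distribution, which is consistent with a p-value that is effectively zero.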

Finally, the last solo predictor is the number of bathrooms. There is not much to see in this regression plot, since no house has more than 8 bathrooms, and only one house has that many. My first guess was that the number of bathrooms would not have a significant impact on the price of a home, but according to this data, I was wrong. The correlation between bathrooms and price comes out to around 0.56, which is the highest among the three variables (bedrooms, bathrooms, square feet). The regression also has an r-squared value of 0.317, with a p-value of 2e-16. This provides strong evidence for rejecting the null hypothesis, which is not what I expected at first.

The picture above shows a linear model in which the bedrooms, bathrooms, and square footage of a home are all used to predict its price. The summary of the model is as follows:

The number of bathrooms and the square footage are important in predicting house price; that is, each shows evidence for rejecting the null hypothesis that it has no effect on price once the other variables are accounted for. The number of bedrooms, though, does not show enough evidence to reject the null hypothesis. Overall, 33% of the variance in price can be explained using all three variables. Only one outlier was removed for this model. Removing more outliers from the data set might have made the model more accurate in predicting house price.
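For anyone curious how a three-variable model like this is fit and how the "variance explained" figure is calculated, here is a minimal Python sketch of ordinary least squares using the normal equations. It is an illustration under stated assumptions, not the code behind the model above: the function names are my own, and the toy data is generated from an exact linear formula, so its r-squared comes out at 1 rather than 0.33.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return ([intercept, b1, b2, ...], r_squared) for a least-squares fit."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - ss_res / ss_tot

# Toy data, NOT the Zillow listings: [sqft in thousands, bedrooms, bathrooms].
rows = [[1.0, 2, 1], [1.5, 3, 2], [1.2, 3, 1],
        [2.0, 4, 3], [0.9, 1, 1], [1.8, 3, 2.5]]
# Prices (in thousands) built from an exact formula, so there is no noise.
prices = [50 + 100 * s + 5 * bd + 20 * ba for s, bd, ba in rows]
coefficients, r_squared = fit_ols(rows, prices)
```

With real listings the fit would be far noisier, and a statistics package would also report the per-variable p-values discussed above, which this sketch does not compute.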

I am aware that this project focuses on an urban area, but I think that works well here. Since Philadelphia County has such a high population compared to many other counties in PA, there is a large supply of houses on Zillow for me to scrape and examine.
