NCC BULGARIA

NCC-Bulgaria was founded by the Institute of Information and Communication Technologies at the Bulgarian Academy of Sciences, Sofia University “St. Kliment Ohridski”, and the University of National and World Economy.

NCC-Bulgaria is focused on:

  • Creating a roadmap for successful work in the field of high performance computing, big data analysis and artificial intelligence.
  • Analyzing the existing competencies and facilitating the use of HPC/HPDA/AI in Bulgaria.
  • Raising awareness and promoting HPC/HPDA/AI use in companies and the public sector.

Industrial organizations involved:

  • Rental LTD, Bulgaria, 48 Bogomil Str., Plovdiv, SME.

Technical Challenge:

The real estate market in Plovdiv, Bulgaria, needed a more accurate and efficient method for estimating property prices to replace traditional manual market comparison. The goal was to develop a predictive model for housing prices using data extracted from the local real estate website, Rental. The dataset was small, however, which limited how well the models could be trained. To compensate, High-Performance Computing (HPC) was used to optimize model performance, especially through hyperparameter tuning, with the aim of reaching the best achievable accuracy despite the small dataset.

Solution:

The project involved collecting structured data from the website, which included detailed property features such as area (m²), floor number, structure type (apartment, house, etc.), location, number of rooms, and view (e.g., south, east). The sample size, though not extensive, provided enough variety to train a machine learning model.

Given the limited size of the dataset, standard approaches often risked overfitting or suboptimal generalization. High-Performance Computing (HPC) resources were integrated into the workflow to address these concerns by performing extensive hyperparameter optimization for multiple machine learning algorithms. The process included:

  • Hyperparameter Grid Search: HPC was used to conduct an exhaustive search over a wide range of hyperparameters for models such as gradient boosting, random forest, and AdaBoost. The computational power allowed simultaneous evaluation of hundreds of parameter combinations to find the most effective settings.
  • Cross-Validation: With HPC, multiple cross-validation runs were performed across different subsets of the dataset, ensuring the models were robust and avoided overfitting.
  • Optimization Algorithms: Advanced optimization techniques, such as Bayesian optimization and genetic algorithms, were tested on HPC resources to further enhance the performance of the predictive models.
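
The grid search and cross-validation steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the data are synthetic, a one-feature ridge regression stands in for the tree ensembles named above so that the example needs only the Python standard library, and the names `fit_ridge`, `k_fold_indices`, `cv_mse`, and the grid values are hypothetical.

```python
import itertools
import math
import random
import statistics

# Hypothetical preprocessing choices treated as a hyperparameter.
TRANSFORMS = {"identity": lambda x: x, "sqrt": math.sqrt}


def fit_ridge(xs, ys, alpha):
    """Fit y = a*x + b with an L2 penalty `alpha` on the slope (closed form)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / (sxx + alpha)
    return a, my - a * mx


def k_fold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(xs, ys, alpha, transform, k=5, seed=42):
    """Mean squared error averaged over k held-out folds."""
    zs = [TRANSFORMS[transform](x) for x in xs]
    fold_errors = []
    for fold in k_fold_indices(len(zs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(zs)) if i not in held_out]
        a, b = fit_ridge([zs[i] for i in train], [ys[i] for i in train], alpha)
        fold_errors.append(
            statistics.fmean((ys[i] - (a * zs[i] + b)) ** 2 for i in fold)
        )
    return statistics.fmean(fold_errors)


# Synthetic stand-in data: price roughly proportional to area, plus noise.
rng = random.Random(0)
areas = [rng.uniform(30, 150) for _ in range(60)]
prices = [1000 * a + rng.gauss(0, 8000) for a in areas]

# Every grid point is scored independently of the others, which is
# why such a search can be spread across many cores or nodes.
grid = {"alpha": [0.0, 1.0, 10.0, 100.0], "transform": ["identity", "sqrt"]}
results = {
    (alpha, transform): cv_mse(areas, prices, alpha, transform)
    for alpha, transform in itertools.product(grid["alpha"], grid["transform"])
}
best_params = min(results, key=results.get)
print("best (alpha, transform):", best_params)
```

Because each combination is evaluated on held-out folds only, the selected setting reflects performance on unseen listings rather than fit to the training data, which matters most when the sample is small.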

The data was analyzed to uncover key patterns and trends. Visualizations such as distribution plots, heatmaps, and histograms helped show how features like area, location, floor, and view correlated with property prices. For instance, the heatmap revealed clusters of high-priced properties in specific areas, which informed feature engineering.

The final model inputs included:

  • Area: Property size in square meters.
  • Location: Proximity to high-demand zones like city centers or parks.
  • Structure Type: Type of property construction (e.g., apartment, house).
  • Floor: The floor of the property, affecting desirability.
  • Number of Rooms: A key determinant of property value.
  • View: Orientation, with certain views (e.g., south-facing) attracting premium prices.
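
As a rough illustration of how inputs like these might be turned into a numeric feature vector, the sketch below one-hot encodes the categorical fields and passes the numeric ones through unchanged. The field names (`area_m2`, `distance_to_center_km`, and so on), the category lists, and the use of distance to the center as the location feature are assumptions made for this example; the source does not describe the actual encoding.

```python
# Assumed category lists; the real dataset may use different values.
STRUCTURE_TYPES = ["apartment", "house"]
VIEWS = ["north", "east", "south", "west"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_listing(listing):
    """Turn one listing dict into a flat numeric feature vector."""
    numeric = [
        float(listing["area_m2"]),
        float(listing["distance_to_center_km"]),  # hypothetical location proxy
        float(listing["floor"]),
        float(listing["rooms"]),
    ]
    return (
        numeric
        + one_hot(listing["structure_type"], STRUCTURE_TYPES)
        + one_hot(listing["view"], VIEWS)
    )


# Made-up listing, used only to show the shape of the output.
example = {
    "area_m2": 75,
    "distance_to_center_km": 1.2,
    "floor": 3,
    "rooms": 3,
    "structure_type": "apartment",
    "view": "south",
}
features = encode_listing(example)
print(features)
```

One-hot encoding avoids imposing an artificial order on categories such as view, at the cost of one extra column per category.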

The resulting models achieved strong predictive power despite the small dataset, as the computational resources allowed a thorough search for the best-performing configuration on the available data.

Business impact:

  • The predictive model achieved high accuracy in estimating property prices, despite the small dataset. Performance metrics demonstrated that gradient boosting performed best, achieving an R² value of 0.708, with linear regression closely behind at 0.707.
  • HPC significantly contributed by enabling extensive hyperparameter tuning and cross-validation runs that would not have been computationally feasible otherwise.
  • The model provided actionable insights for real estate agents and property owners, enabling them to set competitive and fair prices for new listings while considering the market trends and specific property features.
  • By leveraging data mining, machine learning, and HPC, this project successfully created a predictive model for housing prices even with a small dataset.
  • The use of HPC to optimize hyperparameters ensured maximum performance and robustness, offering a powerful tool for real estate agents, property owners, and investors. This success underscores the transformative potential of HPC and AI in solving complex real-world problems.
  • The model highlights the role of cutting-edge technologies in advancing the real estate market and demonstrates the value of HPC in enhancing machine learning workflows for small and medium-sized datasets.
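
For readers unfamiliar with the R² metric quoted above, the following sketch shows how the coefficient of determination is computed: one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function name `r_squared` and the numbers in the example are invented for illustration and are not project data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Made-up prices (in thousands) and predictions.
actual = [100, 150, 200, 250]
predicted = [110, 140, 210, 240]
score = r_squared(actual, predicted)
print(round(score, 3))  # 0.968
```

An R² of 1 means the predictions match the observed prices exactly, while 0 means the model does no better than always predicting the mean price.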

Benefits:

The integration of HPC into the project allowed advanced machine learning models to perform optimally, even with a limited dataset. This demonstrated how HPC resources can address challenges related to small sample sizes by ensuring models are fine-tuned and robust. The project also highlighted the potential of combining AI, data mining, and HPC to address real-world challenges in dynamic markets like real estate.

  • Optimized Model Accuracy: HPC-enhanced hyperparameter tuning ensured that the models performed at their best despite data limitations.
  • Scalable Insights: The process laid the groundwork for scaling to larger datasets as more data becomes available.
  • Faster and More Precise Pricing: Real estate agents and buyers benefited from faster and more accurate price predictions.
  • Improved Market Understanding: The insights generated by the model provided a clearer understanding of how features like location and view affect property value.

Success story highlights:

  • Keywords: HPC, Housing price prediction, Data mining, Classification, Machine learning, Analytics
  • Industry sector: Service, Housing market
  • Technology: HPC, Machine Learning and AI, Hyperparameter Tuning, Predictive Modeling

Figure 1: Distribution.

Figure 2: Property area and location heatmap.

Figure 3: Model Results.

Contact:

  • Prof. Emanouil Atanassov, emanouil at parallel.bas.bg, Institute of Information and Communication Technologies at the Bulgarian Academy of Sciences (IICT-BAS), Bulgaria