Introduction
In this project, we’ll walk through an end-to-end approach for detecting and classifying different types of dwellings from aerial drone images. The goal is to develop a model that identifies dwelling units and classifies them by roof type: tin, thatch, or other. This project is particularly useful for analyzing settlement patterns and housing quality, especially in rural areas where traditional housing structures vary. Our dataset includes labeled bounding boxes for dwellings in aerial images, along with roof type labels.
In this article, I’ll outline the steps I took to build a solution for this task, including data preprocessing, model selection, training, and deployment. I’ll also share some of the challenges I faced and how I solved them.

Step 1: Understanding the Data
Our dataset consists of drone-captured images of housing areas in southern Africa, specifically in Malawi, where traditional housing structures are prevalent. We also have a train.csv file that contains labeled bounding boxes for each image, specifying the coordinates and dimensions of each dwelling along with a roof type label.
Key Details:
- Data Fields: Each row in the dataset contains:
  - image_id: ID of the image.
  - bbox: Bounding box coordinates for each dwelling, in the format [x_min, y_min, width, height].
  - category_id: Roof type label, where 1 is “Other”, 2 is “Tin”, and 3 is “Thatch”.
  - id: Unique identifier for each bounding box.
Having a clear understanding of this data format helps with efficiently parsing the data for model training.
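For illustration, a row in train.csv might look like this (the values here are invented purely to show the format; they are not from the real dataset):

image_id,bbox,category_id,id
img_0001,"[120.0, 340.0, 55.0, 48.0]",2,101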
Step 2: Setting Up the Environment
The environment setup is crucial for working with computer vision tasks, as they involve handling and processing large image files. Here’s a quick setup guide:
1. Install Necessary Libraries: We’ll need TensorFlow for model training, OpenCV for image manipulation, pandas for data processing, and scikit-learn for splitting the data. Install these with:
pip install tensorflow opencv-python pandas scikit-learn
Note that Step 5 also relies on the TensorFlow Object Detection API, which is installed separately from the tensorflow/models repository.
2. Organize Project Directory: Structuring files correctly makes accessing and managing data easier. Create folders for storing images and annotations:
project_directory/
├── images/
├── annotations/
└── models/
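If you prefer to set up this layout from Python, a small loop does it (run from inside project_directory/):

import os

# Create each folder if it does not already exist
for folder in ('images', 'annotations', 'models'):
    os.makedirs(folder, exist_ok=True)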
Step 3: Data Preprocessing
In this step, we’ll parse the CSV file, preprocess images, and apply data augmentation.
1. Load and Parse the Dataset: Load train.csv with pandas and convert the bounding box column from string format to numerical lists.
import ast
import pandas as pd

# Read the annotations and convert each bbox from its string representation
# into a Python list of numbers
data = pd.read_csv('annotations/train.csv')
data['bbox'] = data['bbox'].apply(ast.literal_eval)
2. Data Augmentation: To improve model robustness, apply data augmentation techniques such as rotation, flipping, and scaling to our images. TensorFlow’s tf.image module offers various transformation functions. Keep in mind that any geometric transform applied to an image must also be applied to its bounding box, otherwise the labels no longer match the pixels:
import tensorflow as tf

def augment_image(image, bbox):
    # bbox is [x_min, y_min, width, height] in pixels; geometric transforms on
    # the image must be mirrored onto the box (a vertical flip works analogously)
    bbox = tf.cast(bbox, tf.float32)
    if tf.random.uniform([]) > 0.5:  # random horizontal flip
        image = tf.image.flip_left_right(image)
        img_w = tf.cast(tf.shape(image)[1], tf.float32)
        bbox = tf.stack([img_w - bbox[0] - bbox[2], bbox[1], bbox[2], bbox[3]])
    # Assumes source images are close to 640x640; large crops or padding would
    # shift the box coordinates and need a matching correction
    image = tf.image.resize_with_crop_or_pad(image, 640, 640)
    return image, bbox
3. Data Splitting: Divide the data into training, validation, and test sets for reliable model evaluation. A common target is roughly 70% training, 20% validation, and 10% test.
from sklearn.model_selection import train_test_split

# Hold out 10% for test, then 20% of the remainder for validation,
# which yields roughly a 72/18/10 split
train_data, test_data = train_test_split(data, test_size=0.1, random_state=42)
train_data, val_data = train_test_split(train_data, test_size=0.2, random_state=42)
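Because the roof-type counts are uneven (see the Challenges section below), it can also help to stratify both splits by class so each subset preserves the overall proportions; the same split, stratified:

# Same split as above, but stratified by roof type
train_data, test_data = train_test_split(
    data, test_size=0.1, random_state=42, stratify=data['category_id'])
train_data, val_data = train_test_split(
    train_data, test_size=0.2, random_state=42, stratify=train_data['category_id'])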
Step 4: Data Loading with TensorFlow
For object detection tasks, a custom data generator is often more flexible than TensorFlow’s image_dataset_from_directory function, since each image can have multiple bounding boxes and labels.
Create a Data Generator: Our generator will load images and associated bounding boxes.
class DataGenerator(tf.keras.utils.Sequence):
    # Each CSV row describes one bounding box, so a batch here is a batch of
    # boxes; an image with several dwellings appears once per box.
    def __init__(self, dataframe, img_dir, batch_size=32, img_size=(640, 640), shuffle=True):
        self.dataframe = dataframe
        self.img_dir = img_dir
        self.batch_size = batch_size
        self.img_size = img_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        # Number of full batches per epoch
        return int(len(self.dataframe) / self.batch_size)

    def __getitem__(self, index):
        batch_df = self.dataframe.iloc[index * self.batch_size:(index + 1) * self.batch_size]
        images, boxes, labels = self.__data_generation(batch_df)
        return images, {'bbox': boxes, 'label': labels}

    def on_epoch_end(self):
        # Reshuffle rows between epochs so batches differ each pass
        if self.shuffle:
            self.dataframe = self.dataframe.sample(frac=1).reset_index(drop=True)

    def __data_generation(self, batch_df):
        images, boxes, labels = [], [], []
        for _, row in batch_df.iterrows():
            img_path = f"{self.img_dir}/{row['image_id']}.jpg"
            image = tf.image.decode_jpeg(tf.io.read_file(img_path))
            image = tf.image.resize(image, self.img_size)
            images.append(image)
            boxes.append(row['bbox'])
            labels.append(row['category_id'])
        return (tf.convert_to_tensor(images),
                tf.convert_to_tensor(boxes),
                tf.convert_to_tensor(labels))
Step 5: Model Selection and Configuration
Object detection requires a model capable of both localization and classification. For this, we used a pre-trained Faster R-CNN model as well as EfficientDet, both available in TensorFlow’s Object Detection API.
Load and Configure the Model:
from object_detection.utils import config_util
from object_detection.builders import model_builder

# Load the pipeline config and build a detection model from it
pipeline_config = 'path/to/pipeline.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
model = model_builder.build(model_config=model_config, is_training=True)
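Building from the config gives you the architecture only. If you start from a TF2 Detection Model Zoo download, the pre-trained weights still need to be restored from its checkpoint; a minimal sketch, where the checkpoint path is a placeholder for wherever you extracted the archive:

import tensorflow as tf

# Restore pre-trained weights from the downloaded checkpoint
ckpt = tf.train.Checkpoint(model=model)
ckpt.restore('path/to/checkpoint/ckpt-0').expect_partial()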
Step 6: Model Training
Compile and Train: We’ll compile the model and start training using the custom data generator.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss={'bbox': 'mean_squared_error',
                    'label': 'sparse_categorical_crossentropy'})

train_generator = DataGenerator(train_data, 'images/train')
val_generator = DataGenerator(val_data, 'images/val')

model.fit(train_generator, validation_data=val_generator, epochs=50)
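Over 50 epochs it is also worth checkpointing the best weights and stopping early once the validation loss plateaus; a sketch using standard Keras callbacks (the file path is an assumption):

callbacks = [
    tf.keras.callbacks.ModelCheckpoint('models/best_model.h5', save_best_only=True),
    tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
]
model.fit(train_generator, validation_data=val_generator, epochs=50,
          callbacks=callbacks)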
Step 7: Model Evaluation and Visualization
Evaluate the Model: Check performance on the validation set using metrics like Mean Average Precision (mAP).
val_loss = model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
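mAP itself is usually computed with an external tool such as pycocotools, but its core building block is the intersection-over-union (IoU) between a predicted box and a ground-truth box. A minimal IoU helper for the [x_min, y_min, width, height] format used here:

def iou(box_a, box_b):
    # Convert [x_min, y_min, width, height] to corner coordinates
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    # Area of the intersection rectangle (zero if the boxes do not overlap)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0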
Visualize Predictions: Visualizing bounding boxes can give a qualitative sense of model accuracy.
import matplotlib.pyplot as plt

def visualize_predictions(image, true_boxes, pred_boxes):
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    # Ground-truth boxes in red, predicted boxes in green
    for box in true_boxes:
        plt.gca().add_patch(plt.Rectangle((box[0], box[1]), box[2], box[3],
                                          linewidth=2, edgecolor='r', facecolor='none'))
    for box in pred_boxes:
        plt.gca().add_patch(plt.Rectangle((box[0], box[1]), box[2], box[3],
                                          linewidth=2, edgecolor='g', facecolor='none'))
    plt.show()
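For example, to sanity-check the ground-truth boxes coming out of the validation generator (no predictions yet, so the second list is empty):

images, targets = val_generator[0]
visualize_predictions(images[0].numpy().astype('uint8'),
                      true_boxes=[targets['bbox'][0].numpy()],
                      pred_boxes=[])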
Step 8: Model Deployment
Save and Export the Model:
model.save('dwelling_detection_model.h5')
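The HDF5 file works for reloading in Keras; if you plan to serve with TensorFlow Serving or most cloud platforms, the SavedModel directory format is the more common target:

# Export as a TensorFlow SavedModel (a directory rather than a single file)
model.save('models/dwelling_detection_model')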
Create an Inference Script:
model = tf.keras.models.load_model('dwelling_detection_model.h5')

def predict(image_path):
    # Decode, resize to the training resolution, add a batch dimension, and predict
    image = tf.image.decode_jpeg(tf.io.read_file(image_path))
    image = tf.image.resize(image, (640, 640))
    pred = model.predict(tf.expand_dims(image, axis=0))
    return pred
Deployment Options: The model can be deployed as a web service for real-time predictions.
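As one option, here is a minimal sketch that wraps the predict function above in a Flask endpoint (Flask is an assumption; any serving framework works the same way):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def serve_prediction():
    # Expects an image file uploaded under the 'image' form field
    file = request.files['image']
    file.save('/tmp/upload.jpg')
    pred = predict('/tmp/upload.jpg')
    # model.predict may return an array or a list of arrays; normalize to lists
    return jsonify({'predictions': [p.tolist() for p in pred]})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)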
Challenges and Solutions
- Memory Management: Large image datasets can consume substantial memory, especially during training. Using data generators and batch processing helps keep memory usage manageable.
- Imbalanced Classes: With fewer examples for some roof types, training can become biased toward the majority class. Oversampling underrepresented classes can address this (see the sketch after this list).
- Localization Accuracy: The model might struggle with precise bounding box localization, especially for smaller dwellings. Higher-resolution imagery or additional bounding box refinement can improve results.
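As referenced above, a naive oversampling sketch with pandas: replicate rows of the minority classes until each roof type matches the majority count. Duplicated rows carry identical boxes, so this works best combined with the augmentation from Step 3:

import pandas as pd

counts = train_data['category_id'].value_counts()
max_count = counts.max()
# Sample each class up to the majority count, with replacement for minorities
balanced = pd.concat([
    group.sample(max_count, replace=True, random_state=42)
    for _, group in train_data.groupby('category_id')
]).reset_index(drop=True)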
Conclusion
In this project, we developed a model to detect and classify dwelling units from aerial drone images by using bounding box data and roof type labels. The structured approach we followed—from data preprocessing and augmentation to model deployment—provides a reliable framework for future projects in computer vision. This project showcases the potential of machine learning to support important social and environmental insights, particularly for rural and remote area analysis.