Use appropriate data structure to store data

In this article, I will demonstrate why usage of appropriate data structure important and if not selected properly then it can cause slowness in our code.

1. What is data structure?

Using appropriate data structure to store data in paramount. Every engineer would agree to this.

Let us first define what data structure mean : Wikipedia states “a data structure is a data organization, management, and storage format that enables efficient access and modification.”

2. Use-Case

Now let us understand using an example why selection of correct data structure is important. Let us say that we have a request validation that accepts country code as parameter. If the country code exists in our list of countries we serve our services, then we will let the request proceed else we throw ValidationException. Assume that for every request we will get this parameter and it may or may not be valid hence the validation needs to happen.

We will assume one more thing, we already have an array of country codes.

The current code looks like this

private static final String[] countries = Country.getCountries();

void validateCountry(String countryCode) {
	if (!Arrays.asList(countries).contains(countryCode)) {
		throw new ValidationException("Country code is not valid.");
	}
}

Country.getCountries() : returns String[] of country codes. 

We convert the String[] into List<String> and just search if countryCode is present in List. If not then throw Exception else don’t do anything.

3. What is the problem?

So what is the problem with the code? Remember, country codes are unique. List is not the right data structure to search unique values as List will always so linear scan of elements. You might think, how bad can that be? Well it turns out, pretty bad. Why? Assume your service is taking 1 million requests per day. Worst case time complexity to search in List is O(N). Let us assume N = 100 i.e. it will take 100 iterations for List to search for the country code passed in request. How many iterations per day? 1 million requests per day * 100 iterations/request = 100 million iterations. Now you get the idea.

4. Solution to the problem.

Let us exploit the property of unique country codes. What we can do is to convert the array to Set and query Set for country code. We will use HashSet as collection. Using HashSet gives us an advantage to search a values as HashSet can search for element in O(1) that is constant time. Now the iterating is not needed at all while searching the value.

Question remains how to convert array to Set? Below are 3 different ways to do so.

Convert array to Set using plain JDK classes

private static final Set<String> COUNTRIES = 
		Arrays
			.stream(Country.getCountries()) 
			.collect(
				Collectors.collectingAndThen(
								Collectors.toSet(), 
								Collections::unmodifiableSet));

It might be better to use Google Guava Library to convert array to Set as it provides utility to do so.

private static final Set<String> COUNTRIES = 
				Collections.unmodifiableSet(
						Sets.newHashSet(Country.getCountries()));

My preferred method is as follows, it uses Google Guava’s class ImmutableSet : 

private static final Set<String> 
COUNTRIES =  ImmutableSet.copyOf(Country.getCountries());

validateCountry method will use Set to check if country code exists or not.

void validateCountry(String country) {
	if (!COUNTRIES.contains(country)) {
		throw new ValidationException("Country code is not valid.");
	}
}

Existing test case would work just fine. 

5. Conclusion

The impact of using wrong data structure can catastrophic. Always step back and think about what the data is, what it implies and how will that data be used. Once you answer these questions you will be able to select the correct data structure to use.

Leave a Reply

Your email address will not be published. Required fields are marked *