Writing a Split Function in C++

If you've ever worked with a language like Python, you might be familiar with the split function. It takes a string and returns a list (Python's array type) of the substrings found between occurrences of a given delimiter string. For example:

test_string = "This string, has random, commas, in it"

elements = test_string.split(",")
print(elements)

The output of this is:

['This string', ' has random', ' commas', ' in it']

split is a very useful function, and when I use Python it is an oft-used tool in my repertoire.

std::string in C++ unfortunately does not have a built-in split function, but fortunately, we can write one ourselves. There are many ways you could do this, but I'll outline one here and go through it line by line. You can also find it on GitHub.

#pragma once

#include <string>
#include <vector>

// Returns the substrings of in that are separated by the delimiter c.
// Marked inline so this header can be included from more than one source file.
inline std::vector<std::string> Split(const std::string& in, const char c)
{
	std::vector<std::string> elements;
	elements.reserve(in.length());

	int currentPosition = 0;
	int previousPosition = 0;
	while (currentPosition != in.length())
	{
		if (in[currentPosition] == c)
		{
			elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
			previousPosition = currentPosition + 1;
		}
		++currentPosition;
	}
	elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));

	return elements;
}
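To check that this behaves like the Python example, here is a minimal sketch of how it might be called. Split is repeated so the snippet compiles on its own, with two changes of mine: the positions are std::size_t rather than int to avoid a signed/unsigned comparison, and the reserve call is left out for brevity. FormatElements is a hypothetical helper I've added purely to print the result the way Python prints a list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same algorithm as the listing above, repeated so this snippet is self-contained.
// Assumption: positions are std::size_t instead of int, and reserve is omitted.
std::vector<std::string> Split(const std::string& in, const char c)
{
	std::vector<std::string> elements;
	std::size_t currentPosition = 0;
	std::size_t previousPosition = 0;
	while (currentPosition != in.length())
	{
		if (in[currentPosition] == c)
		{
			elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
			previousPosition = currentPosition + 1;
		}
		++currentPosition;
	}
	elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
	return elements;
}

// Hypothetical helper (not part of the original code): formats the elements
// in the style of Python's list output, e.g. ['a', 'b'].
std::string FormatElements(const std::vector<std::string>& elements)
{
	std::string out = "[";
	for (std::size_t i = 0; i < elements.size(); ++i)
	{
		if (i != 0)
		{
			out += ", ";
		}
		out += "'" + elements[i] + "'";
	}
	return out + "]";
}
```

Calling FormatElements(Split("This string, has random, commas, in it", ',')) gives ['This string', ' has random', ' commas', ' in it'], which matches the Python output. Note that, as in Python, the leading spaces are kept.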

The function Split takes in two arguments:

  • A std::string, in, which is the string to be split
  • A char, c, which is the delimiter to split on

Note that in this implementation, a pattern of more than one char is not supported. The Python version of the function does support splitting by a string of more than one character, and this can be added to this function if needed.
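As a rough idea of what that extension could look like, here is a sketch that uses std::string::find to jump from one delimiter to the next. The name SplitByString is my own, and so is the decision to return the input unsplit when the delimiter is empty (Python raises an error in that case instead).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant of Split that accepts a multi-character delimiter.
// Assumption: an empty delimiter returns the whole input as a single element.
std::vector<std::string> SplitByString(const std::string& in, const std::string& delimiter)
{
	std::vector<std::string> elements;
	if (delimiter.empty())
	{
		elements.push_back(in);
		return elements;
	}

	std::size_t previousPosition = 0;
	std::size_t currentPosition = in.find(delimiter, previousPosition);
	while (currentPosition != std::string::npos)
	{
		// Everything between the previous delimiter and this one is an element.
		elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
		// Skip over the whole delimiter, not just one character.
		previousPosition = currentPosition + delimiter.length();
		currentPosition = in.find(delimiter, previousPosition);
	}
	// The remainder after the last delimiter (possibly empty).
	elements.push_back(in.substr(previousPosition));
	return elements;
}
```

With this version, SplitByString("a, b, c", ", ") produces the three elements "a", "b" and "c" without the leading spaces, because the space is now part of the delimiter.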

The return value is a std::vector<std::string>, i.e. a dynamic array of strings.

Starting with the first two lines:

std::vector<std::string> elements;
elements.reserve(in.length());

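We create an empty vector of strings, which will be returned by the function once it's been populated. As a small optimisation we reserve capacity up front, so the vector doesn't have to reallocate as it grows. The length of the in string is a generous upper bound: the function produces one more element than there are delimiters, so only an empty string or a string made up entirely of delimiters can exceed it, and then only by one. The flip side is that a long string with few delimiters reserves far more space than it ends up using.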
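If the over-allocation matters, one option is to count the delimiters first, since Split always produces exactly one more element than there are delimiters in the string. The sketch below uses std::count for that; CountElements is a hypothetical helper name of mine, and the trade-off is an extra pass over the string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: the exact number of elements that splitting in by c yields,
// i.e. the number of delimiters plus one.
std::size_t CountElements(const std::string& in, const char c)
{
	return static_cast<std::size_t>(std::count(in.begin(), in.end(), c)) + 1;
}
```

Inside Split this would be used as elements.reserve(CountElements(in, c)); in place of the length-based call. Whether that is actually faster depends on your inputs, so treat it as something to measure rather than a guaranteed win.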

Next we define two variables that will increment in the while loop:

int currentPosition = 0;
int previousPosition = 0;

  • currentPosition will increment each loop, and tracks where we are on each iteration as we go through the in string
  • previousPosition will only change every time we hit our char c, and it enables us to get each new string between instances of the delimiter c

We now start a while loop, which will go from 0 and stop when we've gone through the entire string in:

while (currentPosition != in.length())

As we are going through in character by character, we can check whether the current character is c, which we do as the first thing in the while loop. A std::string stores its characters contiguously, so we can access each character by its index just as we would in an array:

if (in[currentPosition] == c)

If it is, we get the substring of in between previousPosition and currentPosition and add it to our array:

elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));

The function substr takes a starting position and the number of characters to take from that position (not an end position), which is why we pass currentPosition - previousPosition as the second argument.
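Because substr takes a count rather than an end index, it's easy to get this wrong. Here is a small sketch that makes the conversion explicit; Between is a hypothetical helper of mine, not something from the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the characters of s in the half-open range [begin, end).
// Assumes begin <= end <= s.length().
std::string Between(const std::string& s, std::size_t begin, std::size_t end)
{
	// substr wants (start, count), so convert the end index into a count.
	return s.substr(begin, end - begin);
}
```

For the string "red,green,blue", the commas are at indices 3 and 9, so Between(s, 4, 9) returns "green". Calling s.substr(4, 9) directly would instead take nine characters starting at index 4 and return "green,blu".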

We then set previousPosition to currentPosition plus 1. At this point currentPosition is the index of an instance of our char c, so the next time we grab a substring, it starts on the character after c:

previousPosition = currentPosition + 1;

As the final step in our while loop, we increment currentPosition on every iteration, whether or not we found a delimiter, as it simply tracks which character of the string we're on:

++currentPosition;

After we've completed the while loop, we know that we've reached the end of the string in, but we haven't yet added the final substring: the one that runs from just after the last instance of c to the end of in (or the whole string, if c never appeared). So we need to get the substring one more time to finalise our return array:

elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
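To make it clearer what that last push_back contributes, here is a sketch that computes the same final element directly with std::string::rfind. LastElement is a hypothetical name of mine and is only meant as an illustration, not as a replacement for the line above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the element that Split's final push_back adds,
// i.e. everything after the last occurrence of c.
std::string LastElement(const std::string& in, const char c)
{
	const std::size_t lastDelimiter = in.rfind(c);
	if (lastDelimiter == std::string::npos)
	{
		// No delimiter at all: the whole string is the only element.
		return in;
	}
	// May be empty when the string ends with the delimiter.
	return in.substr(lastDelimiter + 1);
}
```

This shows two edge cases worth knowing about: a string with no delimiter comes back as a single element, and a string that ends with the delimiter, such as "a,b,", gets an empty string as its last element. Both match what Python's split does.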

Finally, we return the array that is now populated with all strings that were delimited by c:

return elements;