Writing a Split Function in C++
If you've ever worked with a language like Python, you might be familiar with the split function. It takes in a string and returns a list (Python's array type) of the substrings that are delimited by some input string. For example:
test_string = "This string, has random, commas, in it"
elements = test_string.split(",")
print(elements)
The output of this is:
['This string', ' has random', ' commas', ' in it']
split is a very useful function, and when I use Python it is an oft-used tool in my repertoire.
C++ unfortunately does not have a built-in split function, but fortunately, we can write one ourselves. There are many ways you could do this, but I'll outline one here and go through it line by line. You can also find it on GitHub.
#pragma once
#include <iostream>
#include <string>
#include <vector>

// Return array of elements of string in split by char c
std::vector<std::string> Split(const std::string& in, const char c)
{
    std::vector<std::string> elements;
    elements.reserve(in.length());
    int currentPosition = 0;
    int previousPosition = 0;
    while (currentPosition != in.length())
    {
        if (in[currentPosition] == c)
        {
            elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
            previousPosition = currentPosition + 1;
        }
        ++currentPosition;
    }
    elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
    return elements;
}
The function Split takes in two arguments:
- A std::string, in, which is the string that is to be split by some delimiter
- A char, c, which is the delimiter that you split by
Note that in this implementation, a pattern of more than one char is not supported. The Python version of the function does support splitting by a string of more than one character, and this can be added to this function if needed.
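To give an idea of what that extension might look like, here is a minimal sketch that uses std::string::find to locate a multi-character delimiter. The name SplitByString and the handling of an empty delimiter are my own assumptions for illustration (Python raises an error for an empty separator; this sketch just returns the whole input as one element).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant of Split that takes a string delimiter instead of a char.
// Assumption: an empty delimiter returns the whole input as a single element.
std::vector<std::string> SplitByString(const std::string& in, const std::string& delimiter)
{
    std::vector<std::string> elements;
    if (delimiter.empty())
    {
        elements.push_back(in);
        return elements;
    }

    std::size_t previousPosition = 0;
    std::size_t currentPosition = in.find(delimiter, previousPosition);
    while (currentPosition != std::string::npos)
    {
        elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
        // Skip over the whole delimiter, not just a single character
        previousPosition = currentPosition + delimiter.length();
        currentPosition = in.find(delimiter, previousPosition);
    }
    // Everything after the last delimiter (or the whole string if none was found)
    elements.push_back(in.substr(previousPosition));
    return elements;
}
```

With this sketch, SplitByString("a, b, c", ", ") should give the three elements a, b and c, with no leading spaces left over.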
The return value is a std::vector<std::string>, i.e. an array of strings.
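As a quick sanity check, Split can be run on the same string as the Python example. The sketch below repeats Split so that it compiles on its own, and adds FormatElements, a hypothetical helper of my own that joins the elements in the style Python uses when printing a list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split as defined in this article
std::vector<std::string> Split(const std::string& in, const char c)
{
    std::vector<std::string> elements;
    elements.reserve(in.length());
    int currentPosition = 0;
    int previousPosition = 0;
    while (currentPosition != in.length())
    {
        if (in[currentPosition] == c)
        {
            elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
            previousPosition = currentPosition + 1;
        }
        ++currentPosition;
    }
    elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
    return elements;
}

// Hypothetical helper: formats the elements the way Python prints a list
std::string FormatElements(const std::vector<std::string>& elements)
{
    std::string out = "[";
    for (std::size_t i = 0; i < elements.size(); ++i)
    {
        if (i != 0)
        {
            out += ", ";
        }
        out += "'" + elements[i] + "'";
    }
    out += "]";
    return out;
}
```

Calling FormatElements(Split("This string, has random, commas, in it", ',')) should produce ['This string', ' has random', ' commas', ' in it'], the same as the Python output, leading spaces included.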
Starting with the first two lines:
std::vector<std::string> elements;
elements.reserve(in.length());
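We allocate a vector of strings, which will later be returned by the function when it's been populated. As a small optimisation we reserve the length of the in string up front. A string of n characters can produce at most n + 1 elements (when every character is a delimiter), so this avoids reallocation in all but that extreme case, although it reserves far more space than most inputs need.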
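If reserving that much space is a concern, one alternative is to count the delimiters first, since Split always produces exactly one more element than there are delimiters in the string. This costs an extra pass over the input, so treat it as a sketch of a trade-off and not a clear improvement. CountElements is a hypothetical helper name, not part of the original function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: the exact number of elements Split would produce,
// which is one more than the number of delimiters in the string.
std::size_t CountElements(const std::string& in, const char c)
{
    return static_cast<std::size_t>(std::count(in.begin(), in.end(), c)) + 1;
}
```

Inside Split, the reserve call would then become elements.reserve(CountElements(in, c)).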
Next we define two variables that will increment in the while loop:
int currentPosition = 0;
int previousPosition = 0;
- currentPosition will increment each loop, and tracks where we are on each iteration as we go through the in string
- previousPosition will only change every time we hit our char c, and it enables us to get each new string between instances of the delimiter c
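One detail worth knowing here: in.length() returns an unsigned type (std::string::size_type, which is std::size_t for std::string), so comparing it against an int typically draws a sign-comparison warning from the compiler, and an int could in principle overflow on an extremely long string. Below is a sketch of the same function with the two positions stored as std::size_t instead. The name SplitSizeT is only there to distinguish it from the original.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same logic as Split, but with unsigned positions that match in.length()
std::vector<std::string> SplitSizeT(const std::string& in, const char c)
{
    std::vector<std::string> elements;
    elements.reserve(in.length());
    std::size_t currentPosition = 0;
    std::size_t previousPosition = 0;
    while (currentPosition != in.length())
    {
        if (in[currentPosition] == c)
        {
            elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
            previousPosition = currentPosition + 1;
        }
        ++currentPosition;
    }
    elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
    return elements;
}
```

The behaviour is the same as the int version for any string of ordinary length.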
We now start a while loop, which starts at position 0 and stops once we've gone through the entire string in:
while (currentPosition != in.length())
As we are going through in character by character, we can check whether the current character is c, which we do as the first thing in the while loop. Strings are arrays of characters, so we can access each character by its index as we would in any array:
if (in[currentPosition] == c)
If it is, we get the substring of in between previousPosition and currentPosition and add it to our array:
elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
The function substr takes in a starting position and the number of characters we want to take from that position onwards, which is why we pass in currentPosition - previousPosition as our second argument.
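Because the second argument is a count and not an end index, it is easy to get this wrong. Here is a small sketch that isolates the call. Between is a hypothetical helper I've made up for illustration; it is not part of Split.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the call inside Split: returns the characters
// of in from index start up to, but not including, index end.
std::string Between(const std::string& in, std::size_t start, std::size_t end)
{
    // substr takes (position, count), so the count is end - start
    return in.substr(start, end - start);
}
```

For the string "one,two,three", Between(in, 4, 7) gives "two", whereas calling in.substr(4, 7) directly takes seven characters from index 4 and gives "two,thr".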
We then set previousPosition to currentPosition plus 1. currentPosition is currently sitting on an instance of our char c, so adding 1 means that next time we grab the substring, we start on the character after c:
previousPosition = currentPosition + 1;
As the final thing in our while loop, we increment currentPosition every iteration of the loop, as that is just tracking which character of the string we're on and doesn't care about our conditions:
++currentPosition;
After we've completed the while loop, we know that we've reached the end of the string in, but we haven't yet added the final substring of in, the one after the last instance of c. So we need to get the substring one more time to finalise our return array:
elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
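Since this final push_back runs unconditionally, Split always returns at least one element, and that element can be an empty string. As far as I can tell this matches what Python's split(",") does for the same inputs. The sketch below repeats Split so it compiles on its own and checks a few of these cases; CheckEdgeCases is a hypothetical function added purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split as defined in this article
std::vector<std::string> Split(const std::string& in, const char c)
{
    std::vector<std::string> elements;
    elements.reserve(in.length());
    int currentPosition = 0;
    int previousPosition = 0;
    while (currentPosition != in.length())
    {
        if (in[currentPosition] == c)
        {
            elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
            previousPosition = currentPosition + 1;
        }
        ++currentPosition;
    }
    elements.push_back(in.substr(previousPosition, currentPosition - previousPosition));
    return elements;
}

// Hypothetical checks for inputs where the final push_back matters
void CheckEdgeCases()
{
    // No delimiter: the loop adds nothing, so the final push_back supplies the only element
    const std::vector<std::string> noDelimiter = Split("abc", ',');
    assert(noDelimiter.size() == 1 && noDelimiter[0] == "abc");

    // Trailing delimiter: the final push_back adds an empty string
    const std::vector<std::string> trailing = Split("a,", ',');
    assert(trailing.size() == 2 && trailing[0] == "a" && trailing[1].empty());

    // Empty input: the result is a single empty string, not an empty vector
    const std::vector<std::string> empty = Split("", ',');
    assert(empty.size() == 1 && empty[0].empty());
}
```

If you would prefer to drop empty elements, that is a filtering step you would need to add yourself after calling Split.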
Finally, we return the array that is now populated with all strings that were delimited by c:
return elements;