It’s sometimes necessary either to enumerate or simply to count the various components of a text in our apps. Examples include finding the total number of characters, words, paragraphs, lines, and more, in all or part of the text. Quite probably the first thought would be to do some custom work: break the original string into pieces based on the space character, the newline character, and so on, then count the resulting parts and eventually act on them. However, we really don’t have to reinvent the wheel, as the Foundation framework gives us the tools to do all that in Swift.
As you’ll see in the next parts of this post, it’s quite trivial to get words, paragraphs and other text components from a string; it’s all there in the Foundation framework. But these are the kind of APIs that nobody really cares about until they need such functionality in their apps.
The basics
Starting with the fundamentals in an Xcode playground, let’s make the following “lorem ipsum” the sample text to work on in this post:
    let text = """
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ac felis donec et odio pellentesque diam volutpat commodo sed. Curabitur gravida arcu ac tortor dignissim convallis aenean et tortor. Malesuada fames ac turpis egestas maecenas pharetra convallis. Phasellus egestas tellus rutrum tellus pellentesque eu tincidunt tortor. Ullamcorper eget nulla facilisi etiam dignissim diam quis. Adipiscing commodo elit at imperdiet dui accumsan sit amet. Nibh ipsum consequat nisl vel. Tempor orci eu lobortis elementum nibh. Vestibulum lorem sed risus ultricies tristique nulla. Bibendum at varius vel pharetra vel. Sed risus ultricies tristique nulla. Fermentum iaculis eu non diam. Eu turpis egestas pretium aenean pharetra magna. Aliquam malesuada bibendum arcu vitae elementum curabitur vitae nunc sed. Egestas erat imperdiet sed euismod nisi porta lorem mollis.
    """
The first thing to focus on, for the sake of completeness, is how we can get the total number of characters in the above text. That’s easy and requires no specific API; most likely you already know how to do it by accessing the count property of text:
    text.count
The above gives us the length of the text string wherever we need it. If we’d rather have something more convenient than this statement, we could return it from a read-only computed property like so:
    var totalCharacters: Int {
        text.count
    }
totalCharacters now provides exactly what its name says. The above is just for convenience, and probably for more clarity in a codebase. Other than that, it doesn’t offer anything different or new.
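As a small aside on what count actually measures: in Swift it counts Character values (user-perceived characters), not bytes or UTF-16 code units. The sketch below uses a made-up string, accented, that is not part of the sample text, purely to illustrate the difference:

```swift
// "e" followed by a combining acute accent: two Unicode scalars, one visible character.
let accented = "cafe\u{301}"

print(accented.count)                 // 4 (c, a, f, é)
print(accented.unicodeScalars.count)  // 5
print(accented.utf16.count)           // 5
```

For plain text like the lorem ipsum above, all three views report the same number, so the distinction only matters once accents, emoji or similar characters show up.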
The interesting part begins when we want to go further than that, and enumerate text components, or simply count them, such as the words contained in it.
Enumerating text components
The Foundation framework contains a particular method that we can invoke on a String value, named enumerateSubstrings(in:options:_:). Its purpose is quite specific: to enumerate the substrings of the string it is called on, based on the given options. More precisely, depending on the value that we supply as the second argument, the substrings that we get back are going to be words, full paragraphs, lines, sentences and more. For the complete list, see the documentation of String.EnumerationOptions.
Regarding the other two expected arguments, the first one is the range of the string that we are interested in. This can include the entire length, or just a part of it. The last argument is a closure, and that’s the place where we get and handle the substrings. It’s also the place where we can stop the enumeration on demand if necessary; we’ll see all that right next.
Getting straight to the point now, the following will enumerate all words in the given string:
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { substring, substringRange, enclosingRange, stop in

    }
The first thing to notice here is the range of the string that we want substrings for. The text.startIndex..<text.endIndex range covers the entire length, but you may specify a different range if circumstances demand so.
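To illustrate a narrower range, here is a minimal sketch that only enumerates the words before the first comma. The sample string and the firstClauseWords array are assumptions made for this example; they are not part of the code used in the rest of the post:

```swift
import Foundation

let sample = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

// Assumed cut-off point for this example: the first comma, or the end if there is none.
let end = sample.firstIndex(of: ",") ?? sample.endIndex
var firstClauseWords = [String]()

// Only the part of the string inside the given range is enumerated.
sample.enumerateSubstrings(in: sample.startIndex..<end, options: .byWords) { substring, _, _, _ in
    if let word = substring {
        firstClauseWords.append(word)
    }
}

print(firstClauseWords)
// It prints: ["Lorem", "ipsum", "dolor", "sit", "amet"]
```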
The second argument describes the kind of text components that we would like to enumerate. In this particular example we indicate that we’d like to enumerate words by providing the byWords value as the argument; we’ll get the text’s words as substrings.
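Switching to another kind of component is only a matter of changing that option. The following sketch, which uses a short made-up string instead of the sample text, enumerates sentences with the bySentences value:

```swift
import Foundation

let sample = "Swift is fun. Foundation helps a lot! Does it scale?"
var sentences = [String]()

sample.enumerateSubstrings(in: sample.startIndex..<sample.endIndex, options: .bySentences) { substring, _, _, _ in
    if let sentence = substring {
        // Trimmed here because a sentence substring may carry trailing whitespace.
        sentences.append(sentence.trimmingCharacters(in: .whitespacesAndNewlines))
    }
}

print(sentences.count)
// It prints: 3
```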
As already mentioned, the last argument is a closure, and we have to deal with the substrings in it. However, note that we don’t get them all at once. The method works in iterations, so we have only one substring at a time in the closure. Its parameter list contains the following:
substring: The current substring (word, paragraph, etc.) as a String value. Note that this is an optional value, so always make sure that it’s not nil before using it. Additional information is coming up next.
substringRange: The range of the current substring in the original string.
enclosingRange: The range of the substring as before, but also including the character or characters that follow it, such as the next space character, the full stop (period) symbol, and so on.
stop: A boolean value indicating whether the enumeration should stop. If there is a reason to end the process early, set it to true; otherwise simply don’t do anything.
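The parameters listed above can be seen together in the following sketch. It is only an illustration on a short made-up string; the words array and the rangesAreNested flag are names introduced here, not something the method provides:

```swift
import Foundation

let sample = "Hello, world. Bye!"
var words = [String]()
var rangesAreNested = true

sample.enumerateSubstrings(in: sample.startIndex..<sample.endIndex, options: .byWords) { substring, substringRange, enclosingRange, stop in
    // substring is optional, so unwrap it before use.
    guard let word = substring else { return }
    words.append(word)

    // The enclosing range is expected to contain the substring range.
    if !(enclosingRange.lowerBound <= substringRange.lowerBound
         && substringRange.upperBound <= enclosingRange.upperBound) {
        rangesAreNested = false
    }

    // Print the word next to its enclosing text, with spaces made visible.
    let enclosing = sample[enclosingRange].replacingOccurrences(of: " ", with: "_")
    print("\(word) -> \(enclosing)")

    // stop is left untouched here, so the enumeration runs to the end.
}

print(words)
// It prints: ["Hello", "world", "Bye"]
```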
Accessing substrings
The safest way to collect all substrings is by using the substringRange parameter value, which provides the range of the current substring. Suppose that we have this array:
    var components = [Substring]()
In the closure that we provide to the method we can append each new substring to the above array as follows:
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { substring, substringRange, enclosingRange, stop in
        components.append(text[substringRange])
    }
Printing the contents of components will show this:
    Lorem
    ipsum
    dolor
    sit
    amet
    consectetur
    adipiscing
    …
    …
In exactly the same way we could use the enclosingRange value instead, and include subsequent characters in the substring as well:
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { substring, substringRange, enclosingRange, stop in
        components.append(text[enclosingRange])
    }
Before seeing the output, let’s replace the non-visible space characters with underscores in each substring:
    components.forEach { print($0.replacingOccurrences(of: " ", with: "_")) }
The printed values would look like this:
    Lorem_
    ipsum_
    dolor_
    sit_
    amet,_
    consectetur_
    adipiscing_
    elit,_
    …
    …
Note that each item in the components array is a Substring and not a String value. To manipulate it as a String, remember to initialize a String value with a substring item first:
    let string = String(components[0])
    print(string)

    // It prints: Lorem
The substring parameter value
I mentioned previously that the substring parameter value of the closure might be nil. Actually, this is not the default case; normally we keep receiving all substrings as String values through substring as well. However, if we are not really interested in them, and given that we can also get substrings using their range, we can force substring to be nil by passing the .substringNotRequired value as an additional option to the enumerateSubstrings(in:options:_:) method:
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: [.byWords, .substringNotRequired]) { substring, substringRange, enclosingRange, stop in
        print(substring)
    }

    // It prints:
    // nil
    // nil
    // nil
    // …
Stopping enumeration
There might be times when going through the entire text is not necessary. In those cases, we can simply stop enumerating substrings.
To see how that works, suppose that we want to get only the first five words in the sample text we have here. With the assistance of a variable, we’ll be counting the number of encountered substrings until we reach the desired limit. When we get there, all we have to do is to set the stop parameter to true. Note that we can do so because it’s an inout value:
    // Counter for the words met so far.
    var wordCount = 0

    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { substring, substringRange, enclosingRange, stop in
        print(text[substringRange])

        wordCount += 1
        if wordCount == 5 {
            stop = true
        }
    }

    // It prints:
    // Lorem
    // ipsum
    // dolor
    // sit
    // amet
The last parameter of the closure is conventionally named stop, although Swift does not require that particular name; what matters is that it’s the fourth parameter. If you’re not planning to use it, you can replace it with the underscore symbol. Actually, we can do the same for every parameter that plays no role in our implementation, and keep only what we need. For instance:
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { _, substringRange, _, _ in
        …
    }
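If stopping early is needed in more than one place, the same idea could be wrapped in a small function. What follows is just a sketch: firstWords(in:limit:) is a hypothetical helper written for this example, not something Foundation offers.

```swift
import Foundation

/// Hypothetical helper: returns at most `limit` words from the start of `text`.
func firstWords(in text: String, limit: Int) -> [String] {
    var words = [String]()
    guard limit > 0 else { return words }

    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { substring, _, _, stop in
        if let word = substring {
            words.append(word)
        }
        // Setting the inout flag ends the enumeration after this iteration.
        if words.count == limit {
            stop = true
        }
    }

    return words
}

print(firstWords(in: "Lorem ipsum dolor sit amet, consectetur adipiscing elit", limit: 5))
// It prints: ["Lorem", "ipsum", "dolor", "sit", "amet"]
```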
Getting the words count
Now that the various parts of the enumerateSubstrings(in:options:_:) method have been explained, we can focus on counting the words in the given text. There are only two simple things to do; the first is to keep all substrings in an array as shown previously:
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { _, substringRange, _, _ in
        components.append(text[substringRange])
    }
The second step is to simply count the items contained in the components array; that is the total number of words in the text string:
    print("\(components.count) Words")

    // It prints:
    // 133 Words
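When only the number matters and the words themselves are not needed, a possible alternative is to skip the array and just increment a counter, combining byWords with the .substringNotRequired option discussed earlier. The sketch below uses a short made-up string and a wordCount variable introduced for this example:

```swift
import Foundation

let sample = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
var wordCount = 0

// No substring is requested and none is stored; each iteration just bumps the counter.
sample.enumerateSubstrings(in: sample.startIndex..<sample.endIndex, options: [.byWords, .substringNotRequired]) { _, _, _, _ in
    wordCount += 1
}

print("\(wordCount) Words")
// It prints:
// 8 Words
```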
One implementation for various enumeration options
In text-related apps, we often need to report the total number of words, paragraphs, lines and so on. However, implementing a call to the enumerateSubstrings(in:options:_:) method an equal number of times does not sound like a good idea; we’d be breaking a basic rule in software engineering called DRY (Don’t Repeat Yourself).
To avoid that, we can resort to a simple, yet handy approach: define a method that invokes enumerateSubstrings(in:options:_:) once, but accepts the enumeration options as an argument. By doing that, we’ll be able to use the same method for different kinds of results simply by providing the proper enumeration option value.
The implementation of that method is shown right next; everything that’s included in it has already been discussed:
    func countTextComponents(options: String.EnumerationOptions = .byWords) -> Int {
        var components = [Substring]()

        text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: options) { _, substringRange, _, _ in
            components.append(text[substringRange])
        }

        return components.count
    }
It’s now easy to fetch the total number of the various components in the text. For instance, see the next method that prints such values and makes use of countTextComponents(options:):
    func showTextCountInfo() {
        print("\(totalCharacters) Characters")
        print("\(countTextComponents()) Words")
        print("\(countTextComponents(options: .byParagraphs)) Paragraphs")
        print("\(countTextComponents(options: .byLines)) Lines")
    }

    showTextCountInfo()

    // It prints:
    // 927 Characters
    // 133 Words
    // 5 Paragraphs
    // 5 Lines
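The function above depends on the text constant of this post. One way to make the same idea reusable with any string would be a String extension; the following is a sketch under that assumption, where componentCount(_:) is a hypothetical name and the sample string is made up for the example:

```swift
import Foundation

extension String {
    /// Hypothetical helper: counts the components of the receiver for the given options.
    func componentCount(_ options: String.EnumerationOptions = .byWords) -> Int {
        var count = 0

        // Substrings are not needed for counting, so they are not requested.
        enumerateSubstrings(in: startIndex..<endIndex, options: options.union(.substringNotRequired)) { _, _, _, _ in
            count += 1
        }

        return count
    }
}

let note = "First line here.\nSecond line."

print(note.componentCount())               // 5
print(note.componentCount(.byLines))       // 2
print(note.componentCount(.byParagraphs))  // 2
```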
Conclusion
Enumerating substrings in a string, or counting the various text components, is easier than we all probably thought initially; one particular method is there to provide us with everything we need, as long as we know how to use it. There are more methods like the one presented here hidden in the Foundation framework that serve specific purposes, and I may talk about them in future posts. Until then, I hope you found today’s topic valuable, and that you discovered a new tool to use while coding apps in Swift.
Thanks for reading!