Enumerating and Counting Text Components in Swift

April 1st, 2022

In our apps we sometimes need to enumerate, or just count, the components of a text: the total number of characters, words, paragraphs, lines and so on, in all or part of the text. The first instinct is often to do it by hand; to split the original string on spaces, newline characters and the like, count the resulting parts, and eventually act on them. However, we really do not have to reinvent the wheel, as Swift, through the Foundation framework, provides us with the tools to do all that.

As you’ll see in the next parts of this post, it’s quite trivial to get words, paragraphs and other text components from a string; it’s all there in the Foundation framework. But these are the kind of APIs that nobody really notices until an app actually needs them.

The basics

Starting with the fundamentals in an Xcode playground, let’s make the following “lorem ipsum” the sample text to work on in this post:

let text = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ac felis donec et odio pellentesque diam volutpat commodo sed. Curabitur gravida arcu ac tortor dignissim convallis aenean et tortor. Malesuada fames ac turpis egestas maecenas pharetra convallis.

Phasellus egestas tellus rutrum tellus pellentesque eu tincidunt tortor. Ullamcorper eget nulla facilisi etiam dignissim diam quis. Adipiscing commodo elit at imperdiet dui accumsan sit amet. Nibh ipsum consequat nisl vel. Tempor orci eu lobortis elementum nibh. Vestibulum lorem sed risus ultricies tristique nulla.

Bibendum at varius vel pharetra vel. Sed risus ultricies tristique nulla. Fermentum iaculis eu non diam. Eu turpis egestas pretium aenean pharetra magna. Aliquam malesuada bibendum arcu vitae elementum curabitur vitae nunc sed. Egestas erat imperdiet sed euismod nisi porta lorem mollis.
"""

For the sake of completeness, the first thing to look at is how to get the total number of characters in the above text. That’s easy and needs no special API; you most likely know already that it’s a matter of accessing the count property of text:

text.count

The above gives us the length of the text string anywhere we use it. If we’d rather have something more readable than this statement, we could return it from a read-only computed property like so:

var totalCharacters: Int {
    text.count
}

totalCharacters now provides exactly what its name says. It exists just for convenience, and probably for more clarity in a codebase; other than that, it doesn’t offer anything different or new.
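One detail worth keeping in mind is that count counts Swift Character values, meaning user-perceived characters, and not bytes or UTF-16 code units. The following is a minimal sketch of how the different views of a string can disagree; the accented sample string is made up for illustration and is not part of the post's sample text:

```swift
// "e" followed by a combining acute accent renders as a single "é".
let accented = "e\u{301}"

let characterCount = accented.count               // user-perceived characters
let scalarCount = accented.unicodeScalars.count   // Unicode scalars
let utf16Count = accented.utf16.count             // UTF-16 code units
let utf8Count = accented.utf8.count               // UTF-8 bytes

print(characterCount, scalarCount, utf16Count, utf8Count)
// It prints: 1 2 2 3
```

For plain text such as the lorem ipsum sample all four numbers match, so the distinction only matters once accents, emoji or other combined characters show up.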

The interesting part begins when we want to go further than that, and enumerate text components, or simply count them, such as the words contained in it.

Enumerating text components

The Foundation framework adds a particular method to String, named enumerateSubstrings(in:options:_:). Its purpose is quite specific; to enumerate the substrings of the string it is called on, based on the given options. To be more precise, depending on the value supplied as the second argument, the substrings that we get back are going to be words, full paragraphs, lines, sentences and more. For the complete list, see the documentation of String.EnumerationOptions.

Regarding the other two expected arguments, the first one is the range of the string that we are interested in. This can include the entire length, or just a part of it. The last argument is a closure, and that’s the place where we get and handle the substrings. It’s also the place where we can stop the enumeration on demand if necessary; we’ll see all that right next.

Getting straight to the point now, the following will enumerate all words in the given string:

text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                         options: .byWords) { substring, substringRange, enclosingRange, stop in

}

The first thing to notice here is the range of the string that we want substrings for. text.startIndex..<text.endIndex covers the entire length, but you may specify a different range if circumstances demand it.
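As a rough illustration of a narrower range, the sketch below limits the enumeration to the first 11 characters of a string. The sample string and the names upperBound, partialRange and wordsInRange are assumptions made for this example only:

```swift
import Foundation

// Hypothetical sample string, not the post's lorem ipsum text.
let sample = "Lorem ipsum dolor sit amet"

// Limit the enumeration to the first 11 characters ("Lorem ipsum").
// index(_:offsetBy:limitedBy:) returns nil if the string is shorter than
// the requested offset, in which case we fall back to the end of the string.
let upperBound = sample.index(sample.startIndex,
                              offsetBy: 11,
                              limitedBy: sample.endIndex) ?? sample.endIndex
let partialRange = sample.startIndex..<upperBound

var wordsInRange = [String]()
sample.enumerateSubstrings(in: partialRange, options: .byWords) { _, substringRange, _, _ in
    wordsInRange.append(String(sample[substringRange]))
}

print(wordsInRange)
// It prints: ["Lorem", "ipsum"]
```

Building the upper bound with limitedBy avoids a crash when the string turns out to be shorter than expected.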

The second argument describes the kind of text components that we would like to enumerate. In this particular example we indicate that we’d like to enumerate words by providing the .byWords value as argument; we’ll get the text’s words as substrings.

As already mentioned, the last argument is a closure, and we have to deal with the substrings in it. However, note that we don’t get them all at once. The method works in iterations, so we have only one substring at a time in the closure. Its parameter list contains the following:

substring: The current substring (word, paragraph, etc.) as a String value. Note that this is an optional value, so always make sure that it’s not nil before using it. Additional information is coming up next.
substringRange: The range of the current substring in the original string.
enclosingRange: The range of the substring as before, but also including the separator characters that follow it, such as the next space character, the full stop (period) symbol, and so on.
stop: An inout Bool that controls whether the enumeration keeps going. If there is a reason to stop the process, set it to true; otherwise simply leave it alone.
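To make the effect of the options argument more tangible, here is a minimal sketch that runs the same short string through three different options. The helper pieces(of:by:) and the sample string are hypothetical names introduced only for this illustration; they are not part of Foundation:

```swift
import Foundation

// Hypothetical helper (not a Foundation API): collects the substrings that
// enumerateSubstrings(in:options:_:) reports for the given option.
func pieces(of string: String, by options: String.EnumerationOptions) -> [String] {
    var result = [String]()
    string.enumerateSubstrings(in: string.startIndex..<string.endIndex,
                               options: options) { substring, _, _, _ in
        // substring is optional; skip the iteration if it is nil.
        guard let substring = substring else { return }
        result.append(substring)
    }
    return result
}

let note = "One two.\nThree"

print(pieces(of: note, by: .byWords))       // ["One", "two", "Three"]
print(pieces(of: note, by: .byLines))       // ["One two.", "Three"]
print(pieces(of: note, by: .byParagraphs))  // ["One two.", "Three"]
```

Note how the words come back without punctuation, while lines and paragraphs keep it and only drop the newline that separates them.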

Accessing substrings

The safest way to collect all substrings is by using the substringRange parameter value that provides the range of the current substring. Suppose that we have this array:

var components = [Substring]()

In the closure that we provide to the method we can append each new substring to the above array as follows:

text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                         options: .byWords) { substring, substringRange, enclosingRange, stop in
    components.append(text[substringRange])
}

Printing the contents of components will show this:

Lorem
ipsum
dolor
sit
amet
consectetur
adipiscing
…

In exactly the same way we could use the enclosingRange value instead, and include the subsequent characters in the substring as well (starting again from an empty components array):

text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                         options: .byWords) { substring, substringRange, enclosingRange, stop in
    components.append(text[enclosingRange])
}

Before seeing the output, let’s replace the non-visible space characters with underscores in each substring:

components.forEach { print($0.replacingOccurrences(of: " ", with: "_")) }

The printed values would look like this:

Lorem_
ipsum_
dolor_
sit_
amet,_
consectetur_
adipiscing_
elit,_
…

Note that each item in the components array is a Substring and not a String value. To manipulate it as a String, remember to initialize a String value with a substring item first:

let string = String(components[0])
print(string)
// It prints: Lorem

The substring parameter value

I mentioned previously that the substring parameter value of the closure might be nil. That is not the default behavior; normally we keep receiving every substring as a String value through substring as well. However, if we are not really interested in them, and given that we can also get substrings using their range, we can force substring to be nil by passing the .substringNotRequired value as an additional option to the enumerateSubstrings(in:options:_:) method:

text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                         options: [.byWords, .substringNotRequired]) { substring, substringRange, enclosingRange, stop in
    // The "as Any" cast only silences the compiler warning about printing an optional.
    print(substring as Any)
}

// It prints:
// nil
// …
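Working with ranges alone is often enough. As a sketch of that idea, the hypothetical helper longestWord(in:) below (a name made up for this example) finds the longest word of a string without ever asking for the substring values:

```swift
import Foundation

// Hypothetical helper: returns the first longest word, using ranges only.
func longestWord(in string: String) -> Substring? {
    var longest: Range<String.Index>?
    var longestLength = 0

    string.enumerateSubstrings(in: string.startIndex..<string.endIndex,
                               options: [.byWords, .substringNotRequired]) { _, substringRange, _, _ in
        let length = string.distance(from: substringRange.lowerBound,
                                     to: substringRange.upperBound)
        // Strictly greater, so the earliest word wins a tie.
        if length > longestLength {
            longestLength = length
            longest = substringRange
        }
    }

    // Slice the original string only once, at the very end.
    return longest.map { string[$0] }
}

print(longestWord(in: "Lorem ipsum dolor sit amet, consectetur") ?? "")
// It prints: consectetur
```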

Stopping enumeration

There might be times when going through the entire text is not necessary. In those cases, we can simply stop enumerating substrings.

To see how that works, suppose that we want to get only the first five words of the sample text. With the help of a counter variable, we count the substrings encountered until we reach the desired limit. When we get there, all we have to do is set the stop parameter to true. Note that we can do so because it’s an inout value:

var wordCount = 0

text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                         options: .byWords) { substring, substringRange, enclosingRange, stop in
    print(text[substringRange])

    wordCount += 1
    if wordCount == 5 {
        stop = true
    }
}

// It prints:
// Lorem
// ipsum
// dolor
// sit
// amet
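The same idea can be wrapped in a small reusable function. This is only a sketch; firstWords(_:in:) is a hypothetical helper name chosen for this example, not something Foundation provides:

```swift
import Foundation

// Hypothetical helper: returns at most `limit` words from the start of `string`.
func firstWords(_ limit: Int, in string: String) -> [String] {
    var words = [String]()
    guard limit > 0 else { return words }

    string.enumerateSubstrings(in: string.startIndex..<string.endIndex,
                               options: .byWords) { _, substringRange, _, stop in
        words.append(String(string[substringRange]))
        // stop is an inout Bool; setting it to true ends the enumeration
        // once the current iteration finishes.
        if words.count == limit {
            stop = true
        }
    }
    return words
}

print(firstWords(3, in: "Lorem ipsum dolor sit amet"))
// It prints: ["Lorem", "ipsum", "dolor"]
```

If the string contains fewer words than the limit, the enumeration simply runs to the end and the function returns whatever it found.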

The last closure parameter does not have to be called stop; closure parameter names are arbitrary, and what matters is its position and that we assign true to it. If you’re not planning to use it, you can replace it with the underscore symbol. In fact, we can do the same for every parameter that plays no role in our implementation, and keep only what we need. For instance:

text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                         options: .byWords) { _, substringRange, _, _ in
    …
}

Getting the words count

Now that the various parts of the enumerateSubstrings(in:options:_:) method have been explained, we can focus on counting the words in the given text. There are only two simple things to do; the first is to keep all substrings in an array as shown previously:

text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                         options: .byWords) { _, substringRange, _, _ in
    components.append(text[substringRange])
}

The second step is to simply count the items contained in the components array; that is the total number of words in the text string:

print("\(components.count) Words")

// It prints:
// 133 Words

One implementation for various enumeration options

In text-related apps, we often need to report the total number of words, paragraphs, lines and so on. However, writing a separate call to the enumerateSubstrings(in:options:_:) method for each of them does not sound like a good idea; it breaks a basic rule in software engineering called DRY (Don’t Repeat Yourself).

To avoid that, we can resort to a simple yet handy approach; to define a method that invokes enumerateSubstrings(in:options:_:) once, but accepts the enumeration options as an argument. That way we can use the same method for different kinds of results, simply by providing the proper enumeration option value.

The implementation of that method is shown right next; everything that’s included in it has already been discussed:

func countTextComponents(options: String.EnumerationOptions = .byWords) -> Int {
    var components = [Substring]()

    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: options) { _, substringRange, _, _ in
        components.append(text[substringRange])
    }

    return components.count
}

It’s now easy to fetch the total number of the various components in the text. For instance, see the next method that prints such values and makes use of countTextComponents(options:):

func showTextCountInfo() {
    print("\(totalCharacters) Characters")
    print("\(countTextComponents()) Words")
    print("\(countTextComponents(options: .byParagraphs)) Paragraphs")
    print("\(countTextComponents(options: .byLines)) Lines")
}

showTextCountInfo()

// It prints:
// 927 Characters
// 133 Words
// 5 Paragraphs
// 5 Lines
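When only the number matters, collecting the substrings in an array is not strictly required. A possible variation, sketched here as a String extension, increments a counter instead and passes .substringNotRequired along with the requested option. The method name componentCount(_:) and the sample string are assumptions made for this example:

```swift
import Foundation

extension String {
    // Hypothetical helper: counts text components without storing them.
    func componentCount(_ options: String.EnumerationOptions = .byWords) -> Int {
        var total = 0
        enumerateSubstrings(in: startIndex..<endIndex,
                            options: options.union(.substringNotRequired)) { _, _, _, _ in
            total += 1
        }
        return total
    }
}

let draft = "First line here.\n\nSecond paragraph."

print(draft.componentCount())               // It prints: 5
print(draft.componentCount(.byParagraphs))  // It prints: 3
```

The paragraph count is 3 rather than 2 because the empty line between the two paragraphs is reported as a paragraph of its own, which is also why the three-paragraph sample text above reports 5 paragraphs.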

Conclusion

Enumerating substrings in a string, or counting the various text components, is easier than we probably all thought initially; one particular method is there to provide us with everything we need, as long as we know how to use it. There are more methods like the one presented here hidden in the Foundation framework that serve specific purposes, and I may talk about them in future posts. Until then, I hope you found today’s topic valuable, and that you met a new tool to use while coding apps in Swift.

Thanks for reading!

Tags
count, enumerate, Foundation, paragraph, range, substring, words