LINQ: Writing an Interweave extension

As we have seen in our last few posts on LINQ, the framework comes with some great extension methods to handle the merging of multiple collections into a single one.

However, there are things that LINQ can not easily do on its own. One of them is interweaving collections, i.e. merging them by taking elements from each input collection alternatively.

For example, given the inputs 1, 2, 3 and a, b, c we would get the output 1, a, 2, b, 3, c.

Today, we will look into how to best implement this ourselves.

Using other LINQ methods

When extending LINQ, one of the first things we should always investigate is whether we can make use of the existing extension methods to easily solve the problem for us.

In many circumstances, the composition of two or more LINQ methods together with some smart delegates can solve in a single line of code what would take us dozens of lines to write ourselves.

Mostly, that approach is preferable as the shorter code is more easily shown to be correct (we can assume the correctness of LINQ’s own methods).

In addition, it is often easier to change and improve shorter methods should that become necessary in the future.

Our first approach is one I wrote for last week’s post to show off the usefulness of SelectMany and composition of LINQ methods in general.

var result = first
    .Zip(second, (f, s) => new Type[] {a, b})
    .SelectMany(x => x);

This line of code takes our input collections first and second, and merges them using Zip to create a two element array for each pair of elements. Afterwards, it uses SelectMany to flatten the resulting sequence of arrays into a single sequence.

The main flaw with this approach is that it creates a new array object for each pair of elements, just to discard it again right afterwards. This is a very wasteful use of allocations and will impose unnecessary garbage collection pressure.

Further, this approach is very difficult to extend to more than two collections, since Zip only handles two input collections.

There are other ways to compose multiple LINQ methods to interweave two or more collections, but in the end all of them have flaws similar to this.

Interweaving from scratch

So instead we will start from scratch, and write a method that does not have any of the disadvantages mentioned, nor any others we could come up with.

Let us begin small, by writing a method that interweaves two input sequences. Our method signature will be as simple as follows.

IEnumerable<T> Interweave<T>(IEnumerable<T> first, IEnumerable<T> second)

There are a number of ways to approach our implementation.

Note that in either one however, we have to enumerable both collections at the same time in some way or another.

We could simplify some of the code we are going to write by enumerating our input collections with ToList, and then enumerate the lists with a simple for loop. However, this means that we lose the option for true deferred execution – one of the best features of LINQ.

Additionally, it would require us to allocate memory for our entire input collections, and would fail if we are given input sequences of infinite length. While the latter is a rare case, the many deferred LINQ methods handle it just fine, and so should we.

So, instead we will make proper use of enumerators, and enumerate the input collections element by element as needed, and return these using the yield keyword.

Since we have to enumerate both collections at once and C# does not provide any language structure like foreach to do so, we need to manually enumerate at least one of our inputs.

If we combine foreach with manual enumeration, we would end up with code as follows.

using (var secondEnum = second.GetEnumerator())
{
    foreach (var firstItem in first)
    {
        if (!secondEnum.MoveNext())
            yield break;

        yield return firstItem;
        yield return secondEnum.Current;
    }
}

While working perfectly, I think that this approach is rather ugly since it does not treat our two input sequences the same, seemingly prioritizing one of them.

Instead, and also because this will extrapolate much better to interweaving more than two collections, let us manually enumerate both of our inputs.

using (var firstEnum = first.GetEnumerator())
using (var secondEnum = second.GetEnumerator())
{
    while (true)
    {
        if (!firstEnum.MoveNext() || !secondEnum.MoveNext())
            yield break;

        yield return firstEnum.Current;
        yield return seconEnum.Current;
    }
}

This code is not only shorter, but also virtually symmetrical in the use of first and second, making our intent much clearer.

Note that I omitted the optional curly braces for the first using statement that would wrap around the second one. That way it becomes clear that we intend both enumerators only in the inner block, and that neither of them is of higher priority than the other.

Since both enumerators have the exact same type, we could combine their creation into a single using statement. However, I prefer this approach for its clarity and expressiveness.

Either of these two pieces of code makes for an adequate implementation of a two-sequence interweave. There are still things we could add, like adding the option to append all remaining elements of the longer collections (currently, we will return the same number of elements from both collections in all cases).

However, I will skip those points in favour of a more general implementation that will work for more than two sequences, and discuss how to handle the case on inputs with different lengths there.

Interweaving more than two collections

Clearly, when we want to expand our implementation to handle any number of input collections, we have to change our approach to some degree.

Our new method signature will be as follows; note the nested generic IEnumerable.

IEnumerable<T> Interweave<T>(IEnumerable<IEnumerable<T>> inputs)

We will no longer be able to use a fixed number of using statements, but will have to implement the underlying logic ourselves.

Note: strictly speaking we could use a recursive method, embedding its tail calls into a using statement, passing down a list of enumerators, and remaining enumerables to prevent having to use try-finally blocks below. This very functional approach goes a bit beyond the scope of this post however, and only complicates the resulting code unnecessarily. If you are interested in this approach, be sure to let me know, and I will make sure to cover it in more detail in the future.

Internally, a using statement translates into a try-finally block as follows.

using (var obj = init())
{
    // statements
}
// is equivalent to:
{
    var obj = init();
    try
    {
        // statements
    }
    finally
    {
        if (obj != null)
            ((IDisposable)obj).Dispose();
    }
}

This ensures that our enumerators are cleaned up properly after we are done with them, even if we return from the method or throw an exception.

Since we cannot use the using statement, we have to implement the try-finally pattern ourself as follows.

var enumerators = new List<IEnumerator<T>>;
try
{
    foreach (var i in inputs)
    {
        enumerators.Add(i.GetEnumerator());
    }

    // interweave here!
}
finally
{
    if (enumerators != null)
        foreach (var e in enumerators)
        {
            e.Dispose();
        }
}

Note how we have to declare our list of enumerators variable outside of the try block so that it will still be accessible in the scope of finally.

Further, we cannot use a LINQ statement to construct our list of enumerators, but instead have to add them individually in a foreach loop. This way, if an exception occurs while we are creating enumerators, those that have already been added to the list will still be disposed of correctly.

There is still the case where List.Add could throw an exception, for example if the application ran out of memory when trying to resize the list. We could prevent this with a try-catch block inside the foreach loop, but I will refrain from going to that length. If you really need that kind of security, you can of course always add the relevant code yourself.

Let us now look at how to interweave elements from our list of enumerators.

For two sequences we used the following code.

while (true)
{
    if (!firstEnum.MoveNext() || !secondEnum.MoveNext())
        yield break;

    yield return firstEnum.Current;
    yield return seconEnum.Current;
}

This can be easily abstracted for multiple collections:

while (true)
{
    if (!enumerators.All(e => e.MoveNext()))
        yield break;

    foreach (var e in enumerators)
    {
        yield return e.Current;
    }
}

Note that we are enumerating the list of enumerators twice to make sure that all input sequences have at least another element before we start returning any of them. That way we make sure to always return exactly the same number of elements from each input collection.

Our full method now looks as follows.

IEnumerable<T> Interweave<T>(IEnumerable<IEnumerable<T>> inputs)
{
    var enumerators = new List<IEnumerator<T>>;
    try
    {
        foreach (var i in inputs)
            enumerators.Add(i.GetEnumerator());

        while (true)
        {
            if (!enumerators.All(e => e.MoveNext()))
                yield break;

            foreach (var e in enumerators)
                yield return e.Current;
        }
    }
    finally
    {
        if (enumerators != null)
            foreach (var e in enumerators)
                e.Dispose();
    }
}

Note that I left out some of the optional curly braces for brevity.

Returning all elements

Now that we have a full implementation to interweave elements from any (finite) number of input sequences, let us change it such that it will continue interweaving past the length of the shortest collection.

To do this, we merely have to change the exit condition of our inner loop as follows.

if (!enumerators.All(e => e.MoveNext()))
    yield break;

becomes:

var removeAny = false;
for (var i = 0; i < enumerators.Count; i++)
{
    if (!enumerators[i].MoveNext())
    {
        enumerators[i].Dispose();
        enumerators[i] = null;
        removeAny = true;
    }
}
if (removeAny)
    enumerators.RemoveAll(e => e == null);
if (enumerators.Count == 0)
    yield break;

With every iteration, this piece of code will attempt to advance all enumerators by one and then remove all that have reached their end. Once there are no more enumerators left, it exits the method with yield break.

Note how we have to make sure to also dispose the enumerators here, since we are removing them from the list which we use for disposal in our finally block.

If we like we can rewrite this block to be a bit more concise as well as taking better advantage of List.RemoveAll.

enumerators.RemoveAll(e =>
    {
        if (e.MoveNext())
            return false;
        e.Dispose()
        return true;
    });
if (enumerators.Count == 0)
    yield break;

However, I would consider this a micro optimisation, and either piece of code will work just fine.

In the end, our entire method looks as follows:

IEnumerable<T> Interweave<T>(IEnumerable<IEnumerable<T>> inputs)
{
    var enumerators = new List<IEnumerator<T>>;
    try
    {
        foreach (var i in inputs)
            enumerators.Add(i.GetEnumerator());

        while (true)
        {
            enumerators.RemoveAll(e =>
                {
                    if (e.MoveNext())
                        return false;
                    e.Dispose()
                    return true;
                });
            if (enumerators.Count == 0)
                yield break;

            foreach (var e in enumerators)
                yield return e.Current;
        }
    }
    finally
    {
        if (enumerators != null)
            foreach (var e in enumerators)
                e.Dispose();
    }
}

Note how we could easily modify the enumerator.Count == 0 check to compare the number of remaining enumerators against a parameter instead. We could for example return when there are less than two or three of the input collections remaining.

Overall, this resulting code is reasonably flexible and efficient, which was our goal.

Which of these methods you use will of course depend on your exact needs. Maybe you even want to combine them into a single method that can handle both, or even more complex requirements.

Conclusion

In this post we implemented three versions of a LINQ style Interweave extension method. The first could only handle two input collections, while the latter could handle any number.

Further, we expanded the last version to be able to continue interweaving, even if some of the input collections reached their end.

If you found this post interesting and useful, make sure to share it on your favourite social media!

Let me know if you will use these methods in your own applications, and of course make sure to leave a comment if you have any questions, or if you would like to suggest a topic for me to cover in the future.

Enjoy the pixels!

Leave a Reply