Recently I needed to write some code that queries a list of whois servers to find the registration date of each domain in another list. It had to be written in VB.NET so that it would link in with our VB ASP.NET code.

My first approach was to grab a list of whois servers from a flat file and just do a for loop to go through them all until I found the information I was looking for. The problem with this is that every query waits for the previous one to finish, so it is horridly slow. It was suggested to me to process them in parallel, so I did some searching to find a good way to do just that, and what I found was not only amazingly simple but also very, very fast. .NET has Parallel.For and Parallel.ForEach (in System.Threading.Tasks, .NET 4 and later). I also needed to know how to stop the loop once I had the information I needed, which is where MSDN’s article on how to stop or break from a Parallel.For loop came in handy.

Notice how Stop and Break are two different things here. The basic difference is which of the other iterations are still allowed to run after one iteration calls Stop or Break.

Stop tells the loop to cease all iterations as soon as convenient. No new iterations are started, whether their index comes before or after the current one, and iterations that are already running are left to finish. So let’s say that we’re doing a for loop from 0 to 9, which is 10 items. We might have something similar to the following:

Parallel.For(0, 10, Sub(i, loopState)
                        ' processing code here
                        ' Somewhere in here we call:
                        loopState.Stop()
                    End Sub)

Basically, you’re passing an anonymous subroutine to the Parallel.For() function. So, passing a function to a function. That function takes two parameters: the index (in this example i) and a ParallelLoopState object (I used loopState). The ParallelLoopState object is what you use to call the Stop() method. So for example’s sake, let’s say we currently have threads running the indexes 4, 5, and 6, and loopState.Stop() gets called from thread 5. The loop won’t start any new iterations with any new indexes, but it will wait for both 4 and 6 to finish what they were doing before finally stopping. A long-running iteration can check loopState.IsStopped if it wants to bail out early.
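To make the Stop behaviour a bit more concrete, here is a minimal sketch of a "find any match" search. The module name StopSketch and the helper FindAny are made up for this example, and it assumes the target appears at most once in the array.

```vbnet
Imports System.Threading
Imports System.Threading.Tasks

Module StopSketch
    ' Hypothetical helper: returns the index of an element equal to target, or -1.
    Public Function FindAny(items As Integer(), target As Integer) As Integer
        Dim found As Integer = -1
        Parallel.For(0, items.Length,
            Sub(i As Integer, loopState As ParallelLoopState)
                ' Iterations that are already running are not interrupted by Stop,
                ' so a long-running body can poll IsStopped and leave early.
                If loopState.IsStopped Then Return
                If items(i) = target Then
                    ' Publish the result, then ask the loop not to start new iterations
                    Interlocked.Exchange(found, i)
                    loopState.Stop()
                End If
            End Sub)
        Return found
    End Function
End Module
```

Calling StopSketch.FindAny(New Integer() {5, 3, 9, 7}, 9) should hand back the index of the 9. If the target could appear more than once, which index you get back is up to the scheduler.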

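Break, however, only affects iterations past the current index. Iterations with a lower index are still allowed to run, including ones that haven’t started yet, while no new iterations with a higher index are started. So going with the same example, with threads running 4, 5, and 6, thread 5 calls loopState.Break(). Iteration 4 finishes, and any index below 5 that hasn’t run yet still gets its turn. Iteration 6 is not killed outright: because it is already running it also finishes, unless its body checks loopState.ShouldExitCurrentIteration and returns early.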
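Here is a small sketch of Break in action, finding the lowest index that holds a negative number. The module name BreakSketch and the helper FirstNegative are made up for this example. It relies on the documented behaviour that every iteration below the breaking index still runs, and it reads LowestBreakIteration to see where the loop ended up.

```vbnet
Imports System.Threading.Tasks

Module BreakSketch
    ' Hypothetical helper: returns the lowest index holding a negative value, or -1.
    Public Function FirstNegative(values As Integer()) As Long
        Dim result As ParallelLoopResult = Parallel.For(0, values.Length,
            Sub(i As Integer, loopState As ParallelLoopState)
                ' An iteration past an index that already called Break can leave early
                Dim lowest As Long? = loopState.LowestBreakIteration
                If lowest.HasValue AndAlso i > lowest.Value Then Return
                If values(i) < 0 Then loopState.Break()
            End Sub)
        ' LowestBreakIteration only has a value if some iteration called Break
        If result.LowestBreakIteration.HasValue Then
            Return result.LowestBreakIteration.Value
        End If
        Return -1
    End Function
End Module
```

Even if a later negative value is found first, the lower indexes still get processed, so the result settles on the earliest one. That ordering guarantee is the reason to pick Break over Stop.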

So now that we understand the basics of how these loops work, let’s move on to the whois query part.

I basically wrote two functions: one to query all the whois servers for one particular domain in parallel, and one to run that first function, also in parallel, over the list of domains I need the information for. The first function needs System.IO, System.Net.Sockets, System.Text and System.Threading.Tasks imported at the top of the file, and looks something like this:

Public Function ParallelQuery(ByVal domain As String) As String()

        ' Read in the server list, one whois host per line
        Dim whoislist As String() = File.ReadAllLines("serverlist.txt")

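        Dim endData As String() = Nothing

        Parallel.ForEach(whoislist, Sub(d As String, loopstate As ParallelLoopState)
                                        Try
                                            ' Declared inside the lambda so each iteration has its own buffer;
                                            ' a shared variable would be overwritten by the other threads
                                            Dim returnData As String = ""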
                                            d = d.Trim()
                                            ' Create a new TcpClient
                                            Dim tcpClient As TcpClient = New TcpClient(d, 43)

                                            Dim networkStream As NetworkStream = tcpClient.GetStream()
                                            If networkStream.CanRead AndAlso networkStream.CanWrite Then
                                                Dim sendBytes As Byte() = Encoding.ASCII.GetBytes(domain + vbCrLf)
                                                networkStream.Write(sendBytes, 0, sendBytes.Length)
                                                Dim bytes(tcpClient.ReceiveBufferSize) As Byte
                                                Dim recvSize As Int32
                                                recvSize = networkStream.Read(bytes, 0, CInt(tcpClient.ReceiveBufferSize))

                                                While recvSize <> 0
                                                    returnData += Encoding.ASCII.GetString(bytes, 0, recvSize)
                                                    recvSize = networkStream.Read(bytes, 0, CInt(tcpClient.ReceiveBufferSize))
                                                End While

                                                If returnData <> "" Then
                                                    ' This is where you do your processing of the return data from the request
                                                    ' I am using a regex expression to get the creation date.
                                                    Dim regex As New RegularExpressions.Regex("creation.date:\s+(.+)", System.Text.RegularExpressions.RegexOptions.IgnoreCase)
                                                    Dim match As RegularExpressions.Match = regex.Match(returnData)

                                                    If match.Success Then
                                                        ' Trim the trailing carriage return that (.+) picks up
                                                        endData = {domain, match.Groups(1).Value.Trim()}
                                                        ' If it finds what I need I can stop the loop.
                                                        loopstate.Stop()
                                                    End If
                                                End If

                                            End If
                                            tcpClient.Close()
                                        Catch ex As Exception
                                            ' Servers that are unreachable or refuse the query are simply skipped
                                        End Try
                                    End Sub)
        Return endData
    End Function
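
The regex step is the easiest part to try out on its own, without touching the network. Below is a minimal sketch that pulls the creation date out of a response string using the same pattern as above. The module name WhoisParsing and the helper ExtractCreationDate are made up for this example, and real whois servers label this field in different ways, so treat the pattern as a starting point.

```vbnet
Imports System.Text.RegularExpressions

Module WhoisParsing
    ' Hypothetical helper: returns the creation date text, or Nothing if the field is absent.
    Public Function ExtractCreationDate(response As String) As String
        ' Same pattern as in ParallelQuery; the dot also matches the space in "Creation Date"
        Dim m As Match = Regex.Match(response, "creation.date:\s+(.+)", RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        ' (.+) stops at the line feed but keeps the carriage return, so trim it
        Return m.Groups(1).Value.Trim()
    End Function
End Module
```

Feeding it a canned response is a quick way to check the pattern against each server's format before running the whole thing in parallel.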

The function that ran those queries in parallel looks like this:

 

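Public Function processList(ByVal list As List(Of String)) As DataTable

        ' Table for the results (the column names here are just placeholders)
        Dim dt As New DataTable()
        dt.Columns.Add("Domain", GetType(String))
        dt.Columns.Add("CreationDate", GetType(String))

        Parallel.ForEach(list, Sub(e)
                                   ' This returns the domain and creation date as a string array for each one
                                   Dim res As String() = ParallelQuery(e)
                                   ' I can now do whatever I need to with the creation date.
                                   ' Here it is stored; DataTable isn't safe for concurrent writes, so lock it.
                                   If res IsNot Nothing Then
                                       SyncLock dt
                                           dt.Rows.Add(res(0), res(1))
                                       End SyncLock
                                   End If
                               End Sub)

        Return dt

    End Function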

 

You’ll notice that I used Parallel.ForEach loops in these functions. I didn’t need an index, and thought it would be easier to have the object I’d be using passed straight to the anonymous subroutine in place of the index. It works much the same as the Parallel.For loop, except that the first parameter of the anonymous subroutine is an item from the collection instead of an integer index.

 

So there you have it. I love .NET for creating this. AND it automatically chooses how many threads to use based on your system resources available. These methods make parallelism so easy it almost feels like cheating.
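
If the automatic choice turns out to be too aggressive (a whois server may not appreciate a burst of simultaneous connections), the loop can be capped with ParallelOptions.MaxDegreeOfParallelism. Here is a minimal sketch; the module name ThrottleSketch and the helper CountProcessed are made up for this example, and the loop body is only a stand-in for the real lookup.

```vbnet
Imports System.Collections.Generic
Imports System.Threading
Imports System.Threading.Tasks

Module ThrottleSketch
    ' Hypothetical helper: visits every item with at most maxWorkers iterations
    ' running at once, and returns how many items were processed.
    Public Function CountProcessed(items As IEnumerable(Of String), maxWorkers As Integer) As Integer
        Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = maxWorkers}
        Dim processed As Integer = 0
        Parallel.ForEach(items, options,
            Sub(item As String)
                ' The real work (for example a whois lookup) would go here
                Interlocked.Increment(processed)
            End Sub)
        Return processed
    End Function
End Module
```

Every item is still processed; the option only limits how many run at the same time.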


		
© 2012 Code Brain