Recently I needed to write some code to query a list of WHOIS servers to find the registration date of each domain in another list. I needed to write it in VB.NET so that it would link in with our VB ASP.NET code.
My first approach was to grab a list of whois servers from a flat file and just do a for loop to go through them all until I found the information I was looking for. The problem with this is that every query has to wait for the previous one to finish, so it is horridly slow. It was suggested to me to process them in parallel, so I did some searching through the web to find a good way to do just that, and what I found was not only amazingly simple but also very, very fast. Microsoft has a Parallel.For and a Parallel.ForEach in the System.Threading.Tasks namespace. I also needed to know how to stop the loop once I get the information I need, which is where MSDN’s article on how to stop or break from a Parallel.For loop came in handy.
Notice how Stop and Break are two different things here. The basic difference is which of the other iterations are still allowed to run after the call is made.
Stop tells the loop to finish as soon as it conveniently can: no new iterations are started, whatever their index, and iterations that are already running, including ones later than the current one, are left to finish their processing. So let’s say that we’re doing a for loop from 0 to 9, which is 10 items. We might have something similar to the following:
Parallel.For(0, 10, Sub(i, loopState)
    ' processing code here
    ' Somewhere in here we call:
    loopState.Stop()
End Sub)
Basically, you’re passing an anonymous subroutine to the Parallel.For() function. So, passing a function to a function. That subroutine takes two parameters: the index (in this example i) and a ParallelLoopState object (I used loopState). The ParallelLoopState object is what you use to call the Stop() method. So for example’s sake, let’s say iterations 4, 5, and 6 are currently running, each on its own thread, and loopState.Stop() gets called from iteration 5. The loop won’t start any new iterations, but it doesn’t abort anything either: it waits for both 4 and 6 to finish what they were doing before finally stopping. A long-running iteration can check loopState.IsStopped if it wants to bail out early.
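Since Stop() never interrupts an iteration that is already running, a long iteration has to notice the stop on its own. Here is a minimal sketch of that idea. The module name, the 20-step workload, and the choice of iteration 5 are all made up for illustration; only Parallel.For, Stop(), and IsStopped come from the framework.

```vbnet
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module CooperativeStop
    Sub Main()
        Parallel.For(0, 10, Sub(i As Integer, loopState As ParallelLoopState)
                                ' Hypothetical workload: one long job split into 20 short steps
                                For stepNumber As Integer = 1 To 20
                                    ' Another iteration may have called Stop(); check before doing more work
                                    If loopState.IsStopped Then
                                        Console.WriteLine("Iteration {0} noticed the stop at step {1}", i, stepNumber)
                                        Return
                                    End If
                                    Thread.Sleep(10) ' stand-in for real work
                                    ' Illustrative trigger: iteration 5 decides the loop is done
                                    If i = 5 AndAlso stepNumber = 3 Then
                                        loopState.Stop()
                                    End If
                                Next
                            End Sub)
    End Sub
End Module
```

Which iterations print a message, and in what order, depends on how the scheduler happens to spread the work, so the output will differ from run to run.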
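Break, on the other hand, only rules out iterations after the current index. Iterations past that index that haven’t started yet never will, while every iteration before it is still guaranteed to run, including ones that haven’t started yet. So going with the same example, with iterations 4, 5, and 6 running, iteration 5 calls loopState.Break(). Iteration 4 gets to finish, and so does any of 0 through 3 that hasn’t run yet. Iteration 6 is not killed, because Break, like Stop, never aborts code that is already running. It can check loopState.ShouldExitCurrentIteration and return early, and 7 through 9 are skipped if they haven’t started.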
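One way to tell the two calls apart after the fact is the ParallelLoopResult that Parallel.For returns. The sketch below is only an illustration (the module name and the choice of index 5 are mine): it runs the same loop twice, once with Break() and once with Stop(), and prints what the result reports.

```vbnet
Imports System
Imports System.Threading.Tasks

Module StopVersusBreak
    Sub Main()
        ' Break: iterations below index 5 are still guaranteed to run
        Dim breakResult As ParallelLoopResult =
            Parallel.For(0, 10, Sub(i As Integer, loopState As ParallelLoopState)
                                    If i = 5 Then loopState.Break()
                                End Sub)
        Console.WriteLine("Break: IsCompleted = {0}, LowestBreakIteration = {1}",
                          breakResult.IsCompleted, breakResult.LowestBreakIteration)

        ' Stop: no guarantee about which other iterations run
        Dim stopResult As ParallelLoopResult =
            Parallel.For(0, 10, Sub(i As Integer, loopState As ParallelLoopState)
                                    If i = 5 Then loopState.Stop()
                                End Sub)
        Console.WriteLine("Stop: IsCompleted = {0}, LowestBreakIteration set = {1}",
                          stopResult.IsCompleted, stopResult.LowestBreakIteration.HasValue)
    End Sub
End Module
```

In both cases IsCompleted is False because the loop ended early. LowestBreakIteration only has a value when Break() was used, which makes it a handy way to find out where the loop was cut off.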
So now that we understand the basics of how these loops work, let’s move on to the whois query part.
I basically wrote two functions: one to query all the whois servers for one particular domain in parallel, and one to run that function, also in parallel, over the list of domains I need the information for. The first function looks something like this:
' These go at the top of the file:
Imports System.IO
Imports System.Net.Sockets
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Public Function ParallelQuery(ByVal domain As String) As String()
    ' Read in the entire server list, one host name per line
    Dim whoislist As String() = File.ReadAllLines("serverlist.txt")
    Dim endData As String() = Nothing
    ' I am using a regex expression to get the creation date.
    ' It is built once here; a Regex instance is safe to share between threads.
    Dim creationPattern As New Regex("creation.date:\s+(.+)", RegexOptions.IgnoreCase)
    Parallel.ForEach(whoislist, Sub(d As String, loopstate As ParallelLoopState)
        d = d.Trim()
        ' Skip blank lines in the server list
        If d = "" Then Return
        Try
            ' Create a new TcpClient; Using closes it even if something throws
            Using tcpClient As New TcpClient(d, 43)
                Dim networkStream As NetworkStream = tcpClient.GetStream()
                If networkStream.CanRead AndAlso networkStream.CanWrite Then
                    Dim sendBytes As Byte() = Encoding.ASCII.GetBytes(domain & vbCrLf)
                    networkStream.Write(sendBytes, 0, sendBytes.Length)
                    ' Keep the response local to this iteration so that
                    ' parallel iterations cannot overwrite each other's data
                    Dim returnData As New StringBuilder()
                    Dim bytes(tcpClient.ReceiveBufferSize - 1) As Byte
                    Dim recvSize As Integer = networkStream.Read(bytes, 0, bytes.Length)
                    ' The server closes the connection when it is done, so Read returns 0
                    While recvSize <> 0
                        returnData.Append(Encoding.ASCII.GetString(bytes, 0, recvSize))
                        recvSize = networkStream.Read(bytes, 0, bytes.Length)
                    End While
                    ' This is where you do your processing of the return data from the request
                    Dim found As Match = creationPattern.Match(returnData.ToString())
                    If found.Success Then
                        ' Trim drops the carriage return that (.+) captures at the end of the line
                        endData = {domain, found.Groups(1).Value.Trim()}
                        ' If it finds what I need I can stop the loop.
                        loopstate.Stop()
                    End If
                End If
            End Using
        Catch ex As Exception
            ' An unreachable or misbehaving server just means no answer from that one
        End Try
    End Sub)
    ' Nothing if no server returned a creation date
    Return endData
End Function
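One detail of the regex is easy to miss: in .NET, "." matches every character except the line feed, so on a response that uses CR+LF line endings the (.+) group also picks up the trailing carriage return. The sketch below shows this on a made-up response (the sample text and the module name are invented for illustration, not real server output).

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CreationDateDemo
    Sub Main()
        ' Made-up WHOIS-style response, using CR+LF line endings
        Dim sample As String = "Domain Name: EXAMPLE.COM" & vbCrLf &
                               "Creation Date: 1995-08-14T04:00:00Z" & vbCrLf &
                               "Registrar: Example Registrar" & vbCrLf

        Dim found As Match = Regex.Match(sample, "creation.date:\s+(.+)", RegexOptions.IgnoreCase)
        If found.Success Then
            Dim raw As String = found.Groups(1).Value
            Dim cleaned As String = raw.Trim()
            ' The raw capture is one character longer because of the trailing carriage return
            Console.WriteLine("Raw length: {0}, trimmed length: {1}", raw.Length, cleaned.Length)
            Console.WriteLine("Creation date: {0}", cleaned)
        End If
    End Sub
End Module
```

Calling Trim() on the captured group before storing it is a cheap way to avoid carrying that stray character into a database or a date parser later on.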
The function that ran those queries in parallel looks like this:
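Public Function processList(ByVal list As List(Of String)) As DataTable
    ' Column names here are only illustrative
    Dim dt As New DataTable()
    dt.Columns.Add("Domain", GetType(String))
    dt.Columns.Add("CreationDate", GetType(String))
    Parallel.ForEach(list, Sub(e)
        ' This returns the domain and creation date as strings, or Nothing if no server had them
        Dim res As String() = ParallelQuery(e)
        ' I can now do whatever I need to with the creation date
        If res IsNot Nothing Then
            ' DataTable is not safe for concurrent writes, so add rows one thread at a time
            SyncLock dt
                dt.Rows.Add(res(0), res(1))
            End SyncLock
        End If
    End Sub)
    Return dt
End Function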
You’ll notice that I used Parallel.ForEach loops in these functions. I didn’t need an index, and it was easier to have the object I’d be using passed to the anonymous subroutine in place of the index. It works much the same as Parallel.For, except that the first parameter of the subroutine is an item from the collection rather than an integer index.
So there you have it. I love .NET for creating this. AND it automatically chooses how many threads to use based on the system resources available. These methods make parallelism so easy it almost feels like cheating.
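If the automatic choice turns out to be too aggressive, for example when the two loops are nested and every domain opens a socket to every server, the upper limit can be set by hand with ParallelOptions.MaxDegreeOfParallelism. This is a rough sketch rather than what my code above does: the limit of 4, the domain names, and the Thread.Sleep standing in for a network query are all placeholders.

```vbnet
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module CappedParallelism
    Sub Main()
        ' Assumed limit for illustration; tune it for your own workload
        Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = 4}
        Dim domains As String() = {"example.com", "example.net", "example.org",
                                   "example.edu", "example.info", "example.biz",
                                   "example.io", "example.dev"}
        Dim gate As New Object()
        Dim running As Integer = 0
        Dim peak As Integer = 0

        Parallel.ForEach(domains, options,
            Sub(domain As String)
                ' Track how many iterations are active at the same time
                SyncLock gate
                    running += 1
                    If running > peak Then peak = running
                End SyncLock

                Thread.Sleep(50) ' stand-in for a whois query

                SyncLock gate
                    running -= 1
                End SyncLock
            End Sub)

        ' The peak never exceeds the MaxDegreeOfParallelism set above
        Console.WriteLine("Peak concurrent iterations: {0}", peak)
    End Sub
End Module
```

Leaving MaxDegreeOfParallelism at its default of -1 puts the decision back in the hands of the runtime.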