My first wrangle with NoSQL. An explanation of what I made and my thoughts on NoSQL versus the traditional RDBMS

I had a random idea the other day: a server that would generate, store and serve image manipulations on the fly. So that’s what I made. You can check out the GitHub repository here. The server takes HTTP requests much like a standard web server. You give it an image id plus any sizing or cropping parameters you like, and it returns the image. When you pass sizing or cropping parameters, it first checks whether an image with those exact parameters already exists and returns that if it does, which avoids regenerating the same image over and over. Information about the images is stored in a MongoDB database.
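The check-before-generating step can be sketched roughly like this. It is only an illustration in C#, not the code from the repository: the class name, the key format and the render callback are all made up, and a plain dictionary stands in for the MongoDB collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real store: a dictionary plays the role of the MongoDB collection.
public class ImageVariantCache
{
    private readonly Dictionary<string, byte[]> _variants = new Dictionary<string, byte[]>();
    private readonly Func<string, int, int, byte[]> _render;

    // How many times the (expensive) render function actually ran.
    public int RenderCount { get; private set; }

    public ImageVariantCache(Func<string, int, int, byte[]> render)
    {
        _render = render;
    }

    // Returns the stored variant if one exists for these parameters; otherwise renders and stores it.
    public byte[] Get(string imageId, int width, int height)
    {
        string key = imageId + ":" + width + "x" + height;
        byte[] cached;
        if (_variants.TryGetValue(key, out cached))
        {
            return cached;
        }
        byte[] rendered = _render(imageId, width, height);
        RenderCount++;
        _variants[key] = rendered;
        return rendered;
    }
}
```

Asking for the same id and size twice only runs the render callback once; a different size counts as a new variant and gets rendered and stored separately.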

This was new territory for me, as it was my first venture into the world of NoSQL, and I really like the way everything works. I think this project was a good fit for the “unstructured data” that NoSQL is supposed to be good at: different images can carry different sets of attributes (and the number of sets would grow if I added more types of manipulations). It would certainly have been doable in a relational database, but I think this works better.

MongoDB stores its documents in a format called BSON, and its definitions and queries are written in the same document style. BSON is much like the (probably more familiar) JSON, except that it is binary-encoded (hence the B in BSON) and can hold types JSON lacks, such as raw binary data. Defining the data this way is rather trivial and, in my opinion, more natural than SQL (and certainly less verbose). I also like the fact that you can have nested data. In some instances nesting can be used in place of a foreign key in an RDBMS.
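To make the nested-data point concrete, here is a small sketch in C#. The class and property names are invented for illustration (they are not the schema my server uses), and System.Text.Json is used only to print the JSON shape; it is not a MongoDB driver.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// One generated size of an image. In a relational design this would be a row
// in a separate table with a foreign key back to the image.
public class Variant
{
    public int Width { get; set; }
    public int Height { get; set; }
    public string Path { get; set; }
}

// The image document carries its variants inside itself instead.
public class ImageRecord
{
    public string ImageId { get; set; }
    public List<Variant> Variants { get; set; } = new List<Variant>();
}

public static class NestedDocumentDemo
{
    // Serializes the record, nested list included, as a single JSON document.
    public static string ToJson(ImageRecord record)
    {
        return JsonSerializer.Serialize(record);
    }
}
```

The whole image, variants and all, comes back from one lookup with no join.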

I did a good bit of reading before I dove into NoSQL to see if it was really any better. I had heard lots of “NoSQL is the new wave and it’s so much better!”. Well, the answer I found: it depends. There are lots of arguments that NoSQL, because of its unstructured nature, is much better at scaling. However, there are plenty that say RDBMSs scale just fine too. I don’t think one can scale up better than the other; I think it’s just easier to scale up a NoSQL database in terms of code implementation. What I really got out of my reading was that the reason to use NoSQL is not necessarily scaling; it comes down to whether you need the ability to store unstructured data. As I stated, what I did with my server was not completely structured, but it could still have been implemented in typical relational style. I’ve heard that there are scenarios where you really do need unstructured storage. I assume it’s true, though I can’t think of an example myself.

All in all, I like the interactions with MongoDB a lot better than writing SQL like I would with MySQL (which is what I have been using for data storage since I started programming 8 years ago). However, I could get very similar interactions by using an ORM (Object-Relational Mapper). That pretty much keeps the SQL away from my eyes and lets me use an object-oriented interface to the database. In general, for my needs, I could use either one. I don’t really have to worry about the scale issue (for now anyway), but like I said, it really comes down to the need for structured vs. unstructured data.

I should note: my GitHub project code is not ‘ready for distribution’. It was really just a test project. The code is there for anyone to play with and modify, or look at as an example. It won’t run straight away if you download it and fire it up. You will need to modify it for your environment.

 

Recently I needed to write some code to query a list of whois servers to find the registration date of each domain in another list. I needed to write it in VB.NET so that it would link in with our VB ASP.NET code.

My first approach was to grab a list of whois servers from a flat file and just use a for loop to go through them all until I found the information I was looking for. The problem is that this is horribly slow and can take a very, very long time. It was suggested that I process them in parallel, so I searched the web for a good way to do just that, and what I found was not only amazingly simple but also very, very fast: .NET has Parallel.For and Parallel.ForEach (in System.Threading.Tasks). I also needed to know how to stop the loop once I had the information I needed, which is where MSDN’s article on how to stop or break from a Parallel.For loop came in handy.

Notice how Stop and Break are two different things here. The basic difference is which of the other iterations are still allowed to run after one iteration calls Stop or Break.

Stop tells the loop to wind down as soon as convenient: no new iterations are started, whether their index comes before or after the current one, and iterations that are already running are left to finish what they are doing. So let’s say that we’re doing a for loop from 0 to 9. 10 items. We might have something similar to the following:

Parallel.For(0, 10, Sub(i, loopState)
                        ' processing code here
                        ' Somewhere in here we call:
                        loopState.Stop()
                    End Sub)

Basically, you’re passing an anonymous subroutine to the Parallel.For() function. So, passing a function to a function. That subroutine takes two parameters: the index (in this example i) and a ParallelLoopState object (I used loopState). The ParallelLoopState object is what you use to call the Stop() method. So for example’s sake, let’s say iterations 4, 5 and 6 are currently running, and loopState.Stop() gets called from iteration 5. The loop won’t start any new iterations, but it will wait for both 4 and 6 to finish what they were doing before finally stopping. (A long-running iteration can check loopState.IsStopped and bail out early.)

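Break, however, behaves more like a break in an ordinary loop: every iteration with an index lower than the current one is still guaranteed to run (even ones that haven’t started yet), but no new iterations with a higher index are started. Going with the same example, with iterations 4, 5 and 6 running, iteration 5 calls loopState.Break(). Iteration 4 finishes, and so will 0 through 3 if they haven’t run yet. Iteration 6 is not killed; it is already running, so it runs to completion unless it checks loopState.ShouldExitCurrentIteration and returns early. Any of 7, 8 and 9 that haven’t started yet never will.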
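Here is a small sketch that makes the difference visible. It is written in C# rather than VB.NET, but the Parallel and ParallelLoopState calls are the same; the class and method names are just for illustration.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class StopVersusBreak
{
    // Runs indexes 0..9 and calls Break() at index 5.
    // Returns the loop result plus the indexes that actually ran.
    public static (ParallelLoopResult Result, ConcurrentBag<int> Ran) RunWithBreak()
    {
        var ran = new ConcurrentBag<int>();
        ParallelLoopResult result = Parallel.For(0, 10, (i, loopState) =>
        {
            ran.Add(i);
            if (i == 5)
            {
                loopState.Break();
            }
        });
        return (result, ran);
    }

    // Runs indexes 0..9 and calls Stop() at index 5.
    public static ParallelLoopResult RunWithStop()
    {
        return Parallel.For(0, 10, (i, loopState) =>
        {
            if (i == 5)
            {
                loopState.Stop();
            }
        });
    }
}
```

After RunWithBreak(), Result.LowestBreakIteration is 5 and indexes 0 through 5 are always in Ran; whether any of 6 through 9 made it in depends on scheduling. After RunWithStop(), IsCompleted is false and LowestBreakIteration is null, because Stop makes no promise about lower indexes.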

So now that we understand the basics of how these loops work, let’s move on to the whois query part.

I basically wrote two functions; one to query all the whois servers for one particular domain in parallel, and one to execute the previous function on a list of domains I need the information from in parallel. The first function looks something like this:

Imports System.IO
Imports System.Net.Sockets
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Public Function ParallelQuery(ByVal domain As String) As String()

        ' Read in the entire server list, one whois host per line
        Dim whoislist As String() = File.ReadAllLines("serverlist.txt")

        ' I am using a regex expression to get the creation date.
        ' One Regex instance can be shared by all the threads.
        Dim creationDate As New Regex("creation.date:\s+(.+)", RegexOptions.IgnoreCase)

        Dim endData As String() = Nothing

        Parallel.ForEach(whoislist, Sub(d As String, loopstate As ParallelLoopState)
                                        Dim host As String = d.Trim()
                                        ' Skip blank lines in the server list
                                        If host = "" Then Return
                                        Try
                                            ' Using closes the connection even if something throws
                                            Using client As New TcpClient(host, 43)
                                                Dim networkStream As NetworkStream = client.GetStream()
                                                If networkStream.CanRead AndAlso networkStream.CanWrite Then
                                                    Dim sendBytes As Byte() = Encoding.ASCII.GetBytes(domain & vbCrLf)
                                                    networkStream.Write(sendBytes, 0, sendBytes.Length)

                                                    ' The response buffer is local to this iteration,
                                                    ' so parallel iterations can't overwrite each other
                                                    Dim returnData As New StringBuilder()
                                                    Dim bytes(client.ReceiveBufferSize - 1) As Byte
                                                    Dim recvSize As Integer = networkStream.Read(bytes, 0, bytes.Length)

                                                    While recvSize <> 0
                                                        returnData.Append(Encoding.ASCII.GetString(bytes, 0, recvSize))
                                                        recvSize = networkStream.Read(bytes, 0, bytes.Length)
                                                    End While

                                                    ' This is where you do your processing of the return data from the request
                                                    Dim found As Match = creationDate.Match(returnData.ToString())

                                                    If found.Success Then
                                                        endData = {domain, found.Groups(1).Value.Trim()}
                                                        ' If it finds what I need I can stop the loop.
                                                        loopstate.Stop()
                                                    End If
                                                End If
                                            End Using
                                        Catch ex As Exception
                                            ' Servers that are unreachable or misbehave are simply skipped
                                        End Try
                                    End Sub)
        Return endData
    End Function
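The parsing step on its own is easy to pull out and try without touching the network. A minimal sketch, in C# rather than VB.NET, using the same pattern as above; the class and method names are invented, and any response text you feed it while testing is your own sample data, not output from a real whois server.

```csharp
using System.Text.RegularExpressions;

public static class WhoisParser
{
    // Same pattern as in the function above: "creation date:" in any casing, then the value.
    private static readonly Regex CreationDate =
        new Regex(@"creation.date:\s+(.+)", RegexOptions.IgnoreCase);

    // Returns the captured creation date, or null when the response has no such line.
    public static string ExtractCreationDate(string whoisResponse)
    {
        Match match = CreationDate.Match(whoisResponse);
        // Trim() drops the carriage return that ".+" picks up from CRLF line endings.
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

Not every registry labels the field “Creation Date”, so a response without that line returns null here, the same way the loop above just moves on.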

The function that ran those queries in parallel looks like this:

 

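Public Function processList(ByVal list As List(Of String)) As DataTable

        ' Result table (the column names here are just for illustration)
        Dim dt As New DataTable()
        dt.Columns.Add("Domain", GetType(String))
        dt.Columns.Add("CreationDate", GetType(String))
        Dim dtLock As New Object()

        Parallel.ForEach(list, Sub(e)
                                   ' This returns the domain and creation date as strings, or Nothing if no server had an answer
                                   Dim res As String() = ParallelQuery(e)
                                   ' I can now do whatever I need to with the creation date.
                                   ' DataTable is not safe for concurrent writes, so rows are added under a lock
                                   If res IsNot Nothing Then
                                       SyncLock dtLock
                                           dt.Rows.Add(res(0), res(1))
                                       End SyncLock
                                   End If
                               End Sub)

        Return dt

    End Function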

 

You’ll notice that I used Parallel.ForEach loops in these functions. I didn’t need an index, and thought it would be easier to have the object I’d be using passed to the subroutine in place of the index. It’s essentially the same thing as the Parallel.For loop, but the first parameter of the anonymous subroutine is the current item of the collection instead of an integer index.

 

So there you have it. I love .NET for creating this. It even chooses how many threads to use automatically, based on the system resources available. These methods make parallelism so easy it almost feels like cheating.
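If you would rather not leave the thread count entirely up to .NET (hammering whois servers with a large number of simultaneous connections is not very polite), ParallelOptions lets you cap it. A rough sketch in C#; the helper name and the lookup callback are made up, and the callback stands in for something like ParallelQuery.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedParallelLookup
{
    // Runs lookup for every domain, at most maxParallel at a time, and collects the non-null results.
    public static ConcurrentDictionary<string, string> Run(
        IEnumerable<string> domains, Func<string, string> lookup, int maxParallel)
    {
        // ConcurrentDictionary is safe to write to from several iterations at once.
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(domains, options, domain =>
        {
            string value = lookup(domain);
            if (value != null)
            {
                results[domain] = value;
            }
        });
        return results;
    }
}
```

The thread-safe dictionary is used here so that no manual locking is needed when several iterations finish at the same moment.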


		
 

.NET Reflection is a very, very powerful tool. One thing I used it for is making a basic plugin system for my web server.

I needed some sort of way to know what assemblies to load, so I used a configuration file with a simple format to parse. For each module I wanted to load, I put a line that was the path to the DLL, preceded by the word “mod”

mod C:\modules\module.dll

I figured that I or someone else might write a plugin that used another external library, so I used “dep” to mark DLLs that were needed by the modules.

dep C:\dependencies\dependency.dll

When the server starts up, it reads through this configuration file, takes all the lines that start with mod or dep, and puts their paths in the module and dependency ArrayLists respectively. Once the config file is read, it goes through and loads the dependencies first.

 Assembly.LoadFile(DependencyPath);
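The config-reading step described above could be sketched along these lines. This is not the parser from my server, just a minimal version of the same idea; the class name is made up, and it uses List<string> where the server uses ArrayLists.

```csharp
using System;
using System.Collections.Generic;

public static class ModuleConfig
{
    // Splits config lines into module paths ("mod ...") and dependency paths ("dep ...").
    // Lines with any other prefix are ignored.
    public static void Parse(IEnumerable<string> lines, List<string> modules, List<string> dependencies)
    {
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.StartsWith("mod ", StringComparison.Ordinal))
            {
                modules.Add(line.Substring(4).Trim());
            }
            else if (line.StartsWith("dep ", StringComparison.Ordinal))
            {
                dependencies.Add(line.Substring(4).Trim());
            }
        }
    }
}
```

Feeding it the lines of the config file (for example from File.ReadAllLines) fills both lists in one pass, after which the dependency list can be loaded before the module list.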

Loading the modules is a little more complicated. For this I created a simple Module class to hold some information I would need about it, and be able to give me access to the Assembly object that loading the dll returns. In this case, when I load the module, I want to go into a class called ModuleMap that contains the url mapping info that I require for the plugin, and call its GetUrlMap method.

    using System.Collections.Generic;
    using System.IO;
    using System.Reflection;

    // UrlMapItem, NoSuchModuleException and InvalidModuleMapException are my own types (not shown here).
    public class Module
    {
        public string ModulePath;
        public string ModuleNamespace;
        public Assembly ModuleAssembly;
        public List<UrlMapItem> UrlMap;

        public Module(string modulepath)
        {
            ModulePath = modulepath;
        }

        public void Load()
        {
            if (!File.Exists(ModulePath))
            {
                throw new NoSuchModuleException("Error: No such module at '" + ModulePath + "'.");
            }
            ModuleAssembly = Assembly.LoadFile(ModulePath);
            // The namespace is assumed to match the DLL's file name (without the extension)
            int lastslash = ModulePath.LastIndexOf(@"\");
            string assemblynamespace = ModulePath.Substring(lastslash + 1, ModulePath.LastIndexOf('.') - lastslash - 1);
            ModuleNamespace = assemblynamespace;
            Type t = ModuleAssembly.GetType(assemblynamespace + ".ModuleMap");
            if (t != null)
            {
                MethodInfo m = t.GetMethod("GetUrlMap");
                if (m != null)
                {
                    // Static method with no parameters: null target, empty argument array
                    UrlMap = (List<UrlMapItem>)m.Invoke(null, (new object[]{}));
                }
                else
                {
                    throw new InvalidModuleMapException("Error: The ModuleMap class is incorrect!");
                }
            }
            else
            {
                throw new InvalidModuleMapException("Error: The ModuleMap class is missing!");
            }
        }
    }

When I used the Invoke() method, I passed it two arguments. The first is the object to invoke the method on; since this is a static method, it is null. The second is an object array that contains the arguments for the invoked method (empty here, because GetUrlMap takes none). Invoke() returns a plain object, so I cast it to the type I needed, which in this case was a List<UrlMapItem>.

Each UrlMapItem in that list contains two pieces of information:
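Here is the same Invoke() pattern boiled down to something you can run without building a separate DLL. SampleModuleMap is a made-up stand-in for a plugin’s ModuleMap class, and it returns plain strings so the sketch doesn’t depend on any of my own types.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical plugin class, defined here only so the sketch is self-contained.
public static class SampleModuleMap
{
    public static List<string> GetUrlMap()
    {
        return new List<string> { "/hello", "/goodbye" };
    }
}

public static class StaticInvokeDemo
{
    // Looks up the static GetUrlMap method on the given type and calls it through reflection.
    public static List<string> LoadUrlMap(Type moduleMapType)
    {
        MethodInfo method = moduleMapType.GetMethod("GetUrlMap", BindingFlags.Public | BindingFlags.Static);
        if (method == null)
        {
            throw new MissingMethodException(moduleMapType.FullName, "GetUrlMap");
        }
        // Static method: the target instance is null. No parameters: empty argument array.
        object result = method.Invoke(null, new object[0]);
        return (List<string>)result;
    }
}
```

Calling StaticInvokeDemo.LoadUrlMap(typeof(SampleModuleMap)) gives back the two-item list. With a real plugin the Type would come from Assembly.GetType() instead of typeof.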

  • A url
  • A method with the full namespace/class path (ex. “testnamespace.testclass.methodname”)

Then I took a Hashtable and used the namespace.class.method path as the key, and the Module.ModuleAssembly as the value object. This way when I go to call it, I can enter the path that I know is being called, take the Assembly object, and do what I did before and call MethodInfo.Invoke().

Assembly assembly = (Assembly)ModuleList[methodnamespace];
Type t = assembly.GetType(method.Substring(0, method.LastIndexOf('.')));
MethodInfo m = t.GetMethod(method.Substring(method.LastIndexOf('.') + 1));
Page p = (Page)m.Invoke(null, (new object[] { rq }));

The method variable contained the whole namespace.class.methodname path and the methodnamespace variable contained just the namespace. To get the class for GetType() I took the substring of the whole namespace.class.methodname string from 0 to the last dot, which gives just namespace.class.
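The substring juggling is easy to get wrong by one character, so here it is as a tiny helper you can test on its own. The helper name is made up for this sketch.

```csharp
using System;

public static class MethodPath
{
    // Splits "namespace.class.method" into ("namespace.class", "method") at the last dot.
    public static (string TypeName, string MethodName) Split(string fullPath)
    {
        int lastDot = fullPath.LastIndexOf('.');
        // Reject paths with no dot, a leading dot, or a trailing dot.
        if (lastDot <= 0 || lastDot == fullPath.Length - 1)
        {
            throw new ArgumentException("Expected namespace.class.method", nameof(fullPath));
        }
        return (fullPath.Substring(0, lastDot), fullPath.Substring(lastDot + 1));
    }
}
```

For “testnamespace.testclass.methodname” this returns “testnamespace.testclass” for GetType() and “methodname” for GetMethod().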

This is a pretty simple example. I structured my plugin DLLs in such a way that they gave me all the info I needed to call the corresponding methods in a very simple manner. There are obviously much more complex ways to do this.

 

I’ve been working on Fizzure A LOT recently. I made a FizzSrvLight that is not a distributed system like the regular one, which therefore allowed me to write one effectively in about 3 hours. On the way I decided to make a few of my own methods and then realized, hey these can be used in other projects too!

So I made a class library (.dll, Dynamic-Link Library) with a few methods that have to do with TCP data transmission. The most important of these is the Send method. Now, this is really only useful for the client. Anyway, here’s the snippet:


public static void Send(TcpClient Client, String Command)
{
    Console.WriteLine("Opening Server Stream");
    NetworkStream n = Client.GetStream();
    // Encode the command as ASCII bytes and write them to the stream
    byte[] msg = System.Text.Encoding.ASCII.GetBytes(Command);
    n.Write(msg, 0, msg.Length);
    Console.WriteLine("SENT: {0}", Command);
}

This method is meant for console programs, but if you are using a GUI, all you really need to do is replace the Console.WriteLine() calls with wherever you want the output to go.
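One way to avoid editing the method for each kind of program is to pass the output destination in. This is a variation on Send, not what is in my library: the log parameter is an assumption added for this sketch, and it takes a Stream so it can be tried without a live connection (a NetworkStream from TcpClient.GetStream() works too).

```csharp
using System;
using System.IO;
using System.Text;

public static class Transmission
{
    // Writes the command to the stream and reports progress through the supplied callback.
    public static void Send(Stream stream, string command, Action<string> log)
    {
        byte[] msg = Encoding.ASCII.GetBytes(command);
        stream.Write(msg, 0, msg.Length);
        log("SENT: " + command);
    }
}
```

A console program would pass Console.WriteLine as the callback, while a GUI could pass a lambda that appends the text to a text box.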

Hope this is helpful to everyone!

 

OK, this is just a quick snippet of code I wrote to get a working server up. Obviously there are more commands I could put in there, in plenty of different ways, but I really just wanted to keep things simple for now. This took me about 2 hours.

This snippet is the main body of code that controls everything. If you read through it you’ll see that I made a struct named File, in the namespace Structure, to hold the information on files. You access it as [MainNamespace].Structure.File, or just Structure.File from inside the main namespace. I’ll paste the code for the struct at the end.

I didn’t leave too many comments because I used a lot of WriteLines to tell me what it was doing, and for debugging purposes. Those more or less tell you what does what.

using System;
using System.Collections;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

namespace FizzSrvLight
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("FizzSrvLight :: Non-Distributed Fizzure Serving Capabilities");
            Thread.Sleep(1000);
            Console.Write("Loading...");
            Console.WriteLine("!");

            Console.WriteLine("Initiating Server Variables...");
            IPAddress localaddr = IPAddress.Parse("127.0.0.1");

            Console.WriteLine("Constructing Server Objects...");
            TcpListener MainServer = new TcpListener(localaddr, 9000);

            Console.WriteLine("Starting Server...");
            MainServer.Start();

            Byte[] bytes = new Byte[1024];
            String data = null;
            String send = null;

            while (true)
            {
                Console.WriteLine("Waiting for connection...");

                // Accept Requests
                TcpClient client = MainServer.AcceptTcpClient();
                Console.WriteLine("Client Connected!");

                // Clear Buffers
                data = null;
                send = null;

                // Get Stream Object for reading and writing
                NetworkStream stream = client.GetStream();

                int i;

                // Initialize File Holder
                ArrayList CurrentFiles = new ArrayList();

                // Loop to receive all data sent from client
                while ((i = stream.Read(bytes, 0, bytes.Length)) != 0)
                {
                    // Clear buffers again
                    data = null;
                    send = null;
                    string message = "OK";
                    // Get data as string
                    data = Encoding.ASCII.GetString(bytes, 0, i);
                    Console.WriteLine("FIZZ_RCV: {0}", data);

                    String[] command = data.Trim().Split(' ');

                    // Insert Possible Commands Here
                    // (the length checks send short messages to the error branch instead of throwing)
                    if (command[0] == "FIZZ_ADDFILE" && command.Length >= 7)
                    {
                        FizzSrvLight.Structure.File file = new FizzSrvLight.Structure.File(command[1], command[2], command[3], command[4], command[5], command[6]);
                        CurrentFiles.Add(file);
                    }
                    else if (command[0] == "FIZZ_RMVFILE" && command.Length >= 7)
                    {
                        FizzSrvLight.Structure.File file = new FizzSrvLight.Structure.File(command[1], command[2], command[3], command[4], command[5], command[6]);
                        CurrentFiles.Remove(file);
                    }
                    else if (command[0] == "FIZZ_AUTH" && command.Length >= 3)
                    {
                        // Not used for anything yet
                        string username = command[1];
                        string password = command[2];
                    }
                    else
                    {
                        Console.WriteLine("FIZZ_INVALID_INPUT");
                        Console.WriteLine("Error Handled");
                        message = "ERROR";
                    }

                    send = message;

                    byte[] msg = Encoding.ASCII.GetBytes(send);

                    // Send back an OK (or ERROR) response
                    stream.Write(msg, 0, msg.Length);
                    Console.WriteLine("FIZZ_SND: " + message);
                }
                // The client has disconnected, so release the connection
                client.Close();
                Thread.Sleep(1000);
            }
        }
    }
}

Now, time for the struct.

namespace FizzSrvLight
{
    namespace Structure
    {
        public struct File
        {
            public string FileName;
            public string FilePath;
            public string FileType;
            public string SharedBy;
            public string IPAddress;
            public string Blacklist;

            public File(string name, string path, string type, string user, string ipaddr, string blacklisted)
            {
                FileName = name;
                FilePath = path;
                FileType = type;
                SharedBy = user;
                IPAddress = ipaddr;
                Blacklist = blacklisted;
            }

        }
    }
}
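If the list of commands grows, the splitting and checking could be pulled out of the main loop into a helper. This is only a sketch of one way to do it, not part of FizzSrvLight: the class name is made up, and the argument counts are simply taken from how the server code above uses each command.

```csharp
using System;

public static class FizzCommand
{
    // Splits a raw message into its command word and arguments.
    // Returns false when the command is unknown or has too few arguments.
    public static bool TryParse(string data, out string name, out string[] args)
    {
        name = "";
        args = new string[0];
        string[] parts = data.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
        {
            return false;
        }
        int required;
        switch (parts[0])
        {
            case "FIZZ_ADDFILE":
            case "FIZZ_RMVFILE":
                required = 6; // name, path, type, user, ip address, blacklist
                break;
            case "FIZZ_AUTH":
                required = 2; // username, password
                break;
            default:
                return false;
        }
        if (parts.Length - 1 < required)
        {
            return false;
        }
        name = parts[0];
        args = new string[parts.Length - 1];
        Array.Copy(parts, 1, args, 0, args.Length);
        return true;
    }
}
```

The server loop could then answer “ERROR” whenever TryParse returns false, and switch on the command name otherwise.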

Well, there you have it. A very simple TcpListener server. Obviously there are better ways to do it, but this is pretty simple, straightforward, and just all-around easy. Please leave comments if you find bugs in it or see errors, or even if you just don’t understand what some of it does.

© 2012 Code Brain