Week-10 – The Introspective Thinker

Apprenticeship Pattern – Unleash Your Enthusiasm

April 11, 2021, David MacDonald

The key focus of this apprenticeship pattern is the situation in which you have more enthusiasm for software development than the rest of your colleagues. Because of this, you end up holding back to fit in with the group. I have a few experiences related to both sides of this. While I’ve never held a software job, I have taken part in software courses. There have been plenty of times when I’m in a group project and the topic is so interesting to me that I can just go to town and code almost everything. I guess rather than letting it hold me back, I usually end up embracing it.

Then there are situations similar to the one I’m in now. It isn’t precisely a mirror of the apprenticeship pattern, but it’s pretty close. I find that I’m most passionate about side projects exactly when I have other things I need to do. For example, I’ll most intensely want to theorize about physics when I have an assignment due; however, once I actually have free time, I’m mostly content just wasting my time playing video games. Over the last month, I’ve become fixated on Sea. I had been working on creating a Minecraft server manager in Node.js for my friends and me, then I moved on to creating a way of backing up my playlists. Then I realized it’d probably be easiest to do it in Python, so I started rewriting the code. In that process, I fell in love with Python.

I had used it before and enjoyed it, but I was almost something of an elitist. It had no strict types and it was easy to write, so I treated it as if it were a beginner’s language and never really took the time to learn it. I rewrote my code in Python and learned a lot of the joys of Python, such as context managers and list/set/dict comprehensions. (I just need to figure out the SQL commands and maybe I’ll eventually finish it.) I had always been aware that Python, running in a single-threaded interpreter, will never be able to perform as well as something like C. C by itself is a joy to write in. Every language has its own personality you get to learn. That said, C can be tedious to write and to read. That began my quest to create Sea, a version of the C language with Python-like syntax. I have become passionate about designing what the ideal language would be: something modular, with high- and low-level features, that is easy to write and debug. Sea is just the first step in that.

I have found that I can so easily spend six hours straight coding, debugging, and refactoring Sea code. I can then go to bed and while I’m trying to fall asleep or even while I’m dreaming, I’ll be making design decisions. Classes that are otherwise fine can seem boring by comparison. Being able to just create something functional that has a clear use case feels great. At least sometimes, it can be really easy to share that enthusiasm with other students and it overall helps all of us.

Don’t be Overwhelmed by Refactoring

November 15, 2020, David MacDonald

As I’ve mentioned previously, I taught myself how to code a few years ago. I’ve learned a lot more since then, but I’m always learning. The other day, I had an assignment for another course that involved going back and refactoring old code. Since it was so bad, I’d like to discuss it.

Concept

The code itself is a terminal-based Java calculator. I imagine it was really tricky for me at the time, considering the way in which I implemented it. I didn’t know about the static keyword or how to use methods. This serves as a great example of what not to do:

The Original Code

Now, the code itself is so horrendously bad that I have to get creative to even display it. So here are a few entertaining lines. Keep in mind that the entire program lives inside the main method:

int load;
double num1, num2, ans;
boolean on, autoclose, autocontinue, loading, numchecking, divide, multiply, add, subtract, other, help;
String in, in2, in3, in4, in5, operation, fa, status1, status2, status3;
char op;

Now, a sane person might ask “Why are there so many variables and why are many of them named so badly?” Well, the reason is that I didn’t use methods. Instead, I used boolean variables to steer the control flow. I also find it funny that I had a String for every user input rather than just reusing one, and that I declared all variables at the top for no reason. Here’s an example of that horrible control flow in action:

in = input.next();

if (in.equals("/")) {
	divide = true;
	numchecking = true;
	operation = "divide";
	fa = "by";
	op = '/';
}

else if (in.equals("*")) {
	multiply = true;
	numchecking = true;
	operation = "multiply";
	fa = "by";
	op = '*';
}

I would ask for input, set these variables, and then later on:

if (numchecking) {
	Thread.sleep(1000);
	System.out.println("Enter your first number.");
	num1 = input.nextDouble();
	System.out.println("Enter another number to " + operation + "  " + fa + " " + num1);
	num2 = input.nextDouble();

	if (divide) {
		ans = num1 / num2;
	}
	if (multiply) {
		ans = num1 * num2;
	}
	if (add) {
		ans = num1 + num2;
	}
	if (subtract) {
		ans = num1 - num2;
	}

	Thread.sleep(3000);
	System.out.println("Calculating...");
	Thread.sleep(1000);
	System.out.println(num1 + " " + op + " " + num2 + " = " + ans);

I have no idea why I used if/else previously but not here. I guess I can understand that I didn’t know how to use a switch statement yet. Anyway, I used Thread.sleep() to add artificial delay to the program for some reason. The code I’ve shown is pretty tame, honestly. Notice the line counts. I recommend taking a look at the original source code so you can really appreciate how horrible it is. Unfortunately, I had to convert the .java file to a .docx file to upload it to WordPress:

Original Calculations.java Source Code Download

The Refactoring Process

I often find myself falling into the same trap: when I realize I can do something in a better way but the change is large and intimidating, I prefer to start from the beginning rather than modify the current product. Sometimes that is extremely useful and you just need a solid clean start. However, often that’s overkill and wastes time. I tend to do that even in video games.

As an example, one of my all time favorite games is Factorio which is an indie game about building factories and trying to automate everything. The goal of the game is to get the game to play itself. Anyway, I have over 500 hours in this game and I haven’t actually reached the end goal of launching a rocket. It’s not because I don’t know how to do it or I die too quickly. It’s because I’m never satisfied with my factory layout or my world generation settings. When it comes to world generation, I do actually have to start over. When it comes to factory layout, I could take the time to manually replace the entire layout and keep my current research. Despite that, I almost always start over.

The saving grace with code is that, when its scale is manageable, it can be incredibly fun and relaxing to refactor. Sometimes it’s tough to get started, but once you do, it’s a really fun time. I honestly had less trouble with the refactoring than I did with the assignment itself. The assignment wanted me to make a change based on a guide: make the change, write it down, and continue. However, I fell into refactoring and made change after change extremely quickly. With code this bad, it was really easy to just aim at one thing and make 40 changes that all fit into unique categories. So I prioritized refactoring the code well over documenting it well.

Virtually all of those variables I had before are gone. Here is the refactored beginning of the code:

private static final String[] OPERATORS = {"+", "-", "*", "/"};
private static Scanner scanner;
private static boolean autoContinue;
private static boolean autoClose;
private static boolean continueOnce;

public static void main(String[] args) {
    boolean programIsOn = true;

You’ll notice I made use of static variables. I only have one variable in the main method, and it controls the loop of the program. Other variables are now in methods or simply don’t exist. I even created an array of operators to allow for easier expansion of functionality later on, despite the fact that I’ll almost certainly never come back to this project. I also have 3 booleans on 3 separate lines. This is my personal preference, but with only 3, I would understand simply writing: private static boolean autoContinue, autoClose, continueOnce; I just tend to lean toward keeping things on separate lines. Although now that I’ve written that out, I kind of do prefer the single line, even if it would mess up the width aesthetic because it’s such a wide line.

Before, I showed part of how I took in user input and managed arithmetic operations. I set a bunch of variables and handled it later. Here’s how I manage it now:

private static void handleOperations() {
    String operation = scanner.next();
    System.out.println();

    if(isArithmeticOperator(operation)) {
        arithmeticOperation(operation);
        return;
    }

    switch(operation) {
        case "help":
            help();
            continueOnce = true;
            break;
        case "exit":
            autoClose = true;
            break;
        default:
            System.out.println("Sorry, I don't understand that operation. Try again.");
            continueOnce = true;
    }
}

I created a method to handle it that is called in the main loop. It has good variable names, uses a switch statement, and calls methods that have descriptive names. It could be better but it is leagues better than what it was before. Originally in the refactoring process, I had:

private static boolean doUserRequestedMethod() {
    switch(scanner.next()) {
        case "/":
            divide();
            break;
        case "*":
            multiply();
            break;
        case "+":
            add();
            break;
        case "-":
            subtract();
            break;
        case "help":
            help();
            return true;
        case "exit":
            autoClose = true;
            break;
        default:
            System.out.println("Sorry, I don't understand. Try again.");
            return true;
    }

    return false;
}

It returned a boolean value to allow the loop to continue if it needed to, such as when the user entered an unknown command. I replaced the return value with static variables that control the loop. I renamed the method for clarity, and I created a single arithmeticOperation() method to avoid code repetition. However, that method itself needed a switch statement, so rather than check the operation twice, I split off the arithmetic operations from the original switch statement in a way that allows me to add more arithmetic operations in the future. I’m pretty happy with that solution.

Lastly, since I showed how the numbers are calculated originally, I should show that in the refactored version. First I have to check if the input was for an arithmetic operation:

private static boolean isArithmeticOperator(String potentialOperator) {
    for(String operator : OPERATORS) {
        if(potentialOperator.equals(operator))
            return true;
    }

    return false;
}
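A more compact variant of the same check, as a rough sketch only: it assumes Java 9 or later (for Set.of) and swaps the array for a set, which is not what the refactored program actually does. The class name OperatorLookup is hypothetical.

```java
import java.util.Set;

final class OperatorLookup {
    // Hypothetical alternative to the OPERATORS array: an immutable set
    // holding the same four symbols.
    private static final Set<String> OPERATORS = Set.of("+", "-", "*", "/");

    // contains() takes the place of the explicit loop over the array.
    static boolean isArithmeticOperator(String potentialOperator) {
        return OPERATORS.contains(potentialOperator);
    }
}
```

Adding a new operator still means touching a single line, which was the point of the array in the first place.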

This was part of why I created the OPERATORS array: so that I could avoid a long boolean chain of ||s. Then for the actual calculation:

private static void arithmeticOperation(String operator) {
    System.out.println("Enter a number: ");
    double leftOperand = scanner.nextDouble();

    String partialEquation = leftOperand + " " + operator + " ";
    System.out.print(partialEquation);

    double rightOperand = scanner.nextDouble();
    double result = leftOperand;

    switch(operator) {
        case "/":
            result /= rightOperand;
            break;
        case "*":
            result *= rightOperand;
            break;
        case "-":
            result -= rightOperand;
            break;
        case "+":
            result += rightOperand;
            break;
    }

    System.out.println(partialEquation + rightOperand + " = " + result);
}

Again, you’ll find decent variable names and a clearer control flow. Obviously this code isn’t flawless, but that’s not the real goal in refactoring. The point of refactoring is to create an improvement. You’ll never have perfect code, but you can always improve your code. This is pretty analogous to life itself. Recognize the value and functionality currently there, recognize that you will never be perfect, but always aim to be better.

Here is the current state of the refactored code. It’s honestly amazing how much better it is:

Refactored Calculations.java Source Code Download

Conclusion

Refactoring is not something to be feared; it should be enjoyed. There is something very relaxing about it if you enter it with the right mentality. Focus on small things you can easily tackle that would make a big improvement and do that. As you slowly cross off changes you need to make, it’ll become more manageable and more readable. In that program above, I cut the number of lines in half after refactoring. I can actually read it and understand what it’s doing. It should be satisfying to go through and make progress towards simplicity and organization. You just have to want it.

The Insatiable Quest for Prime Numbers

November 15, 2020, David MacDonald

I have been coding at least since 2013. I started off self taught and now I’m majoring in CS, while still constantly learning outside of classes. I went from language to language, but something that almost always followed me was the idea of a prime number generator.

I must’ve tried this in almost every language I’ve used. It’s a very simple idea: create a console-based program that can both check if an integer is prime and print out a list of primes in order as quickly as possible. Usually, I don’t get too far, for reasons I’ll get into. I did mostly succeed with C++, though. I had an incredibly quick program (once I realized how terribly slow it was to print out the results to the screen) that ran on multiple threads and could fill up a file with prime numbers. I had two main problems: the file it filled up became so big that Notepad couldn’t open it, and I ran into the max size of an unsigned long.

Looking back at that code, it was a horrible mess, and I’ve come a long way in both C++ and general code design. However, that is when my quest began to create a data structure from scratch that could allow me to handle enormous primes. I managed to create a class that used strings to hold values. I defined all the necessary arithmetic operations and it worked. Incredibly slowly. I should also mention that while I’m constantly coming back to this idea, my attention wanes as I run out of time between vacations.

Using strings was a horrible idea. Not only are they really slow when you try to treat them like numbers, but they are a huge waste of memory. Each character is 1 byte, and in base 10 I would only be using 10 of the 256 values a byte can hold. Even using base 62 (where I use 0-9, then a-z, and finally A-Z), it would be a very stupid waste of space. So I started thinking about how I could store numbers more efficiently. In C++, it really isn’t that hard.
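To put rough numbers on that waste, here is a back-of-the-envelope calculation. It assumes 8-bit characters and compares the information one character can carry when it is limited to the 10 decimal digits, or to the 62 symbols 0-9, a-z, and A-Z, against the 8 bits it occupies:

```latex
\[
\frac{\log_2 10}{8} \approx \frac{3.32}{8} \approx 41.5\%,
\qquad
\frac{\log_2 62}{8} \approx \frac{5.95}{8} \approx 74.4\%
\]
```

So under those assumptions, a decimal string uses well under half of the memory it takes up, and even the 62-symbol alphabet leaves about a quarter of it unused.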

The solution I came up with involved std::size_t (size_t). It’s an unsigned integer type large enough to hold the size of any object, which on typical 64-bit platforms makes it 64 bits wide. Since I have a decent background in number bases (see my blog post on Duodecimal and why we should all switch to it), I was inspired. My design was a linked list structure of size_t’s. I used a linked list rather than an array because I wanted “small” numbers to not take up a lot of space, I wanted to avoid reallocation, and I also wanted to avoid an upper limit on the size. In theory, this object can be as large as memory allows. If I used something like a std::vector, its size itself is stored as a size_t, which caps how many elements it can hold.

Each node in the linked list holds one size_t and acts like a single digit of a number base. Let n be the largest size_t value. Suppose you have n + 1. That number would basically just be 1 <-> 0 in this form. Then you keep ticking up the 1’s digit, and so on. This means that the numbers are stored incredibly densely: we’re dealing with base (n + 1). In my case, I think size_t is 64 bits, so n = 2⁶⁴ - 1. To store n², you would need 2 nodes. In general, storing n^k takes only k nodes, or 64*k bits of digit data (ignoring the pointers each node also carries).
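To make the carry behaviour concrete, here is a minimal sketch in Java rather than C++. It is not the class described above: it assumes a singly linked list stored least-significant digit first, uses Java’s long as a stand-in for an unsigned 64-bit size_t, and every name in it (LimbList, Node, increment) is hypothetical. BigInteger appears only to print the value for checking.

```java
import java.math.BigInteger;

final class LimbList {
    // One "digit" in base 2^64; the long is interpreted as unsigned.
    static final class Node {
        long digit;
        Node next; // next more-significant digit, or null if none yet
        Node(long digit) { this.digit = digit; }
    }

    Node head = new Node(0); // least-significant digit comes first

    // Adds 1, carrying into the next node whenever a digit wraps
    // around from 2^64 - 1 back to 0.
    void increment() {
        Node node = head;
        while (true) {
            node.digit++;
            if (node.digit != 0) return;             // no wraparound, no carry
            if (node.next == null) node.next = new Node(0);
            node = node.next;                         // carry into next digit
        }
    }

    // Number of nodes (digits) currently in use.
    int nodeCount() {
        int count = 0;
        for (Node n = head; n != null; n = n.next) count++;
        return count;
    }

    // Converts to BigInteger purely for display and verification.
    BigInteger toBigInteger() {
        BigInteger value = BigInteger.ZERO;
        int shift = 0;
        for (Node n = head; n != null; n = n.next) {
            BigInteger digit = new BigInteger(Long.toUnsignedString(n.digit));
            value = value.add(digit.shiftLeft(shift));
            shift += 64;
        }
        return value;
    }
}
```

Setting head.digit to -1L (all 64 bits set, i.e. n) and calling increment() leaves a low digit of 0 and a new high digit of 1, which is the two-node form of n + 1 described above.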

This is insanely better than using strings. Since they’re integers already as well, I get to avoid conversion. My problem recently has been finding the time and motivation to work on this. I try to create unit tests so I can guarantee it’s working as expected and I get bored. I also need to find efficient algorithms for arithmetic operations.

Despite my inability to complete this project, it does a good job of showing the benefits of object-oriented programming. I can take all of this complexity and shove it into a class. Once that class is done, I can write the functions that simply calculate primes the normal way.

Checking Primality

The most basic way to check if an integer is prime is to check whether any positive integer less than it (other than 1) divides it. From there, you might realize that you can skip all of the even numbers after 2, because if 2 doesn’t divide it, then neither will any of the evens. Then you can also skip all integers larger than its square root, as factors come in pairs. A few proofs would easily show these results to be true. Then you can continue to trim the list of candidates further. However, I’ve come up with a method as well.
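The two shortcuts above (skip the evens after 2, stop at the square root) can be sketched in Java as follows. This is only an illustration: the names are hypothetical, and a plain long stands in for the arbitrary-size integer class discussed earlier.

```java
final class TrialDivision {
    // Returns true when n is prime, using plain trial division.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;   // 2 is the only even prime
        // Only odd candidates, and only up to the square root of n.
        // Writing the bound as i <= n / i avoids overflowing i * i.
        for (long i = 3; i <= n / i; i += 2) {
            if (n % i == 0) return false;
        }
        return true;
    }
}
```

For example, checking 97 only tries 3, 5, 7, and 9 before concluding that it is prime.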

How come after you check 2 you can skip all of the even numbers? Okay I suppose I should do one simple proof here:
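One way to write that argument, assuming the usual definition that a divides b when b = ak for some integer k (the symbols n, m, and k are my own labels, and \nmid needs the amssymb package):

```latex
Suppose $2 \nmid n$ and, for contradiction, that some even number $2m$
divides $n$. Then $n = (2m)k$ for some integer $k$, so $n = 2(mk)$,
which means $2 \mid n$. This contradicts $2 \nmid n$, so no even number
divides $n$.
```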

In fact, let me do this in general:
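A sketch of the general version, under the same definition of divides (again, a, n, m, and k are labels chosen here, and \nmid needs the amssymb package):

```latex
Let $a$ and $n$ be integers with $a \nmid n$, and suppose, for
contradiction, that a multiple $am$ of $a$ divides $n$. Then
$n = (am)k$ for some integer $k$, so $n = a(mk)$ and therefore
$a \mid n$. This contradicts $a \nmid n$, so no multiple of $a$
divides $n$.
```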

I enjoy a nice simple proof every now and then, even though I could’ve just cited transitivity of divides. Anyway, what this is telling us is that this applies for any factors. Any time an integer does not divide our potential prime, we can ignore all multiples.

What I’m getting at is that the hypothetically most efficient method for verifying primality by division is to check only the primes up to its square root. Since every integer greater than 1 is either prime or a product of primes, checking a prime also checks every single multiple of that prime. The problem is that you need a list of primes to check. Hence, my program would store primes and use them to verify whether a number is prime.

You may be thinking that if I have a list of primes, why bother at all with arithmetic? And thinking about it, having an “exhaustive” list of primes (an exhaustive infinite list…) and simply checking that list could be a very efficient way to verify that a number is prime. However, it would be pretty inefficient for verifying that the average number is not prime. The reason I’m not using that method is that my list of primes is small and grows slowly. Recall that I only need to store primes up to the square root. That means to check 100, I’ll only have 2, 3, 5, and 7 stored. To check 10,000, I’ll only have the primes below 100.

So as I verify primes, the list would grow and add more factors in order. You would also want to check the smaller primes first. While you can’t exactly say that more integers are even than are multiples of 5, because the set is infinite, any long enough run of consecutive integers has roughly this property. Multiples of 2 will make up about half of the elements, multiples of 3 about one third, and so on. So you’ll want to start with the smaller primes first.
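The growing-list idea can be sketched like this. It is a simplified illustration, not the actual program: it uses long instead of a custom big-number class, and the names (PrimeStore, growUpToSqrt, hasNoStoredFactor) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class PrimeStore {
    // Primes found so far, in increasing order.
    private final List<Long> primes = new ArrayList<>();
    // Smallest integer not yet examined for the stored list.
    private long nextCandidate = 2;

    // Tries the stored primes from smallest to largest, stopping once
    // they pass the square root of n.
    private boolean hasNoStoredFactor(long n) {
        for (long p : primes) {
            if (p > n / p) break;
            if (n % p == 0) return false;
        }
        return true;
    }

    // Extends the stored list until it holds every prime p with p * p <= n.
    private void growUpToSqrt(long n) {
        while (nextCandidate <= n / nextCandidate) {
            if (hasNoStoredFactor(nextCandidate)) primes.add(nextCandidate);
            nextCandidate++;
        }
    }

    boolean isPrime(long n) {
        if (n < 2) return false;
        growUpToSqrt(n);
        return hasNoStoredFactor(n);
    }

    // Copy of the stored primes, for inspection.
    List<Long> storedPrimes() {
        return new ArrayList<>(primes);
    }
}
```

After a single call to isPrime(100) on a fresh PrimeStore, the stored list holds only 2, 3, 5, and 7, matching the example above.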

How to determine divisibility is also up to you. For instance, in base 10 we can rule out any integer whose last digit is a 0 or 5 (except the number 5 itself), since that rules out multiples of 5 and 10. However, when working with duodecimal, I realized different number bases have different properties for divisibility. For example, in base 12 every multiple of 4 and 8 ends in 0, 4, or 8. Every multiple of 3 and 9 ends in 0, 3, 6, or 9. Every multiple of 6 ends in 0 or 6. Every multiple of 12 ends in 0.

You could potentially have an algorithm to convert the integer to a different number base that is more efficient than performing an arithmetic calculation, especially considering that for these tricks you only need the lowest few digits of the converted number. I think it’s a really interesting idea; however, you risk circumventing the benefits of the machine code. Oftentimes you try to make something more efficient, and then it’s either unnecessary or makes performance worse because the compiler was able to do a better job “fixing” your code. I think either way I’ll have to experiment with these ideas in the future. Thinking about it, you only need to convert the number to the base of the prime and then check if the 1s digit is 0 (which amounts to computing the remainder modulo that prime). You can then modify how many digits you actually convert based on how many digits the base-prime number will be. For larger numbers this might have potential.
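A small sketch of the last-digit tricks, assuming non-negative values that fit in a long. The names are hypothetical, and the ones digit is computed with the % operator, so this illustrates the idea rather than offering a faster alternative to ordinary division.

```java
final class LastDigit {
    // Ones digit of a non-negative n when written in the given base.
    static int onesDigit(long n, int base) {
        return (int) (n % base);
    }

    // True when the ones digit of n in the given base is one of the
    // allowed digits.
    static boolean endsIn(long n, int base, int... allowed) {
        int digit = onesDigit(n, base);
        for (int d : allowed) {
            if (d == digit) return true;
        }
        return false;
    }

    // Base-12 rule: n is a multiple of 4 exactly when it ends in 0, 4, or 8.
    static boolean isMultipleOfFourBase12(long n) {
        return endsIn(n, 12, 0, 4, 8);
    }

    // Base-prime rule: n is a multiple of p exactly when its ones digit
    // in base p is 0.
    static boolean isMultipleOfPrime(long n, int p) {
        return onesDigit(n, p) == 0;
    }
}
```

Long.toString(n, 12) from the standard library prints the full base-12 form if you want to look at the digits themselves.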

Conclusion

While this project may not be the best example, I think it’s very useful for programmers to have a “go to” project for practicing an unfamiliar language: something simple enough to allow you to write it from memory, yet complex enough to give you a good grasp of the language. In this primes project, I have to handle the terminal, arithmetic, functions, objects, files, threads, etc. If I simplified it down and ignored my ambitions, it would serve as a very functional example for learning the syntax and libraries of a new language. Ideally, you wouldn’t even have the original code to look at. You would just program the functionality from memory, using your IDE and Google to figure out syntax. The best way to learn a language is to use it. If you have something familiar in your mind, then you have a great example project to work on already. I find that struggling to implement functionality and spending a lot of time searching for a solution has taught me an invaluable amount about the languages I use.