The post "Let’s have fun with prime numbers, threads, thread pool, TPL and CUDA?" appeared first on Elemar JR.
Let’s have fun with prime numbers? In this post, I would like to share some results I got from using multi-threading with .NET and CUDA to find prime numbers in a range.
My machine:
It is important to say that I am NOT using the best algorithms here. I know there are better approaches to find prime numbers. Also, I am pretty sure there are a lot of improvements that I could implement in my code. So, take it easy. Right?
The book Pro .NET Performance inspired the code in this post.
Let’s start with a straightforward sequential implementation.
static void Main()
{
    var sw = new Stopwatch();
    sw.Start();
    var result = PrimesInRange(200, 800000);
    sw.Stop();
    Console.WriteLine($"{result} prime numbers found in {sw.Elapsed.TotalSeconds} seconds ({Environment.ProcessorCount} processors).");
}

public static long PrimesInRange(long start, long end)
{
    long result = 0;
    for (var number = start; number < end; number++)
    {
        if (IsPrime(number))
        {
            result++;
        }
    }
    return result;
}

static bool IsPrime(long number)
{
    if (number == 2) return true;
    if (number % 2 == 0) return false;
    for (long divisor = 3; divisor < (number / 2); divisor += 2)
    {
        if (number % divisor == 0)
        {
            return false;
        }
    }
    return true;
}
Time to run: ~76 seconds!
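For readers who want to play along without a .NET toolchain, here is a rough Python port of the same trial-division logic (my sketch, not the post's benchmark code; the helper names are mine):

```python
def is_prime(number):
    # Same trial-division check as the C# version (assumes number >= 2)
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    divisor = 3
    while divisor < number / 2:
        if number % divisor == 0:
            return False
        divisor += 2
    return True

def primes_in_range(start, end):
    # Sequential scan of the whole range, one number at a time
    return sum(1 for n in range(start, end) if is_prime(n))

print(primes_in_range(2, 20))  # 8 primes: 2, 3, 5, 7, 11, 13, 17, 19
```

As in the C# version, the `divisor < number / 2` bound is deliberately wasteful; a square-root bound would be far faster.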
public static long PrimesInRange(long start, long end)
{
    long result = 0;
    var lockObject = new object();
    var range = end - start;
    var numberOfThreads = (long)Environment.ProcessorCount;
    var threads = new Thread[numberOfThreads];
    var chunkSize = range / numberOfThreads;

    for (long i = 0; i < numberOfThreads; i++)
    {
        var chunkStart = start + i * chunkSize;
        var chunkEnd = (i == numberOfThreads - 1) ? end : chunkStart + chunkSize;
        threads[i] = new Thread(() =>
        {
            for (var number = chunkStart; number < chunkEnd; ++number)
            {
                if (IsPrime(number))
                {
                    lock (lockObject)
                    {
                        result++;
                    }
                }
            }
        });
        threads[i].Start();
    }

    foreach (var thread in threads)
    {
        thread.Join();
    }
    return result;
}
This is a naïve implementation. Do you know why? Share your thoughts in the comments.
Time to run: ~23 seconds.
public static long PrimesInRange(long start, long end)
{
    var range = end - start;
    var numberOfThreads = (long)Environment.ProcessorCount;
    var threads = new Thread[numberOfThreads];
    var results = new long[numberOfThreads];
    var chunkSize = range / numberOfThreads;

    for (long i = 0; i < numberOfThreads; i++)
    {
        var chunkStart = start + i * chunkSize;
        var chunkEnd = i == (numberOfThreads - 1) ? end : chunkStart + chunkSize;
        var current = i;
        threads[i] = new Thread(() =>
        {
            results[current] = 0;
            for (var number = chunkStart; number < chunkEnd; ++number)
            {
                if (IsPrime(number))
                {
                    results[current]++;
                }
            }
        });
        threads[i].Start();
    }

    foreach (var thread in threads)
    {
        thread.Join();
    }
    return results.Sum();
}
Time to run: ~23 seconds.
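The idea behind this version, each worker counting into its own slot and the results being summed at the end, can be sketched in Python as well (an illustration of the chunking strategy, not the C# benchmark; I use a ThreadPoolExecutor and a square-root bound for the primality test):

```python
from concurrent.futures import ThreadPoolExecutor

def is_prime(number):
    # Standard trial division up to the square root
    if number < 2:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= number:
        if number % divisor == 0:
            return False
        divisor += 2
    return True

def count_primes_in_chunk(chunk_start, chunk_end):
    # Private counter per worker: no shared state, no lock contention
    return sum(1 for n in range(chunk_start, chunk_end) if is_prime(n))

def primes_in_range(start, end, workers=4):
    chunk = (end - start) // workers
    bounds = [(start + i * chunk,
               end if i == workers - 1 else start + (i + 1) * chunk)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: count_primes_in_chunk(*b), bounds))

print(primes_in_range(2, 100))  # 25
```

The last chunk absorbs the remainder of the division, mirroring the `i == numberOfThreads - 1 ? end : …` trick in the C# code.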
public static long PrimesInRange(long start, long end)
{
    long result = 0;
    var range = end - start;
    var numberOfThreads = (long)Environment.ProcessorCount;
    var threads = new Thread[numberOfThreads];
    var chunkSize = range / numberOfThreads;

    for (long i = 0; i < numberOfThreads; i++)
    {
        var chunkStart = start + i * chunkSize;
        var chunkEnd = i == (numberOfThreads - 1) ? end : chunkStart + chunkSize;
        threads[i] = new Thread(() =>
        {
            for (var number = chunkStart; number < chunkEnd; ++number)
            {
                if (IsPrime(number))
                {
                    Interlocked.Increment(ref result);
                }
            }
        });
        threads[i].Start();
    }

    foreach (var thread in threads)
    {
        thread.Join();
    }
    return result;
}
Time to run: ~23 seconds.
public static long PrimesInRange(long start, long end)
{
    long result = 0;
    const long chunkSize = 100;
    var completed = 0;
    var allDone = new ManualResetEvent(initialState: false);
    var chunks = (end - start) / chunkSize;

    for (long i = 0; i < chunks; i++)
    {
        var chunkStart = start + i * chunkSize;
        var chunkEnd = i == (chunks - 1) ? end : chunkStart + chunkSize;
        ThreadPool.QueueUserWorkItem(_ =>
        {
            for (var number = chunkStart; number < chunkEnd; number++)
            {
                if (IsPrime(number))
                {
                    Interlocked.Increment(ref result);
                }
            }
            if (Interlocked.Increment(ref completed) == chunks)
            {
                allDone.Set();
            }
        });
    }

    allDone.WaitOne();
    return result;
}
Time to run: ~16 seconds.
public static long PrimesInRange(long start, long end)
{
    long result = 0;
    Parallel.For(start, end, number =>
    {
        if (IsPrime(number))
        {
            Interlocked.Increment(ref result);
        }
    });
    return result;
}
Time to run: ~16 seconds.
#include "device_launch_parameters.h"
#include "cuda_runtime.h"
#include <ctime>
#include <cstdio>

__global__ void primes_in_range(int *result)
{
    const auto number = 200 + (blockIdx.x * blockDim.x) + threadIdx.x;
    if (number >= 800000) { return; }
    if (number % 2 == 0) { return; }
    for (long divisor = 3; divisor < (number / 2); divisor += 2)
    {
        if (number % divisor == 0) { return; }
    }
    atomicAdd(result, 1);
}

int main()
{
    auto begin = std::clock();

    int *result;
    cudaMallocManaged(&result, sizeof(int));
    *result = 0;

    primes_in_range<<<800, 1024>>>(result);
    cudaDeviceSynchronize();

    auto end = std::clock();
    auto duration = double(end - begin) / CLOCKS_PER_SEC * 1000;
    printf("%d prime numbers found in %d milliseconds",
        *result, static_cast<int>(duration));

    getchar();
    return 0;
}
Time to run: less than 2 seconds.
I strongly recommend you reproduce these tests on your machine. If you see something I could do better, please share your ideas.
I understand that performance is a feature. I will continue to blog about it. Subscribe to the contact list, and I will send you an email every week with new content.
The post "About “Difficult Conversations” (book recommended by Phil Haack)" appeared first on Elemar JR.
Some days ago, I heard a fantastic interview with Phil Haack on the IT Career Energizer Podcast. Here is the episode description:
In this episode, Phil Haack tells us why we need to be prepared to have difficult conversations and why this can help your career. Phil also talks about the importance of taking care when writing code and why you should test your code carefully.
What an amazing show!
In this interview, Phil recommended an excellent book: Difficult Conversations: How to Discuss What Matters Most. I listened to the audio version of the book, and it is impressive. I strongly recommend it.
The book gives a lot of good suggestions about how to succeed in challenging conversations. Here is a short description (from Amazon):
We attempt or avoid difficult conversations every day, whether dealing with an underperforming employee, disagreeing with a spouse, or negotiating with a client. From the Harvard Negotiation Project, the organization that brought you Getting to Yes, Difficult Conversations provides a step-by-step approach to having those tough conversations with less stress and more success. You’ll learn how to:
I wish I had had the opportunity to read (or listen to) it some years ago. I would surely have had fewer stressful moments.
Thanks, Phil!
The post "The #1 Rule for .NET Performance" appeared first on Elemar JR.
If you ask me for one tip to improve the performance of your applications, it would be:
Design your objects to be collected on gen #0
or not at all.
Naturally, following this recommendation requires you to know at least the basics of how garbage collection works. But if you are interested in improving the performance of your applications, this knowledge is essential.
In fact, the garbage collector was explicitly designed to be very efficient at performing gen #0 collections.
Garbage collection gets more expensive with each generation. Even with background processing for gen #2, there is still a very high CPU cost to pay. You should avoid gen #1 collections too: in most scenarios, gen #1 objects are directly promoted to gen #2.
Ideally, every object you allocate goes out of scope by the time the next gen #0 collection comes around. You can measure how long that interval is and compare it to how long your data stays alive using tools such as PerfView (by the way, take some time to learn how to use PerfView; it is such a great tool).
Following this recommendation requires a shift in your mindset. It will inform nearly every aspect of your application (I strongly recommend this talk by Ayende on the topic).
Here are some guidelines to keep in mind when writing your code:
Isn’t that enough? In the coming weeks, I will share in-depth information about how to follow these recommendations and improve the performance of your .NET applications. Sign up for my contact list, and you will receive a notification when new content arrives.
Last but not least, remember that performance is a feature. If the performance of your application is not good, chances are you are missing opportunities to deliver business value.
Cover image: chuttersnap
The post "Understanding the basics of CUDA Thread Hierarchies" appeared first on Elemar JR.
In this post, I would like to explain a basic but confusing concept of CUDA programming: thread hierarchies. It will not be an exhaustive reference. We will not cover all aspects, but it could be a nice first step.
If you are starting with CUDA and want to know how to set up your environment using VS2017, I recommend you read this post.
To get started, let’s write something straightforward to run on the CPU.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cstdio>

void printHelloCPU()
{
    printf("Hello World from the CPU");
}

int main()
{
    printHelloCPU();
    getchar();
    return 0;
}
Now, let’s change this code to run on the GPU.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cstdio>

__global__ void printHelloGPU()
{
    printf("Hello World from the GPU\n");
}

int main()
{
    printHelloGPU<<<1, 1>>>();
    cudaDeviceSynchronize();
    getchar();
    return 0;
}
The cudaDeviceSynchronize function ensures that all processing on the GPU completes before execution continues.
Let’s remember some concepts we learned in a previous post:

- The __global__ keyword indicates that the function will run on the GPU.
- A function marked with __global__ must have the return type void.
- Kernels are launched using the <<< ... >>> syntax.

At a high level, the execution configuration allows programmers to specify the thread hierarchy for a kernel launch, which defines the number of thread blocks, as well as how many threads to execute in each block.
Notice that, in the previous example, the kernel is launched with 1 block of threads (the first execution configuration argument) which contains 1 thread (the second configuration argument).
The execution configuration allows programmers to specify details about launching the kernel to run in parallel on multiple GPU threads. The syntax for this is:
<<< NUMBER_OF_BLOCKS, NUMBER_OF_THREADS_PER_BLOCK>>>
A kernel is executed once for every thread in every thread block configured when the kernel is launched.
Thus, under the assumption that a kernel called printHelloGPU has been defined, the following are true:

- printHelloGPU<<<1, 1>>>() is configured to run in a single thread block which has a single thread and will, therefore, run only once.
- printHelloGPU<<<1, 5>>>() is configured to run in a single thread block which has 5 threads and will, therefore, run 5 times.
- printHelloGPU<<<5, 1>>>() is configured to run in 5 thread blocks which each have a single thread and will, therefore, run 5 times.
- printHelloGPU<<<5, 5>>>() is configured to run in 5 thread blocks which each have 5 threads and will, therefore, run 25 times.

Let me try to explain this graphically:
In the drawing, each blue rectangle represents a thread. Each gray rectangle represents a block. The green rectangle represents the grid.
In the kernel’s code, we can access variables provided by CUDA. These variables describe the thread, thread block, and grid.
- gridDim.x is the number of blocks in the grid.
- blockIdx.x is the index of the current block within the grid.
- blockDim.x is the number of threads in a block. All blocks in a grid contain the same number of threads.
- threadIdx.x is the index of the current thread within its block (starting at 0).
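These variables combine into the usual global-index arithmetic. A tiny Python simulation of a hypothetical <<<2, 3>>> launch shows how each thread gets a unique index (an illustration of the arithmetic only, not CUDA code):

```python
grid_dim = 2    # number of blocks (gridDim.x)
block_dim = 3   # threads per block (blockDim.x)

# Each thread computes a unique global index from its coordinates:
global_indices = [
    block_idx * block_dim + thread_idx
    for block_idx in range(grid_dim)
    for thread_idx in range(block_dim)
]
print(global_indices)  # [0, 1, 2, 3, 4, 5]
```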
As you noted, we have been using the suffix .x for all these variables. But we could use .y and .z as well.
The CUDA thread hierarchy can be 3-dimensional.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cstdio>

__global__ void printHelloGPU()
{
    printf("Hello x: #%d y: #%d\n", threadIdx.x, threadIdx.y);
}

int main()
{
    dim3 threads(3, 3);
    printHelloGPU<<<1, threads>>>();
    cudaDeviceSynchronize();
    getchar();
    return 0;
}
We can use the dim3 structure to specify dimensions for blocks and threads. In the example, we created a 2-dimensional structure (3x3x1).
Consider:

- printHelloGPU<<<1, 25>>>() is configured to run in a single thread block which has 25 threads and will, therefore, run 25 times.
- printHelloGPU<<<1, dim3(5, 5)>>>() is configured to run in a single thread block which has 25 threads and will, therefore, run 25 times.
- printHelloGPU<<<5, 5>>>() is configured to run in 5 thread blocks which each have 5 threads and will, therefore, run 25 times.

So, which configuration is right? Answer: all of them are valid. Which should you use? It depends.
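A quick sanity check of the rule "total kernel runs = number of blocks × threads per block", with a dim3-style configuration modeled as a tuple (Python, for illustration only; the helper name is mine):

```python
def total_runs(num_blocks, threads_per_block):
    # threads_per_block may be an int or a dim3-style tuple such as (5, 5)
    if isinstance(threads_per_block, tuple):
        count = 1
        for dim in threads_per_block:
            count *= dim
    else:
        count = threads_per_block
    return num_blocks * count

# The three configurations above all launch the kernel 25 times:
print(total_runs(1, 25), total_runs(1, (5, 5)), total_runs(5, 5))  # 25 25 25
```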
As you know, each thread runs the kernel once. If you are working on data in memory, use the configuration that makes it easier to address that data through the thread hierarchy variables. Also, your graphics card has limitations that you need to consider.
If you are like me, you will need some time to understand thread hierarchies. In future posts, I will share some practical examples that can make them simpler.
For now, feel free to comment on this post.
Cover: Ilze Lucero
The post "Let me Help You to Start Using RavenDB4: Data Subscriptions" appeared first on Elemar JR.
In this post, I’m going to share with you one of the RavenDB 4 features that I like the most: Data Subscriptions.
It’s simpler to explain this concept with an example. Consider the following query:
from Orders where Lines.length > 5
This would retrieve all the big orders from your database. But what if a new big order is added after you run this query? What if you want to be notified whenever a big order occurs? YEAH! THIS IS WHAT I WAS TALKING ABOUT.
I assume that, at this point, you know how to open the Studio and the Northwind database. If this is not the case, I recommend you read this post where I teach you the basics, step by step.
Do that! I will wait…
Done?!
To create a Data Subscription, you need to open the Settings section of your database, click on the Manage Ongoing Tasks option, then click on the Add Task button.
Then, click on Subscription. RavenDB will ask you for a Task Name (let’s call it Big Orders) and a query. In this demo, I will provide the very same query we used before.
The user interface allows me to test it before saving (I did it!).
Now, I will show you how to subscribe and consume a data subscription.
Subscriptions are consumed by processing batches of documents received from the server.
static void Main(string[] args)
{
    var subscriptionWorker = DocumentStoreHolder.Store
        .Subscriptions
        .GetSubscriptionWorker("Big Orders");

    var subscriptionRuntimeTask = subscriptionWorker.Run(batch =>
    {
        foreach (var order in batch.Items)
        {
            // business logic here.
            Console.WriteLine(order.Id);
        }
    });

    Console.WriteLine("Press any key to exit...");
    Console.ReadKey();
}
There are some important facts that you need to know to use this feature correctly.
A pretty exciting feature, am I right? So, please, share the scenarios where you would like to use it. I will be glad to help if you need it.
The post "3 mistakes that Developers make that prevent them from creating software that meets the business needs" appeared first on Elemar JR.
Yes! The business world is dynamic and changes all the time.
Managers and marketing people don’t understand how costly it is to change software.
Cover image from: Nathan Dumlao
The post "Let me Help You To Start Using RavenDB4: Goodbye Transformers, Welcome Server-side Projections" appeared first on Elemar JR.
In the previous post, I shared some good things about our new query language: RQL. Now, I will show you how to shape your query results using server-side projections.
Transformers were removed and replaced by server-side projection support. Methods like TransformWith are no longer available; a simple Select should be used instead.
Instead of pulling full documents in query results, you can grab just some pieces of data from the documents. You can also transform the projected results.
Let me share a short example:
// request Name, City and Country for all entities from 'Companies' collection
var results = session
    .Query<Company>()
    .Select(x => new
    {
        Name = x.Name,
        City = x.Address.City,
        Country = x.Address.Country
    })
    .ToList();
This approach is a lot easier than Transformers! Right?
The related RQL is pretty simple as well:
from Companies select Name, Address.City as City, Address.Country as Country
I love how expressive and easy to understand RQL is.
Another example? Here we go:
var results = (from e in session.Query<Employee>()
               select new
               {
                   FullName = e.FirstName + " " + e.LastName,
               }).ToList();
And here it is the RQL:
from Employees as e
select {
    FullName : e.FirstName + " " + e.LastName
}
Let’s do something more complicated.
var results = (from e in session.Query<Employee>()
               let format = (Func<Employee, string>)(p => p.FirstName + " " + p.LastName)
               select new
               {
                   FullName = format(e)
               }).ToList();
What are we doing? We just created a function that will run on the server-side.
But… let’s look at the RQL.
declare function output(e) {
    var format = function(p) { return p.FirstName + " " + p.LastName; };
    return { FullName : format(e) };
}
from Employees as e
select output(e)
OMG! Yes, that is it! We can define functions when coding with RQL. These functions are pure JavaScript (you shouldn’t have doubts about the power of JavaScript).
What if we need to mix data from multiple documents? Easy and simple:
var results = (from o in session.Query<Order>()
               let c = RavenQuery.Load<Company>(o.Company)
               select new
               {
                   CompanyName = c.Name,
                   ShippedAt = o.ShippedAt
               }).ToList();
Very similar in RQL:
from Orders as o
load o.Company as c
select {
    CompanyName: c.Name,
    ShippedAt: o.ShippedAt
}
RavenDB 4 makes it easier to shape query results. No more transformers!
If you want to learn more about this functionality, I recommend you read our article covering projections, which can be found here.
Go write some code!
Cover: Ignacio Giri
The post "Getting started with CUDA (using VS2017)" appeared first on Elemar JR.
If you are interested in performance, you need to know more about CUDA.
From the official website:
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
In GPU-accelerated applications, the sequential part of the workload runs on the CPU – which is optimized for single-threaded performance – while the compute intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python and MATLAB and express parallelism through extensions in the form of a few basic keywords.
In this post, I will guide you through your first steps with CUDA. Also, I will show you how to move some basic processing from the CPU to the GPU.
Let’s start getting some bytes.
I am using Visual Studio 2017 (version 15.6.0). To code with CUDA, you will need to download and install the CUDA Toolkit. The Toolkit includes Visual Studio project templates and the Nsight IDE (which you can use from Visual Studio).
Also, you will need to install the VC++ 2017 toolset (CUDA is still not compatible with the latest version of Visual Studio).
The easiest way to start a project that uses CUDA is to use the CUDA project template.
To be able to compile this, you will need to change the Project Properties to use the Visual Studio 2015 toolset.
I recommend you clean up the template’s boilerplate, changing the content of the file kernel.cu to this:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
Let’s add two arrays (a and b), putting the results in a third array (c).
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

void add_arrays_cpu(int* a, int* b, int* c, const int count)
{
    for (auto i = 0; i < count; i++)
    {
        c[i] = a[i] + b[i];
    }
}

int main()
{
    const auto count = 5;
    int a[] = { 1, 2, 3, 4, 5 };
    int b[] = { 10, 20, 30, 40, 60 };
    int c[count];

    add_arrays_cpu(a, b, c, count);

    for (auto i = 0; i < count; i++)
    {
        printf("%d ", c[i]);
    }

    getchar();
    return 0;
}
This code is pretty simple. But it is tough to parallelize.
Whenever we want to use parallelism, we need functions that can be executed independently. The function add_arrays_cpu has a for-loop that runs the add process sequentially. But there is no reason to work this way. Let’s change it:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

void add_array_element_cpu(int* a, int* b, int* c, const int index)
{
    c[index] = a[index] + b[index];
}

int main()
{
    const auto count = 5;
    int a[] = { 1, 2, 3, 4, 5 };
    int b[] = { 10, 20, 30, 40, 60 };
    int c[count];

    for (auto i = 0; i < count; i++)
    {
        add_array_element_cpu(a, b, c, i);
    }

    for (auto i = 0; i < count; i++)
    {
        printf("%d ", c[i]);
    }

    getchar();
    return 0;
}
The function add_array_element_cpu can be executed independently, and that is great. We could start a thread for each position of the array, and that would work fine (I am not saying that would be the right thing to do on a CPU; it is just an example).
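That claim is easy to verify. As a toy illustration (in Python rather than C++, and certainly not something you would do for real on a CPU), one thread per array position produces the same result as the loop:

```python
import threading

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 60]
c = [0] * len(a)

def add_array_element(index):
    # Each thread touches only its own slot, so no synchronization is needed
    c[index] = a[index] + b[index]

threads = [threading.Thread(target=add_array_element, args=(i,))
           for i in range(len(a))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(c)  # [11, 22, 33, 44, 65]
```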
The next logical step is to start using the GPU to run our code. We will not move all the functions to the GPU, only the functions we think can be parallelized.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

__global__ void add_arrays_gpu(int* a, int* b, int* c)
{
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}

int main()
{
    const auto count = 5;
    int host_a[] = { 1, 2, 3, 4, 5 };
    int host_b[] = { 10, 20, 30, 40, 60 };
    int host_c[count];

    int *device_a, *device_b, *device_c;
    const int size = count * sizeof(int);
    cudaMalloc(&device_a, size);
    cudaMalloc(&device_b, size);
    cudaMalloc(&device_c, size);

    cudaMemcpy(device_a, host_a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(device_b, host_b, size, cudaMemcpyHostToDevice);

    add_arrays_gpu<<<1, count>>>(device_a, device_b, device_c);

    cudaMemcpy(host_c, device_c, size, cudaMemcpyDeviceToHost);

    for (auto i = 0; i < count; i++)
    {
        printf("%d ", host_c[i]);
    }

    getchar();
    return 0;
}
We marked the add_arrays_gpu function as __global__. In CUDA terminology, this function is a kernel: it runs on the GPU and is called from the CPU. (There is another location qualifier, __device__, used for functions that run on the GPU and are called from the GPU.)
__global__ functions are invoked using the special <<<…>>> syntax. The parameters we used indicate that we are running the kernel function in 1 block of count (5) threads. We will talk more about this in the future.
Note that we don’t need to pass an index anymore; we retrieve the index from threadIdx.x. threadIdx is a special variable, provided by the CUDA runtime, that informs the position of the current thread within its thread block (we are using one thread block of five threads, so we can use this value directly as the index into the arrays).
I would like to call out a vital pattern that you will see many times when doing CUDA programming. GPU code is not allowed to access CPU memory directly (and vice versa). So we will need:
I know, I know! Adding two arrays with only five elements each is not an exciting example. But in this post, we did the setup to use CUDA on our machines. Then, I helped you move your code from running sequentially on the CPU to running in parallel on the GPU. In the future, we will return to it and get some real benefits.
Cover image: Jean Gerber
The post "How to parse, simplify, differentiate and evaluate/solve (using Newton’s method) equations and expressions in F#" appeared first on Elemar JR.
I wrote this post in 2016. Unfortunately, I lost it when I “rebooted” my blog. Anyway, I have a good friend who said this was the blog post he liked the most. So, I decided to bring it back (with some improvements and adjustments).
This post was originally written in Portuguese (you can see it here). I do not usually translate my posts, but I will make an exception here. Right?
In this post, I will show how to parse, simplify, differentiate and solve (using Newton’s method) equations using F#.
The first step to parse expressions and equations is to create a “good-enough” object model.
type Expr =
    | X
    | Const of value: double
    | Add of Expr * Expr
    | Sub of Expr * Expr
    | Mult of Expr * Expr
    | Div of Expr * Expr
    | Pow of Expr * Expr
    | Neg of Expr

// 2 + 2 / 2
let sample = Add(Const(2.), Div(Const(2.), Const(2.)))
It is fantastic how easy it is when using F#.
In this implementation, I use F# Active Patterns to do the parsing.
open System

let (|Digit|_|) = function
    | x::xs when Char.IsDigit(x) -> Some(Char.GetNumericValue(x), xs)
    | _ -> None

let (|IntegerPart|_|) input =
    match input with
    | Digit(h, t) ->
        let rec loop acc = function
            | Digit(x, xs) -> loop ((acc * 10.) + x) xs
            | xs -> Some(acc, xs)
        loop 0. input
    | _ -> None

"10" |> List.ofSeq |> (|IntegerPart|_|)

let (|FractionalPart|_|) = function
    | '.'::t ->
        let rec loop acc d = function
            | Digit(x, xs) -> loop ((acc * 10.) + x) (d * 10.) xs
            | xs -> (acc / d, xs)
        Some(loop 0. 1. t)
    | _ -> None

"10" |> List.ofSeq |> (|FractionalPart|_|)
".34" |> List.ofSeq |> (|FractionalPart|_|)

let (|Number|_|) = function
    | IntegerPart(i, FractionalPart(f, t)) -> Some(i + f, t)
    | IntegerPart(i, t) -> Some(i, t)
    | FractionalPart(f, t) -> Some(f, t)
    | _ -> None

"10" |> List.ofSeq |> (|Number|_|)
".35" |> List.ofSeq |> (|Number|_|)
"10.35" |> List.ofSeq |> (|Number|_|)

let parse (expression) =
    let rec (|Expre|_|) = function
        | Multi(e, t) ->
            let rec loop l = function
                | '+'::Multi(r, t) -> loop (Add(l, r)) t
                | '-'::Multi(r, t) -> loop (Sub(l, r)) t
                | [] -> Some(l, [])
                | _ -> None
            loop e t
        | _ -> None
    and (|Multi|_|) = function
        | Atom(l, '*'::Powi(r, t)) -> Some(Mult(l, r), t)
        | Atom(l, '/'::Powi(r, t)) -> Some(Div(l, r), t)
        | Powi(e, t) -> Some(e, t)
        | _ -> None
    and (|Powi|_|) = function
        | '+'::Atom(e, t) -> Some(e, t)
        | '-'::Atom(e, t) -> Some(Neg(e), t)
        | Atom(l, '^'::Powi(r, t)) -> Some(Pow(l, r), t)
        | Atom(e, t) -> Some(e, t)
        | _ -> None
    and (|Atom|_|) = function
        | 'x'::t -> Some(X, t)
        | Number(e, t) -> Some(Const(e), t)
        | '('::Expre(e, ')'::t) -> Some(e, t)
        | _ -> None

    let parsed = expression |> List.ofSeq |> (|Expre|_|)
    match parsed with
    | Some(result, _) -> result
    | None -> failwith "failed to parse expression"

parse "2+2"   // Add(Const(2.), Const(2.))
parse "x^2-3" // Sub(Pow(X, Const(2.)), Const(3.))
The following code can simplify equations/expressions, removing steps needed to solve them.
let rec simplify e =
    let result =
        match e with
        // add
        | Add(Const(0.), r) -> simplify r
        | Add(l, Const(0.)) -> simplify l
        | Add(Const(l), Const(r)) -> Const(l + r)
        | Add(l, r) -> Add(simplify l, simplify r)
        // sub
        | Sub(Const(0.), r) -> Neg(simplify r)
        | Sub(l, Const(0.)) -> l
        | Sub(Const(l), Const(r)) -> Const(l - r)
        | Sub(X, r) -> Sub(X, simplify r)
        | Sub(l, X) -> Sub(simplify l, X)
        | Sub(l, r) -> Sub(simplify l, simplify r)
        // mult
        | Mult(Const(0.), _) -> Const(0.)
        | Mult(_, Const(0.)) -> Const(0.)
        | Mult(Const(1.), r) -> r
        | Mult(l, Const(1.)) -> l
        | Mult(Const(l), Const(r)) -> Const(l * r)
        | Mult(l, r) when l = r -> Pow(simplify l, Const(2.))
        | Mult(Pow(b, p), r) when b = r -> Pow(simplify b, simplify (Add(simplify p, Const(1.))))
        | Mult(X, r) -> Mult(X, simplify r)
        | Mult(l, X) -> Mult(simplify l, X)
        | Mult(l, r) -> Mult(simplify l, simplify r)
        // div
        | Div(Const(0.), _) -> Const(0.)
        | Div(l, Const(1.)) -> l
        | Div(Const(l), Const(r)) -> Const(l / r)
        | Div(X, r) -> Div(X, simplify r)
        | Div(l, X) -> Div(simplify l, X)
        | Div(l, r) -> simplify (Div(simplify l, simplify r))
        // pow
        | Pow(_, Const(0.)) -> Const(1.)
        | Pow(b, Const(1.)) -> simplify b
        | Pow(Const(l), Const(r)) -> Const(System.Math.Pow(l, r))
        | Pow(X, r) -> Pow(X, simplify r)
        | Pow(l, X) -> Pow(simplify l, X)
        | Pow(b, p) -> Pow(simplify b, simplify p)
        // neg
        | Neg(Const(k)) -> Const(-k)
        | Neg(X) -> Neg(X)
        | Neg(x) -> Neg(simplify x)
        //
        | other -> other
    if result = e then result else simplify result

simplify (Mult(Mult(X, X), X))
simplify (Pow(Const(2.), Const(3.)))
simplify (Mult(Const(2.), X))
simplify (Add(Const(2.), Div(Const(2.), Const(2.))))
I love local functions! The simplification process works as an evaluator for expressions. With equations, the process stops when there are no more simplification steps to take.
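The "simplify until nothing changes" strategy is a fixpoint computation. Here is a minimal Python sketch of the same idea with just a few rewrite rules (expressions as nested binary tuples; the names and rules are mine, a small subset of the F# version):

```python
def simplify_once(e):
    # A tiny subset of the rewrite rules: constant folding and identity elements
    if isinstance(e, tuple):
        op, l, r = e[0], simplify_once(e[1]), simplify_once(e[2])
        if op == "add":
            if l == 0: return r
            if r == 0: return l
            if isinstance(l, (int, float)) and isinstance(r, (int, float)):
                return l + r
        if op == "mul":
            if l == 0 or r == 0: return 0
            if l == 1: return r
            if r == 1: return l
            if isinstance(l, (int, float)) and isinstance(r, (int, float)):
                return l * r
        return (op, l, r)
    return e

def simplify(e):
    # Keep rewriting until a fixpoint is reached (no rule applies anymore)
    nxt = simplify_once(e)
    return e if nxt == e else simplify(nxt)

print(simplify(("add", 2, ("mul", 1, ("add", 0, 2)))))  # 4
```

When the expression still contains a variable (e.g. the string "x"), the result is a smaller expression rather than a number, exactly as in the F# evaluator.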
let exec x expr =
    let rec replaceX = function
        | Add(l, r) -> Add(replaceX l, replaceX r)
        | Sub(l, r) -> Sub(replaceX l, replaceX r)
        | Mult(l, r) -> Mult(replaceX l, replaceX r)
        | Div(l, r) -> Div(replaceX l, replaceX r)
        | Pow(l, r) -> Pow(replaceX l, replaceX r)
        | Neg(e) -> Neg(replaceX e)
        | Const(v) -> Const(v)
        | X -> Const(x)
    match simplify (replaceX expr) with
    | Const(result) -> result
    | _ -> failwith "impossible to execute"

// result: 8
Pow(Const(2.), X) |> exec 3.
Newton’s method will need derivatives to work. So, let’s produce them.
let rec deriv = function
    | X -> Const(1.)
    | Const(_) -> Const(0.)
    | Add(l, r) -> Add(deriv l, deriv r)
    | Sub(l, r) -> Sub(deriv l, deriv r)
    | Mult(l, r) -> Add(Mult(deriv l, r), Mult(l, deriv r))
    | Neg(v) -> Neg(deriv v)
    | Pow(b, e) -> Mult(e, simplify (Pow(b, Sub(e, Const(1.)))))
    | _ -> failwith "expression not supported."

deriv (Pow(X, Const(3.)))
We now already have all the elements we need to solve equations using Newton’s method.
Here is my implementation.
let newton expr guess error maxdepth =
    let o = parse expr
    let d = deriv o
    let eq = Sub(X, Div(o, d))
    let rec iter g depth =
        if depth > maxdepth then
            failwith "maxdepth exceeded."
        else
            let newg = exec g eq
            printfn "%A" g
            if abs (newg - g) < error then newg
            else iter newg (depth + 1)
    iter guess 0

newton "x^3-27" 5. 0.000001 100 // 3
The parameters are the equation we need to solve, an initial guess at the solution, an acceptable error, and the maximum number of iterations.
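The expression `eq` built above, `Sub(X, Div(o, d))`, encodes the classic Newton iteration, which each call to `exec g eq` evaluates at the current guess:

```latex
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}
```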
We can make it simpler to use:
let solve expr = newton expr 100. 0.00001 100

solve "x^2-9"       // 3
solve "3*x^2-4*x+1" // 1
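As a sanity check on the first call: for $f(x) = x^2 - 9$ the Newton step simplifies to $x_{n+1} = \frac{1}{2}\left(x_n + \frac{9}{x_n}\right)$, so starting from the default guess of 100 the sequence roughly halves at each step and then converges quadratically:

```latex
x_0 = 100,\quad
x_1 = \tfrac{1}{2}\left(100 + \tfrac{9}{100}\right) = 50.045,\quad
x_2 \approx 25.112,\ \dots,\ x_n \to 3
```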
This post was written just for fun, as part of my effort to learn F#. Did you like it? Could I have done something differently? Please give me your feedback.
The post How to parse, simplify, differentiate and evaluate/solve (using Newton’s method) equations and expressions in F# appeared first on Elemar JR.
Sometimes it is not enough to know whether two strings are equal. Instead, we need an idea of how different they are.
For example, ant and aunt are two different words. However, they are not as different as ant and antidote. The “edit distance” between ant and aunt is smaller than the “edit distance” between ant and antidote. By edit distance, I mean the number of edits (removals, insertions, and replacements) needed to turn one string into another.
In this post, I share an implementation of Levenshtein’s algorithm, which solves the edit distance problem.
In 1965, Vladimir Levenshtein created a beautiful distance algorithm. Here is a C# implementation.
public static class Levenshtein
{
    public static int ComputeDistance(string first, string second)
    {
        if (first.Length == 0) { return second.Length; }
        if (second.Length == 0) { return first.Length; }

        var d = new int[first.Length + 1, second.Length + 1];

        for (var i = 0; i <= first.Length; i++) { d[i, 0] = i; }
        for (var j = 0; j <= second.Length; j++) { d[0, j] = j; }

        for (var i = 1; i <= first.Length; i++)
        {
            for (var j = 1; j <= second.Length; j++)
            {
                var cost = (second[j - 1] == first[i - 1]) ? 0 : 1;
                d[i, j] = Min(
                    d[i - 1, j] + 1,
                    d[i, j - 1] + 1,
                    d[i - 1, j - 1] + cost
                );
            }
        }

        return d[first.Length, second.Length];
    }

    private static int Min(int e1, int e2, int e3) =>
        Math.Min(Math.Min(e1, e2), e3);
}
And here are some tests to ensure this implementation works:
public class LevenshteinShould
{
    [Theory]
    [InlineData("ant", "aunt", 1)]
    [InlineData("fast", "cats", 3)]
    [InlineData("Elemar", "Vilmar", 3)]
    [InlineData("kitten", "sitting", 3)]
    public void ComputeTheDistanceBetween(
        string s1, string s2, int expectedDistance)
    {
        Assert.Equal(
            expectedDistance,
            Levenshtein.ComputeDistance(s1, s2)
        );
    }
}
Levenshtein is a dynamic programming algorithm.
The main idea is to fill the entries of a matrix d, whose dimensions are the lengths of the two strings being compared (plus one).
At the end of the execution, each entry (i, j) holds the edit distance between the string consisting of the first i characters of the first string and the string consisting of the first j characters of the second string.
Let’s see an example:
As you can see, the cell at the bottom-right position contains the edit distance for the two strings that we are comparing.
Let’s see how the algorithm works, step by step. This time, let’s compare the strings “ant” and “aunt” (edit distance = 1).
To get started we can quickly fill the first row and column.
Then, for each cell, we will compare the corresponding characters from the first string and the second string, and select the minimum of these three values:

- the value in the cell above, plus 1 (a removal);
- the value in the cell to the left, plus 1 (an insertion);
- the value in the diagonal cell, plus the comparison cost: 0 if the characters match, 1 otherwise (a replacement).
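Expressed as a recurrence, this is the rule the inner loop of the implementation applies:

```latex
d[i, j] = \min \begin{cases}
d[i-1, j] + 1 \\
d[i, j-1] + 1 \\
d[i-1, j-1] + \mathrm{cost}
\end{cases}
\qquad
\mathrm{cost} = \begin{cases}
0 & \text{if } \mathit{first}[i-1] = \mathit{second}[j-1] \\
1 & \text{otherwise}
\end{cases}
```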
And the process will continue filling the matrix.
As you probably noticed in the step-by-step process, we only use two rows of the matrix to compute each cell value. So, we don’t need to keep all the costs in memory all the time.
public static class Levenshtein
{
    public static int ComputeDistance(string first, string second)
    {
        if (first.Length == 0) { return second.Length; }
        if (second.Length == 0) { return first.Length; }

        var current = 1;
        var previous = 0;
        var r = new int[2, second.Length + 1];

        for (var i = 0; i <= second.Length; i++) { r[previous, i] = i; }

        for (var i = 0; i < first.Length; i++)
        {
            r[current, 0] = i + 1;
            for (var j = 1; j <= second.Length; j++)
            {
                var cost = (second[j - 1] == first[i]) ? 0 : 1;
                r[current, j] = Min(
                    r[previous, j] + 1,
                    r[current, j - 1] + 1,
                    r[previous, j - 1] + cost
                );
            }
            previous = (previous + 1) % 2;
            current = (current + 1) % 2;
        }

        return r[previous, second.Length];
    }

    private static int Min(int e1, int e2, int e3) =>
        Math.Min(Math.Min(e1, e2), e3);
}
Much, much better!
Levenshtein’s is a very interesting and robust algorithm. In the future, I will use this implementation to improve the querying experience in the simple search library that I have been working on.
Again, if you have any suggestions, feel free to share them in the comments.
The post Computing the Levenshtein (Edit) Distance of Two Strings using C# appeared first on Elemar JR.