In my previous post I mentioned that we’re rolling out some minor tweaks and fixes for various edge cases in the automated processing of original texts, which runs before our ENL Semantic Spinning algorithms begin their work.
In some rare cases, our processing algorithm could get confused when it encountered the “<” sign, because it treated the sign as the possible beginning of an HTML tag. Every HTML tag begins with “<” followed by the tag name, and ends with “>” (or “/>” for a self-closing tag).
Here’s an example of what used to cause a problem for our parser:
Mathematicians sometimes use p < 0.01% to signify low risk.
This would in some cases (depending on the context) turn into:
Mathematicians sometimes use p (the rest of the sentence is missing)
…after the Step 1 processing, because our parser mistakenly took “< 0.01%” for a malformed HTML tag instead of a plain less-than comparison.
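To illustrate the kind of mistake involved, here is a minimal Python sketch. It is not our production parser: the two patterns and the names `NAIVE_TAG_RE`, `STRICT_TAG_RE` and `strip_tags` are assumptions made up for this example. The naive pattern treats any “<” as the start of a tag, even an unclosed one, while the stricter pattern only accepts a “<” that is immediately followed by a tag name and eventually closed by “>”.

```python
import re

# Hypothetical "naive" rule: anything from "<" up to the next ">" is a tag,
# and if no ">" follows, everything up to the end of the text is swallowed.
NAIVE_TAG_RE = re.compile(r"<[^>]*(>|$)")

# Hypothetical stricter rule: "<" must be followed directly by a letter
# (or "/" plus a letter for a closing tag), and the tag must end with ">".
STRICT_TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def strip_tags(text: str, pattern: re.Pattern) -> str:
    """Remove everything the given pattern considers an HTML tag."""
    return pattern.sub("", text)


sentence = "Mathematicians sometimes use p < 0.01% to signify low risk."

# The naive rule eats the rest of the sentence after "<".
print(repr(strip_tags(sentence, NAIVE_TAG_RE)))

# The stricter rule leaves the sentence untouched, because "<" is followed
# by a space rather than a tag name.
print(repr(strip_tags(sentence, STRICT_TAG_RE)))

# Real tags are still removed by the stricter rule.
print(repr(strip_tags("<b>bold</b> and p < 0.01%", STRICT_TAG_RE)))
```

A heuristic like this still has limits: text such as “a <b and c> d” would still be mistaken for a tag, so a real parser needs more context than a single regular expression.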
Well, it certainly doesn’t do that anymore: we just fixed this issue and rolled the update out to our production servers! 😀