You might have noticed I picked the “Programming” category for this blog post. The reason for this is quite simple – this post won’t be very interesting to the 95% of you who aren’t programmers. 😉
You see, out of thousands of active users, we have been receiving about 2-3 customer tickets each month saying that Spin Rewriter somehow garbled the original text. We’ve been looking into this for a while now, and we found out that:
- 99.5% of all submitted texts are processed normally
- texts that begin with the bytes EF BB BF are encoded in the standard UTF-8 format (works well with Spin Rewriter)
- texts that begin with the bytes FE FF are encoded in the UTF-16/UCS-2, big endian format (some issues)
- texts that begin with the bytes FF FE are encoded in the UTF-16/UCS-2, little endian format (some issues)
- texts that begin with the bytes FF FE 00 00 are encoded in the UTF-32/UCS-4, little endian format (sporadic issues)
- texts that begin with the bytes 00 00 FE FF are encoded in the UTF-32/UCS-4, big endian format (sporadic issues)
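For the programmers among you, here is a minimal sketch of how text could be decoded based on those leading bytes (the byte order mark, or BOM). It is an illustration in Python, not the actual Spin Rewriter code; the function name `decode_submitted_text` and the UTF-8 fallback for texts without a BOM are assumptions made for this example.

```python
import codecs

# Check the 4-byte UTF-32 marks before the 2-byte UTF-16 marks:
# the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def decode_submitted_text(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: strip a leading BOM and decode accordingly."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    # Assumption: texts without a BOM are treated as UTF-8.
    return raw.decode(default)
```

One caveat with this approach: a UTF-16 little endian text whose first character is U+0000 would also start with FF FE 00 00, so it would be mistaken for UTF-32. That is rare enough in real articles that the sketch ignores it.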
For instance, if one of our users entered “It іs nevеr a сonvenіent timе tо hаvе уour vеhicle quіt оn уоu.” in the UTF-16/UCS-2, little endian format, Step 2 of the spinning process appeared fine, but Step 3 showed this: “It Ñ�s nevеr a Ñ�onvenÑ�ent timе tо hаvе Ñ�our vеhicle quÑ�t оn Ñ�оu.”
We have now resolved all these issues and Spin Rewriter will process all articles that you can throw at it. 😉