A few minor adjustments to the P8 strspn give us an almost equally optimized P8 strcspn.
The implementation uses vectors and bitmasks. For a small needle and a large haystack, the performance improvement is up to 8x. For short strings (0-4B), the cost of computing the bitmask dominates, and the new code is slightly slower.
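
For illustration only, the bitmask idea can be sketched in portable scalar C. This is not the P8 vector code from this change: the function name bitmask_strcspn is hypothetical, and the layout of the table (four 64-bit words indexed by byte value) is an assumption made for the sketch. It does show why short inputs pay a fixed setup cost: the table is built from the needle before the haystack is scanned at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of a bitmask-based strcspn.
   Not the P8 assembly; names and table layout are assumptions.  */
size_t
bitmask_strcspn (const char *s, const char *reject)
{
  /* 256-bit table, one bit per byte value.  Bit 0 starts out set so
     that the terminating NUL of the haystack always stops the scan.  */
  uint64_t mask[4] = { 1, 0, 0, 0 };

  /* Setup cost: mark every byte of the needle in the table.  */
  for (const unsigned char *r = (const unsigned char *) reject; *r != 0; r++)
    mask[*r >> 6] |= (uint64_t) 1 << (*r & 63);

  /* Scan: one table lookup per haystack byte, independent of the
     needle length.  */
  const unsigned char *p = (const unsigned char *) s;
  while (((mask[*p >> 6] >> (*p & 63)) & 1) == 0)
    p++;

  return (size_t) (p - (const unsigned char *) s);
}
```

With the table in place, each haystack byte costs a single lookup regardless of the needle. On a 0-4B input, building the table is most of the work, which matches the slowdown noted above.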