How to Get the Unicode Code Points of a JavaScript Character?

JavaScript Character to Unicode Code Point Conversion

To get the Unicode code point of a character that lies in the BMP (Basic Multilingual Plane), call the String.prototype.codePointAt() method on it, for example:

// ES6+
const codePoint = '©'.codePointAt(0);
const hexCodePoint = codePoint.toString(16);

console.log(codePoint); // 169
console.log(hexCodePoint); // a9

You can verify the result by using String.fromCodePoint(), for example, in the following way:

// using decimal code point
String.fromCodePoint(169); // '©'

// using hex code point
String.fromCodePoint(0xa9); // '©'

This also works for characters that lie beyond the BMP (Basic Multilingual Plane). For example, consider a character that UTF-16 encodes as a surrogate pair:

// ES6+
const codePoint = '😍'.codePointAt(0);
const hexCodePoint = codePoint.toString(16);

console.log(codePoint); // 128525
console.log(hexCodePoint); // 1f60d

You can verify the result by using String.fromCodePoint(), like so:

// using decimal code point
String.fromCodePoint(128525); // '😍'

// using hex code point
String.fromCodePoint(0x1f60d); // '😍'
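
Code points are conventionally written in U+ notation (for example, U+1F60D), with the hex digits uppercased and padded to at least four places. A minimal sketch of such a formatter might look like the following. The helper name formatCodePoint is hypothetical (it is not a built-in), and padStart() requires ES2017+:

```javascript
// ES2017+ (because of padStart)
// Hypothetical helper: formats the code point at the start of `char` as U+XXXX.
function formatCodePoint(char) {
    const codePoint = char.codePointAt(0);

    // an empty string has no element at position 0
    if (codePoint === undefined) {
        return undefined;
    }

    return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

console.log(formatCodePoint('\u00A9')); // U+00A9
console.log(formatCodePoint('\u{1F60D}')); // U+1F60D
```

The characters are written as escape sequences here only so that the snippet does not depend on the file's encoding.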

This works because when you use the String.prototype.codePointAt() method on a character composed of UTF-16 high and low surrogates (i.e. a surrogate pair), the following values are returned based on the argument you supply to the method:

Argument | Return Value
Position of high surrogate (e.g. codePointAt(0)) | Code point of the surrogate pair
Position of low surrogate (e.g. codePointAt(1)) | Code point of the low surrogate only
Position having no element | undefined
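
The three cases above can be checked directly. The following is a small sketch using the U+1F60D emoji, written as an escape sequence; the variable name emoji is only illustrative:

```javascript
// ES6+
const emoji = '\u{1F60D}';

console.log(emoji.length); // 2 (two UTF-16 code units)
console.log(emoji.codePointAt(0)); // 128525 (code point of the whole surrogate pair)
console.log(emoji.codePointAt(1)); // 56845 (0xde0d, the low surrogate on its own)
console.log(emoji.codePointAt(2)); // undefined (no element at this position)
```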

Furthermore, this even works with ZWJ (zero-width joiner) sequences. However, you need to loop over the elements of the ZWJ sequence to get the code point of each element. You can do so with a for...of loop, or by first converting the string to an array (e.g. with Array.from() or the spread syntax) and then calling Array.prototype.forEach() on it. Anything else that iterates the string by code point, rather than by UTF-16 code unit, works as well. Inside the loop, use codePointAt(0) to get the code point of each element.

Consider, for example, the ZWJ emoji "👨‍👩‍👧‍👦" ("family: man, woman, girl, boy"), which is a combination of "👨 👩 👧 👦" joined by zero-width joiners (i.e. U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466). You can get the code points for each element in this ZWJ sequence in the following way:

// ES6+
const codePoints = [];
const hexCodePoints = [];

for (const element of '👨‍👩‍👧‍👦') {
    const codePoint = element.codePointAt(0);

    codePoints.push(codePoint);
    hexCodePoints.push(codePoint.toString(16));
}

console.log(codePoints); // [128104, 8205, 128105, 8205, 128103, 8205, 128102]
console.log(hexCodePoints); // ['1f468', '200d', '1f469', '200d', '1f467', '200d', '1f466']

You can verify the result by using String.fromCodePoint(), like so:

// using decimal code points
String.fromCodePoint(128104, 8205, 128105, 8205, 128103, 8205, 128102); // '👨‍👩‍👧‍👦'

// using hex code points
String.fromCodePoint(0x1f468, 0x200d, 0x1f469, 0x200d, 0x1f467, 0x200d, 0x1f466); // '👨‍👩‍👧‍👦'
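
As a more compact alternative to the loop, Array.from() accepts a mapping function and, like for...of, iterates a string by code point. The sketch below assumes the same family emoji, written with escape sequences, and also shows how the number of code points differs from the number of UTF-16 code units. The variable names are illustrative:

```javascript
// ES6+
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

// Array.from() uses the string iterator, so each element is one whole code point
const familyCodePoints = Array.from(family, (element) => element.codePointAt(0));

console.log(familyCodePoints); // [128104, 8205, 128105, 8205, 128103, 8205, 128102]
console.log(familyCodePoints.length); // 7 (code points)
console.log(family.length); // 11 (UTF-16 code units)
```

The difference between the two lengths is why indexing the string position by position (e.g. with a plain for loop over family.length) would return lone surrogates for every second position of each emoji.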
