# ES Module Lexer

[![Build Status][actions-image]][actions-url]

A JS module syntax lexer used in [es-module-shims](https://github.com/guybedford/es-module-shims).

Outputs the list of exports and the locations of import specifiers, including dynamic `import()` and `import.meta` handling. Supports newer syntax features, including import attributes and source phase imports.

A very small single JS file (4KiB gzipped) that includes inlined Web Assembly for very fast source analysis of ECMAScript module syntax only.

As an example of the performance, Angular 1 (720KiB) is fully parsed in 5ms, compared with over 100ms for Acorn, the fastest JS parser.

_Comprehensively handles the JS language grammar while remaining small and fast: ~10ms per MB of JS cold and ~5ms per MB of JS warm, [see benchmarks](#benchmarks) for more info._

> [Built with](https://github.com/guybedford/es-module-lexer/blob/main/chompfile.toml) [Chomp](https://chompbuild.com/)
### Usage

```
npm install es-module-lexer
```

See [src/lexer.ts](src/lexer.ts) for the type definitions.

For use in CommonJS:

```js
const { init, parse } = require('es-module-lexer');

(async () => {
  // Either await init, or call parse asynchronously:
  // this is necessary for the Web Assembly boot.
  await init;

  const source = 'export var p = 5';
  const [imports, exports] = parse(source);

  // Returns "p"
  source.slice(exports[0].s, exports[0].e);
  // Returns "p"
  source.slice(exports[0].ls, exports[0].le);
})();
```
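
Export records hold offsets into the source rather than strings. The sketch below turns them into plain names; it is a minimal illustration that assumes only the `s`/`e` and `ls`/`le` fields shown in this README (with `ls`/`le` set to `-1` when there is no local binding, as in the ES module example that follows). `exportNames` is a hypothetical helper, not part of the library.

```js
// Hypothetical helper (not part of es-module-lexer): turns the offset-based
// export records returned by parse() into plain { name, local } objects,
// using only the s/e/ls/le fields documented in this README.
function exportNames(source, exports) {
  return exports.map((exp) => {
    let name = source.slice(exp.s, exp.e);
    // String-literal export names (e.g. `x as 'external name'`) keep their
    // quotes in the raw slice, so strip one matching pair of quotes.
    const first = name[0];
    if (name.length >= 2 && (first === "'" || first === '"') && name.endsWith(first)) {
      name = name.slice(1, -1);
    }
    // ls/le are -1 when there is no local binding (e.g. re-exports).
    const local = exp.ls === -1 ? null : source.slice(exp.ls, exp.le);
    return { name, local };
  });
}

// Usage with es-module-lexer (not run here):
//   await init;
//   const [, exports] = parse(source);
//   console.log(exportNames(source, exports));
```
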
An ES module version is also available:

```js
import { init, parse } from 'es-module-lexer';

(async () => {
  await init;

  const source = `
    import { name } from 'mod\\u1011';
    import json from './json.json' assert { type: 'json' }
    export var p = 5;
    export function q () {

    };
    export { x as 'external name' } from 'external';

    // Comments provided to demonstrate edge cases
    import /*comment!*/ ( 'asdf', { assert: { type: 'json' } });
    import /*comment!*/.meta.asdf;

    // Source phase imports:
    import source mod from './mod.wasm';
    import.source('./mod.wasm');
  `;
  const [imports, exports] = parse(source, 'optional-sourcename');

  // Returns "modထ"
  imports[0].n
  // Returns "mod\u1011"
  source.slice(imports[0].s, imports[0].e);
  // "s" = start
  // "e" = end

  // Returns "import { name } from 'mod\u1011'"
  source.slice(imports[0].ss, imports[0].se);
  // "ss" = statement start
  // "se" = statement end

  // Returns "{ type: 'json' }"
  source.slice(imports[1].a, imports[1].se);
  // "a" = assert, -1 for no assertion

  // Returns "external"
  source.slice(imports[2].s, imports[2].e);

  // Returns "p"
  source.slice(exports[0].s, exports[0].e);
  // Returns "p"
  source.slice(exports[0].ls, exports[0].le);
  // Returns "q"
  source.slice(exports[1].s, exports[1].e);
  // Returns "q"
  source.slice(exports[1].ls, exports[1].le);
  // Returns "'external name'"
  source.slice(exports[2].s, exports[2].e);
  // Returns -1
  exports[2].ls;
  // Returns -1
  exports[2].le;

  // Import type is provided by the `t` value
  // (1 for static, 2 for dynamic)
  // Returns true
  imports[3].t === 2;

  // Returns "asdf" (only for string literal dynamic imports)
  imports[3].n
  // Returns "import /*comment!*/ ( 'asdf', { assert: { type: 'json' } })"
  source.slice(imports[3].ss, imports[3].se);
  // Returns "'asdf'"
  source.slice(imports[3].s, imports[3].e);
  // Returns "( 'asdf', { assert: { type: 'json' } })"
  source.slice(imports[3].d, imports[3].se);
  // Returns "{ assert: { type: 'json' } }"
  source.slice(imports[3].a, imports[3].se - 1);

  // For non-string dynamic import expressions:
  // - n will be undefined
  // - a is currently -1 even if there is an assertion
  // - e is currently the character before the closing )

  // For nested dynamic imports, the se value of the outer import is -1, as end
  // tracking does not currently support nested dynamic imports

  // import.meta is indicated by imports[4].d === -2
  // Returns true
  imports[4].d === -2;
  // Returns "import /*comment!*/.meta"
  source.slice(imports[4].s, imports[4].e);
  // ss and se are the same for import meta

  // Returns "./mod.wasm"
  source.slice(imports[5].s, imports[5].e);

  // Import types 4 and 5 are static and dynamic source phase imports
  // Returns true
  imports[5].t === 4;
  // Returns true
  imports[6].t === 5;
})();
```
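
Consumers usually branch on the kind of each import record. A minimal sketch, assuming only the fields documented in the example above: `t` (1 static, 2 dynamic, 4 and 5 for static and dynamic source phase), `d === -2` for `import.meta`, and `n` being `undefined` for non-string dynamic specifiers. `describeImport` and its kind labels are hypothetical names chosen for this illustration.

```js
// Hypothetical helper (not part of es-module-lexer): maps an import record
// from parse() to a readable kind, using only the t, d and n fields
// documented in this README.
const IMPORT_KINDS = {
  1: 'static',
  2: 'dynamic',
  4: 'static-source-phase',
  5: 'dynamic-source-phase',
};

function describeImport(imp) {
  // import.meta is signalled by d === -2, so check it before looking at t.
  if (imp.d === -2) {
    return { kind: 'import.meta', specifier: null };
  }
  const kind = IMPORT_KINDS[imp.t] ?? 'unknown';
  // n is the unescaped specifier; it is undefined for dynamic imports whose
  // argument is not a plain string literal.
  const specifier = imp.n !== undefined ? imp.n : null;
  return { kind, specifier };
}

// Usage with es-module-lexer (not run here):
//   const [imports] = parse(source);
//   imports.map(describeImport);
```
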
### CSP asm.js Build

The default version of the library uses Wasm and (safe) eval usage for performance and a minimal footprint.

Neither of these represents a security escalation possibility, since there are no string execution injection vectors, but they can still violate an application's existing CSP policy.

For a version that works with CSP eval disabled, use the `es-module-lexer/js` build:

```js
import { parse } from 'es-module-lexer/js';
```

Instead of Web Assembly, this uses an asm.js build which is almost as fast as the Wasm version ([see benchmarks below](#benchmarks)).
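
If the same code must run both with and without a CSP that blocks Wasm, one option is to try the default build first and fall back to the asm.js build. A minimal sketch: `loadWithFallback` is a hypothetical helper, and the loaders are passed in as functions so the sketch does not assume how your environment resolves `es-module-lexer` and `es-module-lexer/js`. It also assumes that a blocked Wasm boot surfaces as a rejected `init` (or a failed import), which you should verify for your setup.

```js
// Hypothetical helper (not part of es-module-lexer): tries the primary
// loader (e.g. the default Wasm build) and falls back to the secondary one
// (e.g. the asm.js build) if it rejects.
async function loadWithFallback(loadPrimary, loadFallback) {
  try {
    return { build: 'primary', lexer: await loadPrimary() };
  } catch (err) {
    // Keep the original error so callers can log why the fallback was used.
    return { build: 'fallback', lexer: await loadFallback(), reason: err };
  }
}

// Possible usage (not run here; assumes dynamic import() is available):
// const { lexer } = await loadWithFallback(
//   async () => {
//     const m = await import('es-module-lexer');
//     await m.init; // Wasm boot; assumed to reject if Wasm is blocked
//     return m;
//   },
//   () => import('es-module-lexer/js'),
// );
```
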
### Escape Sequences

To handle escape sequences in specifier strings, the `.n` field of import specifiers provides the unescaped specifier where possible.

For dynamic import expressions, this field is `undefined` if the argument is not a valid JS string literal.
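
A resolver typically wants the unescaped `.n` value rather than the raw `source.slice(s, e)` text, which still contains escapes such as `\u1011`. A minimal sketch, assuming only the `n` and `d` fields described in this README; `collectSpecifiers` is a hypothetical helper that skips records without a usable specifier and removes duplicates.

```js
// Hypothetical helper (not part of es-module-lexer): collects the unescaped
// specifiers of all import records, skipping entries without one
// (import.meta, signalled by d === -2, and dynamic imports of non-string
// expressions, where n is undefined). Order of first appearance is kept.
function collectSpecifiers(imports) {
  const seen = new Set();
  for (const imp of imports) {
    if (imp.d === -2 || imp.n === undefined) continue;
    seen.add(imp.n);
  }
  return [...seen];
}

// Usage with es-module-lexer (not run here):
//   const [imports] = parse(source);
//   collectSpecifiers(imports); // e.g. ["modထ", "./json.json", ...]
```
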
### Facade Detection

Facade modules that only use import / export syntax can be detected via the third return value:

```js
const [,, facade] = parse(`
  export * from 'external';
  import * as ns from 'external2';
  export { a as b } from 'external3';
  export { ns };
`);
facade === true;
```
### ESM Detection

Modules that use ESM syntax can be detected via the fourth return value:

```js
const [,,, hasModuleSyntax] = parse(`
  export {}
`);
hasModuleSyntax === true;
```

Dynamic imports are ignored, since they can also be used in non-ESM files.

```js
const [,,, hasModuleSyntax] = parse(`
  import('./foo.js')
`);
hasModuleSyntax === false;
```
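
The third and fourth return values can be combined into a coarse classification of a file. A minimal sketch: `classifyModule` and its labels are hypothetical, and it assumes that when `hasModuleSyntax` is false, any remaining import records come from dynamic `import()` (which, as noted above, does not count as module syntax).

```js
// Hypothetical helper (not part of es-module-lexer): combines the facade
// (third) and hasModuleSyntax (fourth) values returned by parse() into a
// coarse label. hasModuleSyntax is checked first, so facade is only
// consulted for files that actually use ESM syntax.
function classifyModule([imports, , facade, hasModuleSyntax]) {
  if (hasModuleSyntax) {
    return facade ? 'esm-facade' : 'esm';
  }
  // Assumption: without module syntax, any import records are dynamic import().
  return imports.length > 0 ? 'script-with-dynamic-import' : 'script';
}

// Usage with es-module-lexer (not run here):
//   classifyModule(parse(source));
```
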
### Environment Support

Node.js 10+, and [all browsers with Web Assembly support](https://caniuse.com/#feat=wasm).
### Grammar Support

* Token state parses all line comments, block comments, strings, template strings, blocks, parens and punctuators.
* Division operator / regex token ambiguity is handled via backtracking checks against punctuator prefixes, including closing brace or paren backtracking.
* Always correctly parses valid JS source, but may parse invalid JS source without errors.
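
The basic idea behind the division / regex decision can be illustrated with a deliberately simplified rule that only looks at the previous significant token. This is not the lexer's actual implementation: the real lexer additionally backtracks through closing parens and braces (a `)` ending `if (x)` is followed by a regex, while one ending `(a + b)` is followed by division). `slashStartsRegex` and the keyword set are assumptions for illustration only.

```js
// Simplified, hypothetical heuristic: decides whether a "/" starts a regex
// literal based only on the previous significant token (null = start of
// input). It does not do the paren / brace backtracking the real lexer uses.
const KEYWORDS_BEFORE_EXPRESSION = new Set([
  'return', 'typeof', 'instanceof', 'in', 'of', 'new', 'delete', 'void',
  'throw', 'case', 'do', 'else', 'yield', 'await',
]);

function slashStartsRegex(prevToken) {
  // At the start of input an expression (and so a regex) is expected.
  if (prevToken === null) return true;
  // After keywords like `return`, an expression is expected.
  if (KEYWORDS_BEFORE_EXPRESSION.has(prevToken)) return true;
  // After an identifier, number or other keyword, "/" is division.
  if (/^[\w$]/.test(prevToken)) return false;
  // After a string or template literal, "/" is division.
  if (/^['"`]/.test(prevToken)) return false;
  // After a closing bracket this simple rule assumes a value just ended;
  // this is exactly the case that needs backtracking in practice.
  if (prevToken === ')' || prevToken === ']' || prevToken === '}') return false;
  // After any other punctuator (=, (, ,, ;, !, &&, ...) a regex is expected.
  return true;
}
```
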
### Limitations

The lexing approach is designed to deal with the full language grammar, including RegEx / division operator ambiguity, through backtracking and paren / brace tracking.

The only limitation of the reduced parser is that the "exports" list may not gather all export identifiers in the following edge cases:

```js
// Only "a" is detected as an export, "q" isn't
export var a = 'asdf', q = z;

// "b" is not detected as an export
export var { a: b } = asdf;
```

The above cases are handled gracefully: the lexer keeps going, it just does not detect the export names above.
### Benchmarks

Benchmarks can be run with `npm run bench`.

Current results for a high spec machine:

#### Wasm Build

```
Module load time
> 5ms

Cold Run, All Samples
test/samples/*.js (3123 KiB)
> 18ms

Warm Runs (average of 25 runs)
test/samples/angular.js (739 KiB)
> 3ms
test/samples/angular.min.js (188 KiB)
> 1ms
test/samples/d3.js (508 KiB)
> 3ms
test/samples/d3.min.js (274 KiB)
> 2ms
test/samples/magic-string.js (35 KiB)
> 0ms
test/samples/magic-string.min.js (20 KiB)
> 0ms
test/samples/rollup.js (929 KiB)
> 4.32ms
test/samples/rollup.min.js (429 KiB)
> 2.16ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3123 KiB)
> 14.16ms
```
#### JS Build (asm.js)

```
Module load time
> 2ms

Cold Run, All Samples
test/samples/*.js (3123 KiB)
> 34ms

Warm Runs (average of 25 runs)
test/samples/angular.js (739 KiB)
> 3ms
test/samples/angular.min.js (188 KiB)
> 1ms
test/samples/d3.js (508 KiB)
> 3ms
test/samples/d3.min.js (274 KiB)
> 2ms
test/samples/magic-string.js (35 KiB)
> 0ms
test/samples/magic-string.min.js (20 KiB)
> 0ms
test/samples/rollup.js (929 KiB)
> 5ms
test/samples/rollup.min.js (429 KiB)
> 3.04ms

Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3123 KiB)
> 17.12ms
```
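
To get comparable warm-run numbers for your own inputs, a rough timing loop is often enough. A minimal sketch, not the project's `npm run bench` script: `averageMs` and `msPerMiB` are hypothetical helpers, and the code assumes a global `performance` object (browsers, Node.js 16+).

```js
// Hypothetical timing harness (not the project's bench script): runs fn once
// to warm up, then returns the mean wall-clock time of `runs` further calls,
// in milliseconds.
function averageMs(fn, runs = 25) {
  fn(); // warm-up run, excluded from the average
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn();
  return (performance.now() - start) / runs;
}

// Converts a timing for an input of `kib` KiB into milliseconds per MiB,
// which makes results for different input sizes comparable.
function msPerMiB(ms, kib) {
  return ms / (kib / 1024);
}
```

For example, after `await init`, `averageMs(() => parse(source))` times warm parses of `source`, and `msPerMiB(14.16, 3123)` converts the warm all-samples Wasm result above into roughly 4.6ms per MiB.
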
### Building

This project uses [Chomp](https://chompbuild.com) for building.

With Chomp installed, download the WASI SDK 12.0 from https://github.com/WebAssembly/wasi-sdk/releases/tag/wasi-sdk-12.

- [Linux](https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-linux.tar.gz)
- [Windows (MinGW)](https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-mingw.tar.gz)
- [macOS](https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-macos.tar.gz)

Locate the WASI-SDK as a sibling folder, or customize the path via the `WASI_PATH` environment variable.

Emscripten emsdk is also assumed to be a sibling folder, or can be located via the `EMSDK_PATH` environment variable.

Example setup:

```
git clone https://github.com/guybedford/es-module-lexer
git clone https://github.com/emscripten-core/emsdk
cd emsdk
git checkout 1.40.1-fastcomp
./emsdk install 1.40.1-fastcomp
cd ..
wget https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-12/wasi-sdk-12.0-linux.tar.gz
gunzip wasi-sdk-12.0-linux.tar.gz
tar -xf wasi-sdk-12.0-linux.tar
mv wasi-sdk-12.0-linux.tar wasi-sdk-12.0
cargo install chompbuild
cd es-module-lexer
chomp test
```

For the `asm.js` build, `emsdk` (cloned as in the setup above) is likewise assumed to be a sibling folder.
### License

MIT

[actions-image]: https://github.com/guybedford/es-module-lexer/actions/workflows/build.yml/badge.svg
[actions-url]: https://github.com/guybedford/es-module-lexer/actions/workflows/build.yml