The Title-to-Link Algorithm in hexo Blogs

Strictly speaking, this is a post about hexo's slugize algorithm.

0x00 Requirements

Adding a new post to a hexo blog takes only a few commands:

$ hexo new ARTICLE_TITLE
$ vim ./source/_posts/ARTICLE_TITLE.md
$ hexo generate
$ hexo server

hexo then automatically assigns a URL (route) to the new post and adds it to the blog's home page.

The format of this route is set by the permalink option in the _config.yml file in the blog's root directory. The default is

:year/:month/:day/:title/
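To make the pattern concrete, here is a minimal sketch of how such a permalink could be expanded for one post. The helper name buildPermalink and its arguments are hypothetical, and the zero-padding of month and day is an assumption based on how hexo URLs usually look, not code taken from hexo itself.

```javascript
// Minimal sketch: expand a hexo-style permalink pattern for one post.
// `buildPermalink` is a hypothetical helper, not part of hexo.
function buildPermalink(pattern, date, slug) {
  const pad = (n) => String(n).padStart(2, '0');
  const values = {
    year: String(date.getFullYear()),
    month: pad(date.getMonth() + 1), // getMonth() is zero-based
    day: pad(date.getDate()),
    title: slug,
  };
  // Replace each known :name placeholder; leave unknown ones untouched.
  return pattern.replace(/:(\w+)/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match);
}

console.log(buildPermalink(':year/:month/:day/:title/', new Date(2019, 2, 6), 'my-post'));
// 2019/03/06/my-post/
```

Note that the :title value passed in here is already the URL form of the title, not the title shown in the post.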

The blog lives at blog.example.com and my personal homepage at example.com. What I need now:

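Add a module to the personal homepage that shows the titles of the three most recent blog posts
Clicking a title should jump to the corresponding post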
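The first requirement can be sketched as follows, assuming each post has already been reduced to a plain object with a title and a date string in the "YYYY-MM-DD HH:mm:ss" form. The helper latestPosts and the sample data are made up for illustration.

```javascript
// Minimal sketch: pick the newest `count` posts.
// Dates in "YYYY-MM-DD HH:mm:ss" form sort correctly as plain strings.
function latestPosts(posts, count) {
  return posts
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0))
    .slice(0, count);
}

// Made-up sample data.
const samplePosts = [
  { title: 'Post A', date: '2019-01-02 08:00:00' },
  { title: 'Post B', date: '2019-03-06 10:46:11' },
  { title: 'Post C', date: '2018-12-31 23:59:59' },
  { title: 'Post D', date: '2019-02-14 12:00:00' },
];
console.log(latestPosts(samplePosts, 3).map((p) => p.title));
```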

0x01 Getting Started

After a little thought, both requirements look fairly easy to handle: read ./source/_posts/*.md under the blog root and parse the YAML at the top of each file to get :title, :year, :month and :day.

A sample of the YAML at the top of a file:

title: hexo博客的标题转链接算法
date: 2019-03-06 10:46:11
tags: [技术, 算法]
...
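A minimal sketch of this parsing step is shown below. It only handles simple "key: value" lines like the sample above and assumes the front matter is closed by a line containing ---; it is not a YAML parser, and parseFrontMatter and datePartsOf are hypothetical helpers.

```javascript
// Minimal sketch: read simple "key: value" front matter lines.
function parseFrontMatter(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '---') {
      // A '---' line after at least one field closes the front matter.
      if (Object.keys(fields).length > 0) break;
      continue; // an opening '---' is simply skipped
    }
    const match = /^([A-Za-z_]+):\s*(.*)$/.exec(line);
    if (match) fields[match[1]] = match[2].trim();
  }
  return fields;
}

// Split "YYYY-MM-DD ..." into the pieces used by the permalink.
function datePartsOf(dateString) {
  const [year, month, day] = dateString.slice(0, 10).split('-');
  return { year, month, day };
}

const sample = [
  'title: hexo博客的标题转链接算法',
  'date: 2019-03-06 10:46:11',
  'tags: [技术, 算法]',
  '---',
  'Post body...',
].join('\n');
const meta = parseFrontMatter(sample);
console.log(meta.title, datePartsOf(meta.date));
```

For real posts, a proper YAML library is the safer choice, since titles may be quoted or span several lines.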

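The problem, however, is that the :title obtained this way is only the title displayed in the post, which is not the same thing as the :title in the post's URL. For example, for my earlier post “使用O(1)时间复杂度计算比x大的最小的2的整数次幂” (roughly, "Computing the smallest power of 2 greater than x in O(1) time"), the :title in the URL is 使用O-1-时间复杂度计算比x大的最小的2的整数次幂, which differs slightly from the post title. Many special characters have been converted to -, so using the post title directly as :title only leads to a 404.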

So we still need to find out how hexo processes the post title.

0x02 Locating the Code

0x02.1 Gathering Basic Information

Step one:

$ hexo -h | grep debug
  --debug    Display all verbose messages in the terminal

Step two:

$ hexo new --help
Usage: hexo new [layout] <title>

Description:
Create a new post.

Arguments:
  layout  Post layout. Use post, page, draft or whatever you want.
  title   Post title. Wrap it with quotations to escape.

Options:
  -p, --path     Post path. Customize the path of the post.
  -r, --replace  Replace the current post if existed.
  -s, --slug     Post slug. Customize the URL of the post.

A clue! The -s, --slug option can "Customize the URL of the post", so the function that handles hexo new must contain the relevant code. The keyword slug is worth keeping an eye on.

0x02.2 Finding the Entry Function

This is where a capable editor pays off. Searching the whole node_modules folder in VS Code for Create a new post finds a single occurrence, in ./node_modules/hexo/lib/plugins/console/index.js:

console.register('new', 'Create a new post.', {
  // ...
}, require('./new'));

Following the trail into console/new.js, line 42 reads

return this.post.create(data, args.r || args.replace).then(post => {
  self.log.info('Created: %s', chalk.magenta(tildify(post.path)));
});

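At this point the trail goes cold. Neither the meaning nor the target of this can be determined here, so these few lines of source alone cannot tell us where this.post.create comes from.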

0x02.3 No Clues? Make Some

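Time to change tack.

First, a list of the general-purpose tools at hand (these are the key takeaways):

Throw an error to expose the call stack: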

function getStackTrace() {
  try { throw new Error("get stack trace"); } catch (e) {
    return e.stack;
  }
}

Use Function.prototype.toString to get a function's source code, then combine it with the editor's global search:

function showCode(func) {
  console.log(func.toString());
}

Use a long blocking loop together with a profiler to locate the code. This is more cumbersome to set up and has no clear advantage.

function stopTheWorld(millisec) {
  const start = Date.now();
  while(true)
    if(Date.now() - start >= millisec)
      break;
}

Set a breakpoint. How to do this differs between editors and IDEs, so it is not covered here.

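A quick look shows that this function returns a Promise and that a callback is registered on it. Throwing an error inside that callback to expose the call stack would therefore tell us nothing about this.post.create: the callback runs after the original call has returned, so the stack holds only unhelpful internal frames such as process._tickCallback.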

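The function itself, however, is user-defined and has not been through Function.prototype.bind (it is called as a method on this.post), so calling toString on it shows its source directly.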

Note that this would not work on a user-defined function that has been through bind: for a bound function, V8's toString returns function () { [native code] } instead of the source.
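The difference between a plain and a bound function is easy to check on whatever Node.js version is installed. The following is a minimal sketch; greet is a hypothetical function used only for illustration, and the comments describe what current Node.js releases are expected to print.

```javascript
// A plain user-defined function and a bound copy of it.
function greet(name) {
  return 'hello ' + name;
}
const boundGreet = greet.bind(null);

// The plain function prints its own source text.
console.log(greet.toString());

// The bound copy prints a native-code placeholder instead of the source.
console.log(boundGreet.toString());
```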

So let's make a small change to console/new.js:

+ console.log(this.post.create.toString());
return this.post.create(data, args.r || args.replace).then(post => {
...

Running it again prints the source:

function(data, replace, callback) {
  if (!callback && typeof replace === 'function') {
    callback = replace;
    replace = false;
  }
  // ...
}

A global search shows that this function is defined at line 51 of node_modules/hexo/lib/hexo/post.js:

Post.prototype.create = function(data, replace, callback) {
  //...
};

This is where data.slug is set:

data.slug = slugize((data.slug || data.title).toString(), {transform: config.filename_case});

Following the slugize function shows that it is imported from node_modules/hexo-util/lib/slugize.js:

'use strict';

var escapeDiacritic = require('./escape_diacritic');
var escapeRegExp = require('./escape_regexp');
var rControl = /[\u0000-\u001f]/g;
var rSpecial = /[\s~`!@#\$%\^&\*\(\)\-_\+=\[\]\{\}\|\\;:"'<>,\.\?\/]+/g;

function slugize(str, options) {
  if (typeof str !== 'string') throw new TypeError('str must be a string!');
  options = options || {};

  var separator = options.separator || '-';
  var escapedSep = escapeRegExp(separator);

  var result = escapeDiacritic(str)
    // Remove control characters
    .replace(rControl, '')
    // Replace special characters
    .replace(rSpecial, separator)
    // Remove continous separators
    .replace(new RegExp(escapedSep + '{2,}', 'g'), separator)
    // Remove prefixing and trailing separtors
    .replace(new RegExp('^' + escapedSep + '+|' + escapedSep + '+$', 'g'), '');

  switch (options.transform){
    case 1:
      return result.toLowerCase();

    case 2:
      return result.toUpperCase();

    default:
      return result;
  }
}

module.exports = slugize;

Perfect. It can be used as-is.
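As a quick check against the earlier example, here is a simplified, self-contained port of the function above. It is a sketch under stated assumptions: the escapeDiacritic step is skipped (so accented Latin letters are left unchanged), the separator is fixed to -, and simpleSlugize is a hypothetical name. On the real blog, requiring slugize from hexo-util directly is the more faithful option.

```javascript
// Simplified port of hexo-util's slugize, for illustration only.
// Assumptions: no escapeDiacritic step, separator fixed to '-'.
const rControl = /[\u0000-\u001f]/g;
const rSpecial = /[\s~`!@#\$%\^&\*\(\)\-_\+=\[\]\{\}\|\\;:"'<>,\.\?\/]+/g;

function simpleSlugize(str, transform) {
  if (typeof str !== 'string') throw new TypeError('str must be a string!');
  const result = str
    .replace(rControl, '')     // remove control characters
    .replace(rSpecial, '-')    // replace runs of special characters
    .replace(/-{2,}/g, '-')    // collapse repeated separators
    .replace(/^-+|-+$/g, '');  // trim leading and trailing separators
  if (transform === 1) return result.toLowerCase();
  if (transform === 2) return result.toUpperCase();
  return result;
}

console.log(simpleSlugize('使用O(1)时间复杂度计算比x大的最小的2的整数次幂'));
// 使用O-1-时间复杂度计算比x大的最小的2的整数次幂
console.log(simpleSlugize('Hello, World!', 1));
// hello-world
```

The first output matches the :title seen in the URL of the earlier post, which is the value the homepage module needs when building links.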

Source: https://blog.jiejiss.com/
