Reading a Very Large Stack Overflow XML File with JavaScript in Node.js

Jet Ding, posted 2020/09/30 16:55:53

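This article offers one approach to reading very large XML files in a Node.js environment.

Programming language: JavaScript

Platform: Node.js

Test file size: 73G

Package:

    "node-xml-stream": "^1.0.2"

Code example: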

const loadXml = () => {
  try {
    let Parser = require("node-xml-stream");
    let fs = require("fs");

    let parser = new Parser();

    // <tag attr="hello">
    parser.on("opentag", (name, attrs) => {
      // name = 'tag'
      // attrs = { attr: 'hello' }
      console.log(name, attrs);
    });

    // </tag>
    parser.on("closetag", name => {
      // name = 'tag'
    });

    // <tag>TEXT</tag>
    parser.on("text", text => {
      // text = 'TEXT'
    });

    // <![CDATA[data]]>
    parser.on("cdata", cdata => {
      // cdata = 'data'
    });

    // <?xml version="1.0"?>
    parser.on("instruction", (name, attrs) => {
      // name = 'xml'
      // attrs = { version: '1.0' }
    });

    // Only stream-errors are emitted.
    parser.on("error", err => {
      // Handle a parsing error
    });

    parser.on("finish", () => {
      // Stream is completed
    });

    // Write data to the stream.
    parser.write("<root>TEXT</root>");

    // Pipe a stream to the parser; the file is read chunk by chunk,
    // so it is never held in memory as a whole.
    let stream = fs.createReadStream("D:/data/stackoverflow/stackoverflow.com-Posts/PostsCopy.xml");
    // Read errors (e.g. a missing file) are asynchronous and bypass the try/catch.
    stream.on("error", err => console.log(err));
    stream.pipe(parser);
  } catch (e) {
    console.log(e);
  }
};

// Run the example.
loadXml();
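
Printing every element is rarely practical for a file of this size, so the opentag handler would usually filter for the elements of interest and aggregate them. The following is a minimal sketch under that assumption. `countPostTypes` is a hypothetical helper, not part of node-xml-stream; it relies only on the parser emitting "opentag" with (name, attrs) and "finish", as in the example above. A plain `EventEmitter` stands in for the parser so the snippet runs without the package or the data file, and the `posts` wrapper element name is an assumption about the dump's layout.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: tallies <row> elements by their PostTypeId attribute.
// `parser` is anything that emits "opentag" (name, attrs) and "finish".
const countPostTypes = (parser, onDone) => {
  const counts = {};
  parser.on("opentag", (name, attrs) => {
    if (name !== "row") return; // skip the wrapper element and anything else
    const type = attrs.PostTypeId;
    counts[type] = (counts[type] || 0) + 1;
  });
  parser.on("finish", () => onDone(counts));
  return counts;
};

// Stand-in for the real parser, fed with events shaped like post rows.
const fakeParser = new EventEmitter();
const counts = countPostTypes(fakeParser, (result) => console.log(result));
fakeParser.emit("opentag", "posts", {}); // assumed wrapper element
fakeParser.emit("opentag", "row", { Id: "451", PostTypeId: "2" });
fakeParser.emit("opentag", "row", { Id: "467", PostTypeId: "2" });
fakeParser.emit("finish");
```

With the real parser, the same helper would be attached before `stream.pipe(parser)`, replacing the `console.log` call in the opentag handler.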

Fragment of the output printed by the opentag handler:

row {
Id: '451',
PostTypeId: '2',
ParentId: '371',
CreationDate: '2008-08-02T13:45:57.197',
Score: '13',
Body: '&lt;p&gt;Yahoo uses a method called Sender ID, which can be configured at &lt;a',
href: '&quot;http://old.openspf.org/wizard.html?mydomain',
rel: '&quot;nofollow noreferrer&quot;&gt;The SPF Setup Wizard&lt;/a&gt; and entered in to your DNS. Also one of the important ones for Exchange, Hotmail, AOL, Yahoo, and others is to have a Reverse DNS for your domain. Those will knock out most of the issues. 
However you can never prevent a person intentionally blocking your or custom rules.&lt;/p&gt;&#xA;',
OwnerUserId: '17',
LastEditorUserId: '246246',
LastEditDate: '2017-04-20T16:17:40.470',
LastActivityDate: '2017-04-20T16:17:40.470',
CommentCount: '1 /'
}
row {
Id: '467',
PostTypeId: '2',
ParentId: '17',
CreationDate: '2008-08-02T14:57:13.043',
Score: '22',
Body: '&lt;p&gt;While you havent said what youre storing, and you may have a great reason for doing so, often the answer is as a filesystem reference and the actual data is on the filesystem somewhere.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;&lt;a',
href: '&quot;http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html&quot;',
rel: '&quot;noreferrer&quot;&gt;http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html&lt;/a&gt;&lt;/p&gt;&#xA;',
OwnerUserId: '144',
LastActivityDate: '2008-08-02T14:57:13.043',
CommentCount: '0 /'
}
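
The attribute values in this output are still XML-escaped (`&lt;`, `&quot;`, `&#xA;`), and the last attribute of each row carries a trailing ` /`, presumably left over from the self-closing `<row ... />` tag. A minimal sketch of a clean-up step follows, assuming the values arrive exactly as printed. `decodeEntities` and `cleanAttrs` are hypothetical helpers, not part of node-xml-stream, and only the five predefined XML entities plus numeric character references are handled. The sketch does not attempt to repair the `Body` value, which in this output appears to have been split into separate `href` and `rel` keys.

```javascript
// The five predefined XML entities.
const NAMED = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Hypothetical helper: decodes named and numeric XML character references
// in a single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
const decodeEntities = (value) =>
  value.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, ref) => {
    if (ref[0] !== "#") {
      // Unknown names are left untouched.
      return Object.prototype.hasOwnProperty.call(NAMED, ref) ? NAMED[ref] : match;
    }
    const isHex = ref[1] === "x";
    const code = parseInt(ref.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });

// Hypothetical helper: returns a cleaned copy of one attrs object.
const cleanAttrs = (attrs) => {
  const keys = Object.keys(attrs);
  const cleaned = {};
  keys.forEach((key, i) => {
    let value = attrs[key];
    // Only the last value shows the " /" residue in the output above.
    if (i === keys.length - 1) value = value.replace(/ \/$/, "");
    cleaned[key] = decodeEntities(value);
  });
  return cleaned;
};

// Sample input: a shortened, made-up Body plus the CommentCount shape seen above.
const row = cleanAttrs({
  Id: "467",
  Body: "&lt;p&gt;TEXT&lt;/p&gt;&#xA;",
  CommentCount: "0 /",
});
console.log(row);
```

Stripping the residue only from the last key is a deliberate choice: other values, such as URLs, may legitimately end in a slash.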


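[Copyright notice] This article is original content by a Huawei Cloud Community user. When reposting, you must credit the source (Huawei Cloud Community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find content in this community that appears to be plagiarized, please report it by email with supporting evidence; once verified, the community will immediately remove the infringing content. Report email: cloudbbs@huaweicloud.com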