Reading Very Large StackOverflow XML Files with Python

Posted by Jet Ding on 2020/09/30 16:45:39

This article presents one approach to reading very large XML files in Python.

Programming language:
Python

Test file size: 73 GB

Sample program:
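from lxml import etree

# Stream the file instead of loading it whole: only the 'end' event
# of each <row> element is reported.
context = etree.iterparse('D:/data/stackoverflow/stackoverflow.com-Posts/PostsCopy.xml', events=('end',), tag='row')
for event, element in context:
    # Each post is stored as a set of attributes on its <row> element.
    for key, value in element.attrib.items():
        print(key, '=', value)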

Output:
OwnerUserId = 1483
OwnerDisplayName = ggierlik
LastActivityDate = 2008-08-27T14:14:29.417
CommentCount = 0
Id = 30237
PostTypeId = 2
ParentId = 30152
CreationDate = 2008-08-27T14:15:55.887
Score = 1
Body =

"Send to --> Compressed (zipped) Folder" creates a zip file. What it puts in there is based on your settings. It does not include hidden files with the default settings. If you have your explorer view settings set as Kibbee mentioned to "Show hidden files and folders", then "Send to --> Compressed (zipped) Folder" will put the hidden files into the zip file.

There is what I would call a bug in XP where hidden folders aren't include when recursing a folder tree. You can get them if they are in the folder that you are in. Recursing works in Vista.

Files starting with "." have no special to windows except that Windows Explorer won't let you create one. It is a valid file name though.

I would recommend using something like <a href="http://www.7-zip.org/" rel="nofollow noreferrer">7-Zip</a> if your folders contain hidden/system files/folders.

OwnerUserId = 791
OwnerDisplayName = bruceatk
LastEditorUserId = 791
LastEditorDisplayName = bruceatk
LastEditDate = 2008-08-27T14:30:40.907
LastActivityDate = 2008-08-27T14:30:40.907
CommentCount = 0
Id = 30238
PostTypeId = 2
ParentId = 30099
CreationDate = 2008-08-27T14:15:57.243
Score = 2
Body =

In addition to the other answers:

The C++ language actually has the <code>auto</code> keyword to explicitly declare the storage class of an object. Of course, it's completely needless because this is the implied storage class for local variables and cannot be used anywhere. The opposite of <code>auto</code> is <code>static</code> (both locally and globall).

The following two declarations are equivalent:

<pre><code>int main() {
int a;
auto int b;
}
</code></pre>

Because the keyword is utterly useless, it will actually be recycled in the next C++ standard (“C++0x”) and gets a new meaning, namely, it lets the compiler infer the variable type from its initialization (like <code>var</code> in C#):

<pre><code>auto a = std::max(1.0, 4.0); // `a` now has type double.
</code></pre>
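
The sample program streams the file, but every parsed row stays attached to the tree, so memory use can still grow as a 73 GB file is read. Below is a minimal sketch of one common way to keep memory roughly flat: discard each row once it has been handled. It is not the author's code. It uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages, and the helper name iter_rows and the tiny SAMPLE document are hypothetical stand-ins for the real Posts file.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source, tag="row"):
    """Yield the attributes of each <tag> element as a plain dict.

    Processed elements are dropped from the tree so that memory use
    does not grow with the size of the file.
    """
    # Ask for 'start' events too, so the root element can be captured.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root
    for event, element in context:
        if event == "end" and element.tag == tag:
            # Copy the attributes before the element is discarded.
            yield dict(element.attrib)
            # Remove already-processed children from the root.
            root.clear()


# Hypothetical in-memory stand-in for the real Posts file.
SAMPLE = b'<posts><row Id="1" Score="3" /><row Id="2" Score="5" /></posts>'

rows = list(iter_rows(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

In real use, source would be the path to the XML file rather than a BytesIO object, and the rows would be consumed one at a time instead of being collected into a list. With lxml, a similar effect is usually obtained by calling element.clear() after each row has been handled.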

