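Reading a Very Large StackOverflow XML File with Python
This article presents one approach to reading very large XML files in Python.
Programming language:
Python
Test file size: 73 GB
Example program: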
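from lxml import etree

# Stream the file and handle one <row> element at a time instead of loading the whole tree
context = etree.iterparse('D:/data/stackoverflow/stackoverflow.com-Posts/PostsCopy.xml', events=('end',), tag='row')
for event, element in context:
    temp = element.attrib
    for key in temp:
        print(key, '=', temp[key])
    # Release the processed element so memory stays bounded on a very large file
    element.clear()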
Output:
OwnerUserId = 1483
OwnerDisplayName = ggierlik
LastActivityDate = 2008-08-27T14:14:29.417
CommentCount = 0
Id = 30237
PostTypeId = 2
ParentId = 30152
CreationDate = 2008-08-27T14:15:55.887
Score = 1
Body =
"Send to --> Compressed (zipped) Folder" creates a zip file. What it puts in there is based on your settings. It does not include hidden files with the default settings. If you have your explorer view settings set as Kibbee mentioned to "Show hidden files and folders", then "Send to --> Compressed (zipped) Folder" will put the hidden files into the zip file.
There is what I would call a bug in XP where hidden folders aren't include when recursing a folder tree. You can get them if they are in the folder that you are in. Recursing works in Vista.
Files starting with "." have no special to windows except that Windows Explorer won't let you create one. It is a valid file name though.
I would recommend using something like <a href="http://www.7-zip.org/" rel="nofollow noreferrer">7-Zip</a> if your folders contain hidden/system files/folders.
OwnerUserId = 791
OwnerDisplayName = bruceatk
LastEditorUserId = 791
LastEditorDisplayName = bruceatk
LastEditDate = 2008-08-27T14:30:40.907
LastActivityDate = 2008-08-27T14:30:40.907
CommentCount = 0
Id = 30238
PostTypeId = 2
ParentId = 30099
CreationDate = 2008-08-27T14:15:57.243
Score = 2
Body =
In addition to the other answers:
The C++ language actually has the <code>auto</code> keyword to explicitly declare the storage class of an object. Of course, it's completely needless because this is the implied storage class for local variables and cannot be used anywhere. The opposite of <code>auto</code> is <code>static</code> (both locally and globall).
The following two declarations are equivalent:
<pre><code>int main() {
int a;
auto int b;
}
</code></pre>
Because the keyword is utterly useless, it will actually be recycled in the next C++ standard (“C++0x”) and gets a new meaning, namely, it lets the compiler infer the variable type from its initialization (like <code>var</code> in C#):
<pre><code>auto a = std::max(1.0, 4.0); // `a` now has type double.
</code></pre>
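With a file of this size, memory use depends on discarding each element once it has been processed. The following is a minimal sketch of that idea and not part of the original program: the helper name iter_rows and the in-memory sample data are hypothetical, and the fallback to the standard library's xml.etree.ElementTree is an assumption added only so the sketch still runs where lxml is not installed. It filters on the tag inside the loop rather than through the tag argument, because that argument is specific to lxml.

```python
import io

try:
    from lxml import etree  # the parser used in the example program
    HAVE_LXML = True
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs
    import xml.etree.ElementTree as etree
    HAVE_LXML = False


def iter_rows(source, tag="row"):
    """Yield the attributes of each matching element as a plain dict.

    `source` may be a file path or a binary file-like object.
    """
    for _event, element in etree.iterparse(source, events=("end",)):
        if element.tag != tag:
            continue
        # Copy the attributes before the element is emptied
        yield dict(element.attrib)
        # Drop the element's own content
        element.clear()
        if HAVE_LXML:
            # lxml only: also detach already-processed siblings from the parent,
            # otherwise empty elements keep accumulating under the root
            while element.getprevious() is not None:
                del element.getparent()[0]


if __name__ == "__main__":
    # Hypothetical two-row sample standing in for the real dump
    sample = io.BytesIO(b'<posts><row Id="1" Score="2" /><row Id="2" Score="5" /></posts>')
    for row in iter_rows(sample):
        print(row)
```

For the real dump, the file path from the example program would be passed in place of the in-memory sample. Because iter_rows is a generator, only one row's attributes are held at a time unless the caller collects them. In the standard-library fallback, the emptied elements stay attached to the root, so memory still grows slowly there.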