Python Slots To Dict
PEP: | 412 |
---|---|
Title: | Key-Sharing Dictionary |
Author: | Mark Shannon <mark at hotpy.org> |
Status: | Final |
Type: | Standards Track |
Created: | 08-Feb-2012 |
Python-Version: | 3.3 or 3.4 |
Post-History: | 08-Feb-2012 |
Python老鸟都应该看过那篇非常有吸引力的 Saving 9 GB of RAM with Python’s slots 文章,作者使用了slots让内存占用从25.5GB降到了16.2GB。在当时来说,这相当于用一个非常简单的方式就降低了30%的内存. Python copy & deepcopy with dicts and slots. GitHub Gist: instantly share code, notes, and snippets. Let’s pop one more element. In listresize, size – 1 = 4 – 1 = 3 is less than half of the allocated slots so the list is shrunk to 6 slots and the new size of the list is now 3. You can observe that slot 3 and 4 still point to some integers but the important thing is the size of the list which is now 3. Upgrade your Python skills: Examining the Dictionary Photo by Romain Vignes on Unsplash a hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. If it smells like a Python dict, feels like a dict, and looks like one well, it must be a dict.
Contents
- Performance
- Implementation
- Pros and Cons
This PEP proposes a change in the implementation of the builtindictionary type dict. The new implementation allows dictionarieswhich are used as attribute dictionaries (the __dict__ attributeof an object) to share keys with other attribute dictionaries ofinstances of the same class.
The current dictionary implementation uses more memory than isnecessary when used as a container for object attributes as the keysare replicated for each instance rather than being shared across manyinstances of the same class. Despite this, the current dictionaryimplementation is finely tuned and performs very well as ageneral-purpose mapping object.
By separating the keys (and hashes) from the values it is possible toshare the keys between multiple dictionaries and improve memory use.By ensuring that keys are separated from the values only whenbeneficial, it is possible to retain the high-performance of thecurrent dictionary implementation when used as a general-purposemapping object.
The new dictionary behaves in the same way as the old implementation.It fully conforms to the Python API, the C API and the ABI.
Python Dict To Array
Memory Usage
Reduction in memory use is directly related to the number ofdictionaries with shared keys in existence at any time. Thesedictionaries are typically half the size of the current dictionaryimplementation.
Benchmarking shows that memory use is reduced by 10% to 20% forobject-oriented programs with no significant change in memory use forother programs.
Speed
The performance of the new implementation is dominated by memorylocality effects. When keys are not shared (for example in moduledictionaries and dictionary explicitly created by dict() or{}) then performance is unchanged (within a percent or two) fromthe current implementation.
For the shared keys case, the new implementation tends to separatekeys from values, but reduces total memory usage. This will improveperformance in many cases as the effects of reduced memory usageoutweigh the loss of locality, but some programs may show a small slowdown.
Python Two List To Dict
Benchmarking shows no significant change of speed for most benchmarks.Object-oriented benchmarks show small speed ups when they create largenumbers of objects of the same class (the gcbench benchmark shows a10% speed up; this is likely to be an upper limit).
Both the old and new dictionaries consist of a fixed-sized dict structand a re-sizeable table. In the new dictionary the table can befurther split into a keys table and values array. The keys tableholds the keys and hashes and (for non-split tables) the values aswell. It differs only from the original implementation in that itcontains a number of fields that were previously in the dict struct.If a table is split the values in the keys table are ignored, insteadthe values are held in a separate array.
Split-Table dictionaries
When dictionaries are created to fill the __dict__ slot of an object,they are created in split form. The keys table is cached in the type,potentially allowing all attribute dictionaries of instances of oneclass to share keys. In the event of the keys of these dictionariesstarting to diverge, individual dictionaries will lazily convert tothe combined-table form. This ensures good memory use in the commoncase, and correctness in all cases.
When resizing a split dictionary it is converted to a combined table.If resizing is as a result of storing an instance attribute, and thereis only instance of a class, then the dictionary will be re-splitimmediately. Since most OO code will set attributes in the __init__method, all attributes will be set before a second instance is createdand no more resizing will be necessary as all further instancedictionaries will have the correct size. For more complex usepatterns, it is impossible to know what is the best approach, so theimplementation allows extra insertions up to the point of a resizewhen it reverts to the combined table (non-shared keys).
A deletion from a split dictionary does not change the keys table, itsimply removes the value from the values array.
Combined-Table dictionaries
Explicit dictionaries (dict() or {}), module dictionaries andmost other dictionaries are created as combined-table dictionaries. Acombined-table dictionary never becomes a split-table dictionary.Combined tables are laid out in much the same way as the tables in theold dictionary, resulting in very similar performance.
The new dictionary implementation is available at [1].
Python String To Dict
Pros
Significant memory savings for object-oriented applications. Smallimprovement to speed for programs which create lots of similarobjects.
Cons
Change to data structures: Third party modules which meddle with theinternals of the dictionary implementation will break.
Changes to repr() output and iteration order: For most cases, thiswill be unchanged. However, for some split-table dictionaries theiteration order will change.
Neither of these cons should be a problem. Modules which meddle withthe internals of the dictionary implementation are already broken andshould be fixed to use the API. The iteration order of dictionarieswas never defined and has always been arbitrary; it is different forJython and PyPy.
Alternative Implementation
An alternative implementation for split tables, which could save evenmore memory, is to store an index in the value field of the keys table(instead of ignoring the value field). This index would explicitlystate where in the value array to look. The value array would thenonly require 1 field for each usable slot in the key table, ratherthan each slot in the key table.
This 'indexed' version would reduce the size of value array by aboutone third. The keys table would need an extra 'values_size' field,increasing the size of combined dicts by one word. The extraindirection adds more complexity to the code, potentially reducingperformance a little.
The 'indexed' version will not be included in this implementation, butshould be considered deferred rather than rejected, pending furtherexperimentation.
[1] | Reference Implementation:https://bitbucket.org/markshannon/cpython_new_dict |
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/master/pep-0412.txt