免费注册 查看新帖 |

Chinaunix

  平台 论坛 博客 文库
最近访问板块 发新帖
查看: 2336 | 回复: 3

[函数] a-short-visit-to-common-data-structures [复制链接]

论坛徽章:
27
水瓶座
日期:2014-08-22 21:06:34程序设计版块每日发帖之星
日期:2015-11-25 06:20:0015-16赛季CBA联赛之新疆
日期:2015-12-19 19:05:48IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:3315-16赛季CBA联赛之上海
日期:2016-04-15 19:51:31程序设计版块每日发帖之星
日期:2016-04-17 06:23:29程序设计版块每日发帖之星
日期:2016-04-23 06:20:00程序设计版块每日发帖之星
日期:2016-05-26 06:20:00每日论坛发贴之星
日期:2016-05-26 06:20:0015-16赛季CBA联赛之辽宁
日期:2017-02-16 23:59:47
发表于 2014-11-02 11:35 |显示全部楼层
For small amounts of data, there are basically two data structures that can be used. The first one is called a proplist. A proplist is any list of tuples of the form [{Key,Value}]. They're a weird kind of structure because there is no other rule than that. In fact the rules are so relaxed that the list can also contain boolean values, integers and whatever you want. We're rather interested by the idea of a tuple with a key and a value in a list here, though. To work with proplists, you can use the proplists module. It contains functions such as proplists:delete/2, proplists:get_value/2, proplists:get_all_values/2, proplists:lookup/2 and proplists:lookup_all/2.

You'll notice there is no function to add or update an element of the list. This shows how loosely defined proplists are as a data structure. To get these functionalities, you must cons your element manually ([NewElement|OldList]) and use functions such as lists:keyreplace/4. Using two modules for one small data structure is not the cleanest thing, but because proplists are so loosely defined, they're often used to deal with configuration lists, and general description of a given item. Proplists are not exactly complete data structures. They're more of a common pattern that appears when using lists and tuples to represent some object or item; the proplists module is a bit of a toolbox over such a pattern.

If you do want a more complete key-value store for small amounts of data, the orddict module is what you need. Orddicts (ordered dictionaries) are proplists with a taste for formality. Each key can be there once, the whole list is sorted for faster average lookup, etc. Common functions for the CRUD usage include orddict:store/3, orddict:find/2 (when you do not know whether the key is in the dictionaries), orddict:fetch/2 (when you know it is there or that it must be there) and orddict:erase/2.

A dictionary with the definition of 'Awesome' being 'it's you!'
Orddicts are a generally good compromise between complexity and efficiency up to about 75 elements (see my benchmark). After that amount, you should switch to different key-value stores.

There are basically two key-value structures/modules to deal with larger amounts of data: dicts and gb_trees. Dictionaries have the same interface as orddicts: dict:store/3, dict:find/2, dict:fetch/2, dict:erase/2 and every other function, such as dict:map/2 and dict:fold/2 (pretty useful to work on the whole data structure!) Dicts are thus very good choices to scale orddicts up whenever it is needed.

General Balanced Trees, on the other hand, have a bunch more functions leaving you more direct control over how the structure is to be used. There are basically two modes for gb_trees: the mode where you know your structure in and out (I call this the 'smart mode'), and the mode where you can't assume much about it (I call this one the 'naive mode'). In naive mode, the functions are gb_trees:enter/3, gb_trees:lookup/2 and gb_trees:delete_any/2. The related smart functions are gb_trees:insert/3, gb_trees:get/2, gb_trees:update/3 and gb_trees:delete/2. There is also gb_trees:map/2, which is always a nice thing when you need it.

The disadvantage of 'naive' functions over 'smart' ones is that because gb_trees are balanced trees, whenever you insert a new element (or delete a bunch), it might be possible that the tree will need to balance itself. This can take time and memory (even in useless checks just to make sure). The 'smart' function all assume that the key is present in the tree: this lets you skip all the safety checks and results in faster times.

When should you use gb_trees over dicts? Well, it's not a clear decision. As the benchmark module I have written will show, gb_trees and dicts have somewhat similar performances in many respects. However, the benchmark demonstrates that dicts have the best read speeds while the gb_trees tend to be a little quicker on other operations. You can judge based on your own needs which one would be the best.

Oh and also note that while dicts have a fold function, gb_trees don't: they instead have an iterator function, which returns a bit of the tree on which you can call gb_trees:next(Iterator) to get the following values in order. What this means is that you need to write your own recursive functions on top of gb_trees rather than use a generic fold. On the other hand, gb_trees let you have quick access to the smallest and largest elements of the structure with gb_trees:smallest/1 and gb_trees:largest/1.

I would therefore say that your application's needs is what should govern which key-value store to choose. Different factors such as how much data you've got to store, what you need to do with it and whatnot all have their importance. Measure, profile and benchmark to make sure.

Note: some special key-value stores exist to deal with resources of different size. Such stores are ETS tables, DETS tables and the mnesia database. However, their use is strongly related to the concepts of multiple processes and distribution. Because of this, they'll only be approached later on. I'm leaving this as a reference to pique your curiosity and for those interested.

Update:
Starting with version 17.0, the language supports a new native key-value data type, described in Postscript: Maps. They should be the new de-facto replacement for dicts.

http://learnyousomeerlang.com/a- ... mon-data-structures

论坛徽章:
27
水瓶座
日期:2014-08-22 21:06:34程序设计版块每日发帖之星
日期:2015-11-25 06:20:0015-16赛季CBA联赛之新疆
日期:2015-12-19 19:05:48IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:3315-16赛季CBA联赛之上海
日期:2016-04-15 19:51:31程序设计版块每日发帖之星
日期:2016-04-17 06:23:29程序设计版块每日发帖之星
日期:2016-04-23 06:20:00程序设计版块每日发帖之星
日期:2016-05-26 06:20:00每日论坛发贴之星
日期:2016-05-26 06:20:0015-16赛季CBA联赛之辽宁
日期:2017-02-16 23:59:47
发表于 2014-11-02 11:45 |显示全部楼层
用法:
  1. -module(keyval_benchmark).
  2. -compile(export_all).

  3. %% Runs all benchmarks with Reps number of elements.
  4. bench(Reps) ->
  5.     io:format("Base Case:~n"),
  6.     io:format("Operation\tTotal (碌s)\tAverage (碌s)~n"),
  7.     print(base_case(Reps)),
  8.     io:format("~nNaive Orddict:~n"),
  9.     io:format("Operation\tTotal (碌s)\tAverage (碌s)~n"),
  10.     print(naive_orddict(Reps)),
  11.     io:format("~nSmart Orddict:~n"),
  12.     io:format("Operation\tTotal (碌s)\tAverage (碌s)~n"),
  13.     print(smart_orddict(Reps)),
  14.     io:format("~nNaive Dict:~n"),
  15.     io:format("Operation\tTotal (碌s)\tAverage (碌s)~n"),
  16.     print(naive_dict(Reps)),
  17.     io:format("~nSmart Dict:~n"),
  18.     io:format("Operation\tTotal (碌s)\tAverage (碌s)~n"),
  19.     print(smart_dict(Reps)),
  20.     io:format("~nNaive gb_trees:~n"),
  21.     io:format("Operation\tTotal (碌s)\tAverage (碌s)~n"),
  22.     print(naive_gb_trees(Reps)),
  23.     io:format("~nSmart gb_trees:~n"),
  24.     io:format("Operation\tTotal (碌s)\tAverage (碌s)~n"),
  25.     print(smart_gb_trees(Reps)).

  26. %% formats the benchmark results cleanly.
  27. print([]) -> ok;
  28. print([{Op, Total, Avg} | Rest]) ->
  29.     io:format("~8s\t~10B\t~.6f~n", [Op, Total, Avg]),
  30.     print(Rest).

  31. %% Example of a base benchmark function. This one actually does
  32. %% nothing and can be used as a control for all the benchmark - it
  33. %% will measure how much time it takes just to loop over data and
  34. %% apply functions.
  35. base_case(Reps) ->
  36.     benchmark(Reps,                 % N repetitions
  37.               [],                   % Empty data structure
  38.               fun ?MODULE:null/3,   % Create
  39.               fun ?MODULE:null/2,   % Read
  40.               fun ?MODULE:null/3,   % Update
  41.               fun ?MODULE:null/2).  % Delete

  42. %% Ordered dictionaries. Assumes that the value is present on reads.
  43. smart_orddict(Reps) ->
  44.     benchmark(Reps,
  45.               orddict:new(),
  46.               fun orddict:store/3,
  47.               fun orddict:fetch/2,
  48.               fun orddict:store/3,
  49.               fun orddict:erase/2).

  50. %% Ordered dictionaries. Doesn't know whether a key is there or not on
  51. %% reads.
  52. naive_orddict(Reps) ->
  53.     benchmark(Reps,
  54.               orddict:new(),
  55.               fun orddict:store/3,
  56.               fun orddict:find/2,
  57.               fun orddict:store/3,
  58.               fun orddict:erase/2).

  59. %% Dictionary benchmark. Assumes that the value is present on reads.
  60. smart_dict(Reps) ->
  61.     benchmark(Reps,
  62.               dict:new(),
  63.               fun dict:store/3,
  64.               fun dict:fetch/2,
  65.               fun dict:store/3,
  66.               fun dict:erase/2).

  67. %% Dictionary benchmark. Doesn't know if the value exisst at read time.
  68. naive_dict(Reps) ->
  69.     benchmark(Reps,
  70.               dict:new(),
  71.               fun dict:store/3,
  72.               fun dict:find/2,
  73.               fun dict:store/3,
  74.               fun dict:erase/2).

  75. %% gb_trees benchmark. Uses the most general functions -- i.e.: it never
  76. %% assumes that the value is not in a tree when inserting and never assumes
  77. %% it is there when updating or deleting.
  78. naive_gb_trees(Reps) ->
  79.     benchmark(Reps,
  80.               gb_trees:empty(),
  81.               fun gb_trees:enter/3,
  82.               fun gb_trees:lookup/2,
  83.               fun gb_trees:enter/3,
  84.               fun gb_trees:delete_any/2).

  85. %% gb_trees benchmark. Uses specific function: it assumes that the values
  86. %% are not there when inserting and assumes it exists when updating or
  87. %% deleting.
  88. smart_gb_trees(Reps) ->
  89.     benchmark(Reps,
  90.               gb_trees:empty(),
  91.               fun gb_trees:insert/3,
  92.               fun gb_trees:get/2,
  93.               fun gb_trees:update/3,
  94.               fun gb_trees:delete/2).

  95. %% Empty functions used for the 'base_case/1' benchmark. They must do
  96. %% nothing interesting.
  97. null(_, _) -> ok.
  98. null(_, _, _) -> ok.

  99. %% Runs all the functions of 4 formats: Create, Read, Update, Delete.
  100. %% Create: it's a regular insertion, but it goes from an empty structure
  101. %%         to a filled one. Requires an empty data structure,
  102. %% Read: reads every key from a complete data structure.
  103. %% Update: usually, this is the same as the insertion from 'create',
  104. %%         except that it's run on full data structures. In some cases,
  105. %%         like gb_trees, there also exist operations for updates when
  106. %%         the keys are known that act differently from regular inserts.
  107. %% Delete: removes a key from a tree. Because we want to test the
  108. %%         efficiency of it all, we will always delete from a complete
  109. %%         structure.
  110. %% The function returns a list of all times averaged over the number
  111. %% of repetitions (Reps) needed.
  112. benchmark(Reps, Empty, CreateFun, ReadFun, UpdateFun, DeleteFun) ->
  113.     Keys = make_keys(Reps),
  114.     {TimeC, Struct} = timer:tc(?MODULE, create, [Keys, CreateFun, Empty]),
  115.     {TimeR, _} = timer:tc(?MODULE, read, [Keys, Struct, ReadFun]),
  116.     {TimeU, _} = timer:tc(?MODULE, update, [Keys, Struct, UpdateFun]),
  117.     {TimeD, _} = timer:tc(?MODULE, delete, [Keys, Struct, DeleteFun]),
  118.     [{create, TimeC, TimeC/Reps},
  119.      {read, TimeR, TimeR/Reps},
  120.      {update, TimeU, TimeU/Reps},
  121.      {delete, TimeD, TimeD/Reps}].

  122. %% Generate unique random numbers. No repetition allowed
  123. make_keys(N) ->
  124.     %% The trick is to generate all numbers as usual, but match them
  125.     %% with a random value in a tuple of the form {Random, Number}.
  126.     %% The idea is to then sort the list generated that way; done in
  127.     %% this manner, we know all values will be unique and the randomness
  128.     %% will be done by the sorting.
  129.     Random = lists:sort([{random:uniform(N), X} || X <- lists:seq(1, N)]),
  130.     %% it's a good idea to then filter out the index (the random index)
  131.     %% to only return the real numbers we want. This is simple to do
  132.     %% with a list comprehension where '_' removes the extraneous data.
  133.     %% The keys are then fit into a tuple to make the test a bit more
  134.     %% realistic for comparison.
  135.     [{some, key, X} || {_, X} <- Random].

  136. %% Loop function to apply the construction of a data structure.
  137. %% The parameters passed are a list of all keys to use and then the
  138. %% higher order function responsible of the creation of a data
  139. %% structure. This is usually a function of the form
  140. %% F(Key, Value, Structure).
  141. %% In the first call, the structure has to be the empty data structure
  142. %% that will progressively be filled.
  143. %% The only value inserted for each key is 'some_data'; we only care
  144. %% about the keys when dealing with key/value stuff.
  145. create([], _, Acc) -> Acc;
  146. create([Key|Rest], Fun, Acc) ->
  147.     create(Rest, Fun, Fun(Key, some_data, Acc)).

  148. %% Loop function to apply successive readings to a data structure.
  149. %% The parameters passed are a list of all keys, the complete data
  150. %% structure and then a higher order function responsible for
  151. %% fetching the data. Such functions are usually of the form
  152. %% F(Key, Structure).
  153. read([], _, _) -> ok;
  154. read([Key|Rest], Struct, Fun) ->
  155.     Fun(Key, Struct),
  156.     read(Rest, Struct, Fun).

  157. %% Loop function to apply updates to a data structure.
  158. %% Takes a list of keys, a full data structure and a higher order
  159. %% function responsible for the updating, usually of the form
  160. %% F(Key, NewValue, Structure).
  161. %% All values for a given key are replaced by 'newval', again because
  162. %% we don't care about the values, but merely the operations with
  163. %% the keys.
  164. update([], _, _) -> ok;
  165. update([Key|Rest], Struct, Fun) ->
  166.     Fun(Key, newval, Struct),
  167.     update(Rest, Struct, Fun).

  168. %% Loop function to apply deletions to a data structure.
  169. %% Each deletion is made on a full data structure.
  170. %% Takes a list of keys, a data structure and a higher order function
  171. %% to do the deletion. Usually of the form
  172. %% F(Key, Structure).
  173. delete([], _, _) -> ok;
  174. delete([Key|Rest], Struct, Fun) ->
  175.     Fun(Key, Struct),
  176.     delete(Rest, Struct, Fun).
复制代码

论坛徽章:
27
水瓶座
日期:2014-08-22 21:06:34程序设计版块每日发帖之星
日期:2015-11-25 06:20:0015-16赛季CBA联赛之新疆
日期:2015-12-19 19:05:48IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:3315-16赛季CBA联赛之上海
日期:2016-04-15 19:51:31程序设计版块每日发帖之星
日期:2016-04-17 06:23:29程序设计版块每日发帖之星
日期:2016-04-23 06:20:00程序设计版块每日发帖之星
日期:2016-05-26 06:20:00每日论坛发贴之星
日期:2016-05-26 06:20:0015-16赛季CBA联赛之辽宁
日期:2017-02-16 23:59:47
发表于 2014-11-02 11:45 |显示全部楼层

论坛徽章:
27
水瓶座
日期:2014-08-22 21:06:34程序设计版块每日发帖之星
日期:2015-11-25 06:20:0015-16赛季CBA联赛之新疆
日期:2015-12-19 19:05:48IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:31IT运维版块每日发帖之星
日期:2015-12-25 06:20:3315-16赛季CBA联赛之上海
日期:2016-04-15 19:51:31程序设计版块每日发帖之星
日期:2016-04-17 06:23:29程序设计版块每日发帖之星
日期:2016-04-23 06:20:00程序设计版块每日发帖之星
日期:2016-05-26 06:20:00每日论坛发贴之星
日期:2016-05-26 06:20:0015-16赛季CBA联赛之辽宁
日期:2017-02-16 23:59:47
发表于 2014-11-02 11:50 |显示全部楼层
  1. 2> keyval_benchmark:bench(3).
  2. Base Case:
  3. Operation       Total (μs)      Average (μs)
  4.   create                 0      0.000000
  5.     read                 0      0.000000
  6.   update                 0      0.000000
  7.   delete                 0      0.000000

  8. Naive Orddict:
  9. Operation       Total (μs)      Average (μs)
  10.   create                 0      0.000000
  11.     read                 0      0.000000
  12.   update                 0      0.000000
  13.   delete                 0      0.000000

  14. Smart Orddict:
  15. Operation       Total (μs)      Average (μs)
  16.   create                 0      0.000000
  17.     read                 0      0.000000
  18.   update                 0      0.000000
  19.   delete                 0      0.000000

  20. Naive Dict:
  21. Operation       Total (μs)      Average (μs)
  22.   create                 0      0.000000
  23.     read                 0      0.000000
  24.   update                 0      0.000000
  25.   delete                 0      0.000000

  26. Smart Dict:
  27. Operation       Total (μs)      Average (μs)
  28.   create                 0      0.000000
  29.     read                 0      0.000000
  30.   update                 0      0.000000
  31.   delete                 0      0.000000

  32. Naive gb_trees:
  33. Operation       Total (μs)      Average (μs)
  34.   create                 0      0.000000
  35.     read                 0      0.000000
  36.   update                 0      0.000000
  37.   delete                 0      0.000000

  38. Smart gb_trees:
  39. Operation       Total (μs)      Average (μs)
  40.   create                 0      0.000000
  41.     read                 0      0.000000
  42.   update                 0      0.000000
  43.   delete                 0      0.000000
  44. ok
  45. 3> keyval_benchmark:bench(1000).
  46. Base Case:
  47. Operation       Total (μs)      Average (μs)
  48.   create                 0      0.000000
  49.     read                 0      0.000000
  50.   update                 0      0.000000
  51.   delete                 0      0.000000

  52. Naive Orddict:
  53. Operation       Total (μs)      Average (μs)
  54.   create             40000      40.000000
  55.     read             30000      30.000000
  56.   update             40000      40.000000
  57.   delete             40000      40.000000

  58. Smart Orddict:
  59. Operation       Total (μs)      Average (μs)
  60.   create             20000      20.000000
  61.     read             20000      20.000000
  62.   update             40000      40.000000
  63.   delete             50000      50.000000

  64. Naive Dict:
  65. Operation       Total (μs)      Average (μs)
  66.   create             10000      10.000000
  67.     read                 0      0.000000
  68.   update                 0      0.000000
  69.   delete                 0      0.000000

  70. Smart Dict:
  71. Operation       Total (μs)      Average (μs)
  72.   create                 0      0.000000
  73.     read                 0      0.000000
  74.   update                 0      0.000000
  75.   delete                 0      0.000000

  76. Naive gb_trees:
  77. Operation       Total (μs)      Average (μs)
  78.   create                 0      0.000000
  79.     read                 0      0.000000
  80.   update             10000      10.000000
  81.   delete                 0      0.000000

  82. Smart gb_trees:
  83. Operation       Total (μs)      Average (μs)
  84.   create                 0      0.000000
  85.     read                 0      0.000000
  86.   update                 0      0.000000
  87.   delete                 0      0.000000
  88. ok
  89. 4> keyval_benchmark:bench(100000).
  90. Base Case:
  91. Operation       Total (μs)      Average (μs)
  92.   create             10000      0.100000
  93.     read                 0      0.000000
  94.   update                 0      0.000000
  95.   delete                 0      0.000000

  96. Naive Orddict:
  97. Operation       Total (μs)      Average (μs)
复制代码
您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

  

北京盛拓优讯信息技术有限公司. 版权所有 京ICP备16024965号-6 北京市公安局海淀分局网监中心备案编号:11010802020122 niuxiaotong@pcpop.com 17352615567
未成年举报专区
中国互联网协会会员  联系我们:huangweiwei@itpub.net
感谢所有关心和支持过ChinaUnix的朋友们 转载本站内容请注明原作者名及出处

清除 Cookies - ChinaUnix - Archiver - WAP - TOP