MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
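Transforming one collection into another and operating on individual elements are everyday data-processing tasks. I first met the MapReduce idea through Python, but .NET turns out to have equivalent operations (I found them on Stack Overflow by searching for "Python Map Equivalent in C#"):
- .NET-Select/Python-Map
  - Purpose: transform set A into set B
  - Cardinality change: |A|→|B|, |A|≥|B| (Select emits exactly one output per input, so the sequence length is preserved; only the number of distinct values can shrink)
  - The supplied function F is the mapping from an element of A to an element of B
    - F(a)=b, a∈A, b∈B
  - Example:
    - Compute the length of each of several vectors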
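```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<float[]> __list = new List<float[]>()
        {
            new float[] { 0, 8, 0 },
            new float[] { 4, 5, 6 },
            new float[] { 0.577F, 0.577F, 0.577F },
            new float[] { 1, 0, 0 }
        };

        // Map every vector to its length
        List<float> __normTable = __list.Select(new Func<float[], float>(norm2)).ToList();
    }

    // Euclidean length: square root of the sum of squared components
    static float norm2(float[] vector)
    {
        return (float)Math.Sqrt(vector.Sum(x => x * x));
    }
}
```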
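For comparison, here is a minimal sketch of the Python side of the same mapping, assuming the intended quantity is the Euclidean length of each vector. The names `vectors` and `norm_table` are chosen for illustration and do not come from the C# code.

```python
import math


def norm2(vector):
    """Euclidean length (L2 norm) of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in vector))


vectors = [
    [0, 8, 0],
    [4, 5, 6],
    [0.577, 0.577, 0.577],
    [1, 0, 0],
]

# map applies norm2 to every element; it is lazy, so list() materializes the result
norm_table = list(map(norm2, vectors))

print(norm_table)
```

Like Select, `map` produces exactly one output per input, so `norm_table` has as many entries as `vectors`.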
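- .NET-Aggregate/Python-Reduce
  - Purpose: fold set A into a single element
  - Cardinality change: |A|→1
  - Process:
    - Combine the accumulator with A[0], store the result back in the accumulator, remove A[0], and repeat until no elements remain
  - Not commutative in general
    - The same elements in a different order may produce a different result
  - Sum is a special case of Aggregate
    - It turns set A into a scalar
    - Each element is converted to a scalar, and the scalars are added up
    - Because the operation is addition, this special case is commutative
    - So the order of elements in A does not matter (apart from floating-point rounding)
  - Example:
    - Stack several vectors into a matrix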
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<float[]> __list = new List<float[]>()
        {
            new float[] { 0, 8, 0 },
            new float[] { 4, 5, 6 },
            new float[] { 0.577F, 0.577F, 0.577F },
            new float[] { 1, 0, 0 }
        };

        // Accumulate all row vectors in the list into a single matrix,
        // starting from an empty matrix and appending one row per step
        float[][] matrix = __list.Aggregate(
            new float[][] { },
            (float[][] x, float[] y) => x.Append(y).ToArray());
    }
}
```
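The fold can be sketched in Python with `functools.reduce`. The snippet below is illustrative only: the names `vectors`, `matrix`, `digits_forward`, and `digits_backward` are assumptions of mine, and the digit-building lambda is just one convenient operation whose result depends on element order.

```python
from functools import reduce

vectors = [
    [0, 8, 0],
    [4, 5, 6],
    [0.577, 0.577, 0.577],
    [1, 0, 0],
]

# Fold: start from an empty matrix (the seed) and append one row per step
matrix = reduce(lambda acc, row: acc + [row], vectors, [])

# Order matters for a general fold: here each step shifts the accumulator
# one decimal place and adds the next digit
digits_forward = reduce(lambda acc, d: acc * 10 + d, [1, 2, 3], 0)   # 123
digits_backward = reduce(lambda acc, d: acc * 10 + d, [3, 2, 1], 0)  # 321

# Sum is the commutative special case: reordering integers does not change it
total_forward = sum([1, 2, 3])
total_backward = sum([3, 2, 1])

print(matrix)
print(digits_forward, digits_backward)
print(total_forward, total_backward)
```

The third argument to `reduce` plays the same role as the seed passed to Aggregate in the C# example.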
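- .NET-Where/Python-Filter
  - Purpose: transform set A into set A′, where A′ is a subset of A, A′⊆A
  - Cardinality change: |A|→|A′|, |A|≥|A′|
  - The supplied function F tests whether an element of A satisfies a given condition
    - F(a)∈{true, false}, a∈A; A′={a∈A | F(a)=true}
  - Huh? Isn't that the same as FindAll? Strange. (FindAll is a List<T> method that returns a new List<T> right away; Where works on any IEnumerable<T> and is evaluated lazily.)
  - Example:
    - Keep only the unit vectors (vectors whose length is 1)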
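```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<float[]> __list = new List<float[]>()
        {
            new float[] { 0, 8, 0 },
            new float[] { 4, 5, 6 },
            new float[] { 0.577F, 0.577F, 0.577F },
            new float[] { 1, 0, 0 }
        };

        // Keep only the vectors whose norm equals 1. Floats are compared with a
        // small tolerance, because an exact == test rejects 0.577 (about 1/sqrt(3)).
        List<float[]> __unityTable = __list.Where(x => Math.Abs(norm2(x) - 1F) < 1e-3F).ToList();
    }

    // Euclidean length: square root of the sum of squared components
    static float norm2(float[] vector)
    {
        return (float)Math.Sqrt(vector.Sum(x => x * x));
    }
}
```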
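A rough Python equivalent of this filter is sketched below. It assumes that "unit vector" should be tested with a small tolerance rather than exact equality; the tolerance `1e-3` and the names `vectors` and `unit_vectors` are my own choices for illustration.

```python
import math


def norm2(vector):
    """Euclidean length (L2 norm) of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in vector))


vectors = [
    [0, 8, 0],
    [4, 5, 6],
    [0.577, 0.577, 0.577],
    [1, 0, 0],
]

# filter keeps the elements for which the predicate returns True;
# isclose with abs_tol tolerates rounding in values such as 0.577 (about 1/sqrt(3))
unit_vectors = list(
    filter(lambda v: math.isclose(norm2(v), 1.0, abs_tol=1e-3), vectors)
)

print(unit_vectors)
```

Every element of `unit_vectors` comes unchanged from `vectors`, which is the A′⊆A property: a filter selects elements and never transforms them.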