MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
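Converting one collection into another and operating on its elements is everyday data-processing work. I first met the MapReduce idea through Python, but it turns out .NET has equivalent operations (found on Stack Overflow by searching "Python Map Equivalent in C#"):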
- .NET-Select/Python-Map
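  - Purpose: transform collection $A$ into collection $B$
  - Cardinality change: $\left| A \right| \rightarrow \left| B \right|, \left| A \right| \ge \left| B \right|$ (Select emits exactly one output per input; viewed as sets, $B$ can be smaller because distinct elements of $A$ may map to the same value)
  - The function $F$ passed in defines how an element of $A$ is converted into an element of $B$
    - $F(a)=b, a\in A, b\in B$
  - Example:
    - Compute the length of each of several vectors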
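    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            List<float[]> __list = new List<float[]>()
            {
                new float[]{0,8,0},
                new float[]{4,5,6},
                new float[]{0.577F,0.577F,0.577F},
                new float[]{1,0,0}
            };
            // Map each vector to its length
            List<float> __normTable = __list.Select(new Func<float[], float>(norm2)).ToList();
        }

        // Euclidean length: square root of the sum of squares
        static float norm2(float[] vector)
        {
            return (float)Math.Sqrt(vector.Sum(x => x * x));
        }
    }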
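For comparison, the Python side of the same example might look like the sketch below. The names `vectors`, `norm2`, and `norm_table` simply mirror the C# snippet, and taking "vector length" to mean the Euclidean norm is my assumption.

```python
import math

def norm2(vector):
    """Euclidean length (2-norm) of a vector given as a sequence of numbers."""
    return math.sqrt(sum(x * x for x in vector))

vectors = [
    [0, 8, 0],
    [4, 5, 6],
    [0.577, 0.577, 0.577],
    [1, 0, 0],
]

# map() is lazy, much like Select; list() plays the role of ToList().
norm_table = list(map(norm2, vectors))

# One output per input, so both lists have the same number of entries.
assert len(norm_table) == len(vectors)
```

Passing the function by name (`map(norm2, vectors)`) corresponds to wrapping `norm2` in a `Func` delegate on the .NET side.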
- .NET-Aggregate/Python-Reduce
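  - Purpose: fold collection $A$ into a single element
  - Cardinality change: $\left| A \right| \rightarrow 1$
  - How it works:
    - Apply the operation to the accumulator and $A[0]$, store the result back in the accumulator, remove $A[0]$, and repeat until no elements remain
    - Not commutative in general
      - The same elements in a different order may produce a different result
  - Sum is a special case of Aggregate
    - It reduces collection $A$ to a scalar
    - Each element is converted to a scalar, then the scalars are added up
    - Because the operation is addition, this special case is commutative
      - So the order of the elements within $A$ does not matter (apart from floating-point rounding)
  - Example:
    - Stack several vectors into a matrix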
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            List<float[]> __list = new List<float[]>()
            {
                new float[]{0,8,0},
                new float[]{4,5,6},
                new float[]{0.577F,0.577F,0.577F},
                new float[]{1,0,0}
            };
            // Accumulate every row vector in the list into one matrix,
            // starting from an empty matrix as the seed
            float[][] matrix = __list.Aggregate(
                new float[][]{},
                (float[][] x, float[] y) => x.Append(y).ToArray());
        }
    }
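The Python counterpart is `functools.reduce`. The sketch below repeats the matrix example and adds two small illustrations of the points above: a fold whose result depends on element order, and summation written as a fold. The string data and the names `forward`, `backward`, and `total` are made up for illustration.

```python
from functools import reduce

vectors = [
    [0, 8, 0],
    [4, 5, 6],
    [0.577, 0.577, 0.577],
    [1, 0, 0],
]

# Seeded fold: start from an empty list and append one row per step,
# mirroring Aggregate(seed, func) in the C# example.
matrix = reduce(lambda acc, row: acc + [row], vectors, [])

# String concatenation is not commutative, so element order changes the result.
forward = reduce(lambda acc, s: acc + s, ["a", "b", "c"], "")
backward = reduce(lambda acc, s: acc + s, ["c", "b", "a"], "")

# Summation is the special case where the folding operation is addition.
total = reduce(lambda acc, x: acc + x, [1, 2, 3], 0)
```

Here `forward` is `"abc"` while `backward` is `"cba"`, whereas `total` equals `sum([1, 2, 3])` no matter how the three numbers are ordered.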
- .NET-Where/Python-Filter
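  - Purpose: turn collection $A$ into collection ${A}'$, where ${A}'$ is a subset of $A$, ${A}' \subseteq A$
  - Cardinality change: $\left| A \right| \rightarrow \left| {A}' \right|, \left| A \right| \ge \left| {A}' \right|$
  - The function $F$ passed in tests whether an element of $A$ meets a given condition
    - $F(a)\in\{\text{true},\text{false}\}, a\in A$, and ${A}' = \{ a\in A \mid F(a)=\text{true} \}$
  - Huh, isn't that the same as FindAll? Curious. (FindAll is a method on List that builds a new list right away, while Where works on any IEnumerable and is evaluated lazily.)
  - Example:
    - Keep only the unit vectors (vectors whose length is 1)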
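    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            List<float[]> __list = new List<float[]>()
            {
                new float[]{0,8,0},
                new float[]{4,5,6},
                new float[]{0.577F,0.577F,0.577F},
                new float[]{1,0,0}
            };
            // Keep the vectors whose norm equals 1; exact == on floats would
            // reject {0.577, 0.577, 0.577}, so compare within a tolerance
            List<float[]> __unityTable = __list.Where(x => Math.Abs(norm2(x) - 1F) < 1e-3F).ToList();
        }

        // Euclidean length: square root of the sum of squares
        static float norm2(float[] vector)
        {
            return (float)Math.Sqrt(vector.Sum(x => x * x));
        }
    }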
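On the Python side the same selection can be sketched with `filter`. The tolerance of `1e-3` is an assumption chosen so that the rounded vector `[0.577, 0.577, 0.577]` still counts as a unit vector; `unit_vectors` and `exact_only` are illustrative names.

```python
import math

def norm2(vector):
    """Euclidean length (2-norm) of a vector given as a sequence of numbers."""
    return math.sqrt(sum(x * x for x in vector))

vectors = [
    [0, 8, 0],
    [4, 5, 6],
    [0.577, 0.577, 0.577],
    [1, 0, 0],
]

# filter() keeps the elements for which the predicate returns True, like Where.
# The predicate compares within an assumed absolute tolerance of 1e-3.
unit_vectors = list(
    filter(lambda v: math.isclose(norm2(v), 1.0, abs_tol=1e-3), vectors)
)

# With exact equality only the vector whose length is exactly 1.0 survives.
exact_only = [v for v in vectors if norm2(v) == 1.0]
```

The list comprehension in the last line is the more common Python spelling of a filter; both forms return a subset of `vectors` in the original order.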