Problem description
I am doing a school task where I am given a small bit of sample code which I can use later. I understand 90% of this code but there is one little line/function that I for the life of me can't figure out what it does (I am very new to Haskell btw).
Sample code:
data Profile = Profile {matrix :: [[(Char, Int)]], moleType :: SeqType, nrOfSeqs :: Int, nm :: String} deriving (Show)

nucleotides = "ACGT"
aminoacids = sort "ARNDCEQGHILKMFPSTWYVX"

makeProfileMatrix :: [MolSeq] -> [[(Char, Int)]]
makeProfileMatrix [] = error "Empty sequence list"
makeProfileMatrix sl = res
  where
    t = seqType (head sl)
    defaults =
      if (t == DNA) then
        zip nucleotides (replicate (length nucleotides) 0) -- Row 1
      else
        zip aminoacids (replicate (length aminoacids) 0) -- Row 2
    strs = map seqSequence sl -- Row 3
    tmp1 = map (map (\x -> ((head x), (length x))) . group . sort)
               (transpose strs) -- Row 4
    equalFst a b = (fst a) == (fst b)
    res = map sort (map (\l -> unionBy equalFst l defaults) tmp1)
{-Row 1: 'replicate' creates a list of zeros that is equal to the length of the 'nucleotides' string.
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the nucleotides-}
{-Row 2: 'replicate' creates a list of zeros that is equal to the length of the 'aminoacids' string.
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the aminoacids-}
{-Row 3: The function 'seqSequence' is applied to each element in the 'sl' list and returns a new list of the results.
In other words 'strs' becomes a list that contains all the sequences in 'sl' (sl contains MolSeq objects, not strings)-}
{-Row 4: (transpose strs) creates a list that has each 'column' of the sequences as an element (the first element is made up of the first element of each sequence, etc.).
--}
I have written an explanation for each marked Row in the code (which I think so far is correct) but I get stuck when I try to figure out what Row 4 does. I understand the 'transpose' bit but I can't at all figure out what the inner map function does. As far as I know a 'map' function needs a list as a second parameter to function but the inner map function only has an anonymous function but no list to operate on. To be perfectly clear I don't understand what the entire inner line
map (\x -> ((head x), (length x))) . group . sort
does. Please help!
Bonus!:
Here is another piece of sample code that I can't figure out (never worked with classes in Haskell):
class Evol object where
  name :: object -> String
  distance :: object -> object -> Double
  distanceMatrix :: [object] -> [(String, String, Double)]
  addRow :: [object] -> Int -> [(String, String, Double)]
  distanceMatrix [] = []
  distanceMatrix object =
    addRow object 0 ++ distanceMatrix (tail object)
  addRow object num -- Adds row to distance matrix
    | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
    | otherwise = []
    where
      a = head object
      b = object !! num

-- Determines the name and distance of an instance of "Evol" if the instance is a "MolSeq".
instance Evol MolSeq where
  name = seqName
  distance = seqDistance

-- Determines the name and distance of an instance of "Evol" if the instance is a "Profile".
instance Evol Profile where
  name = profileName
  distance = profileDistance
Especially this part:
addRow object num -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = []
  where
    a = head object
    b = object !! num
You don't have to explain this one if you don't want to; I am just slightly confused as to what 'addRow' is actually trying to do (in detail).
Thanks!
Solution
map (\x -> (head x, length x)) . group . sort
is an idiomatic way of generating a histogram. When you see something like this that you don’t understand, try breaking it down into smaller pieces and testing them on sample inputs:
(\x -> (head x, length x)) "AAAA"
-- ('A', 4)
(group . sort) "CABABA"
-- ["AAA", "BB", "C"]
(map (\x -> (head x, length x)) . group . sort) "CABABA"
map (\x -> (head x, length x)) (group (sort "CABABA"))
-- [('A', 3), ('B', 2), ('C', 1)]
It’s written in point-free style as a composition of 3 functions, map (…), group, and sort, but could also be written as a lambda:
\row -> map (…) (group (sort row))
For each row in the transposed matrix, it produces a histogram of the data in that row. You could get a more visual representation of this by formatting it and printing it out:
let
  showHistogramRow row = concat
    [ show $ head row
    , ":\t"
    , replicate (length row) '#'
    ]
  input = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
putStr
  $ unlines
  $ map showHistogramRow
  $ group
  $ sort input
-- 1: ##
-- 2: #
-- 3: ##
-- 4: #
-- 5: ###
-- 6: #
-- 9: #
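To tie this back to Row 4, here is a minimal sketch of the same pipeline applied column by column. It assumes plain DNA strings instead of MolSeq values, and the names histogram and columnCounts are made up for this illustration; they are not part of the sample code.

```haskell
import Data.List (group, sort, transpose, unionBy)

-- Count how often each distinct element occurs; the result is sorted by element.
histogram :: Ord a => [a] -> [(a, Int)]
histogram = map (\x -> (head x, length x)) . group . sort

-- Hypothetical stand-in for makeProfileMatrix that works on plain DNA
-- strings (an assumption; the original takes a list of MolSeq).
columnCounts :: [String] -> [[(Char, Int)]]
columnCounts seqs = map (sort . fillIn . histogram) (transpose seqs)
  where
    -- One zero-count entry per nucleotide, like 'defaults' in the question.
    defaults = zip "ACGT" (repeat 0)
    -- unionBy keeps every counted pair and appends a default only for
    -- letters that have no entry yet.
    fillIn counts = unionBy (\a b -> fst a == fst b) counts defaults
```

For example, histogram "CABABA" gives [('A',3),('B',2),('C',1)], and the first column of columnCounts ["ACG", "ATG", "CCG"] is [('A',2),('C',1),('G',0),('T',0)].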
As for this:
addRow object num -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = []
  where
    a = head object
    b = object !! num
addRow makes a list of the distances from the first element in object to every element of object (when it is called with num = 0, as distanceMatrix does, that includes the first element itself). It uses indexing into the list in a sort of non-obvious way, when a simpler and more idiomatic map would suffice:
addRow object = map (\b -> (name a, name b, distance a b)) object
  where a = head object
Ordinarily it’s good to avoid partial functions such as head because they can throw an exception on some inputs (e.g. head []). Here it’s fine, however, because if the input list is empty, then a will never be used, and so head will never be called.
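If you want to convince yourself of both points, here is a small sketch that puts the indexed and the map-based versions side by side. To keep it self-contained, name and distance are passed in as ordinary arguments rather than coming from the Evol class, and absDiff is a made-up toy distance; all three names are assumptions for illustration only.

```haskell
-- Index-based version, shaped like the original addRow. The name and
-- distance functions are explicit arguments (an assumption, so that no
-- type class is needed here).
addRowIndexed :: (a -> String) -> (a -> a -> Double) -> [a] -> Int -> [(String, String, Double)]
addRowIndexed name distance object num
  | num < length object = (name a, name b, distance a b) : addRowIndexed name distance object (num + 1)
  | otherwise = []
  where
    a = head object
    b = object !! num

-- map-based version. On an empty list the lambda is never applied,
-- so 'a' (and therefore 'head') is never evaluated.
addRowMapped :: (a -> String) -> (a -> a -> Double) -> [a] -> [(String, String, Double)]
addRowMapped name distance object = map (\b -> (name a, name b, distance a b)) object
  where
    a = head object

-- Hypothetical toy distance on Ints: the absolute difference.
absDiff :: Int -> Int -> Double
absDiff x y = fromIntegral (abs (x - y))
```

Both addRowIndexed show absDiff [1, 4, 6] 0 and addRowMapped show absDiff [1, 4, 6] evaluate to [("1","1",0.0),("1","4",3.0),("1","6",5.0)], and addRowMapped show absDiff [] is just [] rather than an exception.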
distanceMatrix could be expressed with a map as well, because it’s just calling a function (addRow) on all the tails of the list and concatenating them together with ++:
distanceMatrix object = concatMap addRow (tails object)
This could be written in point-free style too. \x -> f (g x) can be written as just f . g; here, f is concatMap addRow and g is tails:
distanceMatrix = concatMap addRow . tails
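To see what tails and concatMap contribute on their own, here is a rough sketch that swaps addRow for a simpler, made-up function pairWithHead, so the result is easy to read. The names pairWithHead, allPairsRec and allPairs are hypothetical and exist only for this comparison.

```haskell
import Data.List (tails)

-- Hypothetical stand-in for addRow: pair the head of a list with every
-- element of that list (including the head itself).
pairWithHead :: [a] -> [(a, a)]
pairWithHead xs = map (\b -> (a, b)) xs
  where
    a = head xs

-- Explicit recursion, shaped like the original distanceMatrix.
allPairsRec :: [a] -> [(a, a)]
allPairsRec [] = []
allPairsRec xs = pairWithHead xs ++ allPairsRec (tail xs)

-- Point-free version, shaped like concatMap addRow . tails.
-- tails "abc" is ["abc", "bc", "c", ""], and the final empty list
-- contributes nothing to the result.
allPairs :: [a] -> [(a, a)]
allPairs = concatMap pairWithHead . tails
```

Both allPairsRec "abc" and allPairs "abc" give [('a','a'),('a','b'),('a','c'),('b','b'),('b','c'),('c','c')].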
Evol just describes the set of types for which you can generate a distanceMatrix, including MolSeq and Profile. Note that addRow and distanceMatrix don’t need to be members of this class, because they’re implemented entirely in terms of name and distance, so you could move them to the top level:
distanceMatrix :: (Evol object) => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails

addRow :: (Evol object) => [object] -> [(String, String, Double)]
addRow object = map (\b -> (name a, name b, distance a b)) object
  where a = head object
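Putting the pieces together, a complete file might look roughly like the sketch below. MolSeq and Profile are not defined in the question, so this uses a made-up Point type with a toy distance as the only instance; that type is purely an assumption to have something to run. Note that the map-based addRow takes only the list, with no Int argument.

```haskell
import Data.List (tails)

-- The class only needs the two operations that differ per type.
class Evol object where
  name :: object -> String
  distance :: object -> object -> Double

-- Hypothetical toy type standing in for MolSeq or Profile.
data Point = Point String Double

instance Evol Point where
  name (Point n _) = n
  distance (Point _ x) (Point _ y) = abs (x - y)

-- Top-level functions that work for any Evol instance.
distanceMatrix :: Evol object => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails

addRow :: Evol object => [object] -> [(String, String, Double)]
addRow object = map (\b -> (name a, name b, distance a b)) object
  where
    a = head object
```

With this, distanceMatrix [Point "p" 0, Point "q" 3, Point "r" 5] gives [("p","p",0.0),("p","q",3.0),("p","r",5.0),("q","q",0.0),("q","r",2.0),("r","r",0.0)].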